PARQUET-142: add path filter in ParquetReader #89

Closed
nevillelyh wants to merge 1 commit into apache:master from nevillelyh:gh/path-filter

@nevillelyh
Contributor

Currently, parquet-tools commands fail when the input is a directory that contains a _SUCCESS file written by MapReduce. Filtering such files out, as ParquetFileReader already does, fixes the problem.

parquet-cat /tmp/parquet_write_test
Could not read footer: java.lang.RuntimeException: file:/tmp/parquet_write_test/_SUCCESS is not a Parquet file (too small)

$ tree /tmp/parquet_write_test
/tmp/parquet_write_test
├── part-m-00000.parquet
└── _SUCCESS
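
For readers unfamiliar with the fix, here is a minimal sketch of the filtering idea, not the code from the patch. It assumes the rule is "skip entries whose names start with `_` or `.`", which matches the _SUCCESS case above and the dot files mentioned later in this thread. The class name `HiddenFileFilterSketch`, the predicate `NOT_HIDDEN`, and the method `listDataFiles` are hypothetical, and plain `java.nio.file` stands in for Hadoop's filesystem API so the example runs without extra dependencies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HiddenFileFilterSketch {
    // Hypothetical predicate (an assumption, not the patch itself):
    // accept a path unless its file name starts with '_' or '.'.
    static final Predicate<Path> NOT_HIDDEN = p -> {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    };

    // Return the names of regular files directly under dir that pass
    // the filter, sorted so the result is deterministic.
    static List<String> listDataFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(NOT_HIDDEN)
                .map(p -> p.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Recreate the layout from the report in a temporary directory.
        Path dir = Files.createTempDirectory("parquet_write_test");
        Files.createFile(dir.resolve("part-m-00000.parquet"));
        Files.createFile(dir.resolve("_SUCCESS"));
        Files.createFile(dir.resolve(".part-m-00000.parquet.crc"));

        // Only the data file survives the filter.
        System.out.println(listDataFiles(dir)); // prints [part-m-00000.parquet]
    }
}
```

With a filter like this applied before any footer is read, the empty _SUCCESS marker is never opened, so the "too small" error cannot be triggered by it.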

@rdblue
Contributor

rdblue commented Dec 2, 2014

Thanks @nevillelyh! The patch looks good, but could you prefix this PR title and the commit message with the JIRA issue id? We use that as a way to keep track and make sure all commits are for an issue, so we can't merge this until that's fixed. Thanks!

@nevillelyh changed the title from "add path filter in ParquetReader" to "PARQUET-142: add path filter in ParquetReader" on Dec 4, 2014
@nevillelyh
Contributor Author

@rdblue fixed, thanks.

@rdblue
Contributor

rdblue commented Dec 6, 2014

@tomwhite and @julienledem, what do you think about merging this? This is a behavior change, but I don't think we have behavior guarantees for parquet-tools. I also see this as more of a bugfix because it is far more common that users have a _SUCCESS file than it is for users to expect the tool to pick up dot files. I'm +1.

@julienledem
Member

+1 this sounds good to me.

@asfgit closed this in b4380f2 on Jan 30, 2015
@rdblue
Contributor

rdblue commented Jan 30, 2015

I've merged this. Thanks @nevillelyh!

dongche pushed a commit to dongche/incubator-parquet-mr that referenced this pull request Feb 3, 2015
Currently parquet-tools command fails when input is a directory with _SUCCESS file from mapreduce. Filtering those out like ParquetFileReader does fixes the problem.

```
parquet-cat /tmp/parquet_write_test
Could not read footer: java.lang.RuntimeException: file:/tmp/parquet_write_test/_SUCCESS is not a Parquet file (too small)

$ tree /tmp/parquet_write_test
/tmp/parquet_write_test
├── part-m-00000.parquet
└── _SUCCESS
```

Author: Neville Li

Closes apache#89 from nevillelyh/gh/path-filter and squashes the following commits:

7377a20 [Neville Li] PARQUET-142: add path filter in ParquetReader
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Mar 9, 2015