Merge master to PR-98 by isnotinvain · Pull Request #3 · julienledem/parquet-java

isnotinvain · 2015-03-04T23:43:00Z

No description provided.

AssertionError(String, Throwable) was introduced in Java7. Replacing it with AssertionError(String) + initCause(Throwable) Author: Laurent Goujon <[email protected]> Closes apache#101 from laurentgo/fix-java7ism and squashes the following commits: c00fb7c [Laurent Goujon] Replaces AssertionError constructor introduced in Java7
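The replacement described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the class `AssertionErrors` and the helper `withCause` are hypothetical names introduced here. It relies only on `AssertionError(Object)` and `Throwable.initCause(Throwable)`, both of which predate Java 7.

```java
// Sketch: attach a cause to an AssertionError without the Java 7 constructor.
// AssertionErrors and withCause are hypothetical names, not part of parquet-mr.
final class AssertionErrors {
  private AssertionErrors() {}

  static AssertionError withCause(String message, Throwable cause) {
    // A String argument resolves to AssertionError(Object), which sets only the message.
    AssertionError error = new AssertionError(message);
    // initCause may be called once, because the cause has not been set yet.
    error.initCause(cause);
    return error;
  }

  public static void main(String[] args) {
    AssertionError e = withCause("unexpected state", new IllegalStateException("boom"));
    System.out.println(e.getMessage() + " caused by " + e.getCause().getMessage());
  }
}
```

Wrapping the two calls in one helper keeps call sites as short as the Java 7 constructor would have been.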

Upgrade snappy-java to 1.1.1.6 (the latest version), since 1.0.5 is no longer maintained in https://github.com/xerial/snappy-java, and 1.1.1.6 supports a broader range of platforms, including PowerPC, IBM-AIX 6.4, SunOS, etc. It also has a better native code loading mechanism (allowing snappy-java to be used from multiple class loaders). Author: Taro L. Saito <[email protected]> Closes apache#85 from xerial/PARQUET-133 and squashes the following commits: 01d7b78 [Taro L. Saito] PARQUET-133: Upgrade snappy-java to 1.1.1.6

…and ... ...path Author: Chris Albright <[email protected]> Closes apache#79 from chrisalbright/master and squashes the following commits: b1b0086 [Chris Albright] Merge remote-tracking branch 'upstream/master' 9669427 [Chris Albright] PARQUET-124: Adding test (Thanks Ryan Blue) that proves mergeFooters was failing 8e342ed [Chris Albright] PARQUET-124: normalize path checking to prevent mismatch between URI and path

Currently the parquet-tools command fails when the input is a directory containing a _SUCCESS file from MapReduce. Filtering those out like ParquetFileReader does fixes the problem.

```
parquet-cat /tmp/parquet_write_test
Could not read footer: java.lang.RuntimeException: file:/tmp/parquet_write_test/_SUCCESS is not a Parquet file (too small)

$ tree /tmp/parquet_write_test
/tmp/parquet_write_test
├── part-m-00000.parquet
└── _SUCCESS
```

Author: Neville Li <[email protected]> Closes apache#89 from nevillelyh/gh/path-filter and squashes the following commits: 7377a20 [Neville Li] PARQUET-142: add path filter in ParquetReader
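The idea of skipping marker files such as _SUCCESS can be sketched in plain Java. This is only an illustration under assumptions: the rule used here (ignore names starting with `_` or `.`) follows the common Hadoop convention for hidden files and is assumed rather than taken from the commit, and `HiddenFileFilter` is a hypothetical class that works on file names instead of Hadoop `Path` objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: drop job marker files before trying to read Parquet footers.
// HiddenFileFilter is a hypothetical name; the prefix rule is an assumption.
final class HiddenFileFilter {
  private HiddenFileFilter() {}

  // A file is treated as data unless its name starts with '_' or '.'.
  static boolean isVisible(String fileName) {
    return !fileName.startsWith("_") && !fileName.startsWith(".");
  }

  // Returns the visible names in their original order.
  static List<String> visible(List<String> fileNames) {
    List<String> result = new ArrayList<String>();
    for (String name : fileNames) {
      if (isVisible(name)) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> listing = Arrays.asList("part-m-00000.parquet", "_SUCCESS");
    System.out.println(visible(listing));
  }
}
```

With the directory listing from the example above, only `part-m-00000.parquet` would be passed on to the reader.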

There is a divide-by-zero error in the logging code inside the InternalParquetRecordReader. I've been running with this fixed for a while, but every time I revert I hit the problem again. I can't believe no one else has had this problem. I submitted a Jira ticket a few weeks ago but didn't hear anything on the list, so here's the fix. This also avoids compiling log statements in some cases where it's unnecessary inside the checkRead method of InternalParquetRecordReader. Also added a .gitignore entry to clean up a build artifact. Author: Jim Carroll <[email protected]> Closes apache#102 from jimfcarroll/divide-by-zero-fix and squashes the following commits: 423200c [Jim Carroll] Filter out parquet-scrooge build artifact from git. 22337f3 [Jim Carroll] PARQUET-157: Fix a divide by zero error when Parquet runs quickly. Also avoid compiling log statements in some cases where it's unnecessary.
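A guard of the following shape avoids this kind of failure. It is a sketch under assumptions, not the actual patch: the class `ReadThroughput`, its methods, and the `enabled` flag (standing in for a logger level check) are all hypothetical.

```java
// Sketch: compute a rate for a log line without dividing by a zero duration.
// ReadThroughput and its members are hypothetical, not parquet-mr code.
final class ReadThroughput {
  private ReadThroughput() {}

  // Returns records per millisecond, or 0 when the elapsed time rounds to 0 ms.
  static long recordsPerMs(long records, long elapsedMs) {
    return elapsedMs == 0 ? 0 : records / elapsedMs;
  }

  // Builds the message only when logging is enabled, so a disabled logger costs nothing.
  static String describe(boolean enabled, long records, long elapsedMs) {
    if (!enabled) {
      return null;
    }
    return "read " + records + " records at " + recordsPerMs(records, elapsedMs) + " rec/ms";
  }

  public static void main(String[] args) {
    // A very fast read can finish within the same millisecond it started in.
    System.out.println(describe(true, 1000, 0));
    System.out.println(describe(true, 1000, 4));
  }
}
```

Integer division by zero throws `ArithmeticException` in Java, which is why a fast read with an elapsed time of 0 ms can fail inside a log statement.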

Updates for first Apache release of parquet-mr. Author: Ryan Blue <[email protected]> Closes apache#109 from rdblue/PARQUET-111-update-for-apache-release and squashes the following commits: bf19849 [Ryan Blue] PARQUET-111: Add ARRIS copyright header to parquet-tools. f1a5c28 [Ryan Blue] PARQUET-111: Update headers in parquet-protobuf. ee4ea88 [Ryan Blue] PARQUET-111: Remove leaked LICENSE and NOTICE files. 5bf178b [Ryan Blue] PARQUET-111: Update module names, urls, and binary LICENSE files. 6736320 [Ryan Blue] PARQUET-111: Add RAT exclusion for auto-generated POM files. 7db4553 [Ryan Blue] PARQUET-111: Add attribution for Spark dev script to LICENSE. 45e29f2 [Ryan Blue] PARQUET-111: Update LICENSE and NOTICE. 516c058 [Ryan Blue] PARQUET-111: Update license headers to pass RAT check. da688e3 [Ryan Blue] PARQUET-111: Update NOTICE with Apache boilerplate. 234715d [Ryan Blue] PARQUET-111: Add DISCLAIMER and KEYS. f1d3601 [Ryan Blue] PARQUET-111: Update to use Apache parent POM.

Author: Cheng Lian <[email protected]> Closes apache#108 from liancheng/PARQUET-173 and squashes the following commits: d188f0b [Cheng Lian] Fixes test case be2c8a1 [Cheng Lian] Fixes `StatisticsFilter` for `And` filter predicate
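The commit message does not say what was wrong with the `And` case. For context, the general rule for statistics-based pruning of a conjunction can be sketched as below: a block can be skipped when either operand alone rules it out. `StatsPruning` and its methods are hypothetical and work on a simple min/max range, not on Parquet's real statistics classes.

```java
// Sketch: deciding whether a block can be skipped from min/max statistics.
// StatsPruning is a hypothetical name; this is not the StatisticsFilter code.
final class StatsPruning {
  private StatsPruning() {}

  // True when no value in [min, max] can satisfy "value > bound".
  static boolean canDropGt(long min, long max, long bound) {
    return max <= bound;
  }

  // True when no value in [min, max] can satisfy "value < bound".
  static boolean canDropLt(long min, long max, long bound) {
    return min >= bound;
  }

  // A conjunction matches nothing as soon as either side matches nothing.
  static boolean canDropAnd(boolean leftCanDrop, boolean rightCanDrop) {
    return leftCanDrop || rightCanDrop;
  }

  public static void main(String[] args) {
    long min = 10;
    long max = 20;
    // value > 5 AND value < 8: only the right side rules the block out.
    boolean drop = canDropAnd(canDropGt(min, max, 5), canDropLt(min, max, 8));
    System.out.println(drop);
  }
}
```

Requiring both sides to be droppable would still be safe, but it would read blocks that cannot contain a match.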

This is similar to https://github.com/apache/incubator-parquet-mr/pull/43, but instead of making `ThriftWriteSupport` abstract, it keeps it around (but deprecated) and adds `AbstractThriftWriteSupport`. This is a little less elegant, but it seems to appease the semver overlords. Author: Colin Marc <[email protected]> Closes apache#58 from colinmarc/scrooge-write-support-2 and squashes the following commits: e2a0abd [Colin Marc] add write support to ParquetScroogeScheme 19cf1a8 [Colin Marc] Add ScroogeWriteSupport and ParquetScroogeOutputFormat.

PARQUET-177 Author: Daniel Weeks <[email protected]> Closes apache#115 from danielcweeks/memory-manager-limit and squashes the following commits: b2e4708 [Daniel Weeks] Updated to base memory allocation off estimated chunk size 09d7aa3 [Daniel Weeks] Updated property name and default value 8f6cff1 [Daniel Weeks] Added low bound to memory manager resize

This updates the InternalParquetRecordReader to initialize the ReadContext in each task rather than once for an entire job. There are two reasons for this change: 1. For correctness, the requested projection schema must be validated against each file schema, not once using the merged schema. 2. To avoid reading file footers on the client side, which is a performance bottleneck. Because the read context is reinitialized in every task, it is no longer necessary to pass its contents to each task in ParquetInputSplit. The fields and accessors have been removed. This also adds a new InputFormat, ParquetFileInputFormat, that uses FileSplits instead of ParquetSplits. It goes through the normal ParquetRecordReader and creates a ParquetSplit on the task side. This is to avoid accidental behavior changes in ParquetInputFormat. Author: Ryan Blue <[email protected]> Closes apache#91 from rdblue/PARQUET-139-input-format-task-side and squashes the following commits: cb30660 [Ryan Blue] PARQUET-139: Fix deprecated reader bug from review fixes. 09cde8d [Ryan Blue] PARQUET-139: Implement changes from reviews. 3eec553 [Ryan Blue] PARQUET-139: Merge new InputFormat into ParquetInputFormat. 8971b80 [Ryan Blue] PARQUET-139: Add ParquetFileInputFormat that uses FileSplit. 87dfe86 [Ryan Blue] PARQUET-139: Expose read support helper methods. 057c7dc [Ryan Blue] PARQUET-139: Update reader to initialize read context in tasks.

…2 api Currently for creating a user defined predicate using the new filter api, no value can be passed to create a dynamic filter at runtime. This reduces the usefulness of the user defined predicate, and meaningful predicates cannot be created. We can add a generic Object value that is passed through the api, which can internally be used in the keep function of the user defined predicate for creating many different types of filters. For example, in spark sql, we can pass in a list of filter values for a where IN clause query and filter the row values based on that list. Author: Yash Datta <[email protected]> Author: Alex Levenson <[email protected]> Author: Yash Datta <[email protected]> Closes apache#73 from saucam/master and squashes the following commits: 7231a3b [Yash Datta] Merge pull request #3 from isnotinvain/alexlevenson/fix-binary-compat dcc276b [Alex Levenson] Ignore binary incompatibility in private filter2 class 7bfa5ad [Yash Datta] Merge pull request #2 from isnotinvain/alexlevenson/simplify-udp-state 0187376 [Alex Levenson] Resolve merge conflicts 25aa716 [Alex Levenson] Simplify user defined predicates with state 51952f8 [Yash Datta] PARQUET-116: Fix whitespace d7b7159 [Yash Datta] PARQUET-116: Make UserDefined abstract, add two subclasses, one accepting udp class, other accepting serializable udp instance 40d394a [Yash Datta] PARQUET-116: Fix whitespace 9a63611 [Yash Datta] PARQUET-116: Fix whitespace 7caa4dc [Yash Datta] PARQUET-116: Add ConfiguredUserDefined that takes a serialiazble udp directly 0eaabf4 [Yash Datta] PARQUET-116: Move the config object from keep method to a configure method in udp predicate f51a431 [Yash Datta] PARQUET-116: Adding type safety for the filter object to be passed to user defined predicate d5a2b9e [Yash Datta] PARQUET-116: Enforce that the filter object to be passed must be Serializable dfd0478 [Yash Datta] PARQUET-116: Add a test case for passing a filter object to user defined predicate 4ab46ec [Yash Datta] PARQUET-116: Pass a filter object to user defined predicate in filter2 api
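The IN-clause use case mentioned above can be sketched as a predicate that carries its own serializable state. This is an illustration only: `InSetPredicate` is a hypothetical standalone class and does not extend the filter2 `UserDefinedPredicate` type, whose exact signatures are not shown in this pull request.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: a predicate holding a set of allowed values, as a WHERE ... IN filter would.
// InSetPredicate is a hypothetical name, not the parquet filter2 API.
final class InSetPredicate<T extends Serializable> implements Serializable {
  private static final long serialVersionUID = 1L;

  // HashSet is Serializable, so the whole predicate can be shipped to tasks.
  private final HashSet<T> allowed;

  InSetPredicate(Set<T> allowed) {
    this.allowed = new HashSet<T>(allowed);
  }

  // Mirrors the role of the keep function: true when the row value should be kept.
  boolean keep(T value) {
    return allowed.contains(value);
  }

  public static void main(String[] args) {
    InSetPredicate<String> inClause =
        new InSetPredicate<String>(new HashSet<String>(Arrays.asList("a", "b")));
    System.out.println(inClause.keep("a"));
    System.out.println(inClause.keep("z"));
  }
}
```

Copying the values into a `HashSet` gives constant-time lookups per row and keeps the predicate independent of later changes to the caller's collection.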

Author: Ryan Blue <[email protected]> Closes apache#119 from rdblue/PARQUET-164-add-memory-manager-warning and squashes the following commits: 241144f [Ryan Blue] PARQUET-164: Add warning when scaling row group sizes.

…reForRead ReadSupport.prepareForRead does not return RecordConsumer but RecordMaterializer Author: choplin <[email protected]> Closes apache#125 from choplin/fix-javadoc-comment and squashes the following commits: c3574f3 [choplin] fix an inconsistent Javadoc comment of ReadSupport.prepareForRead

Author: Ryan Blue <[email protected]> Closes apache#126 from rdblue/PARQUET-191-fix-map-value-conversion and squashes the following commits: 33f6bbc [Ryan Blue] PARQUET-191: Fix map Type to Avro Schema conversion.

This depends on PARQUET-191 for the correct schema representation. Author: Ryan Blue <[email protected]> Closes apache#127 from rdblue/PARQUET-192-fix-map-null-encoding and squashes the following commits: fffde82 [Ryan Blue] PARQUET-192: Fix parquet-avro maps with null values.

This was the behavior before the V2 pages were added. Author: Ryan Blue <[email protected]> Closes apache#129 from rdblue/PARQUET-188-fix-column-metadata-order and squashes the following commits: 3c9fa5d [Ryan Blue] PARQUET-188: Change column ordering to match the field order.

…seqAsJavaList The former was removed in 2.11, but the latter exists in 2.9, 2.10 and 2.11. With this change, I can build on 2.11 without any issue. Author: Colin Marc <[email protected]> Closes apache#121 from colinmarc/build-211 and squashes the following commits: 8a29319 [Colin Marc] Replace JavaConversions.asJavaList with JavaConversions.seqAsJavaList.

Conflicts: parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java parquet-hadoop/src/test/java/parquet/hadoop/TestColumnChunkPageWriteStore.java

isnotinvain · 2015-03-04T23:43:31Z

Hmm, this didn't work the way I wanted, never mind.

laurentgo and others added 19 commits January 28, 2015 16:07

Update Travis CI link in README.md.

a635f21

PARQUET-164: Add warning when scaling row group sizes.

f48bca0

PARQUET-191: Fix map Type to Avro Schema conversion.

f1b5487

Merge branch 'master' into PR-98

7a16220

isnotinvain closed this Mar 4, 2015

julienledem pushed a commit that referenced this pull request Jun 9, 2017

Fix comptatibility issues (#3)

d8b991c

isnotinvain wants to merge 19 commits into julienledem:avoid_wasting_64K_per_empty_buffer from isnotinvain:PR-98

9 participants
