Merge master to PR-98 #3

Closed
isnotinvain wants to merge 19 commits into julienledem:avoid_wasting_64K_per_empty_buffer from isnotinvain:PR-98

@isnotinvain

No description provided.

laurentgo and others added 19 commits January 28, 2015 16:07
AssertionError(String, Throwable) was introduced in Java7. Replacing it with AssertionError(String) + initCause(Throwable)
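
The replacement described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the `Assertions` class and the `withCause` helper are hypothetical names, and only `AssertionError(Object)` and `Throwable.initCause` are real JDK calls.

```java
// Hypothetical helper, not taken from the parquet-mr source.
final class Assertions {
  private Assertions() {}

  // Builds an AssertionError that carries a cause without using the
  // AssertionError(String, Throwable) constructor added in Java 7.
  static AssertionError withCause(String message, Throwable cause) {
    AssertionError error = new AssertionError(message);
    error.initCause(cause); // initCause can be called at most once per Throwable
    return error;
  }

  public static void main(String[] args) {
    AssertionError e = withCause("unexpected state", new IllegalStateException("boom"));
    System.out.println(e.getMessage() + " caused by " + e.getCause());
  }
}
```

A caller would then write `throw Assertions.withCause("...", e);` where it previously used the two-argument constructor.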

Author: Laurent Goujon <[email protected]>

Closes apache#101 from laurentgo/fix-java7ism and squashes the following commits:

c00fb7c [Laurent Goujon] Replaces AssertionError constructor introduced in Java7
Upgrade snappy-java to 1.1.1.6 (the latest version), since 1.0.5 is no longer maintained in https://github.com/xerial/snappy-java, and 1.1.1.6 supports a broader range of platforms, including PowerPC, IBM-AIX 6.4, SunOS, etc. It also has a better native code loading mechanism (allowing snappy-java to be used from multiple class loaders).

Author: Taro L. Saito <[email protected]>

Closes apache#85 from xerial/PARQUET-133 and squashes the following commits:

01d7b78 [Taro L. Saito] PARQUET-133: Upgrade snappy-java to 1.1.1.6
…and ...

...path

Author: Chris Albright <[email protected]>

Closes apache#79 from chrisalbright/master and squashes the following commits:

b1b0086 [Chris Albright] Merge remote-tracking branch 'upstream/master'
9669427 [Chris Albright] PARQUET-124: Adding test (Thanks Ryan Blue) that proves mergeFooters was failing
8e342ed [Chris Albright] PARQUET-124: normalize path checking to prevent mismatch between URI and path
Currently the parquet-tools command fails when the input is a directory that contains a _SUCCESS file from MapReduce. Filtering those files out, as ParquetFileReader does, fixes the problem.

```
$ parquet-cat /tmp/parquet_write_test
Could not read footer: java.lang.RuntimeException: file:/tmp/parquet_write_test/_SUCCESS is not a Parquet file (too small)

$ tree /tmp/parquet_write_test
/tmp/parquet_write_test
├── part-m-00000.parquet
└── _SUCCESS
```
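
The kind of filtering described above can be sketched on plain file names. This is an illustration only: `HiddenFileNames` is a hypothetical class standing in for a Hadoop `PathFilter`, and the rule that names starting with `_` or `.` are skipped is an assumption about the convention, not a copy of the filter used in the commit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop PathFilter; works on plain file names.
final class HiddenFileNames {
  private HiddenFileNames() {}

  // Assumed convention: names starting with '_' or '.' are metadata or hidden files.
  static boolean isDataFile(String fileName) {
    return !fileName.startsWith("_") && !fileName.startsWith(".");
  }

  // Keeps only the names that pass isDataFile, preserving their order.
  static List<String> dataFiles(List<String> fileNames) {
    List<String> kept = new ArrayList<String>();
    for (String name : fileNames) {
      if (isDataFile(name)) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(dataFiles(Arrays.asList("part-m-00000.parquet", "_SUCCESS")));
  }
}
```

With the directory listing shown above, only `part-m-00000.parquet` would be handed to the footer reader.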

Author: Neville Li <[email protected]>

Closes apache#89 from nevillelyh/gh/path-filter and squashes the following commits:

7377a20 [Neville Li] PARQUET-142: add path filter in ParquetReader
There is a divide-by-zero error in the logging code inside InternalParquetRecordReader. I've been running with this fixed for a while, but every time I revert I hit the problem again. I'm surprised no one else has run into it. I submitted a Jira ticket a few weeks ago but didn't hear anything on the list, so here's the fix.

This also avoids compiling log statements in some cases where it's unnecessary inside the checkRead method of InternalParquetRecordReader.
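
A rough sketch of both ideas, under stated assumptions: `ReadStats`, `throughputMessage`, and `logThroughput` are hypothetical names, the message format is invented, and `System.out` stands in for the real logger. It shows a guard against a zero elapsed time and a level check that runs before the message is assembled.

```java
// Hypothetical sketch; names do not come from InternalParquetRecordReader.
final class ReadStats {
  private ReadStats() {}

  // Returns a throughput message, or null when no time has elapsed yet,
  // so the division is never attempted with a zero denominator.
  static String throughputMessage(long recordsRead, long elapsedMillis) {
    if (elapsedMillis == 0) {
      return null;
    }
    return "read " + recordsRead + " records at " + (recordsRead / elapsedMillis) + " rec/ms";
  }

  static void logThroughput(boolean infoEnabled, long recordsRead, long elapsedMillis) {
    // Check the level first so the message string is only built when it will be printed.
    if (!infoEnabled) {
      return;
    }
    String message = throughputMessage(recordsRead, elapsedMillis);
    if (message != null) {
      System.out.println(message);
    }
  }
}
```

When a read finishes within the same millisecond it started, `elapsedMillis` is 0 and the sketch simply skips the throughput line instead of throwing `ArithmeticException`.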

Also added a .gitignore entry to clean up a build artifact.

Author: Jim Carroll <[email protected]>

Closes apache#102 from jimfcarroll/divide-by-zero-fix and squashes the following commits:

423200c [Jim Carroll] Filter out parquet-scrooge build artifact from git.
22337f3 [Jim Carroll] PARQUET-157: Fix a divide by zero error when Parquet runs quickly. Also avoid compiling log statements in some cases where it's unnecessary.
Updates for first Apache release of parquet-mr.

Author: Ryan Blue <[email protected]>

Closes apache#109 from rdblue/PARQUET-111-update-for-apache-release and squashes the following commits:

bf19849 [Ryan Blue] PARQUET-111: Add ARRIS copyright header to parquet-tools.
f1a5c28 [Ryan Blue] PARQUET-111: Update headers in parquet-protobuf.
ee4ea88 [Ryan Blue] PARQUET-111: Remove leaked LICENSE and NOTICE files.
5bf178b [Ryan Blue] PARQUET-111: Update module names, urls, and binary LICENSE files.
6736320 [Ryan Blue] PARQUET-111: Add RAT exclusion for auto-generated POM files.
7db4553 [Ryan Blue] PARQUET-111: Add attribution for Spark dev script to LICENSE.
45e29f2 [Ryan Blue] PARQUET-111: Update LICENSE and NOTICE.
516c058 [Ryan Blue] PARQUET-111: Update license headers to pass RAT check.
da688e3 [Ryan Blue] PARQUET-111: Update NOTICE with Apache boilerplate.
234715d [Ryan Blue] PARQUET-111: Add DISCLAIMER and KEYS.
f1d3601 [Ryan Blue] PARQUET-111: Update to use Apache parent POM.

Author: Cheng Lian <[email protected]>

Closes apache#108 from liancheng/PARQUET-173 and squashes the following commits:

d188f0b [Cheng Lian] Fixes test case
be2c8a1 [Cheng Lian] Fixes `StatisticsFilter` for `And` filter predicate
This is similar to https://github.com/apache/incubator-parquet-mr/pull/43, but instead of making `ThriftWriteSupport` abstract, it keeps it around (but deprecated) and adds `AbstractThriftWriteSupport`. This is a little less elegant, but it seems to appease the semver overlords.
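
The compatibility pattern described above, keeping the old concrete class as a deprecated subclass of a new abstract base, might look roughly like this. All names here (`AbstractRecordWriter`, `RecordWriter`, `UpperCaseWriter`) are hypothetical and much simpler than the real Thrift write support classes.

```java
// Hypothetical names throughout; this is not the parquet-thrift API.
abstract class AbstractRecordWriter<T> {
  // Subclasses decide how a record is rendered before it is written.
  protected abstract String describe(T record);

  public final String write(T record) {
    return "wrote " + describe(record);
  }
}

/** @deprecated kept so existing callers keep compiling; extend AbstractRecordWriter instead. */
@Deprecated
class RecordWriter extends AbstractRecordWriter<Object> {
  @Override
  protected String describe(Object record) {
    return String.valueOf(record);
  }
}

// A new implementation builds on the abstract base rather than on the deprecated class.
class UpperCaseWriter extends AbstractRecordWriter<String> {
  @Override
  protected String describe(String record) {
    return record.toUpperCase();
  }
}
```

Because the old class keeps its name and public behavior, existing callers are not broken, which is the point of the semver concern mentioned above.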

Author: Colin Marc <[email protected]>

Closes apache#58 from colinmarc/scrooge-write-support-2 and squashes the following commits:

e2a0abd [Colin Marc] add write support to ParquetScroogeScheme
19cf1a8 [Colin Marc] Add ScroogeWriteSupport and ParquetScroogeOutputFormat.
PARQUET-177

Author: Daniel Weeks <[email protected]>

Closes apache#115 from danielcweeks/memory-manager-limit and squashes the following commits:

b2e4708 [Daniel Weeks] Updated to base memory allocation off estimated chunk size
09d7aa3 [Daniel Weeks] Updated property name and default value
8f6cff1 [Daniel Weeks] Added low bound to memory manager resize
This updates the InternalParquetRecordReader to initialize the ReadContext in each task rather than once for an entire job. There are two reasons for this change:

1. For correctness, the requested projection schema must be validated against each file schema, not once using the merged schema.
2. To avoid reading file footers on the client side, which is a performance bottleneck.
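
The first reason can be illustrated with a toy model in which a schema is just a set of column names. Everything here is hypothetical (`ProjectionCheck`, the file names, the columns); the sketch only shows that a projection can be accepted by the merged schema while one individual file does not contain it. Whether a missing column should be an error or be read as null is a policy question this sketch does not settle.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model: a schema is just a set of column names.
final class ProjectionCheck {
  private ProjectionCheck() {}

  // True when every projected column exists in the given schema.
  static boolean contains(Set<String> schema, Set<String> projection) {
    return schema.containsAll(projection);
  }

  // Union of all file schemas, standing in for the job-wide merged schema.
  static Set<String> merge(Map<String, Set<String>> fileSchemas) {
    Set<String> merged = new HashSet<String>();
    for (Set<String> schema : fileSchemas.values()) {
      merged.addAll(schema);
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> files = new LinkedHashMap<String, Set<String>>();
    files.put("a.parquet", new HashSet<String>(Arrays.asList("id", "name")));
    files.put("b.parquet", new HashSet<String>(Arrays.asList("id", "email")));
    Set<String> projection = new HashSet<String>(Arrays.asList("id", "email"));

    System.out.println("merged schema accepts projection: " + contains(merge(files), projection));
    for (Map.Entry<String, Set<String>> file : files.entrySet()) {
      System.out.println(file.getKey() + " accepts projection: " + contains(file.getValue(), projection));
    }
  }
}
```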

Because the read context is reinitialized in every task, it is no longer necessary to pass its contents to each task in ParquetInputSplit. The fields and accessors have been removed.

This also adds a new InputFormat, ParquetFileInputFormat, that uses FileSplits instead of ParquetSplits. It goes through the normal ParquetRecordReader and creates a ParquetSplit on the task side. This is to avoid accidental behavior changes in ParquetInputFormat.

Author: Ryan Blue <[email protected]>

Closes apache#91 from rdblue/PARQUET-139-input-format-task-side and squashes the following commits:

cb30660 [Ryan Blue] PARQUET-139: Fix deprecated reader bug from review fixes.
09cde8d [Ryan Blue] PARQUET-139: Implement changes from reviews.
3eec553 [Ryan Blue] PARQUET-139: Merge new InputFormat into ParquetInputFormat.
8971b80 [Ryan Blue] PARQUET-139: Add ParquetFileInputFormat that uses FileSplit.
87dfe86 [Ryan Blue] PARQUET-139: Expose read support helper methods.
057c7dc [Ryan Blue] PARQUET-139: Update reader to initialize read context in tasks.
…2 api

Currently, when creating a user-defined predicate with the new filter API, no value can be passed in to build a dynamic filter at runtime. This limits the usefulness of user-defined predicates, because meaningful predicates cannot be created. We can add a generic Object value that is passed through the API and used internally in the keep function of the user-defined predicate to create many different types of filters.
For example, in Spark SQL, we can pass in a list of filter values for a WHERE ... IN query and filter the row values based on that list.
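
A minimal sketch of the idea, assuming a simplified interface: `ValuePredicate` and `InSetPredicate` are hypothetical and do not match the shape of the real filter2 `UserDefinedPredicate`. The predicate carries the IN-list as serializable state, and `keep` consults it for each value.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface; the real filter2 UserDefinedPredicate has a different shape.
interface ValuePredicate<T> extends Serializable {
  boolean keep(T value);
}

// Carries the IN-list as serializable state so it can be shipped to tasks.
final class InSetPredicate<T extends Serializable> implements ValuePredicate<T> {
  private static final long serialVersionUID = 1L;
  private final HashSet<T> allowed;

  InSetPredicate(Set<T> allowed) {
    // Defensive copy into a concrete serializable collection.
    this.allowed = new HashSet<T>(allowed);
  }

  @Override
  public boolean keep(T value) {
    return allowed.contains(value);
  }

  public static void main(String[] args) {
    ValuePredicate<Integer> in =
        new InSetPredicate<Integer>(new HashSet<Integer>(Arrays.asList(1, 3, 5)));
    System.out.println(in.keep(3) + " " + in.keep(4));
  }
}
```

Requiring the state to be `Serializable` mirrors the constraint noted in the commit list below, since the predicate has to travel from the client to the tasks.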

Author: Yash Datta <[email protected]>
Author: Alex Levenson <[email protected]>
Author: Yash Datta <[email protected]>

Closes apache#73 from saucam/master and squashes the following commits:

7231a3b [Yash Datta] Merge pull request #3 from isnotinvain/alexlevenson/fix-binary-compat
dcc276b [Alex Levenson] Ignore binary incompatibility in private filter2 class
7bfa5ad [Yash Datta] Merge pull request #2 from isnotinvain/alexlevenson/simplify-udp-state
0187376 [Alex Levenson] Resolve merge conflicts
25aa716 [Alex Levenson] Simplify user defined predicates with state
51952f8 [Yash Datta] PARQUET-116: Fix whitespace
d7b7159 [Yash Datta] PARQUET-116: Make UserDefined abstract, add two subclasses, one accepting udp class, other accepting serializable udp instance
40d394a [Yash Datta] PARQUET-116: Fix whitespace
9a63611 [Yash Datta] PARQUET-116: Fix whitespace
7caa4dc [Yash Datta] PARQUET-116: Add ConfiguredUserDefined that takes a serializable udp directly
0eaabf4 [Yash Datta] PARQUET-116: Move the config object from keep method to a configure method in udp predicate
f51a431 [Yash Datta] PARQUET-116: Adding type safety for the filter object to be passed to user defined predicate
d5a2b9e [Yash Datta] PARQUET-116: Enforce that the filter object to be passed must be Serializable
dfd0478 [Yash Datta] PARQUET-116: Add a test case for passing a filter object to user defined predicate
4ab46ec [Yash Datta] PARQUET-116: Pass a filter object to user defined predicate in filter2 api
Author: Ryan Blue <[email protected]>

Closes apache#119 from rdblue/PARQUET-164-add-memory-manager-warning and squashes the following commits:

241144f [Ryan Blue] PARQUET-164: Add warning when scaling row group sizes.
…reForRead

ReadSupport.prepareForRead returns a RecordMaterializer, not a RecordConsumer.

Author: choplin <[email protected]>

Closes apache#125 from choplin/fix-javadoc-comment and squashes the following commits:

c3574f3 [choplin] fix an inconsistent Javadoc comment of ReadSupport.prepareForRead
Author: Ryan Blue <[email protected]>

Closes apache#126 from rdblue/PARQUET-191-fix-map-value-conversion and squashes the following commits:

33f6bbc [Ryan Blue] PARQUET-191: Fix map Type to Avro Schema conversion.
This depends on PARQUET-191 for the correct schema representation.

Author: Ryan Blue <[email protected]>

Closes apache#127 from rdblue/PARQUET-192-fix-map-null-encoding and squashes the following commits:

fffde82 [Ryan Blue] PARQUET-192: Fix parquet-avro maps with null values.
This was the behavior before the V2 pages were added.

Author: Ryan Blue <[email protected]>

Closes apache#129 from rdblue/PARQUET-188-fix-column-metadata-order and squashes the following commits:

3c9fa5d [Ryan Blue] PARQUET-188: Change column ordering to match the field order.
…seqAsJavaList

JavaConversions.asJavaList was removed in Scala 2.11, but JavaConversions.seqAsJavaList exists in 2.9, 2.10 and 2.11. With this change, I can build on 2.11 without any issue.

Author: Colin Marc <[email protected]>

Closes apache#121 from colinmarc/build-211 and squashes the following commits:

8a29319 [Colin Marc] Replace JavaConversions.asJavaList with JavaConversions.seqAsJavaList.
Conflicts:
	parquet-hadoop/src/main/java/parquet/hadoop/ColumnChunkPageWriteStore.java
	parquet-hadoop/src/test/java/parquet/hadoop/TestColumnChunkPageWriteStore.java
@isnotinvain
Author

Hmm, this didn't work the way I wanted, nevermind

@isnotinvain isnotinvain closed this Mar 4, 2015
julienledem pushed a commit that referenced this pull request Jun 9, 2017

9 participants