Skip to content

Conversation

@leaves12138
Copy link
Contributor

[core] revert abstract manifest entry code

Purpose

Abstract manifest entry is created as a loader for scan by projection push down. We can define a SimpleManifestEntry class when we trigger a manifest file scan with projection push down. But the main format we use for manifest file is "avro", which is a column oriented file format. We test it with projection push down, the result is not what we expected.

The promotion for avro is extraordinarily limited. About 10-20 percent performance improved. It is not worth making the project complex by defining a AbstractManifestEntry for such a little performance. So we decided to revert the related code, but we still remain the code about the memory limit.

In addition, we test orc file format as well. Found it has a big performance improvement about 100%. But it's unrealistic to shift to this in production environment. I will list the result of the test below:

Entries in one manifest file: 30000
Columns in one table: 30 (20 String Type)

For orc manifest file format:
Benchmark Mode Cnt Score Error Units
ReadWrite.testListPartition thrpt 4 66.479 ± 38.925 ops/s
ReadWrite.testListPartitionFull thrpt 4 34.702 ± 23.081 ops/s

For avro manifest file format:
Benchmark Mode Cnt Score Error Units
ReadWrite.testListPartition thrpt 4 14.758 ± 5.765 ops/s
ReadWrite.testListPartitionFull thrpt 4 12.903 ± 1.489 ops/s

Tests

API and Format

Documentation

yejunhao added 3 commits May 10, 2023 11:27
@leaves12138 leaves12138 changed the title Revert abstract manifest entry [core] [revert] Revert abstract manifest entry May 10, 2023
Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 274688d into apache:master May 10, 2023
@leaves12138 leaves12138 deleted the revert_AbstractManifestEntry branch May 10, 2023 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants