-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Implemented BoundedColumnValue partition function #8224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented BoundedColumnValue partition function #8224
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8224 +/- ##
============================================
+ Coverage 63.98% 69.53% +5.54%
- Complexity 4239 4241 +2
============================================
Files 1584 1631 +47
Lines 83250 85276 +2026
Branches 12608 12844 +236
============================================
+ Hits 53268 59293 +6025
+ Misses 26144 21847 -4297
- Partials 3838 4136 +298
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
bec3162 to
ef0384e
Compare
ef0384e to
c840ebf
Compare
|
Hi @kmozaid thanks for taking the time to make this contribution. Can you explain what problem this solves? Is it because you already have a partitioning and you want to maintain locality within partitions? |
Hi @richardstartin , We have a table where data is being ingested from multiple sources. (these multiple sources pushes data to same kafka topic). Data is kept for 5 days in realtime table and then moved offline table by minion task. We want to keep data from these sources in separate segments for offline table. There is a column which identifies the source. Just a note regarding the image - The partitionId mentioned in image are source1, source2 and source 3 although they would be integer value based on the their position in configured |
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/ColumnStatistics.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
Show resolved
Hide resolved
...oker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
Outdated
Show resolved
Hide resolved
| * } | ||
| * } | ||
| * } | ||
| * With this partition config on column "subject", partitionId would be 1 for Maths, 2 for English and so on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider putting other values as the last partition in case all values are already configured, and no value goes into partition 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to keep a fixed value of partitionId for unknown column values. When putting other/unknown values as last partition, the partition id would different every time partition config is changed to add/remove another column value.
When backfilling data for a particular source/tenant (in our example, a subject), we can easily ignore segments which has partitionId set to 0 in segment metadata because we know for sure that it could have data for many different subjects.
...rc/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
Outdated
Show resolved
Hide resolved
...i/src/main/java/org/apache/pinot/segment/spi/partition/metadata/ColumnPartitionMetadata.java
Outdated
Show resolved
Hide resolved
...egment-spi/src/main/java/org/apache/pinot/segment/spi/index/metadata/ColumnMetadataImpl.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise. Please reformat the changes with latest Pinot Style
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/ColumnPartitionConfig.java
Outdated
Show resolved
Hide resolved
...gment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunctionFactory.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/PartitionFunction.java
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/segment/spi/partition/BoundedColumnValuePartitionFunction.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/StatsCollectorConfig.java
Outdated
Show resolved
Hide resolved
...egment-spi/src/main/java/org/apache/pinot/segment/spi/index/metadata/ColumnMetadataImpl.java
Outdated
Show resolved
Hide resolved
...i/src/main/java/org/apache/pinot/segment/spi/partition/metadata/ColumnPartitionMetadata.java
Outdated
Show resolved
Hide resolved
| */ | ||
| public ColumnPartitionMetadata(String functionName, int numPartitions, Set<Integer> partitions) { | ||
| public ColumnPartitionMetadata(String functionName, int numPartitions, Set<Integer> partitions, | ||
| Map<String, String> functionConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(code format)
| Map<String, String> functionConfig) { | |
| @Nullable Map<String, String> functionConfig) { |
90b2317 to
5bc1ee8
Compare
5bc1ee8 to
f2f131f
Compare
|
Thanks @Jackie-Jiang and @richardstartin for your prompt reviews, appreciate it! |

Description
This PR adds a new implementation of
PartitionFunctionwhich is used to partition segments. The new partition function namedBoundedColumnValuecan be used to partition segments on column values and still generating partitionId of integer type.Example Usage -
PartitionId is generated based on position in columnValues. PartitionId would 1 for Maths, 2 for English and so on.
PartitionId 0 is reserved for any other subject which are not present in given
columnValues. The different column values can be specified using pipe (|) separation. This additional partitionfunctionConfigis persisted in metadata.properties and segment metadata in zookeeper. Broker can also use this function to prune segments.Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
Does this PR fix a zero-downtime upgrade introduced earlier?
Does this PR otherwise need attention when creating release notes? Things to consider:
Release Notes
Added new partition function called
BoundedColumnValueto be able to partition segments based on column value.Documentation
Yes, will create another PR in pinot-docs repo.