Timestamp type index #8343
Codecov Report
@@ Coverage Diff @@
## master #8343 +/- ##
============================================
+ Coverage 70.57% 70.66% +0.08%
- Complexity 4283 4285 +2
============================================
Files 1672 1674 +2
Lines 87483 87640 +157
Branches 13241 13273 +32
============================================
+ Hits 61745 61931 +186
+ Misses 21454 21398 -56
- Partials 4284 4311 +27
Please add description
Will do. This is still WIP; I also need to add unit & integration tests.
Jackie-Jiang
left a comment
Can we add some validation to TableConfigUtils.validate()? The timestamp index can only be applied to columns with millis granularity (TIMESTAMP data type, or DATE_TIME field with EPOCH:MILLISECONDS); otherwise it will give wrong results.
Makes sense.
Jackie-Jiang
left a comment
LGTM
Description
Adding Timestamp index for Pinot data type TIMESTAMP.
A TIMESTAMP column stores each value as a long holding milliseconds since the epoch. A typical workload filters on a time range and groups by different time granularities (day, month, etc.).
The current implementation requires the query executor to extract the values, apply the function, and then filter or group by the result, without leveraging a dictionary or index.
This PR introduces the Timestamp Index. Users can configure the most useful granularities for a Timestamp data type column.
Naming convention for the generated columns: `$${ts_column_name}$${ts_granularity}`, e.g. a Timestamp column `ts` with granularities `DAY`, `MONTH` will have two extra columns generated: `$ts$DAY` and `$ts$MONTH`.
2.1 GROUPBY: functions like `dateTrunc('DAY', ts)` will be translated to use the underlying column `$ts$DAY` to fetch data.
2.2 PREDICATE: range index is auto-built for all granularity columns.
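The naming and rewrite rules above can be illustrated with a small standalone sketch. This is not the code from this PR: the class `TimestampIndexSketch`, the `Granularity` enum, and all method names are hypothetical, truncation is assumed to be in UTC, and the regex-based rewrite only stands in for whatever the query engine actually does on its parsed expression tree.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from the Pinot code base.
class TimestampIndexSketch {
  // Only DAY and MONTH are named in the description; other granularities are omitted here.
  public enum Granularity { DAY, MONTH }

  // Matches calls such as dateTrunc('DAY', ts), case-insensitively.
  private static final Pattern DATE_TRUNC =
      Pattern.compile("(?i)dateTrunc\\(\\s*'(\\w+)'\\s*,\\s*(\\w+)\\s*\\)");

  // Builds the derived column name, e.g. ("ts", DAY) -> "$ts$DAY".
  public static String derivedColumnName(String column, Granularity granularity) {
    return "$" + column + "$" + granularity.name();
  }

  // Truncates a millisecond epoch value to the start of its day or month (UTC assumed).
  public static long truncate(long epochMillis, Granularity granularity) {
    ZonedDateTime time = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
    switch (granularity) {
      case DAY:
        return time.truncatedTo(ChronoUnit.DAYS).toInstant().toEpochMilli();
      case MONTH:
        return time.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS).toInstant().toEpochMilli();
      default:
        throw new IllegalArgumentException("Unsupported granularity: " + granularity);
    }
  }

  // Replaces dateTrunc('<G>', column) with the derived column when <G> is configured;
  // any other call is left untouched.
  public static String rewrite(String expression, String column, Set<Granularity> configured) {
    Matcher matcher = DATE_TRUNC.matcher(expression);
    StringBuffer out = new StringBuffer();
    while (matcher.find()) {
      String replacement = matcher.group();
      if (matcher.group(2).equals(column)) {
        try {
          Granularity g = Granularity.valueOf(matcher.group(1).toUpperCase(Locale.ROOT));
          if (configured.contains(g)) {
            replacement = derivedColumnName(column, g);
          }
        } catch (IllegalArgumentException e) {
          // Granularity not known to this sketch: keep the original call.
        }
      }
      // The replacement contains '$', so it must be quoted for appendReplacement.
      matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Set<Granularity> configured = EnumSet.of(Granularity.DAY, Granularity.MONTH);
    System.out.println(rewrite("dateTrunc('DAY', ts)", "ts", configured)); // prints $ts$DAY
  }
}
```

Because the truncated values are materialized once per configured granularity, a query that groups or filters on `dateTrunc('DAY', ts)` can read the precomputed column rather than transforming every row at query time.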
Example query usage:
A preliminary benchmark shows query latency over 2.7 billion records improving from 45 seconds to 4.2 seconds.
Release Notes
Introduces the Timestamp Index. Users can configure the most useful granularities for a Timestamp data type column.
Naming convention for the generated columns: `$${ts_column_name}$${ts_granularity}`, e.g. a Timestamp column `ts` with granularities `DAY`, `MONTH` will have two extra columns generated: `$ts$DAY` and `$ts$MONTH`.
2.1 SELECTION/GROUPBY: functions like `dateTrunc('DAY', ts)` will be translated to use the underlying column `$ts$DAY` to fetch data.
2.2 PREDICATE: range index is auto-built for all granularity columns.
Documentation