-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Complete index spi #10184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete index spi #10184
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #10184 +/- ##
============================================
- Coverage 70.15% 70.10% -0.05%
- Complexity 6114 6246 +132
============================================
Files 2090 2111 +21
Lines 112209 113108 +899
Branches 17078 17152 +74
============================================
+ Hits 78716 79296 +580
- Misses 27943 28203 +260
- Partials 5550 5609 +59
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…dIndexAndDictionaryBasedForwardIndexCreator
…uded in SegmentV1V2ToV3FormatConverter.java
|
|
||
| @Override | ||
| public void putInt(int value) { | ||
| _delegate.putInt((int) _actualValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the value is not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. This index creator is used to always store a constant value decided at construction time. I know it is strange, but it is only used in a strange situation. Specifically, sometimes we want to disable the forward index in a column with a text index in order to only let the user to search using the text index. There is a plan to actually be able to disable the forward index in that situation, but right now it is not possible, so the thing we do is to create a fake forward index that always store the same value for all rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this was introduced in this PR - can we add some java doc? it confused me too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bump this question. not blocking though.
npawar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Non-blocking questions/comments only, feel free to address later on. Will merge this now to avoid you having to rebase yet again after the weekend.
Thanks for breaking this into pieces and making it consumable for reviewers
|
|
||
| private String _instanceId; | ||
| private Map<String, Map<String, String>> _instanceTierConfigs; | ||
| private boolean _dirty = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know this class is fated to just be removed, but some comments in code for dirty and knownColumns would really help, in case it takes you some time to get to the demolition. it is pretty confusing without this context
| private final RangeBitmap.Appender _appender; | ||
| private final File _rangeIndexFile; | ||
| private final long _minValue; | ||
| private final FieldSpec.DataType _valueType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/valueType/dataType ?
| private static final String COLUMN_LENGTH_MAP_KEY = "columnLengthMap"; | ||
| private static final String COLUMN_CARDINALITY_MAP_KEY = "columnCardinalityMap"; | ||
| private static final String MAX_NUM_MULTI_VALUES_MAP_KEY = "maxNumMultiValuesMap"; | ||
| private static final int DISK_SIZE_IN_BYTES = 20797327; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how did we increase metadata in the process of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value can change a bit depending on the order of the indexes and stuff. @Jackie-Jiang recently changed the v1&v2 conversion in order to make it deterministic.
Today we also found that there was an error in the code and inverted indexes were created even if createInvertedIndexDuringSegmentGeneration was false. It should be fixed in #10524. In that PR the value of this const also changes, but to be reduced to 20797324. TBH I don't know what else may have changed the value, although the new change is just 3 bytes.
|
|
||
| @Override | ||
| public void putInt(int value) { | ||
| _delegate.putInt((int) _actualValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this was introduced in this PR - can we add some java doc? it confused me too
| for (String columnName : indexConfigs.keySet()) { | ||
| FieldSpec fieldSpec = schema.getFieldSpecFor(columnName); | ||
| if (fieldSpec == null) { | ||
| Preconditions.checkState(schema.hasColumn(columnName), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a precondition check again, if fieldSpec was null? i understand this is checking key, vs that one value, but can column exist with null fieldspec in that map?
| public BloomFilterConfig getConfig(TableConfig tableConfig, Schema schema) { | ||
| throw new UnsupportedOperationException("To be implemented in a future PR"); | ||
| public ColumnConfigDeserializer<BloomFilterConfig> createDeserializer() { | ||
| return IndexConfigDeserializer.fromIndexes("bloom", getIndexConfigClass()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't there a constant you created somewhre that you can use here? (for all *IndexType)
| public RangeIndexConfig(@JsonProperty("enabled") Boolean enabled, | ||
| @JsonProperty("version") @Nullable Integer version) { | ||
| super(enabled != null && enabled && version != null); | ||
| _version = version != null ? version : 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we take this from the creator/reader or define it here as current version, instead of hardcoding?
| if (indexType.equals(StandardIndexes.forward())) { | ||
| return (I) _forwardIndex; | ||
| } | ||
| if (indexType.equals(StandardIndexes.inverted())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh i didn't know we could create inverted index on an inverted column!
This is a very large PR that is a draft created when we were exploring #10183.
This PR is a fast forward from PRs:
The actual feature is in this PR, previous PRs are there in order to help the reviewers.