Skip to content

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jan 26, 2023

This is a very large PR that is a draft created when we were exploring #10183.

This PR is a fast forward from PRs:

The actual feature is in this PR, previous PRs are there in order to help the reviewers.

@codecov-commenter
Copy link

codecov-commenter commented Jan 27, 2023

Codecov Report

Attention: Patch coverage is 81.89756% with 311 lines in your changes missing coverage. Please review.

Project coverage is 70.10%. Comparing base (985d482) to head (e590233).
Report is 2792 commits behind head on master.

Files with missing lines Patch % Lines
...ent/creator/impl/SameValueForwardIndexCreator.java 28.20% 28 Missing ⚠️
...ment/creator/impl/SegmentColumnarIndexCreator.java 79.64% 16 Missing and 7 partials ⚠️
...pi/index/creator/CombinedInvertedIndexCreator.java 44.73% 18 Missing and 3 partials ⚠️
...he/pinot/segment/spi/index/ForwardIndexConfig.java 62.00% 16 Missing and 3 partials ⚠️
...ocal/segment/index/inverted/InvertedIndexType.java 64.86% 8 Missing and 5 partials ⚠️
...not/segment/spi/index/IndexConfigDeserializer.java 82.08% 11 Missing and 1 partial ⚠️
...ocal/segment/index/loader/ForwardIndexHandler.java 87.77% 0 Missing and 11 partials ⚠️
...t/index/loader/bloomfilter/BloomFilterHandler.java 62.50% 9 Missing ⚠️
.../segment/local/segment/index/fst/FstIndexType.java 77.77% 4 Missing and 4 partials ⚠️
...egment/local/segment/index/json/JsonIndexType.java 72.41% 4 Missing and 4 partials ⚠️
... and 44 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #10184      +/-   ##
============================================
- Coverage     70.15%   70.10%   -0.05%     
- Complexity     6114     6246     +132     
============================================
  Files          2090     2111      +21     
  Lines        112209   113108     +899     
  Branches      17078    17152      +74     
============================================
+ Hits          78716    79296     +580     
- Misses        27943    28203     +260     
- Partials       5550     5609      +59     
Flag Coverage Δ
integration1 24.44% <0.11%> (-0.10%) ⬇️
integration2 24.17% <0.11%> (+0.06%) ⬆️
unittests1 67.78% <81.87%> (-0.09%) ⬇️
unittests2 13.85% <0.00%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


@Override
public void putInt(int value) {
_delegate.putInt((int) _actualValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value is not used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This index creator is used to always store a constant value decided at construction time. I know it is strange, but it is only used in a strange situation. Specifically, sometimes we want to disable the forward index in a column with a text index in order to only let the user to search using the text index. There is a plan to actually be able to disable the forward index in that situation, but right now it is not possible, so the thing we do is to create a fake forward index that always store the same value for all rows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given this was introduced in this PR - can we add some java doc? it confused me too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump this question. not blocking though.

Copy link
Contributor

@npawar npawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Non-blocking questions/comments only, feel free to address later on. Will merge this now to avoid you having to rebase yet again after the weekend.
Thanks for breaking this into pieces and making it consumable for reviewers


private String _instanceId;
private Map<String, Map<String, String>> _instanceTierConfigs;
private boolean _dirty = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know this class is fated to just be removed, but some comments in code for dirty and knownColumns would really help, in case it takes you some time to get to the demolition. it is pretty confusing without this context

private final RangeBitmap.Appender _appender;
private final File _rangeIndexFile;
private final long _minValue;
private final FieldSpec.DataType _valueType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/valueType/dataType ?

private static final String COLUMN_LENGTH_MAP_KEY = "columnLengthMap";
private static final String COLUMN_CARDINALITY_MAP_KEY = "columnCardinalityMap";
private static final String MAX_NUM_MULTI_VALUES_MAP_KEY = "maxNumMultiValuesMap";
private static final int DISK_SIZE_IN_BYTES = 20797327;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did we increase metadata in the process of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value can change a bit depending on the order of the indexes and stuff. @Jackie-Jiang recently changed the v1&v2 conversion in order to make it deterministic.

Today we also found that there was an error in the code and inverted indexes were created even if createInvertedIndexDuringSegmentGeneration was false. It should be fixed in #10524. In that PR the value of this const also changes, but to be reduced to 20797324. TBH I don't know what else may have changed the value, although the new change is just 3 bytes.


@Override
public void putInt(int value) {
_delegate.putInt((int) _actualValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given this was introduced in this PR - can we add some java doc? it confused me too

for (String columnName : indexConfigs.keySet()) {
FieldSpec fieldSpec = schema.getFieldSpecFor(columnName);
if (fieldSpec == null) {
Preconditions.checkState(schema.hasColumn(columnName),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a precondition check again, if fieldSpec was null? i understand this is checking key, vs that one value, but can column exist with null fieldspec in that map?

public BloomFilterConfig getConfig(TableConfig tableConfig, Schema schema) {
throw new UnsupportedOperationException("To be implemented in a future PR");
public ColumnConfigDeserializer<BloomFilterConfig> createDeserializer() {
return IndexConfigDeserializer.fromIndexes("bloom", getIndexConfigClass())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't there a constant you created somewhre that you can use here? (for all *IndexType)

public RangeIndexConfig(@JsonProperty("enabled") Boolean enabled,
@JsonProperty("version") @Nullable Integer version) {
super(enabled != null && enabled && version != null);
_version = version != null ? version : 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we take this from the creator/reader or define it here as current version, instead of hardcoding?

if (indexType.equals(StandardIndexes.forward())) {
return (I) _forwardIndex;
}
if (indexType.equals(StandardIndexes.inverted())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh i didn't know we could create inverted index on an inverted column!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants