-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[Index SPI] IndexType #10191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Index SPI] IndexType #10191
Conversation
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexDeclaration.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexHandler.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #10191 +/- ##
============================================
+ Coverage 68.27% 70.26% +1.99%
- Complexity 5262 6027 +765
============================================
Files 2049 2053 +4
Lines 111033 111122 +89
Branches 16891 16899 +8
============================================
+ Hits 75810 78085 +2275
+ Misses 29816 27558 -2258
- Partials 5407 5479 +72
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 194 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
|
@Jackie-Jiang I've modified this PR in order to remove the IndexDeclaration, as we discussed online. I'll be checking the tests, but this should be ready to merge |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is much cleaner now
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java
Outdated
Show resolved
Hide resolved
| _configMap = new HashMap<>(other._configMap); | ||
| } | ||
|
|
||
| public <C extends IndexConfig, I extends IndexType<C, ?, ?>> Builder add(I indexType, @Nullable C config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we allow adding null config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove it and delegate on undeclare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, should we add the default config when it is not explicitly configured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't necessary and it would force us to have larger maps for no reason. It is not necessary because the getConfig method is implemented as:
public <C extends IndexConfig, I extends IndexType<C, ?, ?>> C getConfig(I indexType) {
IndexConfig config = _configMap.get(indexType);
if (config == null) {
return indexType.getDefaultConfig();
}
return (C) config;
}The size of the map wouldn't be an actual memory problem, but:
- It would force us to iterate over all the index types registered whenever we create one of these objects
- We would therefore need to add a dependency to
StandardIndexes. This pattern were we have to (ab)use of static references is not something I really like, so I'm trying to limit it as much as possible. - This would make debugging (or logging) quite more difficult, as these objects will have one entry per index instead of one entry per explicitly define index.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexHandler.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderFactory.java
Outdated
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/IndexConfig.java
Show resolved
Hide resolved
I've just checked and there is no recent change in the interface. Both interfaces are not the same because I removed I'm going to move the class. It would increase the number of changes in this PR, but they should only be imports. PS: I will move the current IndexHandler class, with |
…he.pinot.segment.local.segment.index.loader
I don't like having the two concepts either, but I think we need them for compatibility and usability reasons. Yes, the ids of the literals in As you can see, all but dictionary and null value have the In {
"fieldConfigList": [
{
"name": “col1”,
"indexes": {
"inverted_index": {
"enabled": true
},
"range_index": {
"enabled": true
},
"text_index": {
"type": “lucene”,
"enableQueryCacheForTextIndex": false,
"stopWordsInclude": ["a", "b", "c"]
},
"bloom_index": {
"fpp": 0.01,
"maxSizeInBytes": 1000000,
"loadOnHeap": true
}
}
}
]
}And I think printing errors like At the same time we cannot decide to just remove the |
|
IMO it is okay to have the |
IMO that is cheep, repetitive (we define these objects as keys in |
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/FieldIndexConfigs.java
Outdated
Show resolved
Hide resolved
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Outdated
Show resolved
Hide resolved
I think Also, all other branches are already using |
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java
Outdated
Show resolved
Hide resolved
| * @param values The nonnull value of the cell. In case the cell was actually null, an empty array is received instead | ||
| * @param dictIds An optional array of dictionary values. If there is no dictionary, null is received. | ||
| */ | ||
| void add(@Nonnull Object[] values, @Nullable int[] dictIds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the convention is to annotate @Nullable only and leave default as @Nonnull.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, but here I'm introducing both to be more explicit. Like adding bold in a text just to emphasize something. If you think that is problematic I can remove it.
I know this is not the place to discuss it, but I have to say that the convention is not correct. Or better said, it is partial in the sense that tools doesn't know that. We should add @ParametersAreNonnullByDefault to all packages in order to inform tools that this is our standard. For example with that spotless or spotbugs can detect NPE at compile time. Otherwise they require explicit nullability annotations anywhere.
|
|
||
| public IndexReaderConstraintException(String columnName, IndexType<?, ?, ?> type, String constraintDesc, | ||
| Throwable cause) { | ||
| this("Cannot read an index of type " + type + " on column " + columnName + ". Reason: " + constraintDesc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: String.format(...) to read more easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is a personal taste. I find String.format quite more difficult to read given that I have to move my reading cursor to the end of the sentence in order to know what is going to feel the format variable.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java
Outdated
Show resolved
Hide resolved
| * Sometimes it doesn't make sense to create an index, even when the user explicitly asked for it. For example, an | ||
| * inverted index shouldn't be created when the column is sorted. | ||
| * | ||
| * Apache Pinot will call this method once all index configurations have been parsed and it is included in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache Pinot feels too generic to me here. Perhaps list some example callers of this method here, e.g. SegmentPreProcessor or SegmentCreator (iiuc) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to be too generic than too specific. Using an IDE (and even github!) it is easy to know who calls a function, so an interested reader can calculate the complete list of callers in real time. But in the too specific case, it may happen that in a future the caller changes to another class and it would be very easy to forget to change this javadoc, which would make it incorrect.
| * | ||
| * Most indexes are stored as a buffer, but for example TextIndexType is stored in a separate lucene file. | ||
| */ | ||
| default boolean storedAsBuffer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious how this method is going to be used?
The StarTree index data is kept in a separate file (star_tree_index) as PinotDataBuffers. Would it break any assumption here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be used in TextIndexType. StarTree index is not an index covered by index-spi.
You can see how it is being used here (in case the link doesn't work: file is pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/converter/SegmentV1V2ToV3FormatConverter.java in #10184).
TL;DR: if this method returns false, the buffer is not copied. I think this will only be false in TextIndexType, but we need to introduce the method here in order to do not introduce a special case in the generic code of SegmentV1V2ToV3FormatConverter
This is the first draft PR related to #10183. The whole PR is in #10184. It isn't intended to be merged as it is and it doesn't require to compile or pass the checks. It has been created in order to make it easier to understand the PR with all the changes.
It shows the basics of the proposed
IndexTypewithout actually modifying the code.