-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Index spi: index service #10192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index spi: index service #10192
Conversation
Codecov Report
@@ Coverage Diff @@
## master #10192 +/- ##
============================================
+ Coverage 63.23% 64.14% +0.91%
- Complexity 5905 6087 +182
============================================
Files 2036 2013 -23
Lines 110973 109366 -1607
Branches 16892 16702 -190
============================================
- Hits 70177 70158 -19
+ Misses 35606 34092 -1514
+ Partials 5190 5116 -74
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 665 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…o index-spi-index-service
…nto index-spi-index-type-only
|
I've modified the description of the PR by adding a class diagram. I hope it can be useful |
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexService.java
Show resolved
Hide resolved
| } | ||
|
|
||
| public Optional<IndexType<?, ?, ?>> getIndexTypeById(String indexId) { | ||
| return getAllIndexes().stream().filter(indexType -> indexType.getId().equalsIgnoreCase(indexId)).findAny(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this indexId be an enum instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it cannot be an enum. Enums are by definition a closed set of elements decided at compile time. The idea behind index-spi is to not define the indexes at compile time but at startup time. Therefore enums cannot be used.
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/StandardIndexes.java
Show resolved
Hide resolved
pinot-spi/src/main/java/org/apache/pinot/spi/config/table/BloomFilterConfig.java
Outdated
Show resolved
Hide resolved
|
|
||
| @Override | ||
| public IndexReaderFactory<BloomFilterReader> getReaderFactory() { | ||
| return new IndexReaderFactory.Default<BloomFilterConfig, BloomFilterReader>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @saurabhd336 's comment.
Creating a new instance for what is meant to be a stateless instance is a little unnecessary. I agree that GC will take care of it. But it may not be obvious to every developer. So, better to move it to a single instance and share it.
| private final BloomFilter<String> _bloomFilter; | ||
| private final FieldSpec.DataType _dataType; | ||
|
|
||
| // TODO: This method is here for compatibility reasons, should be removed in future PRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: In my experience, it is better to define an "exit criteria" for the deprecated feature. Meaning "when" in the future can this be safely removed. In my previous team, we used something like:
exit_criteria = "when PR #XYZ is merged in oss"
In this case, you can mention when it will no longer matter. It can also be as straightforward as "remove in releaseX.Y.Z"
I don't know if Pinot has a standard practice for removing deprecated feature/code. But just my 2 cents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but I don't actually know when it could be removed. There should be no usages of this method once the complete PR is merged, but I kept it there just in case some people out there is using that constructor. We can simply remove it if you prefer.
What I'm going to do is to change the comment into something like: exit_criteria: remove once #10184 is merged (#10184 is the complete PR)
| * (usually by {@link java.util.ServiceLoader} services) and the actual implementation. | ||
| * | ||
| * In order to mark a class as a {@link java.util.ServiceLoader} service, some metadata has to be added. Java modules | ||
| * define a typesafe way to define services, but given that Pinot does not use them right now, the easier way to create |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not very familiar with the ServiceLoader. but say in the future, we want to use the type. Will that be a problem?
What's the implication of using Google AutoService? Compared with the other type safe method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically, in order to create a ServiceLoader service you need to add some text in the MANIFES file all jars usually have. This is tedious, error prone and personally I think that is one of the reasons most people don't know about ServiceLoader system. This example from baeldung shows how to manually create a ServiceLoader service. There is a new typesafe mechanism that was added when using Java modules. Given that we don't use it, I'm not going to explain that much about it, but it is important to say that if someday use Java modules, both ways to declare services are compatible.
Baeldung also has another tutorial showing how to use Google AutoServe.
What Google AutoService does is a very simple trick based on standards:
- You annotate a class with
@AutoService(WhateverInterfaceTheClassImplement.class) - AutoService uses a Java preprocessor, which is a standard
javacfeature that offers hooks where your code can apply. What the AutoService Java preprocessor does is trivial: It just generate the entry in the MANIFEST.MF file you would need to add manually instead. - Given that this is only done in compile time, you don't even need to have the AutoService dependency at runtime.
So there is no magic here, no reflection, anything. It is just a typesafe way to generate an entry in the MANIFEST without having to add it manually.
navina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are few comments that need to be resolved. Once that discussion reaches a consensus, I am good with this PR. +1
Tks for breaking down your work into consumable pieces 👍
|
qq @gortiz I don't see any changes being made to BloomFilterHandler.java in this PR. Will all the IndexHandler classes continue to directly create the corresponding creator instances? Can they follow the IndexService -> indexType -> Creator path too? Or are there challenges with that? |
Never mind looks like the original PR #10184 has those changes |
|
|
||
| @Override | ||
| public String getId() { | ||
| return "bloom_filter"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this id be class name or a public static final string? So it is easier for cross reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have changed that in #10193.
aec56aa to
3abed16
Compare
This is a partial draft PR related to #10183. The whole PR is in #10184. It isn't intended to be merged as it is and it doesn't require to compile or pass the checks. It has been created in order to make it easier to understand the PR with all the changes.
This PR depends on #10191. Readers only interested in the difference between this and the previous PR can focus on gortiz#1
This PR shows the basics of the proposed IndexService without actually modifying the code and also includes a BloomIndexPlugin as an example. Here we use Java Service Provider Interface to discover all services that implement IndexPlugin, although the procedure to discover the instances is subject of discussion.
classDiagram class StandardIndexes { bloomFilter() IndexType~BloomFilterConfig~ } note for StandardIndexes "In future PRs will have one method per standard Pinot index (forward, etc)" class IndexService { getAll() List~IndexType~ get(String indexId) Optional~IndexType~ } class IndexType~C, IC, IR~ { getId() String getConfig(TableConfig, Schema) C createIndexCreator(CreationContext, C) IC createIndexReader(ReadingContext, C) IR createIndexHandler(HandlerContext, C) IndexHandler } StandardIndexes ..> IndexService : delegates on IndexService o--> "*" IndexType IndexService ..> IndexPlugin : used to find IndexType IndexPlugin --> "1" IndexType class IndexPlugin class BloomIndexPlugin BloomIndexPlugin --|> IndexPlugin : implements BloomIndexPlugin ..> BloomIndexType : instanciates BloomIndexType --|> IndexType : implements