Skip to content

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jan 27, 2023

This is a partial draft PR related to #10183. The whole PR is in #10184. It isn't intended to be merged as it is and it doesn't require to compile or pass the checks. It has been created in order to make it easier to understand the PR with all the changes.

This PR depends on #10191. Readers only interested in the difference between this and the previous PR can focus on gortiz#1

This PR shows the basics of the proposed IndexService without actually modifying the code and also includes a BloomIndexPlugin as an example. Here we use Java Service Provider Interface to discover all services that implement IndexPlugin, although the procedure to discover the instances is subject of discussion.

classDiagram

  class StandardIndexes {
     bloomFilter() IndexType~BloomFilterConfig~
  }
  note for StandardIndexes "In future PRs will have one method per standard Pinot index (forward, etc)"

  class IndexService {
      getAll() List~IndexType~
      get(String indexId) Optional~IndexType~
  }
  
  class IndexType~C, IC, IR~ {
      getId() String
      getConfig(TableConfig, Schema) C
      createIndexCreator(CreationContext, C) IC
      createIndexReader(ReadingContext, C) IR
      createIndexHandler(HandlerContext, C) IndexHandler
  }
  
  StandardIndexes ..> IndexService : delegates on

  IndexService o--> "*" IndexType
  
  IndexService ..> IndexPlugin : used to find IndexType
  
  IndexPlugin --> "1" IndexType
  
  class IndexPlugin
  
  class BloomIndexPlugin
  BloomIndexPlugin --|> IndexPlugin : implements
  BloomIndexPlugin ..> BloomIndexType : instanciates
  
  BloomIndexType --|> IndexType : implements
Loading

@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2023

Codecov Report

Merging #10192 (cb807fb) into master (3687b2b) will increase coverage by 0.91%.
The diff coverage is 10.25%.

@@             Coverage Diff              @@
##             master   #10192      +/-   ##
============================================
+ Coverage     63.23%   64.14%   +0.91%     
- Complexity     5905     6087     +182     
============================================
  Files          2036     2013      -23     
  Lines        110973   109366    -1607     
  Branches      16892    16702     -190     
============================================
- Hits          70177    70158      -19     
+ Misses        35606    34092    -1514     
+ Partials       5190     5116      -74     
Flag Coverage Δ
integration1 ?
integration2 ?
unittests1 67.90% <10.25%> (-0.06%) ⬇️
unittests2 13.91% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...nt/local/segment/index/bloom/BloomIndexPlugin.java 0.00% <0.00%> (ø)
...ment/local/segment/index/bloom/BloomIndexType.java 0.00% <0.00%> (ø)
...he/pinot/segment/spi/index/IndexReaderFactory.java 0.00% <0.00%> (ø)
...g/apache/pinot/segment/spi/index/IndexService.java 0.00% <0.00%> (ø)
...pache/pinot/segment/spi/index/StandardIndexes.java 0.00% <0.00%> (ø)
.../segment/spi/index/creator/BloomFilterCreator.java 0.00% <0.00%> (ø)
...ache/pinot/spi/config/table/BloomFilterConfig.java 40.00% <26.66%> (-23.64%) ⬇️
...ator/impl/bloom/OnHeapGuavaBloomFilterCreator.java 95.65% <80.00%> (-4.35%) ⬇️

... and 665 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@gortiz
Copy link
Contributor Author

gortiz commented Mar 15, 2023

I've modified the description of the PR by adding a class diagram. I hope it can be useful

}

public Optional<IndexType<?, ?, ?>> getIndexTypeById(String indexId) {
return getAllIndexes().stream().filter(indexType -> indexType.getId().equalsIgnoreCase(indexId)).findAny();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this indexId be an enum instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it cannot be an enum. Enums are by definition a closed set of elements decided at compile time. The idea behind index-spi is to not define the indexes at compile time but at startup time. Therefore enums cannot be used.


@Override
public IndexReaderFactory<BloomFilterReader> getReaderFactory() {
return new IndexReaderFactory.Default<BloomFilterConfig, BloomFilterReader>() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @saurabhd336 's comment.
Creating a new instance for what is meant to be a stateless instance is a little unnecessary. I agree that GC will take care of it. But it may not be obvious to every developer. So, better to move it to a single instance and share it.

private final BloomFilter<String> _bloomFilter;
private final FieldSpec.DataType _dataType;

// TODO: This method is here for compatibility reasons, should be removed in future PRs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In my experience, it is better to define an "exit criteria" for the deprecated feature. Meaning "when" in the future can this be safely removed. In my previous team, we used something like:

exit_criteria =  "when PR #XYZ is merged in oss"

In this case, you can mention when it will no longer matter. It can also be as straightforward as "remove in releaseX.Y.Z"
I don't know if Pinot has a standard practice for removing deprecated feature/code. But just my 2 cents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but I don't actually know when it could be removed. There should be no usages of this method once the complete PR is merged, but I kept it there just in case some people out there is using that constructor. We can simply remove it if you prefer.

What I'm going to do is to change the comment into something like: exit_criteria: remove once #10184 is merged (#10184 is the complete PR)

* (usually by {@link java.util.ServiceLoader} services) and the actual implementation.
*
* In order to mark a class as a {@link java.util.ServiceLoader} service, some metadata has to be added. Java modules
* define a typesafe way to define services, but given that Pinot does not use them right now, the easier way to create
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very familiar with the ServiceLoader. but say in the future, we want to use the type. Will that be a problem?
What's the implication of using Google AutoService? Compared with the other type safe method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically, in order to create a ServiceLoader service you need to add some text in the MANIFES file all jars usually have. This is tedious, error prone and personally I think that is one of the reasons most people don't know about ServiceLoader system. This example from baeldung shows how to manually create a ServiceLoader service. There is a new typesafe mechanism that was added when using Java modules. Given that we don't use it, I'm not going to explain that much about it, but it is important to say that if someday use Java modules, both ways to declare services are compatible.

Baeldung also has another tutorial showing how to use Google AutoServe.

What Google AutoService does is a very simple trick based on standards:

  • You annotate a class with @AutoService(WhateverInterfaceTheClassImplement.class)
  • AutoService uses a Java preprocessor, which is a standard javac feature that offers hooks where your code can apply. What the AutoService Java preprocessor does is trivial: It just generate the entry in the MANIFEST.MF file you would need to add manually instead.
  • Given that this is only done in compile time, you don't even need to have the AutoService dependency at runtime.

So there is no magic here, no reflection, anything. It is just a typesafe way to generate an entry in the MANIFEST without having to add it manually.

Copy link
Contributor

@navina navina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are few comments that need to be resolved. Once that discussion reaches a consensus, I am good with this PR. +1

Tks for breaking down your work into consumable pieces 👍

@saurabhd336
Copy link
Contributor

qq @gortiz I don't see any changes being made to BloomFilterHandler.java in this PR. Will all the IndexHandler classes continue to directly create the corresponding creator instances? Can they follow the IndexService -> indexType -> Creator path too? Or are there challenges with that?

@saurabhd336
Copy link
Contributor

qq @gortiz I don't see any changes being made to BloomFilterHandler.java in this PR. Will all the IndexHandler classes continue to directly create the corresponding creator instances? Can they follow the IndexService -> indexType -> Creator path too? Or are there challenges with that?

Never mind looks like the original PR #10184 has those changes


@Override
public String getId() {
return "bloom_filter";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this id be class name or a public static final string? So it is easier for cross reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have changed that in #10193.

@gortiz gortiz force-pushed the index-spi-index-service branch from aec56aa to 3abed16 Compare March 17, 2023 09:50
@gortiz gortiz mentioned this pull request Mar 21, 2023
@npawar npawar merged commit 854e7f8 into apache:master Mar 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants