Skip to content

Conversation

@Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Jun 9, 2020

Description

The new FunctionRegistry change of using reflection to plug in methods (introduced in #5440) is causing failure in query compilation.
The problem is caused by Reflections library not being thread-safe when multiple threads are accessing the same jar file.
We have multiple libraries (jersey, swagger) using reflection to load classes. The FunctionRegistry uses reflection in the static block, which happens when the class is loaded. If the first query arrives (first usage of FunctionRegistry which runs the static block) during the setup of the jersey server, Reflections will throw exception.
Using reflection in the static block can also cause the first query to block on reflection scanning the classes, which can potentially take seconds.
Read more about the thread-safety issue here: ronmamo/reflections#81
Users are reporting the same issue even with the latest Reflections version 0.9.12.
A common solution is to downgrade Reflections to 0.9.9, but we have other dependencies rely on 0.9.11, so downgrading might cause other issues.

The solution here is to introduce a no-op init() method to trigger the static block so that we can control when to scan the files to avoid the thread-safety issue as well as blocking the first query.
Another benefit of using an init() method is that the exception won't be swallowed (the exception in static block will be swallowed and query engine will start getting ClassNotFoundException)

This PR also introduces a convention for the plugin methods using reflection, where the class must includes ".function." in its class path.
This can significantly reduce the time of class scanning (reduced from 4 seconds to 200 ms locally)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

In order to plug in scalar methods using annotation via reflection, please put the methods in a class that includes ".function." in its class path.

@mayankshriv
Copy link
Contributor

Thanks a lot for identifying this issue, and providing a fix. Could you please elaborate a bit on how the thread safety issue leads to query compilation error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if there's any way to enforce this, so we ensure that future changes adhere to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no easy way to enforce this for user-side classes because we can only load the classes at runtime. Added the log for all the registered functions so that at least we can catch the problem if the class is not with the correct class path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we measuring the time to init functions? do we expect it to take a long time?
Also, if we logging it may be useful to log the map size here, and also each function as they are registered

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was taking several seconds before adding filter on the class path. That is another reason why we should explicitly initialize it instead of waiting for class loader to run the static block. Before the change, the reflection happens when the first query arrives, which means the first query have to wait for several seconds. Right now it should take ~200ms, and it happens in the instance setup.
Added the map size and functions registered.

Copy link
Contributor

@sajjad-moradi sajjad-moradi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a couple of minor comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make a function like canonicalize(functionName) and use it here and also in registerFunction method where FUNCTION_INFO_MAP gets populated so in future we won't miss converting to lowercase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you removing this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FunctionRegistry.getFunctionByName() can handle function name not registered by returning null. I have a null check after this to save the extra hash lookup.

The problem is caused by Reflections library not being thread-safe when multiple threads are accessing the same jar file.
Read more about the thread-safety issue here: ronmamo/reflections#81
Users are reporting the same issue even with the latest Reflections version 0.9.12.
A common solution is to downgrade Reflections to 0.9.9, but we have other dependencies rely on 0.9.11, so downgrading might cause other issues.

The solution introduced here is by replacing the static block with an init() method so that we can control when to scan the files to avoid the thread-safety issue.
Another benefit of using an init() method is that the exception won't be swallowed (the exception in static block will be swallowed and query engine will start getting ClassNotFoundException)

This PR also introduces a convention for the plugin methods using reflection, where the class must includes ".function." in its class path.
This can significantly reduce the time of class scanning (reduced from 4 seconds to 200 ms locally)
@codecov-commenter
Copy link

Codecov Report

Merging #5531 into master will increase coverage by 0.29%.
The diff coverage is 73.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5531      +/-   ##
==========================================
+ Coverage   66.44%   66.74%   +0.29%     
==========================================
  Files        1075     1096      +21     
  Lines       54773    56414    +1641     
  Branches     8168     8412     +244     
==========================================
+ Hits        36396    37655    +1259     
- Misses      15700    15982     +282     
- Partials     2677     2777     +100     
Flag Coverage Δ
#integrationtests 45.35% <53.10%> (?)
#unittests 56.93% <63.22%> (?)
Impacted Files Coverage Δ
...quota/HelixExternalViewBasedQueryQuotaManager.java 67.87% <0.00%> (ø)
...org/apache/pinot/common/function/FunctionInfo.java 73.33% <ø> (ø)
...org/apache/pinot/common/utils/CommonConstants.java 39.02% <0.00%> (+0.92%) ⬆️
...troller/helix/core/retention/RetentionManager.java 80.28% <0.00%> (+1.11%) ⬆️
...he/pinot/controller/util/AutoAddInvertedIndex.java 0.00% <0.00%> (ø)
.../org/apache/pinot/core/common/BaseBlockValSet.java 3.03% <0.00%> (-1.32%) ⬇️
.../java/org/apache/pinot/core/common/DataSource.java 100.00% <ø> (ø)
...data/manager/realtime/DefaultSegmentCommitter.java 80.00% <ø> (ø)
...e/data/manager/realtime/SplitSegmentCommitter.java 63.63% <ø> (ø)
...re/io/reader/impl/v1/FixedBitMultiValueReader.java 100.00% <ø> (ø)
... and 422 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1a2434...503eff8. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants