Skip to content

Conversation

@saurabhd336
Copy link
Contributor

@saurabhd336 saurabhd336 commented Mar 21, 2024

This PR depends on #12532

Allows contextual filtering of documents when using jsonExtractIndex. Eg

columnName: jsonField
value:
{
    "stringField": "xyz",
    "arrField": [{"f1": 1, "f2": 2}, {"f1": 3, "f2": 4}, {"f2": 6}, {"f1": 0, "f2": 5}]
}

jsonExtractIndex(jsonField, '$.arrField[*].f1', 'INT_ARRAY', '0', '"$.arrField[*].f2" > 2') returns [3, 0]
i.e. Only return matching array elements that match the given json predicate.

@saurabhd336
Copy link
Contributor Author

saurabhd336 commented Mar 21, 2024

@wirybeaver @Jackie-Jiang Split the original PR (https://github.com/apache/pinot/pull/12532/files) into two as discussed. Please take a look at this one for filter support once #12532 is merged. Ty!

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2024

Codecov Report

❌ Patch coverage is 67.34694% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.53%. Comparing base (59551e4) to head (f8b9b3c).
⚠️ Report is 2920 commits behind head on master.

Files with missing lines Patch % Lines
...t/index/readers/json/ImmutableJsonIndexReader.java 65.21% 5 Missing and 3 partials ⚠️
...rm/function/JsonExtractIndexTransformFunction.java 66.66% 1 Missing and 3 partials ⚠️
...local/realtime/impl/json/MutableJsonIndexImpl.java 76.92% 1 Missing and 2 partials ⚠️
...e/pinot/common/function/TransformFunctionType.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12683      +/-   ##
============================================
- Coverage     61.75%   61.53%   -0.23%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2456      +20     
  Lines        133233   134439    +1206     
  Branches      20636    20817     +181     
============================================
+ Hits          82274    82721     +447     
- Misses        44911    45531     +620     
- Partials       6048     6187     +139     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.48% <67.34%> (-0.23%) ⬇️
java-21 61.39% <67.34%> (-0.23%) ⬇️
skip-bytebuffers-false 61.51% <67.34%> (-0.23%) ⬇️
skip-bytebuffers-true 61.36% <67.34%> (+33.63%) ⬆️
temurin 61.53% <67.34%> (-0.23%) ⬇️
unittests 61.52% <67.34%> (-0.23%) ⬇️
unittests1 46.13% <16.32%> (-0.76%) ⬇️
unittests2 27.93% <51.02%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

return getMatchingFlattenedDocIds(filter, false);
}

private RoaringBitmap getMatchingFlattenedDocIds(FilterContext filter, boolean allowNestedExclusivePredicate) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand why we cannot always allow nested exclusive predicate?

}
return matchingDocIds;
}
case PREDICATE: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add NOT since we allow exclusive predicate?


@Override
public Map<String, RoaringBitmap> getMatchingFlattenedDocsMap(String jsonPathKey) {
public Map<String, RoaringBitmap> getMatchingFlattenedDocsMap(String jsonPathKey, @Nullable String filterString) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced in this PR, but I think we should support json path key with and without $ prefix. See getMatchingFlattenedDocIds() for reference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the support

@saurabhd336 saurabhd336 force-pushed the jsonExtractIndexFilterSupport branch from cb36a1b to 8e45c63 Compare March 22, 2024 05:43
@saurabhd336 saurabhd336 force-pushed the jsonExtractIndexFilterSupport branch from 2581212 to 3771339 Compare March 22, 2024 08:56
@saurabhd336
Copy link
Contributor Author

@Jackie-Jiang updated the PR to let getMatchingFlattenedDocsMap accept jsonPath string with $ itself (similar to getMatchingFlattenedDocIds)

As for the nested exclusive predicates, I think my understanding was incorrect. We can't infact support nested exclusive predicates (see https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java#L106). Turns out, the contract is for the predicates to maintain context, and therefore trying to solve nested exclusive predicates by flipping results of individual flattened doc id bitmaps would be wrong. Since our usecase does not require nested exclusive predicates, I removed that part entirely. Do have a look ty!

@Jackie-Jiang Jackie-Jiang added feature documentation release-notes Referenced by PRs that need attention when compiling the next release notes labels Mar 25, 2024
@saurabhd336 saurabhd336 merged commit 57f50d3 into apache:master Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants