-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Json extract index mv #12532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Json extract index mv #12532
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12532 +/- ##
============================================
- Coverage 61.75% 61.50% -0.26%
+ Complexity 207 198 -9
============================================
Files 2436 2452 +16
Lines 133233 133978 +745
Branches 20636 20756 +120
============================================
+ Hits 82274 82398 +124
- Misses 44911 45413 +502
- Partials 6048 6167 +119
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
e14682e to
d1abd9b
Compare
9b9f660 to
dcd8ca7
Compare
|
@wirybeaver @Jackie-Jiang I've updated the PR to consolidate with jsonExtractIndex (essentially added MV support to jsonExtractIndex, as well as support for jsonPath filter). There is one caveat here wrt MV return types, wherein for the MV return type to work correctly, users must set |
|
Just paste my previous suggestion from the slack discussion so that other reviewers can understand how we get here:
|
|
@saurabhd336 Two comments. |
| } | ||
|
|
||
| @Override | ||
| public Map<String, RoaringBitmap> getValueToFlattenedDocIdsMap(String jsonPathKey, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Directly modify the getMatchingDocsMap which should have return FlattenDocIds and move the converting of flattenDocIds to DocIds into the up layer, i.e. transform function. I understand fixing the unit test JsonIndexTest might take more time but it's worth to do that since more test cases can be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly good
...ava/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java
Show resolved
Hide resolved
...ava/org/apache/pinot/core/operator/transform/function/JsonExtractIndexTransformFunction.java
Show resolved
Hide resolved
| _matchingDocsMap = _jsonIndexReader.getMatchingDocsMap(_jsonPathString); | ||
| _resultMetadata = new TransformResultMetadata(dataType, isSingleValue, false); | ||
| _valueToMatchingFlattenedDocIdsMap = | ||
| _jsonIndexReader.getMatchingFlattenedDocsMap(inputJsonPath.substring(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Suggest passing in the full path, and JSON index reader should validate the path
| * @return String[][] where String[i] is the mv array for docIds[i] | ||
| */ | ||
| String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap> context); | ||
| String[][] getValuesForMv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String[][] getValuesForMv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); | |
| String[][] getValuesMV(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) We usually put API for SV in front of MV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
| * @return String[] where String[i] is the sv value for docIds[i] | ||
| */ | ||
| Map<String, RoaringBitmap> getMatchingDocsMap(String key); | ||
| String[] getValuesForSv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String[] getValuesForSv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); | |
| String[] getValuesSV(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
...java/org/apache/pinot/segment/local/segment/index/readers/json/ImmutableJsonIndexReader.java
Show resolved
Hide resolved
| int numChildren = children.size(); | ||
| MutableRoaringBitmap matchingDocIds = getMatchingFlattenedDocIds(children.get(0)); | ||
| MutableRoaringBitmap matchingDocIds = | ||
| getMatchingFlattenedDocIds(children.get(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(format) Please reformat all the changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
| Map<String, RoaringBitmap> valueToMatchingFlattenedDocIdsMap = new HashMap<>(); | ||
| _readLock.lock(); | ||
| try { | ||
| Pair<String, RoaringBitmap> result = getKeyAndFlattenDocId(jsonPathKey); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor, not introduced here) Rename the method to getKeyAndFlattenedDocId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
| public Map<String, RoaringBitmap> getMatchingDocsMap(String key) { | ||
| Map<String, RoaringBitmap> matchingDocsMap = new HashMap<>(); | ||
| @VisibleForTesting | ||
| public Map<String, RoaringBitmap> convertFlattenedDocIdsToDocIds(Map<String, RoaringBitmap> valueToFlattenedDocIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now used to optimize SV extraction path by converting flattened doc IDs map to unflattened.
| } | ||
|
|
||
| @VisibleForTesting | ||
| public Map<String, RoaringBitmap> convertFlattenedDocIdsToDocIds(Map<String, RoaringBitmap> valueToFlattenedDocIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now used to optimize SV extraction path by converting flattened doc IDs map to unflattened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense since the Transform function might be invoked multiple times. Then we should remove VisibleForTesting? I just noticed that I incidentally remove an optimization comment:
I just realized that we should keep both valueToDocId and valueToFlattenDocId. Using valueToFlattenDocIds would increase the read latency since it need to iterate the flattenDocId bitmap and create the docId bitmap. If the return type isSingleValue, init valueToDocId; otherwise, init valueToFlattenDocIds.
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments
| * @return String[] where String[i] is the sv value for docIds[i] | ||
| */ | ||
| String[] getValuesForKeyAndDocs(int[] docIds, Map<String, RoaringBitmap> context); | ||
| String[] getValuesSv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToDocs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String[] getValuesSv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToDocs, | |
| String[] getValuesSV(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToDocs, |
| * @return String[][] where String[i] is the mv array for docIds[i] | ||
| */ | ||
| Map<String, RoaringBitmap> getMatchingDocsMap(String key); | ||
| String[][] getValuesMv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| String[][] getValuesMv(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); | |
| String[][] getValuesMV(int[] docIds, int length, Map<String, RoaringBitmap> matchingValueToFlattenedDocs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
| * and the associated flattenDocId bitmap | ||
| */ | ||
| private Pair<String, MutableRoaringBitmap> getKeyAndFlattenDocId(String key) { | ||
| private Pair<String, MutableRoaringBitmap> getKeyAndFlattenedDocId(String key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| private Pair<String, MutableRoaringBitmap> getKeyAndFlattenedDocId(String key) { | |
| private Pair<String, MutableRoaringBitmap> getKeyAndFlattenedDocIds(String key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the new syntax, I think we want to pass in the real json path, e.g. $.foo, or do we want to keep the current way of no dollar sign?
|
One more question: with current implementation, does |
the former one |
|
was the documentation updated with this change @saurabhd336 ? |

Allows support for multi value fields to
jsonExtractIndex. Explained belowjsonExtractIndex(jsonField, '$.arrField[*].f1', 'INT_ARRAY')returns[1, 3]for this rowjsonExtractIndex(jsonField, '$.arrField[*].f2', 'INT_ARRAY')returns[2, 4, 6]for this rowCaveat to be covered in documentation:
If the json doc has > 1 array fields, users must set
disableCrossArrayUnnest: truein the JSON index configs for the results to be correct.There is a followup PR #12683 which adds support for contextual filtering when extracting MV values