-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Implement NOT Operator #8148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement NOT Operator #8148
Conversation
|
Thanks for the change, @atris . I will review today |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8148 +/- ##
============================================
- Coverage 71.41% 71.08% -0.33%
- Complexity 4302 4319 +17
============================================
Files 1623 1629 +6
Lines 84312 85132 +820
Branches 12639 12812 +173
============================================
+ Hits 60213 60518 +305
- Misses 19974 20459 +485
- Partials 4125 4155 +30
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pinot-common/src/main/java/org/apache/pinot/common/request/context/RequestContextUtils.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/NotDocIdIterator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/FilterOperatorUtils.java
Show resolved
Hide resolved
| */ | ||
| public class NotDocIdIterator implements BlockDocIdIterator { | ||
| private BlockDocIdIterator _childDocIdIterator; | ||
| private int _lowerLimit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest renaming _lowerLimit to _nextDocId to be consistent with other iterators?
| public class NotDocIdIterator implements BlockDocIdIterator { | ||
| private BlockDocIdIterator _childDocIdIterator; | ||
| private int _lowerLimit; | ||
| private int _upperLimit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest renaming _upperLimit to _nextNonMatchingDocId
|
|
||
| @Override | ||
| public int advance(int targetDocId) { | ||
| if (targetDocId == Constants.EOF || targetDocId > _numDocs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified to:
_nextDocId = targetDocId;
return next();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that is valid -- doing that causes tests in NotDocIteratorTest to fail
|
|
||
| @Override | ||
| public int next() { | ||
| while (_lowerLimit == _upperLimit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this to while (_lowerLimit >= _upperLimit), then both constructor and advance() can be simplified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think that works -- since that can lead to skipping of the first docID (0) if that is a valid docID for the not iterator to return. Tests in NotDocIdIteratorTest highlight that.
| _lowerLimit = 0; | ||
|
|
||
| int currentDocIdFromChildIterator = childDocIdIterator.next(); | ||
| _upperLimit = currentDocIdFromChildIterator == Constants.EOF ? numDocs : currentDocIdFromChildIterator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to initialize the upper limit here. The value check in next() should be able to handle that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't work, since it will lead to skipping the first document (i.e. not initialising upper limit implicitly implies that the first document is matching the inner iterator)
| List<Operator> resultList = new ArrayList<>(); | ||
| resultList.add(_filterOperator); | ||
|
|
||
| return resultList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor)
| List<Operator> resultList = new ArrayList<>(); | |
| resultList.add(_filterOperator); | |
| return resultList; | |
| return Collections.singletonList(_filterOperator); |
| bitmap3.add(docIds3); | ||
| MutableRoaringBitmap bitmap4 = new MutableRoaringBitmap(); | ||
| bitmap4.add(docIds4); | ||
| OrDocIdIterator orDocIdIterator = new OrDocIdIterator(new BlockDocIdIterator[]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to build this OrDocIdIterator? We don't want to test OrDocIdIterator in this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use it to test NotDocIdIterator later in the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. What I'm saying is that we should directly set the docIds (similar to docIds1) instead of creating an OrDocIdIterator because that is more clear, and not mixing the scope of this test.
| assertEquals(notDocIdIterator.advance(1), 2); | ||
| assertEquals(notDocIdIterator.next(), 3); | ||
| assertEquals(notDocIdIterator.next(), 5); | ||
| assertEquals(notDocIdIterator.advance(7), 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(major) Per the contract, this should return 7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above advance() BlockDocIdIterator is a bit confusing. It says:
Returns the first matching document whose id is greater than or equal to the given target document id
I interpreted it as being ok if we return the first id greater than the target document id, which is what this test demonstrates. If that is not the case, should we reword the API spec to be explicit?
Fixed this issue, thanks for sharing the proposed solution!
| assertEquals(notDocIdIterator.next(), 3); | ||
| assertEquals(notDocIdIterator.next(), 5); | ||
| assertEquals(notDocIdIterator.advance(7), 8); | ||
| assertEquals(notDocIdIterator.advance(13), 14); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(major) This should return 13, same for other places
| return _nextDocId; | ||
| } | ||
|
|
||
| _nextDocId = targetDocId + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(major) This will skip the targetDocId
| if (targetDocId == Constants.EOF || targetDocId > _numDocs) { | ||
| return Constants.EOF; | ||
| } | ||
|
|
||
| if (targetDocId < _nextDocId) { | ||
| return _nextDocId; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These checks are redundant because by contract, targetDocId >= _nextDocId && targetDocId < _numDocs
| } | ||
|
|
||
| private int findUpperLimitGreaterThanDocId(int currentDocId) { | ||
| int result = _childDocIdIterator.advance(currentDocId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(major) need to check _nextNonMatchingDocId < currentDocId before advancing the pointer
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments. Good job!
| bitmap3.add(docIds3); | ||
| MutableRoaringBitmap bitmap4 = new MutableRoaringBitmap(); | ||
| bitmap4.add(docIds4); | ||
| OrDocIdIterator orDocIdIterator = new OrDocIdIterator(new BlockDocIdIterator[]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. What I'm saying is that we should directly set the docIds (similar to docIds1) instead of creating an OrDocIdIterator because that is more clear, and not mixing the scope of this test.
|
|
||
| @Test | ||
| public void testWeirdPredicates() { | ||
| String query = "SELECT FIRST_INT_COL, SECOND_INT_COL FROM MyTable WHERE NOT FIRST_INT_COL = 5 LIMIT " + "50000"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Remove the unnecessary +, same for other places
| String query = "SELECT FIRST_INT_COL, SECOND_INT_COL FROM MyTable WHERE NOT FIRST_INT_COL = 5 LIMIT " + "50000"; | |
| String query = "SELECT FIRST_INT_COL, SECOND_INT_COL FROM MyTable WHERE NOT FIRST_INT_COL = 5 LIMIT 50000"; |
| return 4; | ||
| } | ||
| if (filterOperator instanceof NotFilterOperator) { | ||
| return getPriority((BaseFilterOperator) filterOperator.getChildOperators().get(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) We can add a getChildFilterOperator() in NotFilterOperator which can save an unnecessary array allocation
|
When I tried the change from the controller UI, I got the following exception:
] |
Thanks for trying it out. Can you check with the latest commit? |
This PR implements the NOT operator. NOT operator is implemented in a generic manner i.e. it can support most underlying operators. The operator operates by "skipping" documents that are valid for the underlying predicate and iterating over all other valid docIDs. Support for LIKE operator is not present yet due to Calcite's slightly different parsing of the LIKE operator.
|
@atris This is a great feature. Could you please add some release notes to the PR description which we can refer to when cutting the next release? |
This PR implements the NOT operator.
NOT operator is implemented in a generic manner i.e. it can support most underlying operators.
The operator operates by "skipping" documents that are valid for the underlying predicate and iterating over all other valid docIDs.
Support for LIKE operator is not present yet due to Calcite's slightly different parsing of the LIKE operator.