Skip to content

Conversation

@atris
Copy link
Contributor

@atris atris commented Feb 7, 2022

This PR implements the NOT operator.

NOT operator is implemented in a generic manner i.e. it can support most underlying operators.

The operator operates by "skipping" documents that are valid for the underlying predicate and iterating over all other valid docIDs.

Support for LIKE operator is not present yet due to Calcite's slightly different parsing of the LIKE operator.

@siddharthteotia
Copy link
Contributor

Thanks for the change, @atris . I will review today

@atris atris mentioned this pull request Feb 7, 2022
@codecov-commenter
Copy link

codecov-commenter commented Feb 9, 2022

Codecov Report

Attention: Patch coverage is 78.00000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 71.08%. Comparing base (5fa4737) to head (bcb3886).
Report is 4683 commits behind head on master.

Files with missing lines Patch % Lines
...inot/core/operator/filter/FilterOperatorUtils.java 42.85% 2 Missing and 2 partials ⚠️
.../pinot/core/operator/filter/NotFilterOperator.java 55.55% 4 Missing ⚠️
...ot/common/request/context/RequestContextUtils.java 50.00% 0 Missing and 1 partial ⚠️
...core/operator/dociditerators/NotDocIdIterator.java 95.00% 0 Missing and 1 partial ⚠️
...ava/org/apache/pinot/core/plan/FilterPlanNode.java 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #8148      +/-   ##
============================================
- Coverage     71.41%   71.08%   -0.33%     
- Complexity     4302     4319      +17     
============================================
  Files          1623     1629       +6     
  Lines         84312    85132     +820     
  Branches      12639    12812     +173     
============================================
+ Hits          60213    60518     +305     
- Misses        19974    20459     +485     
- Partials       4125     4155      +30     
Flag Coverage Δ
integration1 28.81% <4.00%> (-0.09%) ⬇️
integration2 27.48% <4.00%> (-0.22%) ⬇️
unittests1 67.37% <78.00%> (-0.51%) ⬇️
unittests2 14.09% <0.00%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

*/
public class NotDocIdIterator implements BlockDocIdIterator {
private BlockDocIdIterator _childDocIdIterator;
private int _lowerLimit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming _lowerLimit to _nextDocId to be consistent with other iterators?

public class NotDocIdIterator implements BlockDocIdIterator {
private BlockDocIdIterator _childDocIdIterator;
private int _lowerLimit;
private int _upperLimit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming _upperLimit to _nextNonMatchingDocId


@Override
public int advance(int targetDocId) {
if (targetDocId == Constants.EOF || targetDocId > _numDocs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to:

    _nextDocId = targetDocId;
    return next();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that is valid -- doing that causes tests in NotDocIteratorTest to fail


@Override
public int next() {
while (_lowerLimit == _upperLimit) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to while (_lowerLimit >= _upperLimit), then both constructor and advance() can be simplified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think that works -- since that can lead to skipping of the first docID (0) if that is a valid docID for the not iterator to return. Tests in NotDocIdIteratorTest highlight that.

_lowerLimit = 0;

int currentDocIdFromChildIterator = childDocIdIterator.next();
_upperLimit = currentDocIdFromChildIterator == Constants.EOF ? numDocs : currentDocIdFromChildIterator;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to initialize the upper limit here. The value check in next() should be able to handle that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't work, since it will lead to skipping the first document (i.e. not initialising upper limit implicitly implies that the first document is matching the inner iterator)

Comment on lines 47 to 50
List<Operator> resultList = new ArrayList<>();
resultList.add(_filterOperator);

return resultList;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor)

Suggested change
List<Operator> resultList = new ArrayList<>();
resultList.add(_filterOperator);
return resultList;
return Collections.singletonList(_filterOperator);

bitmap3.add(docIds3);
MutableRoaringBitmap bitmap4 = new MutableRoaringBitmap();
bitmap4.add(docIds4);
OrDocIdIterator orDocIdIterator = new OrDocIdIterator(new BlockDocIdIterator[]{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to build this OrDocIdIterator? We don't want to test OrDocIdIterator in this test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use it to test NotDocIdIterator later in the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. What I'm saying is that we should directly set the docIds (similar to docIds1) instead of creating an OrDocIdIterator because that is more clear, and not mixing the scope of this test.

assertEquals(notDocIdIterator.advance(1), 2);
assertEquals(notDocIdIterator.next(), 3);
assertEquals(notDocIdIterator.next(), 5);
assertEquals(notDocIdIterator.advance(7), 8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(major) Per the contract, this should return 7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above advance() BlockDocIdIterator is a bit confusing. It says:

Returns the first matching document whose id is greater than or equal to the given target document id

I interpreted it as being ok if we return the first id greater than the target document id, which is what this test demonstrates. If that is not the case, should we reword the API spec to be explicit?

Fixed this issue, thanks for sharing the proposed solution!

assertEquals(notDocIdIterator.next(), 3);
assertEquals(notDocIdIterator.next(), 5);
assertEquals(notDocIdIterator.advance(7), 8);
assertEquals(notDocIdIterator.advance(13), 14);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(major) This should return 13, same for other places

return _nextDocId;
}

_nextDocId = targetDocId + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(major) This will skip the targetDocId

Comment on lines 67 to 73
if (targetDocId == Constants.EOF || targetDocId > _numDocs) {
return Constants.EOF;
}

if (targetDocId < _nextDocId) {
return _nextDocId;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks are redundant because by contract, targetDocId >= _nextDocId && targetDocId < _numDocs

}

private int findUpperLimitGreaterThanDocId(int currentDocId) {
int result = _childDocIdIterator.advance(currentDocId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(major) need to check _nextNonMatchingDocId < currentDocId before advancing the pointer

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments. Good job!

bitmap3.add(docIds3);
MutableRoaringBitmap bitmap4 = new MutableRoaringBitmap();
bitmap4.add(docIds4);
OrDocIdIterator orDocIdIterator = new OrDocIdIterator(new BlockDocIdIterator[]{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. What I'm saying is that we should directly set the docIds (similar to docIds1) instead of creating an OrDocIdIterator because that is more clear, and not mixing the scope of this test.


@Test
public void testWeirdPredicates() {
String query = "SELECT FIRST_INT_COL, SECOND_INT_COL FROM MyTable WHERE NOT FIRST_INT_COL = 5 LIMIT " + "50000";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Remove the unnecessary +, same for other places

Suggested change
String query = "SELECT FIRST_INT_COL, SECOND_INT_COL FROM MyTable WHERE NOT FIRST_INT_COL = 5 LIMIT " + "50000";
String query = "SELECT FIRST_INT_COL, SECOND_INT_COL FROM MyTable WHERE NOT FIRST_INT_COL = 5 LIMIT 50000";

return 4;
}
if (filterOperator instanceof NotFilterOperator) {
return getPriority((BaseFilterOperator) filterOperator.getChildOperators().get(0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We can add a getChildFilterOperator() in NotFilterOperator which can save an unnecessary array allocation

@ashishkf
Copy link

When I tried the change from the controller UI, I got the following exception:

pinot-broker-2 broker 2022/02/20 05:46:39.065 ERROR [PinotClientRequest] [jersey-server-managed-async-executor-2] Caught exception while processing POST request
pinot-broker-2 broker java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
pinot-broker-2 broker at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) ~[?:?]
pinot-broker-2 broker at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) ~[?:?]
pinot-broker-2 broker at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) ~[?:?]
pinot-broker-2 broker at java.util.Objects.checkIndex(Objects.java:372) ~[?:?]
pinot-broker-2 broker at java.util.ArrayList.get(ArrayList.java:459) ~[?:?]
pinot-broker-2 broker at org.apache.pinot.core.query.optimizer.filter.NumericalFilterOptimizer.optimize(NumericalFilterOptimizer.java:102) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.core.query.optimizer.QueryOptimizer.optimize(QueryOptimizer.java:81) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSQLRequest(BaseBrokerRequestHandler.java:381) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:197) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:102) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.api.resources.PinotClientRequest.processSqlQueryPost(PinotClientRequest.java:191) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
pinot-broker-2 broker at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
pinot-broker-2 broker at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
pinot-broker-2 broker at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:381) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:819) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29

]

@atris
Copy link
Contributor Author

atris commented Feb 21, 2022

When I tried the change from the controller UI, I got the following exception:

pinot-broker-2 broker 2022/02/20 05:46:39.065 ERROR [PinotClientRequest] [jersey-server-managed-async-executor-2] Caught exception while processing POST request
pinot-broker-2 broker java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
pinot-broker-2 broker at jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) ~[?:?]
pinot-broker-2 broker at jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) ~[?:?]
pinot-broker-2 broker at jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248) ~[?:?]
pinot-broker-2 broker at java.util.Objects.checkIndex(Objects.java:372) ~[?:?]
pinot-broker-2 broker at java.util.ArrayList.get(ArrayList.java:459) ~[?:?]
pinot-broker-2 broker at org.apache.pinot.core.query.optimizer.filter.NumericalFilterOptimizer.optimize(NumericalFilterOptimizer.java:102) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.core.query.optimizer.QueryOptimizer.optimize(QueryOptimizer.java:81) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSQLRequest(BaseBrokerRequestHandler.java:381) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:197) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:102) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.apache.pinot.broker.api.resources.PinotClientRequest.processSqlQueryPost(PinotClientRequest.java:191) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
pinot-broker-2 broker at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
pinot-broker-2 broker at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
pinot-broker-2 broker at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:381) ~[pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29]
pinot-broker-2 broker at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:819) [pinot-all-0.9.0-jar-with-dependencies.jar:0.9.0-02f03f4a81d44564e818ceb720ba25e2ab31bc29

]

Thanks for trying it out. Can you check with the latest commit?

@atris atris merged commit 3168194 into apache:master Feb 21, 2022
xiangfu0 pushed a commit to xiangfu0/pinot that referenced this pull request Feb 23, 2022
This PR implements the NOT operator.

NOT operator is implemented in a generic manner i.e. it can support most underlying operators.

The operator operates by "skipping" documents that are valid for the underlying predicate and iterating over all other valid docIDs.

Support for LIKE operator is not present yet due to Calcite's slightly different parsing of the LIKE operator.
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Mar 10, 2022
@Jackie-Jiang
Copy link
Contributor

@atris This is a great feature. Could you please add some release notes to the PR description which we can refer to when cutting the next release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants