-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Support st_contains using H3 index #8498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8498 +/- ##
============================================
+ Coverage 63.60% 68.79% +5.18%
- Complexity 4322 4326 +4
============================================
Files 1648 1694 +46
Lines 86917 88960 +2043
Branches 13265 13497 +232
============================================
+ Hits 55281 61197 +5916
+ Misses 27595 23431 -4164
- Partials 4041 4332 +291
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/H3Utils.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
richardstartin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we add some tests of the corner cases, such as on the line/vertex of the polygon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ST_Contains seems doesn't include the point or line lying in polygon boundary. https://postgis.net/docs/ST_Contains.html
but let me try create a test with points very close to the boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just approximation, but it doesn't return same result of ST_Contains, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think
expressionFilterOperator.getNextBlock().getBlockDocIdSet().iterator();
MutableRoaringBitmap result = docIdIterator.applyAnd(partialMatchDocIds);
will do the final check on the potential match set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, it could be further optimized to skip the cells fully contained in the polygon. But we can leave it for futture (add a TODO) if you find it too difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. will create a TODO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue to track here: #8547
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I did the optimization. but still will use the issue to track if H3 support different polyfill natively.
|
High-level comment is that using H3 index can speed up query evaluation, but we need to ensure it does not change query results. H3's Polyfill returns a set of H3 for approximation, but we need one more round of evaluation to check the borderlines. |
2717ee1 to
d8f86c4
Compare
yupeng9
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for adding the tests and glad to see the improved index handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throw some meaningful error message for better debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. add error message for the assert.
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/operator/filter/H3InclusionIndexFilterOperator.java
Outdated
Show resolved
Hide resolved
dc7b903 to
15e03bb
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) No need to check st_within and st_contains because the function name is already canonicalized (we keep 2 function names for st_distance for backward compatibility). We may directly use the value equals check into the canApplyH3IndexForInclusionCheck() instead of using a set for 2 values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) columnName will never be null
| return columnName != null && _indexSegment.getDataSource(columnName).getH3Index() != null; | |
| return _indexSegment.getDataSource(columnName).getH3Index() != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update on line 183 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) We don't usually put final for local variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use iterator (LongIterator.nextLong()) instead to avoid the unnecessary boxing/unboxing, same for other places when iterating over LongSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor)
| .collect(Collectors.toList()), ImmutableList.of(), resolution)); | |
| .collect(Collectors.toList()), Collections.emptyList(), resolution)); |
f9bba02 to
3dcf59b
Compare
3dcf59b to
094d2f1
Compare
This is a follow up PR for : #7252 since that one seems doesn't get update for long time.
The idea is to convert input geometry into a list of h3 cells by using polyfill. But h3 polyfill only fills with the hexagons whose centers are contained by the geometry.
so creating a method
coverGeometryInH3to return the set of H3 cells at the specified resolution which completely cover the input shape.Instructions:
feature(**) Use
release-noteslabel for scenarios like: