Skip to content

Conversation

@WangCHX
Copy link
Contributor

@WangCHX WangCHX commented Apr 30, 2022

Instructions:

  1. The PR has to be tagged with at least one of the following labels (*):
    1. enhancement

@codecov-commenter
Copy link

codecov-commenter commented Apr 30, 2022

Codecov Report

Merging #8620 (59f1b47) into master (b5690a7) will decrease coverage by 41.10%.
The diff coverage is 8.16%.

@@              Coverage Diff              @@
##             master    #8620       +/-   ##
=============================================
- Coverage     70.47%   29.36%   -41.11%     
=============================================
  Files          1703     1692       -11     
  Lines         89569    89238      -331     
  Branches      13564    13532       -32     
=============================================
- Hits          63123    26204    -36919     
- Misses        21990    60602    +38612     
+ Partials       4456     2432     -2024     
Flag Coverage Δ
integration1 27.30% <8.16%> (+0.12%) ⬆️
integration2 25.48% <8.16%> (-0.03%) ⬇️
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...perator/filter/H3InclusionIndexFilterOperator.java 0.00% <0.00%> (-83.93%) ⬇️
...ot/core/operator/filter/H3IndexFilterOperator.java 0.00% <0.00%> (-82.98%) ⬇️
...ava/org/apache/pinot/core/util/GeoColumnUtils.java 0.00% <0.00%> (ø)
.../org/apache/pinot/segment/local/utils/H3Utils.java 0.00% <0.00%> (-40.39%) ⬇️
...ava/org/apache/pinot/core/plan/FilterPlanNode.java 55.03% <17.39%> (-27.51%) ⬇️
...in/java/org/apache/pinot/spi/utils/StringUtil.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/spi/utils/BooleanUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1173 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b5690a7...59f1b47. Read the comment docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comments

also, i think this handles first level func only

Copy link
Contributor Author

@WangCHX WangCHX May 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. added comments and also handled multiple level.
just was not sure multiple level is a common thing.
like

toGeometry(toSphericalGeography(toGeometry(%s)))

looks like unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it's possible for other nested functions such as some geometry manipulation.

@yupeng9
Copy link
Contributor

yupeng9 commented May 3, 2022

plz take a look at the tests

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct fix. The filter operator relies on the H3 index on the column. If the column is within an expression, the result will be wrong. E.g. ST_DISTANCE(geoCol, 'point') is not the same as ST_DISTANCE(myFunc(geoCol), 'point'), and we cannot use the same index because the index is not applied to the myFunc(geoCol)

@WangCHX
Copy link
Contributor Author

WangCHX commented May 3, 2022

This is not the correct fix. The filter operator relies on the H3 index on the column. If the column is within an expression, the result will be wrong. E.g. ST_DISTANCE(geoCol, 'point') is not the same as ST_DISTANCE(myFunc(geoCol), 'point'), and we cannot use the same index because the index is not applied to the myFunc(geoCol)

is it ok for me to add constraints that the function only can be toSphericalGeography and toGeometry right now?

@Jackie-Jiang
Copy link
Contributor

This is not the correct fix. The filter operator relies on the H3 index on the column. If the column is within an expression, the result will be wrong. E.g. ST_DISTANCE(geoCol, 'point') is not the same as ST_DISTANCE(myFunc(geoCol), 'point'), and we cannot use the same index because the index is not applied to the myFunc(geoCol)

is it ok for me to add constraints that the function only can be toSphericalGeography and toGeometry right now?

Why do you need to do toSphericalGeography or toGeometry on a column with H3 index? IIUC, H3 index implies the underlying points are geography, and ST_DISTANCE(toGeometry(geoCol), 'point') should not use H3 index.

@WangCHX
Copy link
Contributor Author

WangCHX commented May 3, 2022

This is not the correct fix. The filter operator relies on the H3 index on the column. If the column is within an expression, the result will be wrong. E.g. ST_DISTANCE(geoCol, 'point') is not the same as ST_DISTANCE(myFunc(geoCol), 'point'), and we cannot use the same index because the index is not applied to the myFunc(geoCol)

is it ok for me to add constraints that the function only can be toSphericalGeography and toGeometry right now?

Why do you need to do toSphericalGeography or toGeometry on a column with H3 index? IIUC, H3 index implies the underlying points are geography, and ST_DISTANCE(toGeometry(geoCol), 'point') should not use H3 index.

I think the underlying is a point. H3 index only take coordinate. so it can be applied to both geography and geometry. Also ST_Contains only can be applied to Geometry. If i want to use ST_DISTANCE and ST_Contains at the same time, seems I need to create two columns.

@Jackie-Jiang
Copy link
Contributor

@yupeng9 Can we apply ST_Contains to the geography objects? I think it should work on both geometry and geography

@yupeng9
Copy link
Contributor

yupeng9 commented May 4, 2022

@Jackie-Jiang
Copy link
Contributor

@yupeng9
Copy link
Contributor

yupeng9 commented May 4, 2022

current implementation of ST_Contains is for Geometry only https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/geospatial/transform/function/StContainsFunction.java

@yupeng9 Understood, but why does it not apply to the geography objects?

Different implementation, and I didn't scope it in the inital impl

@Jackie-Jiang
Copy link
Contributor

current implementation of ST_Contains is for Geometry only https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/geospatial/transform/function/StContainsFunction.java

@yupeng9 Understood, but why does it not apply to the geography objects?

Different implementation, and I didn't scope it in the inital impl

@yupeng9 Is H3 index always based on the geography objects? If so, we cannot use H3 index for ST_Contains because that will give wrong result (mixing geometry calculation with geography calculation).
Another option is to allow using geometry contains on geography objects because they should give close estimation on small areas. It won't work for large areas though.

@yupeng9
Copy link
Contributor

yupeng9 commented May 4, 2022

current implementation of ST_Contains is for Geometry only https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/geospatial/transform/function/StContainsFunction.java

@yupeng9 Understood, but why does it not apply to the geography objects?

Different implementation, and I didn't scope it in the inital impl

@yupeng9 Is H3 index always based on the geography objects? If so, we cannot use H3 index for ST_Contains because that will give wrong result (mixing geometry calculation with geography calculation). Another option is to allow using geometry contains on geography objects because they should give close estimation on small areas. It won't work for large areas though.

Yes, H3 is always based on geography. And yes, we can do approximation for small area.

@Jackie-Jiang
Copy link
Contributor

Then my suggestion is to remove the geometry check in the StContainsFunction, and we should document the behavior of using geometry contains for geography objects, which can give good estimation on small areas. We should also add a TODO to support accurate contains on geography objects in the future.

@WangCHX WangCHX closed this May 5, 2022
@WangCHX WangCHX reopened this May 5, 2022
@WangCHX
Copy link
Contributor Author

WangCHX commented May 5, 2022

Then my suggestion is to remove the geometry check in the StContainsFunction, and we should document the behavior of using geometry contains for geography objects, which can give good estimation on small areas. We should also add a TODO to support accurate contains on geography objects in the future.

thanks Jackie and Yupeng. just curious why it is not accurate? My use case is checking all the coordinates inside a city polygon (like SF). is it small area or big area for a city like SF?

@WangCHX WangCHX changed the title Handle nested conversion of geo column Remove geometry check for st_contain and st_within May 5, 2022
@Jackie-Jiang
Copy link
Contributor

Then my suggestion is to remove the geometry check in the StContainsFunction, and we should document the behavior of using geometry contains for geography objects, which can give good estimation on small areas. We should also add a TODO to support accurate contains on geography objects in the future.

thanks Jackie and Yupeng. just curious why it is not accurate? My use case is checking all the coordinates inside a city polygon (like SF). is it small area or big area for a city like SF?

IMO a single city on the earth sphere can be considered small area.
I can see one exception area around longitude of -180/180, where geography and geometry will give very different result (in geography, -179.9 to 179.9 is a short distance, but in geometry that is a whole circle of the sphere). In other areas treating geography as geometry should give close approximation.
@yupeng9 What do you think if we allow ST_Contains and ST_Within on geography but use the geometry algorithm?

@yupeng9
Copy link
Contributor

yupeng9 commented May 5, 2022

@Jackie-Jiang is spot on. There are corner cases around the boundary, and thats the risk you have face with the approximation. Alternatively, you can explore the ST_Contains implementation for geography and add the proper impl to Pinot

@Jackie-Jiang
Copy link
Contributor

I think we can merge this and update our documentation to warn about the behavior so that H3 index can be used for now. The geography support can be added in a separate PR. @yupeng9 wdyt?

@yupeng9
Copy link
Contributor

yupeng9 commented May 5, 2022

I'm fine with it. @WangCHX Please highlight this and put a warning in the user doc

@Jackie-Jiang Jackie-Jiang merged commit 60e2706 into apache:master May 5, 2022
@WangCHX WangCHX deleted the cwang/nest_st branch May 6, 2022 04:37
@WangCHX
Copy link
Contributor Author

WangCHX commented May 6, 2022

I'm fine with it. @WangCHX Please highlight this and put a warning in the user doc

sure. will add that.

@WangCHX
Copy link
Contributor Author

WangCHX commented May 9, 2022

doc: pinot-contrib/pinot-docs#92

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants