GEOSEARCH BYBOX: Reduce wasteful computation on geohashGetDistanceIfInRectangle and geohashGetDistance #11535
Merged
oranagra merged 5 commits into redis:unstable on Nov 24, 2022
oranagra
reviewed
Nov 24, 2022
Co-authored-by: Oran Agra
zeekling
approved these changes
Nov 24, 2022
oranagra
approved these changes
Nov 24, 2022
Member
@filipecosta90 if it's simple, I'm curious to know the impact of the reordering we did. Could you update the numbers in the top comment?
Contributor
Author
@oranagra I've updated the PR comment with the data.
oranagra
pushed a commit
that referenced
this pull request
Dec 5, 2022
… diff is 0 (#11579)
This is take 2 of the `GEOSEARCH BYBOX` optimizations, based on the haversine distance formula when the longitude diff is 0. The first one was in #11535.
- Given a longitude diff of 0, the asin(sqrt(a)) term of the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equals x when x ∈ [−π/2, π/2].
- Given that latitude is within [−π/2, π/2], we can simplify arcsin(sin(x)) to x.
On the sample dataset with 60M datapoints, we've measured a 55% increase in the achievable ops/sec.
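The simplification described in this commit message can be written out step by step. The notation below is ours rather than the commit's: φ is latitude and λ is longitude, both in radians, R is the Earth radius, and Δφ, Δλ are the differences between the two points.

$$
\begin{aligned}
a &= \sin^2\!\left(\tfrac{\Delta\varphi}{2}\right) + \cos\varphi_1\cos\varphi_2\,\sin^2\!\left(\tfrac{\Delta\lambda}{2}\right), \qquad d = 2R\,\arcsin\sqrt{a} \\
\Delta\lambda = 0 \;&\Rightarrow\; \sqrt{a} = \left|\sin\tfrac{\Delta\varphi}{2}\right| = \sin\tfrac{|\Delta\varphi|}{2} \\
d &= 2R\,\arcsin\!\left(\sin\tfrac{|\Delta\varphi|}{2}\right) = R\,|\Delta\varphi|
\end{aligned}
$$

The last step is valid because both latitudes lie in [−π/2, π/2], so |Δφ|/2 lies in [0, π/2], where arcsin(sin(x)) = x. The distance along a meridian therefore needs no trigonometric calls at all.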
Merged
oranagra
added a commit
that referenced
this pull request
Dec 12, 2022
…InRectangle and geohashGetDistance (#11535)
Optimize geohashGetDistanceIfInRectangle when there are many misses. It calls geohashGetDistance three times; the first two calls only produce intermediate results, and this PR focuses on optimizing those two.
1. Reduce expensive computation in the intermediate geohashGetDistance call when both points share the same longitude.
2. Avoid the expensive lon_distance calculation if the lat_distance check already fails.
Co-authored-by: Oran Agra
(cherry picked from commit ae1de54)
oranagra
pushed a commit
that referenced
this pull request
Dec 12, 2022
… diff is 0 (#11579) (cherry picked from commit e48ac07)
madolson
pushed a commit
to madolson/redis
that referenced
this pull request
Apr 19, 2023
…InRectangle and geohashGetDistance (redis#11535)
madolson
pushed a commit
to madolson/redis
that referenced
this pull request
Apr 19, 2023
… diff is 0 (redis#11579)
enjoy-binbin
pushed a commit
to enjoy-binbin/redis
that referenced
this pull request
Jul 31, 2023
…InRectangle and geohashGetDistance (redis#11535)
enjoy-binbin
pushed a commit
to enjoy-binbin/redis
that referenced
this pull request
Jul 31, 2023
… diff is 0 (redis#11579)
If we run the following benchmark and profile it:

We see that 54.78% of CPU cycles are spent in geohashGetDistanceIfInRectangle.
Within it we call geohashGetDistance three times. The first two calls only produce intermediate results.
This PR focuses on optimizing those two intermediate results.
Results
On a single-client benchmark with pipeline 1, average latency (including RTT) drops from 93.59895 ms to 73.04606 ms (approximately a 22% latency drop).
Furthermore, the command latency distribution is now more stable (check avg, p50, p99 and p999 for the last result).
Baseline from unstable branch (3b462ce)
After 1: Reduce expensive computation on intermediate geohashGetDistance with same longitude (48895ee)
This PR at b27590a
After 2: Avoid expensive lon_distance calculation if lat_distance fails beforehand (6e2ecc7)