Skip to content

GEOSEARCH BYBOX: Reduce wastefull computation on geohashGetDistanceIfInRectangle and geohashGetDistance#11535

Merged
oranagra merged 5 commits intoredis:unstablefrom
filipecosta90:optimize.geohashGetDistance
Nov 24, 2022
Merged

GEOSEARCH BYBOX: Reduce wastefull computation on geohashGetDistanceIfInRectangle and geohashGetDistance#11535
oranagra merged 5 commits intoredis:unstablefrom
filipecosta90:optimize.geohashGetDistance

Conversation

@filipecosta90
Copy link
Contributor

@filipecosta90 filipecosta90 commented Nov 24, 2022

If we run the following benchmark

memtier_benchmark -c 1 -t 1 --pipeline 1 --test-time 60 --command "GEOSEARCH key FROMLONLAT 7 55 BYBOX 200 200 KM" --hide-histogram

and profile it:
image

We see that 54.78% of cpu cycles are from geohashGetDistanceIfInRectangle.
Within it we're calling 3x geohashGetDistance. The first 2 times we call them to produce intermediate results.
This PR focus on optimizing for those 2 intermediate results.

  • 1 Reduce expensive computation on intermediate geohashGetDistance with same long ( 48895ee )
  • 2 Avoid expensive lon_distance calculation if lat_distance fails beforehand ( 6e2ecc7 )

Results

On pipeline 1, single client benchmark, we move from average latency (including RTT) of 93.59895 ms to 73.04606 ms ( approximately 22% latency drop ).

Furthermore we can see that the command latency distribution is now more stable ( check avg, p50, p99 and p999 for last result )

baseline from unstable branch ( 3b462ce )

ALL STATS
=====================================================================================================
Type            Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
-----------------------------------------------------------------------------------------------------
Geosearchs        10.67        93.59895        97.27900        97.79100        97.79100      1596.95 
Totals            10.67        93.59895        97.27900        97.79100        97.79100      1596.95 

After 1 Reduce expensive computation on intermediate geohashGetDistance with same long ( 48895ee )

ALL STATS
=====================================================================================================
Type            Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
-----------------------------------------------------------------------------------------------------
Geosearchs        11.43        87.40721        87.55100        90.62300        91.13500      1709.95 
Totals            11.43        87.40721        87.55100        90.62300        91.13500      1709.95 

This PR at b27590a

ALL STATS
=====================================================================================================
Type            Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
-----------------------------------------------------------------------------------------------------
Geosearchs        12.67        79.07036        79.35900        79.35900        79.87100      1895.72 
Totals            12.67        79.07036        79.35900        79.35900        79.87100      1895.72 

After 2 Avoid expensive lon_distance calculation if lat_distance fails beforehand ( 6e2ecc7 )

ALL STATS
=====================================================================================================
Type            Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
-----------------------------------------------------------------------------------------------------
Geosearchs        13.67        73.04606        73.21500        73.72700        74.23900      2046.08 
Totals            13.67        73.04606        73.21500        73.72700        74.23900      2046.08 

@filipecosta90 filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Nov 24, 2022
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Nov 24, 2022
@oranagra oranagra merged commit ae1de54 into redis:unstable Nov 24, 2022
@oranagra
Copy link
Member

@filipecosta90 if it simple, i'm curious to know the impact of the reordering we did and update the numbers in the top comment

@filipecosta90
Copy link
Contributor Author

@filipecosta90 if it simple, i'm curious to know the impact of the reordering we did and update the numbers in the top comment

@oranagra I've updated the PR comment with the data.
Reorerding moved avg. latency from 79.87100 ms to 74.23900 ms =) Indeed a good improvement!!

oranagra pushed a commit that referenced this pull request Dec 5, 2022
… diff is 0 (#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in #11535 . 

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2]. 
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
@oranagra oranagra mentioned this pull request Dec 11, 2022
oranagra added a commit that referenced this pull request Dec 12, 2022
…InRectangle and geohashGetDistance (#11535)

Optimize geohashGetDistanceIfInRectangle when there are many misses.
It calls 3x geohashGetDistance. The first 2 times we call them to produce intermediate results.
This PR focus on optimizing for those 2 intermediate results.

1 Reduce expensive computation on intermediate geohashGetDistance with same long
2 Avoid expensive lon_distance calculation if lat_distance fails beforehand

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit ae1de54)
oranagra pushed a commit that referenced this pull request Dec 12, 2022
… diff is 0 (#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in #11535 .

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2].
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.

(cherry picked from commit e48ac07)
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
…InRectangle and geohashGetDistance (redis#11535)

Optimize geohashGetDistanceIfInRectangle when there are many misses.
It calls 3x geohashGetDistance. The first 2 times we call them to produce intermediate results.
This PR focus on optimizing for those 2 intermediate results.

1 Reduce expensive computation on intermediate geohashGetDistance with same long
2 Avoid expensive lon_distance calculation if lat_distance fails beforehand

Co-authored-by: Oran Agra <[email protected]>
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
… diff is 0 (redis#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in redis#11535 . 

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2]. 
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…InRectangle and geohashGetDistance (redis#11535)

Optimize geohashGetDistanceIfInRectangle when there are many misses.
It calls 3x geohashGetDistance. The first 2 times we call them to produce intermediate results.
This PR focus on optimizing for those 2 intermediate results.

1 Reduce expensive computation on intermediate geohashGetDistance with same long
2 Avoid expensive lon_distance calculation if lat_distance fails beforehand

Co-authored-by: Oran Agra <[email protected]>
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
… diff is 0 (redis#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in redis#11535 . 

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2]. 
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants