Skip to content

GEOSEARCH BYBOX: Simplified haversine distance formula when longitude diff is 0#11579

Merged
oranagra merged 3 commits intoredis:unstablefrom
filipecosta90:optimize.geodist.itt2
Dec 5, 2022
Merged

GEOSEARCH BYBOX: Simplified haversine distance formula when longitude diff is 0#11579
oranagra merged 3 commits intoredis:unstablefrom
filipecosta90:optimize.geodist.itt2

Conversation

@filipecosta90
Copy link
Contributor

@filipecosta90 filipecosta90 commented Dec 5, 2022

This is take 2 of GEOSEARCH BYBOX optimizations based on haversine distance formula when longitude diff is 0. The first one was in #11535 .

  • Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
  • arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2].
  • Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

To test it we can simply focus on the geo.tcl

tclsh tests/test_helper.tcl --single unit/geo

Notice that this logic is covered/tested in the "GEOSEARCH box edges fuzzy test" scenario.

On the sample dataset with 60M datapoints, we've measured 55% increase in the achievable ops/sec.

We can benchmark this improvement using the following dataset with 60M datapoints on the GEO key. https://s3.us-east-2.amazonaws.com/redis.benchmarks.spec/datasets/geopoint/dump.rdb

and command:

memtier_benchmark -c 1 -t 1 --pipeline 1 --test-time 60 --command "GEOSEARCH key FROMLONLAT 7 55 BYBOX 200 200 KM" --hide-histogram

unstable ( 61c85a2 )

ALL STATS
=====================================================================================================
Type            Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
-----------------------------------------------------------------------------------------------------
Geosearchs        32.90        30.38723        30.20700        34.04700        35.32700      4922.83 
Totals            32.90        30.38723        30.20700        34.04700        35.32700      4922.83 

This PR

ALL STATS
=====================================================================================================
Type            Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
-----------------------------------------------------------------------------------------------------
Geosearchs        51.20        19.51808        19.45500        22.01500        23.03900      7662.12 
Totals            51.20        19.51808        19.45500        22.01500        23.03900      7662.12 

@filipecosta90 filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Dec 5, 2022
@oranagra oranagra merged commit e48ac07 into redis:unstable Dec 5, 2022
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Dec 5, 2022
@oranagra oranagra mentioned this pull request Dec 11, 2022
oranagra pushed a commit that referenced this pull request Dec 12, 2022
… diff is 0 (#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in #11535 .

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2].
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.

(cherry picked from commit e48ac07)
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
… diff is 0 (redis#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in redis#11535 . 

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2]. 
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
… diff is 0 (redis#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in redis#11535 . 

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2]. 
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
@judeng
Copy link
Contributor

judeng commented Jun 27, 2024

@filipecosta90 Hi, I run the same benchmark with Redis7.4, the ops is only 25% of yours. I suspect that my cpu or gcc maybe too old, could you give some additional information about your hard/soft platform?
Here is information of mine:

cpu:Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz
gcc:gcc version 5.4.0 (GCC)
kernel:3.10.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants