Skip to content

Speedup GEODIST with fixedpoint_d2string as an optimized version of snprintf %.4f#11552

Merged
oranagra merged 7 commits intoredis:unstablefrom
filipecosta90:optimize.addReplyDoubleDistance
Dec 4, 2022
Merged

Speedup GEODIST with fixedpoint_d2string as an optimized version of snprintf %.4f#11552
oranagra merged 7 commits intoredis:unstablefrom
filipecosta90:optimize.addReplyDoubleDistance

Conversation

@filipecosta90
Copy link
Contributor

@filipecosta90 filipecosta90 commented Nov 28, 2022

GEODIST used snprintf("%.4f") for the reply using addReplyDoubleDistance, which was slow.
This PR optimizes it without breaking compatibility by following the approach of ll2string with some changes to match the use case of distance and precision.
I.e. we multiply it by 10000 format it as an integer, and then add a decimal point.
This can achieve about 35% increase in the achievable ops/sec.

Details

Out of the 36.44% geodistCommand CPU cycles we can pinpoint 26.02% to addReplyDoubleDistance:

image

To functionally test it:

tclsh tests/test_helper.tcl --single unit/geo

To benchmark:
Use the following RDB with 60M datapoints on the GEO key. https://s3.us-east-2.amazonaws.com/redis.benchmarks.spec/datasets/geopoint/dump.rdb

memtier_benchmark --pipeline 10 --test-time 60 --command "GEODIST key 1 2" --hide-histogram

unstable branch ( 155acef )

ALL STATS
===================================================================================================
Type          Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
---------------------------------------------------------------------------------------------------
Geodists    642577.89         3.11090         3.05500         6.07900         6.33500     33258.43 
Totals      642577.89         3.11090         3.05500         6.07900         6.33500     33258.43 

this PR (c7bb268)

Approximately 35% increase in the achievable ops/sec.

ALL STATS
===================================================================================================
Type          Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
---------------------------------------------------------------------------------------------------
Geodists    868741.45         2.30079         2.25500         4.51100         4.67100     44964.16 
Totals      868741.45         2.30079         2.25500         4.51100         4.67100     44964.16 

@filipecosta90 filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Nov 28, 2022
@filipecosta90
Copy link
Contributor Author

@oranagra all tests are green after #11553. Can you share what you think about this PR?

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a first glance, it looks scary, too much low level code to replace something simple.
however, reading the explanation of the decimal point shift i think it's neat.

maybe if we can move that function to utils.c and generalize it a bit, then i'd feel better.
i.e. int fixedpoint_d2string(char *buf, size_t len, double value, int fractional_digits)
then it'll be a generic optimized algorithm like our ull2string, and not some special geo hack that's going too deep.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another (last) round of comments.
only now i actually reviewed the implementation code, and not just the general concept.

@filipecosta90
Copy link
Contributor Author

@oranagra followed the recommendations and updated the numbers on the PR description.
TLDR it's a 35% increase in the achievable ops/sec.

@oranagra oranagra changed the title Introduced geodist2string as an optimized version of snprintf %.4f Speedup GEODIST with fixedpoint_d2string as an optimized version of snprintf %.4f Dec 4, 2022
@oranagra oranagra merged commit 61c85a2 into redis:unstable Dec 4, 2022
@oranagra oranagra mentioned this pull request Dec 11, 2022
oranagra added a commit that referenced this pull request Dec 12, 2022
…nprintf %.4f (#11552)

GEODIST used snprintf("%.4f") for the reply using addReplyDoubleDistance,
which was slow. This PR optimizes it without breaking compatibility by following
the approach of ll2string with some changes to match the use case of distance
and precision. I.e. we multiply it by 10000 format it as an integer, and then add
a decimal point. This can achieve about 35% increase in the achievable ops/sec.

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit 61c85a2)
oranagra pushed a commit that referenced this pull request Dec 15, 2022
…1631)

Fixes a regression introduced by #11552 in 7.0.6.
it causes replies in the GEO commands to contain garbage when the
result is a very small distance (less than 1)
Includes test to confirm indeed with junk in buffer now we properly reply
oranagra pushed a commit that referenced this pull request Dec 16, 2022
…1631)

Fixes a regression introduced by #11552 in 7.0.6.
it causes replies in the GEO commands to contain garbage when the
result is a very small distance (less than 1)
Includes test to confirm indeed with junk in buffer now we properly reply

(cherry picked from commit d7b4c91)
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
…nprintf %.4f (redis#11552)

GEODIST used snprintf("%.4f") for the reply using addReplyDoubleDistance,
which was slow. This PR optimizes it without breaking compatibility by following
the approach of ll2string with some changes to match the use case of distance
and precision. I.e. we multiply it by 10000 format it as an integer, and then add
a decimal point. This can achieve about 35% increase in the achievable ops/sec. 

Co-authored-by: Oran Agra <[email protected]>
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…nprintf %.4f (redis#11552)

GEODIST used snprintf("%.4f") for the reply using addReplyDoubleDistance,
which was slow. This PR optimizes it without breaking compatibility by following
the approach of ll2string with some changes to match the use case of distance
and precision. I.e. we multiply it by 10000 format it as an integer, and then add
a decimal point. This can achieve about 35% increase in the achievable ops/sec. 

Co-authored-by: Oran Agra <[email protected]>
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…dis#11631)

Fixes a regression introduced by redis#11552 in 7.0.6.
it causes replies in the GEO commands to contain garbage when the
result is a very small distance (less than 1)
Includes test to confirm indeed with junk in buffer now we properly reply
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

GEO commands should use the addReplyDouble API and not the addReplyDoubleDistance which is more expensive and lacks precision

2 participants