Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve GETRANGE command behavior #12272

Merged
merged 5 commits into from
Aug 20, 2024
Merged

Improve GETRANGE command behavior #12272

merged 5 commits into from
Aug 20, 2024

Conversation

linzihao1999
Copy link
Contributor

@linzihao1999 linzihao1999 commented Jun 6, 2023

Notice: this change was reverted in 0aeb86d
we concluded we can't accept it as we should avoid breaking changes for non-critical bug fixes.


Improve GETRANGE command behavior

A breaking change. Fixed the issue about GETRANGE and SUBSTR command return unexpected result caused by the start and end out of definition range of string.


break change

Before this PR, when negative end was out of range (i.e., end < -strlen), we would fix it to 0 to get the substring, which also resulted in the first character still being returned for this kind of out of range.
After this PR, we ensure that GETRANGE returns an empty bulk when the negative end index is out of range.

Closes #11738


After change (all return empty string)

127.0.0.1:6379> get r
"0123456789"
127.0.0.1:6379> GETRANGE r 0 -100
""
127.0.0.1:6379> GETRANGE r 1 -100
""
127.0.0.1:6379> GETRANGE r -1 -100
""
127.0.0.1:6379> GETRANGE r -100 -99
""
127.0.0.1:6379> GETRANGE r -100 -100
""
127.0.0.1:6379> GETRANGE r -100 -101
""

Before

127.0.0.1:6379> get r
"0123456789"
127.0.0.1:6379> GETRANGE r 0 -100
"0"                                                          <- change
127.0.0.1:6379> GETRANGE r 1 -100
""
127.0.0.1:6379> GETRANGE r -1 -100
""
127.0.0.1:6379> GETRANGE r -100 -99
"0"                                                          <- change
127.0.0.1:6379> GETRANGE r -100 -100
"0"                                                          <- change
127.0.0.1:6379> GETRANGE r -100 -101
""

@sundb
Copy link
Collaborator

sundb commented Jun 7, 2023

I think i don't like The end can be empty, that means the default end will take max length of key..
Since we already have the getrange test <start> -1, adding it would be more expensive to understand.

@linzihao1999
Copy link
Contributor Author

I think i don't like The end can be empty, that means the default end will take max length of key.. Since we already have the getrange test <start> -1, adding it would be more expensive to understand.

It use to solve a problem about Change the end inclusive to exclusive.
For example, when the end exclusive, getrange test <start> -1 will not take the last one.

127.0.0.1:6379> get test
"0123456789"
127.0.0.1:6379> getrange test -3 -1
"78"

If we use sentence like follow, I think it may ambiguous, so use empty end to instead of ''getrange test <start> -1.

127.0.0.1:6379> getrange test -3 0
"789"

Maybe there is another way to solve it, please give any ideas.

linzihao1999 added a commit to linzihao1999/redis that referenced this pull request Jun 11, 2023
1. Change the end inclusive to exclusive.
2. If it exceeds the valid index range, the valid index range will be extended to all integers, and elements smaller than or larger than the valid range will be considered empty elements.
3. The end can be empty, that means the default end will take max length of key.
linzihao1999 added a commit to linzihao1999/redis that referenced this pull request Jun 11, 2023
1. Change the end inclusive to exclusive.
2. If it exceeds the valid index range, the valid index range will be extended to all integers, and elements smaller than or larger than the valid range will be considered empty elements.
3. The end can be empty, that means the default end will take max length of key.
@sundb
Copy link
Collaborator

sundb commented Oct 8, 2023

IMHO, I'm not a fan of 1 and 3.
Isn't our goal to return empty string when start > end?

As mentioned in this comment: #11738 (comment)
We just let end to be non-inclusive.
What about the following patch:

diff --git a/src/t_string.c b/src/t_string.c
index 2bce3acc..2ef6e339 100644
--- a/src/t_string.c
+++ b/src/t_string.c
@@ -527,8 +527,11 @@ void getrangeCommand(client *c) {
     }
     if (start < 0) start = strlen+start;
     if (end < 0) end = strlen+end;
+    if (end < 0) {
+        addReply(c,shared.emptybulk);
+        return;
+    }
     if (start < 0) start = 0;
-    if (end < 0) end = 0;
     if ((unsigned long long)end >= strlen) end = strlen-1;

@linzihao1999
Copy link
Contributor Author

Yes, your idea is more simple. Let me make a quick change.

@mgravell
Copy link

mgravell commented Oct 8, 2023

Outright changing the meaning of a parameter (1) is a complete no-go. That's a hard break. If there is a valid reason, a valid suggestion might be to allow a syntax to optionally change it, i.e. like the ( in zrange, but honestly: I don't see the point of that in the case of indices (as opposed to scores): just request one fewer

Change the end to non-inclusive.
@sundb
Copy link
Collaborator

sundb commented Oct 8, 2023

@linzihao1999 I think you misunderstood non-inclusive, what I meant was that it shouldn't be set to 0 when end < 0, and like start, it can't be strlen when start >= strlen.

Another thought:
When entering the last if, the interval of start is [0, ∞] and the interval of end is [-∞, strlen-1].
So all we have to do is add in whether one of them is out of range, and we're done.

diff --git a/src/t_string.c b/src/t_string.c
index 2bce3acc8..8b887a8e9 100644
--- a/src/t_string.c
+++ b/src/t_string.c
@@ -520,20 +520,14 @@ void getrangeCommand(client *c) {
         strlen = sdslen(str);
     }
 
-    /* Convert negative indexes */
-    if (start < 0 && end < 0 && start > end) {
-        addReply(c,shared.emptybulk);
-        return;
-    }
     if (start < 0) start = strlen+start;
     if (end < 0) end = strlen+end;
     if (start < 0) start = 0;
-    if (end < 0) end = 0;
     if ((unsigned long long)end >= strlen) end = strlen-1;
 
     /* Precondition: end >= 0 && end < strlen, so the only condition where
      * nothing can be returned is: start > end. */
-    if (start > end || strlen == 0) {
+    if ((unsigned long long)start >= strlen || end < 0 || start > end || strlen == 0) {
         addReply(c,shared.emptybulk);
     } else {
         addReplyBulkCBuffer(c,(char*)str+start,end-start+1);

@linzihao1999
Copy link
Contributor Author

@sundb Thanks for patient guidance!
I make a new change, this should fix the bug and not a hard break.

@sundb sundb added the breaking-change This change can potentially break existing application label Nov 16, 2023
@sundb
Copy link
Collaborator

sundb commented Feb 1, 2024

@linzihao1999 please also update the top comment with a brief description of break changes.
@oranagra It's ok for merge.

@linzihao1999
Copy link
Contributor Author

Hi @sundb top comment updated

@oranagra
Copy link
Member

oranagra commented Feb 6, 2024

@linzihao1999 thanks.
AFAICT i think the current version looks good.

i think it was me that suggested that all the headache with this command comes from the fact end is inclusive, and IIRC my suggestion was to first convert the inputs in the beginning of the function so that end is non-inclusive, and then apply a set of corrections (for negative numbers and so on), so that the rest of the code will behave more logically.

in any case, we can't completely change the meaning of the command arguments, if we wanna do that we need to deprecate the old command and invent a new one.

looking at the recent revision in the code, i understand that you didn't change the meaning of the arguments, and the behavior change here is minimal. i.e. it'll only change behavior when the user used a very specific unlikely scenario when both start and end are <=0 and end >= start.
is that right? if it is (or isn't) please edit the top comment to explicitly state the conditions in which the result is different than before. (to supplement the example you gave).

i also see the top comment still mention something about changing start and end to non-inclusive. is that a relic from a previous revision? it's confusing...
maybe you should split the top comment into two parts, one meant for users and describing the interface changes, and another about the internal mechanics of the code you wrote (i.e. be very clear that this non-inclusive (if at all still relevant) is about the variables in the function, and not the command arguments))

@oranagra
Copy link
Member

we discussed this in a core-team meeting, we do agree it is worth fixing. (as a breaking change to be released in 8.0).

@linzihao1999 please ack on my previous message and see if you can make the top comment clearer.

@sundb
Copy link
Collaborator

sundb commented Mar 19, 2024

@oranagra the comment has been updated.

@oranagra
Copy link
Member

is the behavior change only when end < -strlen? or like i wrote above when start <=0 && end <=0 && end >= start? (don't remember how i got to that conclusion).
just trying to make the behavior change statement clear and accurate...

@sundb
Copy link
Collaborator

sundb commented Mar 19, 2024

is the behavior change only when end < -strlen?

all the changes were made for it and judge if it was out of range before set end to 0.
the other changes are just changing the position.

@CLAassistant
Copy link

CLAassistant commented Mar 24, 2024

CLA assistant check
All committers have signed the CLA.

@sundb sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Jun 7, 2024
@sundb sundb merged commit 6ceadfb into redis:unstable Aug 20, 2024
14 checks passed
sundb added a commit to sundb/redis that referenced this pull request Aug 29, 2024
Fixed the issue about GETRANGE and SUBSTR command
return unexpected result caused by the `start` and `end` out of
definition range of string.

---
## break change
Before this PR, when negative `end` was out of range (i.e., end <
-strlen), we would fix it to 0 to get the substring, which also resulted
in the first character still being returned for this kind of out of
range.
After this PR, we ensure that `GETRANGE` returns an empty bulk when the
negative end index is out of range.

Closes redis#11738

---------

Co-authored-by: debing.sun <[email protected]>
@YaacovHazan YaacovHazan mentioned this pull request Sep 11, 2024
YaacovHazan added a commit that referenced this pull request Sep 12, 2024
### New Features in binary distributions

- 7 new data structures: JSON, Time series, Bloom filter, Cuckoo filter,
Count-min sketch, Top-k, t-digest
- Redis scalable query engine (including vector search)

### Potentially breaking changes

- #12272 `GETRANGE` returns an empty bulk when the negative end index is
out of range
- #12395 Optimize `SCAN` command when matching data type

### Bug fixes

- #13510 Fix `RM_RdbLoad` to enable AOF after RDB loading is completed
- #13489 `ACL CAT` - return module commands
- #13476 Fix a race condition in the `cache_memory` of `functionsLibCtx`
- #13473 Fix incorrect lag due to trimming stream via `XTRIM` command
- #13338 Fix incorrect lag field in `XINFO` when tombstone is after the
`last_id` of the consume group
- #13470 On `HDEL` of last field - update the global hash field
expiration data structure
- #13465 Cluster: Pass extensions to node if extension processing is
handled by it
- #13443 Cluster: Ensure validity of myself when loading cluster config
- #13422 Cluster: Fix `CLUSTER SHARDS` command returns empty array

### Modules API

- #13509 New API calls: `RM_DefragAllocRaw`, `RM_DefragFreeRaw`, and
`RM_RegisterDefragCallbacks` - defrag API to allocate and free raw
memory

### Performance and resource utilization improvements

- #13503 Avoid overhead of comparison function pointer calls in listpack
`lpFind`
- #13505 Optimize `STRING` datatype write commands
- #13499 Optimize `SMEMBERS` command
- #13494 Optimize `GEO*` commands reply
- #13490 Optimize `HELLO` command
- #13488 Optimize client query buffer
- #12395 Optimize `SCAN` command when matching data type
- #13529 Optimize `LREM`, `LPOS`, `LINSERT`, and `LINDEX` commands
- #13516 Optimize `LRANGE` and other commands that perform several
writes to client buffers per call
- #13431 Avoid `used_memory` contention when updating from multiple
threads

### Other general improvements

- #13495 Reply `-LOADING` on replica while flushing the db

### CLI tools

- #13411 redis-cli: Fix wrong `dbnum` showed after the client
reconnected

### Notes

- No backward compatibility for replication or persistence.
- Additional distributions, upgrade paths, features, and improvements
will be introduced in upcoming pre-releases.
- With the GA release of 8.0 we will deprecate Redis Stack.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Jan 13, 2025
Although the commit #6ceadfb58 improves GETRANGE command behavior,
we can't accept it as we should avoid a breaking change for non-critical bug fixes.

This reverts commit 6ceadfb.
YaacovHazan pushed a commit to YaacovHazan/redis that referenced this pull request Jan 13, 2025
Although the commit #6ceadfb58 improves GETRANGE command behavior,
we can't accept it as we should avoid breaking changes for non-critical bug fixes.

This reverts commit 6ceadfb.
YaacovHazan pushed a commit that referenced this pull request Feb 5, 2025
Although the commit #6ceadfb58 improves GETRANGE command behavior,
we can't accept it as we should avoid breaking changes for non-critical bug fixes.

This reverts commit 6ceadfb.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This change can potentially break existing application release-notes indication that this issue needs to be mentioned in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] SUBSTR returns wrong result with start 0 and end less than start
5 participants