Skip to content

EXPIRE, EXPIREAT, SETEX, GETEX: Return error when expire time overflows#8287

Merged
oranagra merged 9 commits intoredis:unstablefrom
GnaneshKunal:fix-expire-overflow
Feb 21, 2021
Merged

EXPIRE, EXPIREAT, SETEX, GETEX: Return error when expire time overflows#8287
oranagra merged 9 commits intoredis:unstablefrom
GnaneshKunal:fix-expire-overflow

Conversation

@GnaneshKunal
Copy link
Contributor

@GnaneshKunal GnaneshKunal commented Jan 5, 2021

TL:DR: Respond with error if expire time overflows from positive to negative of vice versa.

  • SETEX, SET EX, GETEX etc would have already error on negative value,
    but now they would also error on overflows (i.e. when the input was positive but
    after the manipulation it becomes negative, which would have passed before)
  • EXPIRE and EXPIREAT was ok taking negative values (would implicitly delete
    the key), we keep that, but we do error if the user provided a value that changes
    sign when manipulated (except the case of changing sign when basetime is added)

If we set the expiry of a key to 10^15, the key is automatically expired due to an overflow while trying to convert the value from seconds to milliseconds.

Expire time overflows (GDB):

Thread 1 "redis-server" hit Breakpoint 1, expireGenericCommand (c=0x7ffff751c480, basetime=1609840928734, unit=0) at expire.c
501         robj *key = c->argv[1], *param = c->argv[2];
(gdb) n
504         if (getLongLongFromObjectOrReply(c, param, &when, NULL) != C_OK)
(gdb) n
507         if (unit == UNIT_SECONDS) when *= 1000;
(gdb) n
508         when += basetime;
(gdb) p when
$1 = -8446744073709551616

Earlier:

127.0.0.1:6379> SET A B EX 10000000000000000
OK
127.0.0.1:6379> GET A
(nil)
127.0.0.1:6379> SET A B
OK
127.0.0.1:6379> GET A
"B"
127.0.0.1:6379> EXPIRE A 10000000000000000
(integer) 1
(48.60s)
127.0.0.1:6379> GET A
(nil)

Now:

127.0.0.1:6379> SET A B EX 10000000000000000
(error) ERR invalid expire time in set
127.0.0.1:6379> SET A B
OK
127.0.0.1:6379> GET A
"B"
127.0.0.1:6379> EXPIRE A 10000000000000000
(error) ERR invalid expire time in expire
127.0.0.1:6379> GET A
"B"

@GnaneshKunal GnaneshKunal changed the title Fix expire time overflow Fix expire time overflow: Return error when overflow is detected Jan 6, 2021
src/expire.c Outdated
Comment on lines 537 to 546
/*
* If the provided expiry from input was negative, and after
* unit conversion and adding `basetime`, if the key was still
* not expired, there is an integer overflow.
*/
if (is_expire_input_negative && when > 0) {
addReplyErrorFormat(c, "invalid expire time in %s", c->cmd->name);
return;
}
setExpire(c,c->db,key,when);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before:

127.0.0.1:6379> SET A B
OK
127.0.0.1:6379> EXPIRE A -9999999999999999
(integer) 1
127.0.0.1:6379> TTL A
(integer) 8446744073709551
127.0.0.1:6379>

After:

127.0.0.1:6379> SET A B
OK
127.0.0.1:6379> EXPIRE A -9999999999999999
(error) ERR invalid expire time in expire
127.0.0.1:6379> GET A
"B"
127.0.0.1:6379>

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is looks a bit convoluted.
for setGenericCommand, without a big comment it's hard to understand why we're repeating that check twice.

for expireGenericCommand it's odd to see an input validation check at the bottom of the functoin.

how about moving the one and only check to be above the multiplication, and using this condition:

    if (when <= 0 || (unit == UNIT_SECONDS && when * 1000 < 0)) {

@oranagra
Copy link
Member

oranagra commented Jan 8, 2021

btw, it may be a nice idea to add tests for this, so that we can catch any future regression.

@GnaneshKunal
Copy link
Contributor Author

GnaneshKunal commented Feb 6, 2021

for setGenericCommand, without a big comment it's hard to understand why we're repeating that check twice.

how about moving the one and only check to be above the multiplication, and using this condition:

    if (when <= 0 || (unit == UNIT_SECONDS && when * 1000 < 0)) {

I have tried the suggestion, but the condition behaves a little odd. The compiler smartly make the right hand when * 1000 < 0 to false because when <= 0 has failed. If we try to store when * 1000 in a temporary variable and try to use that variable instead, the compiler no longer does the prediction. If when * 1000 is inline, the following condition somehow passes and we no longer get overflow.

I'm afraid of the future problems due to different compiler behaviours though. You can check the same in the below video:

conv.mp4

for expireGenericCommand it's odd to see an input validation check at the bottom of the functoin.

Actually, thats kind of important.

For a positive to negative number overflow, all we have to do is try to convert it to seconds by (* 1000) and adding basetime and checking whether it has become negative.

For a negative to positive number overflow, we do the necessary unit conversions, and we need to check whether checkAlreadyExpired returns false. If it is false, we know that the key didn't expire (which happens when expiry is positive) and the negative number has become positive with the following condition:

int is_expire_input_negative = when < 0;

if (checkAlreadyExpired(when)) {

} else {
  if (is_expire_input_negative && when > 0) {
    printf("negavite to positive overflow");
  }
}

Moving the condition won't work since we completely depend on checkAlreadyExpired to return false.

And BTW, the tests passes on my machine but fails on CI even though they're simple. Will debug and modify the test case.

@oranagra
Copy link
Member

oranagra commented Feb 7, 2021

@GnaneshKunal if the compiler doesn't detect the possibility for overflow in that if and optimizes that check out, then let's do:

 if (when <= 0 || (unit == UNIT_SECONDS && when >= LLONG_MAX / 1000))

this will let us keep the input check at the input and avoid repeating the code that responds with error.
p.s. the division will be performed at compile time, not runtime.

@GnaneshKunal
Copy link
Contributor Author

@GnaneshKunal if the compiler doesn't detect the possibility for overflow in that if and optimizes that check out, then let's do:

 if (when <= 0 || (unit == UNIT_SECONDS && when >= LLONG_MAX / 1000))

this will let us keep the input check at the input and avoid repeating the code that responds with error.
p.s. the division will be performed at compile time, not runtime.

That works perfectly.

src/expire.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's convert that one too..
i.e. move the input check to share the same early exit with the other one at the top.

src/expire.c Outdated
Comment on lines 512 to 515
if (when <= 0 || (unit == UNIT_SECONDS && when >= LLONG_MAX / 1000)) {
addReplyErrorFormat(c, "invalid expire time in %s", c->cmd->name);
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check needs to move to before the adjustment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the unit conversion, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when <= 0 makes EXPIRE to not take negative inputs at all.

@oranagra oranagra added release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus labels Feb 7, 2021
@oranagra
Copy link
Member

oranagra commented Feb 7, 2021

@GnaneshKunal thank you.. the current version looks much better than the earlier ones.
please note that since you began this project, a new getexCommand was added, and this fix is needed there too. please mirror the changes from setGenericCommand.

one concern about absolute times

In theory when basetime is 0 (or OBJ_EXAT / OBJ_PXAT are set), and when is 0, it means 1/1/1970, which is a valid time.
a negative when could also be valid i.e. 1/1/1069 is valid too.
this commit will cause these to be invalid (error), rather than the previous behavior which was to delete the key implicitly (like an absolute expire time of 1, which is a second after 1/1/1970).

i suppose that considering we're discussing wall clock here (not an arbitrary date record in a database), a negative absolute time (1969) is not a real concern. but i'd like to validate that with @yossigo.

in theory this change can break some application (return an error for a command that was previously valid), but i think it is more likely that he'll help someone find a bug in his code rather than break an innocent app.

meanwhile, i'm marking this a major-decision, but if we decide to skip this check when the user provides absolute time, i think we can consider this overflow fix just a bug fix, and not a breaking change.

@oranagra
Copy link
Member

oranagra commented Feb 7, 2021

more details about the negative absolute expire time:

  1. for GETEX and SET EXAT, the absolute expire time is a new feature, so there's no breaking change concern here.
  2. for both EXPIRE and EXPIREAT commands there was no negative check before this PR, so this could be a breaking change even for the non-absolute time variant.
    if we wanna avoid doing that breaking change, maybe for these we can just check the overflow, and not the <=0.

@GnaneshKunal
Copy link
Contributor Author

GnaneshKunal commented Feb 7, 2021

for both EXPIRE and EXPIREAT commands there was no negative check before this PR, so this could be a breaking change even for the non-absolute time variant.

Actually for EXPIRE, negative value is still valid to some extend. For example, EXPIRE A -1 or EXPIRE A -4324 will still remove A. If the negative number overflows to positive, we get an error. Also if the negative + basetime doesn't turn out to be a positive number, we get an error.

Example:

EXPIRE A -1 # Deletes key
EXPIRE A -9999 # Deletes key
EXPIRE A -9999999999999999 # Overflow so error
EXPIRE A -9999999991 # Doesn't overflow but `expiry + basetime` is still negative so error

for both EXPIRE and EXPIREAT commands there was no negative check before this PR, so this could be a breaking change even for the non-absolute time variant.

This is true for *AT commands. EXPIREAT A -1 will throw an error saying invalid time.

if we wanna avoid doing that breaking change, maybe for these we can just check the overflow, and not the <=0.

Instead of removing <=0 check, can we do the check only when basetime is not 0?

if ((basetime != 0 && when <= 0) || (unit == UNIT_SECONDS && when >= LLONG_MAX / 1000))

@oranagra
Copy link
Member

oranagra commented Feb 7, 2021

ohh, i was sure you took my advise in #8287 (comment) and moved the check to look at the raw input from the user (not after basetime is added).
keeping it like it is now (chekcing after basetime is added) means that negative inputs from the user are valid, up to some -50 years (i.e compute to before 1/1/1970), it makes no sense to me.

Instead of removing <=0 check, can we do the check only when basetime is not 0?

although this makes sense (allow absolute time before 1970, and reject just negative relative time), i suggested it too, i'm now afraid that this can cause a breaking change to some app.

bottom line:
i think the code in SET and GETEX commands is ok (validates the user input to not be negative or overflow, RESTORE does that too).
i think that considering backwards compatibility, we should only check overflows in EXPIRE and EXPIREAT.

if we're not sure, we can wait to see what Yossi thinks.

@GnaneshKunal
Copy link
Contributor Author

ohh, i was sure you took my advise in #8287 (comment) and moved the check to look at the raw input from the user (not after basetime is added).

Tried it but few non-expire test cases were failing. Looks like some tests cases for MASTER and SLAVE consistency. So ignored the change. Here's the complete trace:

[exception]: Executing test client: ERR invalid expire time in expire.
ERR invalid expire time in expire
    while executing
"[srv $level "client"] {*}$args"
    (procedure "r" line 7)
    invoked from within
"{*}$r expire [randomKey] [randomInt 2]"
    (procedure "createComplexDataset" line 10)
    invoked from within
"createComplexDataset r $numops useexpire"
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code"
    (procedure "test" line 51)
    invoked from within
"test {MASTER and SLAVE consistency with expire} {
            createComplexDataset r $numops useexpire
            after 4000 ;# Make sure everything ..."
    ("uplevel" body line 13)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {} {
        test {First server should have role slave after SLAVEOF} {
            r -1 slaveof [srv 0 host] [srv 0 port]
            wa..."
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 $code "
    (procedure "start_server" line 3)
    invoked from within
"start_server {tags {"repl"}} {
    start_server {} {
        test {First server should have role slave after SLAVEOF} {
            r -1 slaveof [srv ..."
    (file "tests/integration/replication-3.tcl" line 1)
    invoked from within
"source $path"
    (procedure "execute_test_file" line 4)
    invoked from within
"execute_test_file $data"
    (procedure "test_client_main" line 10)
    invoked from within
"test_client_main $::test_server_port "
Killing still running Redis server 46661
Killing still running Redis server 46655
Killing still running Redis server 46656
Killing still running Redis server 46672
Killing still running Redis server 46674
Killing still running Redis server 46670
Killing still running Redis server 46684
Killing still running Redis server 46900
Killing still running Redis server 46991
Killing still running Redis server 47092
Killing still running Redis server 47232
Killing still running Redis server 47259
Killing still running Redis server 47271
I/O error reading reply
    while executing
"$r blpop $k 2"
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 [lindex $args $path]"
    (procedure "randpath" line 3)
    invoked from within
"randpath {
            randpath {
                $r rpush $k $v
            } {
                $r lpush $k $v
            }
        } {
            ..."
    (procedure "bg_block_op" line 12)
    invoked from within
"bg_block_op [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_block_op.tcl" line 54)
I/O error reading reply
    while executing
"$r blpop $k $k2 2"
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 [lindex $args $path]"
    (procedure "randpath" line 3)
    invoked from within
"randpath {
            randpath {
                $r rpush $k $v
            } {
                $r lpush $k $v
            }
        } {
            ..."
    (procedure "bg_block_op" line 12)
    invoked from within
"bg_block_op [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_block_op.tcl" line 54)
I/O error reading reply
    while executing
"$r bzpopmin $k 2"
    ("uplevel" body line 2)
    invoked from within
"uplevel 1 [lindex $args $path]"
    (procedure "randpath" line 3)
    invoked from within
"randpath {
            $r zadd $k [randomInt 10000] $v
        } {
            $r zadd $k [randomInt 10000] $v [randomInt 10000] $v2
        } {
     ..."
    (procedure "bg_block_op" line 30)
    invoked from within
"bg_block_op [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_block_op.tcl" line 54)
Killing still running Redis server 47301
Killing still running Redis server 47315
Killing still running Redis server 47335
Killing still running Redis server 47348
Killing still running Redis server 47454
Killing still running Redis server 47465
I/O error reading reply
    while executing
"{*}$r type $k"
    (procedure "createComplexDataset" line 43)
    invoked from within
"createComplexDataset $r $ops"
    (procedure "bg_complex_data" line 4)
    invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_complex_data.tcl" line 12)I/O error reading reply
    while executing
"{*}$r type $k"
    (procedure "createComplexDataset" line 27)
    invoked from within
"createComplexDataset $r $ops"
    (procedure "bg_complex_data" line 4)
    invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_complex_data.tcl" line 12)

I/O error reading reply
    while executing
"{*}$r type $k"
    (procedure "createComplexDataset" line 27)
    invoked from within
"createComplexDataset $r $ops"
    (procedure "bg_complex_data" line 4)
    invoked from within
"bg_complex_data [lindex $argv 0] [lindex $argv 1] [lindex $argv 2] [lindex $argv 3] [lindex $argv 4]"
    (file "tests/helpers/bg_complex_data.tcl" line 12)
make[1]: *** [Makefile:374: test] Error 1
make[1]: Leaving directory '/home/gnanesh/self/redis/src'
make: *** [Makefile:6: check] Error 2

keeping it like it is now (chekcing after basetime is added) means that negative inputs from the user are valid, up to some -50 years (i.e compute to before 1/1/1970), it makes no sense to me.

Instead of removing <=0 check, can we do the check only when basetime is not 0?

although this makes sense (allow absolute time before 1970, and reject just negative relative time), i suggested it too, i'm now afraid that this can cause a breaking change to some app.

bottom line:
i think the code in SET and GETEX commands is ok (validates the user input to not be negative or overflow, RESTORE does that too).
i think that considering backwards compatibility, we should only check overflows in EXPIRE and EXPIREAT.

Okay.

if we're not sure, we can wait to see what Yossi thinks.

Sure.

@oranagra
Copy link
Member

i thought about that it was complicated so i decided to let it go.
thing is that a negative EXPIRE can change sign when basetime is added, and that's valid..
but i think the commit i pushed now solves it.
let me know what you think.

@GnaneshKunal
Copy link
Contributor Author

i thought about that it was complicated so i decided to let it go.
thing is that a negative EXPIRE can change sign when basetime is added, and that's valid..
but i think the commit i pushed now solves it.
let me know what you think.

All good. Tested it and works fine.

@oranagra
Copy link
Member

oranagra commented Feb 18, 2021

@redis/core-team i'm not entirely sure this is a "major decision" but better safe than sorry, so please have a look.
changes:

  • SETEX, SET EX, GETEX etc would have already error on negative value, but now they would also error on overflows (i.e. when the input was positive but after the manipulation it becomes negative, which would have passed before)
  • EXPIRE and EXPIREAT was ok taking negative values (would implicitly delete the key), we keep that, but we do error if the user provided a value that changes sign when manipulated (except the case of changing sign when basetime is added)

@oranagra
Copy link
Member

btw, running the new tests with the old version gives this:

[err]: SET with EX with big integer should report an error in tests/unit/expire.tcl
Expected 'ERR invalid expire time in set' to equal or match 'OK'
[ok]: SET with EX with smallest integer should report an error
[err]: GETEX with big integer should report an error in tests/unit/expire.tcl
Expected 'ERR invalid expire time in getex' to equal or match 'bar'
[ok]: GETEX with smallest integer should report an error
[err]: EXPIRE with big integer overflows when converted to milliseconds in tests/unit/expire.tcl
Expected 'ERR invalid expire time in expire' to equal or match '1'
[err]: PEXPIRE with big integer overflow when basetime is added in tests/unit/expire.tcl
Expected 'ERR invalid expire time in pexpire' to equal or match '1'
[err]: EXPIRE with big negative integer in tests/unit/expire.tcl
Expected '1' to match 'ERR invalid expire time in expire' (context: type eval line 4 cmd {assert_match {ERR invalid expire time in expire} $e} proc ::test)
[ok]: PEXPIREAT with big integer works
[ok]: PEXPIREAT with big negative integer works

@oranagra oranagra added the approval-needed Waiting for core team approval to be merged label Feb 18, 2021
@oranagra oranagra changed the title Fix expire time overflow: Return error when overflow is detected EXPIRE, EXPIREAT, SETEX, GETEX: Return error when expire time overflows Feb 21, 2021
@oranagra oranagra merged commit 0772098 into redis:unstable Feb 21, 2021
ningsongshen pushed a commit to ningsongshen/redis that referenced this pull request Feb 21, 2021
…ws (redis#8287)

Respond with error if expire time overflows from positive to negative of vice versa.

* `SETEX`, `SET EX`, `GETEX` etc would have already error on negative value,
but now they would also error on overflows (i.e. when the input was positive but
after the manipulation it becomes negative, which would have passed before)
* `EXPIRE` and `EXPIREAT` was ok taking negative values (would implicitly delete
the key), we keep that, but we do error if the user provided a value that changes
sign when manipulated (except the case of changing sign when `basetime` is added)

Signed-off-by: Gnanesh <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
@oranagra
Copy link
Member

was tipped by @itamarhaber that this PR closed #86 (the oldest issue number I've seen getting closed).

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Mar 2, 2021
…ws (redis#8287)

Respond with error if expire time overflows from positive to negative of vice versa.

* `SETEX`, `SET EX`, `GETEX` etc would have already error on negative value,
but now they would also error on overflows (i.e. when the input was positive but
after the manipulation it becomes negative, which would have passed before)
* `EXPIRE` and `EXPIREAT` was ok taking negative values (would implicitly delete
the key), we keep that, but we do error if the user provided a value that changes
sign when manipulated (except the case of changing sign when `basetime` is added)

Signed-off-by: Gnanesh <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Nov 24, 2021
In redis#8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In redis#9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in redis#9825, and close it.
oranagra pushed a commit that referenced this pull request Nov 24, 2021
In #8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In #9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in #9825, and close it.
hwware pushed a commit to hwware/redis that referenced this pull request Dec 20, 2021
In redis#8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In redis#9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in redis#9825, and close it.
@oranagra oranagra added the breaking-change This change can potentially break existing application label Mar 22, 2022
oranagra pushed a commit that referenced this pull request Apr 27, 2022
In #8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In #9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in #9825, and close it.

(cherry picked from commit 9273d09)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged breaking-change This change can potentially break existing application release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants