Skip to content

Fail Exec command in case a watched key is expired#9194

Merged
oranagra merged 4 commits intoredis:unstablefrom
perryitay:fix-issue-9068
Jul 11, 2021
Merged

Fail Exec command in case a watched key is expired#9194
oranagra merged 4 commits intoredis:unstablefrom
perryitay:fix-issue-9068

Conversation

@perryitay
Copy link
Contributor

@perryitay perryitay commented Jul 4, 2021

There are two issues fixed in this Pr:

  1. fixing issue [BUG?] EXEC commands succeeds when one of the queued commands uses a WATCHed expired key #9068, we want to fail the EXEC command in case there is a watched key that's logically expired but not yet deleted by active expire or lazy expire.

  2. we saw that currently cache time is update in every call, this time is being also being use for the isKeyExpired comparison, we in case of a nested call, we want to update the cache time only in the first call (execCommand)

@zuiderkwast zuiderkwast requested a review from oranagra July 4, 2021 13:04
Copy link
Contributor

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this change! We should add tests for these cases as well.

@oranagra
Copy link
Member

oranagra commented Jul 5, 2021

@perryitay please add a test, it can basically be formed slightly similar to the WATCH will consider touched expired keys test, but use r debug set-active-expire 0 at startup, and after 2 instead of the call for wait_for_dbsize.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, but it doesn't seem to be ready yet.

  • The build fails because keyIsExpired() is not exported from db.c. You need to add it to server.h.
  • There are not tests.

@oranagra
Copy link
Member

oranagra commented Jul 8, 2021

@redis/core-team this is a behavior change, but it actually fixes an unpredictable (timing sensitive) behavior, so i think there's no doubt about merging it (people can't really rely on this behavior).
i also think we probably wanna backport it to both 6.2 and 6.0 (where #7920 was fixed), so let me know if you think otherwise.

@oranagra oranagra added approval-needed Waiting for core team approval to be merged release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten labels Jul 8, 2021
oranagra
oranagra previously approved these changes Jul 8, 2021
@zuiderkwast zuiderkwast added the state:needs-doc-pr requires a PR to redis-doc repository label Jul 8, 2021
@oranagra
Copy link
Member

oranagra commented Jul 8, 2021

@zuiderkwast what is there to document here?

@zuiderkwast
Copy link
Contributor

Doc: One more note under NOTE here: https://redis.io/topics/transactions#a-hrefcommandswatchwatcha-explained

(I see now that the bullet list is messed up too. Probably a missing blank line before the bullet list.)

@zuiderkwast
Copy link
Contributor

See redis/redis-doc#1590 (comment)

@oranagra
Copy link
Member

oranagra commented Jul 8, 2021

well, i'm not sure we need to document bug history in redis.io (we have command and argument history, it may be enough).
for commands, the history is needed since you may see a certain command or argument in the docs, but it doesn't yet exists in the version you're gonna use.

bugs are documented in the release notes.. new code that's written against a fixed version will never see the bug,
and for old code that relies on the bug, you'll see the behavior change in the release notes and then you'll consider if you need to change your app code.

i also wanted to argue that in #7920 the behavior was more consistent (app can depend on it) than the one fixed here, but actually i'm not sure that's right.

anyway, bottom line, i'm ok to document it, although i'm not sure it's necessary, and if we do so, i'd advise to move that note to the bottom into some distinct history section.

@soloestoy
Copy link
Contributor

Good job, this PR solve most of the problems(keys expired during WATCH to EXEC), but there is still a problem it doesn't solve(keys expired before WATCH but not deleted) IIUC:

Time 1: we have a key "foo" is expired but not deleted.

Time 2: we WATCH the key "foo".

Time 3: execute EXEC and we find "foo" is expired and discard the transaction, but it's not right, cause key "foo" is expired before WATCH.

@soloestoy
Copy link
Contributor

Maybe we should call lookupKeyRead in WATCH command to trigger expired key deletion.


/* EXEC with expired watched key is disallowed*/
if (isWatchedKeyExpired(c)) {
c->flags |= (CLIENT_DIRTY_CAS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: The () is unnecessary and adds no value.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soloestoy you're right! Expired but not deleted before WATCH is broken by this PR. I wrote this test case and checked that it passes on unstable but fails in this PR branch:

    test {WATCH deletes already expired keys} {
        r del x
        r debug set-active-expire 0
        r set x foo px 1
        after 2
        r watch x
        r multi
        r ping
        assert_equal {PONG} [r exec]
        r debug set-active-expire 1
    }

It needs to be fixed. This test case can be added to tests/unit/multi.tcl.

@oranagra
Copy link
Member

oranagra commented Jul 9, 2021

@soloestoy @zuiderkwast i'm not sure i agree that this is really a problem (although i don't mind fixing it either).
I would still argue that the above TCL example was producing wrong results before this PR.

let's consider a more realistic examples (with similar timing as the one above):

SET x y px 1
after 2
WATCH x
GET x
MULTI
INCR x
EXEC  

in this case, GET will fail the transaction.

The problem this PR fixes is the case INCR will delete the key and create a new (non-volatile) one (which is a violation of the transaction guarantees).

In the example without a GET or INCR, if the new code (this PR) will fail the transaction, there's really no harm (not a violation of the transaction guarantees).
And if it had an INCR inside it instead of PING (still without the GET between WATCH and EXEC), then this PR still fixes a real issue (watched key is deleted by one of the transaction commands).

@madolson
Copy link
Contributor

madolson commented Jul 9, 2021

I agree with @oranagra that I don't believe it's a real problem. (and I also don't mind fixing it either) I think it can be fixed independently though.

@soloestoy
Copy link
Contributor

I have different opinions, I believe this is a bug, and I believe if we are sure the expire mechanism can affect transaction then we should handle all the scenarios not only the real expired deletion but also the logic expired time.

@madolson
Copy link
Contributor

madolson commented Jul 9, 2021

@soloestoy after thinking this through more, I suppose I do agree this is a real issue and should be addressed. Watch should call lookup key. I would still be fine merging this than fixing it though, but we should definitely fix it.

@oranagra
Copy link
Member

Ok. but for clarity, let's handle it in a separate PR (different scenario, and no shared code)
@perryitay do you wanna make that PR?

Copy link
Member

@itamarhaber itamarhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, haven't CR

@yossigo
Copy link
Collaborator

yossigo commented Jul 11, 2021

I also agree with @madolson and @soloestoy, WATCH should apply to the logical state and deterministic state of the key.

@oranagra
Copy link
Member

ok. @perryitay will make another PR to fix the other issue.
p.s. no one commented on backporting.. i'll keep the tags and we'll discuss it when time comes..

@oranagra oranagra merged commit ac8b1df into redis:unstable Jul 11, 2021
@oranagra oranagra added the breaking-change This change can potentially break existing application label Jul 20, 2021
This was referenced Jul 21, 2021
oranagra added a commit that referenced this pull request Jul 21, 2021
There are two issues fixed in this commit: 
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit ac8b1df)
oranagra added a commit that referenced this pull request Jul 21, 2021
There are two issues fixed in this commit:
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit ac8b1df)
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
There are two issues fixed in this commit: 
1. we want to fail the EXEC command in case there is a watched key that's logically
   expired but not yet deleted by active expire or lazy expire.
2. we saw that currently cache time is update in every `call()` (including nested calls),
   this time is being also being use for the isKeyExpired comparison, we want to update
   the cache time only in the first call (execCommand)

Co-authored-by: Oran Agra <[email protected]>
soloestoy added a commit to soloestoy/libredis-BSD that referenced this pull request Feb 16, 2022
After a discussion in redis#9068 and redis#9194, we reached an agreement
that we should handle all scenarios about expire, not only the real
expired deletion but also the logical expired time.

This PR aims to fix the issue below:
Time 1: we have a key "foo" is expired but not deleted.
Time 2: we WATCH the key "foo".
Time 3: execute EXEC and we find "foo" is expired and discard the
transaction, but it's not right, cause key "foo" is expired before
WATCH.

To adddress the issue, the WATCH command now calls expireIfNeeded()
try to delete the expired keys.

But there are two scenarios that expireIfNeeded() cannot work:
clients are paused and role is replica. To handle the stale key,
we add a flag to record if the watchedKey is stale, and don't
flag client as CLIENT_DIRTY_CAS if key is stale when touch the key.
@sundb sundb mentioned this pull request Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval-needed Waiting for core team approval to be merged breaking-change This change can potentially break existing application release-notes indication that this issue needs to be mentioned in the release notes state:major-decision Requires core team consensus state:needs-doc-pr requires a PR to redis-doc repository state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[BUG?] EXEC commands succeeds when one of the queued commands uses a WATCHed expired key

7 participants