WATCH no longer ignores keys which have expired for MULTI/EXEC. #7920

Merged: oranagra merged 1 commit into redis:unstable from QuChen88:expire_fix on Oct 22, 2020

Conversation

@QuChen88
Contributor

WATCH no longer ignores keys which have expired for MULTI/EXEC: expire.c now calls signalModifiedKey() to invalidate WATCH as well. This fixes #7918 and Issue 9 in #6860.
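The semantics of the fix can be illustrated with a toy model (plain Python with hypothetical names; the real implementation lives in expire.c/multi.c and operates on Redis' C structures): expiring a key now goes through the same signalModifiedKey() path as a regular write, so clients WATCHing the key get their pending EXEC aborted.

```python
class MiniRedis:
    """Toy model of WATCH/MULTI/EXEC plus expiry; not the real Redis API."""

    def __init__(self):
        self.data = {}         # key -> value
        self.volatile = set()  # keys that are due to expire
        self.watchers = {}     # key -> set of client ids watching it
        self.dirty = set()     # clients whose watched keys were touched

    def watch(self, client, key):
        self.watchers.setdefault(key, set()).add(client)

    def signal_modified_key(self, key):
        # The essence of this PR: the expiry path calls this too, so
        # watchers of an expired key see their transaction aborted.
        self.dirty |= self.watchers.get(key, set())

    def expire(self, key):
        if key in self.data and key in self.volatile:
            del self.data[key]
            self.volatile.discard(key)
            self.signal_modified_key(key)

    def exec(self, client, commands):
        if client in self.dirty:
            self.dirty.discard(client)
            return None        # aborted EXEC, like a nil reply
        return [cmd(self.data) for cmd in commands]


r = MiniRedis()
r.data["a"] = "1"
r.volatile.add("a")
r.watch("c1", "a")
r.expire("a")                  # active expiry finds the key...
print(r.exec("c1", []))        # ...so EXEC aborts: prints None
```

Under the old behavior the `expire()` path simply omitted the `signal_modified_key()` call, so the final EXEC would have run despite the watched key having vanished.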

@oranagra
Member

Although we got a thumbs-up from Salvatore on Slack for the complaint about the old behavior, I'm a bit afraid to fix it now that I realize there's a test for that old behavior.
It seems that it was intentional.
I see the blame log points me to b7a8dae; we need to dig in a bit more, see what the reason was, and make sure we don't overlook some concern.

Also, I suppose the exact same concern applies to eviction, which currently doesn't even bother to invalidate client-side caching, but probably needs to invalidate both.
Let's handle it in this PR too.

@QuChen88
Contributor Author

@oranagra I also noticed there was a test for the old behavior. However, it comes from a very old commit from the very early days of Redis (2010), which makes me think this was likely a legacy bug... Also, the fact that Salvatore OK'ed this suggestion makes me not too concerned about this change.

I will look into evict.c and fix this same issue over there.

@QuChen88
Contributor Author

QuChen88 commented Oct 16, 2020

evict.c should be good. It already calls signalModifiedKey() after the dbDelete operation, which notifies the clients WATCHing the key. The client-side caching use case is also handled in signalModifiedKey().

https://github.com/redis/redis/blob/unstable/src/evict.c#L626

@madolson
Contributor

madolson commented Oct 16, 2020

I remember talking to Salvatore about this at some point, although I can't find it in the chat. The issue was non-deterministic MULTI/EXEC failures, since the expires could happen at any time.

I'm not personally inclined toward the change, given that it alters documented functionality: it introduces quite a bit of risk that an application that was previously working might suddenly see failures. We should definitely get a group consensus about it.

@madolson madolson added the state:major-decision (requires core team consensus) label Oct 16, 2020
@QuChen88
Contributor Author

QuChen88 commented Oct 16, 2020

@madolson I think the behavior of a MULTI/EXEC transaction is a lot more non-deterministic if WATCHed keys are not honored. Imagine that my transaction relies on the value of some key A, but key A was deleted while commands were being queued; at EXEC time the key no longer exists... As a user, I definitely think this will lead to inconsistencies between the expectation of my transaction and the actual result at execution time. It is a much better user experience to not execute the transaction in this case.

@madolson madolson added the state:needs-code-doc (the PR requires documentation inside the redis repository) label Oct 16, 2020
@madolson
Contributor

@QuChen88 If you WATCH a volatile key, there is some non-determinism about whether you watched it before or after the key actually expired, so you always need to handle both the case where the key is present and the case where it has expired. This change introduces a new failure mode: the MULTI/EXEC might fail completely because the key was "touched".

Breaking backwards compatibility is a big deal, so I think we need a more compelling argument than "it's a better user experience."
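The "handle both cases" point can be sketched as the usual optimistic-concurrency retry loop (plain Python with hypothetical callbacks, not the redis-py API): application code has to cope both with the watched key being absent at read time and with EXEC aborting.

```python
def transact_with_watch(get_watched, try_exec, max_retries=5):
    """Optimistic transaction: tolerate an absent (expired) key at read
    time, and retry when EXEC aborts (watched key touched before EXEC)."""
    for _ in range(max_retries):
        value = get_watched()        # None models a key that already expired
        new_value = "init" if value is None else value + "!"
        if try_exec(new_value):      # False models EXEC replying nil
            return new_value
    raise RuntimeError("transaction kept aborting")


attempts = {"n": 0}

def flaky_exec(_value):
    # First EXEC aborts (a watched key was touched); the retry succeeds.
    attempts["n"] += 1
    return attempts["n"] > 1

print(transact_with_watch(lambda: None, flaky_exec))  # prints: init
```

With this shape of client code, the PR's change only adds one more reason for `try_exec` to return False; the retry loop that correct applications already need handles it unchanged.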

@QuChen88
Contributor Author

Would it be fair to assume that if I explicitly WATCHed a key, then I would expect that key to be present?

Having to program my MULTI/EXEC transaction to handle both the case where the watched keys are present and the case where they are not introduces unnecessary complexity. It would be a lot simpler the other way.

@QuChen88
Contributor Author

Another argument I have here is that eviction falls into a similar bucket, as previously mentioned. Evicted keys already notify WATCH today. We should make the behavior of expiry and eviction consistent.

@madolson
Contributor

madolson commented Oct 16, 2020

WATCH doesn't return whether the key was actually present, though; if the key had already expired you wouldn't know, and you would have to handle that case anyway. I'm not sure your first argument actually simplifies any current implementation.

I think we need to get the other core members to weigh in. @redis/core-team, thoughts? I would like to discuss this further, and then we should only include it in a major version, not a patch version (maybe a minor). To clarify: the new behavior makes more sense to me; I'm just worried about compatibility, since I can see applications relying on the documented behavior.

@oranagra
Member

oranagra commented Oct 16, 2020

> evict.c should be safe. It is already calling signalModifiedKey() after the dbDelete operation.
> unstable/src/evict.c#L626

I see this is new: 2d1968f (added between RC4 and 6.0 GA, after my report).

Personally I think not invalidating WATCH when the key disappears does more damage to the transaction than failing it (it can cause the transaction to produce undesired results).

For instance, if you make a transaction that adds a hash field that is the result of another hash field multiplied by 2, the HSET that adds the new field will succeed, but it will actually create a new non-volatile hash with one field, instead of a volatile hash with several fields (which would expire at some point).

This really looks like a simple bug that was overlooked for many years, but the fact that it has a test explicitly verifying the bad behavior bothers me (though being 10-year-old code from an early / immature stage of Redis may hint that it shouldn't be taken too seriously).

It's good @madolson found the Google Code discussion that led to the creation of that test, but reading it I still don't understand why.

@itamarhaber @yossigo maybe you remember some piece of history about this?

@madolson
Contributor

Another random thought: this means MULTI/EXECs on a replica will fail, since the expire is replicated as a DEL. I can't think of a specific case where this is a problem, but it's another inconsistency.

@yossigo
Collaborator

yossigo commented Oct 18, 2020

@madolson Why do you think MULTI/EXEC will fail, given that WATCH is handled locally on the master?

@oranagra I don't see how we can treat this as anything but a bug, with a fix that requires careful delivery (a minor version) because it may break backwards compatibility.

@madolson
Contributor

@yossigo I meant that if you do the WATCH on the replica it will trigger this, while watching on a master will not. I also think that since this explicitly changes documented behavior in a backwards-incompatible way, we should do it in a major version. I know that sounds heavy-handed, but that is what it is.

@oranagra oranagra added the release-notes (indication that this issue needs to be mentioned in the release notes) and state:to-be-merged (the PR should be merged soon, even if not yet ready; this is used so that it won't be forgotten) labels Oct 20, 2020
@oranagra
Member

@madolson was it also documented? Or did it just have a test that makes sure the bug exists?
So in your last comment you meant that the old code would have failed EXEC on a replica and not on the master, but the new one will fail on both; so we're actually making it more consistent, right?

@oranagra
Member

While thinking about the considerations of breaking this in 6.2 or 7.0, I realized we merged #7671 into 6.0.7, and are currently considering it for 5.0.10.

@soloestoy
Contributor

> Another random thought. That means multi-execs on a replica will fail since the expire is replicated as a DEL. I can't think of a specific case where this is a problem, but it's another inconsistency.

It's an interesting view, and after reading all the discussion above, I think we should fix it and release it in a major version, in case it breaks compatibility.

@madolson
Contributor

@oranagra "(Note that if you WATCH a volatile key and Redis expires the key after you WATCHed it, EXEC will still work. More on this.)" <-https://redis.io/topics/transactions

@oranagra
Member

Ohh, that makes a big difference for me.
I thought it was just a test that for some unclear reason validated the wrong thing (and even that made me think twice about changing it), but the docs really spell out the intent without room for doubt (although the "More on this" reference doesn't seem to explain why it's done, as far as I can tell).
Maybe we can call @antirez to clear this up?

@QuChen88
Contributor Author

> @oranagra "(Note that if you WATCH a volatile key and Redis expires the key after you WATCHed it, EXEC will still work. More on this.)" <- https://redis.io/topics/transactions

That statement sounds very suspicious to me. I don't know why a key deleted via expiry should be treated any differently from a regular deletion or a key eviction...

@antirez
Contributor

antirez commented Oct 20, 2020

Dudes and Gals, I created the issue, then fixed the problem, and (guess what?) forgot to close it :D

@antirez
Contributor

antirez commented Oct 20, 2020

I see that we had documented the contrary, which is a problem, but it sounds like the bug was documented by a later PR or something like that... That's why even merging documentation PRs can sometimes go very wrong, I guess. Btw, I can't see how it is correct that a key being watched expires and EXEC still succeeds. Expiry should be, in theory, like an external daemon that is able to delete keys based on certain information it has.

@antirez
Contributor

antirez commented Oct 20, 2020

Btw, I'm no longer sure what the current Redis behavior is. My last memory is that this was fixed for a specific reason, probably related to server-side caching. At some point I added more logic to signalModifiedKey(), then saw that it was not called during expires, and called it. That's the last thing I remember, at least. If you have specific questions please ask, even if I'm not sure I remember the whole story.

@QuChen88
Contributor Author

> I see that we had documented the contrary, which is a problem, but it sounds like if the bug was documented with a successive PR or something like that... That's why even merging documentation PRs sometimes can get very wrong I guess. Btw I can't see how it is correct that a key being watched is expired, and EXEC succeeds. The expire should be, in theory, like an external demon that is able to delete keys based on certain informations it has.

@antirez Thanks a lot for clearing up this whole confusion amongst the various speculations. As I suspected, the current behavior is a bug that needs to be fixed. 👍

@antirez
Contributor

antirez commented Oct 20, 2020

It surprises me that I didn't fix this. I remember it, but... maybe I'm remembering the branch that implemented the threaded slow commands. Or maybe I fixed it, users later pointed out that there was a documented behavior for N years, and I reverted. Can't recall.

@oranagra
Member

@antirez thanks for clearing that up...
It's hard to tell the history behind this; it does seem broken since forever (and is even covered by a test and documentation that describe the bad behavior as the correct one).
In any case, fixing it may break someone's code and needs to be mentioned in BOLD letters in the release notes.
But at least with your input we know that we're not missing any reason to keep this behavior.
Really appreciate you stepping in to comment on this.

@oranagra
Member

Found this: 20eeddf (maybe that's what Salvatore remembered fixing recently).
git log -S doesn't show anything else about signalModifiedKey in expire.c (or previously db.c), just to rule out the suspicion that it was fixed and the fix later reverted for some reason.

@madolson madolson removed the state:major-decision (requires core team consensus) label Oct 20, 2020
@antirez
Contributor

antirez commented Oct 20, 2020

Now I wonder, too, whether there was any reason for having the broken behavior, because it is even tested; that is quite odd.

@antirez
Contributor

antirez commented Oct 20, 2020

Maybe this was just due to some original idea that WATCH would only catch keys modified by other clients, and never by the server itself. Anyway, after 10 years, this looks like it was a bad point of view.

@oranagra
Member

@redis/core-team please approve.

Regarding the release schedule, I see 20eeddf was released in 6.0 RC2 (without a mention in the release notes), and 2d1968f in 6.0.0 GA.
I guess the responsible thing would be to hold this change for 6.2 (unlike what we did with #7671 in 6.0.7): people should not expect surprises when upgrading to a new patch-level release.

On the other hand, one can argue that the old behavior is nothing but a bug that deserves to be fixed, and that applications can't really rely on it: unlike the 3 changes mentioned above, here the key is not TOUCHED as a direct result of a user action, and on large databases it depends on the timing at which active expiry finds the key.

@yossigo
Collaborator

yossigo commented Oct 22, 2020

@oranagra Yes, formally this should have been postponed, because it's potentially breaking.

But I do see other aspects as contributing factors that make this an exception:

  1. WATCH behavior has already been modified in a breaking way in 6.0.x, more than once, and not always with a clear indication of that.
  2. There's a strong claim that apps that break because of this were flaky anyway, because they depend on the random timing of background events. We just ensure they'll always break rather than sometimes, i.e. provide a more consistent failure.

Based on that, I slightly lean toward not waiting for 6.2.

As a more general note: we encounter this conflict quite often. Being strict about potentially breaking changes when new features and improvements are introduced is obviously right. With bug fixes, I believe there should be more room for exceptions and ad-hoc analysis.

@oranagra oranagra merged commit 556acef into redis:unstable Oct 22, 2020
@oranagra oranagra mentioned this pull request Oct 26, 2020
oranagra pushed a commit that referenced this pull request Oct 27, 2020
This wrong behavior was backed by a test, and also documentation, and dates back to 2010.
But it makes no sense to anyone involved, so it was decided to change it.

Note that 20eeddf (invalidate watch on expire-on-access) was released in 6.0 RC2,
and 2d1968f (invalidate watch when a key is evicted) in 6.0.0 GA;
both make similar changes.

(cherry picked from commit 556acef)
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Nov 4, 2020
jschmieg pushed a commit to memKeyDB/memKeyDB that referenced this pull request Nov 6, 2020
@QuChen88 QuChen88 deleted the expire_fix branch December 16, 2020 07:31
@tzongw
Contributor

tzongw commented Jun 7, 2021

If activeExpireCycleTryExpire does not find the expired watched keys, EXEC will still run successfully, and the keys will instead expire on access by a queued command.
20eeddf solves synchronous access before EXEC, but does it solve synchronous access by a queued command?

@oranagra
Member

oranagra commented Jun 7, 2021

@tzongw I'm not sure what you mean. Do you mean this scenario:

```
WATCH key
MULTI
GET key
```

Now time passes and the key logically expires, but it is not deleted yet (other clients and active expiry don't hit it):

```
EXEC       <- EXEC runs successfully; the queued GET returns nil and the key is deleted.
```

Note that once the transaction has started we can't abort it. We could in theory actively check the expiration time of all volatile keys before executing EXEC, and also freeze time so that the logical time of EXEC gets no chance to expire anything.
But note that this is not the typical use case for WATCH/EXEC. Normally:

```
WATCH key
GET key
MULTI
SET key <some value that's based on the one returned by GET>
EXEC
```

In that case SET may cause the key to be expired and deleted, and then be overwritten by a new non-volatile key (users who want to preserve the TTL may use SET with EX or similar).
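The hazard in that second sequence can be mimicked in a few lines of Python (a toy dict-based model with hypothetical names, not Redis code): a plain SET on a logically-expired volatile key leaves behind a brand-new key with no TTL.

```python
import time

data = {"key": "5"}
expires = {"key": time.monotonic() - 1.0}   # TTL already in the past

def set_cmd(key, value, ex=None):
    """Toy SET: as in Redis, a plain SET discards any existing TTL,
    while SET with EX installs a fresh one."""
    data[key] = value
    expires.pop(key, None)
    if ex is not None:
        expires[key] = time.monotonic() + ex

# The queued SET from the transaction runs after the key logically expired:
set_cmd("key", "10")
print("key" in expires)   # prints: False -> the key became non-volatile
```

This is why the comment above suggests SET with EX (or similar) when the caller wants the key to stay volatile.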

@tzongw
Contributor

tzongw commented Jun 7, 2021

@oranagra that's exactly what I mean. Coming back to the unit test of this PR: we can't be sure whether EXEC succeeds or fails; that depends on whether active expiry evicted the key or not.

> we could in theory actively check the expiration time of all volatile keys before executing EXEC, and also freeze time so that the logical time of exec will not have a chance.

Agreed, maybe that's the only way to eliminate the chance.

@oranagra
Member

oranagra commented Jun 7, 2021

Yes, that test assumes active expiry found the key before EXEC was called.
But unlike other flaky tests, if the assumption fails the test still passes (it just doesn't fulfill its purpose).
It is easy to fix that by using a wait_for_condition on dbsize instead of the constant sleep.

Regarding the suggestion of having execCommand actively test the expiration of all watched keys before "freezing" time and proceeding: I'd like to hear the opinion of the @redis/core-team, since IIUC it's not a problem for the typical use case of WATCH, but I suppose it could remove some risk of timing issues causing the transaction to be executed differently, i.e. in the example I gave above, a key can become non-volatile.
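The wait_for_condition idea is the standard cure for sleep-based races; a minimal sketch in Python (the actual Redis test suite implements such a helper in Tcl, and `dbsize()` in the trailing comment is a hypothetical accessor):

```python
import time

def wait_for_condition(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it is true or the timeout elapses,
    instead of sleeping a fixed amount and hoping the race was won."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()   # one last check at the deadline

# e.g. wait until active expiry has emptied the db, then assert on EXEC:
# wait_for_condition(lambda: dbsize() == 0)
```

Unlike a constant sleep, this fails loudly (returns False) when the condition never holds, so the test cannot silently pass without exercising its intended scenario.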

@oranagra
Member

It seems that communicating on a closed issue is a bad idea... opened a new issue for this topic: #9068



Development

Successfully merging this pull request may close these issues.

[BUG] WATCH ignores keys which has expired
