WATCH no longer ignores keys which have expired for MULTI/EXEC.#7920
WATCH no longer ignores keys which have expired for MULTI/EXEC.#7920oranagra merged 1 commit intoredis:unstablefrom QuChen88:expire_fix
Conversation
|
Although we got a thumbs up from Salvatore for the complaint on the old behavior in slack, i'm a bit afraid to fix it now that i realize there's a test for that old behavior. Also, i suppose the exact same concern goes for eviction, which currently doesn't even bother to invalidate client side caching, but probably needs to invalidate both. |
|
@oranagra I also noticed there was a test for the old behavior. However, this comes from an very old commit from very early days of Redis (2010) so that makes me think this likely was a legacy bug... Also that Salvatore OK'ed this suggestion makes me not too concerned about this change. I will look into evict.c and fix this same issue over there. |
|
evict.c should be good. It is already calling |
|
I remember talking to salvatore about this at somepoint, although I can't find it in the chat. The issue was non-deterministic multi-exec failures, since the expires could happen at any time. I'm not personally inclined for the change given that it changes described functionality because it does introduce quite a bit of risk that an application that was previously working might suddenly see failures. We should definitely get a group consensus about it. |
|
@madolson I think the behavior for MULTI/EXEC transaction is a lot more un-deterministic if WATCH'ed keys are not being honored. Imagine that my transaction relies on the value of some key A, but the key A was deleted while commands are being queued, then at EXEC time, the key no longer exists... As a user, I definitely think this will lead to inconsistencies in the expectation of my transaction and the actual result at the execution time. It is a lot better user-experience to not execute the transaction in this case. |
|
@QuChen88 If you watch a volatile key, there is some non-determinism about whether or not you watched it before or after the key was actually expired. So you always need to handle both the cases where the key is present and the key was expired. This change will introduce a new failure mode, which is the multi-exec might fail completely because the key was "touched". Breaking backwards compatibility is a bit of a big deal, so I think we need a more compelling argument than "it's a better user-experience." |
|
Would it be fair to assume that if I explicitly WATCH'ed a key, then I would expect that key to be present? Having to program my MULTI/EXEC transaction to handle both the cases when the keys watched are present, or not present, is introducing unnecessary complexity. It would be a lot simpler the other way. |
|
Another argument I have here is that eviction also falls into similar bucket here as previously mentioned. Keys evicted will notify WATCH today. We should make the behavior of expire and eviction consistent. |
|
Watch doesn't return if the key was actually watched though, if the key was already expired you wouldn't know and would have to handle that case. I'm not sure your first argument actually simplifies any current implementation. I think we need to get the other core members to weigh in. @redis/core-team Thoughts? I would like to discuss this further and then we should only include it in a major version, not a patch version (maybe a minor). I'll clarify that I think the new behavior makes more sense to me, I'm just worried about the compatibility since I can see applications that were relying on the documented behavior. |
I see this is new 2d1968f (added between RC4 and 6.0 GA. after my report). Personally i think not invalidating watch when the key disappears makes more damage to the transaction than failing it (can cause it to produce undesired results). i.e. if you make a transaction that adds a HASH field that's a result of another hash field multiplied by 2, HSET that adds a new field will succeed but will actually create a new non-volatile hash with one field, instead of a volatile hash with several fields (which will expire at some point). This really looks like a simple bug that was overlooked for many years. but the fact it has a test explicitly verifying the bad behavior bothers me (being 10 years old code from an early / immature stage of Redis can indeed hint that it maybe shouldn't be taken too seriously). It's good @madolson found the google code discussion that lead to the creation of that test, but reading it i don't yet understand why. @itamarhaber @yossigo maybe you remember some piece of history about this? |
|
Just to comment that we need to update this page: https://redis.io/topics/transactions |
|
Another random thought. That means multi-execs on a replica will fail since the expire is replicated as a DEL. I can't think of a specific case where this is a problem, but it's another inconsistency. |
|
@yossigo I meant that if you do the watch on the replica, it will trigger this while watching on a master will not trigger this. I also think that since this is explicitly changing documented behavior in a backwards incompatible way, we should do this in a major version. I know that sounds heavy handed, but that is what it is. |
|
@madolson was it also documented? or just had a test that makes sure the bug exists? |
|
while thinking of the considerations of breaking this in 6.2 or 7.0, i realized we merged #7671 to 6.0.7, and are currently considering for 5.0.10 |
It's an interesting view, and after read all discussion above, I think we should fix it and release in a major version, in case of break compatibility. |
|
@oranagra "(Note that if you WATCH a volatile key and Redis expires the key after you WATCHed it, EXEC will still work. More on this.)" <-https://redis.io/topics/transactions |
|
ohh, that that makes a big difference for me. |
That statement sounds very suspicious to me. I don't know why a key that is deleted via expiry should be treated any differently from regular deletion or key eviction... |
|
Dudes and Gals, I created the issues, then fixed the problem and (guess what?) forgot to close it :D |
|
I see that we had documented the contrary, which is a problem, but it sounds like if the bug was documented with a successive PR or something like that... That's why even merging documentation PRs sometimes can get very wrong I guess. Btw I can't see how it is correct that a key being watched is expired, and EXEC succeeds. The expire should be, in theory, like an external demon that is able to delete keys based on certain informations it has. |
|
Btw I'm no longer sure what is the current Redis behavior. My last memory is that this was fixed for a specific reason, probably due to server side caching. At some point I added more logic in signalModifiedKey(), then I saw that it was not called during expires, and called it. That's the last thing I remember at least. If you have specific questions please ask, even if I'm not sure I remember all the story. |
@antirez Thanks a lot for clearing up this whole confusion amongst the various speculations. As I suspected, the current behavior is a bug that needs to be fixed. 👍 |
|
It surprises me that I didn't fix this, I remember it, but ... maybe I remember the branch that implemented the threaded slow commands. Or maybe I fixed it, later users pointed out that there was a documented behavior for N years and I reverted. Can't recall. |
|
@antirez thanks for clearing that out... |
|
found this: 20eeddf (maybe that's what Salvatore remembered fixing recently) |
|
Now I wonder, too, if there was any reason for having the broken behavior, because it is even tested, that is quite odd. |
|
Maybe this was just due to some original idea that WATCH would only catch keys modified by other clients, and never by the server itself. Anyway, after 10 years, this looks like it was a bad POV. |
|
@redis/core-team please approve. Regarding release schedule, i see 20eeddf was released in 6.0 RC2 (without a mention in the release notes), same as 2d1968f in 6.0.0 GA. On the other hand, one can argue that the old behavior is nothing but a bug that just deserves to be fixed, and that application can't really rely on that behavior, unlike the 3 changed mentioned above, this one is not TOUCHED as a direct result of a user action, and on large databases depends on the timing at which active-expire will find that key. |
|
@oranagra Yes, formally this should have been postponed because it's potentially breaking. But I do see the other aspects as contributing factors to make this an exception:
Based on that I slightly lean towards not waiting for 6.2. As a more general note - we're encountering this conflict quite often. I think being strict about potentially breaking changes when new features and improvements are introduced is obvious. With bug fixes I believe there should be more room for exceptions and ad-hoc analysis. |
This wrong behavior was backed by a test, and also documentation, and dates back to 2010. But it makes no sense to anyone involved so it was decided to change that. Note that 20eeddf (invalidate watch on expire on access) was released in 6.0 RC2 and 2d1968f released in in 6.0.0 GA (invalidate watch when key is evicted). both of which do similar changes. (cherry picked from commit 556acef)
…s#7920) This wrong behavior was backed by a test, and also documentation, and dates back to 2010. But it makes no sense to anyone involved so it was decided to change that. Note that 20eeddf (invalidate watch on expire on access) was released in 6.0 RC2 and 2d1968f released in in 6.0.0 GA (invalidate watch when key is evicted). both of which do similar changes.
…s#7920) This wrong behavior was backed by a test, and also documentation, and dates back to 2010. But it makes no sense to anyone involved so it was decided to change that. Note that 20eeddf (invalidate watch on expire on access) was released in 6.0 RC2 and 2d1968f released in in 6.0.0 GA (invalidate watch when key is evicted). both of which do similar changes. (cherry picked from commit 556acef)
|
If |
|
@tzongw i'm not sure what you mean. now time passes, and the key logically expires, but not deleted yet (other clients and active expire don't hit it) Note that once the transaction started we can't abort it. we could in theory actively check the expiration time of all volatile keys before executing EXEC, and also freeze time so that the logical time of exec will not have a chance. so in that case SET may cause the key to be expired and deleted, and then be overwritten by a new non-volatile key (users that want to preserve the TTL may use SET EX or alike) |
|
@oranagra that's exactly what i mean. come back to the unit test of this PR, we can't be sure whether EXEC runs on success or failure, that will depend on whether active expire evicting the key or not.
agree, maybe this is the only way to kill the chance. |
|
yes. that test assumes the active expire found that key before EXEC was called. regarding the suggestion of having |
|
seems that communicating on a closed issue is a wrong idea.. opened a new issue for that topic: #9068 |
WATCH no longer ignores keys which have expired for MULTI/EXEC. expire.c now calls signalModifiedKey() to invalidate watch as well. This fixes #7918 and Issue 9 in #6860