-
Notifications
You must be signed in to change notification settings - Fork 24.5k
Description
Lua scripts may contain conditionals in order to run or not commands depending on the existence or not of keys. For example let's examine the following Lua script:
if redis.call("exists",KEYS[1]) == 1
then
redis.call("incr","mycounter")
end
if redis.call("exists",KEYS[1]) == 1
then
return redis.call("incr","mycounter")
end
The script executes two times the same if key exists then increment counter logic. However the two executions will work differently in the master and the slaves.
In the master the first time the key may still exist, while the second time the key may no longer exist. This will result in the key incremented just one time. However as a side effect the master will generate a synthetic DEL command in the replication channel in order to force the slaves to expire the key (given that key expiration is master-driven).
When the same script will run in the slave, the key will no longer be there, so the script will not increment the key.
I reproduced the bug using the above Lua script and the following bash script:
while [ 1 ]
do
redis-cli set x foo px 100 > /dev/null
sleep 0.092
redis-cli --eval /tmp/myscript.lua x > /dev/null
sleep 0.1
redis-cli get mycounter
redis-cli -p 6380 get mycounter
done
After a few iterations the two counters will desynchronized.
Proposed fix
My guess is that the best way to fix this issue is:
- Don't expire keys when the client performing the lookup is flagged as being a Lua script.
- However before the script execution, run the expireIfNeeded() function against all the elements in the KEYS[] array (that is, the keys passed to the script).
Note that the second step is not required to guarantee consistency, but is required to guarantee a behavior that makes sense, that is, to expire keys ASAP when possible as the rest of the Redis semantics does.
An alternative way to address the problem is to replicate raw commands as proposed multiple times. Let's say that I don't believe this alone is sufficient to bias me towards this extreme solution (extreme as it will make Lua scripting bound to the replication link bandwidth), but it is true that this new bug adds some value to this alternative replication model for scripts.