Skip to content

EVAL replicated + conditionals about key existence = replication bug. #1525

@antirez

Description

@antirez

Lua scripts may contain conditionals in order to run or not commands depending on the existence or not of keys. For example let's examine the following Lua script:

if redis.call("exists",KEYS[1]) == 1
then
    redis.call("incr","mycounter")
end

if redis.call("exists",KEYS[1]) == 1
then
    return redis.call("incr","mycounter")
end

The script executes two times the same if key exists then increment counter logic. However the two executions will work differently in the master and the slaves.

In the master the first time the key may still exist, while the second time the key may no longer exist. This will result in the key incremented just one time. However as a side effect the master will generate a synthetic DEL command in the replication channel in order to force the slaves to expire the key (given that key expiration is master-driven).

When the same script will run in the slave, the key will no longer be there, so the script will not increment the key.

I reproduced the bug using the above Lua script and the following bash script:

while [ 1 ]
do
    redis-cli set x foo px 100 > /dev/null
    sleep 0.092
    redis-cli --eval /tmp/myscript.lua x > /dev/null
    sleep 0.1
    redis-cli get mycounter
    redis-cli -p 6380 get mycounter
done

After a few iterations the two counters will desynchronized.

Proposed fix

My guess is that the best way to fix this issue is:

  • Don't expire keys when the client performing the lookup is flagged as being a Lua script.
  • However before the script execution, run the expireIfNeeded() function against all the elements in the KEYS[] array (that is, the keys passed to the script).

Note that the second step is not required to guarantee consistency, but is required to guarantee a behavior that makes sense, that is, to expire keys ASAP when possible as the rest of the Redis semantics does.

An alternative way to address the problem is to replicate raw commands as proposed multiple times. Let's say that I don't believe this alone is sufficient to bias me towards this extreme solution (extreme as it will make Lua scripting bound to the replication link bandwidth), but it is true that this new bug adds some value to this alternative replication model for scripts.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions