-
Notifications
You must be signed in to change notification settings - Fork 24.5k
Description
Describe the bug
We are developing a static data race detection tool and detected a potential data race in redis version 6.0.7
We found a potential race on server.master_repl_offset between the main thread and an IO thread.
To reproduce
==== Found a race between:
line 4988 in server.c AND line 164 in replication.c
Shared variable:
at line 72 of server.c
72|struct redisServer server; /* Server global state */
Thread 1:
4986| {
4987| memcpy(server.replid,rsi.repl_id,sizeof(server.replid));
>4988| server.master_repl_offset = rsi.repl_offset;
4989| /* If we are a slave, create a cached master from this
4990| * information, in order to allow partial resynchronizations*/
>>>Stacktrace:
>>>main
>>> loadDataFromDisk [server.c:5272]
Thread 2:
162| unsigned char *p = ptr;
163|
>164| server.master_repl_offset += len;
165|
166| /* This is a circular buffer, so write as much data we can at every*/
>>>Stacktrace:
>>>pthread_create [networking.c:3016]
>>> IOThreadMain [networking.c:3016]
>>> readQueryFromClient [networking.c:2979]
>>> processInputBuffer [networking.c:1996]
>>> processGopherRequest [networking.c:1886]
>>> lookupKeyRead [gopher.c:53]
>>> lookupKeyReadWithFlags [db.c:146]
>>> expireIfNeeded [db.c:101]
>>> propagateExpire [db.c:1311]
>>> replicationFeedSlaves [db.c:1233]
>>> feedReplicationBacklog [replication.c:270]Inside of main in server.c, the function InitServerLast creates IOThreads which can eventually call feedReplicationBacklog and write to server.master_repl_offset through the call stack shown for thread 2:
IOThreadMain -> readQueryFromClient -> ... -> replicationFeedSlaves -> feedReplicationBacklog
Meanwhile, immediately after the main thread finishes spawning IOThreads and returns from InitServerLast, it calls loadDataFromDisk which also writes to server.master_repl_offset.
We did not see any synchronization preventing these two accesses from happening in parallel, but we are not entirely sure.
Could someone with more knowledge on this part of the code confirm? If there is some synchronization we missed, we would like to update our tool accordingly.