Skip to content

[BUG] Potential Data Race on server.master_repl_offset #7779

@BradSwain

Description

@BradSwain

Describe the bug

We are developing a static data race detection tool and detected a potential data race in redis version 6.0.7
We found a potential race on server.master_repl_offset between the main thread and an IO thread.

To reproduce

==== Found a race between: 
line 4988 in server.c AND line 164 in replication.c
Shared variable:
 at line 72 of server.c
 72|struct redisServer server; /* Server global state */
Thread 1:
 4986| {
 4987|     memcpy(server.replid,rsi.repl_id,sizeof(server.replid));
>4988|     server.master_repl_offset = rsi.repl_offset;
 4989|     /* If we are a slave, create a cached master from this
 4990|      * information, in order to allow partial resynchronizations*/
>>>Stacktrace:
>>>main
>>>  loadDataFromDisk [server.c:5272]
Thread 2:
 162|    unsigned char *p = ptr;
 163|
>164|    server.master_repl_offset += len;
 165|
 166|    /* This is a circular buffer, so write as much data we can at every*/
>>>Stacktrace:
>>>pthread_create [networking.c:3016]
>>>  IOThreadMain [networking.c:3016]
>>>    readQueryFromClient [networking.c:2979]
>>>      processInputBuffer [networking.c:1996]
>>>        processGopherRequest [networking.c:1886]
>>>          lookupKeyRead [gopher.c:53]
>>>            lookupKeyReadWithFlags [db.c:146]
>>>              expireIfNeeded [db.c:101]
>>>                propagateExpire [db.c:1311]
>>>                  replicationFeedSlaves [db.c:1233]
>>>                    feedReplicationBacklog [replication.c:270]

Inside of main in server.c, the function InitServerLast creates IOThreads which can eventually call feedReplicationBacklog and write to server.master_repl_offset through the call stack shown for thread 2:
IOThreadMain -> readQueryFromClient -> ... -> replicationFeedSlaves -> feedReplicationBacklog

Meanwhile, immediately after the main thread finishes spawning IOThreads and returns from InitServerLast, it calls loadDataFromDisk which also writes to server.master_repl_offset.

We did not see any synchronization preventing these two accesses from happening in parallel, but we are not entirely sure.

Could someone with more knowledge on this part of the code confirm? If there is some synchronization we missed, we would like to update our tool accordingly.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions