
make server.loading flag atomic#7778

Closed
BradSwain wants to merge 1 commit into redis:unstable from BradSwain:atomic_server_loading_flag

Conversation

@BradSwain

Reported a potential data race on server.loading in #7777.

This is a potential fix that makes the flag atomic.

@ShooterIT
Member

ShooterIT commented Sep 10, 2020

No, what you described is about being "atomic relative to signal handling". Please see https://en.cppreference.com/w/c/program/sig_atomic_t
For most platforms, int is enough. If we want to make it safer, we can use 'sig_atomic_t'.

@BradSwain
Author

BradSwain commented Sep 11, 2020

Thanks for taking a look.

I'm not very familiar with signal handling in general, but in a multi-threaded program is it possible for the signal to be handled on a different thread? If so, the main thread could be setting server.loading = 1; while another thread is running the signal handler and reading server.loading in parallel, which would cause a data race.

I did a little research and found this from the pthreads(7) documentation:

POSIX.1 distinguishes the notions of signals that are directed to the process as a whole and signals that are directed to individual threads. According to POSIX.1, a process-directed signal (sent using kill(2), for example) should be handled by a single, arbitrarily selected thread within the process

This makes it sound like it may be possible for a signal to be handled by any arbitrary thread, but I'm not sure if that applies to this case.

@madolson
Contributor

I'm not really sure this matters; there is still the case where someone kills the process before the flag is set and it doesn't exit as you would expect. You would then signal it again, and it would definitely die the second time. I feel like if we're going to care about sig_atomic_t, we should look more generally at how we are handling interrupts and make sure they act as expected.

@ShooterIT
Member

Hi @BradSwain, thanks for sharing. I also don't know much about which thread will execute a signal handler, but https://en.cppreference.com/w/c/program/signal can provide a reference.

In addition, server.shutdown_asap should also be volatile sig_atomic_t, right? Didn't your tool report errors on server.shutdown_asap?

@ShooterIT
Member

I think it is time to solve this issue if necessary. Currently, we can't use C11 _Atomic directly. For this problem, in my opinion, volatile sig_atomic_t is enough, and I think we should also define the global variable server.shutdown_asap as volatile sig_atomic_t. @oranagra @yossigo please have a look.

https://en.cppreference.com/w/c/program/signal

@oranagra
Member

i don't think this is really critical, these are integers which are probably atomic in the H/W anyway. and we don't really need any memory ordering here (these are stand alone variables and if the signal or a thread catches them in an old state it's not that bad AFAIK).

however, i don't see any disadvantage in declaring them as volatile sig_atomic_t either, so @ShooterIT if you think so too, feel free to make a PR. (please put enough details in the commit comment, including the claim that it's not a real issue)

@ShooterIT
Member

ShooterIT commented Sep 21, 2020

@oranagra

these are integers which are probably atomic in the H/W anyway.

Yes, on most platforms 'sig_atomic_t' is just 'int'. Its atomicity only covers plain assignment, not read-modify-write operations like ++ or -- (e.g. flag = 1, not flag++). In the Redis code we only ever assign to these flags, so I think their atomicity is OK. But we should also care about the volatile keyword.

the signal or a thread catches them in an old state it's not that bad AFAIK

For this, I think it does make a difference if we read a stale value; just as @madolson said, that is a bad case.

someone could have killed the process before the flag is set and it doesn't exit as you would expect. You would then signal it again, and it would definitely die the second time.

Another case I can think of: the second signal may not take effect if the handler reads shutdown_asap as 0, since we don't know which thread will execute the signal handler.

Reading or writing a variable that is not of type volatile sig_atomic_t from a signal handler is undefined behavior according to the C standard, right?

@ShooterIT
Member

Till now we have only analyzed this problem in theory, and I made two assumptions.
First, we can't know which thread will execute the signal handler, just as @BradSwain said:

a process-directed signal (sent using kill(2), for example) should be handled by a single, arbitrarily selected thread within the process

Second, I assumed one thread may not read the newest value of a variable modified by another thread.

So I did some tests just now, and the result does not meet my expectations: I found the signal is always handled by the main thread, whether on macOS, Ubuntu, or Red Hat.

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>

int global_var;

void *thread_func(void *v) {
    (void)v;
    while (1) {
        for (size_t i = 0; i < 1000000; i++) {
            global_var = i;
        }
        sleep(1);
    }
    return NULL;
}

void signal_handler(int signal) {
    (void)signal;
    char buf[32];
    pthread_t tid = pthread_self();
    /* Originally ll2string(), copied from the redis code. */
    int len = snprintf(buf, sizeof(buf), "%lu", (unsigned long)tid);
    buf[len] = '\n';
    write(1, buf, len+1);
}

int main(int argc, char **argv) {
    (void)argc; (void)argv;
    pthread_t tid[4];
    pthread_create(tid,NULL,thread_func,NULL);
    pthread_create(tid+1,NULL,thread_func,NULL);
    pthread_create(tid+2,NULL,thread_func,NULL);
    pthread_create(tid+3,NULL,thread_func,NULL);
    for (int i = 0; i < 4; i++) {
        printf("thread %d: %lu\n", i, (unsigned long)tid[i]);
    }
    printf("main thread: %lu\n", (unsigned long)pthread_self());

    /* Install a signal handler. */
    signal(SIGINT, signal_handler);
    while (1) {
        for (size_t i = 0; i < 1000000; i++) {
            global_var = i;
        }
        sleep(1);
    }

    return 0;
}

If that is stable behavior, my viewpoints above are not meaningful. But POSIX doesn't guarantee this behavior, so for portability I still think we should use the volatile sig_atomic_t type for them.

https://stackoverflow.com/questions/50037809/why-my-signal-h-define-sig-atomic-t-to-int-instead-of-volatile-int
https://en.cppreference.com/w/c/program/signal

Maybe I missed something; please point it out if I'm wrong.

@oranagra
Member

@ShooterIT i'm not worried about the volatile property at all, it comes to tell the compiler not to cache the variable in a register and re-fetch it from memory every time, but since this variable isn't used in a tight loop, there's absolutely no chance the compiler will have it cached in a register.

Your experiment findings are very encouraging, for some reason Linux (and other platforms which i don't care for much) always delivers the signal to the main thread, it reduces the chance that we'll overlook something and it'll cause a real bug. (@yossigo @ushachar please drop a comment if you have a better understanding of that)

that said, there's no reason not to change the code to be more POSIX compliant; even if it won't fix any real bug, at least it'll stop questions and doubts.

i'll just ask that the commit comment contain enough details on the change, specifically indicating that it doesn't appear to solve any real problem.


@BradSwain BradSwain closed this Mar 27, 2024