Access evsig_base_fd via the shared struct by akonradi · Pull Request #741 · libevent/libevent

akonradi · 2019-01-02T16:28:52Z

Without this fix, there is a data race on evsig_base_fd between evsig_add and evsig_handler. Normally the EVSIGBASE_LOCK() and EVSIGBASE_UNLOCK() calls would make all the operations atomic, but since evsig_handler runs in response to a signal, it can't acquire locks. If a signal is received during a call to evsig_add, the write to evsig_base_fd might not complete even though evsig_base was updated. This results in evsig_handler reading the wrong fd.

To fix this, this patch makes sure that evsig_handler only reads the fd from evsig_base, which is only ever updated atomically. Since evsig_base_fd is now no longer read from at all, it is removed.

Without this fix, there is a data race on evsig_base_fd between evsig_add and evsig_handler. Normally the EVSIGBASE_LOCK() and _UNLOCK() calls would make all the operations atomic, but since evsig_handler runs in response to a signal, it can't acquire locks. If a signal is received during a call to evsig_add, the write to evsig_base_fd might not complete even though evsig_base was updated. This results in evsig_handler reading the wrong fd. To fix this, this patch makes sure that evsig_handler only reads the fd from evsig_base, which is only ever updated atomically. Since evsig_base_fd is now no longer read from at all, it is removed.

coveralls · 2019-01-02T16:54:34Z

Coverage increased (+0.006%) to 80.525% when pulling edaeb93 on akonradi:remove-evsig-base-fd into 246f440 on libevent:master.

azat

So before your patch next situation is possible:

T1                              T2

evsignal_add()                  evsig_handler()
evsig_add()
EVSIGBASE_LOCK()
evsig_base = base
                                write(fd, &sig, 1, 0)
evsig_base_fd = base.fd

And you concern about writing to the wrong fd? (fd that uses for notifications previous base that was used with signals), and is this a problem?
I don't think so, since:

even after this patch this is still possible, if evsig_handler() will read fd before evsig_add() will switch the base
you cannot change base if there is active signal events in it (see evsig_base_n_signals_added)

Although I see one positive thing that this patch does - reducing amount of global variables. In 95a7d41 patch (and this bug) this variable had been introduced with the message "Make default signal backend fully threadsafe", but I don't see any problem without this variable, I guess that the relevant part in that patch was writing signal instead of one static byte to simplify signal processing.

That said that I like this patch but with another description, what you will say?

write to evsig_base_fd might not complete even though evsig_base was updated

Almost always there will be SA_RESTART.
But even without SA_RESTART, suppose that write(2) fails, so what the problem do you see here?

azat · 2019-01-05T00:18:24Z

signal.c

 #endif
 	ev_uint8_t msg;
+	evutil_socket_t base_fd;
+	struct event_base* base;


struct event_base *base not struct event_base* base

akonradi · 2019-01-07T13:31:10Z

In the scenario you present, it's certainly possible for evsig_handler() to send the operation to the "wrong" base in terms of absolute order, but that's always going to be a possibility since evsig_handler() can't grab a lock. My goal is to make the resulting trace linearizable, where the resulting linear trace would have evsig_handler() run before evsig_add(). Since evsig_handler() now keeps a local copy of evsig_base, its view is of the state before evsig_add() is called. This prevents a write from getting lost if, in your trace above, evsig_base_fd was -1 but evsig_base was updated to a non-null value.

Furthermore, it's worth noting that in the case you describe (without this patch), it's possible for either the compiler or processor to reorder the writes to evsig_base and evsig_base_fd such that either one is written first, which could result in the fd that receives the write being the one from the new base before evsig_base is set. This patch eliminates that possibility (assuming pointer writes are atomic).

I'd be happy to change the patch description but I think there is a real bug here that I intended this patch to address.

azat · 2019-01-07T14:06:49Z

In the scenario you present, it's certainly possible for `evsig_handler()` to send the operation to the "wrong" base in terms of absolute order, but that's always going to be a possibility since `evsig_handler()` can't grab a lock.

Indeed (and the whole signal handling with this backend is not that great, signalfd and other alternatives is better).

My goal is to make the resulting trace linearizable, where the resulting linear trace would have `evsig_handler()` run before `evsig_add()`.

How you guarantee this with this patch? IOW who will stop evsig_handler() from running after evsig_add() changes the base? Or I missing something? And what will be if that base will be freed already? (it is very very unlikely but I guess that this can be reproduced by playing with task priorities, RR for the event_base_free and !RR for the evsig_handler() and limiting the process to one CPU with cgroups CPU controller, but not sure 100%). T1: T2: evsig_add(base1, INT) evsig_handler() event_base_free(base1, INT) base = evsig_base evsig_del(base1, INT) ... <preempted by the sched> ... evsig_base_fd = -1 event_base_free() returned ... <continued by the sched> ... write(base.fd) <-- closed fd and use-after-tree (UAF), and UAF can SIGSEGV the program while EBADF not, although it is till very bad thing

Since `evsig_handler()` now keeps a local copy of `evsig_base`

It just holds a pointer to the base, not the base itself.

Furthermore, it's worth noting that in the case you describe (without this patch), it's possible for either the compiler or processor to reorder the writes to `evsig_base` and `evsig_base_fd` such that either one is written first, which could result in the fd that receives the write being the one from the *new* base before `evsig_base` is set.

And is this a problem? But anyway here the warning about signals events from different bases will be triggered, and I would prefer just making this path to return an error along with printing that warning (and if we are talking about the "race" between evsig_handler() and evsig_add() then this should be it). And personally I would not recommend using signals from different bases with libevent (at least for now, until we have such ugly backend for this). If I misunderstood you, can you provide possible scenario by describing thread states? (like the one above)

akonradi · 2019-01-09T21:32:41Z

By "make the resulting trace linearizable", I mean that for any trace that executes across multiple processors, there exists some ordering on a single processor such that the result of executing the single-processor and multi-processor traces is the same. Taking your original trace but with this patch applied, we'd have

T1                              T2

evsignal_add()                  evsig_handler()
evsig_add()
EVSIGBASE_LOCK()
                                base = evsig_base
evsig_base = base
                                base_fd = base->sig.ev_signal_pair[1];
                                write(base_fd, &sig, 1, 0)

so while the write to base_fd is not using the evsig_base set by T1, the effect is the same as if the following trace was run on a single core:

evsignal_add()
evsig_handler()
evsig_add()

You're right that this is still subject to use-after-free issues, unlikely as they might be. I don't think it's completely possible to solve that while the caller is the one allocating the memory used in evsig_base.

Since evsig_handler() now keeps a local copy of evsig_base

It just holds a pointer to the base, not the base itself.

Right, that was what I meant. It holds a copy of evsig_base, though, so if evsig_base changes during execution, it still sees the original value.

write(base.fd) <-- closed fd and use-after-tree (UAF), and UAF can SIGSEGV the program while EBADF not, although it is till very bad thing

This will not cause a segfault because base_fd is a local variable allocated on the signal handler's stack. The potential segfault would occur when base_fd is initialized, though as you mention above, it would require a particularly tricky race sequence and I don't think is easily solvable, and definitely out of scope of this PR.

I don't think this answers everything you brought up, but hopefully it clears some things up?

azat · 2019-01-11T17:34:53Z

write(base.fd) # SIGSEGV

This will not cause a segfault because base_fd is a local variable allocated on the signal handler's stack

By base.fd I meant base->sig.ev_signal_pair[1]

I don't think is easily solvable, and definitely out of scope of this PR.

SIGSEGV is not possible with the code that we have now (even though it sucks in a lot of other ways).

azat · 2019-01-11T17:36:28Z

And speaking about this problem, have you triggered the race that you report with your code? Or it is just about possibility?

akonradi · 2019-01-11T18:00:16Z

And speaking about this problem, have you triggered the race that you report with your code? Or it is just about possibility?

I haven't seen the race occur in practice, but I've seen TSAN errors reported for code using libevent around evsig_base_fd that don't reproduce with this patch.

htuch · 2019-02-27T19:40:25Z

@azat do you have suggestions on how to move this PR forward? We're likely to hit this in OSS Envoy/libevent now that we're enabling TSAN runs (see a precursor TSAN race at envoyproxy/envoy#6083).

This is needed to unblock envoyproxy#6196 while libevent/libevent#741 (and consequently envoyproxy#6083) go unresolved. It's not particularly wonderful, but it will allow us to make progress independent of the libevent work. Risk level: Low Testing: TSAN runs with envoyproxy#6196. Signed-off-by: Harvey Tuch <[email protected]>

This is needed to unblock #6196 while libevent/libevent#741 (and consequently #6083) go unresolved. It's not particularly wonderful, but it will allow us to make progress independent of the libevent work. Risk level: Low Testing: TSAN runs with #6196. Signed-off-by: Harvey Tuch <[email protected]>

See comment #13 on the bug for an explanation of what this fixes, what it doesn't fix, and why this hasn't been fixed upstream yet. See also libevent/libevent#779 (upstream bug report) and libevent/libevent#741 (stalled pull request with comments explaining the larger problem in more detail). Differential Revision: https://phabricator.services.mozilla.com/D85781

See comment #13 on the bug for an explanation of what this fixes, what it doesn't fix, and why this hasn't been fixed upstream yet. See also libevent/libevent#779 (upstream bug report) and libevent/libevent#741 (stalled pull request with comments explaining the larger problem in more detail). Differential Revision: https://phabricator.services.mozilla.com/D85781 UltraBlame original commit: c574dd7d46ef7ed237980e2672deb9ab09cea5c5

See comment #13 on the bug for an explanation of what this fixes, what it doesn't fix, and why this hasn't been fixed upstream yet. See also libevent/libevent#779 (upstream bug report) and libevent/libevent#741 (stalled pull request with comments explaining the larger problem in more detail). Differential Revision: https://phabricator.services.mozilla.com/D85781

azat reviewed Jan 6, 2019

View reviewed changes

htuch mentioned this pull request Feb 27, 2019

Thread unsafe signal handling in main_common_test envoyproxy/envoy#6083

Open

azat mentioned this pull request Feb 27, 2019

evsig_base is thread unsafe #779

Open

azat added the subsystem:signal label Feb 27, 2019

htuch mentioned this pull request Mar 6, 2019

main_common_test: workaround for libevent TSAN failures. envoyproxy/envoy#6200

Merged

htuch mentioned this pull request May 15, 2019

Update Envoy SHA to latest. istio/proxy#2214

Merged

akonradi closed this Oct 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Access evsig_base_fd via the shared struct#741

Access evsig_base_fd via the shared struct#741
akonradi wants to merge 1 commit intolibevent:masterfrom
akonradi:remove-evsig-base-fd

akonradi commented Jan 2, 2019

Uh oh!

coveralls commented Jan 2, 2019

Uh oh!

azat left a comment

Uh oh!

azat Jan 5, 2019

Uh oh!

akonradi commented Jan 7, 2019

Uh oh!

azat commented Jan 7, 2019 via email

Uh oh!

akonradi commented Jan 9, 2019

Uh oh!

azat commented Jan 11, 2019

Uh oh!

azat commented Jan 11, 2019

Uh oh!

akonradi commented Jan 11, 2019

Uh oh!

htuch commented Feb 27, 2019

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants

Conversation

akonradi commented Jan 2, 2019

Uh oh!

coveralls commented Jan 2, 2019

Uh oh!

azat left a comment

Choose a reason for hiding this comment

Uh oh!

azat Jan 5, 2019

Choose a reason for hiding this comment

Uh oh!

akonradi commented Jan 7, 2019

Uh oh!

azat commented Jan 7, 2019 via email

Uh oh!

akonradi commented Jan 9, 2019

Uh oh!

azat commented Jan 11, 2019

Uh oh!

azat commented Jan 11, 2019

Uh oh!

akonradi commented Jan 11, 2019

Uh oh!

htuch commented Feb 27, 2019

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

4 participants