Skip to content

Access evsig_base_fd via the shared struct#741

Closed
akonradi wants to merge 1 commit intolibevent:masterfrom
akonradi:remove-evsig-base-fd
Closed

Access evsig_base_fd via the shared struct#741
akonradi wants to merge 1 commit intolibevent:masterfrom
akonradi:remove-evsig-base-fd

Conversation

@akonradi
Copy link
Copy Markdown

@akonradi akonradi commented Jan 2, 2019

Without this fix, there is a data race on evsig_base_fd between evsig_add and evsig_handler. Normally the EVSIGBASE_LOCK() and EVSIGBASE_UNLOCK() calls would make all the operations atomic, but since evsig_handler runs in response to a signal, it can't acquire locks. If a signal is received during a call to evsig_add, the write to evsig_base_fd might not complete even though evsig_base was updated. This results in evsig_handler reading the wrong fd.

To fix this, this patch makes sure that evsig_handler only reads the fd from evsig_base, which is only ever updated atomically. Since evsig_base_fd is now no longer read from at all, it is removed.

Without this fix, there is a data race on evsig_base_fd between
evsig_add and evsig_handler. Normally the EVSIGBASE_LOCK() and _UNLOCK()
calls would make all the operations atomic, but since evsig_handler runs
in response to a signal, it can't acquire locks. If a signal is received
during a call to evsig_add, the write to evsig_base_fd might not
complete even though evsig_base was updated. This results in
evsig_handler reading the wrong fd.

To fix this, this patch makes sure that evsig_handler only reads the fd
from evsig_base, which is only ever updated atomically. Since
evsig_base_fd is now no longer read from at all, it is removed.
@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.006%) to 80.525% when pulling edaeb93 on akonradi:remove-evsig-base-fd into 246f440 on libevent:master.

Copy link
Copy Markdown
Member

@azat azat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So before your patch next situation is possible:

T1                              T2

evsignal_add()                  evsig_handler()
evsig_add()
EVSIGBASE_LOCK()
evsig_base = base
                                write(fd, &sig, 1, 0)
evsig_base_fd = base.fd

And you concern about writing to the wrong fd? (fd that uses for notifications previous base that was used with signals), and is this a problem?
I don't think so, since:

  • even after this patch this is still possible, if evsig_handler() will read fd before evsig_add() will switch the base
  • you cannot change base if there is active signal events in it (see evsig_base_n_signals_added)

Although I see one positive thing that this patch does - reducing amount of global variables. In 95a7d41 patch (and this bug) this variable had been introduced with the message "Make default signal backend fully threadsafe", but I don't see any problem without this variable, I guess that the relevant part in that patch was writing signal instead of one static byte to simplify signal processing.

That said that I like this patch but with another description, what you will say?

write to evsig_base_fd might not complete even though evsig_base was updated

Almost always there will be SA_RESTART.
But even without SA_RESTART, suppose that write(2) fails, so what the problem do you see here?

#endif
ev_uint8_t msg;
evutil_socket_t base_fd;
struct event_base* base;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct event_base *base not struct event_base* base

@akonradi
Copy link
Copy Markdown
Author

akonradi commented Jan 7, 2019

In the scenario you present, it's certainly possible for evsig_handler() to send the operation to the "wrong" base in terms of absolute order, but that's always going to be a possibility since evsig_handler() can't grab a lock. My goal is to make the resulting trace linearizable, where the resulting linear trace would have evsig_handler() run before evsig_add(). Since evsig_handler() now keeps a local copy of evsig_base, its view is of the state before evsig_add() is called. This prevents a write from getting lost if, in your trace above, evsig_base_fd was -1 but evsig_base was updated to a non-null value.

Furthermore, it's worth noting that in the case you describe (without this patch), it's possible for either the compiler or processor to reorder the writes to evsig_base and evsig_base_fd such that either one is written first, which could result in the fd that receives the write being the one from the new base before evsig_base is set. This patch eliminates that possibility (assuming pointer writes are atomic).

I'd be happy to change the patch description but I think there is a real bug here that I intended this patch to address.

@azat
Copy link
Copy Markdown
Member

azat commented Jan 7, 2019 via email

@akonradi
Copy link
Copy Markdown
Author

akonradi commented Jan 9, 2019

By "make the resulting trace linearizable", I mean that for any trace that executes across multiple processors, there exists some ordering on a single processor such that the result of executing the single-processor and multi-processor traces is the same. Taking your original trace but with this patch applied, we'd have

T1                              T2

evsignal_add()                  evsig_handler()
evsig_add()
EVSIGBASE_LOCK()
                                base = evsig_base
evsig_base = base
                                base_fd = base->sig.ev_signal_pair[1];
                                write(base_fd, &sig, 1, 0)

so while the write to base_fd is not using the evsig_base set by T1, the effect is the same as if the following trace was run on a single core:

evsignal_add()
evsig_handler()
evsig_add()

You're right that this is still subject to use-after-free issues, unlikely as they might be. I don't think it's completely possible to solve that while the caller is the one allocating the memory used in evsig_base.

Since evsig_handler() now keeps a local copy of evsig_base

It just holds a pointer to the base, not the base itself.

Right, that was what I meant. It holds a copy of evsig_base, though, so if evsig_base changes during execution, it still sees the original value.

write(base.fd) <-- closed fd and use-after-tree (UAF), and UAF can SIGSEGV the program while EBADF not, although it is till very bad thing

This will not cause a segfault because base_fd is a local variable allocated on the signal handler's stack. The potential segfault would occur when base_fd is initialized, though as you mention above, it would require a particularly tricky race sequence and I don't think is easily solvable, and definitely out of scope of this PR.

I don't think this answers everything you brought up, but hopefully it clears some things up?

@azat
Copy link
Copy Markdown
Member

azat commented Jan 11, 2019

write(base.fd) # SIGSEGV

This will not cause a segfault because base_fd is a local variable allocated on the signal handler's stack

By base.fd I meant base->sig.ev_signal_pair[1]

I don't think is easily solvable, and definitely out of scope of this PR.

SIGSEGV is not possible with the code that we have now (even though it sucks in a lot of other ways).

@azat
Copy link
Copy Markdown
Member

azat commented Jan 11, 2019

And speaking about this problem, have you triggered the race that you report with your code? Or it is just about possibility?

@akonradi
Copy link
Copy Markdown
Author

And speaking about this problem, have you triggered the race that you report with your code? Or it is just about possibility?

I haven't seen the race occur in practice, but I've seen TSAN errors reported for code using libevent around evsig_base_fd that don't reproduce with this patch.

@htuch
Copy link
Copy Markdown

htuch commented Feb 27, 2019

@azat do you have suggestions on how to move this PR forward? We're likely to hit this in OSS Envoy/libevent now that we're enabling TSAN runs (see a precursor TSAN race at envoyproxy/envoy#6083).

htuch added a commit to htuch/envoy that referenced this pull request Mar 6, 2019
This is needed to unblock envoyproxy#6196 while
libevent/libevent#741 (and consequently
envoyproxy#6083) go unresolved.

It's not particularly wonderful, but it will allow us to make progress independent of the libevent
work.

Risk level: Low
Testing: TSAN runs with envoyproxy#6196.

Signed-off-by: Harvey Tuch <[email protected]>
htuch added a commit to envoyproxy/envoy that referenced this pull request Mar 7, 2019
This is needed to unblock #6196 while
libevent/libevent#741 (and consequently
#6083) go unresolved.

It's not particularly wonderful, but it will allow us to make progress independent of the libevent
work.

Risk level: Low
Testing: TSAN runs with #6196.

Signed-off-by: Harvey Tuch <[email protected]>
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Aug 4, 2020
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why this hasn't been fixed upstream yet.

See also libevent/libevent#779 (upstream bug report)
and libevent/libevent#741 (stalled pull request
with comments explaining the larger problem in more detail).

Differential Revision: https://phabricator.services.mozilla.com/D85781
xeonchen pushed a commit to xeonchen/gecko that referenced this pull request Aug 6, 2020
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why this hasn't been fixed upstream yet.

See also libevent/libevent#779 (upstream bug report)
and libevent/libevent#741 (stalled pull request
with comments explaining the larger problem in more detail).

Differential Revision: https://phabricator.services.mozilla.com/D85781
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Aug 6, 2020
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why this hasn't been fixed upstream yet.

See also libevent/libevent#779 (upstream bug report)
and libevent/libevent#741 (stalled pull request
with comments explaining the larger problem in more detail).

Differential Revision: https://phabricator.services.mozilla.com/D85781

UltraBlame original commit: c574dd7d46ef7ed237980e2672deb9ab09cea5c5
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Aug 6, 2020
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why this hasn't been fixed upstream yet.

See also libevent/libevent#779 (upstream bug report)
and libevent/libevent#741 (stalled pull request
with comments explaining the larger problem in more detail).

Differential Revision: https://phabricator.services.mozilla.com/D85781

UltraBlame original commit: c574dd7d46ef7ed237980e2672deb9ab09cea5c5
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Aug 6, 2020
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why this hasn't been fixed upstream yet.

See also libevent/libevent#779 (upstream bug report)
and libevent/libevent#741 (stalled pull request
with comments explaining the larger problem in more detail).

Differential Revision: https://phabricator.services.mozilla.com/D85781

UltraBlame original commit: c574dd7d46ef7ed237980e2672deb9ab09cea5c5
@akonradi akonradi closed this Oct 9, 2020
bhearsum pushed a commit to mozilla-releng/staging-firefox that referenced this pull request May 1, 2025
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why this hasn't been fixed upstream yet.

See also libevent/libevent#779 (upstream bug report)
and libevent/libevent#741 (stalled pull request
with comments explaining the larger problem in more detail).

Differential Revision: https://phabricator.services.mozilla.com/D85781
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants