fix(profiling): store and restore errno in I/O profiling wrappers#3654
fix(profiling): store and restore errno in I/O profiling wrappers#3654realFlowControl merged 2 commits intomasterfrom
errno in I/O profiling wrappers#3654Conversation
|
✨ Fix all issues with BitsAI or with Cursor
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3654 +/- ##
==========================================
- Coverage 62.21% 62.10% -0.12%
==========================================
Files 141 141
Lines 13387 13387
Branches 1753 1753
==========================================
- Hits 8329 8314 -15
- Misses 4260 4274 +14
- Partials 798 799 +1 see 3 files with indirect coverage changes Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Benchmarks [ profiler ]Benchmark execution time: 2026-02-17 12:32:59 Comparing candidate commit 5b45014 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 29 metrics, 7 unstable metrics. |
| impl ErrnoBackup { | ||
| /// Snapshots the current `errno` value. | ||
| #[inline] | ||
| unsafe fn new() -> Self { |
There was a problem hiding this comment.
It's merged already but is this unsafe? Sure it uses unsafe code but I'm not seeing how this operation is unsafe.
There was a problem hiding this comment.
It would most likely be more correct to not have the function unsafe, but wrap the unsafe bits in the code itself, but then it would be just an unsafe block around the entire body or like this:
fn new() -> Self {
let location = unsafe { libc::__errno_location() };
Self {
errno: unsafe { *location },
location,
}
}Now that I see it looks okay actually haha ... 😉
Description
I/O profiling wraps libc I/O calls (for example
poll,recv*,send*,read/write,fread/fwrite,close) to measure duration and decide whether to collect a sample. That wrapper logic runs after the original syscall returns and includes timing and Poisson sampling-distance updates.Callers rely on
errnofrom the original syscall return path. The additional wrapper work can also toucherrno(for example error paths inclock_gettime, and libm operations used byrand_distr::Poisson, such asexp/tan/log-related math). Without preservation, the wrapper can return the syscall result with an unrelatederrno.This PR snapshots
errnoimmediately after the original syscall and restores it before returning from the wrapper, making the hook transparent to callers.Why now?
We had a customer escalation with I/O profiling enabled where phpredis intermittently reported
RedisException: Connection lost. In reproduced runs, wrapped syscalls returned witherrno=EAGAIN (11), but by wrapper returnerrnohad changed toERANGE (34). That incorrecterrnodrove phpredis into its disconnect path. Restoringerrnofixes this behavior.Alternatives
An alternative is to defer more work (especially sampling-distance recomputation) to
EG(vm_interrupt), similar to wall-time profiling. That could reduce risk in syscall wrappers, but it does not remove the need to preserveerrnoin wrappers, which is why this PR keeps the current architecture. It also would not have fixed this bug, it would only make the race less likely to trigger and harder to diagnose.Reviewer checklist
PROF-13760