Skip to content

fix(profiling): store and restore errno in I/O profiling wrappers#3654

Merged
realFlowControl merged 2 commits intomasterfrom
florian/fix-io-profiling
Feb 17, 2026
Merged

fix(profiling): store and restore errno in I/O profiling wrappers#3654
realFlowControl merged 2 commits intomasterfrom
florian/fix-io-profiling

Conversation

@realFlowControl
Copy link
Copy Markdown
Member

Description

I/O profiling wraps libc I/O calls (for example poll, recv*, send*, read/write, fread/fwrite, close) to measure duration and decide whether to collect a sample. That wrapper logic runs after the original syscall returns and includes timing and Poisson sampling-distance updates.

Callers rely on errno from the original syscall return path. The additional wrapper work can also touch errno (for example error paths in clock_gettime, and libm operations used by rand_distr::Poisson, such as exp/tan/log-related math). Without preservation, the wrapper can return the syscall result with an unrelated errno.

This PR snapshots errno immediately after the original syscall and restores it before returning from the wrapper, making the hook transparent to callers.

Why now?

We had a customer escalation with I/O profiling enabled where phpredis intermittently reported RedisException: Connection lost. In reproduced runs, wrapped syscalls returned with errno=EAGAIN (11), but by wrapper return errno had changed to ERANGE (34). That incorrect errno drove phpredis into its disconnect path. Restoring errno fixes this behavior.

Alternatives

An alternative is to defer more work (especially sampling-distance recomputation) to EG(vm_interrupt), similar to wall-time profiling. That could reduce risk in syscall wrappers, but it does not remove the need to preserve errno in wrappers, which is why this PR keeps the current architecture. It also would not have fixed this bug, it would only make the race less likely to trigger and harder to diagnose.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

PROF-13760

@datadog-datadog-prod-us1
Copy link
Copy Markdown

datadog-datadog-prod-us1 Bot commented Feb 17, 2026

⚠️ Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🧪 1026 Tests failed

    testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest (Fix with Cursor)

    testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest (Fix with Cursor)

testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 69945e460000000027b2905f9ae2a63e
tid: 69945e4600000000
hexProcessTraceId: 27b2905f9ae2a63e
hexProcessSpanId: 72910338220bbe89
processTraceId: 2860507453628524094
processSpanId: 8255383131571076745

phpvfscomposer://tests/vendor/phpunit/phpunit/phpunit:106
View all

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 5b45014 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@github-actions github-actions Bot added profiling Relates to the Continuous Profiler tracing labels Feb 17, 2026
@realFlowControl realFlowControl marked this pull request as ready for review February 17, 2026 11:42
@realFlowControl realFlowControl requested a review from a team as a code owner February 17, 2026 11:42
Comment thread profiling/src/io/mod.rs Outdated
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.10%. Comparing base (72a021a) to head (5b45014).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3654      +/-   ##
==========================================
- Coverage   62.21%   62.10%   -0.12%     
==========================================
  Files         141      141              
  Lines       13387    13387              
  Branches     1753     1753              
==========================================
- Hits         8329     8314      -15     
- Misses       4260     4274      +14     
- Partials      798      799       +1     

see 3 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72a021a...5b45014. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Feb 17, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-02-17 12:32:59

Comparing candidate commit 5b45014 in PR branch florian/fix-io-profiling with baseline commit 72a021a in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 29 metrics, 7 unstable metrics.

@realFlowControl realFlowControl merged commit c4d3e54 into master Feb 17, 2026
2071 checks passed
@realFlowControl realFlowControl deleted the florian/fix-io-profiling branch February 17, 2026 15:58
@github-actions github-actions Bot added this to the 1.17.0 milestone Feb 17, 2026
Comment thread profiling/src/io/mod.rs
impl ErrnoBackup {
/// Snapshots the current `errno` value.
#[inline]
unsafe fn new() -> Self {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's merged already but is this unsafe? Sure it uses unsafe code but I'm not seeing how this operation is unsafe.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would most likely be more correct to not have the function unsafe, but wrap the unsafe bits in the code itself, but then it would be just an unsafe block around the entire body or like this:

    fn new() -> Self {
        let location = unsafe { libc::__errno_location() };
        Self {
            errno: unsafe { *location },
            location,
        }
    }

Now that I see it looks okay actually haha ... 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

profiling Relates to the Continuous Profiler tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants