Skip to content

fix(tracing): avoid curl's getenv calls#3528

Merged
morrisonlevi merged 2 commits intomasterfrom
levi/curl-no-getenv
Dec 12, 2025
Merged

fix(tracing): avoid curl's getenv calls#3528
morrisonlevi merged 2 commits intomasterfrom
levi/curl-no-getenv

Conversation

@morrisonlevi
Copy link
Copy Markdown
Collaborator

@morrisonlevi morrisonlevi commented Dec 11, 2025

Description

There’s a long‑standing crash caused by libcurl calling getenv from our background writer thread. We’ve often mitigated this by telling customers to use the sidecar sender instead, but since the sidecar is still not the default and this issue still happens weekly, roughly 3k per week across all of our customers, this PR addresses the root cause in the BGS path.

The typical backtrace looks like this:

#0   0x00007fabbbec214d getenv 
#1   0x00007fabb8838a09 curl_getenv 
#2   0x00007fabb8859356 curl_multi_perform 
#3   0x00007fabb882f05b curl_easy_perform 
#4   0x00007fabad37acb8 _dd_curl_send_stack (ext/coms.c:858)
#5   0x00007fabad37acb8 _dd_writer_loop (ext/coms.c:1138)

Curl is reading proxy configuration from environment variables (e.g. <scheme>_proxy, all_proxy, no_proxy), which internally goes through curl_getenv, which calls getenv on Linux.

Technical Fix

We now snapshot the relevant proxy environment variables once during MINIT, store them in process globals, and then configure our libcurl handles so that libcurl no longer needs to call curl_getenv in the writer thread.

New/changed pieces:

  • ddtrace_coms_minit_proxy_env: called from MINIT to read and strdup the proxy‑related env vars.
  • ddtrace_coms_mshutdown_proxy_env: called from MSHUTDOWN (after the writer has been shut down) to free those cached strings.
  • ddtrace_curl_set_hostname_generic: now always sets CURLOPT_NOPROXY (using the cached no_proxy/NO_PROXY value or "") and, when appropriate, CURLOPT_PROXY based on the cached proxy envs, so libcurl’s internal proxy detection never hits curl_getenv for these handles.

Testing

Due to the nature of the race, it's difficult to write an automated test for it. I set up a reproducer environment, and even there it's difficult for me to get it to trigger.

But I wrote with a getenv wrapper loaded through LD_PRELOAD:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

static char *(*real_getenv)(const char *name);
static __thread int in_getenv;

char *getenv(const char *name) {
    if (!real_getenv) {
        real_getenv = dlsym(RTLD_NEXT, "getenv");
        if (!real_getenv) {
            /* If we can't find the real getenv, behave as if it were unset. */
            return NULL;
        }
    }

    /* Avoid recursion if any of the logging paths end up calling getenv. */
    if (in_getenv) {
        return real_getenv(name);
    }
    in_getenv = 1;

    /* Keep logging extremely simple and async-signal-safe(ish): no backtrace,
     * no stdio buffering. This should be safe even during very early process
     * startup (e.g. OpenSSL constructors reading OPENSSL_armcap). */
    static const char prefix[] = "[wrap_getenv] name=";
    write(STDERR_FILENO, prefix, sizeof(prefix) - 1);
    if (name) {
        size_t len = strlen(name);
        write(STDERR_FILENO, name, len);
    }
    write(STDERR_FILENO, "\n", 1);

    char *result = real_getenv(name);
    in_getenv = 0;
    return result;
}

And when I run it on master, I see that these env vars were being looked up by curl (using Debian, can vary a bit precisely depending on how curl was built):

  • no_proxy
  • http_proxy

And with this branch, those are gone.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Dec 11, 2025

Benchmarks [ tracer ]

Benchmark execution time: 2025-12-11 18:24:33

Comparing candidate commit 3e24961 in PR branch levi/curl-no-getenv with baseline commit 14dc0a4 in branch master.

Found 1 performance improvements and 7 performance regressions! Performance is the same for 186 metrics, 0 unstable metrics.

scenario:MessagePackSerializationBench/benchMessagePackSerialization

  • 🟥 execution_time [+3.268µs; +4.372µs] or [+3.115%; +4.167%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟥 execution_time [+8.990µs; +11.370µs] or [+8.605%; +10.882%]

scenario:SamplingRuleMatchingBench/benchRegexMatching1

  • 🟥 execution_time [+95.255ns; +165.345ns] or [+8.200%; +14.233%]

scenario:SamplingRuleMatchingBench/benchRegexMatching2

  • 🟥 execution_time [+71.746ns; +154.654ns] or [+6.051%; +13.044%]

scenario:SamplingRuleMatchingBench/benchRegexMatching3

  • 🟥 execution_time [+111.548ns; +169.652ns] or [+9.503%; +14.453%]

scenario:SamplingRuleMatchingBench/benchRegexMatching4

  • 🟥 execution_time [+69.709ns; +123.291ns] or [+5.890%; +10.418%]

scenario:SpanBench/benchOpenTelemetryAPI

  • 🟥 mem_peak [+1.446MB; +1.468MB] or [+3.478%; +3.531%]

scenario:TraceSerializationBench/benchSerializeTrace

  • 🟩 execution_time [-36.906µs; -21.294µs] or [-8.303%; -4.791%]

@morrisonlevi morrisonlevi marked this pull request as ready for review December 12, 2025 04:39
@morrisonlevi morrisonlevi requested a review from a team as a code owner December 12, 2025 04:39
@realFlowControl
Copy link
Copy Markdown
Member

Thanks for fixing this, we should get this (and the other tings lined up) shipped this week.

@morrisonlevi morrisonlevi merged commit e2834c8 into master Dec 12, 2025
2003 of 2007 checks passed
@morrisonlevi morrisonlevi deleted the levi/curl-no-getenv branch December 12, 2025 15:41
@github-actions github-actions Bot added this to the 1.15.0 milestone Dec 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants