feat(sidecar): add thread mode as fallback connection for restricted environments#3573
feat(sidecar): add thread mode as fallback connection for restricted environments#3573
Conversation
|
✨ Fix all issues with BitsAI or with Cursor
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3573 +/- ##
==========================================
- Coverage 68.74% 68.68% -0.07%
==========================================
Files 166 166
Lines 19030 19030
Branches 1797 1797
==========================================
- Hits 13082 13070 -12
- Misses 5132 5145 +13
+ Partials 816 815 -1
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
Benchmarks [ tracer ]Benchmark execution time: 2026-03-17 18:07:49 Comparing candidate commit 1db24ed in PR branch Found 1 performance improvements and 3 performance regressions! Performance is the same for 190 metrics, 0 unstable metrics. scenario:ComposerTelemetryBench/benchTelemetryParsing
scenario:HookBench/benchHookOverheadTraceMethod-opcache
scenario:SamplingRuleMatchingBench/benchRegexMatching4
scenario:TraceAnnotationsBench/benchTraceAnnotationOverhead-opcache
|
ce20a9e to
2eb0609
Compare
Benchmarks [ profiler ]Benchmark execution time: 2026-03-20 13:07:55 Comparing candidate commit b0dd55c in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 28 metrics, 8 unstable metrics. |
06223eb to
cda73e8
Compare
211d162 to
0b1eb24
Compare
Signed-off-by: Alexandre Rulleau <[email protected]>
Signed-off-by: Alexandre Rulleau <[email protected]>
Signed-off-by: Alexandre Rulleau <[email protected]>
Signed-off-by: Alexandre Rulleau <[email protected]>
Signed-off-by: Alexandre Rulleau <[email protected]>
…lter With newer libdatadog (29e00628b), debugger logs are routed to /debugger/v2/input instead of /debugger/v1/diagnostics. The filter in replayDebuggerData() still included /debugger/v1/diagnostics with body (added in c3be656 for older routing), which caused it to return a diagnostics response before the actual /debugger/v2/input snapshot, breaking debugger_span_decoration_probe.phpt. Remove the diagnostics clause — only /debugger/v1/input and /debugger/v2/input are snapshot/log endpoints.
…back routing libdatadog 29e00628b routes logs to /debugger/v2/input but falls back to /debugger/v1/diagnostics when the agent doesn't support v2 (as is the case with the request replayer in tests). Update the expected URI in debugger_span_decoration_probe.phpt accordingly. Also restore the /debugger/v1/diagnostics filter in replayDebuggerData() which was incorrectly removed in the previous commit.
…ailable In ddtrace_sidecar_setup_thread_mode, a forked child detecting is_child_process=true would try to connect to the parent's thread listener. If the parent used subprocess mode (no thread listener), the connect would fail and the child returned with no sidecar and no fallback. Mirror the existing fallback logic from ddtrace_sidecar_handle_fork: when the parent's listener is unavailable, reset ddtrace_sidecar_master_pid to the current process and fall through to start a new master listener in this process.
…ct path When a parent process initializes the sidecar in thread mode, forks, and then exits, the child inherits a broken transport (parent's listener thread is dead). In dd_sidecar_connect(), if ddog_sidecar_connect_worker() fails and current_pid != master_pid, promote the child to master so it can still submit traces. The existing fallback in ddtrace_sidecar_setup_thread_mode covers the initial-setup path, but the reconnect path (ddtrace_sidecar_connect_callback -> dd_sidecar_connect) had no equivalent fallback, causing a silent failure for orphaned children that already had an inherited transport. Add a .phpt test that verifies the orphaned child can create and submit spans after the parent exits.
… compatibility Thread-mode sockets now include the master's effective uid in the filename (libdd.<ver>@<uid>-<pid>.sock in /tmp/libdatadog/). A worker process that later drops privileges via setuid() (e.g. www-data under PHP-FPM) still computes the same socket path as the master listener, and ensure_dir_exists now best-effort chmods the directory world-writable to allow socket creation by any user. Also fixes a double-dot bug in socket/lock path construction (Rust >=1.87 no longer strips leading dots from with_extension arguments). Adds test: sidecar_thread_mode_permissions.phpt verifies the socket is created with the correct uid-pid encoding.
Update libdatadog submodule: thread mode sidecar now uses abstract Unix sockets on Linux (no filesystem permissions needed, any user can connect) and a single-threaded Tokio runtime (no extra OS threads, fixes LSan "Running thread was not suspended" ASAN warnings at process exit). Update sidecar_thread_mode_permissions.phpt to verify abstract socket usage (no filesystem socket created) instead of checking file permissions.
Signed-off-by: Alexandre Rulleau <[email protected]>
After fork, the child inherits the parent's heap including the Tokio current_thread runtime allocations and Rust stdlib once-cell inits from the sidecar thread. Since threads don't survive fork, these are orphaned allocations that LSan incorrectly reports as leaks. This is the same reason daemonize() already sets LSAN_OPTIONS=detect_leaks=0 for the subprocess sidecar.
When the PHP-FPM master process runs as root, the sidecar thread (thread mode) creates named SHM objects with 0600 by default, making them inaccessible to worker processes running as www-data. Call ddog_sidecar_set_shm_open_mode(0644) before starting the master sidecar listener when geteuid()==0, so workers can open the SHM regions read-only. The mode is set in both ddtrace_sidecar_setup_thread_mode() and ddtrace_sidecar_minit() to cover all code paths.
…-user SHM Replace the incorrect set_shm_open_mode(0644) workaround with the proper fchown-based fix implemented in the libdatadog submodule. The SHM ownership is now transferred to the worker's UID (obtained via SO_PEERCRED on first connection) rather than relaxing file permissions. - Remove ddog_sidecar_set_shm_open_mode() calls from thread mode setup - Remove declaration from components-rs/sidecar.h - Update libdatadog submodule to pick up the fchown implementation
…er thread In thread mode with PHP-FPM, dd_activate_once (called via pthread_once on first RINIT) runs independently in each worker process since the master never serves requests. When subprocess mode fails and thread mode is attempted, each worker hit the fallback path in ddtrace_sidecar_setup_thread_mode() that called ddog_sidecar_connect_master() — starting a new listener thread per worker. The master listener must only be started in MINIT (via ddtrace_sidecar_minit()) in the master process, so it survives PHP-FPM forking. When a worker cannot connect to the master's listener, it now logs a warning and runs without the sidecar instead of spawning its own thread. Non-child processes (master, CLI) retain the ability to start a new listener as a fallback, preserving the behavior requested in earlier review feedback.
…er SHM Adds a test that exercises the fchown() cross-user SHM path: PHP-FPM master runs as root, workers switch to an unprivileged user (www-data/daemon/nobody), and thread mode is used. This is the exact scenario that motivated the SO_PEERCRED + fchown() fix. Infrastructure changes: - www.conf: add {{user_group}} placeholder for optional user/group directives - PhpFpm.php: accept $fpmUser/$fpmGroup constructor params - WebServer.php: add setPhpFpmUser() and pass it through to PhpFpm - WebFrameworkTestCase.php: add configureWebServer() hook (called before start()) so subclasses can apply extra server config without reimplementing the full setUpWebServer() logic New test (SidecarThreadModeRootTest): - Skips if not root, not fpm-fcgi SAPI, or no unprivileged user available - testTracesAreSubmittedWithRootMasterAndUnprivilegedWorker: a single request through a root-master/www-data-worker FPM pool must produce traces — failure means the worker cannot access the SHM after fchown() - testMultipleWorkersShareSingleMasterListenerThread: 3 requests with multiple workers must all succeed, ensuring the per-worker-thread regression is caught
`void` return type hint was introduced in PHP 7.1; PHP 7.0 CI jobs were failing with TypeError on every WebFrameworkTestCase subclass.
Remove the DD_TRACE_TEST_SAPI=fpm-fcgi skip condition and instead call putenv() to force FPM mode before the parent sets up the web server. This allows the test to run in every test_web_custom matrix entry (cli-server, cgi-fcgi, apache2handler) rather than only when the CI job explicitly sets fpm-fcgi SAPI.
Use untilNumberOfTraces(3) so the test agent waits for all 3 traces before collecting, instead of returning early after the first one. Also restore the >= 3 assertion which is now reliable.
Add $runAsSudo param to PhpFpm and setPhpFpmSudo() to WebServer so php-fpm can be started as root even when the test runner is non-root. The test now skips only if neither root nor passwordless sudo is available, allowing it to run in CI where circleci has NOPASSWD sudo.
…deRootTest
Replace putenv('DD_TRACE_TEST_SAPI=fpm-fcgi') with a new WebServer::setForceSapi()
method that overrides the SAPI for a single WebServer instance only, without
affecting the global process environment used by all subsequent test classes.
Apache2handler uses a static shared instance on the same port as our FPM+Nginx setup, causing port conflicts. Skip the test in that case; it still runs under cli-server and cgi-fcgi CI jobs.
55f013f to
6f352fa
Compare
Signed-off-by: Alexandre Rulleau <[email protected]>
6f352fa to
b0dd55c
Compare
Description
Adds a new sidecar connection mode (
DD_TRACE_SIDECAR_CONNECTION_MODE) that allows the sidecar to run as a thread within the PHP process instead of a separate subprocess. This is useful in restricted environments where spawning subprocesses is not allowed (e.g. some container runtimes, managed hosting).New INI setting:
datadog.trace.sidecar_connection_modeauto(default): try subprocess first, fall back to thread if it failssubprocess: force subprocess mode (previous behavior)thread: force thread mode (useful in restricted environments; not compatible withpcntl_fork())Key behaviors:
www-data)pcntl_fork()reconnection gracefullyTests added:
SidecarThreadModeTest: multi-request thread mode behaviorSidecarThreadModeRootTest: FPM root+user-switch cross-user SHM scenario (runs in dedicatedfpm-fcgiCI jobs).phpttests for each connection mode value andautofallback behaviorReviewer checklist