Skip to content

feat(sidecar): add thread mode as fallback connection for restricted environments#3573

Merged
Leiyks merged 32 commits intomasterfrom
leiyks/sidecar-threaded-fallback
Mar 20, 2026
Merged

feat(sidecar): add thread mode as fallback connection for restricted environments#3573
Leiyks merged 32 commits intomasterfrom
leiyks/sidecar-threaded-fallback

Conversation

@Leiyks
Copy link
Copy Markdown
Contributor

@Leiyks Leiyks commented Jan 13, 2026

Description

Adds a new sidecar connection mode (DD_TRACE_SIDECAR_CONNECTION_MODE) that allows the sidecar to run as a thread within the PHP process instead of a separate subprocess. This is useful in restricted environments where spawning subprocesses is not allowed (e.g. some container runtimes, managed hosting).

New INI setting: datadog.trace.sidecar_connection_mode

  • auto (default): try subprocess first, fall back to thread if it fails
  • subprocess: force subprocess mode (previous behavior)
  • thread: force thread mode (useful in restricted environments; not compatible with pcntl_fork())

Key behaviors:

  • Thread mode uses an abstract Unix socket (Linux) or named pipe (Windows) for IPC between master and worker processes in PHP-FPM
  • In FPM, the master process owns the listener thread; worker processes connect to it without spawning their own thread
  • The master UID is encoded in the socket name to support cross-user SHM access (e.g. when FPM master runs as root and workers run as www-data)
  • Orphaned workers promote themselves to master if the original master's thread listener is unavailable
  • Fork-safe: thread mode is detected and handles pcntl_fork() reconnection gracefully
  • Windows support: threaded connection mode works on Windows where subprocess mode was previously the only option

Tests added:

  • SidecarThreadModeTest: multi-request thread mode behavior
  • SidecarThreadModeRootTest: FPM root+user-switch cross-user SHM scenario (runs in dedicated fpm-fcgi CI jobs)
  • .phpt tests for each connection mode value and auto fallback behavior

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@datadog-official
Copy link
Copy Markdown

datadog-official bot commented Jan 13, 2026

⚠️ Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

❄️ 12 New flaky tests detected

    tmp/build_extension/tests/ext/background-sender/sidecar_thread_mode_permissions.phpt (Thread mode sidecar uses abstract Unix socket) from PHP.tmp.build_extension.tests.ext.background.sender (Datadog) (Fix with Cursor)

    tmp/build_extension/tests/ext/background-sender/sidecar_thread_mode_permissions.phpt (Thread mode sidecar: socket name encodes master uid for setuid compatibility) from PHP.tmp.build_extension.tests.ext.background.sender (Datadog) (Fix with Cursor)

    tmp/build_extension/tests/ext/background-sender/sidecar_thread_mode_permissions.phpt (Thread mode sidecar: uses abstract Unix socket (no filesystem permissions needed)) from PHP.tmp.build_extension.tests.ext.background.sender (Datadog) (Fix with Cursor)

View all

🧪 13 Tests failed

testSearchPhpBinaries from integration.DDTrace\Tests\Integration\PHPInstallerTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integration\PHPInstallerTest::testSearchPhpBinaries
Test code or tested code printed unexpected output: Searching for available php binaries, this operation might take a while.
testSimplePushAndProcess from laravel-58-test.DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Laravel\V5_8\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 69bd44550000000044ede640485ec11d
tid: 69bd445500000000
hexProcessTraceId: 44ede640485ec11d
hexProcessSpanId: 14ba9c50f7aa88b9
processTraceId: 4966879127825989917
processSpanId: 1493678097993205945

phpvfscomposer://tests/vendor/phpunit/phpunit/phpunit:106
testSimplePushAndProcess from laravel-8x-test.DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest (Datadog) (Fix with Cursor)
DDTrace\Tests\Integrations\Laravel\V8_x\QueueTest::testSimplePushAndProcess
Test code or tested code printed unexpected output: spanLinksTraceId: 69bd4473000000009e2f0d892aa67a8a
tid: 69bd447300000000
hexProcessTraceId: 9e2f0d892aa67a8a
hexProcessSpanId: a789c630886101be
processTraceId: 11398344064675248778
processSpanId: 12072398187892113854

phpvfscomposer://tests/vendor/phpunit/phpunit/phpunit:106
View all
This comment will be updated automatically if new data arrives.
🔗 Commit SHA: b0dd55c | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback!

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.68%. Comparing base (17c83a1) to head (b0dd55c).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3573      +/-   ##
==========================================
- Coverage   68.74%   68.68%   -0.07%     
==========================================
  Files         166      166              
  Lines       19030    19030              
  Branches     1797     1797              
==========================================
- Hits        13082    13070      -12     
- Misses       5132     5145      +13     
+ Partials      816      815       -1     
Flag Coverage Δ
helper-rust-integration 78.82% <ø> (ø)
helper-rust-unit 49.17% <ø> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 2 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17c83a1...b0dd55c. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Jan 13, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-03-17 18:07:49

Comparing candidate commit 1db24ed in PR branch leiyks/sidecar-threaded-fallback with baseline commit 7d767af in branch master.

Found 1 performance improvements and 3 performance regressions! Performance is the same for 190 metrics, 0 unstable metrics.

scenario:ComposerTelemetryBench/benchTelemetryParsing

  • 🟩 execution_time [-1442.689ns; -357.311ns] or [-12.022%; -2.978%]

scenario:HookBench/benchHookOverheadTraceMethod-opcache

  • 🟥 execution_time [+4.522µs; +8.446µs] or [+2.645%; +4.941%]

scenario:SamplingRuleMatchingBench/benchRegexMatching4

  • 🟥 execution_time [+53.720ns; +138.480ns] or [+4.547%; +11.721%]

scenario:TraceAnnotationsBench/benchTraceAnnotationOverhead-opcache

  • 🟥 execution_time [+4.289µs; +7.539µs] or [+2.449%; +4.305%]

@Leiyks Leiyks force-pushed the leiyks/sidecar-threaded-fallback branch 4 times, most recently from ce20a9e to 2eb0609 Compare January 14, 2026 14:41
@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Jan 14, 2026

Benchmarks [ profiler ]

Benchmark execution time: 2026-03-20 13:07:55

Comparing candidate commit b0dd55c in PR branch leiyks/sidecar-threaded-fallback with baseline commit 17c83a1 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 28 metrics, 8 unstable metrics.

@Leiyks Leiyks force-pushed the leiyks/sidecar-threaded-fallback branch 9 times, most recently from 06223eb to cda73e8 Compare January 16, 2026 15:24
@Leiyks Leiyks force-pushed the leiyks/sidecar-threaded-fallback branch 9 times, most recently from 211d162 to 0b1eb24 Compare January 23, 2026 12:42
Leiyks added 26 commits March 20, 2026 13:25
Signed-off-by: Alexandre Rulleau <[email protected]>
Signed-off-by: Alexandre Rulleau <[email protected]>
…lter

With newer libdatadog (29e00628b), debugger logs are routed to
/debugger/v2/input instead of /debugger/v1/diagnostics. The filter
in replayDebuggerData() still included /debugger/v1/diagnostics with
body (added in c3be656 for older routing), which caused it to return
a diagnostics response before the actual /debugger/v2/input snapshot,
breaking debugger_span_decoration_probe.phpt.

Remove the diagnostics clause — only /debugger/v1/input and
/debugger/v2/input are snapshot/log endpoints.
…back routing

libdatadog 29e00628b routes logs to /debugger/v2/input but falls back to
/debugger/v1/diagnostics when the agent doesn't support v2 (as is the
case with the request replayer in tests). Update the expected URI in
debugger_span_decoration_probe.phpt accordingly.

Also restore the /debugger/v1/diagnostics filter in replayDebuggerData()
which was incorrectly removed in the previous commit.
…ailable

In ddtrace_sidecar_setup_thread_mode, a forked child detecting is_child_process=true
would try to connect to the parent's thread listener. If the parent used subprocess
mode (no thread listener), the connect would fail and the child returned with no
sidecar and no fallback.

Mirror the existing fallback logic from ddtrace_sidecar_handle_fork: when the
parent's listener is unavailable, reset ddtrace_sidecar_master_pid to the current
process and fall through to start a new master listener in this process.
…ct path

When a parent process initializes the sidecar in thread mode, forks, and
then exits, the child inherits a broken transport (parent's listener thread
is dead). In dd_sidecar_connect(), if ddog_sidecar_connect_worker() fails
and current_pid != master_pid, promote the child to master so it can still
submit traces.

The existing fallback in ddtrace_sidecar_setup_thread_mode covers the
initial-setup path, but the reconnect path (ddtrace_sidecar_connect_callback
-> dd_sidecar_connect) had no equivalent fallback, causing a silent failure
for orphaned children that already had an inherited transport.

Add a .phpt test that verifies the orphaned child can create and submit
spans after the parent exits.
… compatibility

Thread-mode sockets now include the master's effective uid in the filename
(libdd.<ver>@<uid>-<pid>.sock in /tmp/libdatadog/). A worker process that later
drops privileges via setuid() (e.g. www-data under PHP-FPM) still computes the
same socket path as the master listener, and ensure_dir_exists now best-effort
chmods the directory world-writable to allow socket creation by any user.

Also fixes a double-dot bug in socket/lock path construction (Rust >=1.87
no longer strips leading dots from with_extension arguments).

Adds test: sidecar_thread_mode_permissions.phpt verifies the socket is created
with the correct uid-pid encoding.
Update libdatadog submodule: thread mode sidecar now uses abstract Unix
sockets on Linux (no filesystem permissions needed, any user can connect)
and a single-threaded Tokio runtime (no extra OS threads, fixes LSan
"Running thread was not suspended" ASAN warnings at process exit).

Update sidecar_thread_mode_permissions.phpt to verify abstract socket
usage (no filesystem socket created) instead of checking file permissions.
Signed-off-by: Alexandre Rulleau <[email protected]>
After fork, the child inherits the parent's heap including the Tokio
current_thread runtime allocations and Rust stdlib once-cell inits from
the sidecar thread. Since threads don't survive fork, these are orphaned
allocations that LSan incorrectly reports as leaks. This is the same
reason daemonize() already sets LSAN_OPTIONS=detect_leaks=0 for the
subprocess sidecar.
When the PHP-FPM master process runs as root, the sidecar thread (thread
mode) creates named SHM objects with 0600 by default, making them
inaccessible to worker processes running as www-data.

Call ddog_sidecar_set_shm_open_mode(0644) before starting the master
sidecar listener when geteuid()==0, so workers can open the SHM regions
read-only. The mode is set in both ddtrace_sidecar_setup_thread_mode()
and ddtrace_sidecar_minit() to cover all code paths.
…-user SHM

Replace the incorrect set_shm_open_mode(0644) workaround with the proper
fchown-based fix implemented in the libdatadog submodule. The SHM ownership
is now transferred to the worker's UID (obtained via SO_PEERCRED on first
connection) rather than relaxing file permissions.

- Remove ddog_sidecar_set_shm_open_mode() calls from thread mode setup
- Remove declaration from components-rs/sidecar.h
- Update libdatadog submodule to pick up the fchown implementation
…er thread

In thread mode with PHP-FPM, dd_activate_once (called via pthread_once on
first RINIT) runs independently in each worker process since the master never
serves requests. When subprocess mode fails and thread mode is attempted, each
worker hit the fallback path in ddtrace_sidecar_setup_thread_mode() that called
ddog_sidecar_connect_master() — starting a new listener thread per worker.

The master listener must only be started in MINIT (via ddtrace_sidecar_minit())
in the master process, so it survives PHP-FPM forking. When a worker cannot
connect to the master's listener, it now logs a warning and runs without the
sidecar instead of spawning its own thread.

Non-child processes (master, CLI) retain the ability to start a new listener
as a fallback, preserving the behavior requested in earlier review feedback.
…er SHM

Adds a test that exercises the fchown() cross-user SHM path: PHP-FPM master
runs as root, workers switch to an unprivileged user (www-data/daemon/nobody),
and thread mode is used. This is the exact scenario that motivated the
SO_PEERCRED + fchown() fix.

Infrastructure changes:
- www.conf: add {{user_group}} placeholder for optional user/group directives
- PhpFpm.php: accept $fpmUser/$fpmGroup constructor params
- WebServer.php: add setPhpFpmUser() and pass it through to PhpFpm
- WebFrameworkTestCase.php: add configureWebServer() hook (called before
  start()) so subclasses can apply extra server config without reimplementing
  the full setUpWebServer() logic

New test (SidecarThreadModeRootTest):
- Skips if not root, not fpm-fcgi SAPI, or no unprivileged user available
- testTracesAreSubmittedWithRootMasterAndUnprivilegedWorker: a single request
  through a root-master/www-data-worker FPM pool must produce traces — failure
  means the worker cannot access the SHM after fchown()
- testMultipleWorkersShareSingleMasterListenerThread: 3 requests with multiple
  workers must all succeed, ensuring the per-worker-thread regression is caught
`void` return type hint was introduced in PHP 7.1; PHP 7.0 CI jobs
were failing with TypeError on every WebFrameworkTestCase subclass.
Remove the DD_TRACE_TEST_SAPI=fpm-fcgi skip condition and instead call
putenv() to force FPM mode before the parent sets up the web server.
This allows the test to run in every test_web_custom matrix entry
(cli-server, cgi-fcgi, apache2handler) rather than only when the CI job
explicitly sets fpm-fcgi SAPI.
Use untilNumberOfTraces(3) so the test agent waits for all 3 traces
before collecting, instead of returning early after the first one.
Also restore the >= 3 assertion which is now reliable.
Add $runAsSudo param to PhpFpm and setPhpFpmSudo() to WebServer so
php-fpm can be started as root even when the test runner is non-root.
The test now skips only if neither root nor passwordless sudo is
available, allowing it to run in CI where circleci has NOPASSWD sudo.
…deRootTest

Replace putenv('DD_TRACE_TEST_SAPI=fpm-fcgi') with a new WebServer::setForceSapi()
method that overrides the SAPI for a single WebServer instance only, without
affecting the global process environment used by all subsequent test classes.
Apache2handler uses a static shared instance on the same port as our
FPM+Nginx setup, causing port conflicts. Skip the test in that case;
it still runs under cli-server and cgi-fcgi CI jobs.
@Leiyks Leiyks force-pushed the leiyks/sidecar-threaded-fallback branch from 55f013f to 6f352fa Compare March 20, 2026 12:37
Signed-off-by: Alexandre Rulleau <[email protected]>
@Leiyks Leiyks force-pushed the leiyks/sidecar-threaded-fallback branch from 6f352fa to b0dd55c Compare March 20, 2026 12:48
@Leiyks Leiyks merged commit d442722 into master Mar 20, 2026
2071 of 2082 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants