Skip to content

fix(tracing): intern process_tags.serialized to avoid ZTS refcount race#3831

Merged
morrisonlevi merged 4 commits intomasterfrom
levi/fix-process-tags-zts-refcount-race
Apr 27, 2026
Merged

fix(tracing): intern process_tags.serialized to avoid ZTS refcount race#3831
morrisonlevi merged 4 commits intomasterfrom
levi/fix-process-tags-zts-refcount-race

Conversation

@morrisonlevi
Copy link
Copy Markdown
Collaborator

@morrisonlevi morrisonlevi commented Apr 26, 2026

SCP-11166

Description

Should help with #3729.

ext/process_tags.c stores a single process-wide zend_string at first-request init (zend_string_init persistent=1) and shares it across all ZTS threads. Every span serialization on a first-span path passes that pointer to Rust, which calls back into PHP's zend_string_addref/zend_string_release through ddog_init_span_func to manage a refcount on its Bytes wrapper. Those ops are non-atomic ++/-- on a single shared field; under FrankenPHP + PHP 8.4 ZTS with concurrent worker threads at load, the races produce torn refcount values, premature frees, and use-after-free reads of process_tags.serialized during convert_zend_to_bytes_string -> String::from_utf8_lossy, surfacing as SIGSEGV / SIGABRT / "memory allocation of bytes failed" / glibc "double free or corruption".

Mark the string IS_STR_INTERNED (= GC_IMMUTABLE) on creation so the refcount helpers see it as immutable and skip the ++/--. Clear the flag in clear_process_tags before the release so the persistent allocation is actually freed at module shutdown.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

morrisonlevi and others added 2 commits April 24, 2026 22:13
…refcount race

ext/process_tags.c stores a single process-wide zend_string at first-request
init (zend_string_init persistent=1) and shares it across all ZTS threads.
Every span serialization on a first-span path passes that pointer to Rust,
which calls back into PHP's zend_string_addref/zend_string_release through
ddog_init_span_func to manage a refcount on its Bytes wrapper. Those ops are
non-atomic ++/-- on a single shared field; under FrankenPHP + PHP 8.4 ZTS
with concurrent worker threads at load, the races produce torn refcount
values, premature frees, and use-after-free reads of process_tags.serialized
during convert_zend_to_bytes_string -> String::from_utf8_lossy, surfacing as
SIGSEGV / SIGABRT / "memory allocation of <huge> bytes failed" / glibc
"double free or corruption".

Mark the string IS_STR_INTERNED (= GC_IMMUTABLE) on creation so the refcount
helpers see it as immutable and skip the ++/--. Wire dd_zend_string_try_addref
/ dd_zend_string_try_release wrappers (matching Zend's GC_TRY_* convention)
so the no-op behavior is explicit and robust across PHP versions where the
zend_string_addref interned check is incomplete or returns no value. Clear
the flag in clear_process_tags before the release so the persistent
allocation is actually freed at module shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ng interned

Interned zend_strings are expected to have h pre-populated (callers may read
ZSTR_H(s) raw). zend_string_init leaves h zero, so setting IS_STR_INTERNED
without computing the hash leaves any hash-keyed lookup broken. Force the
compute via zend_string_hash_val() before adding the flag.

Squashable with the previous fix commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 Bot commented Apr 26, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 60.65% (-0.04%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 0aefc97 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Apr 26, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-04-27 11:37:10

Comparing candidate commit 0aefc97 in PR branch levi/fix-process-tags-zts-refcount-race with baseline commit 96c5f06 in branch master.

Found 1 performance improvements and 3 performance regressions! Performance is the same for 190 metrics, 0 unstable metrics.

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟩 execution_time [-3.739µs; -3.021µs] or [-3.345%; -2.703%]

scenario:PDOBench/benchPDOOverhead

  • 🟥 execution_time [+5.636µs; +7.502µs] or [+2.315%; +3.082%]

scenario:PDOBench/benchPDOOverheadWithDBM

  • 🟥 execution_time [+5.485µs; +8.359µs] or [+2.251%; +3.431%]

scenario:PHPRedisBench/benchRedisOverhead

  • 🟥 execution_time [+24.625µs; +38.162µs] or [+2.587%; +4.010%]

Comment thread ext/ddtrace.c
@bwoebi bwoebi marked this pull request as ready for review April 27, 2026 12:45
@bwoebi bwoebi requested a review from a team as a code owner April 27, 2026 12:45
Copy link
Copy Markdown
Collaborator

@bwoebi bwoebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the redundant interned checks, looks fine now.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0aefc97d51

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ext/process_tags.c
Comment on lines +59 to 60
GC_DEL_FLAGS(process_tags.serialized, IS_STR_INTERNED);
zend_string_release(process_tags.serialized);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid freeing interned process tags during runtime reload

Once IS_STR_INTERNED is set, the Rust conversion path stops taking a real reference (ddog_addref/release become no-ops for interned strings), so spans can retain borrowed pointers to process_tags.serialized without ownership. Releasing the string here during dd_trace_internal_fn("reload_process_tags") can free that memory while previously-created spans still hold it (for example, first span closed before reload and trace flushed later), which creates a use-after-free when those spans are later serialized or dropped.

Useful? React with 👍 / 👎.

@morrisonlevi morrisonlevi merged commit 7a66ef9 into master Apr 27, 2026
2103 of 2105 checks passed
@morrisonlevi morrisonlevi deleted the levi/fix-process-tags-zts-refcount-race branch April 27, 2026 14:10
@github-actions github-actions Bot added this to the 1.19.0 milestone Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants