[DBM] Add container tags hash to queries (if enabled)#8061
Conversation
## Summary of changes
Replaced the custom mutex guard with `std::lock_guard`, using `std::recursive_mutex` instead of `CRITICAL_SECTION` on Windows and `std::mutex` with guard rails on Linux.
## Reason for change
Some deadlocks were spotted in smoke tests, which could be caused by the lack of recursive locking support in `std::mutex`.
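The failure mode described above (a thread re-acquiring a lock it already holds) can be illustrated in Python, where `threading.Lock` is non-reentrant, like `std::mutex`, while `threading.RLock` is reentrant, like `std::recursive_mutex`. This is a sketch of the concept, not the PR's C++ code:

```python
import threading

def demo(lock):
    # Acquire with a timeout so the non-reentrant case fails fast
    # instead of deadlocking the demo thread.
    with lock:
        acquired_again = lock.acquire(timeout=0.1)
        if acquired_again:
            lock.release()
        return acquired_again

print(demo(threading.RLock()))  # True: same thread may re-acquire a reentrant lock
print(demo(threading.Lock()))   # False: re-acquiring a plain lock would deadlock
```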
## Implementation details
## Test coverage
## Other details
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4fd01fab6f
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AdoNet/DbScopeFactory.cs
**Benchmarks** (execution time: 2026-03-25 14:05:49, comparing candidate commit 8fed285 in the PR branch): found 11 performance improvements and 11 performance regressions. Performance is the same for 253 metrics; 13 metrics are unstable.
I just realized I need to put process tags in there too
bouwkast
left a comment
I think the main question that I have is that it appears this is correctly following the RFC in how we propagate the hash, but the merged Python implementation recomputes the hash.
But the RFC isn't precise enough in describing the hash and expected behavior / requirements for me to know which is correct really
tracer/src/Datadog.Trace/DatabaseMonitoring/DatabaseMonitoringPropagator.cs
```csharp
getDiscoveryServiceFunc: static s => DiscoveryService.CreateUnmanaged(
    s.TracerSettings.Manager.InitialExporterSettings,
    ContainerMetadata.Instance,
    new ServiceRemappingHash(null),
```
this one I'm not 100% sure, but since it's only used for DBM for now, I don't think it'd play any role in that code path, so it should be safe to hardcode a disabled instance
I agree, this seems fine to me, I can't imagine we support DBM or DSM with CI Visibility today 🤔
Force-pushed: 2bf7dde to 91dc0c7
tracer/src/Datadog.Trace/DatabaseMonitoring/DatabaseMonitoringPropagator.cs
tracer/src/Datadog.Trace/ClrProfiler/AutoInstrumentation/AdoNet/DbScopeFactory.cs
```csharp
if (!_warnedOnSet)
{
    _warnedOnSet = true;
    Log.Error("The code is trying to set the value '{Value}' to {Prop}, but this has no effect in .NET Framework.", value, nameof(ContainerTagsHash));
}
```
This log is now gone in the new version. Intentional?
yes 🤷
I mean, I could add an #if NETFRAMEWORK in the set body, but I'm not sure it's really worth the extra code...
From the conversation in #8363 (closed), it's ok to remove this.
related: #8363
Instead of guarding the caller with #if !NETFRAMEWORK, make the setter a silent no-op. This avoids conflict with #8061 which replaces the caller entirely.
andrewlock
left a comment
Nice Job 👍
I added a bunch of comments with suggestions/questions/nitpicks, but there's nothing really blocking (other than potentially setting names etc. and hash versions)
```csharp
ContainerTagsHash = containerTagsHash;

if (_serializedProcessTags != null)
{
    Base64Value = Compute(_serializedProcessTags, containerTagsHash);
}
```
So just to confirm, we only have a Base64Value if we have both process tags and a container hash. And we don't care about the race condition/consistency here? e.g. for outside observers it's possible for there to be a ContainerTagsHash but no Base64Value, and vice versa. As long as we're fine with those limitations, this is fine. Otherwise, we can create a "holder" type to hold both values, and you can swap that atomically.
yes there can be (for a brief period) a ContainerTagsHash but a stale Base64Value, it's fine, no precise timing needed here.
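The "holder" alternative mentioned in this exchange (bundling both values into one immutable object and swapping the reference atomically, so observers always see a matched pair) can be sketched roughly as follows. This is an illustrative Python stand-in, not the PR's code; `compute` and the class names here are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Optional

def compute(process_tags: str, container_tags_hash: Optional[str]) -> str:
    # Stand-in for the real FNV-64 + base64 computation in the PR.
    return f"{process_tags}|{container_tags_hash}"

@dataclass(frozen=True)
class HashState:
    # Immutable pair: any reader sees both fields from the same update.
    container_tags_hash: Optional[str]
    base64_value: Optional[str]

class ServiceRemappingHashHolder:
    def __init__(self) -> None:
        self.state = HashState(None, None)

    def update(self, container_tags_hash, serialized_process_tags) -> None:
        b64 = None
        if serialized_process_tags is not None:
            b64 = compute(serialized_process_tags, container_tags_hash)
        # Single reference assignment: readers can never observe a
        # ContainerTagsHash from one update paired with a Base64Value
        # from another.
        self.state = HashState(container_tags_hash, b64)
```

With two separate fields, as the PR does, a reader can briefly observe a new hash next to a stale Base64Value, which the author confirms is acceptable here.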
```csharp
private static string Compute(string processTags, string? containerTagsHash)
{
    var hash = FnvHash64.GenerateHash(processTags, FnvHash64.Version.V1);
```
Do we definitely want to use v1 here, not V1A? IIRC, most places use V1A? 🤔
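The V1 vs V1A question matters because the two variants differ only in the order of the multiply and XOR per byte, yet produce different digests, so every language implementation must agree on the variant. A minimal 64-bit sketch of both:

```python
FNV64_PRIME = 0x100000001B3
FNV64_OFFSET = 0xCBF29CE484222325
MASK = (1 << 64) - 1  # keep the running hash in 64 bits

def fnv1_64(data: bytes) -> int:
    h = FNV64_OFFSET
    for b in data:
        h = (h * FNV64_PRIME) & MASK  # FNV-1: multiply first...
        h ^= b                        # ...then XOR
    return h

def fnv1a_64(data: bytes) -> int:
    h = FNV64_OFFSET
    for b in data:
        h ^= b                        # FNV-1a: XOR first...
        h = (h * FNV64_PRIME) & MASK  # ...then multiply
    return h
```

For example, `fnv1a_64(b"a")` is the well-known test vector `0xAF63DC4C8601EC8C`, while `fnv1_64(b"a")` yields a different value.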
```csharp
#endif

BinaryPrimitives.WriteUInt64LittleEndian(buf, hash); // write 8 bytes into a 12-byte buffer
Base64.EncodeToUtf8InPlace(buf, dataLength: 8, out var bytesWritten);
```
Technically we should check the return value for correct encoding - I'm kinda surprised the new analyzers don't flag it. But also, this is fine, and we can make other assumptions later, so big meh 🤷♂️
hmm, yeah, but the return value is either Done or DestinationTooSmall, and DestinationTooSmall should never happen because we computed the size deterministically...
I could throw an exception if this happens, so it gets caught in tests if we ever put more data in there.
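The size arithmetic being relied on here: base64 turns every 3 input bytes into 4 output characters, so 8 bytes always encode to exactly 12 characters (including `=` padding), which is why the 12-byte destination buffer can never be too small. An illustrative Python equivalent of the C# snippet:

```python
import base64
import struct

def encode_hash(hash64: int) -> str:
    raw = struct.pack("<Q", hash64)  # 8 bytes, little-endian (like WriteUInt64LittleEndian)
    encoded = base64.b64encode(raw)  # ceil(8 / 3) * 4 = 12 output bytes, '=' padded
    assert len(encoded) == 12        # the invariant the fixed-size buffer relies on
    return encoded.decode("ascii")
```

If the payload ever grows beyond 8 bytes, the 12-character assumption breaks, which is exactly the case the author suggests surfacing with an exception.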
```diff
  telemetryProxyEndpoint: telemetryProxyEndpoint,
  tracerFlareEndpoint: tracerFlareEndpoint,
- containerTagsHash: _containerMetadata.ContainerTagsHash, // either the value just received, or the one we stored before (prevents overriding with null)
+ containerTagsHash: _serviceRemappingHash.ContainerTagsHash, // either the value just received, or the one we stored before (prevents overriding with null)
```
Is that correct though? 🤔 If the agent starts sending us null, is that not because something changed about the instance? Should we be second guessing the values we're sent here? (I haven't checked if the RFC spells this out)
I don't really see a practical case where we'd want to reset our container tags hash... Either the tags change or they stay the same, but they can't really disappear; that would mean the app exited the container while still running ^^
At the same time, I don't see a real reason why the agent would reply with null, BUT it could happen if some code paths don't request the value in the headers, which could happen for good reasons (for performance, we could decide not to request the hash on every single request if we already have one). I feel like it's safer to write it this way, especially considering how far in the code the part "requesting" the header is.
```csharp
// Configuration key for setting the DBM propagation mode.
// Default value is disabled; expected values are: disabled, service, or full.
// <seealso cref="Datadog.Trace.Configuration.TracerSettings.DbmPropagationMode"/>
DD_DBM_INJECT_SQL_BASEHASH:
```
Sigh this configuration key should end with _ENABLED so it should be something like DD_DBM_SQL_BASEHASH_INJECTION_ENABLED. If this ship hasn't already sailed in other languages, can we please fix it 🙏 🥺
```csharp
namespace Datadog.Trace.Tests;

public class ServiceRemappingHashTests
```
Do we have any example values from the RFC that we can use to validate the actual expected hash given a set of process tags and container tags hash? Would be useful for validating that the calculation is correct across languages
no expected hash; as I said in another reply above, we just need to be consistent with ourselves
Co-authored-by: Andrew Lock <[email protected]>
## Summary of changes
Add the ability to write the container tags hash to DBM queries and to the related span.
The goal is that DBM would then query the spans bearing that hash, and use the container tags on those spans to enrich the queries.
This is controlled by a setting that is disabled by default, and would be enabled if the propagation mode is "service" or greater.
See the RFC: https://docs.google.com/document/d/15GtNOKGBCt6Dc-HsDNnMmCdZwhewFQx8yUlI9in5n3M
related PR in python: DataDog/dd-trace-py#15293
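The propagation mechanism described above can be sketched as follows. This is an illustrative Python stand-in, not the PR's `DatabaseMonitoringPropagator` code; the function name and comment placement are assumptions, though `ddsh` is the key named in the implementation details:

```python
def inject_base_hash(query: str, base_hash, enabled: bool) -> str:
    # Leave the query untouched if the feature is off or no hash is known yet.
    if not enabled or not base_hash:
        return query
    # Prepend a key='value' SQL comment, sqlcommenter-style, that the
    # database keeps alongside the query in its logs.
    return f"/*ddsh='{base_hash}'*/ {query}"

print(inject_base_hash("SELECT 1", "q83zSkKLR0M=", enabled=True))
# /*ddsh='q83zSkKLR0M='*/ SELECT 1
```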
## Reason for change
DBM and DSM propagate service context in outbound communications (SQL comments, message headers), but neither product has awareness of the container environment (e.g., `kube_cluster`, `namespace`, `pod_name`). Propagating full container tags is not feasible due to cardinality constraints (query cache invalidation in OracleDB/SQL Server, exponential pathway growth in DSM) and size limitations (64–128 bytes for DBM non-comment methods). This is needed for the service renaming initiative (defining services based on container names) and APM primary tags (container-based dimensions like the Kubernetes cluster).
The solution: the agent computes a hash of low-cardinality container tags and back-propagates it to the tracer, which includes it in outbound DBM/DSM communications. DBM then resolves the hash by correlating with APM spans that carry the same hash as a span tag.
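End to end, the tracer-side computation can be sketched like this. Python stand-in for the C#; the exact way the process tags and container tags hash are combined is an assumption based on the review discussion, not the PR's verbatim code:

```python
import base64
import struct

FNV64_PRIME = 0x100000001B3
FNV64_OFFSET = 0xCBF29CE484222325
MASK = (1 << 64) - 1

def fnv64_v1(data: bytes) -> int:
    # FNV-1 (multiply, then XOR), matching FnvHash64.Version.V1 used in the PR.
    h = FNV64_OFFSET
    for b in data:
        h = (h * FNV64_PRIME) & MASK
        h ^= b
    return h

def compute_base_hash(serialized_process_tags: str, container_tags_hash) -> str:
    # Assumed combination: process tags concatenated with the agent-provided
    # container tags hash; the PR's actual combination scheme may differ.
    payload = serialized_process_tags + (container_tags_hash or "")
    digest = fnv64_v1(payload.encode("utf-8"))
    # 8-byte little-endian digest -> 12 base64 chars (incl. '=' padding).
    return base64.b64encode(struct.pack("<Q", digest)).decode("ascii")
```

The reviewers note that since no cross-language test vectors exist in the RFC, the only hard requirement is that the tracer is consistent with itself.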
## Implementation details
- `BaseHash` static class that computes an FNV-64 hash of `ProcessTags.SerializedTags` combined with the container tags hash from the agent, encoded as base64
- Container tags hash received from the agent via `DiscoveryService`, stored in `ContainerMetadata.ContainerTagsHash`
- `ContainerMetadata` converted from a static to an instance class (singleton via `ContainerMetadata.Instance`) to improve testability
- `DatabaseMonitoringPropagator` injects the base hash into SQL comments (as `ddsh`) when `DD_DBM_INJECT_SQL_BASEHASH` is true
- `_dd.dbm_container_tags_hash` span tag added on `SqlTags` so DBM can correlate the hash back to the span's container tags
- New setting `DD_DBM_INJECT_SQL_BASEHASH` (disabled by default), intended to be enabled when the DBM propagation mode is `service` or higher
- `MinimalAgentHeaderHelper` for agent communication

## Test coverage
Adding a test in DbScopeFactoryTests.cs forced me to inject the value from pretty high up, which I find a bit "dirty", but at least we don't have to rely on a global static instance in tests.
## Other details