chore: upgrade workspace rust edition to 2024 #96

Merged

duncanpharvey merged 2 commits into duncan-harvey/config-crate on Mar 11, 2026
Conversation

duncanista approved these changes Mar 11, 2026

duncanpharvey added a commit that referenced this pull request Mar 11, 2026
* chore(bottlecap): make config a folder module (#242)
* remove `config.rs` file
* create `config/mod.rs`
* move to `config/flush_strategy.rs`
* move to `config/log_level.rs`
* update imports
* fmt
* feat(bottlecap): add logs processing rules (#243)
* add logs processing rules field
* add `regex` crate
* add `processing_rules.rs` config module
* use `processing_rule` module instead
* update logs `processor` to use compiled rules
* update unit test
* Svls 4825 support encrypted keys manual (#258)
* add plumbing for aws secret manager
* strip as much deps as possible
* fix test
* remove unused warning
* reorg runner for bottlecap
* fix overwriting of arch
* add full error to the panic
* avoid building the go agent all the time
* rename module
* speed up build
* add simple scripts to build and publish
* remove deleted call
* remove changes from common scripts
* resolve import conflicts
* wrong file pushed
* make sure permissions are right
* move secret parsing after log activation
* add some stat to build
* add manual req for secret (still broken)
* rebuild after conflict on cargo loc
* automate update and call
* change headers and fix signature
* fix typo and small refactor
* remove useless thread spawn
* small refactors on deploy scripts
* use access key always for signatures
* the secret has to be used to sign
* fix: missing newline in request
* use only manual decrypt
* add timed steps
* add scripts to force restarts
* fix launch script
* refactor decrypt
* cargo format and clippy
* fix clippy error, add formatting/clippy functions

---------

Co-authored-by: AJ Stuyvenberg <[email protected]>

* add kms handling (#261)
* add kms handling
* fix return value
* fix test
* fix kms
* remove committed test file
* rename
* format
* fmt after fix
* fix conflicts
* await async stuff
* formatting
* bubble up error converting to sdt
* use box dyn for generic errors
* reformat
* address other comments
* remove old build file added with conflict
* Svls 4978 handle secrets error (#271)
* add kms handling
* fix return value
* fix test
* fix kms
* remove committed test file
* rename
* format
* fmt after fix
* fix conflicts
* await async stuff
* formatting
* bubble up error converting to sdt
* use box dyn for generic errors
* reformat
* address other comments
* remove old build file added with conflict
* do not pass around the whole config for just the secret
* fix scope and just bubble up errors
* reformat
* renaming
* without api key, just call next loop
* fix types and format
* fix folder path
* fix cd and returns
* resolve conflicts
* formatter
* chore(bottlecap): log failover reason (#292)
* print failover reason as json string
* fmt
* update key to be more verbose
* Add APM tracing support (#294)
* wip: tracing
* feat: tracing WIP
* feat: rename mini agent to trace agent
* feat: fmt
* feat: Fix formatting after rename
* fix: remove extra tokio task
* feat: allow tracing
* feat: working v5 traces
* feat: Update to use my branch of libdatadog so we have v5 support
* feat: Update w/ libdatadog to pass trace encoding version
* feat: update w/ merged libdatadog changes
* feat: Refactor trace agent, reduce code duplication, enum for trace version. Pass trace provider. Manual stats flushing. Custom create endpoint until we clean up that code in libdatadog.
* feat: Unify config, remove trace config. Tests pass
* feat: fmt
* feat: fmt
* clippy fixes
* parse time
* feat: clippy again
* feat: revert dockerfile
* feat: no-default-features
* feat: Remove utils, take only what we need
* feat: fmt moves the import
* feat: replace info with debug. Replace log with tracing lib
* feat: more debug
* feat: Remove call to trace utils
* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy (#296)
* feat: Allow appsec but in a disabled-only state until we add support for the runtime proxy
* feat: Log failover reason
* fix: serverless_appsec_enabled. Also log the reason
* feat: Require DD_EXTENSION_VERSION: next (#302)
* feat: Require DD_EXTENSION_VERSION: next
* feat: add tests, fix metric tests
* feat: revert metrics test byte changes
* feat: fmt
* feat: remove ref
* feat: honor enhanced metrics bool (#307)
* feat: honor enhanced metrics bool
* feat: add test
* feat: refactor to log instead of return result
* fix: clippy
* feat: warn by default (#316)
* chore(bottlecap): fallback on `datadog.yaml` usage (#326)
* fallback on `datadog.yaml` presence
* add comment
* fix(bottlecap): filter debug logs from external crates (#329)
* remove `tracing-log`; instead, use the `tracing-subscriber` `tracing-log` feature
* capitalize debugs
* remove unnecessary file
* update log formatter prefix
* update log filter
* fmt
* chore(bottlecap): switch flushing strategy to race (#318)
* feat: race flush
* refactor: periodic only when configured
* fmt
* when flushing strategy is default, set periodic flush tick to `1s`
* on `End`, never flush until the end of the invocation
* remove `tokio_unstable` feature for building
* remove debug comment
* remove `invocation_times` mod
* update `flush_control.rs`
* use `flush_control` in main
* allow `end,<ms>` strategy, which allows flushing periodically over a given amount of seconds and at the end
* update `debug` comment for flushing
* simplify logic for flush strategy parsing
* remove log that could spam debug
* refactor code and add unit test

---------

Co-authored-by: jordan gonzález <[email protected]>
Co-authored-by: alexgallotta <[email protected]>

* remove log that might confuse customers (#333)
* Fix dogstatsd multiline (#335)
* test: add invalid string and multi line distro test with empty newline
* test: move unit test to appropriate package
* fix: do not error log for empty and new line strings

---------

Co-authored-by: jordan gonzález <[email protected]>

* add env vars to be ignored (#337)
* feat: Open up more env vars which we don't rely on (#344)
* feat: Allow trace disabled plugins (#348)
* feat: Allow trace disabled plugins
* feat: trace debug
* feat: Allowlist additional env vars (#354)
* feat: Allowlist additional env vars
* fix: fmt
* feat: and repo url
* aj/allow apm replace tags array (#358)
* fix: allow objects to be ignored
* feat: specs
* fix(bottlecap): set explicit deny list and allow yaml usage (#363)
* set explicit deny list, also allow `datadog.yaml` usage
* add unit test for parsing rule from yaml
* remove `object_ignore.rs`
* remove import
* remove logging failover reason when user is not opt-in
* chore(bottlecap): fast failover (#371)
* failover fast
* typo
* failover on `/opt/datadog_wrapper` set
* aj/fix log level casing (#372)
* feat: serde's rename_all isn't working, use a custom deserializer to lowercase loglevels
* feat: default is warn
* feat: Allow repetition to clear up imports
* feat: rebase
* feat: failover on dd proxy (#391)
* feat: support HTTPS_PROXY (#381)
* feat: support DD_HTTP_PROXY and DD_HTTPS_PROXY
* fix: remove import
* fix: fmt
* feat: Revert fqdn changes to enable testing
* feat: Use let instead of repeated instantiation
* feat: Rip out proxy stuff we don't need but make sure we don't proxy the telemetry or runtime APIs with system proxies
* feat: remove debug
* fix: no debugs for hyper/h2
* fix: revert cargo changes
* feat: Pin libdatadog deps to v13.1
* fix: rebase with dogstatsd 13.1
* fix: use main for dsdrs
* fix: remove unwrap
* fix: fmt
* fix: licenses
* increase size boo
* fix: size ugh
* fix: install_default() in tests
* aj/honor both proxies in order (#410)
* feat: Honor priority order of DD_PROXY_HTTPS over HTTPS_PROXY
* feat: fmt
* fix: Prefer Ok over some + ok
* Feat: Use tags for proxy support in libdatadog
* fix: no proxy for tests
* fix: license
* all this for a comma
* accept `datadog_wrapper`
* Revert "accept `datadog_wrapper`"

This reverts commit 9560657582f2f22c8e68af5d0bb9d7d2b0765650.
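The proxy precedence introduced in #381 and #410 above (the Datadog-specific `DD_PROXY_HTTPS` wins over the conventional `HTTPS_PROXY`) reduces to a small resolver. The following is an illustrative sketch, not the extension's actual code; the function name and the empty-value handling are assumptions:

```rust
/// Pick the HTTPS proxy, preferring the Datadog-specific variable over the
/// conventional one, per the precedence described in #410.
/// Hypothetical sketch: name and empty-value handling are assumptions.
fn resolve_https_proxy(dd_proxy_https: Option<&str>, https_proxy: Option<&str>) -> Option<String> {
    dd_proxy_https
        .or(https_proxy)           // DD_PROXY_HTTPS takes priority
        .filter(|v| !v.is_empty()) // treat an empty value as unset
        .map(str::to_owned)
}

fn main() {
    // Both set: the DD_-prefixed variable wins.
    assert_eq!(
        resolve_https_proxy(Some("http://dd-proxy:8080"), Some("http://system-proxy:3128")).as_deref(),
        Some("http://dd-proxy:8080")
    );
    // Only the conventional variable set: fall back to it.
    assert_eq!(
        resolve_https_proxy(None, Some("http://system-proxy:3128")).as_deref(),
        Some("http://system-proxy:3128")
    );
}
```

The `or` chain makes the precedence explicit in one expression, which is easy to extend if more sources (e.g. a YAML `proxy` block, as in #523) are added later.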
* accept `datadog_wrapper` (#373)
* feat(bottlecap): create Inferred Spans baseline + infer API Gateway HTTP spans (#405)
* add `Trigger` trait for inferred spans
* add `ApiGatewayHttpEvent` trigger
* add `SpanInferrer`
* make `invocation::processor` use `SpanInferrer`
* send `aws_config` to `invocation::processor`
* use incoming payload for `invocation::processor` for span inferring
* add `api_gateway_http_event.json` for testing
* add `api_gateway_proxy_event.json` for testing
* fix: Convert tag hashmap to sorted vector of tags
* fix: fmt

---------

Co-authored-by: AJ Stuyvenberg <[email protected]>

* feat(bottlecap): Add Composite Trace Propagator (#413)
* add `trace_propagation_style.rs`
* add Trace Propagation to `config.rs`; also updated unit tests, as we have custom behavior, we should check only the fields we care about in the tests
* add `links` to `SpanContext`
* add composite propagator, also known as our internal http propagator, but in reality, http doesn't make any sense to me, it's just a composite propagator which we used based on our configuration
* update `TextMapPropagator`s to comply with interface, also updated the naming
* fmt
* add unit testing for `config.rs`
* add `PartialEq` to `SpanContext`
* correct logic in `text_map_propagator.rs`; logic was wrong in some parts, this was discovered through unit tests
* add unit tests for `DatadogCompositePropagator`, also corrected some logic
* feat(bottlecap): add capture lambda payload (#454)
* add `tag_span_from_value`
* add `capture_lambda_payload` config
* add unit testing for `tag_span_from_value`
* update listener `end_invocation_handler`; parsing should not be handled here
* add capture lambda payload feature, also parse body properly, and handle `statusCode`
* feat(bottlecap): add Cold Start Span + Tags (#450)
* add some helper functions to `invocation::lifecycle` mod
* create cold start span on processor
* move `generate_span_id` to father module
* send `platform_init_start` data to processor
* send `PlatformInitStart` to main bus
* update cold start `parent_id`
* fix start time of cold start span
* enhanced metrics now have a `dynamic_value_tags` for tags which we have to calculate at points in time
* `AwsConfig` now has a `sandbox_init_time` value
* add `is_empty` to `ContextBuffer`
* calculate init tags on invoke; also add a method to reset processor invocation state
* restart init tags on set
* set tags properly for proactive init
* fix unit test
* remove debug line
* make sure `cold_start` tag is only set in one place
* feat(bottlecap): support service mapping and `peer.service` tag (#455)
* add some helper functions to `invocation::lifecycle` mod
* create cold start span on processor
* move `generate_span_id` to father module
* send `platform_init_start` data to processor
* send `PlatformInitStart` to main bus
* update cold start `parent_id`
* fix start time of cold start span
* enhanced metrics now have a `dynamic_value_tags` for tags which we have to calculate at points in time
* `AwsConfig` now has a `sandbox_init_time` value
* add `is_empty` to `ContextBuffer`
* calculate init tags on invoke; also add a method to reset processor invocation state
* restart init tags on set
* set tags properly for proactive init
* fix unit test
* remove debug line
* make sure `cold_start` tag is only set in one place
* add service mapping config serializer
* add `service_mapping.rs`
* add `ServiceNameResolver` interface for service mapping
* implement interface in every trigger
* send `service_mapping` lookup table to span enricher
* create `SpanInferrer` with `service_mapping` config
* fmt
* rename failover to fallback (#465)
* fix(bottlecap): fallback when otel set (#470)
* fallback on otel
* add unit test
* feat(bottlecap): fallback on opted out only (#473)
* fallback on opted out only
* log on opted out
* fix(bottlecap): fallback on yaml otel config (#474)
* fallback on opted out only
* fallback on yaml otel config
* switch `legacy` to `compatibility`
* feat: honor serverless_logs (#475)
* feat: honor serverless_logs
* fmt

---------

Co-authored-by: jordan gonzález <[email protected]>

* feat: Flush timeouts (#480)
* fix version parsing for number (#492)
* fix: fallback on intake urls (#495)
* fallback on `dd_url`, `dd_url`, and apm and logs intake urls
* fix env var for apm url
* grammar
* set dogstatsd timeout (#497)
* set dogstatsd timeout
* add todo for other edge case
* add comment on jitter. Likely not required for lambda
* fmt
* update license
* update sha for dogstatsd

---------

Co-authored-by: jordan gonzález <[email protected]>

* fix: set right domain and arn by region on secrets manager (#511)
* check whether the region is in China and use the appropriate domain
* correct arn for lambda in chinese regions
* fix: typo in china arn
* fix: reuse function to detect right aws partition and support gov too
* nest and rearrange imports
* fix imports again
* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list (#520)
* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list
* feat: specs
* feat: Oneline check, add comment
* Support proxy yaml config (#523)
* fix: Honor noproxy and skip proxying if ddsite is in the noproxy list
* feat: specs
* feat: yaml proxy had a different format
* feat: Oneline check, add comment
* feat: Support nonstandard proxy config
* feat: specs
* fix: bad merge whoops
* feat: Support snapstart's vended credentials (#532)
* feat: Support snapstart's vended credentials
* feat: Add snapstart events
* fix: specs
* feat: Mutable config as we consume it entirely by the secrets module.
* fix: needless borrow
* feat: add zstd and compress (#558)
* feat: add zstd and compress
* hack: skip clippy for a sec
* feat: Honor logs config settings.
* fix: don't set zstd header unless we compress
* fmt
* clippy
* fmt
* fix: ints
* licenses
* remove debug code
* wtf clippy and fmt, pick one

---------

Co-authored-by: jordan gonzález <[email protected]>

* Svls 6036 respect timeouts (#537)
* log shipping times
* set flush timeout for traces
* remove retries
* fix conflicts
* address comments
* Fallback on gov regions (#550)
* Aj/support pci and custom endpoints (#585)
* feat: logs_config_logs_dd_url
* feat: apm pci endpoints
* feat: metrics
* feat: support metrics using dogstatsd methods
* fix: use the right var
* tests: use server url override
* feat: refactor into flusher method
* feat: clippy
* Aj/yaml apm replace tags (#602)
* feat: yaml APM replace tags rule parsing
* feat: Custom deserializer for replace tags. yaml -> JSON so we can rely on the same method because ReplaceRule is totally private
* remove aj
* feat: merge w/ libdatadog main
* feat: Parse http obfuscation config from yaml
* feat: licenses
* feat: parse env and service as strings or ints (#608)
* feat: parse env and service as strings or ints
* feat: add service test
* fmt
* Add DSM and Profiling endpoints (#622)
- **feat: Support DSM proxy endpoint**
- **feat: profiling support**
- **feat: add additional tags**
* chore(config): parse config only twice (#651)

# What?
Removes `FallbackConfig` and `FallbackYamlConfig` in favor of the existing configurations.

# How?
1. Using only the known places where we are going to fall back from the available configs.
2. Moved environment variables and yaml config to its own file for readability.

# Notes
- Added fallbacks for OTLP (in preparation for that PR, allowed some fields to not fall back).

* fix: Parse DD_APM_REPLACE_TAGS env var (#656)

Fixes an issue where we didn't parse `DD_APM_REPLACE_TAGS` because the yaml block includes an additional `config` word after APM, which is not present in the env var. As usual, env vars override config file settings.

* feat: Optionally disable proc enhanced metrics (#663)

Fixes #648. For customers using very fast/small lambda functions (usually just Rust), there can be a small 1-2ms increase in runtime duration when collecting metrics like open file descriptors or tmp file usage. We still enable these by default, but customers can now optionally disable them.

* fix(config): serialize booleans from anything (#657)

# What?
Serializes any boolean with the values `0|1|true|TRUE|False|false` to its boolean part.

# How?
Using the `serde-aux` crate to leverage the unit testing and ownership.

# Motivation
Some products at Datadog allow these values as they coalesce them – [SVLS-6687](https://datadoghq.atlassian.net/browse/SVLS-6687)

[SVLS-6687]: https://datadoghq.atlassian.net/browse/SVLS-6687?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* chore(config): create `aws` module (#659)

# What?
Refactors methods related to AWS config into its own module

# Motivation
Just cleaning and removing stuff from main – [SVLS-6686](https://datadoghq.atlassian.net/browse/SVLS-6686)

[SVLS-6686]: https://datadoghq.atlassian.net/browse/SVLS-6686?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* feat: [SVLS-6242] bottlecap fips builds (#644)

Building bottlecap with fips mode. This is entirely focused on removing `ring` (and other non-FIPS-compliant dependencies) from our `fips`-featured builds.

* fix(config): remove `apm_ignore_resources` check in OTEL (#676)

# What?
Removes usage of `DD_APM_IGNORE_RESOURCES` in the OTEL span transform.

# Why?
1. The implementation was incorrect and shouldn't check for resources to ignore in the transformation step.
2. It was not properly used in the `apm_config` for YAML files.

# Notes:
- Follow-up PR to implement `APM_IGNORE_RESOURCES` properly in the Trace Agent.

# More
Learn about ignoring resources: https://docs.datadoghq.com/tracing/guide/ignoring_apm_resources/?tab=datadogyaml#ignoring-based-on-resources

`DD_APM_IGNORE_RESOURCES` is specified as:

```
A list of regular expressions can be provided to exclude certain traces based on their resource name. All entries must be surrounded by double quotes and separated by commas.
```

A correct usage would be:

```env
DD_APM_IGNORE_RESOURCES="(GET|POST) /healthcheck,API::NotesController#index"
```

or in yaml:

```yaml
apm_config:
  ignore_resources: ["(GET|POST) /healthcheck","API::NotesController#index"]
```

* feat(proxy): abstract lambda runtime api proxy (#669)

# What?
Abstracts the concept of the `proxy` from the Lambda Web Adapter implementation. This will unlock the usage of ASM.

# How?
Using the `axum` crate, we create a new server proxy with specific routes from the Lambda Runtime API which we are interested in proxying.

# Motivation
ASM and [SVLS-6760](https://datadoghq.atlassian.net/browse/SVLS-6760)

[SVLS-6760]: https://datadoghq.atlassian.net/browse/SVLS-6760?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

* fix(config): fix otlp trace agent to start when right configuration is set (#680)

# What?
Ensures that the OTLP agent is only enabled when `otlp_config_receiver_protocols_http_endpoint` is set, and when `otlp_config_traces_enabled` is `true`

# Motivation
#678

# Notes
The OTEL agent should only spin up when the receiver protocols endpoint is set, so this was a miss on my side.
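Among the config-hardening changes above, #657 makes boolean fields accept the lenient forms `0|1|true|TRUE|False|false`. The extension delegates the actual deserialization to the `serde-aux` crate; the standalone function below only illustrates the coercion rule, and its name is hypothetical:

```rust
/// Coerce the lenient boolean forms accepted since #657
/// (`0|1|true|TRUE|False|false`) into a real `bool`.
/// The extension uses `serde-aux` for this; this function is a
/// hypothetical illustration of the same rule, not the real code.
fn coerce_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "1" | "true" => Some(true),
        "0" | "false" => Some(false),
        _ => None, // anything else is rejected rather than guessed
    }
}

fn main() {
    assert_eq!(coerce_bool("TRUE"), Some(true));
    assert_eq!(coerce_bool("0"), Some(false));
    assert_eq!(coerce_bool("yes"), None);
}
```

Returning `Option<bool>` instead of defaulting lets the caller decide whether an unrecognized value should fall back to the field's default or be reported as a config error.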
* feat: continuous flushing strategy for high throughput functions (#684)

This is a heavy refactor and new feature.
- Introduces FlushDecision and separates it from FlushStrategy
- Cleans up FlushControl logic and methods

It also adds the ability to flush telemetry across multiple serial lambda invocations. This is done using the `continuous` strategy. This is a huge win for busy functions as seen in our test fleet, where the p99/max drops precipitously, which also causes the average to plummet. This also helps reduce the number of cold starts encountered during scaleup events, which further reduces latency along with costs.

Technical implementation: We spawn the task and collect the flush handles, then in the two periodic strategies we check if there were any errors or unresolved futures in the next flush cycle. If so, we switch to the `periodic` strategy to ensure flushing completes successfully. We don't adapt to the periodic strategy unless the last 20 invocations occurred within the `config.flush_timeout` value, which has been increased by default.

This is a naive implementation. A better one would be to calculate the first derivative of the invocation periodicity. If the rate is increasing, we can adapt to the continuous strategy. If the rate slows, we should fall back to the periodic strategy.

<img width="807" alt="image" src="https://github.com/user-attachments/assets/d3c25419-f1da-4774-975f-0e254047b9b7" />

The existing implementation is cautious in that we could definitely adapt sooner but don't. Todo: add a feature flag for continuous flushing?

* fix: bump flush_timeout default (#697)

A little goofy because we use this to determine when/how to move over to continuous flushing, but the gist is that our invocation context tracks the start time of each invocation. Because it's all local to a single sandbox, this means that the time diff between invocations includes post runtime duration, so it's very common to have 20 invocations greater than 10s if there are even a couple of periodic/end flushes in there. This is customizable with `DD_FLUSH_TIMEOUT`, so if people want to set it to a very short timeout, they are able to.

* feat: Allow users to specify continuous strategy (#701)

https://datadoghq.atlassian.net/browse/SVLS-6994

* feat: Use http2 unless overridden or using a proxy (#706)

We rolled out HTTP/2 support for logs in v81, which seems to have broken logs for some users relying on proxies which may not support http2. This change introduces a new configuration option called `use_http1`.
1. If `DD_HTTP_PROTOCOL` is explicitly set to http1, we'll use it
2. If `DD_HTTP_PROTOCOL` is not set and the user is using a proxy, we'll use http1 unless overridden by the `DD_HTTP_PROTOCOL` flag being set to anything other than `http1`.

fixes #705

* Dual shipping metrics support (#704)

Adds support for dual shipping metrics to endpoints configured using the `additional_endpoints` YAML or `DD_ADDITIONAL_ENDPOINTS` env var config. For each configured endpoint/API key combination, we now create a separate `MetricsFlusher` to flush the same batch of metrics to multiple endpoints in parallel. Also, updates the retry logic to retry flushing for the specific flusher that encountered an error. Tested dual shipping metrics to 2 additional orgs/endpoints including eu1. Depends on dogstatsd changes: https://github.com/DataDog/serverless-components/pull/20

* chore: Separate AwsCredentials from AwsConfig (#716)

# Problem
Right now `AwsConfig` has a lot of fields, including the ones related to credentials:

```
pub aws_access_key_id: String,
pub aws_secret_access_key: String,
pub aws_session_token: String,
pub aws_container_credentials_full_uri: String,
pub aws_container_authorization_token: String,
```

The next PR https://github.com/DataDog/datadog-lambda-extension/pull/717 wants to lazily load the API key and the credentials. To do that, for the resolver function `resolve_secrets()`, I need to change the param `aws_config` from `&AwsConfig` to `Arc<RwLock<AwsConfig>>`. Because `aws_config` is passed to many places, this change involves updating lots of functions, which is formidable.

# This PR
Separates these credential-related fields out from `AwsConfig` and creates a new struct `AwsCredentials`. Thus, the next PR will only need to change the param `aws_credentials` from `&AwsCredentials` to `Arc<RwLock<AwsCredentials>>`. Because `aws_credentials` is not used in lots of places, the next PR becomes easier.

https://datadoghq.atlassian.net/issues/SVLS-6996
https://datadoghq.atlassian.net/issues/SVLS-6998

* chore(config): separate config from sources (#709)

# What?
Separates the configuration from sources, allowing it to be used in more use cases.

# How?
Creates new default configuration and separates the environment variables and YAML sources from the default.

# Why?
Make it easier to track changes in every source, as the field names might be different from what is used at the configuration level.

# Notes
I expect to abstract this even more by providing it as a crate which can have features; that way customers can use only the sources and product-specific fields they need.

---------

Co-authored-by: Aleksandr Pasechnik <[email protected]>
Co-authored-by: Florentin Labelle <[email protected]>

* Dual Shipping Logs Support (#718)

Adds support for dual shipping logs to endpoints configured using the `logs_config` YAML or `DD_LOGS_CONFIG_ADDITIONAL_ENDPOINTS` env var config. Implemented a `LogsFlusher` as a wrapper to all the `Flusher` instances to manage flushing to all configured endpoints. Moved retry logic to `LogsFlusher`, as the retry request contains the endpoint details and does not have to be tied to a particular flusher.

---------

Co-authored-by: jordan gonzález <[email protected]>

* chore: upgrade rust version for toolchain to 1.84.1 (#743)

# This PR
1. In `rust-toolchain.toml`, upgrade Rust version from `1.81.0` to `1.84.1`.
2. Fix/mute clippy errors caused by the upgrade - some errors require non-trivial code changes, so I muted them for now and added a TODO to fix them in separate PRs.

# Motivation
`libdatadog` now uses `1.84.1`: https://github.com/DataDog/libdatadog/blame/main/Cargo.toml#L62 To test changes on `libdatadog`, I need to change the Rust version in `datadog-lambda-extension` to 1.84.1 as well. Making this a separate PR:
1. so it's easier to test multiple PRs that depend on changes on `libdatadog` in parallel after I merge this PR to main.
2. because this PR also involves lots of code changes needed to make clippy happy

* feat: dual shipping APM support (#735)

Adds support for dual shipping traces to endpoints configured using the `apm_config` YAML or `DD_APM_CONFIG_ADDITIONAL_ENDPOINTS` env var config.

#### Additional Notes:
- Bumped libdatadog (and serverless-components) to include https://github.com/DataDog/libdatadog/pull/1139
- Adds configuration option to set compression level for trace payloads

* chore: Add doc and rename function for flushing strategy (#740)

# Motivation
It took me quite some effort to understand flushing strategies. I want to make it easier to understand for me and future developers.

# This PR
Tries to make flushing strategy code more readable:
1. Add/move comments
2. Create an enum `ConcreteFlushStrategy`, which doesn't contain `Default` because it is required to be resolved to a concrete strategy
3. Rename `should_adapt` to `evaluate_concrete_strategy()`

# To reviewers
There are still a few things I don't understand, which are marked with `TODO`. Appreciate explanation! Also correct me if any comment I added is wrong.

* chore: upgrade to edition 2024 and fix all linter warnings (#754)

Also updates CI to run `clippy` on `--all-targets` so that linter errors aren't ignored on side targets such as tests.

* fix(apm): Enhance Synthetic Span Service Representation (#751)

### What does this PR do?
Rollout of span naming changes to align the serverless product with the tracer to create a streamlined Service Representation for Serverless.

Key Changes:
- Change service name to match instance name for all managed services (aws.lambda -> lambda name, etc) (breaking)
- Opt out via `DD_TRACE_AWS_SERVICE_REPRESENTATION_ENABLED`
- Add `span.kind:server` on synthetic spans made via span-inferrer, cold start and lambda invocation spans
- Remove `_dd.base_service` tags on synthetic spans to avoid unintentional service override

### Motivation
Improve Service Map for Serverless. This allows for synthetic spans to have their own service on the map which connects with the inferred spans from the tracer side.

* feat: port of Serverless AAP from Go to Rust (#755)

# What?
Ports the Serverless App & API Protection feature (AAP, also known as Serverless AppSec) from the Go extension to Rust. This is using https://github.com/DataDog/libddwaf-rust to provide bindings to the in-app WAF. This provides enhanced support for API Protection (notably, response schema collection) compared to the Go version. The tradeoff is that XML request and response security processing is not currently supported in this version (it was in Go, but likely seldom used). This introduces a `bottlecap::appsec::processor::Processor` that is integrated in the `bottlecap::proxy::Interceptor` (for request & response acquisition) as well as in the `bottlecap::trace_processor::TraceProcessor` (to decorate the `aws.lambda` span with security data).

# Why?
We plan on decommissioning the Go version of the agent, and a tracer-side version of the Serverless AAP feature will not be available across all supported language runtimes for several weeks/months. Also [SVLS-5286](https://datadoghq.atlassian.net/browse/SVLS-5286)

# Notes
[SVLS-5286]: https://datadoghq.atlassian.net/browse/SVLS-5286?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

---------

Co-authored-by: jordan gonzález <[email protected]>

* feat: No longer launch Go-based agent for compatibility/OTLP/AAP config (#788)

https://datadoghq.atlassian.net/browse/SVLS-7398
- As part of the coming release, the bottlecap agent no longer launches the Go-based agent when compatibility/AAP/OTLP features are active
- Emit the same metric when detecting any of the above configuration
- Update corresponding unit tests

Manifests:
- [Test lambda function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-python310-lambda?code=&subtab=envVars&tab=testing) with [logs](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F21$252F$255B$2524LATEST$255Df3788d359677452dad162488ff15456f$3FfilterPattern$3Dotel) showing compatibility/AAP/OTLP are enabled <img width="2260" height="454" alt="image" src="https://github.com/user-attachments/assets/5dfd4954-5191-4390-83f5-a8eb3bffb9d3" />
-
[Logging](https://app.datadoghq.com/logs/livetail?query=functionname%3Altn1-fullinstrument-bn-cold-python310-lambda%20Metric&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=driveline&stream_sort=desc&viz=stream&from_ts=1755787655569&to_ts=1755787689060&live=false) <img width="1058" height="911" alt="image" src="https://github.com/user-attachments/assets/629f75d1-e115-4478-afac-ad16d9369fa7" /> - [Metric](https://app.datadoghq.com/screen/integration/aws_lambda_enhanced_metrics?fromUser=false&fullscreen_end_ts=1755788220000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_section=overview&fullscreen_start_ts=1755787200000&fullscreen_widget=2&graph-explorer__tile_def=N4IgbglgXiBcIBcD2AHANhAzgkAaEAxgK7ZIC2A%2BhgHYDWmcA2gLr4BOApgI5EfYOxGoTphRJqmDhQBmSNmQCGOeJgIK0CtnhA8ObCHyagAJkoUVMSImwIc4IMhwT6CDfNQWP7utgE8AjNo%2BvvaYRGSwpggKxkgA5gB0kmxgemh8mAkcAB4IHBIQ4gnSChBoSKlswAAkCgDumBQKBARW1Ai41ZxxhdSd0kTUBAi9AL4ABABGvuPAA0Mj4h6OowkKja2DCAAUAJTaCnFx3UpyoeEgo6wgsvJEGgJCN3Jk9wrevH6BV-iWbMqgTbtOAAJgADPg5MY9BRpkZEL4UHZ4LdXhptBBqNDsnAISAoXp7NDVJdmKMfiBsL50nBgOSgA&refresh_mode=sliding&from_ts=1755783890661&to_ts=1755787490661&live=true) <img width="1227" height="1196" alt="image" src="https://github.com/user-attachments/assets/2922eb54-9853-4512-a902-dfa97916b643" /> * Revert "feat: No longer launch Go-based agent for compatibility/OTLP/AAP config (#788)" This reverts commit 0f5984571eb842e5ce1cbadbec0f92d73befcd08. 
* Ignoring Unwanted Resources in APM (#794) ## Task https://datadoghq.atlassian.net/browse/SVLS-6846 ## Overview We want to allow users to set filter tags that drop traces whose root spans match specified span tags. Specifically, users can set `DD_APM_FILTER_TAGS_REQUIRE` or `DD_APM_FILTER_TAGS_REJECT`. More info [here](https://docs.datadoghq.com/tracing/guide/ignoring_apm_resources/?tab=datadogyaml#trace-agent-configuration-options). ## Testing Deployed changes to Lambda. Invoked Lambda directly and through API Gateway to check with different root spans. Set the tags to either REQUIRE or REJECT with value `name:aws.lambda`. Confirmed in logs and the UI that we were dropping spans.
* feat: Add hierarchical configurable compression levels (#800) - Add global compression_level config parameter (0-9, default: 6) with fallback hierarchy - Support 2-level compression configuration: global level first, then module-specific - This makes configuration more convenient - set once globally or override per module - Apply compression configuration to metrics flushers and trace processor - Add environment variable DD_COMPRESSION_LEVEL for global setting Test - Configuration: <img width="966" height="742" alt="image" src="https://github.com/user-attachments/assets/b33c0fd3-2b02-4838-8660-fc9ea9493998" /> - ([log](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F25$252F$255B$2524LATEST$255D9c19719435bc48839f6f005d2b58b552)) Configuration: <img width="965" height="568" alt="image"
src="https://github.com/user-attachments/assets/dfef594a-549f-4773-879d-549234f03fb7" /> * cherry pick: No longer launch Go-based agent for compatibility/OTLP/AAP config (#817) Cherry pick of previously reverted #788 https://datadoghq.atlassian.net/browse/SVLS-7398 - As part of coming release, bottlecap agent no longer launches Go-based agent when compatibility/AAP/OTLP features are active - Emit the same metric when detecting any of above configuration - Update corresponding unit tests Attention: it is an known issue with .Net https://github.com/aws/aws-lambda-dotnet/issues/2093 Manifests: - [Test lambda function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-python310-lambda?code=&subtab=envVars&tab=testing) with [logs](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-python310-lambda/log-events/2025$252F08$252F21$252F$255B$2524LATEST$255Df3788d359677452dad162488ff15456f$3FfilterPattern$3Dotel) showing compatibility/AAP/OTPL are enabled <img width="2260" height="454" alt="image" src="https://github.com/user-attachments/assets/5dfd4954-5191-4390-83f5-a8eb3bffb9d3" /> - 
[Logging](https://app.datadoghq.com/logs/livetail?query=functionname%3Altn1-fullinstrument-bn-cold-python310-lambda%20Metric&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=driveline&stream_sort=desc&viz=stream&from_ts=1755787655569&to_ts=1755787689060&live=false) <img width="1058" height="911" alt="image" src="https://github.com/user-attachments/assets/629f75d1-e115-4478-afac-ad16d9369fa7" /> - [Metric](https://app.datadoghq.com/screen/integration/aws_lambda_enhanced_metrics?fromUser=false&fullscreen_end_ts=1755788220000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_section=overview&fullscreen_start_ts=1755787200000&fullscreen_widget=2&graph-explorer__tile_def=N4IgbglgXiBcIBcD2AHANhAzgkAaEAxgK7ZIC2A%2BhgHYDWmcA2gLr4BOApgI5EfYOxGoTphRJqmDhQBmSNmQCGOeJgIK0CtnhA8ObCHyagAJkoUVMSImwIc4IMhwT6CDfNQWP7utgE8AjNo%2BvvaYRGSwpggKxkgA5gB0kmxgemh8mAkcAB4IHBIQ4gnSChBoSKlswAAkCgDumBQKBARW1Ai41ZxxhdSd0kTUBAi9AL4ABABGvuPAA0Mj4h6OowkKja2DCAAUAJTaCnFx3UpyoeEgo6wgsvJEGgJCN3Jk9wrevH6BV-iWbMqgTbtOAAJgADPg5MY9BRpkZEL4UHZ4LdXhptBBqNDsnAISAoXp7NDVJdmKMfiBsL50nBgOSgA&refresh_mode=sliding&from_ts=1755783890661&to_ts=1755787490661&live=true) <img width="1227" height="1196" alt="image" src="https://github.com/user-attachments/assets/2922eb54-9853-4512-a902-dfa97916b643" /> ==== Another manifest for .Net: - [Lambda function](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn1-fullinstrument-bn-cold-dotnet6-lambda?code=&subtab=envVars&tab=testing) - 
[Log](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn1-fullinstrument-bn-cold-dotnet6-lambda/log-events/2025$252F08$252F29$252F$255B$2524LATEST$255D15ca867ee94049129ed461283ae46f01$3FfilterPattern$3Dfailover) - Configuration <img width="1490" height="902" alt="image" src="https://github.com/user-attachments/assets/b070e5e1-8335-4494-877f-6475d9959af2" /> - Log shows the issue reasons <img width="990" height="536" alt="image" src="https://github.com/user-attachments/assets/5503de33-ea92-401c-a595-c165e39b0c6e" /> <img width="848" height="410" alt="image" src="https://github.com/user-attachments/assets/54d1e87c-93e7-4084-8a9a-173cb7d0c4a7" /> <img width="938" height="458" alt="image" src="https://github.com/user-attachments/assets/4f205ec2-d923-47d1-9005-762650798894" /> --------- Co-authored-by: Tianning Li <[email protected]> * feat: [Trace Stats] Add feature flag DD_COMPUTE_TRACE_STATS (#841) ## This PR Adds a feature flag `DD_COMPUTE_TRACE_STATS`. - If true, trace stats will be computed from the extension side. In this case, we set `_dd.compute_stats` to `0`, so trace stats won't be computed on the backend. - If false, trace stats will NOT be computed from the extension side. In this case, we set `_dd.compute_stats` to `1`, so trace stats will be computed on the backend. - Defaults to false for now, so `_dd.compute_stats` still defaults to `1`, i.e. default behavior is not changed. - After we fully support computing trace stats on extension side, I will change the default to true then delete the flag. 
Jira: https://datadoghq.atlassian.net/browse/SVLS-7593
* fix: use tokio time instead of std time because tokio time can be frozen (#846) Tokio time allows us to pause or sleep without blocking the runtime. It also allows time to be paused (mainly for tests). I think we may need the sleep to force blocking code to yield. --------- Co-authored-by: jordan gonzález <[email protected]>
* add support for observability pipeline (#826) ## Task https://datadoghq.atlassian.net/jira/software/c/projects/SVLS/boards/5420?quickFilter=7573&selectedIssue=SVLS-7525 ## Overview * Add support for sending logs to an Observability Pipeline instead of directly to Datadog. * To enable, customers must set `DD_ENABLE_OBSERVABILITY_PIPELINE_FORWARDING` to true, and `DD_LOGS_CONFIG_LOGS_DD_URL` to their Observability Pipeline endpoint. Will fast-follow and update docs to reflect this. * Initially, I set up the observability pipeline with 'Datadog Agent' as the source. This required us to format log messages in a specific way. However, after chatting with the Observability Pipeline Team, they actually recommend we use 'Http Server' as the source for our pipeline setup instead, since this just accepts any JSON. ## Testing Created an [observability pipeline](https://ddserverless.datadoghq.com/observability-pipelines/b15e4a64-880d-11f0-b622-da7ad0900002/view) and deployed a lambda function with the changes.
Triggered the lambda function and confirmed we see it in our [logs](https://ddserverless.datadoghq.com/logs?query=function_arn%3A%22arn%3Aaws%3Alambda%3Aus-east-1%3A425362996713%3Afunction%3Aobcdkstackv3-hellofunctionv3ec5a2fbe-l9qvtrowzb5q%22&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=desc&viz=stream&from_ts=1758196420534&to_ts=1758369220534&live=true). We know it is going through the observability pipeline because we can see 'http_server' attached as the source type.
* feat: lower zstd default compression (#867) A quick test run showed that with the zstd compression level set to 6, our max duration skews on smaller lambda sizes with lots of data. It looks like we start to block the CPU at around this mark. Defaulting it to 3 instead, as tested below with three 500k runs. <img width="1293" height="319" alt="image" src="https://github.com/user-attachments/assets/d1224676-f14f-4a55-8440-089bb9ff91d0" />
* revert(#817): reverts fallback config (#871) # What? This reverts commit 2396c4fe102677179c834c2dd65cb5b2ea79ca8f from #817 # Why? Need a release # Notes We'll cherry-pick and bring it back at some point
* chore: [Trace Stats] Rename env var DD_COMPUTE_TRACE_STATS (#875) # This PR As @apiarian-datadog suggested in https://github.com/DataDog/datadog-lambda-extension/pull/841#discussion_r2376111825, rename the feature flag `DD_COMPUTE_TRACE_STATS` to `DD_COMPUTE_TRACE_STATS_ON_EXTENSION` for clarity. # Notes Jira: https://datadoghq.atlassian.net/browse/SVLS-7593
* feat: remove failover to go (#882) Removes the failover to Go. If we can't parse any of the config options, we log the failing value and move on with the default specified.
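The "log the failing value and move on with the default" behavior can be sketched roughly as below. The helper name and shape are illustrative, not the extension's actual API:

```rust
use std::str::FromStr;

// Illustrative sketch: try to parse a config option; on failure, log
// the failing value and fall back to the specified default instead of
// failing over to the Go agent.
fn parse_or_default<T: FromStr>(name: &str, raw: Option<&str>, default: T) -> T {
    let Some(value) = raw else { return default };
    match value.parse::<T>() {
        Ok(parsed) => parsed,
        Err(_) => {
            eprintln!("invalid value for {name}: {value:?}, using default");
            default
        }
    }
}

fn main() {
    assert_eq!(parse_or_default("DD_COMPRESSION_LEVEL", Some("3"), 6u32), 3);
    // A malformed value is logged, and the default is kept.
    assert_eq!(parse_or_default("DD_COMPRESSION_LEVEL", Some("fast"), 6u32), 6);
    assert_eq!(parse_or_default("DD_COMPRESSION_LEVEL", None, 6u32), 6);
}
```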
* fix: use datadog as default propagation style if supplied version is malformed (#891) Fixes an issue where config parsing fails if the supplied propagation style version is malformed.
* fix: use None if propagation style is invalid (#895) After internal discussion, we determined that the tracing libraries use None if the trace propagation style is invalid or malformed. This brings us into alignment.
* feat: Support periodic reload for api key secret (#893) # This PR Supports the env var `DD_API_KEY_SECRET_RELOAD_INTERVAL`, in seconds. It applies when the Datadog API Key is set using `DD_API_KEY_SECRET_ARN`. For example: - If it's `120`, then the api key will be reloaded about every 120 seconds. Note that a reload can only be triggered when the api key is used, usually when data is being flushed. If there is no invocation and no data needs to be flushed, then the reload won't happen. - If it's not set or set to `0`, then the api key will only be loaded once, the first time it is used, and won't be reloaded. # Motivation Some customers regularly rotate their api key in a secret. We need to provide a way for them to update our cached key. https://github.com/DataDog/datadog-lambda-extension/issues/834 # Testing ## Steps 1. Set the env var `DD_API_KEY_SECRET_RELOAD_INTERVAL` to `120` 2. Invoke the Lambda every minute ## Result The reload interval is passed to the `ApiKeyFactory` <img width="711" height="25" alt="image" src="https://github.com/user-attachments/assets/6fcc5081-accb-4928-8fa7-094d36aa2fa1" /> Reload happens roughly every 120 seconds. It's sometimes longer than 120 seconds due to the reason explained above.
<img width="554" height="252" alt="image" src="https://github.com/user-attachments/assets/3fa78249-ff98-47d2-a953-f090630bbeb1" /> # Notes to Users When you use this env var, please also keep a grace period for the old api key after you update the secret to the new key, and make the grace period longer than the reload interval to give the extension sufficient time to reload the secret. # Internal Notes Jira: https://datadoghq.atlassian.net/browse/SVLS-7572
* [SVLS-7885] update tag splitting to allow for ',' and ' ' (#916) ## Overview We currently split `DD_TAGS` only by `,`. A customer is asking if we can also split by spaces, since that is common for container images and Lambda lets you deploy images. (https://docs.datadoghq.com/getting_started/tagging/assigning_tags/?tab=noncontainerizedenvironments)
* [SLES-2547] add metric namespace for DogStatsD (#920) Follow-up from https://github.com/DataDog/serverless-components/pull/48 What does this PR do? Add support for DD_STATSD_METRIC_NAMESPACE. Motivation This was brought up by a customer; they noticed issues migrating to bottlecap. Our docs show we should support this, but we currently don't have it implemented - https://docs.datadoghq.com/serverless/guide/agent_configuration/#dogstatsd-custom-metrics. Additional Notes Requires changes in agent/extension. Will follow up with those PRs. Describe how to test/QA your changes Deployed changes to the extension and tested with and without the custom namespace env variable.
Confirmed that metrics are getting the prefix attached, [metrics](https://ddserverless.datadoghq.com/metric/explorer?fromUser=false&graph_layout=stacked&start=1762783238873&end=1762784138873&paused=false#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMiZeBoIOKoAdPJYTFJNcMRwtRIdmfgiCMAAVDwgfKCR0bmxWABMickIqel4WTl5iIXFIHPlVcgAVjiMIk3TmvIY2U219Y0tbYwdXT0EkucDeEOj4zwAXSornceEwoXCINUYIwMVK8QmFFAUJhcJ0CwmQJA9SwaByoGueIQCE2UBwMCcmXBGggmUSaFEcCcckUynSDKg9MZTnoTGUIjcHjQiKSEHsmCwzIUmwZIiUgJ4fGx8gZCAAwlJhDAUCIwWgeEA) * refactor: Move metric namespace validation to dogstatsd util (#921) https://datadoghq.atlassian.net/browse/SLES-2547 - Updates dependency to use centralized parse_metric_namespace function. - Removes duplicate code in favor of the shared implementation. 
Test: - Deploy the extension and config w/ [DD_STATSD_METRIC_NAMESPACE](https://us-east-1.console.aws.amazon.com/lambda/home?region=us-east-1#/functions/ltn-fullinstrument-bn-10bst-node22-lambda?subtab=envVars&tab=configure) <img width="964" height="290" alt="image" src="https://github.com/user-attachments/assets/94836a3a-9905-44b4-9565-185745e47981" /> - Invoke the function and expect to see the metric using this custom prefix namespace <img width="1170" height="516" alt="Screenshot 2025-11-11 at 4 59 57 PM" src="https://github.com/user-attachments/assets/0bf4ac5e-ac1c-4cfe-817e-89b004717caf" /> [Metric link](https://ddserverless.datadoghq.com/metric/explorer?fromUser=true&graph_layout=stacked&start=1762897808375&end=1762898083375&paused=true#N4Ig7glgJg5gpgFxALlAGwIYE8D2BXJVEADxQEYAaELcqyKBAC1pEbghkcLIF8qo4AMwgA7CAgg4RKUAiwAHOChASAtnADOcAE4RNIKtrgBHPJoQaUAbVBGN8qVoD6gnNtUZCKiOq279VKY6epbINiAiGOrKQdpYZAYgUJ4YThr42gDGSsgg6gi6mZaBZnHKGABuMMhsaGg4YG5oUAB0WmiCLapS4m6iMMAAVDwgPAC6VBpyaDmg8hgzCAg5STgwTpmYGhoQmYloonBOcorK6QdQ+4dO9EzKIm4eaKP8EPaYWMcKKwciSuM8Pggd7iADCUmEMBQIjwdR4QA) * [SVLS-7704] add support for SSM Parameter API key (#924) ## Overview * Add support for customers storing Datadog API Key in SSM Parameter Store. ## Testing * Deployed changes and confirmed this work with Parameter Store String and SecureString. * feat: Add support for DD_LOGS_ENABLED as alias for DD_SERVERLESS_LOGS_ENABLED (#928) https://datadoghq.atlassian.net/browse/SVLS-7818 ## Overview Add DD_LOGS_ENABLED environment variable and YAML config option as an alias for DD_SERVERLESS_LOGS_ENABLED. 
Both variables now use OR logic, meaning logs are enabled if either variable is set to true. Changes: - Add logs_enabled field to EnvConfig and YamlConfig structs - Implement OR logic in merge_config functions: logs are enabled if either DD_LOGS_ENABLED or DD_SERVERLESS_LOGS_ENABLED is true - Add comprehensive test coverage with 9 test cases covering all combinations of the two variables - Maintain backward compatibility with existing configurations - Default value remains true when neither variable is set ## Testing Set DD_LOGS_ENABLED and DD_SERVERLESS_LOGS_ENABLED to false and expect: - [Log can be found in AWS console](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logsV2:log-groups/log-group/$252Faws$252Flambda$252Fltn-fullinstrument-bn-cold-node22-lambda/log-events/2025$252F11$252F13$252F$255B$2524LATEST$255D455478dcbc944055b5be933e2e099f6a$3FfilterPattern$3DREPORT+RequestId) - [Log could NOT be found in DD console](https://ddserverless.datadoghq.com/logs?query=source%3Alambda%20%40lambda.arn%3A%22arn%3Aaws%3Alambda%3Aus-east-1%3A425362996713%3Afunction%3Altn-fullinstrument-bn-cold-node22-lambda%22%20AND%20%22REPORT%20RequestId%22&agg_m=count&agg_m_source=base&agg_t=count&clustering_pattern_field_path=message&cols=host%2Cservice%2C%40lambda.request_id&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=hot&stream_sort=desc&viz=stream&from_ts=1763063694206&to_ts=1763065424700&live=false) Otherwise the log should be available in DD console. * chore: Upgrade libdatadog and construct http client for traces (#917) Upgrade libdatadog. 
Including:
- Rename a few crates: - `ddcommon` -> `libdd-common` - `datadog-trace-protobuf` -> `libdd-trace-protobuf` - `datadog-trace-utils` -> `libdd-trace-utils` - `datadog-trace-normalization` -> `libdd-trace-normalization` - `datadog-trace-stats` -> `libdd-trace-stats`
- Use the new API to send traces, which takes in an http_client object instead of a proxy url string
GitHub issue: https://github.com/DataDog/datadog-lambda-extension/issues/860 Jira: https://datadoghq.atlassian.net/browse/SLES-2499 Slack discussion: https://dd.slack.com/archives/C01TCF143GB/p1762526199549409
* Merge Lambda Managed Instance feature branch (#947) https://datadoghq.atlassian.net/browse/SVLS-8080 ## Overview Merge the Lambda Managed Instance feature branch ## Testing Covered by individual commits Co-authored-by: shreyamalpani <[email protected]> Co-authored-by: duncanista <[email protected]> Co-authored-by: astuyve <[email protected]> Co-authored-by: jchrostek-dd <[email protected]> Co-authored-by: tianning.li <[email protected]>
* fix(config): support colons in tag values (URLs, etc.) (#953) https://datadoghq.atlassian.net/browse/SVLS-8095 ## Overview Tag parsing previously used split(':'), which broke values containing colons, like URLs (git.repository_url:https://...). Changed to use splitn(2, ':') to split only on the first colon, preserving the rest as the value.
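A minimal sketch of the first-colon split, assuming a standalone function (the name here is illustrative; the PR's actual helper is the `parse_key_value_tag()` described below):

```rust
// Illustrative: split a tag on the FIRST colon only, so colons inside
// the value (e.g. in URLs) are preserved as part of the value.
fn parse_tag(tag: &str) -> Option<(&str, &str)> {
    let mut parts = tag.splitn(2, ':');
    match (parts.next(), parts.next()) {
        (Some(key), Some(value)) if !key.is_empty() => Some((key, value)),
        _ => None,
    }
}

fn main() {
    assert_eq!(parse_tag("env:prod"), Some(("env", "prod")));
    // The URL's own colons survive because only the first colon splits.
    assert_eq!(
        parse_tag("git.repository_url:https://example.com/repo"),
        Some(("git.repository_url", "https://example.com/repo"))
    );
    // An empty key is rejected.
    assert_eq!(parse_tag(":novalue"), None);
}
```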
Changes: - Add parse_key_value_tag() helper to centralize parsing logic - Refactor deserialize_key_value_pairs to use the helper - Refactor deserialize_key_value_pair_array_to_hashmap to use the helper - Add comprehensive test coverage for URL values and edge cases ## Testing Unit tests, and we expect e2e tests to pass Co-authored-by: tianning.li <[email protected]>
* [SVLS-7934] feat: Support TLS certificate for trace/stats flusher (#961) ## Problem A customer reported that their Lambda is behind a proxy, and the Rust-based extension can't send traces to Datadog via the proxy, while the previous go-based extension worked. ## This PR Supports the env var `DD_TLS_CERT_FILE`: the path to a file of concatenated CA certificates in PEM format. Example: `DD_TLS_CERT_FILE=/opt/ca-cert.pem`, so that when the extension flushes traces/stats to Datadog, the HTTP client created can load and use this cert and connect to the proxy properly. ## Testing ### Steps 1. Create a Lambda in a VPC with an NGINX proxy. 2. Add a layer to the Lambda, which includes the CA certificate `ca-cert.pem` 3. Set env vars: - `DD_TLS_CERT_FILE=/opt/ca-cert.pem` - `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the private IP of the proxy EC2 instance - `DD_LOG_LEVEL=debug` 4. Update routing rules of security groups so the Lambda can reach `http://10.0.0.30:3128` 5.
Invoke the Lambda ### Result **Before** Trace flush failed with error logs: > DD_EXTENSION | ERROR | Max retries exceeded, returning request error error=Network error: client error (Connect) attempts=1 DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent **After** Trace flush is successful: > DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces DD_EXTENSION | DEBUG | TRACES | Added root certificate from /opt/ca-cert.pem DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy: Some("http://10.0.0.30:3128") DD_EXTENSION | DEBUG | Sending with retry url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120 max_retries=1 DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1 DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1 DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms ## Notes This fix only covers trace flusher and stats flusher, which use `ServerlessTraceFlusher::get_http_client()` to create the HTTP client. It doesn't cover logs flusher and proxy flusher, which use a different function (http.rs:get_client()) to create the HTTP client. However, logs flushing was successful in my tests, even if no certificate was added. We can come back to logs/proxy flusher if someone reports an error. * chore: Upgrade libdatadog (#964) ## Overview The crate `datadog-trace-obfuscation` has been renamed as `libdd-trace-obfuscation`. This PR updates this dependency. ## Testing * [SVLS-8211] feat: Add timeout for requests to span_dedup_service (#986) ## Problem Span dedup service sometimes fails to return the result and thus logs the error: > DD_EXTENSION | ERROR | Failed to send check_and_add response: true I see this error in our Self Monitoring and a customer's account. 
I also believe it causes the extension to fail to receive traces from the tracer, causing missing traces. This is because the caller of span dedup is in `process_traces()`, which is the function that handles the tracer's HTTP request to send traces. If this function fails to get the span dedup result and gets stuck, the HTTP request will time out. ## This PR While I don't yet know what causes the error, this PR adds a patch to mitigate the impact: 1. Change the log level from `error` to `warn` 2. Add a timeout of 5 seconds to the span dedup check, so that if the caller doesn't get an answer soon, it defaults to treating the trace as not a duplicate, which is the most common case. ## Testing We will merge this PR and then check logs in self-monitoring, as it's hard to run high-volume tests in self-monitoring from a non-main branch.
* [SVLS-8150] fix(config): ensure logs intake URL is correctly prefixed (#1021) ## Overview Ensures `DD_LOGS_CONFIG_LOGS_DD_URL` is correctly prefixed with `https://` ## Testing Manually tested that logs get sent to the alternate logs intake
* chore(deps): upgrade dogstatsd (#1020) ## Overview Continuation of #1018, removing an unnecessary mut lock on callers for dogstatsd
* chore(deps): upgrade rust to `v1.93.1` (#1034) ## What? Upgrade rust to latest stable 1.93.1 ## Why? The `time` vulnerability fix is only available on rust >= 1.88.0
* feat(http): allow skip ssl validation (#1064) ## Overview Add DD_SKIP_SSL_VALIDATION support, parsed from both env and YAML, matching the datadog-agent's behavior — applied to all outgoing HTTP clients (reqwest via danger_accept_invalid_certs, hyper via custom ServerCertVerifier). ## Motivation Customers in environments with corporate proxies or custom CA setups need the ability to disable TLS certificate validation, matching the existing datadog-agent config option.
The Go agent applies tls.Config{InsecureSkipVerify: true} to all HTTP transports via a central CreateHTTPTransport() — we mirror this by wiring the config through to both client builders. And [SLES-2710](https://datadoghq.atlassian.net/browse/SLES-2710) ## Changes Config (config/mod.rs, config/env.rs, config/yaml.rs): - Add skip_ssl_validation: bool to Config, EnvConfig, and YamlConfig with default false reqwest client (http.rs): - .danger_accept_invalid_certs(config.skip_ssl_validation) on the shared client builder hyper client (traces/http_client.rs): - Custom NoVerifier implementing rustls::client::danger::ServerCertVerifier that accepts all certificates - Uses CryptoProvider::get_default() (not hardcoded aws_lc_rs) for FIPS-safe signature scheme reporting - New skip_ssl_validation parameter on create_client() ## Testing Unit tests and self monitoring [SLES-2710]: https://datadoghq.atlassian.net/browse/SLES-2710?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ * add Cargo.toml for datadog-agent-config * update licenses * remove aws.rs from datadog-agent-config * chore: upgrade workspace rust edition to 2024 (#96) * upgrade rust edition to 2024 for workspace * apply formatting --------- Co-authored-by: jordan gonzález <[email protected]> Co-authored-by: alexgallotta <[email protected]> Co-authored-by: AJ Stuyvenberg <[email protected]> Co-authored-by: Nicholas Hulston <[email protected]> Co-authored-by: Aleksandr Pasechnik <[email protected]> Co-authored-by: shreyamalpani <[email protected]> Co-authored-by: Yiming Luo <[email protected]> Co-authored-by: Florentin Labelle <[email protected]> Co-authored-by: Romain Marcadier <[email protected]> Co-authored-by: Zarir Hamza <[email protected]> Co-authored-by: Romain Marcadier <[email protected]> Co-authored-by: Tianning Li <[email protected]> 
Co-authored-by: jchrostek-dd <[email protected]> Co-authored-by: astuyve <[email protected]>
What does this PR do?
Upgrade workspace rust edition to 2024.
Motivation
Upstreaming config crate in #56. The config crate uses rust edition 2024 so this PR upgrades the rust edition to 2024 for the entire workspace.
https://datadoghq.atlassian.net/browse/SVLS-5564
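For reference, a workspace-level edition bump typically looks like the sketch below; the member list and `rust-version` shown are illustrative, not this repo's exact manifest (edition 2024 requires a recent toolchain, 1.85 or later):

```toml
# Root Cargo.toml (illustrative sketch, not the repo's exact manifest)
[workspace]
members = ["bottlecap"]   # actual member list may differ

[workspace.package]
edition = "2024"          # was "2021" before this change
rust-version = "1.85"     # minimum toolchain supporting edition 2024
```

Member crates then inherit the setting with `edition.workspace = true` in their own `[package]` sections.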
Additional Notes
`temp-env` crate
Describe how to test/QA your changes
Automated tests, built and deployed to Azure Functions in Serverless Compatibility Layer.