test(agent): add timeout and error logging to checkAgentStatus #7724

Merged
tlhunter merged 1 commit into master from watson/better-flake-error on Mar 11, 2026

@watson watson commented Mar 10, 2026

What does this PR do?

Improves observability and resilience of the checkAgentStatus() function in the test agent helper (packages/dd-trace/test/plugins/agent.js). This function checks whether a real test agent is running before each test, but previously had no timeout on its HTTP request. If a TCP connection was established but no HTTP response was received, the promise would hang indefinitely, causing opaque Mocha timeouts.

Changes:

  • Add a 2s timeout to the checkAgentStatus() HTTP request with a descriptive warning when hit, so the root cause is immediately visible in CI logs instead of surfacing as a generic test timeout.
  • Log unexpected errors (anything other than ECONNREFUSED, which is the normal "no test agent" case) to aid future debugging.
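
The two changes above can be sketched roughly as follows. This is a minimal illustration rather than the code merged in this PR: the `/info` path, the options object, the `isExpectedAgentError` helper, and the warning text are all assumptions made for the example. Only the 2s timeout, port 9126, and the special handling of ECONNREFUSED come from the description.

```javascript
'use strict'

const http = require('node:http')

// ECONNREFUSED means nothing is listening: the normal "no test agent" case.
// (Hypothetical helper, introduced only for this sketch.)
function isExpectedAgentError (err) {
  return Boolean(err) && err.code === 'ECONNREFUSED'
}

// Hypothetical sketch; the '/info' path, option names, and messages are assumptions.
function checkAgentStatus ({ host = '127.0.0.1', port = 9126, timeoutMs = 2000 } = {}) {
  return new Promise((resolve) => {
    let timedOut = false

    const req = http.get({ host, port, path: '/info', timeout: timeoutMs }, (res) => {
      res.resume() // discard the body so the socket is released
      resolve(res.statusCode === 200)
    })

    // The 'timeout' event only reports socket inactivity; the request has to be
    // destroyed explicitly, otherwise it would keep waiting.
    req.on('timeout', () => {
      timedOut = true
      console.warn(`checkAgentStatus: no HTTP response from ${host}:${port} after ${timeoutMs}ms`)
      req.destroy()
      resolve(false)
    })

    req.on('error', (err) => {
      // Skip the error that may follow our own destroy() and the expected refusal.
      if (!timedOut && !isExpectedAgentError(err)) {
        console.warn('checkAgentStatus: unexpected error:', err)
      }
      resolve(false)
    })
  })
}
```

In a Mocha hook this could be awaited as `const hasAgent = await checkAgentStatus()`. Resolving to `false` on every failure path, instead of rejecting, is a design choice of the sketch so that a missing or misbehaving agent does not fail the hook by itself.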

Motivation

The AI Guard Windows CI job has been experiencing flaky timeouts in the beforeEach hook, which calls agent.load() → checkAgentStatus(). The lack of a timeout on the HTTP request is a likely cause: if something on the CI runner accepts the TCP connection on port 9126 without speaking HTTP, the test hangs with no useful diagnostic output. This change ensures hangs are caught early with a clear message.
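
The suspected failure mode can be reproduced locally with a TCP server that accepts connections and never replies. The sketch below is an illustration under assumptions, not part of the PR: `startSilentServer`, `probe`, and `demo` are hypothetical names, and it binds an ephemeral port rather than 9126 to avoid clashing with a real test agent.

```javascript
'use strict'

const net = require('node:net')
const http = require('node:http')

// Starts a TCP server that accepts connections but never writes a byte,
// imitating "something that accepts the connection without speaking HTTP".
function startSilentServer () {
  return new Promise((resolve) => {
    const sockets = new Set()
    const server = net.createServer((socket) => {
      sockets.add(socket)
      socket.on('close', () => sockets.delete(socket))
      socket.on('error', () => {}) // ignore resets caused by the client going away
    })
    server.listen(0, '127.0.0.1', () => resolve({ server, sockets }))
  })
}

// Issues a GET and reports how the request ended: 'response', 'timeout', or 'error'.
// Only the first resolve() call has any effect.
function probe (port, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get({ host: '127.0.0.1', port, path: '/', timeout: timeoutMs }, (res) => {
      res.resume()
      resolve('response')
    })
    req.on('timeout', () => {
      resolve('timeout')
      req.destroy()
    })
    req.on('error', () => resolve('error'))
  })
}

// Runs the probe against the silent server, then cleans up so the process can exit.
async function demo () {
  const { server, sockets } = await startSilentServer()
  const outcome = await probe(server.address().port, 200)
  for (const socket of sockets) socket.destroy()
  server.close()
  return outcome
}
```

Calling `demo()` should resolve to `'timeout'` after about 200 ms. Without the `timeout` option and its handler, the same request would stay pending for as long as the server keeps the socket open, which matches the hang described above.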

The checkAgentStatus function in the test agent helper makes an HTTP
request to the test agent with no timeout. If the TCP connection is
established but no HTTP response is received, the promise hangs
indefinitely. This is a likely cause of flaky test timeouts with no
indication of the root cause (e.g. the AI Guard Windows CI job).

Add a 2s timeout with a descriptive warning so hangs are caught early
and the cause is visible in CI logs. Also log unexpected errors (other
than ECONNREFUSED, which is the normal "no test agent" case) to aid
future debugging.
@watson watson requested a review from a team as a code owner March 10, 2026 09:49
@watson watson self-assigned this Mar 10, 2026

watson commented Mar 10, 2026

This stack of pull requests is managed by Graphite.

@github-actions

Overall package size

Self size: 4.95 MB
Deduped: 5.79 MB
No deduping: 5.79 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 3.0.0 | 81.15 kB | 815.98 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.39%. Comparing base (c8db939) to head (527b91b).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7724   +/-   ##
=======================================
  Coverage   80.38%   80.39%           
=======================================
  Files         741      741           
  Lines       32063    32064    +1     
=======================================
+ Hits        25775    25777    +2     
+ Misses       6288     6287    -1     
Flag Coverage Δ
aiguard-macos 38.88% <ø> (-0.10%) ⬇️
aiguard-ubuntu 39.00% <ø> (-0.10%) ⬇️
aiguard-windows 38.72% <ø> (-0.10%) ⬇️
apm-capabilities-tracing-macos 48.89% <ø> (+<0.01%) ⬆️
apm-capabilities-tracing-ubuntu 48.93% <ø> (+<0.01%) ⬆️
apm-capabilities-tracing-windows 48.66% <ø> (+<0.01%) ⬆️
apm-integrations-child-process 38.44% <ø> (-0.10%) ⬇️
apm-integrations-couchbase-18 37.35% <ø> (-0.09%) ⬇️
apm-integrations-couchbase-eol 37.82% <ø> (-0.10%) ⬇️
apm-integrations-oracledb 37.59% <ø> (-0.16%) ⬇️
appsec-express 55.14% <ø> (-0.07%) ⬇️
appsec-fastify 51.50% <ø> (-0.07%) ⬇️
appsec-graphql 51.68% <ø> (-0.07%) ⬇️
appsec-kafka 44.21% <ø> (-0.08%) ⬇️
appsec-ldapjs 43.90% <ø> (-0.08%) ⬇️
appsec-lodash 43.56% <ø> (-0.08%) ⬇️
appsec-macos 58.15% <ø> (-0.07%) ⬇️
appsec-mongodb-core 48.69% <ø> (-0.08%) ⬇️
appsec-mongoose 49.36% <ø> (-0.07%) ⬇️
appsec-mysql 50.74% <ø> (-0.07%) ⬇️
appsec-node-serialize 43.08% <ø> (-0.08%) ⬇️
appsec-passport 47.48% <ø> (-0.08%) ⬇️
appsec-postgres 50.48% <ø> (-0.07%) ⬇️
appsec-sourcing 42.49% <ø> (-0.08%) ⬇️
appsec-template 43.25% <ø> (-0.08%) ⬇️
appsec-ubuntu 58.23% <ø> (-0.07%) ⬇️
appsec-windows 58.01% <ø> (-0.05%) ⬇️
instrumentations-instrumentation-bluebird 32.29% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-body-parser 40.38% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-child_process 37.75% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-cookie-parser 34.27% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express 34.59% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express-mongo-sanitize 34.40% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-express-session 40.01% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-fs 31.91% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-generic-pool 29.91% <ø> (ø)
instrumentations-instrumentation-http 39.73% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-knex 32.30% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-mongoose 33.42% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-multer 40.13% <ø> (-0.09%) ⬇️
instrumentations-instrumentation-mysql2 38.21% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-passport 43.88% <ø> (-0.08%) ⬇️
instrumentations-instrumentation-passport-http 43.56% <ø> (-0.08%) ⬇️
instrumentations-instrumentation-passport-local 44.09% <ø> (-0.08%) ⬇️
instrumentations-instrumentation-pg 37.65% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-promise 32.22% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-promise-js 32.23% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-q 32.27% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-url 32.20% <ø> (-0.10%) ⬇️
instrumentations-instrumentation-when 32.24% <ø> (-0.10%) ⬇️
llmobs-ai 42.21% <ø> (-0.09%) ⬇️
llmobs-anthropic 40.19% <ø> (-0.09%) ⬇️
llmobs-bedrock 39.17% <ø> (-0.08%) ⬇️
llmobs-google-genai 39.73% <ø> (-0.08%) ⬇️
llmobs-langchain 39.97% <ø> (-0.07%) ⬇️
llmobs-openai 43.92% <ø> (-0.08%) ⬇️
llmobs-vertex-ai 39.98% <ø> (-0.02%) ⬇️
platform-core 31.53% <ø> (ø)
platform-esbuild 34.48% <ø> (ø)
platform-instrumentations-misc 48.40% <ø> (ø)
platform-shimmer 37.63% <ø> (ø)
platform-unit-guardrails 32.95% <ø> (ø)
plugins-azure-event-hubs 25.83% <ø> (ø)
plugins-azure-service-bus 25.19% <ø> (ø)
plugins-bullmq 44.18% <ø> (-0.09%) ⬇️
plugins-cassandra 37.70% <ø> (-0.09%) ⬇️
plugins-cookie 26.89% <ø> (ø)
plugins-cookie-parser 26.67% <ø> (ø)
plugins-crypto 26.79% <ø> (ø)
plugins-dd-trace-api 38.27% <ø> (-0.10%) ⬇️
plugins-express-mongo-sanitize 26.82% <ø> (ø)
plugins-express-session 26.63% <ø> (ø)
plugins-fastify 42.10% <ø> (-0.09%) ⬇️
plugins-fetch 38.26% <ø> (-0.09%) ⬇️
plugins-fs 38.54% <ø> (-0.10%) ⬇️
plugins-generic-pool 25.87% <ø> (ø)
plugins-google-cloud-pubsub 45.31% <ø> (-0.08%) ⬇️
plugins-grpc 40.82% <ø> (-0.09%) ⬇️
plugins-handlebars 26.86% <ø> (ø)
plugins-hapi 40.01% <ø> (-0.09%) ⬇️
plugins-hono 40.27% <ø> (-0.09%) ⬇️
plugins-ioredis 38.35% <ø> (-0.10%) ⬇️
plugins-knex 26.50% <ø> (ø)
plugins-ldapjs 24.36% <ø> (ø)
plugins-light-my-request 26.23% <ø> (ø)
plugins-limitd-client 32.42% <ø> (-0.24%) ⬇️
plugins-lodash 25.96% <ø> (ø)
plugins-mariadb 39.40% <ø> (-0.10%) ⬇️
plugins-memcached 38.07% <ø> (-0.10%) ⬇️
plugins-microgateway-core 39.06% <ø> (-0.09%) ⬇️
plugins-moleculer 40.41% <ø> (-0.09%) ⬇️
plugins-mongodb 39.09% <ø> (-0.09%) ⬇️
plugins-mongodb-core 38.92% <ø> (-0.09%) ⬇️
plugins-mongoose 38.78% <ø> (-0.09%) ⬇️
plugins-multer 26.63% <ø> (ø)
plugins-mysql 39.09% <ø> (-0.10%) ⬇️
plugins-mysql2 39.18% <ø> (-0.09%) ⬇️
plugins-node-serialize 26.93% <ø> (ø)
plugins-opensearch 37.53% <ø> (-0.09%) ⬇️
plugins-passport-http 26.68% <ø> (ø)
plugins-postgres 35.66% <ø> (-0.08%) ⬇️
plugins-process 26.79% <ø> (ø)
plugins-pug 26.89% <ø> (ø)
plugins-redis 38.81% <ø> (-0.10%) ⬇️
plugins-router 42.82% <ø> (-0.09%) ⬇️
plugins-sequelize 25.47% <ø> (ø)
plugins-test-and-upstream-amqp10 38.43% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-amqplib 43.78% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-apollo 38.93% <ø> (-0.08%) ⬇️
plugins-test-and-upstream-avsc 38.60% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-bunyan 33.83% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-connect 40.66% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-graphql 40.04% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-koa 40.25% <ø> (-0.09%) ⬇️
plugins-test-and-upstream-protobufjs 38.82% <ø> (-0.10%) ⬇️
plugins-test-and-upstream-rhea 43.96% <ø> (-0.07%) ⬇️
plugins-undici 39.04% <ø> (-0.09%) ⬇️
plugins-url 26.79% <ø> (ø)
plugins-valkey 38.02% <ø> (-0.10%) ⬇️
plugins-vm 26.79% <ø> (ø)
plugins-winston 34.03% <ø> (-0.09%) ⬇️
plugins-ws 41.77% <ø> (-0.09%) ⬇️
profiling-macos 39.85% <ø> (-0.09%) ⬇️
profiling-ubuntu 39.97% <ø> (-0.09%) ⬇️
profiling-windows 41.16% <ø> (-0.09%) ⬇️
serverless-azure-functions-client 25.54% <ø> (ø)
serverless-azure-functions-eventhubs 25.54% <ø> (ø)
serverless-azure-functions-servicebus 25.54% <ø> (ø)

Flags with carried forward coverage won't be shown.

pr-commenter bot commented Mar 10, 2026

Benchmarks

Benchmark execution time: 2026-03-10 09:58:35

Comparing candidate commit 527b91b in PR branch watson/better-flake-error with baseline commit c8db939 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 230 metrics, 30 unstable metrics.

@watson watson requested a review from a team March 10, 2026 10:10
@gh-worker-ownership-write-b05516 gh-worker-ownership-write-b05516 bot removed the request for review from a team March 11, 2026 18:21
@tlhunter tlhunter merged commit c555085 into master Mar 11, 2026
790 of 791 checks passed
@tlhunter tlhunter deleted the watson/better-flake-error branch March 11, 2026 18:21
dd-octo-sts bot pushed a commit that referenced this pull request Mar 12, 2026
@dd-octo-sts dd-octo-sts bot mentioned this pull request Mar 12, 2026
CarlesDD pushed a commit that referenced this pull request Mar 16, 2026