test.has_failed_all_retries tag not set for Auto Test Retries #7651

@anmarchenko

Summary

When a test exhausts all Auto Test Retry (ATR) attempts and still fails, the test.has_failed_all_retries tag is not set on the last retry span. This tag is correctly set for Early Flake Detection (EFD) retries and Attempt-to-Fix retries, but the ATR code path was missed.

The bug affects all framework instrumentations: Jest, Mocha, Vitest, Playwright, and Cucumber.

Steps to Reproduce

  1. Enable Auto Test Retries (flaky_test_retries_enabled: true in settings)
  2. Run a test that always fails (exhausts all 5 default retries)
  3. Inspect the test spans

Expected: The last retry span has test.has_failed_all_retries: true
Actual: No span has the test.has_failed_all_retries tag. Only test.is_retry: true and test.retry_reason: auto_test_retry are set.
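
The expected/actual difference can be checked mechanically over the collected spans. The following is a minimal sketch, assuming each test span is a plain object with a `meta` map of string tags; that shape and the helper name `findFailedAllRetriesSpans` are hypothetical illustrations, not dd-trace-js API.

```javascript
// Hypothetical helper; the span shape ({ meta: {...} }) is assumed for illustration.
const TAG = 'test.has_failed_all_retries'

function findFailedAllRetriesSpans (spans) {
  return spans.filter(span => span.meta[TAG] === 'true')
}

// Six executions of one always-failing test (1 original + 5 retries),
// tagged the way the issue reports them today.
const actualSpans = Array.from({ length: 6 }, (_, i) => ({
  meta: {
    'test.status': 'fail',
    // Only retries (i > 0) carry the retry tags.
    ...(i > 0 ? { 'test.is_retry': 'true', 'test.retry_reason': 'auto_test_retry' } : {})
  }
}))

// Prints 0 for the spans above; the expected behavior would yield 1 (the last retry).
console.log(findFailedAllRetriesSpans(actualSpans).length)
```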

Reproduction with mockdog

# Using Shepherd (https://github.com/nicholasgasior/shepherd) e2e test infrastructure:

# Create an always-failing Jest test
cat > atrAlwaysFail.spec.js <<TESTEOF
describe('ATR Always Fail', () => {
  it('test_always_fail', () => {
    expect(1).toBe(2);
  });
});
TESTEOF

# Run with ATR-enabled mockdog scenario (flaky_test_retries_enabled: true)
# Result: 6 spans (1 original + 5 retries), all fail, but no test.has_failed_all_retries tag on any span

Root Cause

In the instrumentation code, failedAllTests / hasFailedAllRetries is only set to true under two conditions:

  1. Attempt-to-Fix retries (Test Management)
  2. EFD retries (Early Flake Detection)

There is no code path that sets it for ATR retries.

Affected files

All five framework instrumentations have the same gap:

| File | Variable | ATR sets it? |
| --- | --- | --- |
| packages/datadog-instrumentations/src/jest.js:579-669 | failedAllTests | No |
| packages/datadog-instrumentations/src/mocha/utils.js:260-289 | hasFailedAllRetries | No |
| packages/datadog-instrumentations/src/vitest.js:1027-1047 | hasFailedAllRetries | No |
| packages/datadog-instrumentations/src/playwright.js:388-405 | test._ddHasFailedAllRetries | No |
| packages/datadog-instrumentations/src/cucumber.js:312-368 | hasFailedAllRetries | No |

Jest example (packages/datadog-instrumentations/src/jest.js)

// Line 579: initialized to false
let failedAllTests = false

// Lines 594-599: only set for Attempt-to-Fix
if (isAttemptToFix) {
  // ...
  if (testStatuses.every(status => status === 'fail')) {
    failedAllTests = true  // ← ONLY for attempt-to-fix
  }
}

// Lines 667-669: only set for EFD
if (efdRetryCount > 0 && testStatuses.length === efdRetryCount + 1 &&
  testStatuses.every(status => status === 'fail')) {
  failedAllTests = true  // ← ONLY for EFD
}

// Lines 711-714: ATR retry detection — no failedAllTests logic
let isAtrRetry = false
if (this.isFlakyTestRetriesEnabled && event.test?.invocations > 1 && !isAttemptToFix && !isEfdRetry) {
  isAtrRetry = true
  // ← Missing: failedAllTests = true when all ATR retries fail
}

Comparison with Ruby (datadog-ci-rb)

Ruby's datadog-ci-rb handles this correctly in lib/datadog/ci/test_retries/component.rb:132-133:

def tag_last_retry(test_span)
  test_span&.set_tag(TAG_HAS_FAILED_ALL_RETRIES, "true") if test_span&.all_executions_failed?
end

This method is called for ALL retry strategies (ATR, EFD, attempt-to-fix) via a unified code path, ensuring consistent tagging regardless of the retry reason.
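
A JavaScript analogue of that unified approach might look like the sketch below. It assumes the caller invokes the helper once, after the retry strategy has finished, and passes the statuses of every execution; the function name `tagLastRetry`, the `{ meta }` span shape, and the "was retried at least once" guard are hypothetical and not taken from dd-trace-js or datadog-ci-rb.

```javascript
// Hypothetical, strategy-agnostic helper: it does not look at the retry reason.
const TAG_HAS_FAILED_ALL_RETRIES = 'test.has_failed_all_retries'

function tagLastRetry (lastSpan, statuses) {
  const wasRetried = statuses.length > 1 // original run plus at least one retry
  const allFailed = statuses.every(status => status === 'fail')
  if (wasRetried && allFailed) {
    lastSpan.meta[TAG_HAS_FAILED_ALL_RETRIES] = 'true'
  }
  return lastSpan
}

const span = tagLastRetry(
  { meta: { 'test.retry_reason': 'auto_test_retry' } },
  ['fail', 'fail', 'fail']
)
console.log(span.meta[TAG_HAS_FAILED_ALL_RETRIES]) // prints: true
```

Because the helper depends only on execution statuses, the same call would cover ATR, EFD, and attempt-to-fix without per-strategy branches.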

Suggested Fix

For each framework instrumentation, add ATR-specific logic that sets failedAllTests / hasFailedAllRetries to true when:

  • ATR is enabled
  • The test has exhausted all retry attempts (invocations === maxRetries + 1)
  • Every execution (original + all retries) has status fail
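
The three conditions above can be captured in a single predicate that each instrumentation could share. This is a minimal sketch; the function name `hasFailedAllAtrRetries` and its parameter names are hypothetical, and it assumes the caller has recorded the status of every execution in order.

```javascript
// Hypothetical predicate combining the three conditions listed above.
function hasFailedAllAtrRetries ({ isAtrEnabled, maxRetries, statuses }) {
  const invocations = statuses.length // original run + retries so far
  return Boolean(isAtrEnabled) &&
    invocations === maxRetries + 1 &&
    statuses.every(status => status === 'fail')
}

const allFail = Array(6).fill('fail') // 1 original + 5 retries

console.log(hasFailedAllAtrRetries({ isAtrEnabled: true, maxRetries: 5, statuses: allFail })) // true
console.log(hasFailedAllAtrRetries({ isAtrEnabled: true, maxRetries: 5, statuses: ['fail', 'pass'] })) // false
console.log(hasFailedAllAtrRetries({ isAtrEnabled: false, maxRetries: 5, statuses: allFail })) // false
```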

Jest example fix

// After the existing EFD block (~line 670), add:
// ATR: check if all retries have been exhausted and all failed
if (this.isFlakyTestRetriesEnabled && !isAttemptToFix && !isEfdRetry) {
  const maxRetries = Number(this.global[RETRY_TIMES]) || 0
  if (event.test?.invocations === maxRetries + 1 && status === 'fail') {
    // All invocations failed (since ATR stops early on first pass,
    // reaching maxRetries + 1 with a fail status means all attempts failed)
    failedAllTests = true
  }
}

The same pattern should be applied to mocha, vitest, playwright, and cucumber instrumentations.

Integration Test Coverage

The existing integration tests only verify TEST_HAS_FAILED_ALL_RETRIES for EFD (jest.spec.js:2464) and attempt-to-fix (jest.spec.js:4746). A new test should be added for the ATR case, similar to the existing EFD test but with isFlakyTestRetriesEnabled: true instead of EFD settings.

Impact

  • Severity: Low-medium. The core ATR retry behavior works correctly (retries happen, tagging with test.is_retry and test.retry_reason is correct, build status is correct). Only the informational test.has_failed_all_retries tag is missing.
  • Affected users: Anyone using Auto Test Retries who filters or reports on tests that failed all retries — the Datadog UI or API queries filtering on this tag will miss ATR-exhausted tests.

Environment

  • dd-trace-js version: 6.0.0-pre (master branch, tested 2026-03-03)
  • Node.js: v24.13.1
  • Test framework: Jest 27.5.1 (but affects all frameworks)
  • OS: macOS (darwin arm64)
