fix(ci): intermittent Core Contrib Test failures due to git SHA propagation lag #4304

@MikeGoldsmith

Description

Describe your environment

OS: GitHub Actions (ubuntu-latest)
Python version: N/A (CI infrastructure issue)
Package version: N/A

What happened?

The Core Contrib Test CI check triggered from opentelemetry-python PRs fails intermittently with:

failed to find branch, tag, or commit `<SHA>`
fatal: ambiguous argument '<SHA>^0': unknown revision or path not in the working tree.

Re-running the job fixes it. This happens ~5–20% of the time on fresh commits.

Steps to Reproduce

  1. Open or push a commit to a PR in open-telemetry/opentelemetry-python
  2. The Core Contrib Test workflow is triggered with CORE_REPO_SHA set to the new commit SHA
  3. Observe the job failing immediately on uv pip install 'opentelemetry-api @ git+https://github.com/open-telemetry/opentelemetry-python.git@<SHA>...'
  4. Re-run the job a minute later — it passes

Example failure: open-telemetry/opentelemetry-python#4898

Expected Result

CI passes deterministically on the first run.

Actual Result

Flaky failure requiring a manual re-run. The failure is not related to the code under test.

Root Cause

Each Core Contrib Test job performs two separate git operations against the core repo:

Step A: actions/checkout@v4 checks out the core repo at path: opentelemetry-python using GitHub's internal API. This step always succeeds.

Step B: tox installs the core packages via a fresh raw git clone:

uv pip install 'opentelemetry-api @ git+https://github.com/open-telemetry/opentelemetry-python.git@<SHA>#egg=opentelemetry-api&subdirectory=opentelemetry-api'

Step B hits GitHub's public git CDN directly. For a fresh commit, the new SHA may not yet have propagated to every CDN node, so the clone fails to resolve it. Step A is unaffected because it fetches through a different (internal) path.

Proposed Fix

Eliminate step B by pointing tox at the already-checked-out local copy from step A instead of doing a second network clone.

The fix involves:

  1. Adding per-package env vars to tox.ini (CORE_REPO_API, CORE_REPO_SDK, etc.) with fallbacks to the existing git URL behavior — fully backward compatible for local dev
  2. Adding a CI step in the Jinja2 workflow template that sets those vars to local paths under $GITHUB_WORKSPACE/opentelemetry-python/
  3. Regenerating core_contrib_test_0.yml via tox -e generate-workflows
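A rough sketch of the tox.ini change in step 1. The env-var names (CORE_REPO_API, CORE_REPO_SHA) follow the issue text; the section layout, the CORE_REPO default, and the single-package example are illustrative, not the actual opentelemetry-python-contrib config:

```ini
; Illustrative fragment -- section layout and defaults are assumptions.
[testenv]
setenv =
    ; default remote URL, pinned to CORE_REPO_SHA (falls back to main)
    CORE_REPO = git+https://github.com/open-telemetry/opentelemetry-python.git@{env:CORE_REPO_SHA:main}
commands_pre =
    ; CI sets CORE_REPO_API to a local path; when unset, the nested
    ; substitution falls back to the git URL (note the escaped \#)
    uv pip install 'opentelemetry-api @ {env:CORE_REPO_API:{env:CORE_REPO}\#egg=opentelemetry-api&subdirectory=opentelemetry-api}'
```

Local dev keeps the current behavior because the fallback branch of the substitution is taken whenever the CI-only variable is absent.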

Both the package @ /abs/path install syntax under tox-uv and the nested tox substitution ({env:VAR:{env:OTHER}\#fragment}) have been validated locally.
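The CI-side override from step 2 could look roughly like the following workflow step. The step name, its placement, and the two-package list are illustrative; only the variable names and the $GITHUB_WORKSPACE/opentelemetry-python/ checkout location come from the issue text:

```yaml
# Illustrative workflow step -- placement and package list are assumptions.
- name: Point tox at the locally checked-out core repo
  run: |
    echo "CORE_REPO_API=$GITHUB_WORKSPACE/opentelemetry-python/opentelemetry-api" >> "$GITHUB_ENV"
    echo "CORE_REPO_SDK=$GITHUB_WORKSPACE/opentelemetry-python/opentelemetry-sdk" >> "$GITHUB_ENV"
```

Writing to $GITHUB_ENV makes the variables visible to the later tox invocation in the same job, so step B installs from the local checkout instead of cloning over the network.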

Additional context

No existing issues or PRs track this in either repo. The fix is entirely in opentelemetry-python-contrib — no changes needed in opentelemetry-python.

Would you like to implement a fix?

Yes
