[Infra] Isolate unit test workflows with hardened security posture #24740

Merged
yuneng-berri merged 3 commits into main from litellm_unit_test_workflow_isolation on Mar 28, 2026

@yuneng-berri
Collaborator

Summary

Problem

The current matrix-based unit test workflow (test-litellm-matrix.yml) has two issues:

  1. Opaque job names — failures show as test (proxy-unit-b6) or test (other-3), making it impossible to tell what broke at a glance
  2. Single monolithic workflow — no separation between test domains, making it harder to enforce per-workflow security policies

Fix

  • Added 16 new workflow files that replace the matrix with individually-named workflows per test domain
  • Created a reusable base workflow (_test-unit-base.yml) to eliminate setup duplication
  • Hardened security posture across all workflows:
    • Zero secrets: references — unit tests have no access to any secrets
    • permissions: { contents: read } only — least privilege
    • All actions pinned to commit SHAs (not tags) to prevent supply chain attacks
    • persist-credentials: false on all checkout steps
    • Template injection prevention via env: indirection (no ${{ }} in run: blocks)
    • Concurrency groups with cancel-in-progress: true
    • Timeouts on all jobs
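
A minimal sketch of how the hardening items above might look together in one workflow. The workflow name, job key, runner, timeout value, and concurrency group key are assumptions for illustration; only the checkout SHA is taken from this PR's diff.

```yaml
# Hypothetical workflow illustrating the hardening checklist.
# Names and values are assumptions unless noted otherwise.
name: "Unit Tests: Example"

on:
  pull_request:
    branches: [main]

# Least privilege: the GITHUB_TOKEN can only read repository contents.
permissions:
  contents: read

# Cancel superseded runs of this workflow for the same ref.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  example:
    runs-on: ubuntu-latest
    timeout-minutes: 20  # illustrative value
    steps:
      # Pinned to a commit SHA (from this PR's diff); the tag is kept as a comment.
      - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
        with:
          persist-credentials: false  # do not leave the token in the local git config

      - name: Print PR title safely
        env:
          # The expression is expanded into an environment variable,
          # not into the script text, so it cannot inject shell syntax.
          PR_TITLE: ${{ github.event.pull_request.title }}
        run: echo "Testing PR: ${PR_TITLE}"
```

Note that no `secrets.*` reference appears anywhere in the file, which is what keeps secrets out of reach of the job.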

New Workflows

| Workflow | Tests |
| --- | --- |
| Unit Tests: LLM Provider Transformations | Vertex AI + all other provider request/response transforms |
| Unit Tests: Proxy Auth & Key Management | JWT, RBAC, API key validation, policy engine |
| Unit Tests: Proxy API Endpoints | All proxy HTTP endpoint handlers (15 subdirs) |
| Unit Tests: Proxy Infrastructure | DB ops, middleware, spend tracking, experimental |
| Unit Tests: Core Utilities | Token counting, cost calculation, streaming |
| Unit Tests: Integrations | Mocked Langfuse, DataDog, Prometheus callbacks |
| Unit Tests: Responses, Caching & Types | Response format conversion, cache strategy, types |
| Unit Tests: Enterprise, Google GenAI & Routing | Enterprise features, GenAI transforms, router logic |
| Unit Tests: MCP, Secrets, Containers & Misc | Remaining test domains |
| Unit Tests: Proxy Legacy Tests | Legacy proxy tests (9 descriptive matrix entries) |
| Unit Tests: Router | Router unit tests (new to GHA) |
| Unit Tests: Pass-Through Endpoints | Pass-through endpoint tests (new to GHA) |
| Unit Tests: LiteLLM Utilities | Utility function tests (new to GHA) |
| Unit Tests: Security | Proxy security tests (new to GHA) |
| Unit Tests: Documentation Validation | Documentation validation tests (new to GHA) |

Testing

  • All 16 workflows trigger and pass on this PR
  • Verify job names are descriptive in the Actions tab
  • Confirm no secrets are accessible to any unit test job

Type

🚄 Infrastructure
✅ Test

Replace monolithic matrix workflow with individual, descriptively-named
workflow files. Each workflow uses a shared reusable base and follows
least-privilege security: zero secrets, read-only permissions, SHA-pinned
actions, persist-credentials: false, and env-var indirection to prevent
template injection.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@vercel

vercel bot commented Mar 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Mar 28, 2026 5:18pm |


@CLAassistant

CLAassistant commented Mar 28, 2026

CLA assistant check
All committers have signed the CLA.

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 28, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_unit_test_workflow_isolation (c717189) with main (2eb3c20)


@greptile-apps
Contributor

greptile-apps bot commented Mar 28, 2026

Greptile Summary

This PR introduces a hardened CI architecture by decomposing the single test-litellm-matrix.yml into 11 new individual workflow files (10 caller workflows + 1 reusable base), with 5 additional workflows referenced in the PR description but not yet present in this diff. The security improvements are genuine and well-executed: all external actions are pinned to commit SHAs, persist-credentials: false is applied universally, permissions: contents: read is enforced everywhere, secrets are fully absent, and template injection is prevented via env: indirection throughout.

Key changes:

  • _test-unit-base.yml: Centralises the 6-step setup (checkout, Python, Poetry, cache, deps, Prisma) into a reusable workflow_call target, parameterising test-path, workers, reruns, timeout-minutes, and max-failures
  • 10 caller workflows: Each maps one or more test directories to the base, with descriptive job names replacing the opaque test (proxy-unit-b6) style
  • test-unit-proxy-legacy.yml: Cannot reuse the base (GitHub does not support matrix + workflow_call), so it duplicates all 6 setup steps inline and runs a 9-entry alphabetically-partitioned matrix over tests/proxy_unit_tests/
  • Dependency version improvements: google-cloud-aiplatform is now pinned (==1.115.0 vs old >=1.38), openapi-core is pinned (==0.23.0), and nodejs-wheel-binaries==24.13.1 is added for reproducible Prisma client generation
  • Coverage verification: Cross-referencing against test-litellm-matrix.yml confirms all 20 old matrix test paths are preserved, with router_strategy correctly migrated from other-3 to enterprise-routing
  • The old test-litellm-matrix.yml still exists and will double CI minutes on every PR until removed (noted in prior threads)
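
A rough sketch of what a reusable base with these parameters could look like. The input names come from the summary above; the default values, the job key `run`, the runner, and the exact pytest flags wired to each input are assumptions, and the intermediate setup steps are elided, so this fragment is not runnable on its own.

```yaml
# .github/workflows/_test-unit-base.yml (sketch; defaults are assumptions)
on:
  workflow_call:
    inputs:
      test-path:
        description: Path(s) passed to pytest
        required: true
        type: string
      workers:
        type: number
        default: 2
      reruns:
        type: number
        default: 1
      timeout-minutes:
        type: number
        default: 20
      max-failures:
        type: number
        default: 10

permissions:
  contents: read

jobs:
  run:
    runs-on: ubuntu-latest
    timeout-minutes: ${{ inputs.timeout-minutes }}
    steps:
      - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
        with:
          persist-credentials: false

      # ... Python, Poetry, cache, dependency, and Prisma steps elided ...

      - name: Run tests
        env:
          # Inputs reach the shell only through environment variables.
          TEST_PATH: ${{ inputs.test-path }}
          WORKERS: ${{ inputs.workers }}
          RERUNS: ${{ inputs.reruns }}
          MAX_FAILURES: ${{ inputs.max-failures }}
        run: |
          poetry run pytest ${TEST_PATH} \
            --tb=short -vv \
            --maxfail="${MAX_FAILURES}" \
            -n "${WORKERS}" \
            --reruns "${RERUNS}"
```

`TEST_PATH` is left unquoted in the command so that several space-separated paths or a glob can be passed through a single input, which matches the unquoted expansion the review mentions.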

Confidence Score: 5/5

Safe to merge; all findings are P2 style/quality suggestions with no correctness or security impact

The security hardening is correctly implemented throughout. Full test coverage parity with the old matrix workflow was confirmed by cross-referencing both files. The only remaining concerns are P2: a latent alphabetical gap in the proxy-legacy glob partition (no current file falls through) and unquoted glob expansion in the base workflow (existing behaviour, not a regression). Known open items — old matrix not deleted, no workflow_dispatch, no push trigger — are already tracked in prior threads and do not block merge.

test-unit-proxy-legacy.yml — setup duplication and alphabetical glob partitioning warrant a maintenance comment for future contributors
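
To illustrate how an alphabetical glob partition can leave a latent gap, here is a sketch that assumes matrix entries with a `name` and a `path` (the `matrix.test-group.name` / `matrix.test-group.path` shape used in the workflow's test step). The group names and glob patterns are hypothetical and are not the ones in test-unit-proxy-legacy.yml.

```yaml
jobs:
  proxy-legacy:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        test-group:
          # Hypothetical partition of tests/proxy_unit_tests/ by first letter.
          - name: a-to-d
            path: "tests/proxy_unit_tests/test_[a-d]*.py"
          - name: e-to-j
            path: "tests/proxy_unit_tests/test_[e-j]*.py"
          # Gap: no group matches test_k*.py, so such a file would never run.
          - name: l-to-z
            path: "tests/proxy_unit_tests/test_[l-z]*.py"
    steps:
      - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
        with:
          persist-credentials: false

      - name: Show files selected for ${{ matrix.test-group.name }}
        env:
          TEST_PATH: ${{ matrix.test-group.path }}
        # Unquoted on purpose so the shell expands the glob.
        run: ls ${TEST_PATH}
```

A catch-all group, or a check that the union of the globs matches every test file, would be one way to close this kind of gap.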

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/_test-unit-base.yml | New reusable base workflow with SHA-pinned actions, env-indirection for template injection prevention, and parameterized pytest runner |
| .github/workflows/test-unit-proxy-legacy.yml | Legacy proxy test workflow inlines all 6 setup steps from base (no reuse possible due to matrix), with alphabetic glob partitioning across 9 matrix groups; setup must be manually kept in sync with _test-unit-base.yml |
| .github/workflows/test-unit-llm-providers.yml | Splits LLM providers into vertex-ai (workers:1 for isolation) and all-others (--ignore= flag for exclusion), matching old matrix structure |
| .github/workflows/test-unit-proxy-endpoints.yml | Covers 15 proxy endpoint subdirectories split from old monolithic proxy-misc group; rag_endpoints and realtime_endpoints directories absent (pre-existing gap, not a regression) |
| .github/workflows/test-unit-enterprise-routing.yml | Adds router_strategy (was in old other-3) alongside enterprise, google_genai, router_utils; uses default 20-minute timeout for 4 directories |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    PR[Pull Request to main] --> CW1
    PR --> CW2
    PR --> CW3
    PR --> CW4
    PR --> CW5
    PR --> CW6
    PR --> CW7
    PR --> CW8
    PR --> CW9
    PR --> CW10

    CW1["test-unit-llm-providers\n(vertex-ai + other-providers)"]
    CW2["test-unit-proxy-auth\n(auth / hooks / policy / client)"]
    CW3["test-unit-proxy-endpoints\n(15 endpoint subdirs)"]
    CW4["test-unit-proxy-infra\n(db / middleware / spend / pass-through)"]
    CW5["test-unit-core-utils\n(litellm_core_utils)"]
    CW6["test-unit-integrations\n(callbacks & logging)"]
    CW7["test-unit-responses-caching-types\n(responses / caching / types)"]
    CW8["test-unit-enterprise-routing\n(enterprise / google_genai / router)"]
    CW9["test-unit-misc\n(MCP / secrets / containers / root)"]
    CW10["test-unit-proxy-legacy\n(matrix: 9 alphabetic groups)"]

    CW1 --> BASE["_test-unit-base.yml\n(reusable workflow)"]
    CW2 --> BASE
    CW3 --> BASE
    CW4 --> BASE
    CW5 --> BASE
    CW6 --> BASE
    CW7 --> BASE
    CW8 --> BASE
    CW9 --> BASE

    CW10 --> INLINE["Inline setup\n(duplicated from base)"]

    BASE --> STEPS["checkout → python → poetry\n→ cache → install deps\n→ enterprise → prisma → pytest"]
    INLINE --> STEPS2["checkout → python → poetry\n→ cache → install deps\n→ enterprise → prisma\n→ pytest (matrix loop)"]

Reviews (3): Last reviewed commit: "[Infra] Remove workflows that require AP..."

Comment on lines +4 to +5
  pull_request:
    branches: [main]
Contributor

P1 Old matrix workflow not removed — tests will run twice

The PR description states these new workflows "replace" test-litellm-matrix.yml, but that file still exists and has the same pull_request: branches: [main] trigger. With both active, every PR will now run the old matrix jobs AND all 16 new workflow runs concurrently. This doubles CI minutes and resource consumption for every PR.

The old workflow covers the same test paths (llms-vertex, llms-other, proxy-guardrails, proxy-core, proxy-misc, integrations, core-utils, other-1 through other-3, root, and all 9 proxy-unit-* entries) — fully overlapping the new workflows. You'll want to delete .github/workflows/test-litellm-matrix.yml as part of this PR (or a fast-follow) to avoid the duplication.

Comment on lines +4 to +6
  pull_request:
    branches: [main]

Contributor

P2 No workflow_dispatch trigger on any new workflow

None of the 16 new workflows include a workflow_dispatch trigger, making it impossible to manually re-run a specific test suite from the GitHub Actions UI without pushing a new commit. This is particularly inconvenient when a flaky test needs a targeted re-run. Adding workflow_dispatch: {} (or with optional inputs) to each caller workflow would restore that capability.

Suggested change
-  pull_request:
-    branches: [main]
+on:
+  pull_request:
+    branches: [main]
+  workflow_dispatch: {}


Comment on lines +43 to +96
steps:
  - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
    with:
      persist-credentials: false

  - name: Set up Python
    uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
    with:
      python-version: "3.12"

  - name: Install Poetry
    run: pip install 'poetry==2.3.2'

  - name: Cache Poetry dependencies
    uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
    with:
      path: |
        ~/.cache/pypoetry
        ~/.cache/pip
        .venv
      key: ${{ runner.os }}-poetry-${{ hashFiles('poetry.lock') }}
      restore-keys: |
        ${{ runner.os }}-poetry-

  - name: Install dependencies
    run: |
      poetry config virtualenvs.in-project true
      poetry install --with dev,proxy-dev --extras "proxy semantic-router"
      poetry run pip install google-genai==1.22.0 \
        google-cloud-aiplatform==1.115.0 fastapi-offline==1.7.3 python-multipart==0.0.22 openapi-core==0.23.0

  - name: Setup litellm-enterprise
    run: |
      poetry run pip install --force-reinstall --no-deps -e enterprise/

  - name: Generate Prisma client
    env:
      PRISMA_BINARY_CACHE_DIR: ${{ runner.temp }}/prisma-cache
    run: |
      poetry run pip install nodejs-wheel-binaries==24.13.1
      poetry run prisma generate --schema litellm/proxy/schema.prisma

  - name: Run tests - ${{ matrix.test-group.name }}
    env:
      TEST_PATH: ${{ matrix.test-group.path }}
    run: |
      poetry run pytest ${TEST_PATH} \
        --tb=short -vv \
        --maxfail=10 \
        -n 2 \
        --reruns 1 \
        --reruns-delay 1 \
        --dist=loadscope \
        --durations=20
Contributor

P2 Full setup duplicated from _test-unit-base.yml

Because workflow_call doesn't support matrix strategies, test-unit-proxy-legacy.yml inlines every setup step (checkout, Python, Poetry install, cache, dependencies, enterprise, Prisma generate) that already lives in the base workflow. Any future change to the shared setup (e.g. bumping poetry or google-genai versions, adding a new install step) must be manually applied in both places.

Consider adding a comment to the top of the job explicitly flagging this as intentional duplication that must be kept in sync with _test-unit-base.yml, so future maintainers don't accidentally diverge the two.
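
One possible shape for such a comment, as a sketch. The job key `proxy-legacy` and the runner are assumptions, and the wording is only a suggestion.

```yaml
jobs:
  proxy-legacy:
    # NOTE: The setup steps in this job intentionally duplicate those in
    # .github/workflows/_test-unit-base.yml. When you change one file
    # (tool versions, pinned packages, new install steps), apply the
    # same change to the other so the two do not drift apart.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
        with:
          persist-credentials: false
      # ... remaining setup and test steps unchanged ...
```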

Comment on lines +4 to +6
  pull_request:
    branches: [main]

Contributor

P2 No push trigger on main — post-merge regressions undetected

All 16 workflows only trigger on pull_request: branches: [main]. Once a PR is merged, none of these jobs run again, so a merge that introduces a regression on main won't be caught until the next PR opens. Consider adding a push: branches: [main] trigger to at least the most critical suites (e.g. proxy-auth, llm-providers) to maintain a green main signal.

This also applies to: test-unit-documentation.yml, test-unit-enterprise-routing.yml, test-unit-integrations.yml, test-unit-litellm-utils.yml, test-unit-llm-providers.yml, test-unit-misc.yml, test-unit-pass-through.yml, test-unit-proxy-auth.yml, test-unit-proxy-endpoints.yml, test-unit-proxy-infra.yml, test-unit-proxy-legacy.yml, test-unit-responses-caching-types.yml, test-unit-router.yml, test-unit-security.yml.
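
A sketch of what adding the push trigger might look like in a caller workflow. The concurrency expression is an assumption added for illustration: with `cancel-in-progress: true`, a group keyed only on the ref would let a later merge cancel the run for an earlier merge on main, so this variant keys push runs on the commit SHA instead.

```yaml
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

concurrency:
  # pull_request: one group per ref, so a new push to the PR cancels the old run.
  # push: one group per commit, so successive merges to main do not cancel each other.
  group: ${{ github.workflow }}-${{ github.event_name == 'push' && github.sha || github.ref }}
  cancel-in-progress: true
```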

Rename job keys from generic 'test' to descriptive names (e.g.,
'core-utils', 'proxy-auth', 'router') so GitHub checks display as
'core-utils / run' instead of 'test / test'.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
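
For reference, a sketch of a caller workflow with a descriptive job key. The file name follows the naming seen in the review flowchart, the test path is an assumption, and the check name `core-utils / run` assumes the job inside the reusable base is keyed `run`.

```yaml
# .github/workflows/test-unit-core-utils.yml (sketch)
name: "Unit Tests: Core Utilities"

on:
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  # Descriptive job key instead of a generic "test"; combined with the
  # base workflow's job key, the check would display as "core-utils / run".
  core-utils:
    uses: ./.github/workflows/_test-unit-base.yml
    with:
      test-path: tests/test_litellm/litellm_core_utils  # assumed path
```
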
These test suites are not pure unit tests and don't belong in Phase 1:
- litellm_utils_tests: health check tests need OPENAI_API_KEY
- pass_through_unit_tests: tests hit real Anthropic API
- router_unit_tests: tests call real OpenAI moderation endpoints
- proxy_security_tests: requires DATABASE_URL (Postgres)
- documentation_tests: requires docs directory at specific relative path

These will be re-added in later phases with proper secret scoping.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@yuneng-berri yuneng-berri merged commit 428d837 into main Mar 28, 2026
57 of 105 checks passed
@yuneng-berri yuneng-berri deleted the litellm_unit_test_workflow_isolation branch March 28, 2026 17:30