Skip to content

fix(eks): forward stored AWS integration credentials to k8s client builder#976

Merged
muddlebee merged 5 commits intoTracer-Cloud:mainfrom
chaosreload:fix/eks-k8s-client-ambient-credentials
Apr 28, 2026
Merged

fix(eks): forward stored AWS integration credentials to k8s client builder#976
muddlebee merged 5 commits intoTracer-Cloud:mainfrom
chaosreload:fix/eks-k8s-client-ambient-credentials

Conversation

@chaosreload
Copy link
Copy Markdown
Contributor

Fixes #969. Follow-up to #724.

Summary

PR #724 gated EKS source detection so that AWS integrations configured with stored IAM user credentials (no role_arn) can reach the EKS branch. This PR completes the story by actually using those stored credentials in build_k8s_clients.

Changes

  1. app/nodes/plan_actions/detect_sources.py: forward _eks_int.get("credentials") into eks_params["credentials"].
  2. app/services/eks/eks_k8s_client.py::build_k8s_clients: new optional credentials kwarg. Resolution priority:
    1. explicit credentials (stored-integration, new)
    2. role_arn → STS AssumeRole (existing production path)
    3. ambient botocore chain (fix(eks): detect EKS source when AWS integration uses IAM user credentials #724 fallback)

Secondary fixes from #724 preserved: empty SessionToken coerced to None, regional STS endpoint.

Why approach #2

In the review of #724, Greptile analyzed two candidate follow-ups and recommended this one:

"Approach #2 is the right call for the follow-up. The ambient botocore resolver would silently miss stored-integration credentials exactly when the user is relying on them."

#724 (comment)

Explicit forwarding also mirrors how every other integration (grafana, bitbucket, github, datadog) uses the values that resolve_integrations persisted.

Regression safety

The role_arn (AssumeRole) and ambient-botocore branches are unchanged. The new explicit branch only runs when credentials is truthy and has both access_key_id and secret_access_key, so existing deployments see no behaviour change.

Testing

  • python -m py_compile passes on both files.
  • ruff check and ruff format --check clean on both files.
  • Follows the project lint config in pyproject.toml.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 26, 2026

Greptile Summary

This PR correctly stores credentials in eks_params (in detect_sources.py) and adds a three-tier credential resolution to build_k8s_clients, but the chain is broken in the middle: the _eks_creds helper used by all seven EKS tool modules never includes credentials, so no tool call site ever passes it to build_k8s_clients. The new explicit-credential branch remains dead code and stored IAM credentials are still silently ignored.

  • P1 — _eks_creds must include credentials: every EKS tool's extract_params spreads **_eks_creds(eks), so the omission cascades to all seven build_k8s_clients call sites (e.g. line 81 of EKSPodLogsTool, line 71 of EKSListPodsTool, line 74 of EKSEventsTool, and four more). Each tool function signature and call site also needs a credentials parameter forwarded.

Confidence Score: 4/5

Not safe to merge as-is — the feature this PR claims to deliver does not work because credentials are never forwarded to build_k8s_clients.

A single P1 finding makes the feature non-functional. Existing deployments using role_arn or ambient creds are unaffected (fallback paths unchanged), but the new stored-credential path — the sole purpose of this PR — cannot be reached from any tool invocation.

app/tools/EKSListClustersTool/__init__.py (_eks_creds helper) and all seven EKS tool modules that call build_k8s_clients need credentials wired through.

Important Files Changed

Filename Overview
app/nodes/plan_actions/detect_sources.py Correctly forwards _eks_int.get("credentials") or None into eks_params["credentials"]; this half of the fix is correct on its own.
app/services/eks/eks_k8s_client.py New credentials kwarg and three-tier resolution logic are well-structured, but the parameter is never actually passed by any caller; _eks_creds in EKSListClustersTool omits it, so the explicit-credential branch is unreachable.

Sequence Diagram

sequenceDiagram
    participant DS as detect_sources.py
    participant EKS as eks_params (sources["eks"])
    participant EP as extract_params (_eks_creds)
    participant TF as Tool function
    participant BK as build_k8s_clients

    DS->>EKS: eks_params["credentials"] = _eks_int.get("credentials") ✓
    EKS->>EP: sources["eks"]["credentials"] exists
    EP-->>TF: **_eks_creds(eks) → role_arn, external_id, region only ❌ credentials NOT included
    TF->>BK: build_k8s_clients(cluster_name, role_arn, external_id, region) ❌ credentials kwarg never passed
    Note over BK: explicit-credential branch is unreachable dead code
Loading

Comments Outside Diff (1)

  1. app/tools/EKSListClustersTool/__init__.py, line 20-25 (link)

    P1 credentials never forwarded through _eks_creds — new code path is unreachable

    _eks_creds is the shared helper that every EKS tool (EKSPodLogsTool, EKSListPodsTool, EKSEventsTool, EKSNodeHealthTool, EKSListDeploymentsTool, EKSDeploymentStatusTool, EKSListNamespacesTool) uses to build its keyword arguments, and none of those tools pass anything beyond what this helper returns. Because credentials is absent here, it never reaches any of the seven build_k8s_clients(cluster_name, role_arn, external_id, region) call sites — the new explicit-credential branch added in eks_k8s_client.py is effectively dead code. Users with stored IAM credentials but no role_arn will still silently fall through to the ambient botocore chain, which is exactly the bug this PR claims to fix.

    def _eks_creds(eks: dict) -> dict:
        return {
            "role_arn": eks.get("role_arn", ""),
            "external_id": eks.get("external_id", ""),
            "region": eks.get("region", "us-east-1"),
            "credentials": eks.get("credentials"),
        }

    Each tool function signature and its build_k8s_clients call site will also need to be updated to accept and forward the new credentials kwarg.

Reviews (1): Last reviewed commit: "fix(eks): forward stored AWS integration..." | Re-trigger Greptile

…ilder

Fixes Tracer-Cloud#969. Follow-up to Tracer-Cloud#724.

Problem
-------
PR Tracer-Cloud#724 let EKS be detected when the AWS integration is configured with
stored IAM user credentials (access_key_id + secret_access_key, no
role_arn), but build_k8s_clients still had no way to use those stored
credentials. After Tracer-Cloud#724 it supports:

  1. role_arn -> STS AssumeRole
  2. ambient botocore chain (env / shared config / instance profile / IRSA)

The stored integration credentials (persisted by resolve_integrations
and surfaced in _eks_int['credentials']) were silently dropped. On hosts
that happen to have ambient credentials pointing elsewhere, this picks
up the wrong principal; on hosts without ambient credentials the call
fails entirely, even though the integration has perfectly usable keys.

Fix (approach Tracer-Cloud#2, pre-approved by Greptile in PR Tracer-Cloud#724 review)
-------------------------------------------------------------
Forward the stored credentials end-to-end through the EKS pipeline:

1. detect_sources.py: set eks_params['credentials'] = _eks_int.get(
   'credentials') or None alongside the other EKS integration fields.

2. EKSListClustersTool._eks_creds (shared helper used by every EKS tool
   for its extract_params): include 'credentials' in the returned dict
   so it flows into each tool's kwargs.

3. Seven EKS tool functions accept a new `credentials` kwarg and forward
   it to build_k8s_clients (and surface it in their input_schema):
     EKSPodLogsTool, EKSListPodsTool, EKSEventsTool, EKSNodeHealthTool,
     EKSListDeploymentsTool, EKSDeploymentStatusTool,
     EKSListNamespacesTool.

4. eks_k8s_client.build_k8s_clients accepts an optional credentials
   kwarg. Resolution priority:
     1. explicit credentials kwarg (stored-integration, new)
     2. role_arn AssumeRole         (existing production path)
     3. ambient botocore chain      (Tracer-Cloud#724 fallback)

The two secondary fixes from Tracer-Cloud#724 are preserved (empty SessionToken
coerced to None, regional STS endpoint).

Addresses Greptile P1
---------------------
The initial commit only wired the two ends of the pipeline (detect_sources
and build_k8s_clients); Greptile's review of Tracer-Cloud#976 correctly pointed out
that the shared _eks_creds helper and the seven tool call sites in between
were never updated, leaving the new credentials branch unreachable. This
revision wires credentials through every intermediate layer so each of
the seven EKS tools passes credentials=credentials to build_k8s_clients.

Test plan
---------
- py_compile passes on all 10 changed files.
- ruff check + ruff format clean on all 10 changed files.
- Regression safety: the role_arn (AssumeRole) and ambient-botocore
  branches are unchanged; the new explicit branch only runs when the
  credentials kwarg is truthy and has access_key_id + secret_access_key.

Related
-------
- Fixes Tracer-Cloud#969.
- Follow-up to Tracer-Cloud#724.
- Greptile pre-approved this approach in
  Tracer-Cloud#724 (comment)
- Greptile P1 on Tracer-Cloud#976 that drove the helper+tools fixups:
  Tracer-Cloud#976 (comment)
weichao and others added 3 commits April 27, 2026 03:36
The amend (ac2d841) wired credentials through _eks_creds helper + 7 EKS
tool call sites per Greptile P1 on the initial commit b03928d. Pushing
an empty commit to ensure Greptile re-reviews the new diff (10 files,
+114/-11 vs the original 2 files, +50/-4).
First re-trigger commit (61f15ed) did not produce a Greptile re-review after ~4h. Pushing another empty commit to kick Greptile again so the end-to-end credentials wiring (helper + 7 EKS tools in ac2d841) gets a fresh review.
PR Tracer-Cloud#976 wired stored AWS-integration credentials end-to-end for the
seven k8s-API EKS tools that route through `build_k8s_clients`, but
missed the four EKS tools that still go through `EKSClient`:
`list_eks_clusters`, `describe_eks_cluster`, `describe_eks_addon`,
`get_eks_nodegroup_health`. Their `_eks_creds` helper already exposes
`credentials`, but each tool absorbed it via `**_kwargs` and never
forwarded it, so `EKSClient._build` unconditionally called
`sts.assume_role(RoleArn="", ...)` and raised `ParamValidationError`
for IAM-user-only integrations at the cluster-discovery step.

Greptile flagged this exact gap on the (now-closed) parallel PR Tracer-Cloud#990.
This commit closes it on Tracer-Cloud#976.

Changes
-------
- `EKSClient.__init__` takes an optional `credentials` kwarg. `_build`
  prefers stored IAM user keys over STS AssumeRole; if neither is
  provided it raises `ValueError` (no silent fallthrough).
- New shared helper `_stored_credentials_to_aws_creds` normalises the
  stored dict into the AssumeRole-shaped triplet (empty `session_token`
  → None, both required keys present). `build_k8s_clients` now uses
  the same helper so both code paths stay in sync.
- The four EKSClient-backed tools accept `credentials` in the signature,
  expose it in `input_schema`, and pass it into the constructor.

Tests
-----
- New `tests/services/test_eks_client.py` covers the helper (full
  triplet / empty session token / missing required keys / None / empty
  dict) and EKSClient credential resolution (stored skips AssumeRole,
  empty session token coerced, partial dict falls back to AssumeRole,
  no creds + no role_arn raises ValueError, AssumeRole external_id
  passthrough stays intact).
- `test_eks_list_clusters_tool.py` grows two forwarding tests: the
  kwarg threads into EKSClient, and stays None by default for existing
  role-based callers.

Quality gate
------------
make lint / format-check / typecheck / test-cov all green
(3328 passed, 2 skipped, 1 xfailed).

Related
-------
- Extends Tracer-Cloud#976
- Fixes Tracer-Cloud#969
- Closes Greptile P1 on Tracer-Cloud#990
  (Tracer-Cloud#990 (comment))
@chaosreload
Copy link
Copy Markdown
Contributor Author

Pushed an additional commit () that closes the EKSClient-path gap Greptile flagged on the (now-closed) parallel PR #990. Until now #976 only covered the seven k8s-API tools that route through build_k8s_clients; list_eks_clusters, describe_eks_cluster, describe_eks_addon, and get_eks_nodegroup_health still hit sts.assume_role(RoleArn=\"\", ...) inside EKSClient._build for IAM-user-only integrations and raised ParamValidationError at the cluster-discovery step.

Changes

  • EKSClient.__init__ accepts an optional credentials kwarg; _build prefers stored IAM user keys over sts:AssumeRole and raises ValueError when both are missing (no silent fallthrough).
  • New shared helper _stored_credentials_to_aws_creds keeps the normalisation rules (empty session_tokenNone, both required keys present) in one place — build_k8s_clients now uses the same helper so both code paths stay in sync.
  • The four EKSClient-backed tools accept and forward credentials (function signature, input_schema.properties, constructor call).

Tests

  • New tests/services/test_eks_client.py covers the helper (full triplet / empty session token / missing required keys / None / empty dict) and EKSClient credential resolution (stored skips AssumeRole, empty token coerced to None, partial dict falls back to AssumeRole, no creds + no role_arn raises ValueError, existing AssumeRole + external_id passthrough stays intact).
  • test_eks_list_clusters_tool.py grows two forwarding tests: the kwarg threads into EKSClient, and defaults to None for existing role-based callers.

Quality gate

make lint          → All checks passed!
make format-check  → 905 files already formatted
make typecheck     → Success: no issues found in 391 source files
make test-cov      → 3328 passed, 2 skipped, 1 xfailed in 42.39s

@@ -1,21 +1,71 @@
"""EKS boto3 client with IAM role assumption."""
"""EKS boto3 client — stored credentials preferred, AssumeRole fallback."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing from __future__ import annotations — required by the project's code-style rules for all Python files.

Comment thread app/services/eks/eks_k8s_client.py Outdated
import botocore.credentials
from kubernetes import client as k8s_client

from app.services.eks.eks_client import _stored_credentials_to_aws_creds
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing a private helper (underscore-prefixed) across module boundaries couples two service files. Move _stored_credentials_to_aws_creds to app/services/eks/utils.py (or drop the leading underscore and make it part of the public API) so the import contract is intentional.

assumed = _assume_role(role_arn, external_id, "TracerEKSK8sInvestigation")
else:
# No role_arn and no explicit creds: fall back to ambient AWS
# credentials (env AK/SK, shared config profile, or instance profile
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline import inside a function body. Project convention requires absolute imports at the top of the file. botocore is already imported at module level (lines 14-16), so there's no circular-import concern — move this to the top-level imports.

logger.error("[eks] %s", msg)
raise RuntimeError(msg)
frozen = ambient.get_frozen_credentials()
assumed = {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frozen.token or "" stores an empty string, then assumed["SessionToken"] or None (line 131) silently corrects it downstream. Normalise at the source instead: frozen.token or None — matches the behaviour of _stored_credentials_to_aws_creds and removes the hidden dependency on the downstream guard.

role_arn: str,
external_id: str = "",
region: str = "us-east-1",
credentials: dict | None = None,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use dict[str, Any] | None instead of bare dict | None — the same pattern is already used in the service layer and it keeps mypy strict mode happy. Applies identically to the other nine tool files changed in this PR.

Addresses all five inline comments from @muddlebee on Tracer-Cloud#976:

1. **eks_client.py** — add `from __future__ import annotations`
   (project style convention).

2. **utils module, public helper** — move
   `_stored_credentials_to_aws_creds` out of `eks_client.py` into a
   new `app/services/eks/utils.py` and drop the leading underscore
   (`stored_credentials_to_aws_creds`). The import contract between
   `eks_client` and `eks_k8s_client` is now explicit public API
   rather than two services reaching into each other for a private
   helper.

3. **eks_k8s_client.py** — promote the `import botocore.session`
   that lived in the ambient-fallback branch to the top-level imports.
   No circular-import concern (`botocore` is already imported at
   module level).

4. **eks_k8s_client.py** — normalize `frozen.token or None` at the
   source instead of relying on the downstream `or None` guard. This
   matches the behaviour of `stored_credentials_to_aws_creds` and
   removes the hidden dependency.

5. **All 11 EKS tool files** — tighten `credentials` parameter type
   annotation from `dict | None` to `dict[str, Any] | None`, matching
   the service-layer style and keeping mypy strict mode happy.

The `tests/services/test_eks_client.py` import also switches to the
new public name and module path.

Quality gate
------------
make lint          → All checks passed!
make format-check  → 906 files already formatted
make typecheck     → Success: no issues found in 392 source files
make test-cov      → 3328 passed, 2 skipped, 1 xfailed in 43.36s

No behaviour changes — pure refactor of public surface + style.
@chaosreload
Copy link
Copy Markdown
Contributor Author

@muddlebee thanks for the thorough review — all five points addressed in abb3345:

# Your comment Fix
1 eks_client.py missing from __future__ import annotations Added.
2 Cross-module private-helper import couples eks_client and eks_k8s_client Helper moved to new app/services/eks/utils.py and renamed to stored_credentials_to_aws_creds (public). Both service files now import from utils, so the contract is intentional.
3 Inline import botocore.session inside the ambient-fallback branch Promoted to top-level imports.
4 frozen.token or \"\" relies on a downstream or None guard Normalised at the source: frozen.token or None, matching stored_credentials_to_aws_creds.
5 credentials: dict | None should be dict[str, Any] | None Updated in all 11 EKS tool files.

Quality gate clean:

make lint          → All checks passed!
make format-check  → 906 files already formatted
make typecheck     → Success: no issues found in 392 source files
make test-cov      → 3328 passed, 2 skipped, 1 xfailed in 43.36s

No behaviour changes — pure refactor of public surface + style.

@muddlebee muddlebee merged commit d2558e9 into Tracer-Cloud:main Apr 28, 2026
7 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

🎊 Achievement unlocked: PR Merged. @chaosreload passed code review, survived CI, and shipped. Respect. 🤝


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EKS k8s client silently ignores stored AWS integration credentials (follow-up to #724)

2 participants