fix(eks): forward stored AWS integration credentials to k8s client builder#976
Conversation
Greptile SummaryThis PR correctly stores
Confidence Score: 4/5Not safe to merge as-is — the feature this PR claims to deliver does not work because credentials are never forwarded to A single P1 finding makes the feature non-functional. Existing deployments using
Important Files Changed
Sequence DiagramsequenceDiagram
participant DS as detect_sources.py
participant EKS as eks_params (sources["eks"])
participant EP as extract_params (_eks_creds)
participant TF as Tool function
participant BK as build_k8s_clients
DS->>EKS: eks_params["credentials"] = _eks_int.get("credentials") ✓
EKS->>EP: sources["eks"]["credentials"] exists
EP-->>TF: **_eks_creds(eks) → role_arn, external_id, region only ❌ credentials NOT included
TF->>BK: build_k8s_clients(cluster_name, role_arn, external_id, region) ❌ credentials kwarg never passed
Note over BK: explicit-credential branch is unreachable dead code
|
…ilder Fixes Tracer-Cloud#969. Follow-up to Tracer-Cloud#724. Problem ------- PR Tracer-Cloud#724 let EKS be detected when the AWS integration is configured with stored IAM user credentials (access_key_id + secret_access_key, no role_arn), but build_k8s_clients still had no way to use those stored credentials. After Tracer-Cloud#724 it supports: 1. role_arn -> STS AssumeRole 2. ambient botocore chain (env / shared config / instance profile / IRSA) The stored integration credentials (persisted by resolve_integrations and surfaced in _eks_int['credentials']) were silently dropped. On hosts that happen to have ambient credentials pointing elsewhere, this picks up the wrong principal; on hosts without ambient credentials the call fails entirely, even though the integration has perfectly usable keys. Fix (approach Tracer-Cloud#2, pre-approved by Greptile in PR Tracer-Cloud#724 review) ------------------------------------------------------------- Forward the stored credentials end-to-end through the EKS pipeline: 1. detect_sources.py: set eks_params['credentials'] = _eks_int.get( 'credentials') or None alongside the other EKS integration fields. 2. EKSListClustersTool._eks_creds (shared helper used by every EKS tool for its extract_params): include 'credentials' in the returned dict so it flows into each tool's kwargs. 3. Seven EKS tool functions accept a new `credentials` kwarg and forward it to build_k8s_clients (and surface it in their input_schema): EKSPodLogsTool, EKSListPodsTool, EKSEventsTool, EKSNodeHealthTool, EKSListDeploymentsTool, EKSDeploymentStatusTool, EKSListNamespacesTool. 4. eks_k8s_client.build_k8s_clients accepts an optional credentials kwarg. Resolution priority: 1. explicit credentials kwarg (stored-integration, new) 2. role_arn AssumeRole (existing production path) 3. ambient botocore chain (Tracer-Cloud#724 fallback) The two secondary fixes from Tracer-Cloud#724 are preserved (empty SessionToken coerced to None, regional STS endpoint). Addresses Greptile P1 --------------------- The initial commit only wired the two ends of the pipeline (detect_sources and build_k8s_clients); Greptile's review of Tracer-Cloud#976 correctly pointed out that the shared _eks_creds helper and the seven tool call sites in between were never updated, leaving the new credentials branch unreachable. This revision wires credentials through every intermediate layer so each of the seven EKS tools passes credentials=credentials to build_k8s_clients. Test plan --------- - py_compile passes on all 10 changed files. - ruff check + ruff format clean on all 10 changed files. - Regression safety: the role_arn (AssumeRole) and ambient-botocore branches are unchanged; the new explicit branch only runs when the credentials kwarg is truthy and has access_key_id + secret_access_key. Related ------- - Fixes Tracer-Cloud#969. - Follow-up to Tracer-Cloud#724. - Greptile pre-approved this approach in Tracer-Cloud#724 (comment) - Greptile P1 on Tracer-Cloud#976 that drove the helper+tools fixups: Tracer-Cloud#976 (comment)
b03928d to
ac2d841
Compare
PR Tracer-Cloud#976 wired stored AWS-integration credentials end-to-end for the seven k8s-API EKS tools that route through `build_k8s_clients`, but missed the four EKS tools that still go through `EKSClient`: `list_eks_clusters`, `describe_eks_cluster`, `describe_eks_addon`, `get_eks_nodegroup_health`. Their `_eks_creds` helper already exposes `credentials`, but each tool absorbed it via `**_kwargs` and never forwarded it, so `EKSClient._build` unconditionally called `sts.assume_role(RoleArn="", ...)` and raised `ParamValidationError` for IAM-user-only integrations at the cluster-discovery step. Greptile flagged this exact gap on the (now-closed) parallel PR Tracer-Cloud#990. This commit closes it on Tracer-Cloud#976. Changes ------- - `EKSClient.__init__` takes an optional `credentials` kwarg. `_build` prefers stored IAM user keys over STS AssumeRole; if neither is provided it raises `ValueError` (no silent fallthrough). - New shared helper `_stored_credentials_to_aws_creds` normalises the stored dict into the AssumeRole-shaped triplet (empty `session_token` → None, both required keys present). `build_k8s_clients` now uses the same helper so both code paths stay in sync. - The four EKSClient-backed tools accept `credentials` in the signature, expose it in `input_schema`, and pass it into the constructor. Tests ----- - New `tests/services/test_eks_client.py` covers the helper (full triplet / empty session token / missing required keys / None / empty dict) and EKSClient credential resolution (stored skips AssumeRole, empty session token coerced, partial dict falls back to AssumeRole, no creds + no role_arn raises ValueError, AssumeRole external_id passthrough stays intact). - `test_eks_list_clusters_tool.py` grows two forwarding tests: the kwarg threads into EKSClient, and stays None by default for existing role-based callers. Quality gate ------------ make lint / format-check / typecheck / test-cov all green (3328 passed, 2 skipped, 1 xfailed). Related ------- - Extends Tracer-Cloud#976 - Fixes Tracer-Cloud#969 - Closes Greptile P1 on Tracer-Cloud#990 (Tracer-Cloud#990 (comment))
|
Pushed an additional commit () that closes the Changes
Tests
Quality gate |
| @@ -1,21 +1,71 @@ | |||
| """EKS boto3 client with IAM role assumption.""" | |||
| """EKS boto3 client — stored credentials preferred, AssumeRole fallback.""" | |||
There was a problem hiding this comment.
Missing from __future__ import annotations — required by the project's code-style rules for all Python files.
| import botocore.credentials | ||
| from kubernetes import client as k8s_client | ||
|
|
||
| from app.services.eks.eks_client import _stored_credentials_to_aws_creds |
There was a problem hiding this comment.
Importing a private helper (underscore-prefixed) across module boundaries couples two service files. Move _stored_credentials_to_aws_creds to app/services/eks/utils.py (or drop the leading underscore and make it part of the public API) so the import contract is intentional.
| assumed = _assume_role(role_arn, external_id, "TracerEKSK8sInvestigation") | ||
| else: | ||
| # No role_arn and no explicit creds: fall back to ambient AWS | ||
| # credentials (env AK/SK, shared config profile, or instance profile |
There was a problem hiding this comment.
Inline import inside a function body. Project convention requires absolute imports at the top of the file. botocore is already imported at module level (lines 14-16), so there's no circular-import concern — move this to the top-level imports.
| logger.error("[eks] %s", msg) | ||
| raise RuntimeError(msg) | ||
| frozen = ambient.get_frozen_credentials() | ||
| assumed = { |
There was a problem hiding this comment.
frozen.token or "" stores an empty string, then assumed["SessionToken"] or None (line 131) silently corrects it downstream. Normalise at the source instead: frozen.token or None — matches the behaviour of _stored_credentials_to_aws_creds and removes the hidden dependency on the downstream guard.
| role_arn: str, | ||
| external_id: str = "", | ||
| region: str = "us-east-1", | ||
| credentials: dict | None = None, |
There was a problem hiding this comment.
Use dict[str, Any] | None instead of bare dict | None — the same pattern is already used in the service layer and it keeps mypy strict mode happy. Applies identically to the other nine tool files changed in this PR.
Addresses all five inline comments from @muddlebee on Tracer-Cloud#976: 1. **eks_client.py** — add `from __future__ import annotations` (project style convention). 2. **utils module, public helper** — move `_stored_credentials_to_aws_creds` out of `eks_client.py` into a new `app/services/eks/utils.py` and drop the leading underscore (`stored_credentials_to_aws_creds`). The import contract between `eks_client` and `eks_k8s_client` is now explicit public API rather than two services reaching into each other for a private helper. 3. **eks_k8s_client.py** — promote the `import botocore.session` that lived in the ambient-fallback branch to the top-level imports. No circular-import concern (`botocore` is already imported at module level). 4. **eks_k8s_client.py** — normalize `frozen.token or None` at the source instead of relying on the downstream `or None` guard. This matches the behaviour of `stored_credentials_to_aws_creds` and removes the hidden dependency. 5. **All 11 EKS tool files** — tighten `credentials` parameter type annotation from `dict | None` to `dict[str, Any] | None`, matching the service-layer style and keeping mypy strict mode happy. The `tests/services/test_eks_client.py` import also switches to the new public name and module path. Quality gate ------------ make lint → All checks passed! make format-check → 906 files already formatted make typecheck → Success: no issues found in 392 source files make test-cov → 3328 passed, 2 skipped, 1 xfailed in 43.36s No behaviour changes — pure refactor of public surface + style.
|
@muddlebee thanks for the thorough review — all five points addressed in
Quality gate clean: No behaviour changes — pure refactor of public surface + style. |
|
🎊 Achievement unlocked: PR Merged. @chaosreload passed code review, survived CI, and shipped. Respect. 🤝 👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome. |

Fixes #969. Follow-up to #724.
Summary
PR #724 gated EKS source detection so that AWS integrations configured with stored IAM user credentials (no
role_arn) can reach the EKS branch. This PR completes the story by actually using those stored credentials inbuild_k8s_clients.Changes
app/nodes/plan_actions/detect_sources.py: forward_eks_int.get("credentials")intoeks_params["credentials"].app/services/eks/eks_k8s_client.py::build_k8s_clients: new optionalcredentialskwarg. Resolution priority:credentials(stored-integration, new)role_arn→ STS AssumeRole (existing production path)Secondary fixes from #724 preserved: empty
SessionTokencoerced toNone, regional STS endpoint.Why approach #2
In the review of #724, Greptile analyzed two candidate follow-ups and recommended this one:
— #724 (comment)
Explicit forwarding also mirrors how every other integration (
grafana,bitbucket,github,datadog) uses the values thatresolve_integrationspersisted.Regression safety
The
role_arn(AssumeRole) and ambient-botocore branches are unchanged. The new explicit branch only runs whencredentialsis truthy and has bothaccess_key_idandsecret_access_key, so existing deployments see no behaviour change.Testing
python -m py_compilepasses on both files.ruff checkandruff format --checkclean on both files.pyproject.toml.