feat: add Argo CD deployment investigation integration#948
Conversation
Greptile SummaryThis PR adds a read-only Argo CD integration covering client auth (bearer token + session login), credential redaction, URL safety enforcement, source detection, and two investigation tools (
Confidence Score: 4/5Safe to merge after fixing the list-mode evidence mapper; the P1 is isolated and doesn't affect the single-app status or diff paths. One confirmed P1 logic defect: _map_argocd_application_status silently drops all application data in list-mode. All other findings are P2 style/quality suggestions. The rest of the integration — auth, redaction, URL validation, drift detection, and tests — is solid. app/nodes/investigate/processing/post_process.py (lines 325–329) and app/tools/ArgoCDApplicationStatusTool/init.py (list-mode handling) Important Files Changed
Sequence DiagramsequenceDiagram
participant DS as detect_sources
participant ST as ArgoCDStatusTool
participant DT as ArgoCDDiffTool
participant C as ArgoCDClient
participant API as Argo CD API
DS->>DS: Extract app_name / hint from alert
DS-->>ST: sources[argocd] = {base_url, bearer_token, application_name, ...}
ST->>C: make_argocd_client(base_url, token)
alt application_name present
C->>API: GET /api/v1/applications/{name}
API-->>C: application payload
C-->>ST: {application: {...}, recent_history: [...]}
ST-->>post_process: _map_argocd_application_status → argocd_application, argocd_revision_history
else application_name absent
C->>API: GET /api/v1/applications
API-->>C: {items: [...]}
C-->>ST: {applications: [...], total: N}
ST-->>post_process: ⚠️ _map_argocd_application_status drops applications list
end
DT->>C: make_argocd_client(base_url, token)
C->>API: GET /api/v1/applications/{name}/server-side-diff
API-->>C: {diffs: [...]}
C-->>DT: {drift_detected: bool, diffs: [sanitized]}
DT-->>post_process: _map_argocd_application_diff → argocd_drift_detected, argocd_diff
|
| @field_validator( | ||
| "bearer_token", | ||
| "username", | ||
| "password", | ||
| "project", | ||
| "app_namespace", | ||
| "integration_id", | ||
| mode="before", | ||
| ) | ||
| @classmethod | ||
| def _normalize_str(cls, value: object) -> str: | ||
| text = str(value or "").strip() | ||
| if text.lower().startswith("bearer "): | ||
| text = text.split(None, 1)[1].strip() | ||
| return text |
There was a problem hiding this comment.
_normalize_str applied to non-token fields strips "bearer " prefix unexpectedly
The same _normalize_str validator is applied to username, project, app_namespace, and integration_id in addition to bearer_token. The "bearer " prefix-stripping logic only makes sense for token fields. While unlikely to trigger in practice, a project or namespace whose string value happens to begin with "bearer " would be silently mutated. Consider applying the "bearer " stripping only to bearer_token, or splitting the validator into two separate validators.
|
Updated in
Validation:
|
| ).lower() | ||
| has_gitops_hint = any( | ||
| marker in argocd_hint_text | ||
| for marker in ("argocd", "argo cd", "gitops", "drift", "deploy") |
There was a problem hiding this comment.
"deploy" is a substring match — "EKS pod deployment failed", "deploy pipeline", etc. all trigger this, adding Argo CD as a source for unrelated alerts. Replace with tighter Argo CD-specific terms: ("argocd", "argo cd", "argo-cd", "gitops", "outofsynced").
| headers["Authorization"] = f"Bearer {token}" | ||
| headers.update(kwargs.pop("headers", {}) or {}) | ||
| response = self._get_client().request(method, path, headers=headers, **kwargs) | ||
| response.raise_for_status() |
There was a problem hiding this comment.
If raise_for_status() throws a 401, _session_token is never cleared. A reused client instance will replay the expired token instead of re-authenticating. Clear self._session_token = "" when catching a 401 here, or document on the class that instances must not be reused after session expiry.
| ) | ||
|
|
||
|
|
||
| def _is_loopback_host(host: str) -> bool: |
There was a problem hiding this comment.
_is_loopback_host and _validate_base_url are identical to the versions in app/integrations/models.py. Extract to app/utils/ or import from models to avoid the logic diverging.
|
Updated in
Validation run locally:
|
|
Nice work. Demo video is missing though? Can you quickly add in the PR description |
|
Follow-up self-review fix in
Additional validation:
|
|
Added the demo video to the PR description. It shows the Argo CD setup/configuration, local verification, read-only status/list/diff paths, and secret/diff redaction behavior. Transparency note: the recording uses a live shell with automated input and a local Argo CD-compatible endpoint, so the real OpenSRE code path is exercised without exposing live credentials or Kubernetes Secrets. |
missing as per Acceptance criteria |
3651b38 to
3a66b7e
Compare
|
Thanks for the catch. Addressed in the latest commit ( Added
I also updated the multi-instance integrations page to include Local validation passed:
Update: the first GitHub Actions run was cancelled before a hosted runner acquired the jobs. I re-pushed the same tree to trigger fresh checks; current head |
3a66b7e to
250c02a
Compare
|
@MestreY0d4-Uninter you still need to add entry here https://github.com/Tracer-Cloud/opensre/blob/main/.env.example for the env vars you created and document it!! and in your demo I dont understand the command you ran? could you pls run any of the scenario mentioned in the docs instead? and not some unit tests pls |
|
and pls give a walk through whats happening in demo |
| "argocd": "argocd", | ||
| "argo cd": "argocd", | ||
| "argo-cd": "argocd", |
There was a problem hiding this comment.
do we need all of them? probably best to just stick to one
|
Thanks for the follow-up notes. I addressed the remaining items in the latest head commit (
Validation after the follow-up changes:
The PR description now includes the updated demo video link and a walkthrough of what is happening in the demo. Credentials are hidden/redacted, and the video OCR/secret scan did not find raw passwords, bearer tokens, JWTs, GitHub tokens, AWS keys, or private keys. |
yashksaini-coder
left a comment
There was a problem hiding this comment.
Senior Review — PR #948: Argo CD Deployment Investigation Integration
Overall Assessment
Solid, production-quality implementation. One hard blocker (merge conflicts), the rest is minor. The contributor has been extremely responsive — Greptile's P1 (list-mode evidence loss), all three muddlebee P2 issues, and the "deploy" keyword broadness were all addressed in follow-up commits. Architecture follows established patterns cleanly.
✅ What's Done Well
Security hardening (thorough)
HTTPS-only with loopback HTTP exception, extracted toapp/utils/url_validation(shared with models) — no logic duplication- Dual-auth rejection in
ArgoCDConfig._no_dual_auth() - Bearer token stripped from non-token field validators correctly (fixed from Greptile's original P2)
- Credentials, session tokens, and retired tokens all included in
_redact()— retired token set is a nice touch - 401 session retry with
_retired_session_tokensclearing — prevents stale token replay _SECRET_LINE_RE+_GENERIC_SECRET_VALUE_REdiff sanitization covers JWTs, GitHub tokens, AWS keys, private keys
Architecture follows project conventions
ArgoCDConfig(StrictConfigModel)✅ (unlike the original Airflow PR which usedBaseModel)source="argocd"✅ (correct for prioritizer scoring)"argocd"inEvidenceSourceLiteral ✅argocd_applications+argocd_diffregistered in evidence gate ✅- URL validation in
app/utils/url_validation(shared helper, not duplicated) ✅ - Docs in
docs/argocd.mdxwith nav registration ✅ .env.exampleupdated ✅- Multi-instance support via
ARGOCD_INSTANCES✅
Greptile P1 fixed
_map_argocd_application_statusnow correctly readsdata.get("applications", [])(plural), so list-mode evidence is preserved
Hint keywords tightened (muddlebee P2 fixed)
"deploy","drift"removed; now only"argocd","argo cd","argo-cd","gitops","outofsync","outofsynced"
Tests: 44 pass, comprehensive coverage
- Client auth flows (bearer, session login, dual-auth rejection, 401 retry)
- Diff sanitization and truncation
- Catalog/verify resolution, multi-instance
- Source detection with tightened hint markers
- Tool behavior (status single-app, list-mode, diff, not-configured path)
🔴 Blocker — Merge Conflicts
mergeable_state: dirty — main has moved significantly (201 files diverged) since this PR was last rebased. The PR cannot be merged until conflicts are resolved.
Key files that will need attention on rebase:
app/integrations/catalog.py(Airflow integration was merged into main via #570)app/nodes/investigate/processing/post_process.pyapp/nodes/plan_actions/detect_sources.pyapp/types/evidence.py(Airflow added"airflow"toEvidenceSource).env.example
Action needed: Rebase issue/320-argocd-integration onto latest main, resolve conflicts, and push. All quality gates should re-pass since the Argo CD changes are orthogonal to what landed in main.
⚠️ Minor Observations (non-blocking)
VaibhavUpreti's catalog.py comment — "do we need all of them?" Looking at the code, ARGOCD_AUTH_TOKEN with ARGOCD_TOKEN as fallback is intentional to handle both naming conventions (some Argo CD tooling exports ARGOCD_TOKEN). This is fine and follows the same fallback pattern used elsewhere.
surfaces not set on either tool — ArgoCDApplicationStatusTool and ArgoCDApplicationDiffTool default to investigation-only. Given these tools are read-only and investigation-focused, that's appropriate. Adding surfaces=("investigation", "chat") would be a cosmetic consistency improvement but is not blocking.
Summary
| Category | Status |
|---|---|
| Code quality | ✅ Clean (ruff, mypy) |
| Tests | ✅ 44 pass |
| Security hardening | ✅ Thorough |
| Architecture fit | ✅ Follows conventions |
| Greptile P1 (list-mode) | ✅ Fixed |
| muddlebee P2 issues | ✅ Fixed |
| Docs + .env.example | ✅ Complete |
| CI checks (last head) | ✅ All green |
| Merge conflicts | ❌ Blocker — rebase onto main required |
Once the rebase is done and CI is green again, this is mergeable.
|
@MestreY0d4-Uninter pls fix merge conflicts |
70e8c32 to
4082ffc
Compare
Closes Tracer-Cloud#320 Adds a first-class read-only Argo CD integration for deployment status and drift investigation, including catalog/env/verify wiring, investigation tools, source detection, evidence integration, transport/auth hardening, and regression coverage.
Preserve application lists in Argo CD status evidence and keep bearer-prefix stripping scoped to token fields only.
4082ffc to
8b0ebb6
Compare
|
Thanks @muddlebee, fixed. I rebased the branch onto the latest Validation:
I also re-checked the earlier review items against the current head; no remaining blocker found. |
|
🎊 Achievement unlocked: PR Merged. @MestreY0d4-Uninter passed code review, survived CI, and shipped. Respect. 🤝 👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome. |
|
@MestreY0d4-Uninter nice work. and from next time lets try to keep the commits after rebase/resolve conflicts clean :) difficult to spot any errors or missing.. |

Fixes #320
Describe changes -
This PR adds a first-class, read-only Argo CD integration for deployment and drift investigation.
What changed:
app/services/argocdclient for application listing, application status/summary, managed resources, and server-side diff retrieval.ArgoCDApplicationStatusToolArgoCDApplicationDiffToolSecurity and safety hardening:
https://; plainhttp://is accepted only for localhost/loopback development endpoints, including IPv6 loopback.POST /api/v1/session, which is used only to obtain a session token when username/password auth is configured.Screenshots (if UI) -
N/A — backend integration and investigation tooling only.
Demo video -
https://github.com/MestreY0d4-Uninter/opensre/releases/download/pr-948-argocd-real-demo/opensre-pr948-argocd-real-walkthrough.mp4
Release artifact page: https://github.com/MestreY0d4-Uninter/opensre/releases/tag/pr-948-argocd-real-demo
What it shows:
opensre integrations verify argocd.OutOfSync, andArgoCDApplicationDiffToolreturnsdrift_detected=truewith a replica diff.Transparency note: this is a real terminal-session recording with commands entered into a live shell by automation for repeatability. The commands run against a local kind-hosted Argo CD instance and exercise the real OpenSRE code paths.
Code Understanding and AI Usage
Used AI?
Approach:
app/services/*/client.py, integration catalog/verify wiring,BaseToolinvestigation tools, source detection, and evidence/post-processing hooks.Validation
Follow-up validation after maintainer review:
.env.examplerequest by adding the documented Argo CD environment variables.uv run opensre integrations verify argocdargocd local env passed Connected to Argo CD and listed 1 application.sync_status: OutOfSyncdrift_detected: truediff_count: 1guestbook-uiDeployment.drift_detected: falsediff_count: 0245 passedmake lint: passedmake format-check: passedmake typecheck: passedgit diff --check: passed42.93s1280x720515c60b1e8a25c12302f5a6f48a5fb5ea697ed501b2c4ab8140ca898c2c9c5a7Local validation completed before opening the PR:
Focused Argo CD tests:
uv run pytest -q tests/services/argocd/test_client.py tests/integrations/test_argocd_catalog_verify.py tests/nodes/plan_actions/test_detect_sources_argocd.py tests/tools/test_argocd_tools.py33 passedExpanded related tests:
uv run pytest -q tests/integrations/test_argocd_catalog_verify.py tests/integrations/test_catalog_multi_instance.py tests/integrations/test_verify.py tests/nodes/plan_actions/test_detect_sources_argocd.py tests/tools/test_argocd_tools.py tests/tools/test_registry.py60 passedParallel focused run:
33 passedType checking:
make typecheckSuccess: no issues found in 396 source filesLint:
make lintAll checks passed!Formatting:
make format-check910 files already formattedWhitespace/diff validation:
git diff --check --cachedCoverage/full non-synthetic local suite:
make test-cov3294 passed, 2 skipped, 1 xfailed, 1 warningLocal Argo CD scenario simulator:
ThreadingHTTPServerChecklist