
Split/3 integrations verify#504

Merged
davincios merged 10 commits into main from
split/3-integrations-verify
Apr 8, 2026

Conversation

@davincios
Contributor

Fixes #

Describe the changes you have made in this PR -

Screenshots of the UI changes (If any) -


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

davincios added 10 commits April 6, 2026 14:50
…pport

- Switch LangGraph stream_mode from ["updates"] to ["events"] for fine-grained
  tool calls, LLM reasoning, and chain transitions in real-time
- Add reasoning mapper (app/remote/reasoning.py) to translate events into
  human-readable spinner subtext (replaces fake cycling verbs)
- Enhance StreamRenderer and _LiveSpinner to show live tool calls and LLM
  reasoning instead of cosmetic placeholders
- Add astream_investigation() async runner and CLI bridge for local streaming
  with the same UX as remote
- Add POST /investigate/stream SSE endpoint to the lightweight FastAPI server
- Make opensre deploy fully interactive: context-aware menu, health check,
  confirmation prompts, branch selection
- Add named remotes to the wizard store so multiple remote endpoints (ec2,
  staging, local) can coexist with an active selection
- Deploy always saves the EC2 URL as a named remote and sets it active
- opensre remote shows a remote picker when multiple remotes are configured

Made-with: Cursor
…ions

Empty investigation reports were caused by two bugs:

1. _save_investigation used result.get('key', 'N/A') which returns ""
   when the key exists but is empty, instead of the 'N/A' fallback.
   Fixed with `or 'N/A'` to treat empty strings as missing.

2. Noise-classified alerts routed to END before diagnose/publish ran,
   leaving root_cause, report, and problem_md as "". Now propagates
   is_noise through the result dict and writes an explanatory message.
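Bug 1 hinges on a subtle `dict.get` behavior: the default only applies when the key is absent, not when its value is empty. A minimal illustration:

```python
result = {"root_cause": ""}  # key present, but value is empty

# dict.get falls back only when the key is missing entirely:
print(result.get("root_cause", "N/A"))    # prints an empty string

# `or` treats any falsy value (including "") as missing:
print(result.get("root_cause") or "N/A")  # prints N/A
```

Note the tradeoff: `or` also replaces legitimately falsy values like `0` or `False`, which is fine for report strings but worth remembering elsewhere.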

Also adds interactive investigation browser (select + view + save)
and persists streamed investigation results as .md files.

Made-with: Cursor
Keep managed EC2 deployment state in app-owned local storage, preserve streamed investigations on disconnect, and surface local streaming failures promptly so the remote CLI can stay stream-first and merge-ready.

Made-with: Cursor
Redact api_token values in integration show output and allow vercel records to be shown through the managed integrations CLI so credentials are not exposed during inspection.

Made-with: Cursor
Support Vercel incident discovery and RCA from the CLI and remote server so deployment failures can be selected, investigated, and correlated with GitHub code context.

Made-with: Cursor
Use the project-scoped Vercel runtime log API and structured error fields so the CLI finds the same incidents users see in the dashboard. Add loading feedback after project selection and keep the Vercel RCA tests aligned with the updated behavior.

Made-with: Cursor
Resolve the PR conflicts by keeping the Vercel RCA flows, setup verification behavior, and remote server integrations while bringing in the latest main-branch updates.

Made-with: Cursor
- Add INTEGRATIONS_STORE_PATH and shared home constants; wire integrations store.

- Add prompt_support (questionary escape/cancel) for opensre and wizard entrypoints.

- Wizard prompt/store tweaks and tests; daily-update workflow and Makefile.

Made-with: Cursor
- Extend Vercel client (streaming runtime logs), poller, VercelLogsTool, and remote server wiring.

- Remove vercel_incidents and integrations vercel CLI; update poller tests.

- EC2 test fixture, pyproject deps, Makefile-aligned test assets.

Made-with: Cursor
- Drop verify runtime-log flags from verify_integrations and python -m app.integrations.

- Wizard: Vercel choice hint and post-select note about API limits.

- Update CLI integration tests and smoke test.

Made-with: Cursor
@davincios davincios merged commit 784dd76 into main Apr 8, 2026
3 checks passed
@davincios davincios deleted the split/3-integrations-verify branch April 8, 2026 01:05
@greptile-apps
Contributor

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR adds Vercel integration setup/verify, a background VercelPoller that auto-investigates deployment failures, Escape-key support for questionary prompts, and refactors cmd_verify to return an exit code rather than calling sys.exit directly. It also extracts common filesystem paths into app/constants and wires vercel + github evidence into the RCA pipeline.

Confidence Score: 4/5

Safe to merge after addressing the log-ID scan abort issue; remaining findings are style/P2.

One P2 logic finding (_find_deployment_by_selected_log_id aborts on any inner fetch error rather than continuing the scan) could surface as a user-facing 400 for an edge-case Vercel API response. The other comments are structural and non-blocking. Test coverage for the new poller and server integration is thorough.

app/remote/vercel_poller.py (_find_deployment_by_selected_log_id inner loop), app/integrations/main.py (dispatch pattern)

Vulnerabilities

No security concerns identified. The path-traversal protection in _safe_investigation_path is preserved and correctly migrated from str to Path. The Vercel API token is handled as a secret field in VercelConfig and is not logged or echoed.
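For context, a `Path`-based traversal guard of the kind described can be sketched as follows. The function name and shape here are illustrative, not the project's actual `_safe_investigation_path` code; `Path.is_relative_to` requires Python 3.9+.

```python
from pathlib import Path

def safe_child_path(base_dir: Path, name: str) -> Path:
    """Resolve `name` under base_dir, rejecting traversal (e.g. '../../etc/passwd').

    Hypothetical sketch of the pattern, not the project's implementation.
    """
    base = base_dir.resolve()
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"unsafe path: {name!r}")
    return candidate
```

Resolving both paths before comparing is what defeats `..` segments and symlink tricks that a plain string-prefix check would miss.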

Important Files Changed

| Filename | Overview |
| --- | --- |
| `app/remote/vercel_poller.py` | New file: Vercel URL resolution, background polling loop, and alert enrichment. Core logic is solid; one scan loop can abort early on any per-deployment fetch error. |
| `app/services/vercel/client.py` | Adds streaming runtime-log fetch with retry, new `get_project` and `get_deployment_events` improvements, and normalised git metadata. Retry loop uses blocking `time.sleep`, which is acceptable when called from a thread pool. |
| `app/remote/server.py` | Adds `VercelPoller` lifespan task and `vercel_url` input field, and extracts `_execute_investigation`/`_normalized_request_alert` helpers. Clean refactor with correct cancellation handling. |
| `app/integrations/main.py` | Moves the setup and verify commands out of the dispatch dict into the `if cmd not in commands:` branch; functionally correct but structurally inverted, making the intent harder to follow. |
| `app/integrations/cli.py` | Adds a Vercel setup handler, changes `cmd_setup` to return the resolved service name, and `cmd_verify` to return an exit code. `_die` correctly annotated as `NoReturn`. |
| `app/cli/commands/integrations.py` | Auto-verify step after setup for services in `VERIFY_SERVICES`; `cmd_verify` exit code propagated via `raise SystemExit`. All analytics calls correctly ordered. |
| `app/cli/prompt_support.py` | New module that monkey-patches questionary to add Escape-to-cancel across all prompt types. A guard flag prevents double installation. |
| `app/nodes/plan_actions/detect_sources.py` | Adds Vercel-prefixed field aliases for GitHub owner/repo/SHA/ref; enriches the Vercel source dict with `project_name`, `selected_log_id`, and `error_message`. |
| `app/nodes/root_cause_diagnosis/prompt_builder.py` | Adds Vercel deployment and GitHub code evidence sections to the RCA prompt. `json` was already imported; section builders are well guarded. |
| `.github/workflows/daily-update.yml` | Simplifies scheduling to a single cron at 01:17 UTC (the previous 23:59 London gate was fragile with delayed runners); removes the gate step and its conditional guards. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI / HTTP Client
    participant Server as remote/server.py
    participant Enrich as enrich_remote_alert_from_vercel
    participant Poller as VercelPoller (background)
    participant VercelAPI as Vercel API
    participant RCA as run_investigation_cli

    CLI->>Server: POST /investigate {vercel_url?, raw_alert}
    Server->>Enrich: _normalized_request_alert(req)
    alt vercel_url or vercel fields present
        Enrich->>VercelAPI: list_projects / get_deployment / get_runtime_logs
        VercelAPI-->>Enrich: deployment + logs
        Enrich-->>Server: enriched raw_alert
    else no vercel fields
        Enrich-->>Server: raw_alert (pass-through)
    end
    Server->>RCA: _execute_investigation(raw_alert, ...)
    RCA-->>Server: result dict
    Server-->>CLI: InvestigateResponse

    Note over Poller: Runs on interval (VERCEL_POLL_INTERVAL_SECONDS)
    Poller->>VercelAPI: list_projects / list_deployments / get_deployment_bundle
    VercelAPI-->>Poller: actionable candidates
    Poller->>RCA: _handle_polled_candidate → _execute_investigation
    RCA-->>Poller: result
    Poller->>Server: _save_investigation (via asyncio.to_thread)
```

Comments Outside Diff (1)

  1. app/remote/vercel_poller.py, lines 1211-1215

    P2 Opaque list_limit inflation heuristic

    The expression min(100, max(25, deployment_limit * 25)) silently fetches up to 25× the user-requested limit when deployment_limit <= 5 (e.g., deployment_limit=1 → fetches 25 from the API, then iterates only the first 1). The intent (fetch extra stubs so non-actionable ones don't exhaust the budget) is reasonable but the formula and the <= 5 threshold aren't documented. A comment or named constant would help future readers understand the intent and the tradeoff.


Reviews (1): Last reviewed commit: "refactor(integrations): list-only Vercel..."

Comment on lines 40 to +57

```
commands = {
    "setup": cmd_setup,
    "list": lambda _: cmd_list(),
    "show": cmd_show,
    "remove": cmd_remove,
    "verify": lambda service: cmd_verify(
        service,
        send_slack_test="--send-slack-test" in option_args,
    ),
}
if cmd not in commands:
    print(f"  Unknown command '{cmd}'. Try: {', '.join(commands)}", file=sys.stderr)
if cmd == "setup":
    resolved_service = cmd_setup(svc)
    if resolved_service in SUPPORTED_VERIFY_SERVICES:
        print(f"  Verifying {resolved_service}...\n")
        sys.exit(cmd_verify(resolved_service))
    return
if cmd == "verify":
    sys.exit(
        cmd_verify(
            svc,
            send_slack_test="--send-slack-test" in option_args,
```

P2 Inverted dispatch pattern for setup/verify

setup and verify are handled inside the if cmd not in commands: block even though they are perfectly valid commands. This inverts the normal "dispatch table + unknown handler" pattern, making it easy to miss that setup/verify are supported and harder to add future commands consistently. Consider adding them to commands directly, or at minimum restructuring so the unknown-command branch is clearly the last resort.

Suggested change

```diff
-    commands = {
-        "setup": cmd_setup,
-        "list": lambda _: cmd_list(),
-        "show": cmd_show,
-        "remove": cmd_remove,
-        "verify": lambda service: cmd_verify(
-            service,
-            send_slack_test="--send-slack-test" in option_args,
-        ),
-    }
-    if cmd not in commands:
-        print(f"  Unknown command '{cmd}'. Try: {', '.join(commands)}", file=sys.stderr)
-    if cmd == "setup":
-        resolved_service = cmd_setup(svc)
-        if resolved_service in SUPPORTED_VERIFY_SERVICES:
-            print(f"  Verifying {resolved_service}...\n")
-            sys.exit(cmd_verify(resolved_service))
-        return
-    if cmd == "verify":
-        sys.exit(
-            cmd_verify(
-                svc,
-                send_slack_test="--send-slack-test" in option_args,
+    commands = {
+        "list": lambda _: cmd_list(),
+        "show": cmd_show,
+        "remove": cmd_remove,
+    }
+    if cmd == "setup":
+        resolved_service = cmd_setup(svc)
+        if resolved_service in SUPPORTED_VERIFY_SERVICES:
+            print(f"  Verifying {resolved_service}...\n")
+            sys.exit(cmd_verify(resolved_service))
+        return
+    if cmd == "verify":
+        sys.exit(
+            cmd_verify(
+                svc,
+                send_slack_test="--send-slack-test" in option_args,
+            )
+        )
+    if cmd not in commands:
+        print(f"  Unknown command '{cmd}'. Try: setup, list, show, remove, verify", file=sys.stderr)
+        sys.exit(1)
+    commands[cmd](svc)
```

Comment on lines +758 to +776

```python
    )
elif parsed.selected_log_id:
    deployment, events, runtime_logs = _find_deployment_by_selected_log_id(
        client,
        project_id=str(project.get("id", "")).strip(),
        selected_log_id=parsed.selected_log_id,
        deployment_limit=_DEFAULT_DEPLOYMENT_LIMIT,
        log_limit=_DEFAULT_LOG_LIMIT,
    )
else:
    deployment, events, runtime_logs = _select_latest_actionable_deployment(
        client,
        project_id=str(project.get("id", "")).strip(),
        deployment_limit=_DEFAULT_DEPLOYMENT_LIMIT,
        log_limit=_DEFAULT_LOG_LIMIT,
    )

candidate = build_vercel_investigation_candidate(
    project=project,
```

P2 VercelResolutionError in inner loop aborts the entire log-ID scan

_fetch_deployment_bundle raises VercelResolutionError if get_deployment returns a non-success response. Because this is called inside the for deployment in ... loop without any per-iteration error handling, a single inaccessible deployment will surface as a user-facing 400 error rather than being skipped and the search continuing to the next deployment. A deleted or access-restricted deployment in the list (which Vercel can return) would always abort the search.

```python
for deployment in deployments_result.get("deployments", []):
    deployment_id = str(deployment.get("id", "")).strip()
    if not deployment_id:
        continue
    try:
        deployment_details, events, runtime_logs = _fetch_deployment_bundle(
            client,
            project_id=project_id,
            deployment_id=deployment_id,
            log_limit=log_limit,
        )
    except VercelResolutionError:
        logger.debug("Skipping deployment %s during log-ID scan", deployment_id)
        continue
    if any(str(log.get("id", "")).strip() == selected_log_id for log in runtime_logs):
        return deployment_details, events, runtime_logs
    if any(str(event.get("id", "")).strip() == selected_log_id for event in events):
        return deployment_details, events, runtime_logs
```

Comment on lines 409 to +515

```
logger.warning("[vercel] get_deployment_events error type=%s detail=%s", type(e).__name__, e)
return {"success": False, "error": str(e)}

def get_runtime_logs(self, deployment_id: str, limit: int = 100) -> dict[str, Any]:
"""Fetch serverless function runtime logs (stdout/stderr) for a deployment."""
def get_runtime_logs(
self,
deployment_id: str,
limit: int = 100,
*,
project_id: str = "",
) -> dict[str, Any]:
"""Fetch serverless function runtime logs for a deployment.

Per Vercel (`GET /v1/projects/{projectId}/deployments/{deploymentId}/runtime-logs
<https://vercel.com/docs/rest-api/logs/get-logs-for-a-deployment>`_), the response is
``Content-Type: application/stream+json``: a **stream** of JSON objects, not one array.
This client uses :meth:`httpx.Client.stream` and :meth:`httpx.Response.iter_lines` to read
incrementally (line-delimited JSON) until ``limit`` rows are collected or the stream ends.
"""
params: dict[str, Any] = {"limit": min(limit, 2000)}
params.update(self.config.team_params)
try:
resp = self._get_client().get(f"/v1/deployments/{deployment_id}/logs", params=params)
resp.raise_for_status()
data = resp.json()
raw_logs = data if isinstance(data, list) else data.get("logs", [])
logs = [
{
"id": log.get("id", ""),
"created_at": log.get("createdAt", ""),
"payload": log.get("payload", {}),
"type": log.get("type", ""),
"source": log.get("source", ""),
}
for log in raw_logs
]
return {"success": True, "logs": logs, "total": len(logs)}
except httpx.HTTPStatusError as e:
logger.warning(
"[vercel] get_runtime_logs HTTP failure status=%s id=%r",
e.response.status_code,
deployment_id,
)
return {"success": False, "error": f"HTTP {e.response.status_code}: {e.response.text[:200]}"}
except Exception as e:
logger.warning("[vercel] get_runtime_logs error type=%s detail=%s", type(e).__name__, e)
return {"success": False, "error": str(e)}
path = f"/v1/projects/{project_id}/deployments/{deployment_id}/runtime-logs"
if not project_id:
path = f"/v1/deployments/{deployment_id}/logs"

cap = min(limit, 2000)
stream_headers = {
**self.config.headers,
"Accept": "application/stream+json, application/x-ndjson, application/json",
}

last_retryable_detail = ""
last_retryable_kind = ""
for attempt in range(1, _RUNTIME_LOGS_READ_ATTEMPTS + 1):
try:
http = self._get_client()
with http.stream(
"GET",
path,
params=params,
headers=stream_headers,
timeout=_runtime_logs_http_timeout(),
) as resp:
resp.raise_for_status()
raw_logs = _collect_runtime_logs_from_stream(resp, cap)
logs = [
{
"id": log.get("id", "") or log.get("rowId", ""),
"created_at": log.get("createdAt", "") or log.get("timestampInMs", ""),
"payload": log.get("payload", {}),
"message": _extract_runtime_log_message(log),
"type": log.get("type", "") or log.get("level", ""),
"source": log.get("source", ""),
"level": log.get("level", ""),
"request_path": log.get("requestPath", ""),
"request_method": str(
log.get("requestMethod", "") or log.get("method", "") or ""
).strip(),
"status_code": log.get("responseStatusCode", ""),
"domain": log.get("domain", ""),
}
for log in raw_logs
if isinstance(log, dict)
]
return {"success": True, "logs": logs, "total": len(logs)}
except (httpx.ReadTimeout, httpx.ReadError, httpx.RemoteProtocolError) as exc:
last_retryable_detail = str(exc)
last_retryable_kind = type(exc).__name__
logger.warning(
"[vercel] get_runtime_logs %s attempt %s/%s deployment=%r",
last_retryable_kind,
attempt,
_RUNTIME_LOGS_READ_ATTEMPTS,
deployment_id,
)
if attempt >= _RUNTIME_LOGS_READ_ATTEMPTS:
break
time.sleep(min(8.0, 2.0**attempt))
except httpx.HTTPStatusError as e:
if e.response.status_code == 404:
return {"success": True, "logs": [], "total": 0}
logger.warning(
"[vercel] get_runtime_logs HTTP failure status=%s id=%r",
e.response.status_code,
deployment_id,
)
return {"success": False, "error": f"HTTP {e.response.status_code}: {e.response.text[:200]}"}
except Exception as e:
logger.warning("[vercel] get_runtime_logs error type=%s detail=%s", type(e).__name__, e)
return {"success": False, "error": str(e)}

detail = last_retryable_detail or "Transient runtime log transport error"
if last_retryable_kind == "ReadTimeout":
read_s = _runtime_logs_read_timeout_seconds()
return {
"success": False,
"error": (
f"{detail} (after {_RUNTIME_LOGS_READ_ATTEMPTS} attempts, "
f"{read_s:g}s read timeout each; set VERCEL_RUNTIME_LOGS_READ_TIMEOUT to increase)"
),
}

kind = last_retryable_kind or "transport error"
return {
"success": False,
"error": (
f"{detail} (after {_RUNTIME_LOGS_READ_ATTEMPTS} attempts while reading "
f"runtime logs; last error type: {kind})"
```

P2 Blocking time.sleep in retry path

time.sleep(min(8.0, 2.0**attempt)) is a blocking call. When get_runtime_logs is invoked from asyncio.to_thread (as in _parallel_deployment_events_and_runtime_logs) this only blocks the thread-pool worker, which is acceptable. However, if it is ever called directly from a synchronous context on the event-loop thread (e.g., future tests or a sync script using collect_vercel_candidates), it will block the caller for up to 8 s per retry. Consider adding a comment noting it should never be called directly on the event loop thread.
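To quantify the concern, the worst-case blocking per call is the sum of the capped delays; sleeps happen between attempts, so there is one fewer sleep than attempts. The attempt count below is a stand-in, since `_RUNTIME_LOGS_READ_ATTEMPTS`'s real value is not shown in this diff:

```python
ATTEMPTS = 3  # stand-in for _RUNTIME_LOGS_READ_ATTEMPTS; real value not shown here

# One sleep after each failed attempt except the last, matching the retry loop above.
delays = [min(8.0, 2.0 ** attempt) for attempt in range(1, ATTEMPTS)]
print(delays, sum(delays))  # [2.0, 4.0] 6.0
```

With 3 attempts the caller can block for up to 6 s of backoff plus three full read timeouts, which is why running this only via `asyncio.to_thread` matters.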


mayankbharati-ops added a commit to mayankbharati-ops/opensre that referenced this pull request Apr 27, 2026
… drop redundant header, fix CodeQL

- slack_delivery `_post_direct` exception log key reverted to ``type=%s``
  (matching the exact pre-refactor format ``type=%s channel=%s
  thread_ts=%s detail=%s``) so existing log parsers that key off
  ``type=`` keep working. Per @muddlebee's nit on PR Tracer-Cloud#952.
- DeliveryResponse field renamed ``error_type`` → ``exc_type``. Pythonic
  abbreviation for "exception type"; can't be ``type`` because that
  shadows the builtin. Field/log-key are intentionally distinct: the
  field is the Python attribute, the log key matches the legacy format.
- Drop ``Content-Type: application/json; charset=utf-8`` from
  ``_slack_bearer_headers`` and ``_discord_auth_headers``. ``httpx.post``
  already sets ``Content-Type: application/json`` automatically when the
  request uses the ``json=`` kwarg, so the explicit header was
  redundant. No behavioural change for Slack/Discord (UTF-8 is httpx's
  default encoding and neither provider parses the charset suffix).
- Fix CodeQL py/import-and-import-from alerts on tests Tracer-Cloud#503/Tracer-Cloud#504. The
  ``test_module_does_not_import_httpx`` regression test in each
  delivery test file was importing the module under test via both
  ``import X as mod`` and ``from X import Y`` styles. Switched to a
  single ``from app.utils import <module>`` style so the same module
  is no longer dual-imported.

Test count unchanged (3208 pass / 2 skipped / 1 xfail). Lint, format,
and ``mypy app/utils/`` clean.
muddlebee pushed a commit that referenced this pull request Apr 28, 2026
…#952)

* refactor(utils): extract shared HTTP-post helper for delivery modules

Closes #864.

``app/utils/slack_delivery.py``, ``app/utils/discord_delivery.py``, and
``app/utils/telegram_delivery.py`` each issued ``httpx.post`` directly,
applied a per-provider timeout, parsed the response body, and converted
exceptions to ``(False, error, ...)`` tuples. The transport pieces were
identical; only the success criteria, auth scheme, and error-message
extraction differed per provider.

Add ``app/utils/delivery_transport.py`` with ``post_json`` plus a
``DeliveryResponse`` dataclass that captures the shared transport
behavior: HTTP POST, timeout, optional ``follow_redirects``, exception
suppression, and JSON decoding with graceful fallback. The helper
deliberately does not decide provider-level success — callers inspect
``status_code`` / ``data`` per their own semantics.

Refactored callers:

- Slack: ``_call_reactions_api``, ``_post_direct``, ``_post_via_webapp``,
  ``_post_via_incoming_webhook`` — all four code paths now go through
  ``post_json``. ``send_slack_report`` orchestration unchanged.
- Discord: ``post_discord_message``, ``create_discord_thread`` — same
  Bearer/Bot header pattern factored into ``_discord_auth_headers``.
- Telegram: ``post_telegram_message`` — bot-token redaction in error
  messages preserved by re-running ``_redact_token`` over
  ``response.error``.

Provider-specific payload building, success criteria
(``data["ok"]`` for Slack, status codes for Discord/Telegram), and
error extraction all stay in the calling modules. Public function
signatures are unchanged.
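The split between shared transport and per-provider semantics can be illustrated with two tiny success predicates. These are hypothetical helpers (the real modules keep this logic inline in their existing functions); only the criteria themselves — ``data["ok"]`` for Slack, status codes for Discord/Telegram — come from the commit message.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def slack_succeeded(status_code: int | None, data: Mapping[str, Any]) -> bool:
    # Slack signals success in the response body, not the HTTP status:
    # a 200 with {"ok": false, "error": "invalid_auth"} is still a failure.
    return bool(data.get("ok"))


def discord_succeeded(status_code: int | None, data: Mapping[str, Any]) -> bool:
    # Discord/Telegram style: any 2xx status counts as delivered.
    return status_code is not None and 200 <= status_code < 300
```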

Tests:

- ``tests/utils/test_delivery_transport.py`` (new) — 15 tests covering
  happy path, every transport-failure shape (httpx.ConnectError,
  ReadTimeout, RequestError, OSError, RuntimeError), JSON-decode
  fallbacks (non-JSON body, JSON array, empty body), header / timeout /
  follow_redirects pass-through, and DeliveryResponse frozen-dataclass
  invariants.
- ``tests/utils/test_slack_delivery.py`` (new) — 24 tests covering all
  four Slack code paths, ``send_slack_report`` orchestration, and
  fallback chain ``direct → webapp``.
- ``tests/utils/test_discord_delivery.py`` (extended) — 3 new tests
  pinning the helper-delegation contract: module no longer imports
  ``httpx``, ``post_discord_message`` and ``create_discord_thread``
  both go through ``post_json``.
- ``tests/utils/test_telegram_delivery.py`` (extended) — 3 new tests
  pinning the same contract plus a regression test that the bot-token
  is redacted out of error strings even when the failure originates
  from the shared transport.
- All pre-existing tests in ``test_discord_delivery.py`` and
  ``test_telegram_delivery.py`` updated to patch
  ``app.utils.delivery_transport.httpx.post`` (the new transport
  boundary) instead of the per-module ``httpx.post`` import that no
  longer exists.
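The "patch at the transport boundary" pattern looks roughly like this. Since the real ``app`` package is not assumed to be importable here, a stand-in module built with ``types.ModuleType`` plays the role of ``app.utils.delivery_transport``; the patching mechanics are the point, not the names.

```python
import types
from unittest import mock

# Stand-in for app.utils.delivery_transport: its module-level httpx
# reference is the single seam the updated tests patch.
delivery_transport = types.ModuleType("delivery_transport")


def _real_post(*args, **kwargs):
    raise RuntimeError("network disabled in unit tests")


delivery_transport.httpx = types.SimpleNamespace(post=_real_post)


def fake_post(url, **kwargs):
    # Shaped like an httpx.Response as far as the callers care.
    return types.SimpleNamespace(status_code=200, json=lambda: {"ok": True})


with mock.patch.object(delivery_transport.httpx, "post", side_effect=fake_post):
    # Every delivery module now routes through this one attribute, so a
    # single patch covers Slack, Discord, and Telegram code paths alike.
    resp = delivery_transport.httpx.post(
        "https://slack.com/api/chat.postMessage", json={"text": "hi"}
    )
    assert resp.status_code == 200
```

Outside the ``with`` block the original attribute is restored, so any accidental real network call fails loudly.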

Verification:

- ``pytest tests/``: 3191 pass, 2 skipped, 1 xfailed, 0 failures.
- ``pytest tests/utils/``: 105 pass, 1 skipped (was 32 + 26 = 58 across
  delivery modules; now 105 with helper + Slack additions).
- ``ruff check app/ tests/`` and ``ruff format --check``: clean.
- ``mypy app/utils/``: clean on touched modules; the 5 reported errors
  are pre-existing missing-stub warnings in
  ``app/integrations/{mariadb,mysql,hf_remote}`` unrelated to this PR.

* ci: retrigger after anyio/mcp circular-import flake

The previous CI run hit a known transient ImportError in
anyio.lowlevel/mcp when opensre integrations list ran on a
pytest-xdist worker. Reproduces zero times locally on origin/main and
on this branch; pushing an empty commit to re-run with a fresh
interpreter pool.

* fix(utils): address greptile review on delivery transport refactor

- Wrap `DeliveryResponse.data` in `MappingProxyType` so the frozen
  dataclass is fully immutable end-to-end. Mutating the parsed body
  now raises `TypeError` instead of silently succeeding, and caller-
  passed dicts can no longer leak mutations into the response.
- Add `error_type` field on `DeliveryResponse` populated with
  `type(exc).__name__` on transport failures. `slack_delivery._post_direct`
  threads it back into the exception log so on-call can distinguish
  `TimeoutError` from `ConnectionError` at a glance — restores the
  pre-refactor log shape that #864 had dropped.
- Add `TestDelegatesToSharedTransport` to `tests/utils/test_slack_delivery.py`,
  mirroring the regression class on the discord/telegram test files.
  Pins that `slack_delivery` does not import `httpx` and that all four
  code paths (`_call_reactions_api`, `_post_direct`, `_post_via_webapp`,
  `_post_via_incoming_webhook`) route through `delivery_transport.post_json`.
- Add focused tests for the immutable-data, error_type, and slack
  exc_type-log behaviours (+17 tests; 3208 pass / 2 skipped / 1 xfail).

Tighten `_discord_error_from_data` signature to `Mapping[str, Any]` so
mypy stays clean against the new read-only `data` type.
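The immutability trick described above can be sketched as follows — a minimal sketch under the assumption that ``__post_init__`` does the wrapping (the commit message only says `MappingProxyType` is involved, not where):

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class DeliveryResponse:
    ok: bool
    data: Mapping[str, Any] = field(default_factory=dict)
    exc_type: str | None = None  # type(exc).__name__ on transport failures

    def __post_init__(self) -> None:
        # Copy, then wrap: the copy stops a caller-held dict from leaking
        # later mutations in; the proxy makes resp.data itself read-only.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(self, "data", MappingProxyType(dict(self.data)))
```

Declaring ``data`` as ``Mapping[str, Any]`` (rather than ``dict``) is also what forces downstream signatures like ``_discord_error_from_data`` onto the read-only type.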

* refactor(utils): address review nits — restore log key, rename field, drop redundant header, fix CodeQL

- slack_delivery `_post_direct` exception log key reverted to ``type=%s``
  (matching the exact pre-refactor format ``type=%s channel=%s
  thread_ts=%s detail=%s``) so existing log parsers that key off
  ``type=`` keep working. Per @muddlebee's nit on PR #952.
- DeliveryResponse field renamed ``error_type`` → ``exc_type``. Pythonic
  abbreviation for "exception type"; can't be ``type`` because that
  shadows the builtin. Field/log-key are intentionally distinct: the
  field is the Python attribute, the log key matches the legacy format.
- Drop ``Content-Type: application/json; charset=utf-8`` from
  ``_slack_bearer_headers`` and ``_discord_auth_headers``. ``httpx.post``
  already sets ``Content-Type: application/json`` automatically when the
  request uses the ``json=`` kwarg, so the explicit header was
  redundant. No behavioural change for Slack/Discord (UTF-8 is httpx's
  default encoding and neither provider parses the charset suffix).
- Fix CodeQL py/import-and-import-from alerts on tests #503/#504. The
  ``test_module_does_not_import_httpx`` regression test in each
  delivery test file was importing the module under test via both
  ``import X as mod`` and ``from X import Y`` styles. Switched to a
  single ``from app.utils import <module>`` style so the same module
  is no longer dual-imported.

Test count unchanged (3208 pass / 2 skipped / 1 xfail). Lint, format,
and ``mypy app/utils/`` clean.

* fix(slack): restore charset=utf-8 — Slack emits missing_charset warning without it

Live e2e probe against real ``slack.com/api/chat.postMessage`` revealed
that dropping the explicit ``Content-Type: application/json; charset=utf-8``
header (per the previous nit-fix commit) caused Slack to add a
``missing_charset`` entry to ``response_metadata.warnings`` on every
request. httpx alone sets only the bare ``application/json`` for ``json=``
kwargs, but Slack's docs explicitly recommend the charset suffix
(https://api.slack.com/web#posting_json) and emit the warning otherwise.

Restored the ``charset=utf-8`` header on ``_slack_bearer_headers`` with
an inline comment explaining the reason. ``_discord_auth_headers`` stays
auth-only (Discord does not emit this warning, and its docs do not
require the charset suffix).

Pinned via direct header assertions on the two existing slack tests that
already captured headers (``test_sends_correct_url_and_headers`` for the
reactions API, ``test_sends_thread_reply_payload`` for chat.postMessage),
so a future regression that re-drops the charset will fail loudly.

Verified live:
- direct ``chat.postMessage`` probe now returns ``warnings=[]`` (was
  ``['missing_charset']`` before this commit).
- 38/38 live checks against real Slack / Discord / Telegram pass.
- 3208 unit tests pass; lint / format / mypy clean.