Conversation
…pport

- Switch LangGraph stream_mode from ["updates"] to ["events"] for fine-grained tool calls, LLM reasoning, and chain transitions in real time
- Add reasoning mapper (app/remote/reasoning.py) to translate events into human-readable spinner subtext (replaces fake cycling verbs)
- Enhance StreamRenderer and _LiveSpinner to show live tool calls and LLM reasoning instead of cosmetic placeholders
- Add astream_investigation() async runner and CLI bridge for local streaming with the same UX as remote
- Add POST /investigate/stream SSE endpoint to the lightweight FastAPI server (see the sketch below)
- Make opensre deploy fully interactive: context-aware menu, health check, confirmation prompts, branch selection
- Add named remotes to the wizard store so multiple remote endpoints (ec2, staging, local) can coexist with an active selection
- Deploy always saves the EC2 URL as a named remote and sets it active
- opensre remote shows a remote picker when multiple remotes are configured

Made-with: Cursor
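For orientation, a minimal sketch of how a POST /investigate/stream SSE endpoint could be wired in FastAPI. The `astream_investigation` generator, its event payloads, and the route body shape are stand-ins for the real app/remote modules, not their actual signatures.

```python
# Hedged sketch only: framing an async investigation stream as Server-Sent Events.
# astream_investigation here is a placeholder for the runner the commit describes.
import json
from typing import Any, AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def astream_investigation(raw_alert: dict[str, Any]) -> AsyncIterator[dict[str, Any]]:
    # Placeholder events: the real runner would yield tool calls, LLM reasoning,
    # and chain transitions as they happen.
    yield {"phase": "plan_actions", "detail": "selecting data sources"}
    yield {"phase": "root_cause_diagnosis", "detail": "reading deployment logs"}
    yield {"phase": "done", "report": "..."}


@app.post("/investigate/stream")
async def investigate_stream(raw_alert: dict[str, Any]) -> StreamingResponse:
    async def event_source() -> AsyncIterator[str]:
        async for event in astream_investigation(raw_alert):
            # SSE framing: one "data:" line per event, terminated by a blank line.
            yield f"data: {json.dumps(event)}\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")
```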
…ions
Empty investigation reports were caused by two bugs:
1. _save_investigation used result.get('key', 'N/A'), which returns ""
   when the key exists but is empty, instead of the 'N/A' fallback.
   Fixed with `or 'N/A'` so empty strings are treated as missing (see the
   sketch after this message).
2. Noise-classified alerts routed to END before diagnose/publish ran,
leaving root_cause, report, and problem_md as "". Now propagates
is_noise through the result dict and writes an explanatory message.
Also adds interactive investigation browser (select + view + save)
and persists streamed investigation results as .md files.
Made-with: Cursor
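For illustration, the fallback difference described in point 1 above; the dict key is illustrative, not the exact _save_investigation code.

```python
# Why .get(key, 'N/A') was not enough: the default only applies when the key
# is absent, not when it is present but empty.
result = {"root_cause": ""}  # a noise-classified run left the field empty

value = result.get("root_cause", "N/A")
print(repr(value))  # '' : the key exists, so the default is skipped

value = result.get("root_cause") or "N/A"
print(repr(value))  # 'N/A' : empty string is falsy, so the fallback kicks in
```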
Keep managed EC2 deployment state in app-owned local storage, preserve streamed investigations on disconnect, and surface local streaming failures promptly so the remote CLI can stay stream-first and merge-ready. Made-with: Cursor
Redact api_token values in integration show output and allow vercel records to be shown through the managed integrations CLI so credentials are not exposed during inspection. Made-with: Cursor
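As an illustration of the redaction behaviour, a minimal sketch; the key names and record shape are assumptions, not the integrations store's actual schema.

```python
# Hypothetical masking of secret fields before printing `integration show` output.
SENSITIVE_KEYS = {"api_token", "token", "api_key"}  # assumed key names


def redact(record: dict) -> dict:
    return {
        key: ("********" if key in SENSITIVE_KEYS and value else value)
        for key, value in record.items()
    }


print(redact({"service": "vercel", "api_token": "vc_abc123"}))
# -> {'service': 'vercel', 'api_token': '********'}
```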
Support Vercel incident discovery and RCA from the CLI and remote server so deployment failures can be selected, investigated, and correlated with GitHub code context. Made-with: Cursor
Use the project-scoped Vercel runtime log API and structured error fields so the CLI finds the same incidents users see in the dashboard. Add loading feedback after project selection and keep the Vercel RCA tests aligned with the updated behavior. Made-with: Cursor
Resolve the PR conflicts by keeping the Vercel RCA flows, setup verification behavior, and remote server integrations while bringing in the latest main-branch updates. Made-with: Cursor
- Add INTEGRATIONS_STORE_PATH and shared home constants; wire integrations store.
- Add prompt_support (questionary escape/cancel) for opensre and wizard entrypoints.
- Wizard prompt/store tweaks and tests; daily-update workflow and Makefile.

Made-with: Cursor
- Extend Vercel client (streaming runtime logs), poller, VercelLogsTool, and remote server wiring.
- Remove vercel_incidents and integrations vercel CLI; update poller tests.
- EC2 test fixture, pyproject deps, Makefile-aligned test assets.

Made-with: Cursor
- Drop verify runtime-log flags from verify_integrations and python -m app.integrations.
- Wizard: Vercel choice hint and post-select note about API limits.
- Update CLI integration tests and smoke test.

Made-with: Cursor
Greptile Summary

This PR adds Vercel integration setup/verify and a background Vercel poller.

Confidence Score: 4/5

Safe to merge after addressing the log-ID scan abort issue; remaining findings are style/P2. One P2 logic finding (_find_deployment_by_selected_log_id aborts on any inner fetch error rather than continuing the scan) could surface as a user-facing 400 for an edge-case Vercel API response. The other comments are structural and non-blocking. Test coverage for the new poller and server integration is thorough.

Files flagged: app/remote/vercel_poller.py (_find_deployment_by_selected_log_id inner loop), app/integrations/__main__.py (dispatch pattern)
| Filename | Overview |
|---|---|
| app/remote/vercel_poller.py | New file: Vercel URL resolution, background polling loop, and alert enrichment. Core logic is solid; one scan loop can abort early on any per-deployment fetch error. |
| app/services/vercel/client.py | Adds streaming runtime-log fetch with retry, new get_project and get_deployment_events improvements, and normalised git metadata. Retry loop uses blocking time.sleep which is acceptable when called from a thread pool. |
| app/remote/server.py | Adds VercelPoller lifespan task, vercel_url input field, and extracts _execute_investigation/_normalized_request_alert helpers. Clean refactor with correct cancellation handling. |
| app/integrations/__main__.py | Moves setup and verify commands out of the dispatch dict into the if cmd not in commands: branch — functionally correct but structurally inverted, making the intent harder to follow. |
| app/integrations/cli.py | Adds Vercel setup handler, changes cmd_setup to return resolved service name, and cmd_verify to return exit code. _die correctly annotated as NoReturn. |
| app/cli/commands/integrations.py | Auto-verify step after setup for services in VERIFY_SERVICES; cmd_verify exit code propagated via raise SystemExit. All analytics calls correctly ordered. |
| app/cli/prompt_support.py | New module that monkey-patches questionary to add Escape-to-cancel across all prompt types. Guard flag prevents double-installation (see the sketch after this table). |
| app/nodes/plan_actions/detect_sources.py | Adds Vercel-prefixed field aliases for GitHub owner/repo/SHA/ref; enriches Vercel source dict with project_name, selected_log_id, and error_message. |
| app/nodes/root_cause_diagnosis/prompt_builder.py | Adds Vercel deployment and GitHub code evidence sections to the RCA prompt. json was already imported; section builders are well-guarded. |
| .github/workflows/daily-update.yml | Simplifies scheduling: single cron at 01:17 UTC (previous 23:59 London gate was fragile with delayed runners), removes the gate step and its conditional guards. |
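A simplified sketch of the prompt_support idea from the table above, assuming the guard-flag pattern it describes. The real module also wires an Escape key binding; this sketch only maps questionary's cancelled result (None) to a clean exit.

```python
# Hedged sketch, not the actual app/cli/prompt_support.py.
import questionary

_INSTALLED = False  # guard flag so repeated installs are no-ops


def install_prompt_support() -> None:
    global _INSTALLED
    if _INSTALLED:
        return
    original_ask = questionary.Question.ask

    def ask_with_cancel(self: questionary.Question, *args, **kwargs):
        answer = original_ask(self, *args, **kwargs)
        if answer is None:  # questionary returns None when the user cancels
            raise SystemExit(130)
        return answer

    questionary.Question.ask = ask_with_cancel  # type: ignore[method-assign]
    _INSTALLED = True
```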
Sequence Diagram
sequenceDiagram
participant CLI as CLI / HTTP Client
participant Server as remote/server.py
participant Enrich as enrich_remote_alert_from_vercel
participant Poller as VercelPoller (background)
participant VercelAPI as Vercel API
participant RCA as run_investigation_cli
CLI->>Server: POST /investigate {vercel_url?, raw_alert}
Server->>Enrich: _normalized_request_alert(req)
alt vercel_url or vercel fields present
Enrich->>VercelAPI: list_projects / get_deployment / get_runtime_logs
VercelAPI-->>Enrich: deployment + logs
Enrich-->>Server: enriched raw_alert
else no vercel fields
Enrich-->>Server: raw_alert (pass-through)
end
Server->>RCA: _execute_investigation(raw_alert, ...)
RCA-->>Server: result dict
Server-->>CLI: InvestigateResponse
Note over Poller: Runs on interval (VERCEL_POLL_INTERVAL_SECONDS)
Poller->>VercelAPI: list_projects / list_deployments / get_deployment_bundle
VercelAPI-->>Poller: actionable candidates
Poller->>RCA: _handle_polled_candidate → _execute_investigation
RCA-->>Poller: result
Poller->>Server: _save_investigation (via asyncio.to_thread)
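To make the top of the diagram concrete, a hypothetical client call. Only the endpoint and the vercel_url / raw_alert fields come from the PR; the server address, alert contents, and timeout are made up.

```python
# Hedged sketch of the CLI/HTTP-client step in the sequence above.
import httpx

resp = httpx.post(
    "http://localhost:8000/investigate",  # assumed server address
    json={
        "vercel_url": "https://my-app.vercel.app",  # optional; triggers Vercel enrichment
        "raw_alert": {"title": "500s on /api/checkout"},  # illustrative alert
    },
    timeout=300.0,
)
resp.raise_for_status()
print(resp.json())  # InvestigateResponse payload
```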
Comments Outside Diff (1)
Comment on app/remote/vercel_poller.py, lines 1211-1215:

**Opaque `list_limit` inflation heuristic**

The expression `min(100, max(25, deployment_limit * 25))` silently fetches up to 25× the user-requested limit when `deployment_limit <= 5` (e.g., `deployment_limit=1` → fetches 25 from the API, then iterates only the first 1). The intent (fetch extra stubs so non-actionable ones don't exhaust the budget) is reasonable, but the formula and the `<= 5` threshold aren't documented. A comment or named constant would help future readers understand the intent and the tradeoff.
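For illustration only, one way to make the heuristic self-documenting. The constant and helper names are invented here, not taken from vercel_poller.py; the values mirror the reviewed expression.

```python
# Hypothetical naming of the heuristic flagged above.
_LIST_PAGE_CAP = 100        # assumed cap for a single deployments list request
_MIN_STUB_FETCH = 25        # always over-fetch so non-actionable stubs don't
                            # exhaust a small user-requested budget
_STUB_OVERFETCH_FACTOR = 25


def _list_limit(deployment_limit: int) -> int:
    """How many deployment stubs to request for a user-facing limit."""
    return min(_LIST_PAGE_CAP, max(_MIN_STUB_FETCH, deployment_limit * _STUB_OVERFETCH_FACTOR))
```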
Reviews (1): Last reviewed commit: "refactor(integrations): list-only Vercel..."
Comment on app/integrations/__main__.py, lines 40-57:
**Inverted dispatch pattern for `setup`/`verify`**
setup and verify are handled inside the if cmd not in commands: block even though they are perfectly valid commands. This inverts the normal "dispatch table + unknown handler" pattern, making it easy to miss that setup/verify are supported and harder to add future commands consistently. Consider adding them to commands directly, or at minimum restructuring so the unknown-command branch is clearly the last resort.
```suggestion
commands = {
    "list": lambda _: cmd_list(),
    "show": cmd_show,
    "remove": cmd_remove,
}
if cmd == "setup":
    resolved_service = cmd_setup(svc)
    if resolved_service in SUPPORTED_VERIFY_SERVICES:
        print(f" Verifying {resolved_service}...\n")
        sys.exit(cmd_verify(resolved_service))
    return
if cmd == "verify":
    sys.exit(
        cmd_verify(
            svc,
            send_slack_test="--send-slack-test" in option_args,
        )
    )
if cmd not in commands:
    print(f" Unknown command '{cmd}'. Try: setup, list, show, remove, verify", file=sys.stderr)
    sys.exit(1)
commands[cmd](svc)
```
Comment on app/remote/vercel_poller.py, lines 758-776:
**`VercelResolutionError` in inner loop aborts the entire log-ID scan**
_fetch_deployment_bundle raises VercelResolutionError if get_deployment returns a non-success response. Because this is called inside the for deployment in ... loop without any per-iteration error handling, a single inaccessible deployment will surface as a user-facing 400 error rather than being skipped and the search continuing to the next deployment. A deleted or access-restricted deployment in the list (which Vercel can return) would always abort the search.
```python
for deployment in deployments_result.get("deployments", []):
    deployment_id = str(deployment.get("id", "")).strip()
    if not deployment_id:
        continue
    try:
        deployment_details, events, runtime_logs = _fetch_deployment_bundle(
            client,
            project_id=project_id,
            deployment_id=deployment_id,
            log_limit=log_limit,
        )
    except VercelResolutionError:
        logger.debug("Skipping deployment %s during log-ID scan", deployment_id)
        continue
    if any(str(log.get("id", "")).strip() == selected_log_id for log in runtime_logs):
        return deployment_details, events, runtime_logs
    if any(str(event.get("id", "")).strip() == selected_log_id for event in events):
        return deployment_details, events, runtime_logs
```
Comment on app/services/vercel/client.py, lines 409-515 (diff context: get_runtime_logs reworked to stream project-scoped runtime logs with retries; summarized in the file table above):
**Blocking `time.sleep` in retry path**
time.sleep(min(8.0, 2.0**attempt)) is a blocking call. When get_runtime_logs is invoked from asyncio.to_thread (as in _parallel_deployment_events_and_runtime_logs) this only blocks the thread-pool worker, which is acceptable. However, if it is ever called directly from a synchronous context on the event-loop thread (e.g., future tests or a sync script using collect_vercel_candidates), it will block the caller for up to 8 s per retry. Consider adding a comment noting it should never be called directly on the event loop thread.
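A sketch of the lightweight option the reviewer suggests, not the shipped code: call out the blocking retry sleep in the method's docstring so callers keep it on a worker thread. The signature below follows the diff; the docstring wording is invented.

```python
def get_runtime_logs(self, deployment_id: str, limit: int = 100, *, project_id: str = "") -> dict:
    """Fetch runtime logs for a deployment.

    Warning: retries use blocking ``time.sleep`` between attempts. Run this via
    ``asyncio.to_thread`` (or another worker thread); never call it directly on
    the event-loop thread.
    """
    ...
```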
… drop redundant header, fix CodeQL

- slack_delivery `_post_direct` exception log key reverted to ``type=%s`` (matching the exact pre-refactor format ``type=%s channel=%s thread_ts=%s detail=%s``) so existing log parsers that key off ``type=`` keep working. Per @muddlebee's nit on PR Tracer-Cloud#952.
- DeliveryResponse field renamed ``error_type`` → ``exc_type``. Pythonic abbreviation for "exception type"; can't be ``type`` because that shadows the builtin. Field/log-key are intentionally distinct: the field is the Python attribute, the log key matches the legacy format.
- Drop ``Content-Type: application/json; charset=utf-8`` from ``_slack_bearer_headers`` and ``_discord_auth_headers``. ``httpx.post`` already sets ``Content-Type: application/json`` automatically when the request uses the ``json=`` kwarg, so the explicit header was redundant. No behavioural change for Slack/Discord (UTF-8 is httpx's default encoding and neither provider parses the charset suffix).
- Fix CodeQL py/import-and-import-from alerts on tests Tracer-Cloud#503/Tracer-Cloud#504. The ``test_module_does_not_import_httpx`` regression test in each delivery test file was importing the module under test via both ``import X as mod`` and ``from X import Y`` styles. Switched to a single ``from app.utils import <module>`` style so the same module is no longer dual-imported.

Test count unchanged (3208 pass / 2 skipped / 1 xfail). Lint, format, and ``mypy app/utils/`` clean.
…#952)

* refactor(utils): extract shared HTTP-post helper for delivery modules

  Closes #864.

  ``app/utils/slack_delivery.py``, ``app/utils/discord_delivery.py``, and ``app/utils/telegram_delivery.py`` each issued ``httpx.post`` directly, applied a per-provider timeout, parsed the response body, and converted exceptions to ``(False, error, ...)`` tuples. The transport pieces were identical; only the success criteria, auth scheme, and error-message extraction differed per provider.

  Add ``app/utils/delivery_transport.py`` with ``post_json`` plus a ``DeliveryResponse`` dataclass that captures the shared transport behavior: HTTP POST, timeout, optional ``follow_redirects``, exception suppression, and JSON decoding with graceful fallback (a rough sketch of this shape follows the commit list below). The helper deliberately does not decide provider-level success — callers inspect ``status_code`` / ``data`` per their own semantics.

  Refactored callers:
  - Slack: ``_call_reactions_api``, ``_post_direct``, ``_post_via_webapp``, ``_post_via_incoming_webhook`` — all four code paths now go through ``post_json``. ``send_slack_report`` orchestration unchanged.
  - Discord: ``post_discord_message``, ``create_discord_thread`` — same Bearer/Bot header pattern factored into ``_discord_auth_headers``.
  - Telegram: ``post_telegram_message`` — bot-token redaction in error messages preserved by re-running ``_redact_token`` over ``response.error``.

  Provider-specific payload building, success criteria (``data["ok"]`` for Slack, status codes for Discord/Telegram), and error extraction all stay in the calling modules. Public function signatures are unchanged.

  Tests:
  - ``tests/utils/test_delivery_transport.py`` (new) — 15 tests covering happy path, every transport-failure shape (httpx.ConnectError, ReadTimeout, RequestError, OSError, RuntimeError), JSON-decode fallbacks (non-JSON body, JSON array, empty body), header / timeout / follow_redirects pass-through, and DeliveryResponse frozen-dataclass invariants.
  - ``tests/utils/test_slack_delivery.py`` (new) — 24 tests covering all four Slack code paths, ``send_slack_report`` orchestration, and fallback chain ``direct → webapp``.
  - ``tests/utils/test_discord_delivery.py`` (extended) — 3 new tests pinning the helper-delegation contract: module no longer imports ``httpx``, ``post_discord_message`` and ``create_discord_thread`` both go through ``post_json``.
  - ``tests/utils/test_telegram_delivery.py`` (extended) — 3 new tests pinning the same contract plus a regression test that the bot-token is redacted out of error strings even when the failure originates from the shared transport.
  - All pre-existing tests in ``test_discord_delivery.py`` and ``test_telegram_delivery.py`` updated to patch ``app.utils.delivery_transport.httpx.post`` (the new transport boundary) instead of the per-module ``httpx.post`` import that no longer exists.

  Verification:
  - ``pytest tests/``: 3191 pass, 2 skipped, 1 xfailed, 0 failures.
  - ``pytest tests/utils/``: 105 pass, 1 skipped (was 32 + 26 = 58 across delivery modules; now 105 with helper + Slack additions).
  - ``ruff check app/ tests/`` and ``ruff format --check``: clean.
  - ``mypy app/utils/``: clean on touched modules; the 5 reported errors are pre-existing missing-stub warnings in ``app/integrations/{mariadb,mysql,hf_remote}`` unrelated to this PR.

* ci: retrigger after anyio/mcp circular-import flake

  The previous CI run hit a known transient ImportError in anyio.lowlevel/mcp when opensre integrations list ran on a pytest-xdist worker. Reproduces zero times locally on origin/main and on this branch; pushing an empty commit to re-run with a fresh interpreter pool.

* fix(utils): address greptile review on delivery transport refactor

  - Wrap `DeliveryResponse.data` in `MappingProxyType` so the frozen dataclass is fully immutable end-to-end. Mutating the parsed body now raises `TypeError` instead of silently succeeding, and caller-passed dicts can no longer leak mutations into the response.
  - Add `error_type` field on `DeliveryResponse` populated with `type(exc).__name__` on transport failures. `slack_delivery._post_direct` threads it back into the exception log so on-call can distinguish `TimeoutError` from `ConnectionError` at a glance — restores the pre-refactor log shape that #864 had dropped.
  - Add `TestDelegatesToSharedTransport` to `tests/utils/test_slack_delivery.py`, mirroring the regression class on the discord/telegram test files. Pins that `slack_delivery` does not import `httpx` and that all four code paths (`_call_reactions_api`, `_post_direct`, `_post_via_webapp`, `_post_via_incoming_webhook`) route through `delivery_transport.post_json`.
  - Add focused tests for the immutable-data, error_type, and slack exc_type-log behaviours (+17 tests; 3208 pass / 2 skipped / 1 xfail). Tighten `_discord_error_from_data` signature to `Mapping[str, Any]` so mypy stays clean against the new read-only `data` type.

* refactor(utils): address review nits — restore log key, rename field, drop redundant header, fix CodeQL

  - slack_delivery `_post_direct` exception log key reverted to ``type=%s`` (matching the exact pre-refactor format ``type=%s channel=%s thread_ts=%s detail=%s``) so existing log parsers that key off ``type=`` keep working. Per @muddlebee's nit on PR #952.
  - DeliveryResponse field renamed ``error_type`` → ``exc_type``. Pythonic abbreviation for "exception type"; can't be ``type`` because that shadows the builtin. Field/log-key are intentionally distinct: the field is the Python attribute, the log key matches the legacy format.
  - Drop ``Content-Type: application/json; charset=utf-8`` from ``_slack_bearer_headers`` and ``_discord_auth_headers``. ``httpx.post`` already sets ``Content-Type: application/json`` automatically when the request uses the ``json=`` kwarg, so the explicit header was redundant. No behavioural change for Slack/Discord (UTF-8 is httpx's default encoding and neither provider parses the charset suffix).
  - Fix CodeQL py/import-and-import-from alerts on tests #503/#504. The ``test_module_does_not_import_httpx`` regression test in each delivery test file was importing the module under test via both ``import X as mod`` and ``from X import Y`` styles. Switched to a single ``from app.utils import <module>`` style so the same module is no longer dual-imported.

  Test count unchanged (3208 pass / 2 skipped / 1 xfail). Lint, format, and ``mypy app/utils/`` clean.

* fix(slack): restore charset=utf-8 — Slack emits missing_charset warning without it

  Live e2e probe against real ``slack.com/api/chat.postMessage`` revealed that dropping the explicit ``Content-Type: application/json; charset=utf-8`` header (per the previous nit-fix commit) caused Slack to add a ``missing_charset`` entry to ``response_metadata.warnings`` on every request. httpx alone sets only the bare ``application/json`` for ``json=`` kwargs, but Slack's docs explicitly recommend the charset suffix (https://api.slack.com/web#posting_json) and emit the warning otherwise.

  Restored the ``charset=utf-8`` header on ``_slack_bearer_headers`` with an inline comment explaining the reason. ``_discord_auth_headers`` stays auth-only (Discord does not emit this warning, and its docs do not require the charset suffix). Pinned via direct header assertions on the two existing slack tests that already captured headers (``test_sends_correct_url_and_headers`` for the reactions API, ``test_sends_thread_reply_payload`` for chat.postMessage), so a future regression that re-drops the charset will fail loudly.

  Verified live:
  - direct ``chat.postMessage`` probe now returns ``warnings=[]`` (was ``['missing_charset']`` before this commit).
  - 38/38 live checks against real Slack / Discord / Telegram pass.
  - 3208 unit tests pass; lint / format / mypy clean.
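The refactor above describes post_json and DeliveryResponse only in prose; below is a minimal sketch of the shape they could take. The field names (status_code, data, error, exc_type) and the "callers decide success" contract come from the PR text; everything else is an assumption, not the actual app/utils/delivery_transport.py implementation.

```python
# Hedged sketch of the shared delivery transport helper described above.
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping

import httpx


@dataclass(frozen=True)
class DeliveryResponse:
    status_code: int | None = None
    data: Mapping[str, Any] = field(default_factory=lambda: MappingProxyType({}))
    error: str = ""
    exc_type: str = ""


def post_json(
    url: str,
    payload: dict[str, Any],
    *,
    headers: dict[str, str] | None = None,
    timeout: float = 10.0,
    follow_redirects: bool = False,
) -> DeliveryResponse:
    """POST JSON and swallow transport errors; callers decide provider-level success."""
    try:
        resp = httpx.post(
            url,
            json=payload,
            headers=headers,
            timeout=timeout,
            follow_redirects=follow_redirects,
        )
    except Exception as exc:  # transport failure: report it, never raise
        return DeliveryResponse(error=str(exc), exc_type=type(exc).__name__)
    try:
        body = resp.json()
        data = body if isinstance(body, dict) else {"body": body}
    except ValueError:  # non-JSON body: graceful fallback
        data = {"text": resp.text}
    return DeliveryResponse(status_code=resp.status_code, data=MappingProxyType(data))
```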