
Split/3 integrations verify#504

Merged
davincios merged 10 commits into main from
split/3-integrations-verify
Apr 8, 2026

Conversation

@davincios
Contributor

Fixes #

Describe the changes you have made in this PR -

Screenshots of the UI changes (If any) -


Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:


Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

davincios added 10 commits April 6, 2026 14:50
…pport

- Switch LangGraph stream_mode from ["updates"] to ["events"] for fine-grained
  tool calls, LLM reasoning, and chain transitions in real-time
- Add reasoning mapper (app/remote/reasoning.py) to translate events into
  human-readable spinner subtext (replaces fake cycling verbs)
- Enhance StreamRenderer and _LiveSpinner to show live tool calls and LLM
  reasoning instead of cosmetic placeholders
- Add astream_investigation() async runner and CLI bridge for local streaming
  with the same UX as remote
- Add POST /investigate/stream SSE endpoint to the lightweight FastAPI server
- Make opensre deploy fully interactive: context-aware menu, health check,
  confirmation prompts, branch selection
- Add named remotes to the wizard store so multiple remote endpoints (ec2,
  staging, local) can coexist with an active selection
- Deploy always saves the EC2 URL as a named remote and sets it active
- opensre remote shows a remote picker when multiple remotes are configured

Made-with: Cursor
…ions

Empty investigation reports were caused by two bugs:

1. _save_investigation used result.get('key', 'N/A') which returns ""
   when the key exists but is empty, instead of the 'N/A' fallback.
   Fixed with `or 'N/A'` to treat empty strings as missing.

2. Noise-classified alerts routed to END before diagnose/publish ran,
   leaving root_cause, report, and problem_md as "". Now propagates
   is_noise through the result dict and writes an explanatory message.
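Bug 1 hinges on a subtle `dict.get` behavior: the default only applies when the key is absent, not when its value is empty. A minimal illustration:

```python
result = {"root_cause": ""}  # key present, but value is empty

# dict.get falls back only when the key is missing entirely:
print(result.get("root_cause", "N/A"))    # prints an empty string

# `or` treats any falsy value (including "") as missing:
print(result.get("root_cause") or "N/A")  # prints N/A
```

Note the tradeoff: `or` also replaces legitimately falsy values like `0` or `False`, which is fine for report strings but worth remembering elsewhere.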

Also adds interactive investigation browser (select + view + save)
and persists streamed investigation results as .md files.

Made-with: Cursor
Keep managed EC2 deployment state in app-owned local storage, preserve streamed investigations on disconnect, and surface local streaming failures promptly so the remote CLI can stay stream-first and merge-ready.

Made-with: Cursor
Redact api_token values in integration show output and allow vercel records to be shown through the managed integrations CLI so credentials are not exposed during inspection.

Made-with: Cursor
Support Vercel incident discovery and RCA from the CLI and remote server so deployment failures can be selected, investigated, and correlated with GitHub code context.

Made-with: Cursor
Use the project-scoped Vercel runtime log API and structured error fields so the CLI finds the same incidents users see in the dashboard. Add loading feedback after project selection and keep the Vercel RCA tests aligned with the updated behavior.

Made-with: Cursor
Resolve the PR conflicts by keeping the Vercel RCA flows, setup verification behavior, and remote server integrations while bringing in the latest main-branch updates.

Made-with: Cursor
- Add INTEGRATIONS_STORE_PATH and shared home constants; wire integrations store.

- Add prompt_support (questionary escape/cancel) for opensre and wizard entrypoints.

- Wizard prompt/store tweaks and tests; daily-update workflow and Makefile.

Made-with: Cursor
- Extend Vercel client (streaming runtime logs), poller, VercelLogsTool, and remote server wiring.

- Remove vercel_incidents and integrations vercel CLI; update poller tests.

- EC2 test fixture, pyproject deps, Makefile-aligned test assets.

Made-with: Cursor
- Drop verify runtime-log flags from verify_integrations and python -m app.integrations.

- Wizard: Vercel choice hint and post-select note about API limits.

- Update CLI integration tests and smoke test.

Made-with: Cursor
@davincios davincios merged commit 784dd76 into main Apr 8, 2026
3 checks passed
@davincios davincios deleted the split/3-integrations-verify branch April 8, 2026 01:05
@greptile-apps
Contributor

greptile-apps Bot commented Apr 8, 2026

Greptile Summary

This PR adds Vercel integration setup/verify, a background VercelPoller that auto-investigates deployment failures, Escape-key support for questionary prompts, and refactors cmd_verify to return an exit code rather than calling sys.exit directly. It also extracts common filesystem paths into app/constants and wires vercel + github evidence into the RCA pipeline.

Confidence Score: 4/5

Safe to merge after addressing the log-ID scan abort issue; remaining findings are style/P2.

One P2 logic finding (_find_deployment_by_selected_log_id aborts on any inner fetch error rather than continuing the scan) could surface as a user-facing 400 for an edge-case Vercel API response. The other comments are structural and non-blocking. Test coverage for the new poller and server integration is thorough.

app/remote/vercel_poller.py (_find_deployment_by_selected_log_id inner loop), app/integrations/main.py (dispatch pattern)

Vulnerabilities

No security concerns identified. The path-traversal protection in _safe_investigation_path is preserved and correctly migrated from str to Path. The Vercel API token is handled as a secret field in VercelConfig and is not logged or echoed.
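For context, a `Path`-based traversal guard of the kind described can be sketched as follows. The function name and shape here are illustrative, not the project's actual `_safe_investigation_path` code; `Path.is_relative_to` requires Python 3.9+.

```python
from pathlib import Path

def safe_child_path(base_dir: Path, name: str) -> Path:
    """Resolve `name` under base_dir, rejecting traversal (e.g. '../../etc/passwd').

    Hypothetical sketch of the pattern, not the project's implementation.
    """
    base = base_dir.resolve()
    candidate = (base / name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"unsafe path: {name!r}")
    return candidate
```

Resolving both paths before comparing is what defeats `..` segments and symlink tricks that a plain string-prefix check would miss.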

Important Files Changed

| Filename | Overview |
| --- | --- |
| `app/remote/vercel_poller.py` | New file: Vercel URL resolution, background polling loop, and alert enrichment. Core logic is solid; one scan loop can abort early on any per-deployment fetch error. |
| `app/services/vercel/client.py` | Adds streaming runtime-log fetch with retry, new `get_project` and `get_deployment_events` improvements, and normalised git metadata. Retry loop uses blocking `time.sleep`, which is acceptable when called from a thread pool. |
| `app/remote/server.py` | Adds `VercelPoller` lifespan task and `vercel_url` input field, and extracts `_execute_investigation`/`_normalized_request_alert` helpers. Clean refactor with correct cancellation handling. |
| `app/integrations/main.py` | Moves the setup and verify commands out of the dispatch dict into the `if cmd not in commands:` branch; functionally correct but structurally inverted, making the intent harder to follow. |
| `app/integrations/cli.py` | Adds a Vercel setup handler, changes `cmd_setup` to return the resolved service name, and `cmd_verify` to return an exit code. `_die` correctly annotated as `NoReturn`. |
| `app/cli/commands/integrations.py` | Auto-verify step after setup for services in `VERIFY_SERVICES`; `cmd_verify` exit code propagated via `raise SystemExit`. All analytics calls correctly ordered. |
| `app/cli/prompt_support.py` | New module that monkey-patches questionary to add Escape-to-cancel across all prompt types. A guard flag prevents double installation. |
| `app/nodes/plan_actions/detect_sources.py` | Adds Vercel-prefixed field aliases for GitHub owner/repo/SHA/ref; enriches the Vercel source dict with `project_name`, `selected_log_id`, and `error_message`. |
| `app/nodes/root_cause_diagnosis/prompt_builder.py` | Adds Vercel deployment and GitHub code evidence sections to the RCA prompt. `json` was already imported; section builders are well guarded. |
| `.github/workflows/daily-update.yml` | Simplifies scheduling to a single cron at 01:17 UTC (the previous 23:59 London gate was fragile with delayed runners); removes the gate step and its conditional guards. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI / HTTP Client
    participant Server as remote/server.py
    participant Enrich as enrich_remote_alert_from_vercel
    participant Poller as VercelPoller (background)
    participant VercelAPI as Vercel API
    participant RCA as run_investigation_cli

    CLI->>Server: POST /investigate {vercel_url?, raw_alert}
    Server->>Enrich: _normalized_request_alert(req)
    alt vercel_url or vercel fields present
        Enrich->>VercelAPI: list_projects / get_deployment / get_runtime_logs
        VercelAPI-->>Enrich: deployment + logs
        Enrich-->>Server: enriched raw_alert
    else no vercel fields
        Enrich-->>Server: raw_alert (pass-through)
    end
    Server->>RCA: _execute_investigation(raw_alert, ...)
    RCA-->>Server: result dict
    Server-->>CLI: InvestigateResponse

    Note over Poller: Runs on interval (VERCEL_POLL_INTERVAL_SECONDS)
    Poller->>VercelAPI: list_projects / list_deployments / get_deployment_bundle
    VercelAPI-->>Poller: actionable candidates
    Poller->>RCA: _handle_polled_candidate → _execute_investigation
    RCA-->>Poller: result
    Poller->>Server: _save_investigation (via asyncio.to_thread)
```

Comments Outside Diff (1)

  1. app/remote/vercel_poller.py, lines 1211-1215

    P2 Opaque list_limit inflation heuristic

    The expression min(100, max(25, deployment_limit * 25)) silently fetches up to 25× the user-requested limit when deployment_limit <= 5 (e.g., deployment_limit=1 → fetches 25 from the API, then iterates only the first 1). The intent (fetch extra stubs so non-actionable ones don't exhaust the budget) is reasonable but the formula and the <= 5 threshold aren't documented. A comment or named constant would help future readers understand the intent and the tradeoff.


Reviews (1): Last reviewed commit: "refactor(integrations): list-only Vercel..."

Comment on lines 40 to +57

```
commands = {
    "setup": cmd_setup,
    "list": lambda _: cmd_list(),
    "show": cmd_show,
    "remove": cmd_remove,
    "verify": lambda service: cmd_verify(
        service,
        send_slack_test="--send-slack-test" in option_args,
    ),
}
if cmd not in commands:
    print(f"  Unknown command '{cmd}'. Try: {', '.join(commands)}", file=sys.stderr)
if cmd == "setup":
    resolved_service = cmd_setup(svc)
    if resolved_service in SUPPORTED_VERIFY_SERVICES:
        print(f"  Verifying {resolved_service}...\n")
        sys.exit(cmd_verify(resolved_service))
    return
if cmd == "verify":
    sys.exit(
        cmd_verify(
            svc,
            send_slack_test="--send-slack-test" in option_args,
```

P2 Inverted dispatch pattern for setup/verify

setup and verify are handled inside the if cmd not in commands: block even though they are perfectly valid commands. This inverts the normal "dispatch table + unknown handler" pattern, making it easy to miss that setup/verify are supported and harder to add future commands consistently. Consider adding them to commands directly, or at minimum restructuring so the unknown-command branch is clearly the last resort.

Suggested change

```diff
-    commands = {
-        "setup": cmd_setup,
-        "list": lambda _: cmd_list(),
-        "show": cmd_show,
-        "remove": cmd_remove,
-        "verify": lambda service: cmd_verify(
-            service,
-            send_slack_test="--send-slack-test" in option_args,
-        ),
-    }
-    if cmd not in commands:
-        print(f"  Unknown command '{cmd}'. Try: {', '.join(commands)}", file=sys.stderr)
-    if cmd == "setup":
-        resolved_service = cmd_setup(svc)
-        if resolved_service in SUPPORTED_VERIFY_SERVICES:
-            print(f"  Verifying {resolved_service}...\n")
-            sys.exit(cmd_verify(resolved_service))
-        return
-    if cmd == "verify":
-        sys.exit(
-            cmd_verify(
-                svc,
-                send_slack_test="--send-slack-test" in option_args,
+    commands = {
+        "list": lambda _: cmd_list(),
+        "show": cmd_show,
+        "remove": cmd_remove,
+    }
+    if cmd == "setup":
+        resolved_service = cmd_setup(svc)
+        if resolved_service in SUPPORTED_VERIFY_SERVICES:
+            print(f"  Verifying {resolved_service}...\n")
+            sys.exit(cmd_verify(resolved_service))
+        return
+    if cmd == "verify":
+        sys.exit(
+            cmd_verify(
+                svc,
+                send_slack_test="--send-slack-test" in option_args,
+            )
+        )
+    if cmd not in commands:
+        print(f"  Unknown command '{cmd}'. Try: setup, list, show, remove, verify", file=sys.stderr)
+        sys.exit(1)
+    commands[cmd](svc)
```

Comment on lines +758 to +776

```python
    )
elif parsed.selected_log_id:
    deployment, events, runtime_logs = _find_deployment_by_selected_log_id(
        client,
        project_id=str(project.get("id", "")).strip(),
        selected_log_id=parsed.selected_log_id,
        deployment_limit=_DEFAULT_DEPLOYMENT_LIMIT,
        log_limit=_DEFAULT_LOG_LIMIT,
    )
else:
    deployment, events, runtime_logs = _select_latest_actionable_deployment(
        client,
        project_id=str(project.get("id", "")).strip(),
        deployment_limit=_DEFAULT_DEPLOYMENT_LIMIT,
        log_limit=_DEFAULT_LOG_LIMIT,
    )

candidate = build_vercel_investigation_candidate(
    project=project,
```

P2 VercelResolutionError in inner loop aborts the entire log-ID scan

_fetch_deployment_bundle raises VercelResolutionError if get_deployment returns a non-success response. Because this is called inside the for deployment in ... loop without any per-iteration error handling, a single inaccessible deployment will surface as a user-facing 400 error rather than being skipped and the search continuing to the next deployment. A deleted or access-restricted deployment in the list (which Vercel can return) would always abort the search.

```python
for deployment in deployments_result.get("deployments", []):
    deployment_id = str(deployment.get("id", "")).strip()
    if not deployment_id:
        continue
    try:
        deployment_details, events, runtime_logs = _fetch_deployment_bundle(
            client,
            project_id=project_id,
            deployment_id=deployment_id,
            log_limit=log_limit,
        )
    except VercelResolutionError:
        logger.debug("Skipping deployment %s during log-ID scan", deployment_id)
        continue
    if any(str(log.get("id", "")).strip() == selected_log_id for log in runtime_logs):
        return deployment_details, events, runtime_logs
    if any(str(event.get("id", "")).strip() == selected_log_id for event in events):
        return deployment_details, events, runtime_logs
```

Comment on lines 409 to +515

```
logger.warning("[vercel] get_deployment_events error type=%s detail=%s", type(e).__name__, e)
return {"success": False, "error": str(e)}

def get_runtime_logs(self, deployment_id: str, limit: int = 100) -> dict[str, Any]:
"""Fetch serverless function runtime logs (stdout/stderr) for a deployment."""
def get_runtime_logs(
self,
deployment_id: str,
limit: int = 100,
*,
project_id: str = "",
) -> dict[str, Any]:
"""Fetch serverless function runtime logs for a deployment.

Per Vercel (`GET /v1/projects/{projectId}/deployments/{deploymentId}/runtime-logs
<https://vercel.com/docs/rest-api/logs/get-logs-for-a-deployment>`_), the response is
``Content-Type: application/stream+json``: a **stream** of JSON objects, not one array.
This client uses :meth:`httpx.Client.stream` and :meth:`httpx.Response.iter_lines` to read
incrementally (line-delimited JSON) until ``limit`` rows are collected or the stream ends.
"""
params: dict[str, Any] = {"limit": min(limit, 2000)}
params.update(self.config.team_params)
try:
resp = self._get_client().get(f"/v1/deployments/{deployment_id}/logs", params=params)
resp.raise_for_status()
data = resp.json()
raw_logs = data if isinstance(data, list) else data.get("logs", [])
logs = [
{
"id": log.get("id", ""),
"created_at": log.get("createdAt", ""),
"payload": log.get("payload", {}),
"type": log.get("type", ""),
"source": log.get("source", ""),
}
for log in raw_logs
]
return {"success": True, "logs": logs, "total": len(logs)}
except httpx.HTTPStatusError as e:
logger.warning(
"[vercel] get_runtime_logs HTTP failure status=%s id=%r",
e.response.status_code,
deployment_id,
)
return {"success": False, "error": f"HTTP {e.response.status_code}: {e.response.text[:200]}"}
except Exception as e:
logger.warning("[vercel] get_runtime_logs error type=%s detail=%s", type(e).__name__, e)
return {"success": False, "error": str(e)}
path = f"/v1/projects/{project_id}/deployments/{deployment_id}/runtime-logs"
if not project_id:
path = f"/v1/deployments/{deployment_id}/logs"

cap = min(limit, 2000)
stream_headers = {
**self.config.headers,
"Accept": "application/stream+json, application/x-ndjson, application/json",
}

last_retryable_detail = ""
last_retryable_kind = ""
for attempt in range(1, _RUNTIME_LOGS_READ_ATTEMPTS + 1):
try:
http = self._get_client()
with http.stream(
"GET",
path,
params=params,
headers=stream_headers,
timeout=_runtime_logs_http_timeout(),
) as resp:
resp.raise_for_status()
raw_logs = _collect_runtime_logs_from_stream(resp, cap)
logs = [
{
"id": log.get("id", "") or log.get("rowId", ""),
"created_at": log.get("createdAt", "") or log.get("timestampInMs", ""),
"payload": log.get("payload", {}),
"message": _extract_runtime_log_message(log),
"type": log.get("type", "") or log.get("level", ""),
"source": log.get("source", ""),
"level": log.get("level", ""),
"request_path": log.get("requestPath", ""),
"request_method": str(
log.get("requestMethod", "") or log.get("method", "") or ""
).strip(),
"status_code": log.get("responseStatusCode", ""),
"domain": log.get("domain", ""),
}
for log in raw_logs
if isinstance(log, dict)
]
return {"success": True, "logs": logs, "total": len(logs)}
except (httpx.ReadTimeout, httpx.ReadError, httpx.RemoteProtocolError) as exc:
last_retryable_detail = str(exc)
last_retryable_kind = type(exc).__name__
logger.warning(
"[vercel] get_runtime_logs %s attempt %s/%s deployment=%r",
last_retryable_kind,
attempt,
_RUNTIME_LOGS_READ_ATTEMPTS,
deployment_id,
)
if attempt >= _RUNTIME_LOGS_READ_ATTEMPTS:
break
time.sleep(min(8.0, 2.0**attempt))
except httpx.HTTPStatusError as e:
if e.response.status_code == 404:
return {"success": True, "logs": [], "total": 0}
logger.warning(
"[vercel] get_runtime_logs HTTP failure status=%s id=%r",
e.response.status_code,
deployment_id,
)
return {"success": False, "error": f"HTTP {e.response.status_code}: {e.response.text[:200]}"}
except Exception as e:
logger.warning("[vercel] get_runtime_logs error type=%s detail=%s", type(e).__name__, e)
return {"success": False, "error": str(e)}

detail = last_retryable_detail or "Transient runtime log transport error"
if last_retryable_kind == "ReadTimeout":
read_s = _runtime_logs_read_timeout_seconds()
return {
"success": False,
"error": (
f"{detail} (after {_RUNTIME_LOGS_READ_ATTEMPTS} attempts, "
f"{read_s:g}s read timeout each; set VERCEL_RUNTIME_LOGS_READ_TIMEOUT to increase)"
),
}

kind = last_retryable_kind or "transport error"
return {
"success": False,
"error": (
f"{detail} (after {_RUNTIME_LOGS_READ_ATTEMPTS} attempts while reading "
f"runtime logs; last error type: {kind})"
```

P2 Blocking time.sleep in retry path

time.sleep(min(8.0, 2.0**attempt)) is a blocking call. When get_runtime_logs is invoked from asyncio.to_thread (as in _parallel_deployment_events_and_runtime_logs) this only blocks the thread-pool worker, which is acceptable. However, if it is ever called directly from a synchronous context on the event-loop thread (e.g., future tests or a sync script using collect_vercel_candidates), it will block the caller for up to 8 s per retry. Consider adding a comment noting it should never be called directly on the event loop thread.
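To quantify the concern, the worst-case blocking per call is the sum of the capped delays; sleeps happen between attempts, so there is one fewer sleep than attempts. The attempt count below is a stand-in, since `_RUNTIME_LOGS_READ_ATTEMPTS`'s real value is not shown in this diff:

```python
ATTEMPTS = 3  # stand-in for _RUNTIME_LOGS_READ_ATTEMPTS; real value not shown here

# One sleep after each failed attempt except the last, matching the retry loop above.
delays = [min(8.0, 2.0 ** attempt) for attempt in range(1, ATTEMPTS)]
print(delays, sum(delays))  # [2.0, 4.0] 6.0
```

With 3 attempts the caller can block for up to 6 s of backoff plus three full read timeouts, which is why running this only via `asyncio.to_thread` matters.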


mayankbharati-ops added a commit to mayankbharati-ops/opensre that referenced this pull request Apr 27, 2026
… drop redundant header, fix CodeQL

- slack_delivery `_post_direct` exception log key reverted to ``type=%s``
  (matching the exact pre-refactor format ``type=%s channel=%s
  thread_ts=%s detail=%s``) so existing log parsers that key off
  ``type=`` keep working. Per @muddlebee's nit on PR Tracer-Cloud#952.
- DeliveryResponse field renamed ``error_type`` → ``exc_type``. Pythonic
  abbreviation for "exception type"; can't be ``type`` because that
  shadows the builtin. Field/log-key are intentionally distinct: the
  field is the Python attribute, the log key matches the legacy format.
- Drop ``Content-Type: application/json; charset=utf-8`` from
  ``_slack_bearer_headers`` and ``_discord_auth_headers``. ``httpx.post``
  already sets ``Content-Type: application/json`` automatically when the
  request uses the ``json=`` kwarg, so the explicit header was
  redundant. No behavioural change for Slack/Discord (UTF-8 is httpx's
  default encoding and neither provider parses the charset suffix).
- Fix CodeQL py/import-and-import-from alerts on tests Tracer-Cloud#503/Tracer-Cloud#504. The
  ``test_module_does_not_import_httpx`` regression test in each
  delivery test file was importing the module under test via both
  ``import X as mod`` and ``from X import Y`` styles. Switched to a
  single ``from app.utils import <module>`` style so the same module
  is no longer dual-imported.

Test count unchanged (3208 pass / 2 skipped / 1 xfail). Lint, format,
and ``mypy app/utils/`` clean.
muddlebee pushed a commit that referenced this pull request Apr 28, 2026
…#952)

* refactor(utils): extract shared HTTP-post helper for delivery modules

Closes #864.

``app/utils/slack_delivery.py``, ``app/utils/discord_delivery.py``, and
``app/utils/telegram_delivery.py`` each issued ``httpx.post`` directly,
applied a per-provider timeout, parsed the response body, and converted
exceptions to ``(False, error, ...)`` tuples. The transport pieces were
identical; only the success criteria, auth scheme, and error-message
extraction differed per provider.

Add ``app/utils/delivery_transport.py`` with ``post_json`` plus a
``DeliveryResponse`` dataclass that captures the shared transport
behavior: HTTP POST, timeout, optional ``follow_redirects``, exception
suppression, and JSON decoding with graceful fallback. The helper
deliberately does not decide provider-level success — callers inspect
``status_code`` / ``data`` per their own semantics.

Refactored callers:

- Slack: ``_call_reactions_api``, ``_post_direct``, ``_post_via_webapp``,
  ``_post_via_incoming_webhook`` — all four code paths now go through
  ``post_json``. ``send_slack_report`` orchestration unchanged.
- Discord: ``post_discord_message``, ``create_discord_thread`` — same
  Bearer/Bot header pattern factored into ``_discord_auth_headers``.
- Telegram: ``post_telegram_message`` — bot-token redaction in error
  messages preserved by re-running ``_redact_token`` over
  ``response.error``.

Provider-specific payload building, success criteria
(``data["ok"]`` for Slack, status codes for Discord/Telegram), and
error extraction all stay in the calling modules. Public function
signatures are unchanged.
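The split between shared transport and per-provider semantics can be illustrated with two tiny success predicates. These are hypothetical helpers (the real modules keep this logic inline in their existing functions); only the criteria themselves — ``data["ok"]`` for Slack, status codes for Discord/Telegram — come from the commit message.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def slack_succeeded(status_code: int | None, data: Mapping[str, Any]) -> bool:
    # Slack signals success in the response body, not the HTTP status:
    # a 200 with {"ok": false, "error": "invalid_auth"} is still a failure.
    return bool(data.get("ok"))


def discord_succeeded(status_code: int | None, data: Mapping[str, Any]) -> bool:
    # Discord/Telegram style: any 2xx status counts as delivered.
    return status_code is not None and 200 <= status_code < 300
```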

Tests:

- ``tests/utils/test_delivery_transport.py`` (new) — 15 tests covering
  happy path, every transport-failure shape (httpx.ConnectError,
  ReadTimeout, RequestError, OSError, RuntimeError), JSON-decode
  fallbacks (non-JSON body, JSON array, empty body), header / timeout /
  follow_redirects pass-through, and DeliveryResponse frozen-dataclass
  invariants.
- ``tests/utils/test_slack_delivery.py`` (new) — 24 tests covering all
  four Slack code paths, ``send_slack_report`` orchestration, and
  fallback chain ``direct → webapp``.
- ``tests/utils/test_discord_delivery.py`` (extended) — 3 new tests
  pinning the helper-delegation contract: module no longer imports
  ``httpx``, ``post_discord_message`` and ``create_discord_thread``
  both go through ``post_json``.
- ``tests/utils/test_telegram_delivery.py`` (extended) — 3 new tests
  pinning the same contract plus a regression test that the bot-token
  is redacted out of error strings even when the failure originates
  from the shared transport.
- All pre-existing tests in ``test_discord_delivery.py`` and
  ``test_telegram_delivery.py`` updated to patch
  ``app.utils.delivery_transport.httpx.post`` (the new transport
  boundary) instead of the per-module ``httpx.post`` import that no
  longer exists.
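The "patch at the transport boundary" pattern looks roughly like this. Since the real ``app`` package is not assumed to be importable here, a stand-in module built with ``types.ModuleType`` plays the role of ``app.utils.delivery_transport``; the patching mechanics are the point, not the names.

```python
import types
from unittest import mock

# Stand-in for app.utils.delivery_transport: its module-level httpx
# reference is the single seam the updated tests patch.
delivery_transport = types.ModuleType("delivery_transport")


def _real_post(*args, **kwargs):
    raise RuntimeError("network disabled in unit tests")


delivery_transport.httpx = types.SimpleNamespace(post=_real_post)


def fake_post(url, **kwargs):
    # Shaped like an httpx.Response as far as the callers care.
    return types.SimpleNamespace(status_code=200, json=lambda: {"ok": True})


with mock.patch.object(delivery_transport.httpx, "post", side_effect=fake_post):
    # Every delivery module now routes through this one attribute, so a
    # single patch covers Slack, Discord, and Telegram code paths alike.
    resp = delivery_transport.httpx.post(
        "https://slack.com/api/chat.postMessage", json={"text": "hi"}
    )
    assert resp.status_code == 200
```

Outside the ``with`` block the original attribute is restored, so any accidental real network call fails loudly.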

Verification:

- ``pytest tests/``: 3191 pass, 2 skipped, 1 xfailed, 0 failures.
- ``pytest tests/utils/``: 105 pass, 1 skipped (was 32 + 26 = 58 across
  delivery modules; now 105 with helper + Slack additions).
- ``ruff check app/ tests/`` and ``ruff format --check``: clean.
- ``mypy app/utils/``: clean on touched modules; the 5 reported errors
  are pre-existing missing-stub warnings in
  ``app/integrations/{mariadb,mysql,hf_remote}`` unrelated to this PR.

* ci: retrigger after anyio/mcp circular-import flake

The previous CI run hit a known transient ImportError in
anyio.lowlevel/mcp when opensre integrations list ran on a
pytest-xdist worker. Reproduces zero times locally on origin/main and
on this branch; pushing an empty commit to re-run with a fresh
interpreter pool.

* fix(utils): address greptile review on delivery transport refactor

- Wrap `DeliveryResponse.data` in `MappingProxyType` so the frozen
  dataclass is fully immutable end-to-end. Mutating the parsed body
  now raises `TypeError` instead of silently succeeding, and caller-
  passed dicts can no longer leak mutations into the response.
- Add `error_type` field on `DeliveryResponse` populated with
  `type(exc).__name__` on transport failures. `slack_delivery._post_direct`
  threads it back into the exception log so on-call can distinguish
  `TimeoutError` from `ConnectionError` at a glance — restores the
  pre-refactor log shape that #864 had dropped.
- Add `TestDelegatesToSharedTransport` to `tests/utils/test_slack_delivery.py`,
  mirroring the regression class on the discord/telegram test files.
  Pins that `slack_delivery` does not import `httpx` and that all four
  code paths (`_call_reactions_api`, `_post_direct`, `_post_via_webapp`,
  `_post_via_incoming_webhook`) route through `delivery_transport.post_json`.
- Add focused tests for the immutable-data, error_type, and slack
  exc_type-log behaviours (+17 tests; 3208 pass / 2 skipped / 1 xfail).

Tighten `_discord_error_from_data` signature to `Mapping[str, Any]` so
mypy stays clean against the new read-only `data` type.
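The immutability trick described above can be sketched as follows — a minimal sketch under the assumption that ``__post_init__`` does the wrapping (the commit message only says `MappingProxyType` is involved, not where):

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class DeliveryResponse:
    ok: bool
    data: Mapping[str, Any] = field(default_factory=dict)
    exc_type: str | None = None  # type(exc).__name__ on transport failures

    def __post_init__(self) -> None:
        # Copy, then wrap: the copy stops a caller-held dict from leaking
        # later mutations in; the proxy makes resp.data itself read-only.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(self, "data", MappingProxyType(dict(self.data)))
```

Declaring ``data`` as ``Mapping[str, Any]`` (rather than ``dict``) is also what forces downstream signatures like ``_discord_error_from_data`` onto the read-only type.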

* refactor(utils): address review nits — restore log key, rename field, drop redundant header, fix CodeQL

- slack_delivery `_post_direct` exception log key reverted to ``type=%s``
  (matching the exact pre-refactor format ``type=%s channel=%s
  thread_ts=%s detail=%s``) so existing log parsers that key off
  ``type=`` keep working. Per @muddlebee's nit on PR #952.
- DeliveryResponse field renamed ``error_type`` → ``exc_type``. Pythonic
  abbreviation for "exception type"; can't be ``type`` because that
  shadows the builtin. Field/log-key are intentionally distinct: the
  field is the Python attribute, the log key matches the legacy format.
- Drop ``Content-Type: application/json; charset=utf-8`` from
  ``_slack_bearer_headers`` and ``_discord_auth_headers``. ``httpx.post``
  already sets ``Content-Type: application/json`` automatically when the
  request uses the ``json=`` kwarg, so the explicit header was
  redundant. No behavioural change for Slack/Discord (UTF-8 is httpx's
  default encoding and neither provider parses the charset suffix).
- Fix CodeQL py/import-and-import-from alerts on tests #503/#504. The
  ``test_module_does_not_import_httpx`` regression test in each
  delivery test file was importing the module under test via both
  ``import X as mod`` and ``from X import Y`` styles. Switched to a
  single ``from app.utils import <module>`` style so the same module
  is no longer dual-imported.

Test count unchanged (3208 pass / 2 skipped / 1 xfail). Lint, format,
and ``mypy app/utils/`` clean.

* fix(slack): restore charset=utf-8 — Slack emits missing_charset warning without it

Live e2e probe against real ``slack.com/api/chat.postMessage`` revealed
that dropping the explicit ``Content-Type: application/json; charset=utf-8``
header (per the previous nit-fix commit) caused Slack to add a
``missing_charset`` entry to ``response_metadata.warnings`` on every
request. httpx alone sets only the bare ``application/json`` for ``json=``
kwargs, but Slack's docs explicitly recommend the charset suffix
(https://api.slack.com/web#posting_json) and emit the warning otherwise.

Restored the ``charset=utf-8`` header on ``_slack_bearer_headers`` with
an inline comment explaining the reason. ``_discord_auth_headers`` stays
auth-only (Discord does not emit this warning, and its docs do not
require the charset suffix).

Pinned via direct header assertions on the two existing slack tests that
already captured headers (``test_sends_correct_url_and_headers`` for the
reactions API, ``test_sends_thread_reply_payload`` for chat.postMessage),
so a future regression that re-drops the charset will fail loudly.

Verified live:
- direct ``chat.postMessage`` probe now returns ``warnings=[]`` (was
  ``['missing_charset']`` before this commit).
- 38/38 live checks against real Slack / Discord / Telegram pass.
- 3208 unit tests pass; lint / format / mypy clean.