[Spaces] Add fetch_space_logs + hf spaces logs command#4091
Wauplin merged 13 commits into huggingface:main
Conversation
Agents and scripts currently have no way to read Space build/run logs programmatically: the endpoint is only reachable via raw curl. This adds a public API to close that gap.

- `HfApi.fetch_space_logs(repo_id, *, build=False, follow=False)` yields log lines as `Iterable[str]`. `build=True` switches to container build logs; the default is the running app's stdout/stderr.
- `hf spaces logs <repo_id> [--build] [-f] [-n N]` mirrors the Python API at the CLI level, with 404/403 mapped to clean `CLIError` messages.

The helper trusts the "stream close = done" server contract (confirmed against moon-landing's `SpaceLogs.svelte` `onClose` handler) and does not poll `SpaceStage`; read timeout + bounded retries handle the misbehaving-upstream case. Structure mirrors `_fetch_running_job_sse` but without the status-check backstop.

Tests use the mock-based pattern from `hf jobs logs` (no new VCR cassettes).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
hf_raise_for_status() raises HfHubHTTPError (inherits HTTPError), not httpx.HTTPStatusError. The previous handler was dead code, causing 404/403 errors to fall through to the retry loop instead of raising immediately. Spotted by cursor bugbot on PR review. Note: the same bug exists in _fetch_running_job_sse — not fixed here to keep the diff focused, but worth a follow-up. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
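The dead-handler bug described in this commit comes down to class hierarchy: an `except` clause only matches the named type and its subclasses, so a sibling class slips past it. The sketch below illustrates this with stand-in classes. All names here are hypothetical and only mirror the relationship described in the commit message; they are not the real `httpx` or `huggingface_hub` classes.

```python
class HTTPError(Exception):
    """Stand-in for the base HTTP error type."""


class HTTPStatusError(HTTPError):
    """Stand-in for the status error the old handler tried to catch."""


class HubHTTPError(HTTPError):
    """Stand-in for the hub error: a sibling of HTTPStatusError, not a subclass."""


def raise_for_status(status_code: int) -> None:
    """Hypothetical helper: raise HubHTTPError for any 4xx/5xx status code."""
    if status_code >= 400:
        raise HubHTTPError(f"HTTP {status_code}")


def classify_old(status_code: int) -> str:
    """Mimic the old handler order and report which arm handled the error."""
    try:
        raise_for_status(status_code)
    except HTTPStatusError:
        return "fail fast"  # dead code: HubHTTPError is not an HTTPStatusError
    except HTTPError:
        return "retry"  # 404/403 end up here and get retried
    return "ok"


def classify_fixed(status_code: int) -> str:
    """Catch the error type that is actually raised, before the broad arm."""
    try:
        raise_for_status(status_code)
    except HubHTTPError:
        return "fail fast"
    except HTTPError:
        return "retry"
    return "ok"
```

With these stand-ins, `classify_old(404)` takes the retry arm while `classify_fixed(404)` fails fast, which matches the behavior change the commit describes.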
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 1f884f8.
Codecov Report

❌ Patch coverage is

Coverage diff (main vs. #4091):

| | main | #4091 | +/- |
|---|---|---|---|
| Coverage | 75.00% | 77.13% | +2.13% |
| Files | 145 | 167 | +22 |
| Lines | 13978 | 18884 | +4906 |
| Hits | 10484 | 14566 | +4082 |
| Misses | 3494 | 4318 | +824 |
Re: SSE helper duplication, I explored this. We considered a shared helper, but the jobs version has a status-check backstop we deliberately omit, and unifying would mean touching working code. Happy to refactor if a shared helper is wanted, but for now the duplication felt like the safer choice.
Wauplin left a comment:
Thanks for adding this @davanstrien! It has been on my todo list for a long time, but I never took the time to address it ^^ (see #2667)
Co-authored-by: Lucain <[email protected]>
Extracts `HfApi._stream_sse_events` to unify the retry/backoff/dedup loop previously duplicated across `_fetch_space_logs_sse` and `_fetch_running_job_sse`. Addresses Wauplin and Cursor Bugbot review comments on PR huggingface#4091.

Also fixes a dead `except httpx.HTTPStatusError` handler that affected both Spaces and Jobs: `hf_raise_for_status` raises `HfHubHTTPError` (subclass of `httpx.HTTPError`, not `HTTPStatusError`), so 404/403 in follow mode used to fall through to the broad retry arm and stall for ~25s. The new helper catches `HfHubHTTPError` before the broad arm, so permanent errors fail fast. Live-verified: `hf spaces logs missing/x -f` now errors in <1s instead of ~25s.

CLI cleanups on `hf spaces logs` per Wauplin:
- Switch from `print()` to `out.text(line.strip())` (new mode-aware printer from huggingface#3979).
- Drop the redundant local `HfHubHTTPError` block; it's already handled by the global CLI error mapper.

Also tightens `_fetch_running_job_sse` typing by splitting the legacy `double_check_job_has_finished_on_status_code_or_error` mixed tuple into `tolerated_status_codes: tuple[int, ...]` and `tolerated_exception_types: tuple[type[Exception], ...]`, eliminating a runtime type-discrimination step.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
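The following is a rough sketch of what a shared retry/backoff/dedup loop of this kind could look like. It is not the code of `HfApi._stream_sse_events`. The names `stream_with_retries`, `PermanentError`, and `open_stream` are hypothetical, and the dedup strategy (the server replays the stream from the start on reconnect, so already-delivered lines are skipped by position) is an assumption made for illustration only.

```python
import time
from typing import Callable, Iterable, Iterator


class PermanentError(Exception):
    """Hypothetical stand-in for an error that must not be retried (e.g. 404/403)."""


def stream_with_retries(
    open_stream: Callable[[], Iterable[str]],
    max_retries: int = 5,
    base_delay: float = 0.0,
) -> Iterator[str]:
    """Yield lines from open_stream(), reconnecting on transient failures.

    Assumption: each reconnect replays the stream from its first line, so
    lines already delivered to the caller are skipped by position.
    """
    delivered = 0  # number of lines already yielded to the caller
    retries = 0
    while True:
        try:
            for index, line in enumerate(open_stream()):
                if index < delivered:
                    continue  # replayed line from an earlier connection
                delivered += 1
                yield line
            return  # the stream closed cleanly, which is treated as "done"
        except PermanentError:
            raise  # checked before the broad arm so permanent errors fail fast
        except Exception:
            retries += 1
            if retries > max_retries:
                raise  # bounded retries: give up instead of looping forever
            time.sleep(base_delay * 2 ** (retries - 1))  # exponential backoff
```

The order of the `except` arms carries the design point from the commit message: the permanent-error arm has to come before the broad one, otherwise a 404 would be retried until the cap is reached.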
The entry in `tolerated_exception_types` was never consulted: the SSE helper's `is_no_new_line_timeout` check short-circuits the tolerated tuple lookup for any ReadTimeout, so the tuple entry was dead code (pre-existing on `main` before huggingface#4091, preserved faithfully through the refactor). ReadTimeout tolerance continues to work via the `is_no_new_line_timeout` path. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Pushed changes addressing the review:
Wauplin left a comment:
Looks good! Thanks for the refactor.
Last comments before getting it merged 🤗
Note: if you are using a 'cpu-basic' hardware, you cannot configure a custom sleep time. Your Space will automatically be paused after 48h of inactivity.

**5b. Debug a failing Space by reading its logs**
    if tail is not None:
        logs = deque(logs, maxlen=tail)
    for line in logs:
        out.text(line.strip())

Suggested change:

    - out.text(line.strip())
    + all_lines = []
    + found_logs = False
    + for line in logs:
    +     clean_line = line.strip()
    +     out.text(clean_line)
    +     if clean_line:
    +         found_logs = True
    + if not found_logs and not build:
    +     out.hint(f"No run logs found for space {space_id}. Try passing --build to fetch build logs instead.")
Suggestion: add a hint if no logs returned? should play well with agents
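To make the suggested behavior easier to check in isolation, here is a self-contained sketch of the tail-plus-hint logic. The function `render_logs` is hypothetical and returns the lines it would print instead of calling the CLI's `out` printer, so it can run without the CLI. The hint wording is copied from the suggestion above.

```python
from collections import deque
from typing import Iterable, Optional


def render_logs(
    logs: Iterable[str],
    *,
    tail: Optional[int] = None,
    build: bool = False,
    space_id: str = "user/space",  # placeholder repo id for illustration
) -> list[str]:
    """Return the lines that would be printed, plus a hint when run logs are empty."""
    output: list[str] = []
    if tail is not None:
        # deque(maxlen=n) keeps only the last n items, but it has to consume
        # the whole iterable before anything can be printed.
        logs = deque(logs, maxlen=tail)
    found_logs = False
    for line in logs:
        clean_line = line.strip()
        output.append(clean_line)
        if clean_line:
            found_logs = True
    if not found_logs and not build:
        output.append(
            f"Hint: No run logs found for space {space_id}. "
            "Try passing --build to fetch build logs instead."
        )
    return output
```

The hint is only added for run logs: when `build=True` and nothing comes back, there is no alternative log type left to point to.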
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain <[email protected]>
Co-authored-by: Lucain <[email protected]>
# Conflicts: # docs/source/en/package_reference/cli.md
- Promote "Debug a failing Space" heading to ### (matches PR huggingface#4108 style)
- Add hint when run logs are empty, suggesting --build as alternative
- Add tests covering the empty-logs hint for both run and build modes
- Regenerate CLI reference docs to include hf spaces logs command
Wauplin left a comment:
Thank you! All good for me once the comment is merged :)
This PR has been shipped as part of the v1.11.0 release.
Summary

WIP and happy to discuss if it makes sense!

Using agents with Spaces, I found it helpful to give them access to logs, and currently this isn't exposed in the CLI. An alternative design would be to just return the URL for logs from methods that create/modify Spaces, but this might be better for actual scripting, etc.

- `HfApi.fetch_space_logs(repo_id, *, build=False, follow=False)`: programmatic access to Space build/run logs via the SSE endpoint `/api/spaces/{repo_id}/logs/{run|build}`
- `hf spaces logs <repo_id> [--build] [-f/--follow] [-n/--tail N]` CLI command

Motivation
Agents and scripts that manage Spaces (restart, set volumes, push code) currently have no way to read why a Space failed without knowing the raw endpoint URL and crafting a curl request.

`get_space_runtime()` surfaces the stage (`BUILD_ERROR`, `RUNTIME_ERROR`) but not the actual error, which lives behind `/api/spaces/{repo_id}/logs/{build|run}`.

By wrapping this as a first-class method, agents can autonomously check logs when something goes wrong, with no human nudge needed to provide a URL or paste output. The pattern already exists for Jobs (`fetch_job_logs` + `hf jobs logs`); this closes the equivalent gap for Spaces.

API shape
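A minimal sketch of how an agent might use this shape: read the stage from `get_space_runtime()`, then pick build or run logs through the `build` toggle. `FakeApi`, `FakeRuntime`, and `logs_for_failure` are hypothetical names introduced so the example runs offline. Comparing the stage against the plain string `"BUILD_ERROR"` is an assumption about how the stage value behaves.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for the object returned by get_space_runtime()."""

    stage: str


class FakeApi:
    """Hypothetical offline stand-in exposing the two methods the sketch needs."""

    def __init__(self, stage: str, run_logs: list[str], build_logs: list[str]) -> None:
        self._stage = stage
        self._run_logs = run_logs
        self._build_logs = build_logs

    def get_space_runtime(self, repo_id: str) -> FakeRuntime:
        return FakeRuntime(stage=self._stage)

    def fetch_space_logs(
        self, repo_id: str, *, build: bool = False, follow: bool = False
    ) -> Iterator[str]:
        # Same keyword-only shape as the method proposed in this PR.
        return iter(self._build_logs if build else self._run_logs)


def logs_for_failure(api, repo_id: str) -> list[str]:
    """Fetch the logs most likely to explain why a Space failed."""
    stage = api.get_space_runtime(repo_id).stage
    # Assumption: a build failure is best explained by the build logs;
    # any other stage falls back to the default run logs.
    use_build_logs = stage == "BUILD_ERROR"
    return [line.rstrip("\n") for line in api.fetch_space_logs(repo_id, build=use_build_logs)]
```

With a release that ships this method, passing `HfApi()` instead of `FakeApi(...)` should presumably work the same way, since `logs_for_failure` only relies on the two calls named in this description.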
Design decisions (all open for discussion)

- `build=True` boolean flag vs `log_type="build"` enum. We tested this by prompting an independent agent with no knowledge of the implementation to write the CLI/Python calls they'd intuitively expect. They converged on `--build` as a boolean toggle (like `kubectl logs --previous`) rather than `--type build` as an enum. Rationale: shorter, no string literal to remember, honours the asymmetry ("logs" = run logs by default, build is the special case).
- No `SpaceStage` polling in the helper. The helper trusts the server's "stream close = done" contract and does not poll `get_space_runtime()` as a backstop. Upstream misbehavior (observed on one RUNTIME_ERROR space where the server held the socket open with zero bytes) is bounded by read timeout + retry cap. This avoids coupling to the `SpaceStage` enum, which is currently incomplete (the server returns `SLEEPING` but the Python enum doesn't have it).
- Single method, not separate `fetch_space_logs` + `fetch_space_build_logs`. Since both log endpoints share `Iterable[str]` and only differ by a URL segment, a single method with a boolean toggle felt like the right granularity, but I'm open to splitting if preferred.

Test plan

- `make style` + `make quality` clean (2 pre-existing `ty` errors in `cli/_output.py`, not introduced here)
- `test_cli.py`: 228/228 passing

Note
Medium Risk

Adds a new SSE-based log streaming API and CLI surface, and refactors Jobs SSE streaming to reuse the same retry/dedup loop, which could affect log/metrics streaming behavior under timeouts/retries.

Overview

Adds programmatic Space log access via `HfApi.fetch_space_logs(repo_id, build=..., follow=...)`, streaming from the Hub's SSE endpoints for both run and build logs.

Introduces `hf spaces logs` with `--build`, `--follow/-f`, and `--tail/-n` (including validation of incompatible flags), plus a small shared SSE streaming helper in `hf_api.py` that also replaces the bespoke Jobs SSE loop. Documentation and CLI reference are updated, and new CLI tests cover the new command behavior.

Reviewed by Cursor Bugbot for commit c7625a5. Bugbot is set up for automated code reviews on this repo.