[BUG FIX]: hf_hub_download crashes when stderr lacks a real file descriptor#4065
Conversation
f08d2e6 to
6eff9fb
Compare
6f4b2a4 to
43399d3
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 59aa035. Configure here.
98872dc to
cf1615c
Compare
Add _SafeTqdm fallback with thread lock for environments where stderr.fileno() is invalid (Textual, Jupyter, pytest, WSGI, etc.). Wrap progress bar construction in try/except to degrade gracefully instead of crashing the download.
cf1615c to
037f4ad
Compare
|
Friendly bump on this, happy to address any feedback if needed. Let me know if there's anything blocking review. |
|
(generated by Claude) Thanks for digging into this @tobocop2! I reproduced the crash and it's a real bug but I'd like to push back a bit on both the framing and the size of the fix before merging. On the affected-environments listThe PR description says this affects Textual, Jupyter, pytest capture, gunicorn/uvicorn, sandboxed runtimes, and IDE terminals. In practice that list is largely overstated. The relevant code path in CPython is: # multiprocessing/resource_tracker.py (ensure_running)
fds_to_pass = []
try:
fds_to_pass.append(sys.stderr.fileno())
except Exception:
passAnything that raises from
Plus two compounding preconditions: the multiprocessing start method has to be So in practice the realistic trigger is Textual (and any custom wrapper that returns On the fixThe current patch adds a # in utils/tqdm.py, near the tqdm subclass definition
import threading
tqdm.set_lock(threading.RLock()) # avoid the mp-lock fork_exec path entirelyThat would:
If you'd rather keep the reactive shape, then the Happy to be wrong about the proactive approach if there's a multiprocessing scenario I'm forgetting — but either way I think we can land this in ~5 lines instead of ~30. WDYT? |
|
Hey @tobocop2 , the comment above has been AI-generated after some investigation from both me and a local Claude, so I agree with what's stated above. I also confirm that I reproduced the bug in Textual but not in other cases you've mentioned (explaining why we haven't got any reports so far). About the solution I'd prefer a smaller one even if it means poorer UX in rare scenarios. |
Replace the _SafeTqdm class and the two try/except (OSError, ValueError) guards in _create_progress_bar with a single tqdm.set_lock(threading.RLock()) call at module import. This proactively short-circuits tqdm.get_lock()'s lazy TqdmDefaultWriteLock construction, so the multiprocessing create_mp_lock path that calls fork_exec is never taken on the HF subclass. The fork_exec path is what fails when sys.stderr.fileno() returns -1 (realistically: Textual TUIs and any custom wrapper that explicitly returns -1 from fileno()). CPython's multiprocessing/resource_tracker silently swallows fileno() exceptions, so StringIO-backed streams (Jupyter, pytest capture, etc.) are not affected. The proactive lock sacrifices inter-process tqdm bar coordination on the HF subclass, which is not a supported use case. Drops the now-obsolete _SafeTqdm class, the warnings.warn fallback, and the two unit tests that exercised the try/except fallback path. Keeps the end-to-end regression test in test_file_download.py.
3ab64b5 to
fc92298
Compare
thanks for the review @Wauplin I've just reflected your feedback and updated the PR and description. |
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Wauplin
left a comment
There was a problem hiding this comment.
Thanks for making the change @tobocop2 👍
Learnt a few things along the way :) Feels like we are fixing an issue mostly due to how Textual handles no file descriptors (with a -1 instead of a proper UnsupportedOperation) + how CPython do not check for invalid fileno values here. Wonder if this should live in tqdm directly instead but I'm fine with having the fix here since it's small.
Note that if we notice any drawback in the future we might rollback to another solution (e.g. by explicitly calling sys.stderr.fileno() in HF custom tqdm to disable it if running inside Textual?)
|
greatly appreciate you @Wauplin This means a lot and thanks again |
|
This PR has been shipped as part of the v1.13.0 release. |

Problem
hf_hub_downloadcrashes withValueError: bad value(s) in fds_to_keepwhensys.stderr.fileno()returns-1. The crash happens during tqdm's progress bar setup, before the download itself runs.In practice this hits Textual TUI apps (Textual's
app.pyreturns-1fromfileno()) and any custom stderr wrapper that does the same. CPython'smultiprocessing/resource_trackersilently swallows exceptions fromfileno(), so streams likeio.StringIO, Jupyter'sOutStream, and pytest'sCaptureIO— which all raise onfileno()— are not affected. Two more preconditions are needed for the crash: the multiprocessing start method must bespawnorforkserver, and the resource tracker must not have been started already.Root cause
The first time tqdm's default lock is touched, it lazily constructs
TqdmDefaultWriteLock, which callscls.create_mp_lock(). That spawns the multiprocessing resource tracker viafork_exec, which rejectsstderr.fileno() == -1.Why this fix
Setting a thread-only
RLockon the HF tqdm subclass at module import short-circuits the lazy init:tqdm.get_lock()finds_lockalready set and skips thecreate_mp_lockpath entirely. The patch is one line plus a comment.The trade-off is no inter-process tqdm bar coordination on the HF subclass — multiple separate Python processes printing tqdm bars to the same TTY may visually interleave. Downloads themselves are unaffected, and
huggingface_hubparallelizes via threads inside one process (not subprocesses), so this is not a scenario the library targets.Crash demo
Fixes #4066
Note
Medium Risk
Changes global progress-bar locking behavior by overriding
tqdm's default multiprocessing lock at import time, which could affect multi-process progress bar coordination. Download logic is unchanged, and risk is largely limited to progress display behavior.Overview
Prevents
hf_hub_downloadfrom crashing whensys.stderr.fileno()returns-1(e.g. Textual TUIs) by setting a thread-onlyRLockviatqdm.set_lock(...), avoiding tqdm’s lazy multiprocessing lock/resource-tracker initialization.Adds a regression test that patches
sys.stderrwith a badfileno()implementation and verifies the download completes successfully.Reviewed by Cursor Bugbot for commit 16d5a58. Bugbot is set up for automated code reviews on this repo. Configure here.