Show actionable error when HuggingFace dataset access fails (fixes #59, #61) #74
lonexreb wants to merge 2 commits into NVlabs:main
Upstream physical_ai_av.utils.hf_interface.download_file raises a bare IndexError when HfApi.get_paths_info returns an empty list, typically because the user has not authenticated with HuggingFace or has not been granted access to the gated nvidia/PhysicalAI-Autonomous-Vehicles dataset. The resulting traceback is hard to act on.

Wrap the PhysicalAIAVDatasetInterface() call in load_physical_aiavdataset and reraise as RuntimeError pointing at the access request page, the hf auth login command, and README §3.

Fixes NVlabs#59, NVlabs#61.

Signed-off-by: lonexreb <[email protected]>
Hi, thanks for creating this PR. I agree that hitting an IndexError deep in the code when the root cause is HF access is not ideal. On the other hand, catching every IndexError and re-raising it as an HF access issue also seems potentially misleading. What do you think about adding an explicit HF access check and raising the exception you wrote if that check fails?
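To make that concern concrete, here is a minimal, self-contained sketch of how a broad `except IndexError` can mislabel an unrelated bug. Everything in it is hypothetical: `load_dataset_interface`, `buggy_init`, and the message text are illustrative stand-ins, not the PR's actual code.

```python
def load_dataset_interface(init):
    """Hypothetical stand-in for the wrapped call site.

    Any IndexError raised by `init` is re-raised as an HF access error.
    """
    try:
        return init()
    except IndexError as e:
        raise RuntimeError(
            "Could not access the dataset; check HuggingFace auth and access."
        ) from e


def buggy_init():
    """Hypothetical initializer that fails for a reason unrelated to HF access."""
    camera_ids = []
    return camera_ids[0]  # plain indexing bug, raises IndexError


def describe_failure(init):
    """Return (message, original exception type name) for a failing init."""
    try:
        load_dataset_interface(init)
    except RuntimeError as err:
        return str(err), type(err.__cause__).__name__
    return None


message, cause = describe_failure(buggy_init)
print(message)  # the user is told to check HF auth
print(cause)    # the real cause is only visible through __cause__
```

The indexing bug has nothing to do with authentication, yet the user-facing message blames HF access. The true cause survives only in the chained exception.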
Per @super-anova review on NVlabs#74: catching IndexError broadly risks misattributing unrelated failures as HF auth issues. Switch to an explicit access check up front using HfApi.repo_info(). Only the two HF-specific exceptions (GatedRepoError, RepositoryNotFoundError) are caught, and only those raise the helpful RuntimeError. Other failure modes (network errors, transient HF outages, real bugs) propagate unchanged.

Behavior:
- User without HF auth or without granted dataset access: clear RuntimeError pointing to the access page and hf auth login.
- User with valid auth + access: one extra metadata request to HF, then instantiation proceeds normally.
- Other errors: surfaced as-is, not masked.

Signed-off-by: lonexreb <[email protected]>
@super-anova thanks — that's a much better signal-to-noise ratio. Pushed
Cost is one extra metadata request to HF on first init. Sample diff:

```python
import physical_ai_av
from huggingface_hub import HfApi
from huggingface_hub.errors import GatedRepoError, RepositoryNotFoundError

# Probe dataset access up front; only HF access errors are translated.
try:
    HfApi().repo_info(
        repo_id="nvidia/PhysicalAI-Autonomous-Vehicles",
        repo_type="dataset",
    )
except (GatedRepoError, RepositoryNotFoundError) as e:
    raise RuntimeError("...same actionable message...") from e

avdi = physical_ai_av.PhysicalAIAVDatasetInterface()
```

Let me know if you'd like the access check factored into a helper (e.g.
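One possible shape for such a helper is sketched below. It is a sketch under assumptions, not the PR's code: `ensure_dataset_access`, `probe`, `access_errors`, and `ACCESS_HELP` are hypothetical names. A stand-in exception class takes the place of `GatedRepoError` and `RepositoryNotFoundError` so the example runs without network access or `huggingface_hub` installed.

```python
from typing import Callable, Tuple, Type

# Placeholder text, not the PR's actual message.
ACCESS_HELP = (
    "Cannot access the gated dataset. Request access on its HuggingFace page, "
    "then run `hf auth login`."
)


def ensure_dataset_access(
    probe: Callable[[], object],
    access_errors: Tuple[Type[BaseException], ...],
) -> None:
    """Run `probe`; convert only `access_errors` into an actionable RuntimeError."""
    try:
        probe()
    except access_errors as e:
        raise RuntimeError(ACCESS_HELP) from e


class FakeGatedRepoError(Exception):
    """Stand-in for huggingface_hub.errors.GatedRepoError (assumption)."""


def granted() -> None:
    """Probe that succeeds, as for a user with valid auth and access."""


def denied() -> None:
    """Probe that fails the way a gated repository would."""
    raise FakeGatedRepoError("access not granted")


def offline() -> None:
    """Probe that fails for a reason unrelated to access."""
    raise ConnectionError("network unreachable")


def outcome(probe: Callable[[], object]) -> str:
    """Classify what the helper does for a given probe."""
    try:
        ensure_dataset_access(probe, (FakeGatedRepoError,))
    except RuntimeError as err:
        return f"actionable ({type(err.__cause__).__name__})"
    except Exception as err:
        return f"propagated ({type(err).__name__})"
    return "ok"


print(outcome(granted))  # ok
print(outcome(denied))   # actionable (FakeGatedRepoError)
print(outcome(offline))  # propagated (ConnectionError)
```

At the real call site, `probe` would presumably wrap the `HfApi().repo_info(...)` call from the sample diff, with `access_errors=(GatedRepoError, RepositoryNotFoundError)`. Passing both in as arguments keeps the helper testable offline.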
Summary

When the upstream `physical_ai_av` package fails to fetch dataset metadata from HuggingFace, it raises a bare `IndexError: list index out of range` from deep inside `physical_ai_av.utils.hf_interface.download_file`. The traceback gives the user no clue about the actual cause (missing HF authentication, or no access granted to the gated dataset).

This PR catches that `IndexError` at the `PhysicalAIAVDatasetInterface()` initialization call site in `load_physical_aiavdataset()` and reraises a `RuntimeError` with an actionable message that points to:

- the dataset access request page
- the `hf auth login` command
- README §3

The original exception is preserved via `raise ... from e` so the upstream traceback is still available for debugging.

Before

After

Why catch only `IndexError`?

This is the specific, observed failure mode reported in #59 and #61 with a known cause (empty `get_paths_info` response from gated/unauthenticated access). Other exception types (network errors, transient HF outages, etc.) propagate unchanged so they are not misattributed to authentication.

Test plan

- […] (`ast.parse`) passes.
- […] `avdi`, or auth is configured) is unchanged; the `try` block only wraps the lazy default initialization.
- […] `__cause__` (`raise ... from e`) so the upstream traceback is still inspectable.
- […] `test_inference.py` with `HF_TOKEN` unset and confirming the new message appears.

Related issues