[https://nvbugs/5961414][fix] Pre-cache aesthetic predictor weights to avoid VBench 429 errors #12127
…o avoid VBench 429 errors
VBench's aesthetic_quality dimension downloads sa_0_4_vit_l_14_linear.pth from GitHub via wget at evaluation time. In CI environments GitHub often returns HTTP 429 (Too Many Requests), causing the test to fail.
Pre-download the file to ~/.cache/emb_reader/ with retries and exponential backoff before VBench evaluation runs, following the same pattern used for the DINO torch.hub pre-cache. Also unwaive the 3 VBench visual_gen tests that were blocked by this issue.
Signed-off-by: Chang Liu <[email protected]>
📝 Walkthrough
Added prefetching logic for the LAION aesthetic predictor weights, with an exponential-backoff retry mechanism to avoid GitHub rate limits. Removed the test skip waivers for visual generation and performance evaluation tests that are now expected to pass.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/examples/test_visual_gen.py`:
- Around line 178-192: Change the broad except around urllib.request.urlretrieve
to catch specific exceptions (urllib.error.URLError and OSError) and protect
against partial downloads by writing to a temporary path (e.g., cached_path +
".tmp") and only renaming/moving the temp file to cached_path on successful
completion; ensure the temp file is removed on failure before retrying and that
the retry/backoff logic using attempt and max_retries remains the same so
corrupted partial files are never left at cached_path.
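A minimal sketch of the pattern this comment asks for, not the code that landed in the PR. The function name `download_atomically`, the `base_delay` value, and the injectable `fetch` and `sleep` parameters are assumptions added for illustration; only `cached_path`, `attempt`, `max_retries`, and the `.tmp` suffix come from the comment itself.

```python
import os
import time
import urllib.error
import urllib.request


def download_atomically(url, cached_path, max_retries=3, base_delay=1.0,
                        fetch=urllib.request.urlretrieve, sleep=time.sleep):
    """Download url to cached_path without ever leaving a partial file there.

    Hypothetical helper: `fetch` and `sleep` are injectable so the retry
    logic can be exercised without network access or real waiting.
    """
    if os.path.exists(cached_path):
        return cached_path  # already cached, nothing to do

    os.makedirs(os.path.dirname(cached_path) or ".", exist_ok=True)
    tmp_path = cached_path + ".tmp"

    for attempt in range(1, max_retries + 1):
        try:
            fetch(url, tmp_path)
            # os.replace is atomic when both paths are on the same filesystem,
            # so readers see either no file or the complete file.
            os.replace(tmp_path, cached_path)
            return cached_path
        except (urllib.error.URLError, OSError):
            # Drop whatever was written before the failure, then retry.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
```

`urllib.error.HTTPError` (which is what a 429 response raises) is a subclass of `URLError`, so it is covered by the same `except` clause.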
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ce95000a-8987-4c50-81b6-009f65106071
📒 Files selected for processing (2)
- tests/integration/defs/examples/test_visual_gen.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38663 [ run ] triggered by Bot. Commit: |
|
PR_Github #38663 [ run ] completed with state
|
Signed-off-by: Chang Liu <[email protected]>
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38894 [ run ] triggered by Bot. Commit: |
|
PR_Github #38894 [ run ] completed with state
|
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38899 [ run ] triggered by Bot. Commit: |
|
PR_Github #38899 [ run ] completed with state
|
Use raw.githubusercontent.com CDN URL instead of GitHub blob redirect, add User-Agent header, increase retries to 8 with longer exponential backoff (10-120s + jitter), and use atomic file writes to prevent corruption from partial downloads. Signed-off-by: Chang Liu <[email protected]> Signed-off-by: Chang Liu <[email protected]>
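For reference, a rough sketch of what a capped exponential backoff with jitter and a custom User-Agent could look like. The 10 s base and 120 s cap mirror the range quoted in the commit message; the jitter magnitude (`max_jitter`), the User-Agent string, and the helper names `backoff_delay` and `build_request` are assumptions, not taken from the PR.

```python
import random
import urllib.request


def backoff_delay(attempt, base=10.0, cap=120.0, max_jitter=5.0,
                  rng=random.random):
    """Seconds to wait before retry number `attempt` (1-based).

    The delay doubles from `base` and is clamped to `cap`; a random jitter
    in [0, max_jitter) is added so parallel CI jobs do not retry in lockstep.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + rng() * max_jitter


def build_request(url, user_agent="precache-sketch/0.1"):
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

With 8 attempts there are at most 7 waits; ignoring jitter, this schedule is 10, 20, 40, 80, 120, 120, 120 seconds.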
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38931 [ run ] triggered by Bot. Commit: |
|
PR_Github #38931 [ run ] completed with state |
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38933 [ run ] triggered by Bot. Commit: |
|
PR_Github #38933 [ run ] completed with state |
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38936 [ run ] triggered by Bot. Commit: |
|
PR_Github #38936 [ run ] completed with state |
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #38938 [ run ] triggered by Bot. Commit: |
|
PR_Github #38938 [ run ] completed with state |
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #39328 [ run ] triggered by Bot. Commit: |
|
PR_Github #39328 [ run ] completed with state
|
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #39383 [ run ] triggered by Bot. Commit: |
|
PR_Github #39383 [ run ] completed with state
|
MediaStorage.save_video() requires the ffmpeg CLI to encode MP4 output. Install it via apt-get in the _visual_gen_deps test fixture so the VBench dimension score tests can complete successfully in CI. Signed-off-by: Chang Liu <[email protected]>
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #39503 [ run ] triggered by Bot. Commit: |
|
PR_Github #39503 [ run ] completed with state
|
apt-get install without a prior update returns exit code 100 (package not found) in CI containers with stale package lists. Signed-off-by: Chang Liu <[email protected]>
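A hedged sketch of the install order described in the last two commits, assuming a Debian-based CI container running as root. The helper names `ffmpeg_install_commands` and `ensure_ffmpeg` are hypothetical and are not the `_visual_gen_deps` fixture itself.

```python
import shutil
import subprocess


def ffmpeg_install_commands(have_ffmpeg):
    """Return the apt-get commands needed, in order, to get the ffmpeg CLI."""
    if have_ffmpeg:
        return []
    return [
        # Refresh the package lists first; installing against stale lists
        # is what produced exit code 100 in CI.
        ["apt-get", "update"],
        ["apt-get", "install", "-y", "ffmpeg"],
    ]


def ensure_ffmpeg(run=subprocess.check_call, which=shutil.which):
    """Install ffmpeg only if it is not already on PATH."""
    for cmd in ffmpeg_install_commands(which("ffmpeg") is not None):
        run(cmd)
```

Checking `shutil.which` first keeps the fixture a no-op on machines that already ship ffmpeg.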
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
…ench Resolve waives.txt conflict: keep both TestNemotronNanoV3 waive and the VBench test waives from main (added as temporary workaround while the 429 fix was in progress). Signed-off-by: Chang Liu <[email protected]>
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #39539 [ run ] triggered by Bot. Commit: |
|
PR_Github #39539 [ run ] completed with state |
Remove the 5 test_vbench_dimension_score_* waives (NVBug 5961414) since this PR fixes the underlying 429 rate-limit and ffmpeg issues that caused them to fail. Signed-off-by: Chang Liu <[email protected]>
|
/bot run --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
This waive was accidentally added during merge conflict resolution and does not belong in this PR. Signed-off-by: Chang Liu <[email protected]>
|
/bot run --disable-fail-fast --extra-stage "DGX_B200-4_GPUs-PyTorch-Post-Merge-1, DGX_B200-4_GPUs-PyTorch-Post-Merge-2" |
|
PR_Github #39654 [ run ] triggered by Bot. Commit: |
|
PR_Github #39658 [ run ] triggered by Bot. Commit: |
|
PR_Github #39658 [ run ] completed with state |
|
Also, @yibinl-nvidia , it seems in your PR #12009, the subpath |
Thanks Chang for flagging this, should be fixed by #12463 |
…o avoid VBench 429 errors (NVIDIA#12127) Signed-off-by: Chang Liu <[email protected]> Signed-off-by: Chang Liu <[email protected]> Signed-off-by: Shreyas Misra <[email protected]> Co-authored-by: Shreyas Misra <[email protected]>
Summary
- Pre-download the LAION aesthetic predictor weights (`sa_0_4_vit_l_14_linear.pth`) to `~/.cache/emb_reader/` with retries and exponential backoff before VBench evaluation, preventing GitHub HTTP 429 (Too Many Requests) failures in CI
- Follows the same pattern as the DINO torch.hub pre-cache (`_precache_dino_for_torch_hub`)
Test plan
- Run `pytest tests/integration/defs/examples/test_visual_gen.py -v -k "test_vbench_dimension_score"` on a B200 node with model weights available
- Check that `~/.cache/emb_reader/sa_0_4_vit_l_14_linear.pth` exists after first run
🤖 Generated with Claude Code
Summary by CodeRabbit