@lancelly
Collaborator

@lancelly lancelly commented Oct 6, 2025

We suspected that the OOM in one of these tests was caused by a cluster misconfiguration. Now that the misconfiguration is fixed, this PR re-enables the related tests to check whether the OOM is resolved.

Summary by CodeRabbit

  • Tests
    • Re-enabled a previously skipped integration test, improving overall test coverage and CI reliability.
    • Restores automated verification for an important performance path, helping catch regressions earlier and provide faster release feedback.
    • No changes to product features, performance, or user experience.

Signed-off-by: Lanyu Liao <[email protected]>
@coderabbitai
Contributor

coderabbitai bot commented Oct 6, 2025

📝 Walkthrough

A single change was made to tests/integration/test_lists/waives.txt, removing one skip entry to unskip a specific test case. No other files or lines were modified.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Test waiver list update: tests/integration/test_lists/waives.txt | Removed skip entry for accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::test_nvfp4_multi_gpus[latency_trtllmgen], unskipping the test. |
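
For illustration only, the kind of edit described above (dropping one entry from a waiver list) can be sketched in Python. This is not the repository's tooling. The function name `unwaive`, the entry layout `<test id> SKIP (<reason>)`, and the sample lines are all assumptions made for this sketch. Only the test id itself comes from the change summary.

```python
from typing import Iterable, List

# Test id taken from the change summary above.
TEST_ID = (
    "accuracy/test_llm_api_pytorch.py::TestDeepSeekR1::"
    "test_nvfp4_multi_gpus[latency_trtllmgen]"
)


def unwaive(lines: Iterable[str], test_id: str) -> List[str]:
    """Return the waiver lines with any entry for test_id dropped.

    Assumed (hypothetical) entry layout: "<test id> SKIP (<reason>)".
    An entry matches when the line is exactly the test id, or the test id
    followed by a space, so ids that merely share a prefix are kept.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped == test_id or stripped.startswith(test_id + " "):
            continue  # this is the waiver being removed
        kept.append(line)
    return kept


# Made-up sample content; the real file's entries and reasons are not shown here.
sample_waives = [
    TEST_ID + " SKIP (hypothetical reason)",
    "examples/test_hypothetical.py::test_other SKIP (hypothetical reason)",
]

remaining = unwaive(sample_waives, TEST_ID)
```

With the sample data, `remaining` holds only the unrelated entry, which mirrors the one-line removal the walkthrough reports.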

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description does not follow the repository’s required template because it omits the summary annotation and the mandatory “## Description”, “## Test Coverage”, and “## PR Checklist” sections, leaving the template structure incomplete. | Please update the PR description to include the template header (e.g., @coderabbitai summary), a “## Description” section explaining the change, a “## Test Coverage” section listing relevant tests, and the “## PR Checklist” to ensure compliance with repository standards. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The pull request title succinctly describes the main change of re-enabling tests related to a GB200 OOM issue and references the relevant bug and fix context, making it clear to reviewers. |

@lancelly lancelly changed the title from "[https://nvbugspro.nvidia.com/bug/5455140][fix] unwaive tests related to GB200 OOM" to "[https://nvbugs/5455140][fix] unwaive tests related to GB200 OOM" on Oct 6, 2025
@lancelly
Collaborator Author

lancelly commented Oct 6, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20680 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20680 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15623 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20695 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20695 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15633 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20707 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20707 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15644 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #20714 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20714 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15650 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20715 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20715 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15651 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20724 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20724 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15659 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --add-multi-gpu-test --disable-fail-fast

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --disable-fail-fast

@lancelly
Collaborator Author

lancelly commented Oct 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20737 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20737 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15670 completed with status: 'FAILURE'

@lancelly
Collaborator Author

lancelly commented Oct 8, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #20757 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20757 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15687 completed with status: 'SUCCESS'

@chzblych chzblych merged commit d57b8f0 into NVIDIA:main Oct 8, 2025
5 checks passed