Skip to content

[None][chore] Better Empty File Error Handling for trtllm-bench#12552

Merged
taylor-yb-lee merged 5 commits into
NVIDIA:mainfrom
yijingl-nvidia:better_error_handling
Apr 23, 2026
Merged

[None][chore] Better Empty File Error Handling for trtllm-bench#12552
taylor-yb-lee merged 5 commits into
NVIDIA:mainfrom
yijingl-nvidia:better_error_handling

Conversation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator

@yijingl-nvidia yijingl-nvidia commented Mar 25, 2026

Description

If the input dataset file happens to be empty due to process killed, trtllm-bench will throw a seemingly confusing error. To help user debug, add an early empty dataset check and throw a meaningful error in case the input dataset is empty.

Edit:
During CI debugging, Claude found a unit test failure at tests/unittest/bindings/test_transfer_agent_bindings.py and fixed the test by using the new agent binding interface. For simplicity, the fix is incorporated into this PR as well.

Test Coverage

No new test added.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Added input validation for dataset streams to detect and report empty, corrupted, or incorrectly formatted data with clearer error messages.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

A validation check was added to the create_dataset_from_stream function that raises a ValueError if no prompts are collected from the input stream, helping detect empty, corrupted, or incorrectly formatted dataset files.

Changes

Cohort / File(s) Summary
Input Validation
tensorrt_llm/bench/utils/data.py
Added early validation in create_dataset_from_stream to raise ValueError if no prompts are collected from the input stream (8 lines added).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description explains the main change (empty dataset validation) but states 'No new test added' despite adding new code paths, and mentions an unrelated CI fix without justification. Clarify why no tests were added for the new validation logic and provide details about the unrelated CI fix mentioned in the 'Edit' section.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding better error handling for empty dataset files in trtllm-bench, matching the code modification.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/bench/utils/data.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add/update the NVIDIA source header in this modified Python file.

This file was modified, but there is no visible NVIDIA copyright header at the top. Please add/update the standard project header with the latest meaningful modification year.

As per coding guidelines "**/*.{cpp,cc,cxx,h,hpp,py}: Add NVIDIA copyright header on ALL new files, and update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/bench/utils/data.py` at line 1, This file is missing the
required NVIDIA source header; update the top of tensorrt_llm.bench.utils.data
(module data.py) by inserting the project's standard NVIDIA copyright/header
block (matching other Python files in the repo) and ensure the copyright year is
updated to the latest meaningful modification year; place the header before any
imports (above the existing "import json") and follow the same formatting used
in other .py files in the project.
🧹 Nitpick comments (1)
tensorrt_llm/bench/utils/data.py (1)

124-130: Add a regression test for the empty-stream path.

Please add a unit/integration test that passes an empty dataset stream and asserts the failure path/message introduced at Lines 125-130.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/bench/utils/data.py` around lines 124 - 130, Add a regression
test that simulates an empty dataset stream and asserts the ValueError path when
no prompts are read: import the reader function from
tensorrt_llm.bench.utils.data (the function that consumes a dataset stream and
populates the local variable prompts), call it with an empty io.StringIO or
empty bytes stream, and use pytest.raises(ValueError, match="No data was read
from the dataset stream") to verify the exact failure message produced by the
len(prompts) == 0 branch; ensure the test name clearly indicates it covers the
empty-stream/error path and runs as part of the test suite.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 124-130: The new ValueError raised when no prompts are read from
create_dataset_from_stream will surface as a raw traceback because callers in
benchmark modules don't catch it; replace this direct ValueError with a
specific, catchable exception (e.g., create a DatasetLoadError or
DatasetFormatError class) and raise that instead from
create_dataset_from_stream, then update the benchmark entry points that call
create_dataset_from_stream to catch DatasetLoadError and translate it to the
CLI-friendly error handling already used (or move the create_dataset_from_stream
call inside the existing try/except blocks); reference
create_dataset_from_stream and the new DatasetLoadError/DatasetFormatError so
the caller changes can locate and handle the error type.

---

Outside diff comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Line 1: This file is missing the required NVIDIA source header; update the top
of tensorrt_llm.bench.utils.data (module data.py) by inserting the project's
standard NVIDIA copyright/header block (matching other Python files in the repo)
and ensure the copyright year is updated to the latest meaningful modification
year; place the header before any imports (above the existing "import json") and
follow the same formatting used in other .py files in the project.

---

Nitpick comments:
In `@tensorrt_llm/bench/utils/data.py`:
- Around line 124-130: Add a regression test that simulates an empty dataset
stream and asserts the ValueError path when no prompts are read: import the
reader function from tensorrt_llm.bench.utils.data (the function that consumes a
dataset stream and populates the local variable prompts), call it with an empty
io.StringIO or empty bytes stream, and use pytest.raises(ValueError, match="No
data was read from the dataset stream") to verify the exact failure message
produced by the len(prompts) == 0 branch; ensure the test name clearly indicates
it covers the empty-stream/error path and runs as part of the test suite.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2cf7ab9f-fbd5-4f65-a126-7af8d963ef41

📥 Commits

Reviewing files that changed from the base of the PR and between 93d99f1 and 8c3b090.

📒 Files selected for processing (1)
  • tensorrt_llm/bench/utils/data.py

Comment thread tensorrt_llm/bench/utils/data.py
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #40796 [ run ] triggered by Bot. Commit: 76d9e5b Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #40796 [ run ] completed with state SUCCESS. Commit: 76d9e5b
/LLM/main/L0_MergeRequest_PR pipeline #31811 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41014 [ run ] triggered by Bot. Commit: 76d9e5b Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41014 [ run ] completed with state FAILURE. Commit: 76d9e5b
/LLM/main/L0_MergeRequest_PR pipeline #31994 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41038 [ run ] triggered by Bot. Commit: ccfeccc Link to invocation

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from ccfeccc to 1402d60 Compare March 31, 2026 23:29
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41038 [ run ] completed with state FAILURE. Commit: ccfeccc
/LLM/main/L0_MergeRequest_PR pipeline #32018 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from 1402d60 to 93e0248 Compare April 1, 2026 16:53
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41231 [ run ] triggered by Bot. Commit: 93e0248 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41231 [ run ] completed with state SUCCESS. Commit: 93e0248
/LLM/main/L0_MergeRequest_PR pipeline #32192 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from 93e0248 to 150268d Compare April 2, 2026 17:36
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41479 [ run ] triggered by Bot. Commit: 150268d Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41479 [ run ] completed with state SUCCESS. Commit: 150268d
/LLM/main/L0_MergeRequest_PR pipeline #32402 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from b7d4ba7 to 4376d79 Compare April 2, 2026 21:28
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41518 [ run ] triggered by Bot. Commit: 4376d79 Link to invocation

@FrankD412 FrankD412 enabled auto-merge (squash) April 3, 2026 03:48
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #41518 [ run ] completed with state SUCCESS. Commit: 4376d79
/LLM/main/L0_MergeRequest_PR pipeline #32434 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from 307609a to 8df6174 Compare April 16, 2026 17:52
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43820 [ run ] triggered by Bot. Commit: 8df6174 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43820 [ run ] completed with state FAILURE. Commit: 8df6174
/LLM/main/L0_MergeRequest_PR pipeline #34295 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from 8df6174 to 3f04260 Compare April 20, 2026 18:05
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44775 [ run ] triggered by Bot. Commit: 3f04260 Link to invocation

@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from 3f04260 to 7c35b45 Compare April 21, 2026 17:47
@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44782 [ run ] triggered by Bot. Commit: 7c35b45 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44782 [ run ] completed with state SUCCESS. Commit: 7c35b45
/LLM/main/L0_MergeRequest_PR pipeline #35136 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

yijingl-nvidia and others added 5 commits April 21, 2026 16:57
…ror handling

- Add NVIDIA copyright header to bench/utils/data.py
- Replace ValueError with DatasetFormatError (subclass of ValueError) for
  clearer, catchable exception semantics
- Wrap create_dataset_from_stream calls in low_latency.py and throughput.py
  to catch DatasetFormatError and re-raise as click.UsageError for clean
  CLI error output instead of raw tracebacks
- Add regression test for the empty-stream path in test_bench_data.py

Signed-off-by: Yijing Li <[email protected]>
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Add NVIDIA Apache-2.0 license headers to files modified in this PR
that were missing them, to fix the OSS vulnerability scan.

Signed-off-by: Yijing Li <[email protected]>
@yijingl-nvidia yijingl-nvidia force-pushed the better_error_handling branch from 7c35b45 to 553c302 Compare April 21, 2026 23:58
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44823 [ run ] triggered by Bot. Commit: 553c302 Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44827 [ run ] triggered by Bot. Commit: 553c302 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44827 [ run ] completed with state SUCCESS. Commit: 553c302
/LLM/main/L0_MergeRequest_PR pipeline #35173 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44992 [ run ] triggered by Bot. Commit: 553c302 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #44992 [ run ] completed with state SUCCESS. Commit: 553c302
/LLM/main/L0_MergeRequest_PR pipeline #35312 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@yijingl-nvidia
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45038 [ run ] triggered by Bot. Commit: 553c302 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #45038 [ run ] completed with state SUCCESS. Commit: 553c302
/LLM/main/L0_MergeRequest_PR pipeline #35345 completed with status: 'SUCCESS'

CI Report

Link to invocation

@taylor-yb-lee taylor-yb-lee merged commit 36113bf into NVIDIA:main Apr 23, 2026
5 checks passed
ziyixiong-nv pushed a commit to ziyixiong-nv/TensorRT-LLM that referenced this pull request Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants