[TRTLLM-8650][fix] beam search request validation (#8433) #9228

ixlmar · 2025-11-17T16:16:24Z

Description

Cherry-picks #8433

Test Coverage

Tests are included.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option is always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Added validation to enforce beam width consistency and prevent incompatible sampling configurations from proceeding.
Tests
- Extended test coverage for parameter validation including edge cases and unsupported configurations.
Refactor
- Improved internal request queue processing to better separate special request signal handling from standard request filtering.

ixlmar · 2025-11-17T16:17:39Z

/bot run

coderabbitai · 2025-11-17T16:23:07Z

📝 Walkthrough

Walkthrough

The PR refactors request queue processing by renaming and restructuring the request filtering method to explicitly handle special queue items (shutdown, cancel, control signals). Additionally, beam width consistency validation is added to request validation. Test fixtures are updated to use a centralized model kwargs builder, and new parameter validation tests are introduced.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Request Queue Refactoring `tensorrt_llm/_torch/pyexecutor/executor_request_queue.py` | Method `_validate_and_filter_requests` renamed to `_handle_special_queue_items`. New handler processes shutdown requests (sets `is_shutdown`), canceled requests (records IDs), control requests (accumulates items on rank 0), and collects regular requests into `accepted_new_requests` list. |
| Request Validation Enhancement `tensorrt_llm/_torch/pyexecutor/py_executor.py` | Added beam width consistency check in `_validate_request`: if `request.sampling_config` exists, enforces `beam_width == max_beam_width`; raises `ValueError` if mismatch. Check placed before existing model-specific token ID range validations. |
| Queue Test Updates `tests/unittest/_torch/executor/test_executor_request_queue.py` | Test renamed from `test_validate_and_filter_requests` to `test_handle_special_queue_items`. Method call updated to `_handle_special_queue_items`. Added assertions verifying `executor_queue.is_shutdown` and presence of canceled request ID in `canceled_req_ids`. |
| Beam Search Test Restructuring `tests/unittest/_torch/sampler/test_beam_search.py` | Added helper functions `model_kwargs()` and `_build_llm()` for centralized LLM construction. Updated `llm()` and `llm_cuda_graph()` fixture signatures to accept `model_kwargs`. Introduced new `TestParameterValidation` test class (with `force_ampere` decorator) validating unsupported sampling parameters (greedy decoding returns, omitted beam search flag, beam width mismatches) raise appropriate errors. |

Sequence Diagram(s)

sequenceDiagram
    participant Queue as Executor Queue
    participant Handler as _handle_special_queue_items
    participant Processor as Request Processor

    Queue->>Handler: new_requests
    Note over Handler: Filter special items
    
    alt Shutdown Request
        Handler->>Handler: set is_shutdown = True
        Handler->>Queue: early terminate
    else Canceled Request
        Handler->>Handler: record canceled_req_ids
    else Control Request
        Handler->>Handler: rank 0: accumulate<br/>to request_accumulated
    else Regular Request
        Handler->>Handler: add to accepted_new_requests
    end
    
    Handler->>Processor: return accepted_new_requests
    Processor->>Processor: process valid requests

sequenceDiagram
    participant Client as Test/Client
    participant Validator as _validate_request
    participant CheckBeamWidth as Beam Width Check
    participant CheckTokenRange as Token Range Check

    Client->>Validator: request with sampling_config
    Validator->>CheckBeamWidth: beam_width vs max_beam_width
    alt beam_width ≠ max_beam_width
        CheckBeamWidth->>Validator: raise ValueError
        Validator->>Client: error
    else beam_width = max_beam_width
        CheckBeamWidth->>Validator: valid
        Validator->>CheckTokenRange: proceed to model-specific checks
        CheckTokenRange->>Validator: validation result
        Validator->>Client: success/error
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

executor_request_queue.py: Method refactoring with altered control flow for special queue item handling requires careful verification of shutdown, cancellation, and control request logic paths.
py_executor.py: New beam width validation logic is straightforward but interacts with existing validation chain; verify placement doesn't introduce unintended side effects.
test_beam_search.py: Fixture restructuring is mechanical but introduces new centralized builder pattern; new TestParameterValidation test class mirrors existing tests with parameterization; verify new validation test cases exercise intended error paths and that model kwargs are correctly propagated.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |
| Description check | ❓ Inconclusive | The description is minimal and vague. It only states 'Cherry-picks #8433' and 'Tests are included' without explaining what the actual changes are or why they were made. | Provide a detailed description of what beam search request validation changes were cherry-picked and why they are necessary. Explain the problem being solved. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main change: adding beam search request validation. It follows the required format with JIRA ticket and fix type. |


tensorrt-cicd · 2025-11-17T16:23:07Z

PR_Github #24780 [ run ] triggered by Bot. Commit: f228828

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

tensorrt_llm/_torch/pyexecutor/py_executor.py (1)

1593-1602: Beam-width validation is correct and well-placed; consider minor cleanup of the error message literal

The new check cleanly enforces request.sampling_config.beam_width == self.max_beam_width before any model-specific validation, and it will be triggered exactly once per incoming LlmRequest via _fetch_and_activate_new_requests, which aligns with the earlier decision to keep sampler.beam_width() lightweight and free of per-batch consistency checks. This is a good place to centralize beam-width consistency. (Based on learnings.)

If you want to address the Ruff TRY003 hint, you could optionally shorten or factor the error text into a constant/custom exception, but that’s cosmetic and not blocking.
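As a rough illustration of the check and the optional TRY003 cleanup discussed above, the sketch below moves the message into a `ValueError` subclass. `SamplingConfig`, `Request`, `BeamWidthMismatchError`, and `validate_beam_width` are hypothetical stand-ins written for this example, and the message wording is assumed; none of this is the code in `py_executor.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingConfig:
    # Hypothetical stand-in for the request's sampling configuration.
    beam_width: int


@dataclass
class Request:
    # Hypothetical stand-in for LlmRequest; only the field used here.
    sampling_config: Optional[SamplingConfig] = None


class BeamWidthMismatchError(ValueError):
    """Hypothetical ValueError subclass that owns its message text."""

    def __init__(self, requested: int, expected: int) -> None:
        super().__init__(
            f"Request beam width {requested} does not match "
            f"max_beam_width {expected}."
        )
        self.requested = requested
        self.expected = expected


def validate_beam_width(request: Request, max_beam_width: int) -> None:
    """Raise if a sampling config is present and its beam width differs."""
    config = request.sampling_config
    if config is not None and config.beam_width != max_beam_width:
        raise BeamWidthMismatchError(config.beam_width, max_beam_width)
```

Because the exception still derives from `ValueError`, callers that catch `ValueError` would keep working under this assumption.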

tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (1)

275-333: Special-queue handling refactor looks sound; shutdown-drop behavior is worth being explicit about

Routing everything through _handle_special_queue_items simplifies _fetch_and_process_requests and makes the shutdown/cancel/control paths much clearer:

Shutdown sets is_shutdown and stops considering further items in this batch.

Cancel requests only tag canceled_req_ids and do not enter the waiting queue.

Control requests are isolated into control_requests, with rank 0 stashing any following items into request_accumulated, while other ranks ignore them, which matches the “control must be handled exclusively” contract enforced by the early return when control_requests is non-empty.

One behavioral detail to be aware of: any items that appear after a shutdown request in the same new_requests batch are intentionally dropped rather than queued. That’s consistent with treating shutdown as terminal, but it might be worth documenting in the class docstring or method docstring so callers don’t assume such requests will ever be processed.

Also applies to: 485-505
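The routing described in this comment can be sketched as follows. This is a simplified model built only from the prose above: `Kind`, `Item`, and `QueueState` are hypothetical types, and the real `_handle_special_queue_items` may differ in details such as how control requests end the batch.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Kind(Enum):
    NORMAL = auto()
    SHUTDOWN = auto()
    CANCEL = auto()
    CONTROL = auto()


@dataclass
class Item:
    # Hypothetical queue item; the real item type is not shown here.
    id: int
    kind: Kind = Kind.NORMAL


@dataclass
class QueueState:
    rank: int = 0
    is_shutdown: bool = False
    canceled_req_ids: List[int] = field(default_factory=list)
    control_requests: List[Item] = field(default_factory=list)
    request_accumulated: List[Item] = field(default_factory=list)

    def handle_special_queue_items(self, new_requests: List[Item]) -> List[Item]:
        accepted: List[Item] = []
        for idx, item in enumerate(new_requests):
            if item.kind is Kind.SHUTDOWN:
                self.is_shutdown = True
                break  # later items in this batch are dropped
            if item.kind is Kind.CANCEL:
                self.canceled_req_ids.append(item.id)
            elif item.kind is Kind.CONTROL:
                self.control_requests.append(item)
                if self.rank == 0:
                    # Only rank 0 stashes the items that follow the control request.
                    self.request_accumulated.extend(new_requests[idx + 1:])
                break
            else:
                accepted.append(item)
        return accepted
```

In this model a batch of `[normal, cancel, shutdown, normal]` yields only the first normal request, which makes the shutdown-drop behavior mentioned above easy to see.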

tests/unittest/_torch/executor/test_executor_request_queue.py (1)

478-497: Test correctly exercises special-item handling; setup can be simplified

The assertions here are exactly what we need: only the normal request is returned, is_shutdown flips to True, and the canceled ID is tracked in canceled_req_ids.

Given that beam-width validation no longer lives in this queue layer, the “avoid beam validation” comment and the sampling_config deletion on a bare Mock are now redundant. You can simplify the setup to just mock_request = Mock() and drop the delattr block/comment without changing behavior.

tests/unittest/_torch/sampler/test_beam_search.py (1)

490-593: Parameter-validation tests for beam search are well targeted; consider making regex patterns raw strings

The new TestParameterValidation class:

Uses its own fixed_params (max_beam_width 4) and model_kwargs pointing to a TinyLlama checkpoint, gated by @force_ampere and generous timeouts.

Verifies:

Greedy decoding (use_beam_search=False or omitted) with best_of > 1 raises ValueError and does not hang.

A smaller beam width (best_of=2 with max_beam_width=4) raises a RequestError whose message matches the new beam-width validation, and the engine remains usable afterwards via _check_engine_responds.

That gives good coverage of the new validation paths and, importantly, asserts that error-handling paths don’t leave the engine in a bad state.

Minor polish to align with Ruff’s RUF043 hint: the match= patterns are true regexes (".*...*"), so you can make them raw strings for clarity and to avoid accidental escaping issues, e.g.:
with pytest.raises(
    ValueError,
    match=r".*Greedy decoding in the LLM API does not allow multiple returns.*",
):
    ...
and similarly for the other match= arguments.
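For context on the RUF043 hint: `pytest.raises(match=...)` tests the pattern with `re.search`, so the behavior can be explored with the standard library alone. The sketch below uses hypothetical message strings (only the quoted "Greedy decoding" phrase comes from this review) and a small helper, `pattern_matches`, introduced just for illustration.

```python
import re


def pattern_matches(pattern: str, text: str) -> bool:
    """Mimic the search semantics that pytest.raises(match=...) relies on."""
    return re.search(pattern, text) is not None


# Hypothetical error text containing the phrase quoted in the review.
message = "Greedy decoding in the LLM API does not allow multiple returns."

wrapped = r".*Greedy decoding in the LLM API does not allow multiple returns.*"
bare = r"Greedy decoding in the LLM API does not allow multiple returns"

# With search semantics the surrounding ".*" is not needed.
both_match = pattern_matches(wrapped, message) and pattern_matches(bare, message)

# Without backslashes, a raw literal and a plain literal are the same string,
# so the r-prefix here documents intent rather than changing behavior.
same_string = (
    wrapped == ".*Greedy decoding in the LLM API does not allow multiple returns.*"
)

# Hypothetical message with metacharacters: escape it to match literally.
literal = "beam_width (2) != max_beam_width (4)"
unescaped_hits = pattern_matches(literal, literal)
escaped_hits = pattern_matches(re.escape(literal), literal)
```

The last two lines show why metacharacters matter: parentheses are read as groups unless the pattern is escaped.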

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6151a4c and f228828.

📒 Files selected for processing (4)

tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (3 hunks)
tensorrt_llm/_torch/pyexecutor/py_executor.py (1 hunks)
tests/unittest/_torch/executor/test_executor_request_queue.py (2 hunks)
tests/unittest/_torch/sampler/test_beam_search.py (6 hunks)

🧰 Additional context used

🧠 Learnings (13)

📓 Common learnings

Learnt from: dcampora
Repo: NVIDIA/TensorRT-LLM PR: 6867
File: tensorrt_llm/_torch/pyexecutor/sampler.py:67-72
Timestamp: 2025-08-13T16:20:37.987Z
Learning: In TensorRT-LLM sampler code, performance is prioritized over additional validation checks. The beam_width helper method intentionally returns the first request's beam_width without validating consistency across all requests to avoid performance overhead from iterating through the entire batch.

📚 Learning: 2025-08-13T16:20:37.987Z

Learnt from: dcampora
Repo: NVIDIA/TensorRT-LLM PR: 6867
File: tensorrt_llm/_torch/pyexecutor/sampler.py:67-72
Timestamp: 2025-08-13T16:20:37.987Z
Learning: In TensorRT-LLM sampler code, performance is prioritized over additional validation checks. The beam_width helper method intentionally returns the first request's beam_width without validating consistency across all requests to avoid performance overhead from iterating through the entire batch.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/unittest/_torch/sampler/test_beam_search.py

📚 Learning: 2025-08-19T12:45:11.997Z

Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 7033
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:0-0
Timestamp: 2025-08-19T12:45:11.997Z
Learning: In tensorrt_llm/_torch/pyexecutor/model_engine.py, DoRA (Weight-Decomposed Low-Rank Adaptation) functionality was removed from the PyTorch flow to eliminate issues with inverted DoRA detection logic. The original is_dora condition was checking if scaling_vec_pointer == 0, which was potentially incorrect.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/unittest/_torch/sampler/test_beam_search.py

📚 Learning: 2025-09-29T15:14:28.503Z

Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 8063
File: tensorrt_llm/lora_manager.py:1080-1112
Timestamp: 2025-09-29T15:14:28.503Z
Learning: In tensorrt_llm/lora_manager.py, when calculating part_sizes for attn_qkv fused LoRA modules, the sizes are correctly multiplied by tp_size because model_config.num_heads and model_config.num_kv_heads are already divided by tp_size (per-TP-rank values), so multiplication is needed to get the original full concatenated dimension size. The interleave_fused_lora_weights_for_tp function provides proper validation with asserts for total size and TP divisibility.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py

📚 Learning: 2025-09-29T15:14:28.503Z

Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 8063
File: tensorrt_llm/lora_manager.py:1080-1112
Timestamp: 2025-09-29T15:14:28.503Z
Learning: In tensorrt_llm/lora_manager.py, when calculating part_sizes for attn_qkv fused LoRA modules, the sizes are correctly multiplied by tp_size because model_config.num_heads and model_config.num_kv_heads are already divided by tp_size (per-TP-rank values), so multiplication is needed to get the original full concatenated dimension size. The interleave_fused_lora_weights_for_tp function provides proper validation.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py

📚 Learning: 2025-08-15T06:46:53.813Z

Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6767
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-15T06:46:53.813Z
Learning: In the TensorRT-LLM KV cache manager, SWA (Sliding Window Attention) combined with beam search is currently in a broken/non-functional state and is planned for future rework. During preparatory refactoring phases, code related to SWA+beam search may intentionally remain in a non-working state until the broader rework is completed.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py

📚 Learning: 2025-08-18T08:42:02.640Z

Learnt from: samuellees
Repo: NVIDIA/TensorRT-LLM PR: 6974
File: tensorrt_llm/serve/scripts/benchmark_dataset.py:558-566
Timestamp: 2025-08-18T08:42:02.640Z
Learning: In TensorRT-LLM's RandomDataset (tensorrt_llm/serve/scripts/benchmark_dataset.py), when using --random-token-ids option, sequence length accuracy is prioritized over semantic correctness for benchmarking purposes. The encode/decode operations should use skip_special_tokens=True and add_special_tokens=False to ensure exact target token lengths.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py

📚 Learning: 2025-08-26T06:07:02.166Z

Learnt from: shaharmor98
Repo: NVIDIA/TensorRT-LLM PR: 7231
File: tensorrt_llm/_torch/pyexecutor/_util.py:504-509
Timestamp: 2025-08-26T06:07:02.166Z
Learning: In tensorrt_llm/_torch/pyexecutor/_util.py, when calling model_engine.set_lora_model_config(), pass model_binding_config.mlp_hidden_size directly without multiplying by mapping.tp_size, as the mlp_hidden_size from get_bindings_model_config() is already the per-TP rank value needed for LoRA weight packaging.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py

📚 Learning: 2025-08-15T06:46:54.897Z

Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6767
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-15T06:46:54.897Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp addToken function, newly allocated blocks are unshared by design. The beam search path in addToken (when sequence.getNumTokens() > windowSize) is currently broken/non-functional with SWA, so the block allocation doesn't follow a shared-then-unshared pattern.

Applied to files:

tensorrt_llm/_torch/pyexecutor/py_executor.py

📚 Learning: 2025-07-28T17:06:08.621Z

Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

tests/unittest/_torch/sampler/test_beam_search.py

📚 Learning: 2025-08-26T09:37:10.463Z

Learnt from: jiaganc
Repo: NVIDIA/TensorRT-LLM PR: 7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM's bench configuration, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which is a Dict[str, Any] that can contain default values including `cuda_graph_config`, making the fallback `llm_args["cuda_graph_config"]` safe to use.

Applied to files:

tests/unittest/_torch/sampler/test_beam_search.py

📚 Learning: 2025-08-26T09:37:10.463Z

Learnt from: jiaganc
Repo: NVIDIA/TensorRT-LLM PR: 7031
File: tensorrt_llm/bench/dataclasses/configuration.py:90-104
Timestamp: 2025-08-26T09:37:10.463Z
Learning: In TensorRT-LLM, the `get_pytorch_perf_config()` method returns `self.pytorch_config` which can contain default `cuda_graph_config` values, so `llm_args` may already have this config before the extra options processing.

Applied to files:

tests/unittest/_torch/sampler/test_beam_search.py

📚 Learning: 2025-08-01T15:14:45.673Z

Learnt from: yibinl-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

tests/unittest/_torch/sampler/test_beam_search.py

🧬 Code graph analysis (4)

tensorrt_llm/_torch/pyexecutor/py_executor.py (2)

cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (6)

request (893-943)

request (893-893)

request (945-952)

request (945-945)

request (954-1005)

request (954-954)

tensorrt_llm/_torch/pyexecutor/sampler.py (1)

beam_width (133-136)

tests/unittest/_torch/executor/test_executor_request_queue.py (1)

tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (1)

_handle_special_queue_items (485-504)

tensorrt_llm/_torch/pyexecutor/executor_request_queue.py (1)

tensorrt_llm/_torch/pyexecutor/llm_request.py (2)

append (101-127)

append (192-209)

tests/unittest/_torch/sampler/test_beam_search.py (2)

tensorrt_llm/executor/utils.py (1)

RequestError (76-77)

tensorrt_llm/_torch/models/checkpoints/hf/checkpoint_loader.py (1)

HfCheckpointLoader (19-75)

🪛 Ruff (0.14.4)

tensorrt_llm/_torch/pyexecutor/py_executor.py

1597-1600: Avoid specifying long messages outside the exception class

(TRY003)

tests/unittest/_torch/sampler/test_beam_search.py

536-536: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)

560-560: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)

582-582: Pattern passed to match= contains metacharacters but is neither escaped nor raw

(RUF043)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

tests/unittest/_torch/sampler/test_beam_search.py (1)

21-23: Centralizing LLM construction via model_kwargs + _build_llm improves test clarity

Using a model_kwargs fixture plus _build_llm to assemble the LLM instance keeps the beam-search tests DRY and makes it easy to swap between the dummy checkpoint setup and real checkpoints in other fixtures. Both llm and llm_cuda_graph now share the same core configuration (batch size, seq length, max_beam_width), which should reduce drift between test variants.

This structure also plays nicely with the new parameter-validation tests that reuse _build_llm against a “real” TinyLlama checkpoint.

Also applies to: 269-313
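A minimal sketch of the centralized-builder idea, written as plain functions so it runs without pytest or a GPU. `FIXED_PARAMS`, `FakeLLM`, and `build_llm` are hypothetical stand-ins with assumed values; the real `model_kwargs` fixture and `_build_llm` helper are not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical shared settings; the review only says that batch size,
# sequence length and max_beam_width are common to both fixtures.
FIXED_PARAMS: Dict[str, int] = {
    "max_batch_size": 2,
    "max_seq_len": 32,
    "max_beam_width": 2,
}


@dataclass(frozen=True)
class FakeLLM:
    # Stand-in for the real LLM object so the sketch stays self-contained.
    model: str
    max_batch_size: int
    max_seq_len: int
    max_beam_width: int
    use_cuda_graph: bool


def model_kwargs() -> Dict[str, Any]:
    """Checkpoint-related arguments; a test class could supply its own."""
    return {"model": "dummy-checkpoint"}


def build_llm(fixed: Dict[str, int], kwargs: Dict[str, Any], *,
              use_cuda_graph: bool = False) -> FakeLLM:
    """Single construction path shared by every fixture variant."""
    return FakeLLM(use_cuda_graph=use_cuda_graph, **fixed, **kwargs)


llm = build_llm(FIXED_PARAMS, model_kwargs())
llm_cuda_graph = build_llm(FIXED_PARAMS, model_kwargs(), use_cuda_graph=True)
```

Under these assumptions, swapping in a different checkpoint only means passing different keyword arguments, while the shared limits stay in one place.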

tensorrt-cicd · 2025-11-18T04:38:15Z

PR_Github #24780 [ run ] completed with state SUCCESS. Commit: f228828
/LLM/main/L0_MergeRequest_PR pipeline #18697 completed with status: 'FAILURE'

ixlmar · 2025-11-18T08:40:08Z

/bot run

tensorrt-cicd · 2025-11-18T08:46:18Z

PR_Github #24882 [ run ] triggered by Bot. Commit: f228828

ixlmar · 2025-11-18T09:14:55Z

/bot run

Signed-off-by: ixlmar <[email protected]>

ixlmar · 2025-11-18T09:17:16Z

/bot run

tensorrt-cicd · 2025-11-18T09:21:33Z

PR_Github #24891 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-18T09:21:35Z

PR_Github #24882 [ run ] completed with state ABORTED. Commit: f228828
LLM/main/L0_MergeRequest_PR #18785 (Blue Ocean) completed with status: ABORTED

tensorrt-cicd · 2025-11-18T09:22:55Z

PR_Github #24892 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-18T09:22:57Z

PR_Github #24891 [ run ] completed with state ABORTED. Commit: bc65d51

tensorrt-cicd · 2025-11-18T10:55:29Z

PR_Github #24892 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #18794 completed with status: 'FAILURE'

ixlmar · 2025-11-18T11:17:26Z

/bot run

tensorrt-cicd · 2025-11-18T11:23:10Z

PR_Github #24901 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-18T13:02:29Z

PR_Github #24901 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #18802 completed with status: 'FAILURE'

ixlmar · 2025-11-18T13:32:52Z

/bot run

tensorrt-cicd · 2025-11-18T13:38:20Z

PR_Github #24912 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-19T00:35:18Z

PR_Github #24912 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #18812 completed with status: 'FAILURE'

ixlmar · 2025-11-19T07:23:27Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-11-19T07:28:36Z

PR_Github #25004 [ run ] triggered by Bot. Commit: bc65d51

ixlmar · 2025-11-19T07:49:23Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-11-19T07:55:40Z

PR_Github #25013 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-19T07:59:24Z

PR_Github #25013 [ run ] completed with state FAILURE. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #18895 completed with status: 'FAILURE'

ixlmar · 2025-11-19T08:56:44Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-11-19T09:01:55Z

PR_Github #25030 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-19T20:57:50Z

PR_Github #25030 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #18912 completed with status: 'FAILURE'

ixlmar · 2025-11-20T07:26:02Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-11-20T07:32:48Z

PR_Github #25165 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-20T15:25:33Z

PR_Github #25165 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #19026 completed with status: 'FAILURE'

ixlmar · 2025-11-20T15:26:59Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-11-20T15:32:29Z

PR_Github #25214 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-20T23:42:19Z

PR_Github #25214 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #19069 completed with status: 'FAILURE'

ixlmar · 2025-11-21T09:16:56Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-11-21T09:22:17Z

PR_Github #25348 [ run ] triggered by Bot. Commit: bc65d51

tensorrt-cicd · 2025-11-21T12:08:43Z

PR_Github #25348 [ run ] completed with state SUCCESS. Commit: bc65d51
/LLM/main/L0_MergeRequest_PR pipeline #19172 completed with status: 'SUCCESS'

ixlmar requested review from Funatiq and stnie November 17, 2025 16:16

ixlmar force-pushed the chore/cherry-pick-beam-search-request-validation branch from 6390620 to f228828 Compare November 17, 2025 16:17

ixlmar marked this pull request as ready for review November 17, 2025 16:17

ixlmar requested a review from a team as a code owner November 17, 2025 16:17

coderabbitai bot reviewed Nov 17, 2025

stnie reviewed Nov 17, 2025

tensorrt_llm/_torch/pyexecutor/py_executor.py (outdated; resolved)

Funatiq approved these changes Nov 17, 2025

ixlmar force-pushed the chore/cherry-pick-beam-search-request-validation branch from f228828 to 6fdd356 Compare November 18, 2025 09:14

ixlmar force-pushed the chore/cherry-pick-beam-search-request-validation branch from 6fdd356 to 922fd50 Compare November 18, 2025 09:15

[TRTLLM-8650][fix] beam search request validation (NVIDIA#8433)

bc65d51

Signed-off-by: ixlmar <[email protected]>

ixlmar force-pushed the chore/cherry-pick-beam-search-request-validation branch from 922fd50 to bc65d51 Compare November 18, 2025 09:17

ixlmar requested a review from stnie November 18, 2025 09:30

stnie approved these changes Nov 18, 2025

ixlmar enabled auto-merge (squash) November 18, 2025 18:42

ixlmar merged commit 095b686 into NVIDIA:main Nov 21, 2025
5 checks passed

ixlmar deleted the chore/cherry-pick-beam-search-request-validation branch November 21, 2025 12:10
