[None][feat] Enable joint optimization of agent applications and TensorRT-LLM with Scaffolding by Boreas618 · Pull Request #11173 · NVIDIA/TensorRT-LLM

Boreas618 · 2026-02-01T15:00:48Z

Summary by CodeRabbit

Release Notes

New Features
- Added agent-tree based request scheduling for intelligent prioritization in batch processing.
- Implemented KV cache hints and truncation for optimized memory management in multi-turn conversations.
- Introduced new chat and multi-turn conversation controllers with tool-calling support.
- Added MCP (Model Context Protocol) worker integration for agentic workflows.
- Expanded performance metrics to include KV cache utilization and block tracking.
- Added multiround chat benchmarking with configurable traffic patterns and conversation distributions.
Bug Fixes
- Fixed input validation to handle empty lists gracefully.
Documentation
- Added comprehensive README and examples for benchmarking and deep research workflows.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Description

This PR addresses several issues in the Scaffolding-based implementation of Open Deep Research. It adds support for capturing agent application information via Task Collection, enables proactive KV cache eviction to reduce GPU memory pressure from agents, and introduces agent-tree–based performance isolation between agents and chatbots. It also includes benchmarks for evaluating Scaffolding-based agents.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

coderabbitai · 2026-02-01T15:06:45Z

📝 Walkthrough

Walkthrough

The changes implement a comprehensive agent-tree-based scheduling framework for batch request prioritization, introduce KV cache truncation capabilities with extended metrics tracking, enhance the scaffolding framework with multi-turn chat support and tool integration, and add benchmarking infrastructure. New agent hierarchy hierarchies enable prioritized batching, while expanded KV cache management and performance metrics provide observability and optimization. Supporting implementations include Deep Research workflows, MCP integration, and example client scripts.

Changes

Cohort / File(s)	Summary
Agent Tree Batch Scheduling `cpp/include/tensorrt_llm/batch_manager/agentTree.h`, `cpp/tensorrt_llm/batch_manager/agentTree.cpp`	Introduces comprehensive agent-tree-based scheduling with NodeFactory, multiple node types (AgentChatbot, AgentLatency, AgentDeepResearch, Chatbot), request merging strategies, and AgentTreePolicy for prioritized batching based on agent ratios and hierarchies.
Batch Scheduler Integration `cpp/include/tensorrt_llm/batch_manager/capacityScheduler.h`, `cpp/tensorrt_llm/batch_manager/capacityScheduler.cpp`, `cpp/include/tensorrt_llm/batch_manager/resortPolicy.h`, `cpp/tensorrt_llm/batch_manager/microBatchScheduler.h`, `cpp/tensorrt_llm/batch_manager/CMakeLists.txt`	Integrates ResortPolicy base class and AgentTreePolicy into CapacityScheduler to enable dynamic request reordering; adds setAgentTreeResortPolicy method and applies policies to incoming request streams.
KV Cache Management Enhancements `cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`, `cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`, `cpp/tensorrt_llm/batch_manager/llmRequest.h`	Adds truncateBlocks methods across WindowBlockManager, BlockManager, and KVCacheManager for per-sequence KV cache trimming; introduces releaseChildren for block tree cleanup; extends LlmRequest with agent_hierarchy and KV cache system stats tracking.
Performance Metrics Expansion `cpp/include/tensorrt_llm/executor/types.h`, `cpp/tensorrt_llm/executor/serialization.cpp`, `cpp/tensorrt_llm/nanobind/executor/request.cpp`	Adds maxNumBlocks, usedNumBlocks, and freeNumBlocks fields to KvCacheMetrics; extends serialization/deserialization paths and Python bindings for metrics reporting.
C++ Python Bindings `cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp`, `cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp`, `cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp`	Exposes setAgentTreeResortPolicy, agent_hierarchy parameter, and truncateBlocks to Python via nanobind; supports scheduled request configuration and KV cache operations from Python.
PyExecutor Runtime `tensorrt_llm/_torch/pyexecutor/_util.py`, `tensorrt_llm/_torch/pyexecutor/llm_request.py`, `tensorrt_llm/_torch/pyexecutor/py_executor.py`, `tensorrt_llm/_torch/pyexecutor/resource_manager.py`, `tensorrt_llm/_torch/pyexecutor/scheduler.py`	Adds resort policy configuration support, agent hierarchy propagation to LlmRequest, KV cache hint request queuing and synchronization across ranks, and truncate_blocks delegation.
Core API & Scheduling `tensorrt_llm/llmapi/llm.py`, `tensorrt_llm/llmapi/llm_args.py`, `tensorrt_llm/llmapi/__init__.py`, `tensorrt_llm/executor/request.py`, `tensorrt_llm/executor/base_worker.py`, `tensorrt_llm/executor/worker.py`, `tensorrt_llm/executor/proxy.py`, `tensorrt_llm/scheduling_params.py`, `tensorrt_llm/commands/serve.py`	Introduces ResortRequestPolicyConfig, KVCacheHintRequest and TruncateKVCacheRequest types, agent_hierarchy field in SchedulingParams and ChatCompletionRequest, set_kv_cache_hints public methods, and CLI options for agent configuration.
Scaffolding Framework Foundation `tensorrt_llm/scaffolding/__init__.py`, `tensorrt_llm/scaffolding/scaffolding_llm.py`, `tensorrt_llm/scaffolding/benchmark.py`, `tensorrt_llm/scaffolding/load_generation_strategy.py`	Expands public API with NativeChatController, ChatWithMCPController, task types; implements LoadGenerationStrategy hierarchy (Synchronous, Concurrent, Poisson, ConstantRate) for flexible request timing; adds max_parallel_requests configuration.
Scaffolding Task & Collection `tensorrt_llm/scaffolding/task.py`, `tensorrt_llm/scaffolding/task_collection.py`, `tensorrt_llm/scaffolding/controller.py`	Introduces RoleMessage hierarchy (User, Assistant, System), ChatTask, MCPCallTask, DropKVCacheTask; adds comprehensive task collections (ChatTokenCounter, TaskTimer, QueryCollector, TaskMetricsCollector, SubRequestMarker); implements NativeChatController and ChatWithMCPController with tool-call orchestration.
Scaffolding Workers `tensorrt_llm/scaffolding/worker.py`	Extends OpenaiWorker with KV cache hint and agent hierarchy support; introduces MCPWorker for multi-endpoint tool coordination; adds deterministic inference mode and DropKVCacheTask handling.
Deep Research Implementation `tensorrt_llm/scaffolding/contrib/open_deep_research/supervisor.py`, `tensorrt_llm/scaffolding/contrib/open_deep_research/researcher.py`, `tensorrt_llm/scaffolding/contrib/open_deep_research/tools.py`, `tensorrt_llm/scaffolding/contrib/open_deep_research/prompts.py`, `tensorrt_llm/scaffolding/contrib/open_deep_research/utils.py`, `tensorrt_llm/scaffolding/contrib/open_deep_research/__init__.py`	Implements Supervisor and Researcher controllers for multi-turn research workflows with tool integration; provides factory function create_open_deep_research_scaffolding_llm; defines web search, research, and reflection tools.
MCP Integration `tensorrt_llm/scaffolding/contrib/mcp/chat_handler.py`, `tensorrt_llm/scaffolding/contrib/mcp/chat_task.py`	Adds deterministic inference support via environment variable; introduces agent_hierarchy field to ChatTask for scheduling information tracking.
OpenAI Server `tensorrt_llm/serve/openai_protocol.py`, `tensorrt_llm/serve/openai_server.py`, `tensorrt_llm/serve/harmony_adapter.py`, `tensorrt_llm/serve/postprocess_handlers.py`	Adds CompletionTokensDetails, TimingMetrics, KvCacheMetrics, SpeculativeDecodingMetrics, PerfMetrics data models; exposes /v1/kv_cache_hints endpoint; integrates scheduling parameters and performance metrics into response paths.
Benchmarking Infrastructure `examples/scaffolding/bench_kv_cache_hints.py`, `examples/scaffolding/benchmarks/__main__.py`, `examples/scaffolding/benchmarks/__init__.py`, `examples/scaffolding/benchmarks/agent_benchmark.py`, `examples/scaffolding/benchmarks/benchmark_utils.py`, `examples/scaffolding/benchmarks/chat_benchmark.py`, `examples/scaffolding/benchmarks/multiround_chat_benchmark.py`, `examples/scaffolding/benchmarks/README.md`	Introduces comprehensive benchmarking suite with KV cache hints evaluation, agent/burst/chat/multiround benchmark modes, configurable distribution-based synthetic data generation, and detailed metrics collection.
Example Scripts & Utilities `examples/scaffolding/openai_chat_client_for_agent_tree.py`, `examples/serve/kv_cache_hint.sh`, `examples/scaffolding/contrib/open_deep_research/run_deep_research.py`, `examples/scaffolding/contrib/open_deep_research/WordLlama/main.py`, `examples/scaffolding/contrib/open_deep_research/WordLlama/wordllama_serve.py`, `examples/scaffolding/contrib/mcp/mcptest.py`, `examples/scaffolding/contrib/open_deep_research/TavilyMCP/travily.py`, `examples/scaffolding/contrib/open_deep_research/data/open_deepresearch_bench.json`	Provides example client for agent tree queries, KV cache hint shell script, Deep Research orchestration, WordLlama-based search MCP server, and benchmark datasets.
Unit Tests `cpp/tests/unit_tests/batch_manager/agentTreeTest.cpp`, `cpp/tests/unit_tests/batch_manager/truncateBlocksTest.cpp`, `cpp/tests/unit_tests/executor/serializeUtilsTest.cpp`, `tests/unittest/scaffolding/test_bench.py`, `tests/unittest/scaffolding/test_mcp_worker.py`, `tests/unittest/scaffolding/test_task_collection.py`, `tests/unittest/scaffolding/test_worker.py`	Adds comprehensive test coverage for agent tree sorting, KV cache truncation and reuse, metrics serialization, scaffolding benchmarks, MCP worker integration, task collection flows, and chat task handling.
Configuration & Dependencies `pyproject.toml`, `requirements.txt`	Adds mcp dependency; updates Codespell ignore list.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Request
    participant Scheduler as CapacityScheduler
    participant Policy as AgentTreePolicy
    participant Tree as AgentTreeNode
    participant Queue as Request Queue

    Client->>Scheduler: Submit Requests
    Scheduler->>Policy: resortRequests(activeRequests)
    Policy->>Tree: insertRequest() per request
    Tree->>Tree: routeByHierarchy(level, nodeType)
    Note over Tree: Insert into appropriate node
    Tree->>Tree: getRequests() aggregate from tree
    Tree->>Tree: mergeNodesSequence() per node type
    Note over Tree: Apply node-specific merge logic<br/>(ratio mixing, reordering, etc.)
    Policy->>Policy: Partition gen vs non-gen
    Policy->>Policy: truncateByCapacity()
    Policy->>Queue: Return reordered requests
    Scheduler->>Scheduler: Schedule from requestsToSchedule

sequenceDiagram
    participant User as User Input
    participant Controller as ChatWithMCPController
    participant Generator as Generation Controller
    participant MCP as MCPWorker
    participant Task as Task Output

    User->>Controller: generate(prompt)
    Controller->>Controller: Create ChatTask from prompt
    loop Iteration <= max_iterations
        Controller->>Generator: process([ChatTask])
        Generator->>Generator: Generate with tools enabled
        Generator->>Task: Yield ChatTask with AssistantMessage
        alt Tool calls present
            Controller->>Controller: Extract tool_calls
            Controller->>MCP: MCPCallTask for each tool
            MCP->>MCP: Execute via web service
            MCP->>Task: Return result_str
            Controller->>Controller: Append UserMessage with result
        else No tool calls
            Controller->>Task: Break loop
        end
    end
    Controller->>Task: Return final GenerationResult

sequenceDiagram
    participant Request as LlmRequest
    participant KVMgr as KVCacheManager
    participant BlockMgr as BlockManager
    participant EvictPolicy as Eviction Policy

    Request->>KVMgr: truncateBlocks(targetTokens, numKeep)
    KVMgr->>KVMgr: Iterate window sizes
    KVMgr->>BlockMgr: truncateBlocks per window
    BlockMgr->>BlockMgr: Locate blocks exceeding numKeep
    BlockMgr->>BlockMgr: Detach excess blocks
    BlockMgr->>EvictPolicy: Release to free pool
    Note over EvictPolicy: Mark blocks as available
    BlockMgr->>Request: Update token counts
    KVMgr->>Request: setKvCacheSystemStats()

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

coderabbitai

Actionable comments posted: 2

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)

cpp/tensorrt_llm/batch_manager/microBatchScheduler.cpp (1)

1-16: ⚠️ Potential issue | 🟡 Minor

Update copyright year to reflect latest modification (2026).

This file was modified in 2026; the header still lists 2025. Please update the year to the latest meaningful modification.

As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
tensorrt_llm/inputs/data.py (1)
1-2: ⚠️ Potential issue | 🟠 Major

Add NVIDIA copyright header.

Source files should include the standard NVIDIA header with the latest modification year (2026).
✅ Suggested header
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 # Adapt from
 # https://github.com/vllm-project/vllm/blob/2e33fe419186c65a18da6668972d61d7bbc31564/vllm/inputs/data.py
As per coding guidelines, “All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.”
examples/scaffolding/contrib/open_deep_research/TavilyMCP/travily.py (2)
1-13: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification. As per coding guidelines: "All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification."
📝 Proposed fix to add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import asyncio
 import logging
61-74: ⚠️ Potential issue | 🟡 Minor

Filename appears to have a typo.

The file is named travily.py but likely should be tavily.py to match the Tavily service it integrates with. This could cause confusion when searching for or referencing the file.
tensorrt_llm/commands/serve.py (1)

604-633: ⚠️ Potential issue | 🟠 Major

Wire agent CLI options into LLM configuration and validate inputs.

The agent_percentage and agent_types parameters are accepted from the CLI but never used in the serve function, making these flags ineffective. The range validation for agent_percentage should also be enforced at the CLI layer. However, the exact configuration key and structure for agent scheduling needs to be verified against the actual LLM backend implementation, as resort_policy_config does not appear in the codebase and the intended integration point is unclear.

tensorrt_llm/executor/worker.py (1)

1-5: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA copyright header.

This file is a TensorRT‑LLM source file and should carry the standard NVIDIA header with the year of latest meaningful modification (2026 here). Please copy the canonical header used elsewhere in the repo.

As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.

tensorrt_llm/llmapi/llm.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA copyright header.

This file is a TensorRT‑LLM source file and should carry the standard NVIDIA header with the year of latest meaningful modification (2026 here). Please copy the canonical header used elsewhere in the repo.

As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.

tensorrt_llm/scaffolding/scaffolding_llm.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA copyright header.

This file is a TensorRT‑LLM source file and should carry the standard NVIDIA header with the year of latest meaningful modification (2026 here). Please copy the canonical header used elsewhere in the repo.

As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
cpp/include/tensorrt_llm/executor/types.h (1)
1-2: ⚠️ Potential issue | 🟡 Minor

Update copyright year to include 2025.

The copyright header shows "2022-2024" but this file has been modified in 2025. As per coding guidelines, the copyright year should reflect the year of latest meaningful modification.
 /*
- * Copyright (c) 2022-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2022-2025, NVIDIA CORPORATION.  All rights reserved.

🤖 Fix all issues with AI agents

In `@tensorrt_llm/_torch/pyexecutor/_util.py`:
- Around line 881-890: The resort policy handling and creation of the
SimpleScheduler reference variables (resort_policy_config, capacity_scheduler,
mb_scheduler) that are only defined inside the else branch, causing a NameError
when Python scheduler path is taken; fix by moving the entire if
resort_policy_config is not None: ...
capacity_scheduler.impl.set_agent_tree_resort_policy(...) and the subsequent
scheduler = SimpleScheduler(capacity_scheduler, mb_scheduler) lines inside the
same else block where capacity_scheduler and mb_scheduler are defined (so that
llm_args.resort_policy_config, capacity_scheduler, mb_scheduler and
SimpleScheduler are all in scope).

In `@tensorrt_llm/serve/harmony_adapter.py`:
- Around line 1719-1739: The function _create_usage_info currently uses a Python
3.10+ union style hint for optional parameters; change the type annotations for
reasoning_text and tokenizer to use typing.Optional to remain Python 3.8+
compatible (import Optional from typing), e.g. annotate reasoning_text as
Optional[str] and tokenizer as Optional[<tokenizer-type>] in the
_create_usage_info signature and any related variable/type uses, and update any
other occurrences in this file that use the `X | None` syntax to Optional[X];
keep the function logic unchanged.

🟠 Major comments (32)

examples/scaffolding/bench_kv_cache_hints.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA copyright header.

This new source file is missing the mandatory NVIDIA copyright header with the latest modification year. Please add it before the module docstring.
📝 Proposed header
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
 """Benchmark TensorRT-LLM with KV cache hints for multi-turn conversations.
As per coding guidelines, all TensorRT-LLM source files (.py) should contain an NVIDIA copyright header with the year of latest meaningful modification.
examples/scaffolding/contrib/open_deep_research/run_deep_research.py-70-72 (1)
70-72: ⚠️ Potential issue | 🟠 Major

Use async_shutdown() for MCPWorker to properly clean up async resources.

The MCPWorker class has an async_shutdown() method that properly signals background tasks to stop and waits for their completion. Using synchronous shutdown() in an async context may not properly clean up the async resources.
🐛 Proposed fix
     llm.shutdown()
     generation_worker.shutdown()
-    mcp_worker.shutdown()
+    await mcp_worker.async_shutdown()
     return
examples/scaffolding/benchmarks/benchmark_utils.py-1-2 (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add the standard NVIDIA copyright header.

This new source file is missing the required NVIDIA copyright header with the year of latest meaningful modification. Please add the repo-standard header at the top.

As per coding guidelines, all TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
examples/scaffolding/benchmarks/multiround_chat_benchmark.py-1-12 (1)
1-12: ⚠️ Potential issue | 🟠 Major

Add NVIDIA copyright header.

The file is missing the required NVIDIA copyright/SPDX header.
📝 Proposed header
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 """Multi-turn conversation benchmark for scaffolding.
As per coding guidelines, "All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification".
tensorrt_llm/scaffolding/load_generation_strategy.py-1-4 (1)
1-4: ⚠️ Potential issue | 🟠 Major

Add NVIDIA copyright header.

The file is missing the required NVIDIA copyright/SPDX header.
📝 Proposed header
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 """Load generation strategies for TensorRT-LLM Scaffolding Benchmark.
As per coding guidelines, "All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification".
tensorrt_llm/scaffolding/load_generation_strategy.py-147-205 (1)
147-205: ⚠️ Potential issue | 🟠 Major

max_concurrency is ignored in ThroughputStrategy and ConstantRateStrategy.

Both classes define max_concurrency but never override get_semaphore, so the limit is never enforced.
🛠️ Suggested fix
 class ThroughputStrategy(LoadGenerationStrategy):
@@
     max_concurrency: Optional[int] = Field(default=None, description="Maximum concurrency limit")
 
+    def get_semaphore(self):
+        """Return semaphore with specified concurrency limit if set."""
+        import asyncio
+
+        if self.max_concurrency is not None:
+            return asyncio.Semaphore(self.max_concurrency)
+        return _UnlimitedSemaphore()
+
     async def _generate_request_times(self) -> AsyncGenerator[float, None]:
         """All requests sent at start time immediately."""
         init_time = self.start_time
         while True:
             yield init_time
 class ConstantRateStrategy(LoadGenerationStrategy):
@@
     max_concurrency: Optional[int] = Field(default=None, description="Maximum concurrency limit")
 
+    def get_semaphore(self):
+        """Return semaphore with specified concurrency limit if set."""
+        import asyncio
+
+        if self.max_concurrency is not None:
+            return asyncio.Semaphore(self.max_concurrency)
+        return _UnlimitedSemaphore()
+
     async def _generate_request_times(self) -> AsyncGenerator[float, None]:
         """Generate send times at fixed intervals."""
         constant_increment = 1.0 / self.rate
tensorrt_llm/scaffolding/scaffolding_llm.py-225-226 (1)
225-226: ⚠️ Potential issue | 🟠 Major

Respect the shutdown_workers flag for async shutdown.

async_shutdown() now runs unconditionally, which changes behavior even when callers pass shutdown_workers=False. Gate it behind the flag (or rename the parameter).
🛠️ Suggested fix
         async def stop_task_on_loop():
             await self.task_queue.put(None)
             await self.main_loop_stop_event.wait()
-            for worker in self.workers.values():
-                await worker.async_shutdown()
+            if shutdown_workers:
+                for worker in self.workers.values():
+                    await worker.async_shutdown()
examples/serve/kv_cache_hint.sh-38-52 (1)
38-52: ⚠️ Potential issue | 🟠 Major

Fix KV‑cache hint field names to match the server API.

The server expects messages_to_retain and messages (not context_to_retain / context_to_remove). The current payload will fail validation or be ignored.
🛠️ Suggested fix
   -d "{
         \"model\": \"Qwen3-8B\",
         \"action\": \"truncate\",
-        \"messages\": [],
-        \"context_to_retain\": [
+        \"messages_to_retain\": [
           {\"role\": \"system\", \"content\": \"$prompt\"}
         ],
-        \"context_to_remove\": [
+        \"messages\": [
           {\"role\": \"user\", \"content\": \"$prompt\"}
         ],
         \"temperature\": 0.8,
         \"top_p\": 0.95
       }"
tensorrt_llm/scaffolding/contrib/mcp/chat_handler.py-1-1 (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA copyright header.

This file is a TensorRT‑LLM source file and should carry the standard NVIDIA header with the year of latest meaningful modification (2026 here). Please copy the canonical header used elsewhere in the repo.

As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
tensorrt_llm/llmapi/llm.py-627-669 (1)
627-669: ⚠️ Potential issue | 🟠 Major

Validate action and normalize sampling_params before tokenization.

input_processor often expects SamplingParams; passing None can lead to inconsistent tokenization or errors. Also, validating action here gives a clearer API contract than deferring to lower layers.
🛠️ Suggested fix
     def set_kv_cache_hints(
             self,
             action: Literal["truncate"],
             messages_to_retain: Union[str, List[int]],
             messages: Union[str, List[int]],
             sampling_params: Optional[SamplingParams] = None) -> None:
-        '''Set KV cache hints.
-        '''
+        '''Set KV cache hints.
+
+        Args:
+            action: Only "truncate" is supported.
+            messages_to_retain: Prompt(s) to retain.
+            messages: Prompt(s) representing the full context.
+            sampling_params: Sampling parameters used for tokenization.
+        '''
+        if action != "truncate":
+            raise ValueError(f"Invalid action: {action}")
+        sampling_params = self._prepare_sampling_params(sampling_params)
         if isinstance(messages, str) and messages == "":
             return
         if isinstance(messages, list) and len(messages) == 0:
             return
As per coding guidelines: Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx.
tensorrt_llm/executor/worker.py-334-335 (1)
334-335: ⚠️ Potential issue | 🟠 Major

Guard KV-cache hint handling to avoid crashing the worker loop.

If set_kv_cache_hints raises, the leader loop will abort and stop serving. Wrap this branch and log errors to keep the worker alive.
🛠️ Suggested fix
                 elif isinstance(req, KVCacheHintRequest):
-                    worker.set_kv_cache_hints(req)
+                    try:
+                        worker.set_kv_cache_hints(req)
+                    except Exception as e:
+                        logger.error(f"set_kv_cache_hints failed: {e}")
+                        logger.error(traceback.format_exc())
cpp/include/tensorrt_llm/batch_manager/resortPolicy.h-44-49 (1)

44-49: ⚠️ Potential issue | 🟠 Major

Remove the convenience overload or refactor to avoid unnecessary conversions.

The list-based resortRequests() overload copies requests between std::list and std::vector containers on every invocation. Since this is called from CapacityScheduler::operator() (the main batch scheduling routine), the overhead of two container conversions per scheduling decision could be significant for large request batches. Callers should either use the vector-based API directly or the convenience method should be eliminated to avoid hidden performance costs.
tensorrt_llm/scaffolding/contrib/open_deep_research/tools.py-1-4 (1)
1-4: 🛠️ Refactor suggestion | 🟠 Major

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header.
📝 Proposed fix to add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from tensorrt_llm.scaffolding.task import OpenAIToolDescription
examples/scaffolding/openai_chat_client_for_agent_tree.py-1-6 (1)
1-6: 🛠️ Refactor suggestion | 🟠 Major

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification.
📝 Proposed fix to add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from openai import OpenAI
tensorrt_llm/_torch/pyexecutor/py_executor.py-1665-1692 (1)
1665-1692: ⚠️ Potential issue | 🟠 Major

Potential issues in control queue processing.

Missing null check for kv_cache_manager: Line 1689 calls self.kv_cache_manager.truncate_blocks() without verifying that kv_cache_manager is not None. This could raise an AttributeError if KV cache is not configured.

Use TypeError for invalid type: Per the static analysis hint, TypeError is more appropriate than ValueError for invalid type checks.

qsize() is unreliable for thread safety: The qsize() method is not reliable in multi-threaded contexts. Consider using a try/except pattern with get_nowait() instead.
🐛 Proposed fix
     def _sync_and_process_control_queue(self):
         """
         Synchronizes and processes control queue items across all ranks.

         This method ensures that control queue items (like TruncateKVCacheRequest)
         are broadcast from rank 0 to all other ranks, so that all ranks execute
         the same control operations for consistency (e.g., KV cache truncation).
         """
         # Rank 0 collects items from the control queue
         if self.dist.rank == 0:
             control_requests = []
-            while self.control_queue.qsize() > 0:
-                request = self.control_queue.get_nowait()
-                if request is not None:
-                    control_requests.append(request)
+            while True:
+                try:
+                    request = self.control_queue.get_nowait()
+                    if request is not None:
+                        control_requests.append(request)
+                except queue.Empty:
+                    break
         else:
             control_requests = None

         # Broadcast control requests to all ranks
         control_requests = self.dist.broadcast(control_requests, root=0)

         # All ranks process the control requests
         for request in control_requests:
             if isinstance(request, TruncateKVCacheRequest):
+                if self.kv_cache_manager is None:
+                    logger.warning("KV cache manager is not configured, skipping truncate request")
+                    continue
                 self.kv_cache_manager.truncate_blocks(
                     request.messages, len(request.messages_to_retain))
             else:
-                raise ValueError(f"Invalid request type: {type(request)}.")
+                raise TypeError(f"Invalid request type: {type(request)}.")
tensorrt_llm/scaffolding/contrib/open_deep_research/researcher.py-1-12 (1)
1-12: ⚠️ Potential issue | 🟠 Major

Add required NVIDIA copyright header.

This new source file is missing the required NVIDIA SPDX header.
🛠️ Suggested fix
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
 import logging
 from dataclasses import dataclass, field
 from typing import List
As per coding guidelines, all TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
examples/scaffolding/benchmarks/__init__.py-1-5 (1)
1-5: ⚠️ Potential issue | 🟠 Major

Add required NVIDIA copyright header.

This new module is missing the required NVIDIA SPDX header.
🛠️ Suggested fix
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
 """Scaffolding benchmarks package.
As per coding guidelines, all TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
tensorrt_llm/serve/openai_server.py-493-528 (1)
493-528: ⚠️ Potential issue | 🟠 Major

Bug: convert_messages has incorrect return type handling and error propagation.

The inner function convert_messages has several issues:

Type hint mismatch: Declared return type is List[int] but it returns str (prompt) on success

Error handling bug: On exception, it returns a Response object, but callers at lines 519-520 use the result directly without checking for errors

Silent failure: If message conversion fails, the error Response is passed to self.llm.set_kv_cache_hints() as if it were valid prompt data

Additionally, line 493 has a style issue: missing space after = in documents =request.documents.
🐛 Proposed fix for error handling
-        documents =request.documents
+        documents = request.documents
         chat_template = request.chat_template
         chat_template_kwargs = request.chat_template_kwargs or {}

         from openai.types.chat import ChatCompletionMessageParam
-        def convert_messages(messages: List[ChatCompletionMessageParam]) -> List[int]:
+        def convert_messages(messages: List[ChatCompletionMessageParam]) -> Union[str, Response]:
             try:
                 conversation: List[ConversationMessage] = []
                 conversation, _, __ = parse_chat_messages_coroutines(messages, self.model_config, None)
                 prompt: str = apply_chat_template(
                     model_type=self.model_config.model_type,
                     tokenizer=self.tokenizer,
                     processor=self.processor,
                     conversation=conversation,
                     add_generation_prompt=add_generation_prompt,
                     mm_placeholder_counts=[],
                     tools=tool_dicts,
                     documents=documents,
                     chat_template=chat_template,
                     chat_template_kwargs=chat_template_kwargs,
                 )
                 return prompt
             except Exception as e:
                 logger.error(traceback.format_exc())
                 return self.create_error_response(str(e))

-        messages_to_retain = convert_messages(request.messages_to_retain) if request.messages_to_retain else []
-        messages = convert_messages(request.messages) if request.messages else []
+        messages_to_retain_result = convert_messages(request.messages_to_retain) if request.messages_to_retain else ""
+        if isinstance(messages_to_retain_result, Response):
+            return messages_to_retain_result
+        messages_to_retain = messages_to_retain_result
+
+        messages_result = convert_messages(request.messages) if request.messages else ""
+        if isinstance(messages_result, Response):
+            return messages_result
+        messages = messages_result
tensorrt_llm/scaffolding/contrib/open_deep_research/supervisor.py-52-57 (1)
52-57: ⚠️ Potential issue | 🟠 Major

SupervisorTask.create_from_prompt doesn’t populate input_str.

Line 52 sets user_prompt, but Supervisor.process reads input_str (Line 91). This leaves the prompt empty unless some other path sets it.
Suggested fix
     def create_from_prompt(prompt: str) -> "SupervisorTask":
         task = SupervisorTask()
         task.user_prompt = prompt
+        task.input_str = prompt
         task.research_brief = None
         task.final_report = None
         return task
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp-1457-1474 (1)
1457-1474: ⚠️ Potential issue | 🟠 Major

Truncation threshold is checked before updating matched‑token count.

Line 1457 checks numMatchedTokens before adding the current block’s tokens, which can keep one extra block (and release descendants from the wrong node). Update the counter first and use >= to trigger truncation at the boundary.
Suggested fix
-            if (numMatchedTokens > numTokensToKeep)
-            {
-                matchingBlock->setPriority(executor::KvCacheRetentionConfig::kMinRetentionPriority);
-                releaseChildren(matchingBlock);
-                break;
-            }
-
-            if (mEventManager)
-            {
-                mEventManager->enqueueUpdatedEvent(tle::KVCacheUpdatedData(matchingBlock->getHash())
-                                                       .priorityUpdated(matchingBlock->getPriority(),
-                                                           executor::KvCacheRetentionConfig::kMinRetentionPriority),
-                    mWindowSize);
-            }
-
-            numMatchedTokens += numMatched > 0 ? numMatched : blockKey.uniqueTokens.size();
+            numMatchedTokens += numMatched > 0 ? numMatched : blockKey.uniqueTokens.size();
+            if (numMatchedTokens >= numTokensToKeep)
+            {
+                matchingBlock->setPriority(executor::KvCacheRetentionConfig::kMinRetentionPriority);
+                releaseChildren(matchingBlock);
+                break;
+            }
+
+            if (mEventManager)
+            {
+                mEventManager->enqueueUpdatedEvent(tle::KVCacheUpdatedData(matchingBlock->getHash())
+                                                       .priorityUpdated(matchingBlock->getPriority(),
+                                                           executor::KvCacheRetentionConfig::kMinRetentionPriority),
+                    mWindowSize);
+            }
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp-1429-1474 (1)
1429-1474: ⚠️ Potential issue | 🟠 Major

Guard radix‑tree access with mCachedBlocksRootMutex.

Line 1441 reads/modifies mCachedBlocksRoot and then calls findMatchingBlock/releaseChildren without the mutex used in other radix‑tree operations, which risks races with concurrent cache updates.
Suggested fix
 void WindowBlockManager::truncateBlocks(LlmRequest::VecTokens const& targetTokens, SizeType32 numTokensToKeep)
 {
+    std::lock_guard<std::mutex> lock(mCachedBlocksRootMutex);
     auto numTokens = static_cast<SizeType32>(targetTokens.size());
     auto blockedTokens = chopVectorIntoBlocks<TokenIdType>(targetTokens, numTokens, mTokensPerBlock, true);
     SizeType32 numMatchedTokens = 0;
tensorrt_llm/scaffolding/contrib/open_deep_research/supervisor.py-143-155 (1)
143-155: ⚠️ Potential issue | 🟠 Major

complete_research only exits the inner loop.

Line 152 breaks the tool‑call loop, but the outer for _ in range(self.max_research_iter) continues, so planning keeps running after completion. Break the outer loop when complete_research is invoked.
Suggested fix
-            for tool_call in research_planning_task.messages[-1].tool_calls:
+            completed = False
+            for tool_call in research_planning_task.messages[-1].tool_calls:
                 tool_name = tool_call.function.name
                 arguments = json.loads(tool_call.function.arguments)
@@
                 elif tool_name == "complete_research":
-                    break
+                    completed = True
+                    break
+
+            if completed:
+                break
tensorrt_llm/scaffolding/worker.py-87-116 (1)
87-116: ⚠️ Potential issue | 🟠 Major

Remove blocking requests.post() from async function and add timeout.

The send_kv_cache_hint method is async but calls synchronous requests.post() directly, which blocks the event loop indefinitely without a timeout. Offload this call to a thread using asyncio.to_thread() and set a timeout.
Suggested fix
-            return requests.post(
-                url,
-                json=kv_cache_hint_params,
-                headers=headers,
-            )
+            return await asyncio.to_thread(
+                requests.post,
+                url,
+                json=kv_cache_hint_params,
+                headers=headers,
+                timeout=30,
+            )
tensorrt_llm/scaffolding/task_collection.py-131-132 (1)
131-132: ⚠️ Potential issue | 🟠 Major

Override get_global_info as a staticmethod to match TaskCollection.

Without @staticmethod, calling instance.get_global_info() raises TypeError because self is implicitly passed.
✅ Fix for all affected subclasses
 class ChatTokenCounter(TaskCollection):
@@
-    def get_global_info() -> Any:
+    `@staticmethod`
+    def get_global_info() -> Any:
         return ChatTokenCounter.statistics
@@
 class TaskTimer(TaskCollection):
@@
-    def get_global_info() -> Any:
+    `@staticmethod`
+    def get_global_info() -> Any:
         return TaskTimer.statistics
@@
 class QueryCollector(TaskCollection):
@@
-    def get_global_info() -> Any:
+    `@staticmethod`
+    def get_global_info() -> Any:
         with open(QueryCollector.file_name, 'w', encoding='utf-8') as f:
             json.dump(QueryCollector.query_dict,
                       f,
                       indent=4,
                       ensure_ascii=False)
         return None
Also applies to: 167-168, 186-192
cpp/include/tensorrt_llm/batch_manager/agentTree.h-17-18 (1)
17-18: ⚠️ Potential issue | 🟠 Major

Replace #pragma once with the required TRTLLM_ include guard.*

The guidelines mandate a preprocessor guard with the TRTLLM_<FILENAME>_H pattern.
🛡️ Required include guard
-#pragma once
+#ifndef TRTLLM_AGENTTREE_H
+#define TRTLLM_AGENTTREE_H
@@
-} // namespace agent_tree
-} // namespace tensorrt_llm::batch_manager
+} // namespace agent_tree
+} // namespace tensorrt_llm::batch_manager
+
+#endif // TRTLLM_AGENTTREE_H
As per coding guidelines: Use a preprocessor guard in C++ header files with the format TRTLLM_<FILENAME>_H, where the filename is in uppercase with no underscores, no prefix underscores, and no trailing underscores.

Also applies to: 337-338
tensorrt_llm/scaffolding/task_collection.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA copyright header.

This file currently has no header.
🔧 Suggested header
+#
+# Copyright (c) 2026, NVIDIA CORPORATION.  All rights reserved.
+#
 import json
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
cpp/tensorrt_llm/batch_manager/agentTree.cpp-293-301 (1)
293-301: ⚠️ Potential issue | 🟠 Major

Handle agentPercentage = -1.0 before constructing AgentChatbotNodeParams.

AgentTreeConfig documents -1.0 as a sentinel, but AgentChatbotNodeParams asserts ratio ∈ [0,1]. With the default, this path fails immediately.
✅ One possible fix (map sentinel to a policy ratio)
-    auto nodeParams = std::make_shared<AgentChatbotNodeParams>(config.agentPercentage, childNodeTypes);
+    auto agentRatio = config.agentPercentage;
+    if (agentRatio < 0.0F)
+    {
+        agentRatio = 0.5F; // TODO: pick a policy for "random" scheduling
+    }
+    auto nodeParams = std::make_shared<AgentChatbotNodeParams>(agentRatio, childNodeTypes);
tensorrt_llm/scaffolding/task.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA copyright header.

This file currently has no header.
🔧 Suggested header
+#
+# Copyright (c) 2026, NVIDIA CORPORATION.  All rights reserved.
+#
 import json
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
cpp/tensorrt_llm/batch_manager/agentTree.cpp-1-15 (1)
1-15: ⚠️ Potential issue | 🟠 Major

Update the copyright year to include 2026.

The file is newly added/modified in 2026 but the header stops at 2024.
🔧 Suggested update
- * Copyright (c) 2023-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2023-2026, NVIDIA CORPORATION.  All rights reserved.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
tensorrt_llm/scaffolding/__init__.py-1-1 (1)
1-1: ⚠️ Potential issue | 🟠 Major

Update the copyright year to include 2026.

This file has 2026 changes but the header still lists 2025.
🔧 Suggested update
-# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+# Copyright (c) 2025-2026, NVIDIA CORPORATION.  All rights reserved.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.
cpp/tensorrt_llm/batch_manager/agentTree.cpp-158-181 (1)
158-181: ⚠️ Potential issue | 🟠 Major

Avoid duplicating remaining requests when one list finishes.

When activeVectors == 1, you append the remaining requests but keep iterating the priority queue, which re‑adds the first of those remaining entries.
🐛 Suggested fix
             if (activeVectors == 1 && !pq.empty())
             {
                 auto [reqId, listIdx, itemIdx, req] = pq.top();
                 auto const* lastVec = nonEmptyVectors[listIdx];
                 result.insert(result.end(), lastVec->begin() + itemIdx, lastVec->end());
+                break;
             }
cpp/include/tensorrt_llm/batch_manager/agentTree.h-1-15 (1)
1-15: ⚠️ Potential issue | 🟠 Major

Update the copyright year to include 2026.

The header is 2023–2024 even though the file is newly added in 2026.
🔧 Suggested update
- * Copyright (c) 2023-2024, NVIDIA CORPORATION.  All rights reserved.
+ * Copyright (c) 2023-2026, NVIDIA CORPORATION.  All rights reserved.
As per coding guidelines: All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification.

🟡 Minor comments (24)

tests/unittest/scaffolding/test_mcp_worker.py-337-365 (1)
337-365: ⚠️ Potential issue | 🟡 Minor

Missing try/finally for worker cleanup.

Unlike other tests, test_multiple_calls_to_same_tool calls worker.shutdown() outside a try/finally block. If any assertion fails, the worker won't be properly cleaned up, potentially causing resource leaks or affecting subsequent tests.
Proposed fix
 `@pytest.mark.asyncio`
 async def test_multiple_calls_to_same_tool(mcp_server):
     sse_url = mcp_server.get_sse_url()
     worker = MCPWorker(urls=[sse_url])
     await worker.init_in_asyncio_event_loop()
 
-    task1 = MCPCallTask()
-    task1.tool_name = "add_numbers"
-    task1.args = json.dumps({"a": 10, "b": 20})
-
-    task2 = MCPCallTask()
-    task2.tool_name = "add_numbers"
-    task2.args = json.dumps({"a": 100, "b": 200})
-
-    status1 = await worker.run_task(task1)
-    status2 = await worker.run_task(task2)
-    assert status1 == TaskStatus.SUCCESS
-    assert status2 == TaskStatus.SUCCESS
-    result1 = int(task1.result_str)
-    result2 = int(task2.result_str)
-    assert result1 == 30, f"Expected 30, but got {result1}"
-    assert result2 == 300, f"Expected 300, but got {result2}"
-
-    print("✓ Test passed: Multiple calls to same tool successful")
-    print(f"  - Call 1: add_numbers(10, 20) = {result1}")
-    print(f"  - Call 2: add_numbers(100, 200) = {result2}")
-
-    worker.shutdown()
+    try:
+        task1 = MCPCallTask()
+        task1.tool_name = "add_numbers"
+        task1.args = json.dumps({"a": 10, "b": 20})
+
+        task2 = MCPCallTask()
+        task2.tool_name = "add_numbers"
+        task2.args = json.dumps({"a": 100, "b": 200})
+
+        status1 = await worker.run_task(task1)
+        status2 = await worker.run_task(task2)
+        assert status1 == TaskStatus.SUCCESS
+        assert status2 == TaskStatus.SUCCESS
+        result1 = int(task1.result_str)
+        result2 = int(task2.result_str)
+        assert result1 == 30, f"Expected 30, but got {result1}"
+        assert result2 == 300, f"Expected 300, but got {result2}"
+
+        print("✓ Test passed: Multiple calls to same tool successful")
+        print(f"  - Call 1: add_numbers(10, 20) = {result1}")
+        print(f"  - Call 2: add_numbers(100, 200) = {result2}")
+
+    finally:
+        worker.shutdown()
tests/unittest/scaffolding/test_mcp_worker.py-103-106 (1)
103-106: ⚠️ Potential issue | 🟡 Minor

Commented-out server readiness check may cause flaky tests.

The _wait_for_server call is commented out, which means tests proceed immediately after starting the server process. This creates a race condition where tests may run before the server is ready, potentially causing intermittent failures. Consider enabling the wait or adding an alternative readiness mechanism.
Proposed fix
         self.proc.start()
 
         # Wait for server to be ready
-        # self._wait_for_server(url=self.sse_url,
-        #                      timeout=self.MAX_SERVER_START_WAIT_S)
+        self._wait_for_server(url=self.sse_url,
+                              timeout=self.MAX_SERVER_START_WAIT_S)
tests/unittest/scaffolding/test_mcp_worker.py-253-253 (1)

253-253: ⚠️ Potential issue | 🟡 Minor

Fix inconsistent async call to synchronous method.

MCPWorker.shutdown() is a synchronous method (not async def). Line 436 in test_scaffolding_with_chat_mcp_controller incorrectly uses await mcp_worker.shutdown(). Remove the await to match the method signature and be consistent with all other test calls (lines 253, 282, 334, 364).
examples/scaffolding/contrib/open_deep_research/WordLlama/pyproject.toml-4-4 (1)
4-4: ⚠️ Potential issue | 🟡 Minor

Update placeholder description before merging.

The description field contains placeholder text that should be replaced with a meaningful description of the package's purpose.
Suggested fix
-description = "Add your description here"
+description = "MCP server for local knowledge base search used in Open Deep Research"
examples/scaffolding/contrib/open_deep_research/WordLlama/README.md-7-17 (1)
7-17: ⚠️ Potential issue | 🟡 Minor

Fix typo and add language specifier to code block.

Line 12: "Travily" should be "Tavily"

The fenced code block should specify a language (e.g., bash or shell) for proper syntax highlighting
Suggested fix
-```
+```bash
 python run_deep_research.py --enable_query_collector

 You will see query_result.json and copy query_result.json to this path.

-Kill the Travily serve.
+Kill the Tavily server.

 uv run wordllama_serve.py

 Then you can continue to run run_deep_research.py.
 ```
tensorrt_llm/scaffolding/contrib/open_deep_research/__init__.py-1-4 (1)
1-4: ⚠️ Potential issue | 🟡 Minor

Add NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification.
Suggested fix
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from .researcher import Researcher
 from .supervisor import Supervisor, create_open_deep_research_scaffolding_llm

-__all__ = ["Supervisor", "Researcher", "create_open_deep_research_scaffolding_llm"]
+__all__ = ["Researcher", "Supervisor", "create_open_deep_research_scaffolding_llm"]
As per coding guidelines: "All TensorRT-LLM source files (.cpp, .h, .cu, .py, and other source files) should contain an NVIDIA copyright header with the year of latest meaningful modification."
cpp/include/tensorrt_llm/batch_manager/microBatchScheduler.h-44-45 (1)

44-45: ⚠️ Potential issue | 🟡 Minor

Remove the orphan comment on line 44.

The comment references AgentTreeConfig from agentTree.h, but this type is never used in microBatchScheduler.h and there is no corresponding include for that header. This comment provides no value and should be removed.

examples/scaffolding/contrib/open_deep_research/WordLlama/pyproject.toml-6-6 (1)

6-6: ⚠️ Potential issue | 🟡 Minor

Document the Python 3.12 requirement rationale.

The declared dependencies (wordllama >=3.9, mcp >=3.10) do not require Python 3.12. Since the requires-python = ">=3.12" constraint is intentional but undocumented, add a comment in the pyproject.toml (or README) explaining why Python 3.12 is required—whether for the package's own code, compatibility, or project policy.
tensorrt_llm/scaffolding/contrib/open_deep_research/README.md-72-75 (1)
72-75: ⚠️ Potential issue | 🟡 Minor

Typo in script filename.

The script name appears to be misspelled. Based on the Tavily API context, it should likely be tavily.py instead of travily.py.
📝 Proposed fix
 cd examples/scaffolding/contrib/open_deep_research/TavilyMCP
 export TAVILY_API_KEY=<your_api_key>
-uv run travily.py
+uv run tavily.py
examples/scaffolding/contrib/open_deep_research/WordLlama/wordllama_serve.py-1-30 (1)

1-30: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header.
tensorrt_llm/scaffolding/contrib/open_deep_research/utils.py-1-11 (1)
1-11: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header.
📝 Proposed fix to add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from datetime import datetime
examples/scaffolding/contrib/open_deep_research/WordLlama/main.py-1-6 (1)
1-6: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification.
📝 Proposed fix to add copyright header
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 def main():
     print("Hello from wordllama!")
examples/scaffolding/contrib/open_deep_research/run_deep_research.py-1-17 (1)

1-17: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header.
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h-1696-1699 (1)
1696-1699: ⚠️ Potential issue | 🟡 Minor

Use //! style for the new truncateBlocks doc comments.

The new Doxygen blocks use ///, but the header rule calls for //! in this codebase.
✏️ Proposed fix
-    /// `@brief` Drop blocks from the KV cache manager.
-    /// `@param` targetTokens The target tokens with both prefix to keep and suffix to drop.
-    /// `@param` numTokensToKeep Number of tokens to keep from the end of the sequence.
+    //! `@brief` Drop blocks from the KV cache manager.
+    //! `@param` targetTokens The target tokens with both prefix to keep and suffix to drop.
+    //! `@param` numTokensToKeep Number of tokens to keep from the end of the sequence.
     virtual void truncateBlocks(LlmRequest::VecTokens const& targetTokens, SizeType32 numTokensToKeep) = 0;
-    /// `@brief` Drop blocks from the KV cache manager.
-    /// `@param` targetTokens The target tokens with both prefix to keep and suffix to drop.
-    /// `@param` numTokensToKeep Number of tokens to keep from the end of the sequence.
+    //! `@brief` Drop blocks from the KV cache manager.
+    //! `@param` targetTokens The target tokens with both prefix to keep and suffix to drop.
+    //! `@param` numTokensToKeep Number of tokens to keep from the end of the sequence.
     void truncateBlocks(LlmRequest::VecTokens const& targetTokens, SizeType32 numTokensToKeep) override;
As per coding guidelines, "Follow Doxygen rules for documenting new C++ class interfaces and function prototypes. Use //! for C++-style single-line comments and //!< for class members".

Also applies to: 2049-2052
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h-915-915 (1)
915-915: ⚠️ Potential issue | 🟡 Minor

Add Doxygen docs for the new public truncateBlocks APIs.

These public declarations are missing Doxygen blocks, which makes the interface harder to consume.
✏️ Proposed doc additions
+    //! \brief Drop blocks from the KV cache for this window.
+    //! \param targetTokens The target tokens with both prefix to keep and suffix to drop.
+    //! \param numTokensToKeep Number of tokens to keep from the end of the sequence.
     void truncateBlocks(LlmRequest::VecTokens const& targetTokens, SizeType32 numTokensToKeep);
+    //! \brief Drop blocks from the KV cache for a specific window size.
+    //! \param targetTokens The target tokens with both prefix to keep and suffix to drop.
+    //! \param numTokensToKeep Number of tokens to keep from the end of the sequence.
     void truncateBlocks(LlmRequest::VecTokens const& targetTokens, SizeType32 numTokensToKeep, SizeType32 windowSize);
As per coding guidelines, "Follow Doxygen rules for documenting new C++ class interfaces and function prototypes. Use //! for C++-style single-line comments and //!< for class members".

Also applies to: 1421-1421
tensorrt_llm/scaffolding/contrib/mcp/chat_handler.py-12-14 (1)
12-14: ⚠️ Potential issue | 🟡 Minor

Harden env parsing and use a Google‑style docstring.

int(os.environ.get(...)) will raise if the variable is set to non‑numeric values (e.g., "true"). Consider a tolerant parse and a structured docstring.
🛠️ Suggested fix
-def is_deterministic_mode():
-    """Check if SCAFFOLDING_DETERMINISTIC environment variable is set to enable deterministic inference."""
-    return int(os.environ.get("SCAFFOLDING_DETERMINISTIC", 0)) == 1
+def is_deterministic_mode() -> bool:
+    """Check if deterministic inference is enabled.
+
+    Returns:
+        bool: True if SCAFFOLDING_DETERMINISTIC is set to a truthy value.
+    """
+    value = os.environ.get("SCAFFOLDING_DETERMINISTIC", "0").strip().lower()
+    return value in {"1", "true", "yes", "y", "on"}
As per coding guidelines: Use Google-style docstrings for Python classes and functions, which can be parsed by Sphinx.
tests/unittest/scaffolding/test_worker.py-81-88 (1)
81-88: ⚠️ Potential issue | 🟡 Minor

Missing cleanup on assertion failure.

Unlike test_trtoai_worker_generation (lines 73-78), this test doesn't wrap the assertion in a try/except block. If the assertion fails, worker.shutdown() won't be called, potentially leaving resources unreleased.
🛡️ Proposed fix to ensure cleanup on failure
 `@pytest.mark.asyncio`(loop_scope="module")
 def test_trtoai_worker_chat(default_prompt, model_name, server):
     worker = create_trtoai_worker(model_name, server.get_async_client())
     task = ChatTask.create_from_messages([UserMessage(default_prompt)])
     task.max_tokens = 100
     status = asyncio.run(worker.run_task(task))
-    assert status == TaskStatus.SUCCESS, "Chat Task is not successful with TRTOpenaiWorker"
-    worker.shutdown()
+    try:
+        assert status == TaskStatus.SUCCESS, "Chat Task is not successful with TRTOpenaiWorker"
+    except AssertionError as e:
+        worker.shutdown()
+        server.__exit__(None, None, None)
+        raise e
+    worker.shutdown()
tests/unittest/scaffolding/test_bench.py-166-166 (1)
166-166: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f-string prefix.

The f-string f"before shutdown" has no placeholders. Use a regular string instead.
🔧 Proposed fix
-    print(f"before shutdown")
+    print("before shutdown")
examples/scaffolding/benchmarks/__main__.py-1-24 (1)

1-24: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header.

examples/scaffolding/benchmarks/chat_benchmark.py-1-14 (1)

1-14: ⚠️ Potential issue | 🟡 Minor

Missing NVIDIA copyright header.

Per coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header with the year of latest meaningful modification.
tensorrt_llm/scaffolding/contrib/open_deep_research/researcher.py-90-106 (1)
90-106: ⚠️ Potential issue | 🟡 Minor

Validate non-empty task list before indexing.

research_tasks[0] will throw if an empty list is passed.
🛠️ Suggested fix
     def process(self, research_tasks: List[ResearchTask], **kwargs):
+        if not research_tasks:
+            raise ValueError("Researcher requires at least one ResearchTask")
         chat_task = ChatTask.create_from_prompt(
             research_tasks[0].research_topic,
tensorrt_llm/scaffolding/contrib/open_deep_research/researcher.py-52-55 (1)
52-55: ⚠️ Potential issue | 🟡 Minor

Avoid assert for runtime validation in Compressor.process.

Asserts can be stripped; use an explicit exception for input validation.
🛠️ Suggested fix
-        assert len(tasks) == 1 and isinstance(tasks[0], ChatTask), (
-            "Compressor only supports one ChatTask"
-        )
+        if len(tasks) != 1 or not isinstance(tasks[0], ChatTask):
+            raise ValueError("Compressor only supports a single ChatTask")
tensorrt_llm/scaffolding/worker.py-394-396 (1)

394-396: ⚠️ Potential issue | 🟡 Minor

Remove the duplicate asyncio import.

Line 394 re-imports asyncio, which is already imported at the top and triggers F811. Clean up the duplicate import.
tensorrt_llm/scaffolding/contrib/open_deep_research/prompts.py-151-156 (1)
151-156: ⚠️ Potential issue | 🟡 Minor

Update reflection instruction to match the tool name.

Line 152 still says “tavily_search” even though the tool list now uses web_search, which can mislead the agent.
Suggested fix
-After each tavily_search call, use reflection to analyze the results:
+After each web_search call, use reflection to analyze the results:

WeiHaocheng · 2026-02-02T01:50:41Z

Hi @dongxuy04 @QiJune @Superjomn , could you help to review this PR?
In simple terms, this PR consists of three main parts.

The implementation of OpenDeepResearch has been added to Scaffolding.
In the batch schedule section, the logic for scheduling based on the job_id requested by the Agent and the hierarchical information has been implemented.
In the kvcache section, we have implemented a feature that allows users to explicitly drop their kvcache requests.

I think what we particularly need to review is the part where the core logic of trtllm has been modified. I'm not sure if there are any areas that introduce significant risks.

Thanks!

WeiHaocheng · 2026-02-02T01:50:47Z

/bot run

tensorrt-cicd · 2026-02-02T01:57:19Z

PR_Github #34384 [ run ] triggered by Bot. Commit: 5121edc

tensorrt-cicd · 2026-02-02T02:45:40Z

PR_Github #34384 [ run ] completed with state FAILURE. Commit: 5121edc
/LLM/main/L0_MergeRequest_PR pipeline #26529 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Boreas618 · 2026-03-02T15:54:48Z

/bot run

tensorrt-cicd · 2026-03-02T16:00:38Z

PR_Github #37355 [ run ] triggered by Bot. Commit: 34280f9 Link to invocation

tensorrt-cicd · 2026-03-02T18:42:55Z

PR_Github #37355 [ run ] completed with state SUCCESS. Commit: 34280f9
/LLM/main/L0_MergeRequest_PR pipeline #28916 completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Boreas618 · 2026-03-03T08:57:34Z

/bot run

Signed-off-by: Yi Sun <[email protected]>

WeiHaocheng · 2026-05-08T07:41:35Z

/bot run

tensorrt-cicd · 2026-05-08T07:47:22Z

PR_Github #47350 [ run ] triggered by Bot. Commit: bbe77b3 Link to invocation

tensorrt-cicd · 2026-05-08T12:44:34Z

PR_Github #47350 [ run ] completed with state SUCCESS. Commit: bbe77b3
/LLM/main/L0_MergeRequest_PR pipeline #37286 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Boreas618 · 2026-05-08T12:49:32Z

/bot run

tensorrt-cicd · 2026-05-08T12:55:02Z

PR_Github #47399 [ run ] triggered by Bot. Commit: bbe77b3 Link to invocation

tensorrt-cicd · 2026-05-08T17:02:27Z

PR_Github #47399 [ run ] completed with state SUCCESS. Commit: bbe77b3
/LLM/main/L0_MergeRequest_PR pipeline #37329 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Boreas618 · 2026-05-09T07:23:17Z

/bot run

tensorrt-cicd · 2026-05-09T07:28:45Z

PR_Github #47490 [ run ] triggered by Bot. Commit: 2c65a31 Link to invocation

tensorrt-cicd · 2026-05-09T08:13:03Z

PR_Github #47490 [ run ] completed with state SUCCESS. Commit: 2c65a31
/LLM/main/L0_MergeRequest_PR pipeline #37410 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Boreas618 · 2026-05-09T12:23:17Z

/bot run

tensorrt-cicd · 2026-05-09T12:28:41Z

PR_Github #47517 [ run ] triggered by Bot. Commit: 2e9ee7d Link to invocation

tensorrt-cicd · 2026-05-09T17:34:39Z

PR_Github #47517 [ run ] completed with state SUCCESS. Commit: 2e9ee7d
/LLM/main/L0_MergeRequest_PR pipeline #37434 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Boreas618 · 2026-05-09T19:43:43Z

/bot run

tensorrt-cicd · 2026-05-09T19:49:31Z

PR_Github #47536 [ run ] triggered by Bot. Commit: 2e9ee7d Link to invocation

tensorrt-cicd · 2026-05-10T03:17:38Z

PR_Github #47536 [ run ] completed with state SUCCESS. Commit: 2e9ee7d
/LLM/main/L0_MergeRequest_PR pipeline #37453 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

Boreas618 requested review from a team as code owners February 1, 2026 15:00

Boreas618 requested review from Superjomn, dongxuy04, laikhtewari and nv-guomingz February 1, 2026 15:00

svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Feb 1, 2026

coderabbitai Bot reviewed Feb 1, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/pyexecutor/_util.py Outdated

Comment thread tensorrt_llm/serve/harmony_adapter.py Outdated

WeiHaocheng reviewed Feb 2, 2026

View reviewed changes

Comment thread tensorrt_llm/serve/openai_protocol.py Outdated

WeiHaocheng requested review from QiJune and removed request for Superjomn February 2, 2026 01:43

WeiHaocheng reviewed Feb 2, 2026

View reviewed changes

Comment thread examples/scaffolding/contrib/open_deep_research/data/open_deepresearch_bench.json Outdated

WeiHaocheng changed the title ~~[None][feat] Enable joint optimization of agent applications and TensorRT-LLM with scaffolding~~ [None][feat] Enable combined optimization of agent applications and TensorRT-LLM with scaffolding Feb 4, 2026

dongxuy04 reviewed Feb 4, 2026

View reviewed changes

Comment thread cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp Outdated

Superjomn reviewed Feb 9, 2026

View reviewed changes

Comment thread tensorrt_llm/llmapi/llm.py Outdated

Comment thread tensorrt_llm/llmapi/llm_args.py Outdated

Comment thread tensorrt_llm/llmapi/llm_args.py Outdated

Boreas618 force-pushed the scaffolding-open-deep-research branch from 6044c4a to 498712c Compare March 2, 2026 15:39

Boreas618 force-pushed the scaffolding-open-deep-research branch from 34280f9 to 17f2c17 Compare March 3, 2026 08:57

Boreas618 requested a review from a team as a code owner March 3, 2026 08:57

Boreas618 added 12 commits May 8, 2026 15:41

fix: remove unnecessary sync between ranks

4bf6515

Signed-off-by: Yi Sun <[email protected]>

fix: trt backend w/ kv cache control plane

89c6665

Signed-off-by: Yi Sun <[email protected]>

refactor: KV cache control plane

1da4639

Signed-off-by: Yi Sun <[email protected]>

refactor: remove reasoning tokens

4f4aaa7

Signed-off-by: Yi Sun <[email protected]>

revert: legacy changes

b968933

Signed-off-by: Yi Sun <[email protected]>

fix: legacy postproc call

5be61bc

Signed-off-by: Yi Sun <[email protected]>

fix: pre-commit check

c662de3

Signed-off-by: Yi Sun <[email protected]>

fix: server hang caused by race condition

cee057d

Signed-off-by: Yi Sun <[email protected]>

fix: resource governor params leaked to trt backend

fb0b76f

Signed-off-by: Yi Sun <[email protected]>

fix: pre_commit

5749015

Signed-off-by: Yi Sun <[email protected]>

fix: truncate block test

fa924a9

Signed-off-by: Yi Sun <[email protected]>

refactor: KV cache hints based on new API

bbe77b3

Signed-off-by: Yi Sun <[email protected]>

Merge branch 'main' into scaffolding-open-deep-research

2c65a31

Merge branch 'main' into scaffolding-open-deep-research

2e9ee7d

peihu-nv mentioned this pull request May 18, 2026

[None][fix] Handle unset attention_dp_relax in ADP routers #14276

Merged

1 task

Conversation

Boreas618 commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Description

Test Coverage

PR Checklist

GitHub Bot Help

kill

skip

reuse-pipeline

Uh oh!

coderabbitai Bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

WeiHaocheng commented Feb 2, 2026

Uh oh!

WeiHaocheng commented Feb 2, 2026

Uh oh!

Uh oh!

tensorrt-cicd commented Feb 2, 2026

Uh oh!

tensorrt-cicd commented Feb 2, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Boreas618 commented Mar 2, 2026

Uh oh!

tensorrt-cicd commented Mar 2, 2026

Uh oh!

tensorrt-cicd commented Mar 2, 2026

Uh oh!

Boreas618 commented Mar 3, 2026

Uh oh!

WeiHaocheng commented May 8, 2026

Uh oh!

tensorrt-cicd commented May 8, 2026

Uh oh!

tensorrt-cicd commented May 8, 2026

Uh oh!

Boreas618 commented May 8, 2026

Uh oh!

tensorrt-cicd commented May 8, 2026

Uh oh!

tensorrt-cicd commented May 8, 2026

Uh oh!

Boreas618 commented May 9, 2026

Uh oh!

tensorrt-cicd commented May 9, 2026

Uh oh!

tensorrt-cicd commented May 9, 2026

Uh oh!

Boreas618 commented May 9, 2026

Uh oh!

tensorrt-cicd commented May 9, 2026

Uh oh!

tensorrt-cicd commented May 9, 2026

Uh oh!

Boreas618 commented May 9, 2026

Uh oh!

tensorrt-cicd commented May 9, 2026

Uh oh!

tensorrt-cicd commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Boreas618 commented Feb 1, 2026 •

edited

Loading

coderabbitai Bot commented Feb 1, 2026 •

edited

Loading