[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model #7789
Conversation
📝 Walkthrough

Adds GPT-OSS to docs and restructures feature matrices. Reuses pre-initialized attention metadata in the CLIP and SigLIP vision models. Refactors the HyperCLOVA-X multimodal input processor and embedding flow. Pads LLaVA-Next image patches for batching. Refactors the multimodal hashing/positioning pipeline. Adjusts a log level in the multimodal token position lookup.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Caller
    participant Registry as inputs.registry
    participant Proc as InputProcessor
    participant Utils as multimodal utils
    participant MM as MultimodalInput
    Caller->>Registry: multimodal_hashing_process(inputs)
    Note over Registry: Early compute
    Registry->>Proc: get_multimodal_hashes(...)
    Registry->>Proc: get_prompt_token_ids(...)
    alt No multimodal tokens
        Registry-->>Caller: (prompt_token_ids, None)
    else Has multimodal tokens
        Registry->>Proc: get_vocab_size(), get_mm_token_ids(), get_mm_special_token_ids()
        Registry->>Utils: find_mm_token_positions(prompt_token_ids, vocab_size/mm_ids/...)
        Note over Registry: Flatten + int32 cast
        Registry->>MM: from_components(int32_hashes, start_positions, counts, offsets)
        Registry-->>Caller: (prompt_token_ids, { "multimodal_input": MM })
    end
```
```mermaid
sequenceDiagram
    autonumber
    actor User
    participant HCX as HCXVisionForCausalLM
    participant Enc as MM Encoder
    participant Utils as modeling_multimodal_utils
    User->>HCX: forward(input_ids, multimodal_params)
    alt DISAGG enabled
        HCX-->>User: NotImplementedError
    else Non-DISAGG
        HCX->>Utils: get_multimodal_embeddings(encoder_forward_fn=Enc.forward, multimodal_params[:N])
        Utils->>Enc: forward(pixel_values,...)
        Utils-->>HCX: mm_embeds
        HCX->>Utils: find_input_mm_embeds(mm_embeds, multimodal_params[:N])
        Utils-->>HCX: mm_embeds (aligned)
        HCX-->>User: logits
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
tensorrt_llm/inputs/multimodal.py (3)
1-1: Add NVIDIA Apache-2.0 header (2025). Per repo guidelines, prepend the current-year NVIDIA Apache-2.0 header to all Python sources.

Apply:

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
626-629: Always return a tuple from find_mm_token_positions. Returning a single list breaks callers expecting a 2-tuple and will raise unpack errors in registry.py.
Apply:
```diff
-    if not torch.any(mm_mask):
-        return []
+    if not torch.any(mm_mask):
+        return [], []
```
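For illustration only, the invariant the fix enforces, in a standalone stdlib sketch (names hypothetical): the early-exit path must return the same shape as the main path so callers can always unpack.

```python
def find_positions(mask):
    """Return (positions, offsets) -- always a 2-tuple, even when empty."""
    if not any(mask):
        return [], []  # consistent shape; a bare `return []` breaks unpacking
    positions = [i for i, m in enumerate(mask) if m]
    offsets = list(range(len(positions)))
    return positions, offsets

# safe unpack on the empty path
pos, off = find_positions([False, False])
```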
150-193: UnboundLocalError risk for `remainder` in MultimodalRuntimeData. `remainder` is defined only when (num_unseen_mm_tokens or num_mm_tokens_in_chunk) is None; it is later used unconditionally.
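The underlying pitfall is Python's conditional-only binding: once any branch assigns a name, reading it on a path that skipped the assignment raises UnboundLocalError. A minimal standalone reproduction (hypothetical names):

```python
def compute(num_unseen=None):
    if num_unseen is None:
        remainder = 0              # bound only on this branch
        num_unseen = remainder
    return num_unseen + remainder  # unbound read when num_unseen was provided

try:
    compute(num_unseen=5)          # skips the branch -> remainder never bound
except UnboundLocalError:
    pass                           # the bug the review flags
```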
Apply:
```diff
-        if self.num_unseen_mm_tokens is None or self.num_mm_tokens_in_chunk is None:
+        remainder = 0
+        if self.num_unseen_mm_tokens is None or self.num_mm_tokens_in_chunk is None:
             # Compute cached multimodal tokens based on positions and cached tokens
             self.num_unseen_mm_tokens = 0
             self.num_mm_tokens_in_chunk = 0
-            remainder = 0
             for pos, length in zip(self.mm_token_positions, self.mm_token_lengths):
                 ...
+        else:
+            # Ensure defined when provided by caller
+            remainder = 0
```

tensorrt_llm/_torch/models/modeling_llava_next.py (2)
1-1: Add NVIDIA Apache-2.0 header (2025). Add the standard header per guidelines.

Apply:

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
5-11: Import torch.nn.functional. Undefined name `F` in `_pad_for_batching`.
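As a standalone illustration of the pattern `F` is needed for, here is a stdlib-only analog of padding per-image patch rows to a common length so they can be batched (the real code uses torch.nn.functional.pad; these names are hypothetical):

```python
def pad_for_batching(patch_lists, pad_value=0):
    """Pad variable-length per-image patch rows to one common length."""
    max_len = max(len(p) for p in patch_lists)
    return [p + [pad_value] * (max_len - len(p)) for p in patch_lists]

# two images with 2 and 3 patches -> both padded to length 3, so they stack
batch = pad_for_batching([[1, 2], [3, 4, 5]])
```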
Apply:
```diff
 import torch
 import torch.nn as nn
+import torch.nn.functional as F
```

tensorrt_llm/inputs/registry.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025). Add the standard header per guidelines.

Apply:

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```

tensorrt_llm/_torch/models/modeling_hyperclovax.py (2)
1-1: Add NVIDIA Apache-2.0 header (2025). Add the standard header per guidelines.

Apply:

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
605-608: Use typing.Dict/Any for Python 3.8 compatibility. Replace `dict[str, any]` with `Dict[str, Any]`.

Apply:

```diff
-    def _post_process(self,
-                      input_ids: torch.Tensor,
-                      preprocessed_image: dict[str, any] = None):
+    def _post_process(self,
+                      input_ids: torch.Tensor,
+                      preprocessed_image: Dict[str, Any] = None):
 @@
-    def _preprocess(self, text_prompt: dict[str, any], images: List[Any],
+    def _preprocess(self, text_prompt: Dict[str, Any], images: List[Any],
                     mm_processor_kwargs: Dict[str, Any]):
```

Also applies to: 667-669
tensorrt_llm/_torch/models/modeling_clip.py (1)
1-1: Add NVIDIA Apache-2.0 header (2025) at top of file. Required by repo guidelines.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
1-1: Add NVIDIA Apache-2.0 header (2025) at top of file. Required by repo guidelines.

```diff
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
```
7-9: Duplicate/conflicting import of SiglipVisionConfig. `SiglipVisionConfig` is imported from both configuration_siglip and modeling_siglip; the latter likely shadows the config class. Keep the config import only from the configuration module.

```diff
-from transformers.models.siglip.configuration_siglip import SiglipVisionConfig
-from transformers.models.siglip.modeling_siglip import (SiglipVisionConfig,
-                                                        SiglipVisionEmbeddings)
+from transformers.models.siglip.configuration_siglip import SiglipVisionConfig
+from transformers.models.siglip.modeling_siglip import SiglipVisionEmbeddings
```
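The shadowing mechanics can be seen with two stdlib modules that happen to export the same name; the later `from ... import` silently wins:

```python
from collections import OrderedDict   # concrete class
from typing import OrderedDict        # generic alias -- silently shadows the first

import collections
import typing

# Only the last binding survives: code meaning "the concrete class" would
# actually get whatever the later module exported under that name.
shadowed = OrderedDict
```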
🧹 Nitpick comments (7)
tensorrt_llm/inputs/multimodal.py (1)
648-656: Clarify semantics of special token positions. Returned start_special_token_positions are offsets within the flattened MM-token union, not absolute prompt indices. Consider renaming to special_token_offsets or documenting this explicitly to avoid misuse.
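The distinction, in a small standalone sketch (vocab threshold and token ids here are assumptions, not the model's real values): an offset into the flattened MM-token union is generally a different number from the token's absolute prompt index.

```python
VOCAB_SIZE = 32000   # hypothetical: ids >= VOCAB_SIZE are multimodal
SPECIAL_ID = 32001   # hypothetical special MM token
prompt = [101, 5, 32000, 32000, 7, SPECIAL_ID, 9]

# absolute prompt indices of all multimodal tokens
abs_positions = [i for i, t in enumerate(prompt) if t >= VOCAB_SIZE]

# offsets of the special token *within the flattened MM-token union*
mm_union = [t for t in prompt if t >= VOCAB_SIZE]
special_offsets = [i for i, t in enumerate(mm_union) if t == SPECIAL_ID]
# the special token sits at prompt index 5 but MM-union offset 2
```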
docs/source/models/supported-models.md (1)
47-58: Table consistency check. Ensure the modality/feature values reflect actual runtime support (e.g., KV cache reuse states) for each class; discrepancies here create user confusion.
tensorrt_llm/_torch/models/modeling_hyperclovax.py (2)
595-601: Silence the linter for unused parameters or use underscores. Minor cleanup.

Apply:

```diff
-    def get_num_tokens_per_image(
-        self,
-        image: Image.Image,
-        **kwargs,
-    ):
+    def get_num_tokens_per_image(
+        self,
+        _image: Image.Image,
+        **_kwargs,
+    ):
         return self.vision_query_lengths[0].pop(0)
```
1059-1062: Remove extraneous f-prefix. No interpolation; keep a plain string.

Apply:

```diff
-        raise NotImplementedError(
-            "HCXVisionForCausalLM does not support disaggregated inference yet. Please unset "
-            f"the TLLM_MULTIMODAL_DISAGGREGATED environment variable, or set it to '0'."
-        )
+        raise NotImplementedError(
+            "HCXVisionForCausalLM does not support disaggregated inference yet. Please unset "
+            "the TLLM_MULTIMODAL_DISAGGREGATED environment variable, or set it to '0'."
+        )
```

tensorrt_llm/_torch/models/modeling_clip.py (2)
187-193: Pre-allocating a mutable attn_metadata on the model can be racy across concurrent forwards. If this model is shared across threads/requests, mutating a single instance is not thread-safe. Either document single-use/serialized access or scope attn_metadata per runner/engine instance.
118-121: Derive the view shape from num_contexts/max_seq_len, not the seq_lens buffer length. Preallocated buffers may have capacity greater than the batch size. Use explicit metadata fields.

```diff
-        encoder_states = encoder_states + (hidden_states.view(
-            attn_metadata.seq_lens.shape[0], attn_metadata.seq_lens[0],
-            -1), )
+        encoder_states = encoder_states + (hidden_states.view(
+            attn_metadata.num_contexts, attn_metadata.max_seq_len, -1), )
 @@
-        encoder_states = encoder_states + (hidden_states.view(
-            attn_metadata.seq_lens.shape[0], attn_metadata.seq_lens[0], -1), )
+        encoder_states = encoder_states + (hidden_states.view(
+            attn_metadata.num_contexts, attn_metadata.max_seq_len, -1), )
```

Also applies to: 130-132
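A standalone sketch of the hazard (plain lists, hypothetical numbers): when the preallocated buffer has more slots than live requests, shaping from buffer capacity over-counts, while shaping from explicit batch metadata is exact.

```python
def reshape_tokens(flat_tokens, num_contexts, max_seq_len):
    """Reshape a flat token run using explicit batch metadata."""
    assert len(flat_tokens) == num_contexts * max_seq_len
    return [flat_tokens[i * max_seq_len:(i + 1) * max_seq_len]
            for i in range(num_contexts)]

buffer_capacity = 8               # preallocated slots, larger than the live batch
num_contexts, max_seq_len = 2, 4  # the actual batch
flat = list(range(num_contexts * max_seq_len))
batched = reshape_tokens(flat, num_contexts, max_seq_len)
# shaping from buffer_capacity would demand 8 * 4 = 32 tokens and fail
```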
tensorrt_llm/_torch/models/modeling_siglip.py (1)
80-86: Same shared-state caveat as CLIP: attn_metadata on the model is mutable. Not thread-safe if the model is shared across requests. Clarify usage or scope per runner/engine.
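One conventional remedy, sketched as a hypothetical wrapper rather than the repo's actual API: serialize forwards that mutate the shared metadata behind a lock.

```python
import threading

class SerializedVisionRunner:
    """Wrap a forward fn whose shared, preallocated metadata is mutated per call."""

    def __init__(self, forward_fn):
        self._forward_fn = forward_fn
        self._lock = threading.Lock()
        self.seq_len = None              # stands in for shared attn_metadata state

    def forward(self, tokens):
        with self._lock:                 # serialize metadata mutation + compute
            self.seq_len = len(tokens)
            return self._forward_fn(tokens)

runner = SerializedVisionRunner(lambda toks: [t * 2 for t in toks])
out = runner.forward([1, 2, 3])
```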
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- docs/source/models/supported-models.md (2 hunks)
- tensorrt_llm/_torch/models/modeling_clip.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_hyperclovax.py (7 hunks)
- tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
- tensorrt_llm/_torch/models/modeling_siglip.py (2 hunks)
- tensorrt_llm/inputs/multimodal.py (1 hunks)
- tensorrt_llm/inputs/registry.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
- tensorrt_llm/inputs/multimodal.py
- tensorrt_llm/_torch/models/modeling_clip.py
- tensorrt_llm/_torch/models/modeling_siglip.py
- tensorrt_llm/inputs/registry.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
- tensorrt_llm/inputs/multimodal.py
- tensorrt_llm/_torch/models/modeling_clip.py
- tensorrt_llm/_torch/models/modeling_siglip.py
- tensorrt_llm/inputs/registry.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
- tensorrt_llm/inputs/multimodal.py
- tensorrt_llm/_torch/models/modeling_clip.py
- tensorrt_llm/_torch/models/modeling_siglip.py
- tensorrt_llm/inputs/registry.py
- tensorrt_llm/_torch/models/modeling_hyperclovax.py
- tensorrt_llm/_torch/models/modeling_llava_next.py
🧬 Code graph analysis (5)
tensorrt_llm/inputs/multimodal.py (1)
- tensorrt_llm/logger.py (1): debug (144-145)
tensorrt_llm/_torch/models/modeling_clip.py (3)
- tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (1): attn_metadata (96-97)
- tensorrt_llm/_torch/speculative/eagle3.py (2): prepare (123-166), prepare (244-254)
- tensorrt_llm/_torch/speculative/mtp.py (1): prepare (161-209)
tensorrt_llm/_torch/models/modeling_siglip.py (4)
- tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (1): attn_metadata (96-97)
- tensorrt_llm/_torch/speculative/interface.py (1): prepare (172-175)
- tensorrt_llm/_torch/speculative/eagle3.py (2): prepare (123-166), prepare (244-254)
- tensorrt_llm/_torch/speculative/mtp.py (1): prepare (161-209)
tensorrt_llm/inputs/registry.py (5)
- tensorrt_llm/inputs/multimodal.py (7): apply_mm_hashes (478-510), find_mm_token_lengths (529-572), find_mm_token_positions (575-656), validate_mm_inputs (659-702), hexdigest_to_int32 (513-526), MultimodalInput (19-85), from_components (72-77)
- tensorrt_llm/runtime/model_runner_cpp.py (1): vocab_size (489-490)
- tensorrt_llm/_torch/models/modeling_hyperclovax.py (2): get_vocab_size (592-593), get_mm_token_ids (602-603)
- tensorrt_llm/_torch/models/modeling_gemma3vl.py (1): mm_token_ids (291-292)
- tensorrt_llm/_torch/models/modeling_mistral.py (1): mm_token_ids (506-507)
tensorrt_llm/_torch/models/modeling_hyperclovax.py (3)
- tensorrt_llm/inputs/multimodal.py (1): MultimodalParams (196-455)
- tensorrt_llm/inputs/registry.py (5): BaseMultimodalInputProcessor (45-178), InputProcessor (22-42), get_vocab_size (64-84), get_num_tokens_per_image (130-148), get_mm_token_ids (86-100)
- tensorrt_llm/_torch/models/modeling_multimodal_utils.py (3): find_input_mm_embeds (158-233), fuse_input_embeds (276-330), get_multimodal_embeddings (99-155)
🪛 Ruff (0.12.2)
tensorrt_llm/inputs/registry.py
480-482: Avoid specifying long messages outside the exception class
(TRY003)
tensorrt_llm/_torch/models/modeling_hyperclovax.py
597-597: Unused method argument: image
(ARG002)
598-598: Unused method argument: kwargs
(ARG002)
1061-1061: f-string without any placeholders
Remove extraneous f prefix
(F541)
tensorrt_llm/_torch/models/modeling_llava_next.py
375-375: Undefined name F
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/inputs/multimodal.py (2)
606-608: Log level downgrade LGTM. Demoting to debug reduces noise; in this repo, logger.debug maps to VERBOSE. No action needed.
529-538: Fix return annotation and stale comment in find_mm_token_lengths. The function returns a dict, not List[int]; the comment is misleading.

Apply:

```diff
-def find_mm_token_lengths(mm_data: Dict[str, Any],
-                          input_processor: Any) -> List[int]:
+def find_mm_token_lengths(mm_data: Dict[str, Any],
+                          input_processor: Any) -> Dict[str, List[int]]:
 @@
-    return num_mm_tokens  # flatten all mm instances to a single list
+    return num_mm_tokens  # mapping: modality -> list of lengths
```

Also applies to: 572-573
docs/source/models/supported-models.md (1)
13-13: Verify GPT-OSS listing. Please confirm that `GptOssForCausalLM` exists in the codebase and has a working loader; otherwise gate the row behind a release flag or remove it until available.

tensorrt_llm/inputs/registry.py (1)
503-506: Attach MultimodalInput robustly. This block assumes extra_processed_inputs is a dict (after the guards above). Looks good once the guards are in.
tensorrt_llm/_torch/models/modeling_siglip.py (1)
93-105: Confirm max_seq_len semantics. You switched to `max_seq_len = seq_len` (per-request maximum). Verify the backend expects the per-request max rather than the total flattened tokens; a mismatch can under-allocate.
Force-pushed from 9e58dec to 40f1830
/bot run

PR_Github #20235 [ run ] triggered by Bot

PR_Github #20235 [ run ] completed with state

/bot run

PR_Github #20331 [ run ] triggered by Bot

PR_Github #20331 [ run ] completed with state
chang-l left a comment:
LGTM
Thank you for the MR, @yechank-nvidia. Is there a good way to test this change? If there's an existing test harness, we should probably parameterize it to test the models being touched in this MR?
Signed-off-by: yechank <[email protected]>
Signed-off-by: yechank <[email protected]>
Signed-off-by: yechank <[email protected]>
Signed-off-by: yechank <[email protected]>
Force-pushed from 47a8c65 to c0e9ed1
/bot run

PR_Github #21847 [ run ] triggered by Bot. Commit:

PR_Github #21847 [ run ] completed with state
brb-nv left a comment:
LGTM.
…DIA#7789) Signed-off-by: yechank <[email protected]>
…DIA#7789) Signed-off-by: yechank <[email protected]> Signed-off-by: yufeiwu-nv <[email protected]>
Summary by CodeRabbit

- New Features
- Bug Fixes
- Documentation
- Refactor
- Chores