@yechank-nvidia yechank-nvidia commented Sep 17, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced multimodal processing for HCXVision, including image token counting and unified embedding utilities.
  • Bug Fixes

    • LLaVA-Next now pads variable-patch images for stable batching and shape consistency.
  • Documentation

    • Updated supported models with GPT-OSS (PyTorch backend).
    • Restructured feature matrices for clearer model/feature coverage.
  • Refactor

    • Vision backends (CLIP, SigLIP) reuse pre-initialized attention metadata for efficiency.
    • Streamlined multimodal hashing and position detection; early exits when no multimodal tokens.
    • Disaggregated multimodal embeddings temporarily disabled with a clear error.
  • Chores

    • Reduced noisy multimodal logging (warning → debug).

@yechank-nvidia yechank-nvidia marked this pull request as ready for review September 17, 2025 06:42
@yechank-nvidia yechank-nvidia requested review from a team as code owners September 17, 2025 06:42
@yechank-nvidia yechank-nvidia self-assigned this Sep 17, 2025
coderabbitai bot commented Sep 17, 2025

📝 Walkthrough

Walkthrough

Adds GPT-OSS to docs and restructures feature matrices. Reuses pre-initialized attention metadata in CLIP and SigLIP vision models. Refactors HyperCLOVA-X multimodal input processor and embedding flow. Pads LLaVA-Next image patches for batching. Refactors multimodal hashing/positioning pipeline. Adjusts a log level in multimodal token position lookup.

Changes

  • Docs: supported models and feature matrices (docs/source/models/supported-models.md)
    Added GptOssForCausalLM to the PyTorch backend listings. Simplified and retitled feature-matrix columns; reformatted model rows with code-formatted names across matrices.
  • Vision attention metadata reuse (tensorrt_llm/_torch/models/modeling_clip.py, tensorrt_llm/_torch/models/modeling_siglip.py)
    Introduced a reusable attn_metadata at init; prepare_attn_metadata now mutates and prepares it per batch (pinned seq_lens, updated fields). Adjusted max_seq_len handling.
  • HyperCLOVA-X multimodal processor/flow (tensorrt_llm/_torch/models/modeling_hyperclovax.py)
    HCXVisionInputProcessor now inherits BaseMultimodalInputProcessor; adds vision_query_lengths, get_vocab_size, get_num_tokens_per_image, get_mm_token_ids. Forward uses get_multimodal_embeddings; the DISAGG path raises NotImplementedError. Uses find_input_mm_embeds post-embed. Minor formatting updates.
  • LLaVA-Next batching padding (tensorrt_llm/_torch/models/modeling_llava_next.py)
    Added _pad_for_batching to zero-pad pixel values so all images have equal patch counts; applied before concatenation in forward.
  • Multimodal hashing/positions pipeline (tensorrt_llm/inputs/registry.py)
    Reordered flow: compute mm_hashes and prompt_token_ids early; early-return when there are no MM tokens; compute positions via find_mm_token_positions; flatten and cast hashes; construct MultimodalInput and attach it to extras. Preserves fallback behavior on failures.
  • Logging level tweak (tensorrt_llm/inputs/multimodal.py)
    Downgraded a warning to debug in find_mm_token_positions when mm_token_ids supersede vocab_size.
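The "flatten and cast hashes" step in the registry pipeline can be illustrated with a small sketch. This is a hypothetical reconstruction of what a hexdigest_to_int32 helper might do (the actual chunking in tensorrt_llm/inputs/multimodal.py may differ):

```python
import hashlib


def hexdigest_to_int32(hexdigest):
    """Split a hex digest into 8-hex-char chunks and cast each to signed int32."""
    values = []
    for i in range(0, len(hexdigest), 8):
        v = int(hexdigest[i:i + 8], 16)
        if v >= 2**31:  # wrap the unsigned 32-bit value into the signed range
            v -= 2**32
        values.append(v)
    return values


# A SHA-256 hexdigest is 64 hex characters, so it yields eight int32 values.
digest = hashlib.sha256(b"image-bytes").hexdigest()
ints = hexdigest_to_int32(digest)
assert len(ints) == 8 and all(-2**31 <= v < 2**31 for v in ints)
```

Casting to int32 matters because the hashes are ultimately packed into an integer tensor alongside the other MultimodalInput components.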

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Registry as inputs.registry
  participant Proc as InputProcessor
  participant Utils as multimodal utils
  participant MM as MultimodalInput

  Caller->>Registry: multimodal_hashing_process(inputs)
  Note over Registry: Early compute
  Registry->>Proc: get_multimodal_hashes(...)
  Registry->>Proc: get_prompt_token_ids(...)
  alt No multimodal tokens
    Registry-->>Caller: (prompt_token_ids, None)
  else Has multimodal tokens
    Registry->>Proc: get_vocab_size(), get_mm_token_ids(), get_mm_special_token_ids()
    Registry->>Utils: find_mm_token_positions(prompt_token_ids, vocab_size/mm_ids/...) 
    Note over Registry: Flatten + int32 cast
    Registry->>MM: from_components(int32_hashes, start_positions, counts, offsets)
    Registry-->>Caller: (prompt_token_ids, { "multimodal_input": MM })
  end
sequenceDiagram
  autonumber
  actor User
  participant HCX as HCXVisionForCausalLM
  participant Enc as MM Encoder
  participant Utils as modeling_multimodal_utils

  User->>HCX: forward(input_ids, multimodal_params)
  alt DISAGG enabled
    HCX-->>User: NotImplementedError
  else Non-DISAGG
    HCX->>Utils: get_multimodal_embeddings(encoder_forward_fn=Enc.forward, multimodal_params[:N])
    Utils->>Enc: forward(pixel_values,...)
    Utils-->>HCX: mm_embeds
    HCX->>Utils: find_input_mm_embeds(mm_embeds, multimodal_params[:N])
    Utils-->>HCX: mm_embeds (aligned)
    HCX-->>User: logits
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description Check ⚠️ Warning — The pull request has no description ("No pull request description was added by the author"), so it does not follow the repository's required PR template and is missing the Description, Test Coverage, and PR Checklist information reviewers need to assess rationale and tests. Resolution: populate the PR description using the repository template: a short summary of the change and why, the relevant tests and how they cover the change, confirmation of the PR checklist items (coding guidelines, tests, docs, CODEOWNERS, etc.), and the appropriate ticket/issue reference or "[None]".
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 22.58%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title Check ✅ Passed — The title "[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model" is concise and directly reflects the branch intent (hyperclovax_kvcache_reuse) and the modeling_hyperclovax changes in the diff; note it contains a typographical error ("cahce" should be "cache").



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (11)
tensorrt_llm/inputs/multimodal.py (3)

1-1: Add NVIDIA Apache-2.0 header (2025).

Per repo guidelines, prepend the current-year NVIDIA Apache-2.0 header to all Python sources.

Apply:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

626-629: Always return a tuple from find_mm_token_positions.

Returning a single list breaks callers expecting a 2‑tuple; will raise unpack errors in registry.py.

Apply:

-    if not torch.any(mm_mask):
-        return []
+    if not torch.any(mm_mask):
+        return [], []
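To see why the single-list return breaks callers, here is a simplified stand-in for find_mm_token_positions that always returns a (starts, lengths) pair, even when no multimodal tokens are found; the real helper also consults vocab_size and special-token ids:

```python
def find_mm_token_positions(token_ids, mm_token_ids):
    """Return (start_positions, lengths) of contiguous multimodal-token runs.

    Simplified illustrative stand-in, not the repo's implementation.
    """
    mm_set = set(mm_token_ids)
    starts, lengths = [], []
    i = 0
    while i < len(token_ids):
        if token_ids[i] in mm_set:
            start = i
            # Scan to the end of this contiguous run of MM tokens.
            while i < len(token_ids) and token_ids[i] in mm_set:
                i += 1
            starts.append(start)
            lengths.append(i - start)
        else:
            i += 1
    return starts, lengths  # always a 2-tuple, so callers can unpack safely


positions, lengths = find_mm_token_positions([1, 9, 9, 2, 9], mm_token_ids=[9])
# → positions == [1, 4], lengths == [2, 1]
```

Returning ([], []) in the empty case keeps the unpacking in registry.py working without a special branch.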

150-193: UnboundLocalError risk for remainder in MultimodalRuntimeData.

remainder is defined only when (num_unseen_mm_tokens or num_mm_tokens_in_chunk) is None; later used unconditionally.

Apply:

-        if self.num_unseen_mm_tokens is None or self.num_mm_tokens_in_chunk is None:
+        remainder = 0
+        if self.num_unseen_mm_tokens is None or self.num_mm_tokens_in_chunk is None:
             # Compute cached multimodal tokens based on positions and cached tokens
             self.num_unseen_mm_tokens = 0
             self.num_mm_tokens_in_chunk = 0
-            remainder = 0
             for pos, length in zip(self.mm_token_positions,
                                    self.mm_token_lengths):
                 ...
+        else:
+            # Ensure defined when provided by caller
+            remainder = 0
tensorrt_llm/_torch/models/modeling_llava_next.py (2)

1-1: Add NVIDIA Apache-2.0 header (2025).

Add the standard header per guidelines.

Apply:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

5-11: Import torch.nn.functional.

Undefined name F in _pad_for_batching.

Apply:

 import torch
 import torch.nn as nn
+import torch.nn.functional as F
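The padding scheme _pad_for_batching implements can be sketched with plain Python lists standing in for pixel tensors (pad_for_batching below is a hypothetical stand-in, not the PR's actual code, which pads torch tensors with F.pad):

```python
def pad_for_batching(pixel_values, pad_value=0.0):
    """Zero-pad each image's patch list to the max patch count in the batch.

    pixel_values: list of per-image patch lists; each patch is a flat list
    of floats (a stand-in for a [C, H, W] tensor).
    """
    max_patches = max(len(patches) for patches in pixel_values)
    padded = []
    for patches in pixel_values:
        pad_patch = [pad_value] * len(patches[0])  # all-zero dummy patch
        padded.append(patches + [pad_patch] * (max_patches - len(patches)))
    return padded


batch = [
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # image with 3 patches
    [[4.0, 4.0]],                          # image with 1 patch
]
padded = pad_for_batching(batch)
# → every image now has 3 patches, so the batch can be stacked/concatenated
```

Equalizing patch counts is what makes the subsequent concatenation shape-stable when LLaVA-Next produces a variable number of patches per image.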
tensorrt_llm/inputs/registry.py (1)

1-1: Add NVIDIA Apache-2.0 header (2025).

Add the standard header per guidelines.

Apply:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
tensorrt_llm/_torch/models/modeling_hyperclovax.py (2)

1-1: Add NVIDIA Apache-2.0 header (2025).

Add the standard header per guidelines.

Apply:

+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#     http://www.apache.org/licenses/LICENSE-2.0
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

605-608: Use typing.Dict/Any for Python 3.8 compatibility.

Replace dict[str, any] with Dict[str, Any].

Apply:

-    def _post_process(self,
-                      input_ids: torch.Tensor,
-                      preprocessed_image: dict[str, any] = None):
+    def _post_process(self,
+                      input_ids: torch.Tensor,
+                      preprocessed_image: Dict[str, Any] = None):
@@
-    def _preprocess(self, text_prompt: dict[str, any], images: List[Any],
+    def _preprocess(self, text_prompt: Dict[str, Any], images: List[Any],
                     mm_processor_kwargs: Dict[str, Any]):

Also applies to: 667-669

tensorrt_llm/_torch/models/modeling_clip.py (1)

1-1: Add NVIDIA Apache-2.0 header (2025) at top of file.

Required by repo guidelines.

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
tensorrt_llm/_torch/models/modeling_siglip.py (2)

1-1: Add NVIDIA Apache-2.0 header (2025) at top of file.

Required by repo guidelines.

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

7-9: Duplicate/conflicting import of SiglipVisionConfig.

SiglipVisionConfig is imported from both configuration_siglip and modeling_siglip; the latter likely shadows the config class. Keep the config import only from configuration module.

-from transformers.models.siglip.configuration_siglip import SiglipVisionConfig
-from transformers.models.siglip.modeling_siglip import (SiglipVisionConfig,
-                                                        SiglipVisionEmbeddings)
+from transformers.models.siglip.configuration_siglip import SiglipVisionConfig
+from transformers.models.siglip.modeling_siglip import SiglipVisionEmbeddings
🧹 Nitpick comments (7)
tensorrt_llm/inputs/multimodal.py (1)

648-656: Clarify semantics of special token positions.

Returned start_special_token_positions are offsets within the flattened MM‑token union, not absolute prompt indices. Consider renaming to special_token_offsets or documenting explicitly to avoid misuse.

docs/source/models/supported-models.md (1)

47-58: Table consistency check.

Ensure the modality/feature values reflect actual runtime support (e.g., KV cache reuse states) for each class; discrepancies here create user confusion.

tensorrt_llm/_torch/models/modeling_hyperclovax.py (2)

595-601: Silence linter for unused parameters or use underscores.

Minor cleanup.

Apply:

-    def get_num_tokens_per_image(
-        self,
-        image: Image.Image,
-        **kwargs,
-    ):
+    def get_num_tokens_per_image(
+        self,
+        _image: Image.Image,
+        **_kwargs,
+    ):
         return self.vision_query_lengths[0].pop(0)

1059-1062: Remove extraneous f-prefix.

No interpolation; keep plain string.

Apply:

-                raise NotImplementedError(
-                    "HCXVisionForCausalLM does not support disaggregated inference yet. Please unset "
-                    f"the TLLM_MULTIMODAL_DISAGGREGATED environment variable, or set it to '0'."
-                )
+                raise NotImplementedError(
+                    "HCXVisionForCausalLM does not support disaggregated inference yet. Please unset "
+                    "the TLLM_MULTIMODAL_DISAGGREGATED environment variable, or set it to '0'."
+                )
tensorrt_llm/_torch/models/modeling_clip.py (2)

187-193: Pre‑allocating a mutable attn_metadata on the model can be racy across concurrent forwards.

If this model is shared across threads/requests, mutating a single instance is not thread‑safe. Either document single‑use/serialized access or scope attn_metadata per runner/engine instance.


118-121: Derive view shape from num_contexts/max_seq_len, not seq_lens buffer length.

Preallocated buffers may have capacity > batch size. Use explicit metadata fields.

-            encoder_states = encoder_states + (hidden_states.view(
-                attn_metadata.seq_lens.shape[0], attn_metadata.seq_lens[0],
-                -1), )
+            encoder_states = encoder_states + (hidden_states.view(
+                attn_metadata.num_contexts, attn_metadata.max_seq_len, -1), )
@@
-        encoder_states = encoder_states + (hidden_states.view(
-            attn_metadata.seq_lens.shape[0], attn_metadata.seq_lens[0], -1), )
+        encoder_states = encoder_states + (hidden_states.view(
+            attn_metadata.num_contexts, attn_metadata.max_seq_len, -1), )

Also applies to: 130-132

tensorrt_llm/_torch/models/modeling_siglip.py (1)

80-86: Same shared-state caveat as CLIP: attn_metadata on the model is mutable.

Not thread‑safe if the model is shared across requests. Clarify usage or scope per runner/engine.
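The reuse pattern both comments describe can be sketched in a few lines; AttnMetadata and VisionEncoder below are toy stand-ins for illustration, not the repo's classes:

```python
class AttnMetadata:
    """Toy metadata holder: per-batch fields the model mutates in place."""

    def __init__(self, max_batch_size):
        self.seq_lens = [0] * max_batch_size  # preallocated, reused buffer
        self.num_contexts = 0
        self.max_seq_len = 0


class VisionEncoder:
    def __init__(self, max_batch_size=8):
        # Allocate once at init; every forward call reuses this object
        # instead of constructing fresh metadata per batch.
        self.attn_metadata = AttnMetadata(max_batch_size)

    def prepare_attn_metadata(self, batch_seq_lens):
        md = self.attn_metadata
        md.num_contexts = len(batch_seq_lens)
        for i, n in enumerate(batch_seq_lens):
            md.seq_lens[i] = n  # update in place, no reallocation
        md.max_seq_len = max(batch_seq_lens)
        return md


enc = VisionEncoder()
md = enc.prepare_attn_metadata([577, 577, 300])
# → md is enc.attn_metadata; the same buffer serves every batch
```

The sketch also makes the thread-safety caveat concrete: two concurrent forwards would race on the single shared seq_lens buffer, which is why per-runner scoping (or serialized access) is suggested above.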

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 523a17d and 9e58dec.

📒 Files selected for processing (7)
  • docs/source/models/supported-models.md (2 hunks)
  • tensorrt_llm/_torch/models/modeling_clip.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py (7 hunks)
  • tensorrt_llm/_torch/models/modeling_llava_next.py (2 hunks)
  • tensorrt_llm/_torch/models/modeling_siglip.py (2 hunks)
  • tensorrt_llm/inputs/multimodal.py (1 hunks)
  • tensorrt_llm/inputs/registry.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/inputs/multimodal.py
  • tensorrt_llm/_torch/models/modeling_clip.py
  • tensorrt_llm/_torch/models/modeling_siglip.py
  • tensorrt_llm/inputs/registry.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/inputs/multimodal.py
  • tensorrt_llm/_torch/models/modeling_clip.py
  • tensorrt_llm/_torch/models/modeling_siglip.py
  • tensorrt_llm/inputs/registry.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/inputs/multimodal.py
  • tensorrt_llm/_torch/models/modeling_clip.py
  • tensorrt_llm/_torch/models/modeling_siglip.py
  • tensorrt_llm/inputs/registry.py
  • tensorrt_llm/_torch/models/modeling_hyperclovax.py
  • tensorrt_llm/_torch/models/modeling_llava_next.py
🧬 Code graph analysis (5)
tensorrt_llm/inputs/multimodal.py (1)
tensorrt_llm/logger.py (1)
  • debug (144-145)
tensorrt_llm/_torch/models/modeling_clip.py (3)
tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (1)
  • attn_metadata (96-97)
tensorrt_llm/_torch/speculative/eagle3.py (2)
  • prepare (123-166)
  • prepare (244-254)
tensorrt_llm/_torch/speculative/mtp.py (1)
  • prepare (161-209)
tensorrt_llm/_torch/models/modeling_siglip.py (4)
tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py (1)
  • attn_metadata (96-97)
tensorrt_llm/_torch/speculative/interface.py (1)
  • prepare (172-175)
tensorrt_llm/_torch/speculative/eagle3.py (2)
  • prepare (123-166)
  • prepare (244-254)
tensorrt_llm/_torch/speculative/mtp.py (1)
  • prepare (161-209)
tensorrt_llm/inputs/registry.py (5)
tensorrt_llm/inputs/multimodal.py (7)
  • apply_mm_hashes (478-510)
  • find_mm_token_lengths (529-572)
  • find_mm_token_positions (575-656)
  • validate_mm_inputs (659-702)
  • hexdigest_to_int32 (513-526)
  • MultimodalInput (19-85)
  • from_components (72-77)
tensorrt_llm/runtime/model_runner_cpp.py (1)
  • vocab_size (489-490)
tensorrt_llm/_torch/models/modeling_hyperclovax.py (2)
  • get_vocab_size (592-593)
  • get_mm_token_ids (602-603)
tensorrt_llm/_torch/models/modeling_gemma3vl.py (1)
  • mm_token_ids (291-292)
tensorrt_llm/_torch/models/modeling_mistral.py (1)
  • mm_token_ids (506-507)
tensorrt_llm/_torch/models/modeling_hyperclovax.py (3)
tensorrt_llm/inputs/multimodal.py (1)
  • MultimodalParams (196-455)
tensorrt_llm/inputs/registry.py (5)
  • BaseMultimodalInputProcessor (45-178)
  • InputProcessor (22-42)
  • get_vocab_size (64-84)
  • get_num_tokens_per_image (130-148)
  • get_mm_token_ids (86-100)
tensorrt_llm/_torch/models/modeling_multimodal_utils.py (3)
  • find_input_mm_embeds (158-233)
  • fuse_input_embeds (276-330)
  • get_multimodal_embeddings (99-155)
🪛 Ruff (0.12.2)
tensorrt_llm/inputs/registry.py

480-482: Avoid specifying long messages outside the exception class

(TRY003)

tensorrt_llm/_torch/models/modeling_hyperclovax.py

597-597: Unused method argument: image

(ARG002)


598-598: Unused method argument: kwargs

(ARG002)


1061-1061: f-string without any placeholders

Remove extraneous f prefix

(F541)

tensorrt_llm/_torch/models/modeling_llava_next.py

375-375: Undefined name F

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (5)
tensorrt_llm/inputs/multimodal.py (2)

606-608: Log level downgrade LGTM.

Demoting to debug reduces noise; in this repo, logger.debug maps to VERBOSE. No action needed.


529-538: Fix return annotation and stale comment in find_mm_token_lengths.

Function returns a dict, not List[int]; comment is misleading.
[suggest_minor_issue]
Apply:

-def find_mm_token_lengths(mm_data: Dict[str, Any],
-                          input_processor: Any) -> List[int]:
+def find_mm_token_lengths(mm_data: Dict[str, Any],
+                          input_processor: Any) -> Dict[str, List[int]]:
@@
-    return num_mm_tokens  # flatten all mm instances to a single list
+    return num_mm_tokens  # mapping: modality -> list of lengths

Also applies to: 572-573

docs/source/models/supported-models.md (1)

13-13: Verify GPT‑OSS listing.

Please confirm that GptOssForCausalLM exists in the codebase and has a working loader; otherwise gate the row behind a release flag or remove until available.

tensorrt_llm/inputs/registry.py (1)

503-506: Attach MultimodalInput robustly.

This block assumes extra_processed_inputs is a dict (after guards above). Looks good once guards are in.

tensorrt_llm/_torch/models/modeling_siglip.py (1)

93-105: Confirm max_seq_len semantics.

You switched to max_seq_len = seq_len (per‑request maximum). Verify the backend expects per‑request max rather than total flattened tokens; mismatch can under‑allocate.

@yechank-nvidia force-pushed the hyperclovax_kvcache_reuse branch from 9e58dec to 40f1830 on September 29, 2025 07:19
@yechank-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #20235 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20235 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15259 completed with status: 'FAILURE'

@yechank-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #20331 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #20331 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #15335 completed with status: 'FAILURE'


@chang-l chang-l left a comment


LGTM

@brb-nv
Collaborator

brb-nv commented Oct 1, 2025

Thank you for the MR, @yechank-nvidia. Is there a good way to test this change? If there's an existing test harness, we should probably parameterize it to test the models being touched in this MR?

@yechank-nvidia force-pushed the hyperclovax_kvcache_reuse branch from 47a8c65 to c0e9ed1 on October 20, 2025 04:06
@yechank-nvidia
Copy link
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #21847 [ run ] triggered by Bot. Commit: c0e9ed1

@tensorrt-cicd
Collaborator

PR_Github #21847 [ run ] completed with state SUCCESS. Commit: c0e9ed1
/LLM/main/L0_MergeRequest_PR pipeline #16468 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.


@brb-nv brb-nv left a comment


LGTM.

@yechank-nvidia yechank-nvidia merged commit 85d5aa7 into NVIDIA:main Oct 21, 2025
5 checks passed
govind-ramnarayan pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Oct 21, 2025
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Oct 24, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 1, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Nov 3, 2025