
Conversation

@lancelly (Collaborator) commented Nov 5, 2025

Similar to: vllm-project/vllm#871

If we load a model with trust_remote_code=True, HF will download the model code (or load local model code) and cache the dynamic model modules in its cache directory (usually ~/.cache/huggingface/modules/). When a server starts, it serializes some configs and sends them to the worker subprocesses. The model config contains dynamic modules imported from the HF cache directory, which the init_hf_modules function adds to sys.path in the parent process only. So workers have to call init_hf_modules once they start, otherwise deserialization fails with a ModuleNotFoundError.
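The failure mode and the fix can be sketched with the transformers helper itself; the worker entrypoint below is purely illustrative, not the actual TensorRT-LLM code:

    # Illustrative sketch: a spawned worker deserializing a config whose class
    # lives in the dynamically generated `transformers_modules` package.
    import pickle

    from transformers.dynamic_module_utils import init_hf_modules

    def worker_entrypoint(serialized_config: bytes):  # hypothetical entrypoint
        # init_hf_modules() is idempotent: it creates the HF modules cache dir
        # (usually under ~/.cache/huggingface/modules/) and appends it to
        # sys.path, so imports of `transformers_modules.*` can resolve.
        init_hf_modules()
        # Without the call above, unpickling a remote-code model config raises:
        # ModuleNotFoundError: No module named 'transformers_modules'
        return pickle.loads(serialized_config)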

After adding this initialization, locally running trtllm-llmapi-launch trtllm-eval --trust_remote_code with Kimi-K2 (as described in https://nvbugs/5556998) no longer raises:
ModuleNotFoundError: No module named 'transformers_modules'

Interestingly, IT coverage already seems to include this trtllm-llmapi-launch trtllm-eval --trust_remote_code usage, yet no error occurred there:

if "Kimi" in model_path:

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced initialization handling for models requiring remote code execution to ensure proper module setup during process startup and worker spawning.

@lancelly lancelly requested a review from a team as a code owner November 5, 2025 03:04
@lancelly lancelly requested a review from hchings November 5, 2025 03:04
@coderabbitai (bot, Contributor) commented Nov 5, 2025

📝 Walkthrough

Adds an idempotent helper function to initialize cached HuggingFace modules for models with trust_remote_code=True. The function is invoked at module import time to initialize HF modules in the main process and spawned subprocesses, and again inside worker_main when appropriate to ensure initialization in worker contexts.

Changes

Cohort / File(s) | Change Summary
HuggingFace Module Initialization — tensorrt_llm/executor/worker.py | Added idempotent helper function to initialize cached HuggingFace modules with error handling and logging. Invoked at module import time and conditionally within worker_main when trust_remote_code=True.
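A hedged reconstruction of that helper, pieced together from the snippets quoted later in this review (the merged code may differ in detail):

    # Sketch of the additions to tensorrt_llm/executor/worker.py as described
    # in this review; the except branches mirror the diff quoted below.
    from tensorrt_llm.logger import logger

    def _init_hf_modules() -> None:
        """Idempotently put the HF dynamic-modules cache dir on sys.path."""
        try:
            from transformers.dynamic_module_utils import init_hf_modules
            init_hf_modules()  # no-op if the cache dir is already on sys.path
            logger.debug("Initialized HF modules for trust_remote_code models")
        except ImportError as e:
            logger.warning(f"ImportError initializing HF modules: {e}")
        except Exception as e:
            logger.error(f"Exception initializing HF modules: {e}")

    # Run at import time so both the main process and any spawned subprocess
    # that imports this module get the cache directory on sys.path.
    _init_hf_modules()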

Sequence Diagram

sequenceDiagram
    participant MainProc as Main Process
    participant SubProc as Spawned Subprocess
    participant WorkerMain as worker_main()
    participant HFInit as HF Init Helper

    Note over MainProc,HFInit: Module Import Phase
    MainProc->>HFInit: invoke init helper (top-level)
    HFInit-->>MainProc: HF modules cached

    Note over SubProc,HFInit: Subprocess Creation
    SubProc->>HFInit: invoke init helper (on import)
    HFInit-->>SubProc: HF modules cached in subprocess

    Note over WorkerMain,HFInit: Worker Execution (if trust_remote_code=True)
    WorkerMain->>HFInit: invoke init helper (before processing)
    HFInit-->>WorkerMain: HF modules ready
    WorkerMain->>WorkerMain: proceed with work

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Single focused file with clear scope
  • Initialization logic affecting multiple execution contexts (main, subprocess, worker) requires careful timing review
  • Need to verify idempotency guarantees and error handling completeness
  • Consider subprocess inheritance behavior and logging visibility across process boundaries

Pre-merge checks and finishing touches

✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title follows the required format with NVBugs ticket, type designation, and clearly describes the fix for HuggingFace module initialization in worker processes for models with trust_remote_code=True.
  • Description check — ✅ Passed: The PR description explains the issue, references similar work (vllm PR #871), describes the solution, and provides verification that the fix resolves the ModuleNotFoundError. However, it lacks explicit sections for Test Coverage and PR Checklist completion as specified in the template.

@coderabbitai (bot, Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tensorrt_llm/executor/worker.py (1)

40-57: Log failures with stack traces

Right now we lose the stack when _init_hf_modules() blows up, which makes remote-module path issues hard to diagnose. Please log with exc_info or logger.exception so we keep the context.

@@
-    except ImportError as e:
-        logger.warning(f"ImportError initializing HF modules: {e}")
-    except Exception as e:
-        logger.error(f"Exception initializing HF modules: {e}")
+    except ImportError:
+        logger.warning("ImportError initializing HF modules", exc_info=True)
+    except Exception:
+        logger.exception("Exception initializing HF modules")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eeb56c2 and b86753d.

📒 Files selected for processing (1)
  • tensorrt_llm/executor/worker.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/executor/worker.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/executor/worker.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/executor/worker.py
🧠 Learnings (1)
📚 Learning: 2025-07-17T09:01:27.402Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/executor/worker.py
🧬 Code graph analysis (1)
tensorrt_llm/executor/worker.py (1)
tensorrt_llm/logger.py (3)
  • debug (144-145)
  • warning (132-133)
  • error (126-127)
🪛 Ruff (0.14.3)
tensorrt_llm/executor/worker.py

55-55: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tensorrt_llm/executor/worker.py (1)

269-271: Good call re-initializing in worker processes

Re-running _init_hf_modules() when trust_remote_code models are loaded keeps worker subprocesses in sync. Looks solid.
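A hedged sketch of that worker-side call; worker_main's real signature and the trust_remote_code lookup are simplified assumptions:

    def worker_main(*args, llm_args=None, **kwargs):  # signature simplified
        # Spawned workers do not inherit the parent's sys.path mutation, so
        # re-run the idempotent init before deserializing anything that may
        # import from `transformers_modules`.
        if llm_args is not None and getattr(llm_args, "trust_remote_code", False):
            _init_hf_modules()
        ...  # continue with normal engine setup and the request loop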

Signed-off-by: Lanyu Liao <[email protected]>
@lancelly (Collaborator, Author) commented Nov 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23591 [ run ] triggered by Bot. Commit: dbbcfb8

@tensorrt-cicd (Collaborator)

PR_Github #23591 [ run ] completed with state SUCCESS. Commit: dbbcfb8
/LLM/main/L0_MergeRequest_PR pipeline #17752 completed with status: 'FAILURE'

@lancelly (Collaborator, Author) commented Nov 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23636 [ run ] triggered by Bot. Commit: dbbcfb8

@tensorrt-cicd (Collaborator)

PR_Github #23636 [ run ] completed with state FAILURE. Commit: dbbcfb8
/LLM/main/L0_MergeRequest_PR pipeline #17783 completed with status: 'FAILURE'

@lancelly (Collaborator, Author) commented Nov 6, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23690 [ run ] triggered by Bot. Commit: cc728dc

@tensorrt-cicd (Collaborator)

PR_Github #23690 [ run ] completed with state SUCCESS. Commit: cc728dc
/LLM/main/L0_MergeRequest_PR pipeline #17824 completed with status: 'FAILURE'

@lancelly (Collaborator, Author) commented Nov 7, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23789 [ run ] triggered by Bot. Commit: cc728dc

@tensorrt-cicd (Collaborator)

PR_Github #23789 [ run ] completed with state SUCCESS. Commit: cc728dc
/LLM/main/L0_MergeRequest_PR pipeline #17908 completed with status: 'FAILURE'

@lancelly (Collaborator, Author) commented Nov 9, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23919 [ run ] triggered by Bot. Commit: cc728dc

@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23967 [ run ] triggered by Bot. Commit: cc728dc

@lancelly (Collaborator, Author)

/bot run --abort

@tensorrt-cicd (Collaborator)

PR_Github #23968 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: --abort

@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

1 similar comment
@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

@lancelly (Collaborator, Author)

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #23975 [ kill ] triggered by Bot. Commit: cc728dc

@tensorrt-cicd (Collaborator)

PR_Github #23967 [ run ] completed with state ABORTED. Commit: cc728dc
LLM/main/L0_MergeRequest_PR #18049 (Blue Ocean) completed with status: ABORTED

@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23975 [ kill ] completed with state SUCCESS. Commit: cc728dc
Successfully killed previous jobs for commit cc728dc

@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

1 similar comment
@lancelly (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #23988 [ run ] triggered by Bot. Commit: cc728dc

@tensorrt-cicd (Collaborator)

PR_Github #23988 [ run ] completed with state SUCCESS. Commit: cc728dc
/LLM/main/L0_MergeRequest_PR pipeline #18065 completed with status: 'SUCCESS'

@Superjomn (Collaborator) left a comment


LGTM

@Superjomn Superjomn merged commit 1fd1145 into NVIDIA:main Nov 11, 2025
5 checks passed
suyoggupta pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Nov 12, 2025
lancelly added a commit to lancelly/TensorRT-LLM that referenced this pull request Dec 10, 2025
lancelly added a commit to lancelly/TensorRT-LLM that referenced this pull request Dec 10, 2025