
Conversation


@willkill07 willkill07 commented Aug 25, 2025

Description

  • Introduce a ThinkingMixin which defaults to None; it is currently only supported for NVIDIA Nemotron-like models.
  • Add support to all providers so that an additional "thinking" system prompt is inserted automatically when requested.
  • Add tests to validate the expected behavior.

Requirements

Our Nemotron model has a toggle for thinking on/off, controlled by adding a system prompt message of {"role":"system","content":"/think"} or {"role":"system","content":"/no_think"}.

It would be nice if users could specify in the config (llm section) whether they want thinking on or off, and have the system message handled for them.

We should see how we can support this in a transparent way.
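
As a rough sketch of the desired behavior (the helper name and wiring below are illustrative only, not the final API):

    # Illustrative only: a config-level flag decides which system message is
    # prepended before the request reaches the model.
    def with_thinking(messages: list[dict[str, str]], thinking: bool | None) -> list[dict[str, str]]:
        if thinking is None:
            return messages  # leave the model's default behavior untouched
        prompt = "/think" if thinking else "/no_think"
        return [{"role": "system", "content": prompt}] + messages

    messages = with_thinking([{"role": "user", "content": "Hello"}], thinking=True)
    # messages[0] == {"role": "system", "content": "/think"}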

Closes AIQ-1609

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Optional "Thinking" setting for supported Nemotron models (defaults to None); when enabled, provides a model-specific system prompt and is applied across providers and SDK wrappers.
  • Improvements

    • Clarified gated-field semantics and the "supported by a model" applicability wording for mixins (e.g., temperature, top_p).
    • Centralized thinking injection and retry application across integrations for consistent behavior.
    • Simplified gated-field typing by removing generic parameterization.
  • Documentation

    • Renamed gating docs to “Gated Field Mixins”, added Thinking mixin guidance; AWS Bedrock docs updated (top_p, context_size, max_retries).
  • Tests

    • Added comprehensive tests for Thinking behavior, prompt injection, and gated-field cases.
  • Chores

    • Added “Nemotron” to accepted vocabulary.


coderabbitai bot commented Aug 25, 2025

Walkthrough

Removes generic parameterization from GatedFieldMixin and widens default typing; refines gating/validation logic; adds ThinkingMixin and thinking_system_prompt; implements a BaseThinkingInjector and patch_with_thinking to inject system prompts; centralizes thinking+retry patching across plugin wrappers; updates providers, docs, tests, and CI vocabulary.
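
For orientation, here is a minimal sketch of how the new helpers fit together, following the call shapes shown in the review diffs further down (patch_with_thinking(client, function_names=..., system_prompt_injector=...) and FunctionArgumentWrapper); the client is a stand-in, and the final comment assumes the wrapper forwards the transformed arguments to the original method:

    from nat.llm.utils.thinking import FunctionArgumentWrapper
    from nat.llm.utils.thinking import patch_with_thinking

    class FakeClient:
        """Stand-in for a framework LLM client."""

        def invoke(self, messages):
            return messages  # echo back so the injected prompt is observable

    def injector(messages, *args, **kwargs) -> FunctionArgumentWrapper:
        # Prepend the thinking system prompt to the first positional argument.
        new_messages = [{"role": "system", "content": "/think"}] + list(messages)
        return FunctionArgumentWrapper(new_messages, *args, **kwargs)

    client = patch_with_thinking(FakeClient(),
                                 function_names=["invoke"],
                                 system_prompt_injector=injector)
    result = client.invoke([{"role": "user", "content": "hi"}])
    # result[0] should now be the "/think" system message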

Changes

Cohort / File(s) | Summary
Core gating & mixins
src/nat/data_models/gated_field_mixin.py, src/nat/data_models/temperature_mixin.py, src/nat/data_models/top_p_mixin.py
Remove Generic[T] from GatedFieldMixin and widen the default_if_supported typing; refine gating/validation logic.
Thinking mixin
src/nat/data_models/thinking_mixin.py
Add ThinkingMixin with a `thinking: bool | None` field and a `thinking_system_prompt` property.
Thinking utils & tests
src/nat/llm/utils/thinking.py, tests/nat/llm/utils/test_thinking.py
Add FunctionArgumentWrapper, BaseThinkingInjector dataclass, decorator factory supporting sync/async/generator/async-generator, and patch_with_thinking; tests validate patched behavior across callable kinds.
Provider configs — include ThinkingMixin
src/nat/llm/aws_bedrock_llm.py, src/nat/llm/azure_openai_llm.py, src/nat/llm/nim_llm.py, src/nat/llm/openai_llm.py
Import and add ThinkingMixin to multiple LLM config classes; adjust AWS Bedrock max_tokens and context_size field descriptions.
Plugin wrappers — centralized patching
packages/nvidia_nat_agno/src/.../llm.py, packages/nvidia_nat_crewai/src/.../llm.py, packages/nvidia_nat_langchain/src/.../llm.py, packages/nvidia_nat_llama_index/src/.../llm.py, packages/nvidia_nat_semantic_kernel/src/.../llm.py
Add module-level ModelType TypeVar and _patch_llm_based_on_config(client, llm_config) to apply thinking injection (when present) then retry wrapping; introduce per-plugin ThinkingInjector implementations; refactor wrappers to call centralized patching.
Plugin tests & AGNO changes
packages/nvidia_nat_agno/tests/test_llm.py, related tests
Update tests to inspect mock.call_args for keyword assertions and base_url propagation; remove debug prints; adapt to centralized patching and argument-inspection style.
Gated mixin tests
tests/nat/data_models/test_gated_field_mixin.py
Replace GatedFieldMixin[T] usages with GatedFieldMixin in tests; adjust expectations where defaults are applied.
ThinkingMixin tests
tests/nat/data_models/test_thinking_mixin.py
New tests covering thinking prompt generation, model-key precedence, supported/unsupported validation, alternative key support, and no-keys behavior.
Agent & builder test updates
tests/nat/agent/test_react.py, tests/nat/builder/test_builder.py
mock_react_agent fixture internal name changed and now returns constructed agent; renamed TestTTCStrategyConfig to TTTCStrategyConfig and updated usages.
mem0ai import warning suppression
packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
Suppress specific Pydantic deprecation warnings around AsyncMemoryClient import.
Docs & CI vocabulary
docs/source/extend/adding-an-llm-provider.md, docs/source/extend/gated-fields.md, docs/source/extend/integrating-aws-bedrock-models.md, docs/source/workflows/llms/index.md, ci/vale/styles/config/vocabularies/nat/accept.txt
Rename header to "Gated Field Mixins", clarify gated-field semantics and mixin applicability, document ThinkingMixin (Nemotron), add AWS Bedrock top_p/context_size docs, and add Nemotron to accepted vocabulary.

Sequence Diagram(s)

%%{init: {"theme":"default"}}%%
sequenceDiagram
  autonumber
  participant Config as ModelConfig
  participant Wrapper as Plugin Wrapper
  participant Patch as _patch_llm_based_on_config
  participant Injector as ThinkingInjector
  participant Client as LLM Client
  participant Provider as Provider API

  Config->>Wrapper: build client from config
  Wrapper->>Patch: call _patch_llm_based_on_config(client, config)
  alt ThinkingMixin present & thinking_system_prompt set
    Patch->>Injector: create injector(system_prompt, function_names)
    Patch->>Client: patch methods via patch_with_thinking(injector)
  end
  alt RetryMixin present
    Patch->>Client: patch retry behavior (patch_with_retry)
  end
  Client->>Provider: invoke patched method
  Provider-->>Client: response
  Client-->>Wrapper: return result
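
In code form, the ordering the diagram shows (and that each plugin's _patch_llm_based_on_config applies) looks roughly like the sketch below; every name here is a stand-in for illustration, with the real work done by patch_with_thinking and the retry patching described above:

    class StubClient:
        """Stand-in for a framework LLM client."""

    def inject_thinking(client: StubClient, prompt: str) -> StubClient:
        # Placeholder for patching the client's invoke methods via patch_with_thinking(...)
        client.thinking_prompt = prompt
        return client

    def wrap_with_retry(client: StubClient) -> StubClient:
        # Placeholder for the retry patching driven by the RetryMixin fields
        client.retries = True
        return client

    def patch_llm_based_on_config(client: StubClient, thinking_prompt: str | None, retries: bool) -> StubClient:
        # 1) Thinking injection first, so retries wrap the already-patched methods.
        if thinking_prompt is not None:
            client = inject_thinking(client, thinking_prompt)
        # 2) Then retry wrapping, when the config opts in.
        if retries:
            client = wrap_with_retry(client)
        return client

    client = patch_llm_based_on_config(StubClient(), thinking_prompt="/think", retries=True)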

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes



📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between c69e9f6 and cf3c251.

📒 Files selected for processing (1)
  • src/nat/data_models/thinking_mixin.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/nat/data_models/thinking_mixin.py

coderabbitai bot added labels: feature request (New feature or request) and non-breaking (Non-breaking change) on Aug 25, 2025
willkill07 force-pushed the wkk_nemotron_thinking_support branch from 6a8f46b to 6a5b808 on August 25, 2025 22:08
willkill07 force-pushed the wkk_nemotron_thinking_support branch from bc2f7c9 to 59791d9 on August 26, 2025 13:12
willkill07 force-pushed the wkk_nemotron_thinking_support branch from 59791d9 to 5f1d3f5 on August 26, 2025 13:26

copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

willkill07 force-pushed the wkk_nemotron_thinking_support branch from 5f1d3f5 to 029578d on August 26, 2025 13:28

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (1)

55-67: API key propagation logic is a no-op and contains a typo; fix mapping from config to env var.

  • The branch reads and writes the same env var (NVIDIA_API_KEY) making it a no-op.
  • Variable name typo: nvidai_api_key → nvidia_api_key.
  • If llm_config.api_key is provided, prefer setting NVIDIA_API_KEY when not already set, or pass it directly if the client supports it.
  • Mutating process env is side-effectful; scope carefully and document.

Additional changes (outside the shown hunk):

# Replace the current block with:
if ("api_key" not in config_obj or config_obj["api_key"] is None):
    if not os.environ.get("NVIDIA_API_KEY"):
        # Prefer config value if present; otherwise leave env untouched.
        if getattr(llm_config, "api_key", None):
            os.environ["NVIDIA_API_KEY"] = llm_config.api_key  # noqa: S105 (intentional env injection)
else:
    # If api_key is explicitly provided in config, set env only when missing.
    if not os.environ.get("NVIDIA_API_KEY") and config_obj.get("api_key"):
        os.environ["NVIDIA_API_KEY"] = config_obj["api_key"]  # noqa: S105

Optionally, if Nvidia(...) accepts an api_key kwarg, prefer passing it directly instead of mutating env:

kwargs = {"id": config_obj.get("id")}
if "base_url" in config_obj and config_obj.get("base_url") is not None:
    kwargs["base_url"] = config_obj.get("base_url")
if config_obj.get("api_key"):
    kwargs["api_key"] = config_obj["api_key"]
client = Nvidia(**kwargs)  # type: ignore[arg-type]
♻️ Duplicate comments (5)
src/nat/llm/utils/thinking.py (1)

32-41: Add proper return type annotation for the decorator function.

The decorator function returned by _thinking_injector is currently typed too generically as Callable[..., Any]. This loses type safety at call sites.

Apply this diff to add proper type hints using ParamSpec and TypeVar:

+from typing import ParamSpec
 from typing import TypeVar

 ModelType = TypeVar("ModelType")
 MessagesType = TypeVar("MessagesType")
+P = ParamSpec("P")
+R = TypeVar("R")

-def _thinking_injector(system_prompt_injector: Callable[[MessagesType], MessagesType], ) -> Callable[..., Any]:
+def _thinking_injector(system_prompt_injector: Callable[[MessagesType], MessagesType]) -> Callable[[Callable[P, R]], Callable[P, R]]:
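
For reference, the general shape of a signature-preserving decorator built on ParamSpec (a generic illustration, not the repository's _thinking_injector):

    import functools
    from collections.abc import Callable
    from typing import ParamSpec
    from typing import TypeVar

    P = ParamSpec("P")
    R = TypeVar("R")

    def log_calls(func: Callable[P, R]) -> Callable[P, R]:
        """Wrap func while preserving its signature for type checkers."""

        @functools.wraps(func)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
            return func(*args, **kwargs)

        return wrapper

    @log_calls
    def add(a: int, b: int) -> int:
        return a + b

    assert add(2, 3) == 5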
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4)

17-17: Import missing symbols to support return annotations and generalized message types.

To add AsyncIterator return types and use Sequence in the injector signature, extend imports.

-from typing import TypeVar
+from typing import TypeVar, Any
+from collections.abc import AsyncIterator, Sequence

32-43: Docstring + generalize to Sequence and make idempotent to avoid duplicate system prompts.

Meet docstring requirements, avoid duplicate system messages, and accept any Sequence to match other providers and typing guidance.

-def agno_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+def agno_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+    """Patch an AGNO client to inject a system prompt as the first message."""
     from agno.models.message import Message
 
-    def injector(messages: list[Message]) -> list[Message]:
-        return [Message(role="system", content=system_prompt)] + messages
+    def injector(messages: Sequence[Message]) -> list[Message]:
+        if messages and getattr(messages[0], "role", None) == "system" and getattr(messages[0], "content", None) == system_prompt:
+            return list(messages)
+        return [Message(role="system", content=system_prompt), *messages]
 
     return patch_with_thinking(
         client,
         function_names=["invoke_stream", "invoke", "ainvoke", "ainvoke_stream"],
         system_prompt_injector=injector,
     )

46-47: Add return type and minimal docstring to public provider factory.

Public APIs must include return type hints and docstrings; annotate as AsyncIterator[Any] and add a concise summary.

-async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_agno(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO Nvidia client configured from nat, optionally patched for thinking and retries."""

89-90: Add return type and minimal docstring to public provider factory.

Apply the same return annotation and docstring pattern to OpenAI.

-async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO OpenAI client configured from nat, optionally patched for thinking and retries."""
🧹 Nitpick comments (6)
docs/source/extend/gated-fields.md (1)

80-84: Documentation formatting inconsistency.

The new ThinkingMixin documentation is indented differently from the previous entries. All other mixin entries use 2-space indentation for their bullet points, but this one uses inconsistent spacing.

Apply this formatting fix for consistency:

-- {py:class}`~nat.data_models.thinking_mixin.ThinkingMixin`
-  - Field: `thinking: bool | None`
-  - Default when supported: `None` (use model default)
-  - Only currently supported on Nemotron models
+- {py:class}`~nat.data_models.thinking_mixin.ThinkingMixin`
+  - Field: `thinking: bool | None`
+  - Default when supported: `None` (use model default)
+  - Only currently supported on Nemotron models
docs/source/extend/adding-an-llm-provider.md (1)

83-89: Clarify the thinking_system_prompt property description.

The documentation should be clearer about when thinking_system_prompt returns None versus a system prompt string.

Apply this clarification:

-- `ThinkingMixin`: adds a `thinking` field, with a default of `None` when supported by a model. If supported, the `thinking_system_prompt` property will return the system prompt to use for thinking.
+- `ThinkingMixin`: adds a `thinking` field, with a default of `None` when supported by a model. The `thinking_system_prompt` property returns the appropriate system prompt string when both the model supports thinking and the field is set, otherwise returns `None`.
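
For illustration, the documented contract in stand-in form (the real mixin lives in nat.data_models.thinking_mixin; the prompt strings follow the requirements in the PR description):

    class StandInThinkingConfig:
        """Illustrative only: mirrors the documented thinking_system_prompt contract."""

        def __init__(self, thinking: bool | None, supported: bool = True):
            self.thinking = thinking
            self._supported = supported

        @property
        def thinking_system_prompt(self) -> str | None:
            if not self._supported or self.thinking is None:
                return None
            return "/think" if self.thinking else "/no_think"

    assert StandInThinkingConfig(True).thinking_system_prompt == "/think"
    assert StandInThinkingConfig(False).thinking_system_prompt == "/no_think"
    assert StandInThinkingConfig(True, supported=False).thinking_system_prompt is None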
src/nat/llm/utils/thinking.py (1)

107-119: Consolidate exception handling logic.

The exception messages use inconsistent formatting with special characters (‑ vs -) and the logic flow could be clearer.

Apply this refactor for consistency and clarity:

-        try:  # instance‑level first
+        try:  # instance-level first
             if not inspect.isclass(obj):
                 object.__setattr__(obj, name, types.MethodType(wrapped, obj))
                 continue
         except Exception as exc:
             logger.info(
-                "Instance‑level patch failed for %s.%s (%s); "
-                "falling back to class‑level patch.",
+                "Instance-level patch failed for %s.%s (%s); "
+                "falling back to class-level patch.",
                 cls_name,
                 name,
                 exc,
             )

-        try:  # class‑level fallback
+        try:  # class-level fallback
             setattr(cls, name, wrapped)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (3)

34-41: Add a concise docstring and make injection idempotent to avoid duplicate system prompts.

Prepend a Google-style docstring per guidelines and guard against double-injection when the first message already matches the same system prompt. This prevents duplicated system messages if the wrapper is patched multiple times or if upstream already injected one.

 def llama_index_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+    """Patch a llama_index client to inject a system prompt as the first message."""
     from llama_index.core.base.llms.types import ChatMessage
 
-    def injector(messages: Sequence[ChatMessage]) -> Sequence[ChatMessage]:
-        msgs = list(messages)
-        msgs.insert(0, ChatMessage(role="system", content=system_prompt))
-        return msgs
+    def injector(messages: Sequence[ChatMessage]) -> Sequence[ChatMessage]:
+        # Idempotency: if first message is already the same system_prompt, do nothing.
+        if messages and getattr(messages[0], "role", None) == "system" and getattr(messages[0], "content", None) == system_prompt:
+            return list(messages)
+        msgs = list(messages)
+        msgs.insert(0, ChatMessage(role="system", content=system_prompt))
+        return msgs

31-32: Consider adding return type hints and minimal docstrings to the public provider factories.

Per repo guidelines, public APIs should include Python 3.11+ type hints and Google-style docstrings. Since these are async generators that yield a configured client, annotate with AsyncIterator[Any] and add one-line summaries.

Additional changes (outside the shown hunk):

# At top-level imports
from collections.abc import AsyncIterator
from typing import Any

# For each provider below, add return type + docstring, e.g.:
async def aws_bedrock_llama_index(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[Any]:
    """Yield a llama_index Bedrock client configured from nat, optionally patched for thinking and retries."""
    ...
    yield llm

async def azure_openai_llama_index(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
    """Yield a llama_index Azure OpenAI client configured from nat, optionally patched for thinking and retries."""
    ...
    yield llm

async def openai_llama_index(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
    """Yield a llama_index OpenAI client configured from nat, optionally patched for thinking and retries."""
    ...
    yield llm

91-109: Parity: consider enabling thinking injection for llama_index NIM as well.

ThinkingMixin is supported by NIMModelConfig; wiring injection here would align behavior across providers and reduce surprises for users switching frameworks.

Additional changes (outside the shown hunk):

# Inside nim_llama_index after llm = NVIDIA(**kwargs)
if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
    llm = llama_index_thinking_injector(llm, llm_config.thinking_system_prompt)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 59791d9 and 029578d.

📒 Files selected for processing (19)
  • ci/vale/styles/config/vocabularies/nat/accept.txt (1 hunks)
  • docs/source/extend/adding-an-llm-provider.md (1 hunks)
  • docs/source/extend/gated-fields.md (1 hunks)
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4 hunks)
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4 hunks)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (5 hunks)
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (4 hunks)
  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (3 hunks)
  • src/nat/data_models/gated_field_mixin.py (6 hunks)
  • src/nat/data_models/thinking_mixin.py (1 hunks)
  • src/nat/llm/aws_bedrock_llm.py (1 hunks)
  • src/nat/llm/azure_openai_llm.py (1 hunks)
  • src/nat/llm/nim_llm.py (1 hunks)
  • src/nat/llm/openai_llm.py (1 hunks)
  • src/nat/llm/utils/thinking.py (1 hunks)
  • tests/nat/agent/test_react.py (1 hunks)
  • tests/nat/data_models/test_gated_field_mixin.py (1 hunks)
  • tests/nat/data_models/test_thinking_mixin.py (1 hunks)
  • tests/nat/llm/utils/test_thinking.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • src/nat/llm/openai_llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • tests/nat/data_models/test_thinking_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • src/nat/llm/azure_openai_llm.py
  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • tests/nat/data_models/test_gated_field_mixin.py
  • tests/nat/agent/test_react.py
  • tests/nat/llm/utils/test_thinking.py
  • src/nat/data_models/gated_field_mixin.py
  • src/nat/llm/nim_llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
🧰 Additional context used
📓 Path-based instructions (13)
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • ci/vale/styles/config/vocabularies/nat/accept.txt
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • docs/source/extend/adding-an-llm-provider.md
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • docs/source/extend/gated-fields.md
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • ci/vale/styles/config/vocabularies/nat/accept.txt
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • docs/source/extend/adding-an-llm-provider.md
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • docs/source/extend/gated-fields.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:

    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code should be licensed under the Apache License 2.0 and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • ci/vale/styles/config/vocabularies/nat/accept.txt
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • docs/source/extend/adding-an-llm-provider.md
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • docs/source/extend/gated-fields.md
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • docs/source/extend/adding-an-llm-provider.md
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • docs/source/extend/gated-fields.md
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
docs/source/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.md: Use the official naming: first use 'NVIDIA NeMo Agent toolkit'; subsequent references 'NeMo Agent toolkit'; in headings use 'NeMo Agent Toolkit' (capital T)
Never use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation; update to the current naming unless explicitly referring to deprecated names
'AIQ Blueprint' is the intended name for the blueprint in docs; do not change it
Documentation sources must be Markdown files under docs/source
Surround code entities with backticks in documentation to avoid Vale false-positives
Keep docs in sync with code; documentation pipeline must pass (no Sphinx errors or broken links)

Files:

  • docs/source/extend/adding-an-llm-provider.md
  • docs/source/extend/gated-fields.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/extend/adding-an-llm-provider.md
  • docs/source/extend/gated-fields.md
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ so namespace packages resolve correctly

Files:

  • src/nat/llm/utils/thinking.py
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/llm/utils/thinking.py
🧬 Code graph analysis (3)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (1)
  • patch_with_thinking (78-130)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (1)
  • patch_with_thinking (78-130)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (1)
  • patch_with_thinking (78-130)
🪛 LanguageTool
docs/source/extend/gated-fields.md

[grammar] ~80-~80: There might be a mistake here.
Context: ... supported on GPT-5 models - {py:class}~nat.data_models.thinking_mixin.ThinkingMixin - Field: thinking: bool | None - Defau...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...hinking_mixin.ThinkingMixin - Field:thinking: bool | None - Default when supported:None` (use mode...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...en supported: None (use model default) - Only currently supported on Nemotron mod...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: CI Pipeline / Test (arm64, 3.11)
  • GitHub Check: CI Pipeline / Test (amd64, 3.12)
  • GitHub Check: CI Pipeline / Test (arm64, 3.12)
  • GitHub Check: CI Pipeline / Documentation
  • GitHub Check: CI Pipeline / Build Wheels
🔇 Additional comments (9)
ci/vale/styles/config/vocabularies/nat/accept.txt (1)

94-94: LGTM!

The addition of "Nemotron" to the accepted vocabulary is appropriate and aligns with the PR's introduction of Nemotron model support throughout the codebase.

packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)

80-81: LGTM! Consistent implementation of thinking injection across all providers.

The thinking injection is properly implemented with the same pattern across all three CrewAI client registrations (Azure OpenAI, NIM, and OpenAI). The conditional check ensures the injection only occurs when the model supports thinking and has a valid prompt.

Also applies to: 118-119, 142-143


35-36: Add return type hint for inner function.

The injector function should have explicit type hints.

Apply this fix:

-    def injector(messages: list[dict[str, str]]) -> list[dict[str, str]]:
+    def injector(messages: list[dict[str, str]]) -> list[dict[str, str]]:
         return [{"role": "system", "content": system_prompt}] + messages

Wait, the type hints are already present. Let me skip this comment.

packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (4)

58-60: Thinking gating and patch order look correct for Bedrock.

Injection happens before retry patching, which preserves expected behavior. LGTM.


79-81: Thinking gating and patch order look correct for Azure OpenAI.

Same pattern as Bedrock; no issues spotted.


124-126: Thinking gating and patch order look correct for OpenAI.

Consistent with other providers; good to go.


35-35: Verify ChatMessage import path for all supported llama_index versions
I was able to confirm that in llama_index 0.10.0 the only valid import is:

from llama_index.core.base.llms.types import ChatMessage

The alternative path (llama_index.core.llms.types) does not exist in 0.10.0. I could not automatically verify v0.11.x due to environment constraints—please manually confirm that from llama_index.core.base.llms.types import ChatMessage remains valid (and that no shorter path like from llama_index.core.llms.types import ChatMessage was introduced) in all 0.11.y versions you intend to support.

• File to check:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (line 35)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)

75-77: Thinking gating and patch order look correct for AGNO (NIM).

Injection before retry preserves expected behavior and mirrors other providers.


78-83: Retry patch application looks correct.

No issues spotted; parameters sourced from RetryMixin and applied after thinking injection. LGTM.

willkill07 force-pushed the wkk_nemotron_thinking_support branch from 3c7ca93 to a31169c on August 26, 2025 15:21
willkill07 force-pushed the wkk_nemotron_thinking_support branch from a31169c to 26bfe3a on August 26, 2025 15:24

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (1)

71-84: Fix environment variable propagation for NVIDIA API Key

The current block in packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (lines 71–84) never actually transfers the key and even misspells the temp variable. According to Agno’s documentation, the NVIDIA_API_KEY environment variable is used for authentication (docs.agno.com). We should instead set NVIDIA_API_KEY from config_obj["api_key"] when it’s provided but not already in the environment.

Please replace the existing block with:

-    # Because Agno uses a different environment variable for the API key, we need to set it here manually
-    if ("api_key" not in config_obj or config_obj["api_key"] is None):
-
-        if ("NVIDIA_API_KEY" in os.environ):
-            # Dont need to do anything. User has already set the correct key
-            pass
-        else:
-            nvidai_api_key = os.getenv("NVIDIA_API_KEY")
-
-            if (nvidai_api_key is not None):
-                # Transfer the key to the correct environment variable
-                os.environ["NVIDIA_API_KEY"] = nvidai_api_key
+    # Ensure Agno’s required NVIDIA_API_KEY is set when only config.api_key is provided
+    api_key = config_obj.get("api_key")
+    if api_key and "NVIDIA_API_KEY" not in os.environ:
+        os.environ["NVIDIA_API_KEY"] = api_key
  • Correctly propagates config_obj["api_key"] to the environment
  • Removes the no-op logic and typo in the temporary variable
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

68-94: Fix Azure credential resolution in llm.py

The current implementation excludes "api_key", "azure_endpoint", and "azure_deployment" from config_obj via model_dump, then immediately attempts to read them back from config_obj. As a result, even when llm_config provides these values, the code will always fall through to the environment checks or raise avoidable ValueErrors.

• File: packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
• Lines: 68–94

Apply this critical patch:

@@
-    api_key = config_obj.get("api_key") or os.environ.get("AZURE_OPENAI_API_KEY") or os.environ.get("AZURE_API_KEY")
+    api_key = llm_config.api_key or os.environ.get("AZURE_OPENAI_API_KEY") or os.environ.get("AZURE_API_KEY")
     if api_key is None:
         raise ValueError("Azure API key is not set")
     os.environ["AZURE_API_KEY"] = api_key
@@
-    api_base = (config_obj.get("azure_endpoint") or os.environ.get("AZURE_OPENAI_ENDPOINT")
-                or os.environ.get("AZURE_API_BASE"))
+    api_base = (llm_config.azure_endpoint or os.environ.get("AZURE_OPENAI_ENDPOINT")
+                or os.environ.get("AZURE_API_BASE"))
     if api_base is None:
         raise ValueError("Azure endpoint is not set")
     os.environ["AZURE_API_BASE"] = api_base
@@
-    os.environ["AZURE_API_VERSION"] = llm_config.api_version
+    # only set the version if provided
+    if llm_config.api_version:
+        os.environ["AZURE_API_VERSION"] = llm_config.api_version
@@
-    model = config_obj.get("azure_deployment") or os.environ.get("AZURE_MODEL_DEPLOYMENT")
+    model = llm_config.azure_deployment or os.environ.get("AZURE_MODEL_DEPLOYMENT")
     if model is None:
         raise ValueError("Azure model deployment is not set")
 
     config_obj["model"] = model

This change ensures we use the values from llm_config directly, preserving intended overrides and preventing spurious errors.

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (4)

84-101: Annotate return type and add docstring to public wrapper (AWS Bedrock).

Per guidelines, public APIs require return annotations and Google-style docstrings. This function is an async generator that yields the client.

-async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder):
+async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """
+    Yield a LangChain ChatBedrockConverse client configured via nat AWSBedrockModelConfig.
+    
+    Applies Nemotron thinking injection and automatic retries when configured.
+    """

Note: The thinking gate here looks good; injection occurs before retries.


122-139: Annotate return type and add docstring to public wrapper (NIM).

-async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """
+    Yield a LangChain ChatNVIDIA client configured via nat NIMModelConfig.
+    
+    Applies Nemotron thinking injection and automatic retries when configured.
+    """

141-163: Annotate return type and add docstring to public wrapper (OpenAI).

Add return annotation and docstring to meet public API requirements.

-async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """
+    Yield a LangChain ChatOpenAI client configured via nat OpenAIModelConfig.
+    
+    Enables stream_usage by default; applies Nemotron thinking injection and automatic retries when configured.
+    """

103-120: Add return type annotation and docstring to azure_openai_langchain; no stream_usage support

The azure_openai_langchain wrapper should be annotated and documented per our coding standards. After verification, AzureChatOpenAI (v0.3.32) does not expose a stream_usage parameter in its constructor or base class, so no default is required here.

File: packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
Lines: 103–120

• Change the signature to include an explicit return type.
• Add a concise docstring describing purpose and behavior.
• No change needed for stream_usage—AzureChatOpenAI doesn’t accept it.

-async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder):
+async def azure_openai_langchain(
+    llm_config: AzureOpenAIModelConfig,
+    _builder: Builder,
+) -> AsyncIterator[object]:
+    """
+    Yield a LangChain AzureChatOpenAI client configured via nat AzureOpenAIModelConfig.
+
+    Applies Nemotron thinking injection and automatic retries when configured.
+    """
     from langchain_openai import AzureChatOpenAI

     client = AzureChatOpenAI(**llm_config.model_dump(exclude={"type"}, by_alias=True))
@@ unchanged @@
     yield client

No further action needed regarding stream_usage, as AzureChatOpenAI’s constructor signature is (*args, **kwargs) with no explicit stream_usage parameter in v0.3.32.

♻️ Duplicate comments (5)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

17-17: Import missing typing/ABC symbols to support required annotations.

To add return type hints (AsyncIterator[Any]) on public builders and to generalize the injector signature to Sequence, extend imports.

-from typing import TypeVar
+from typing import TypeVar, Any
+from collections.abc import AsyncIterator, Sequence

62-62: Add return type and Google-style docstring to public builder.

Public APIs must have return type hints (3.11+) and docstrings. This is an async generator.

-async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_agno(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO Nvidia client configured from nat.
+
+    Applies thinking prompt injection and retry policy if configured in llm_config.
+
+    Args:
+        llm_config: nat NIM provider configuration.
+        _builder: nat Builder instance (unused).
+
+    Yields:
+        Any: Configured AGNO Nvidia client.
+    """

104-105: Add return type and Google-style docstring to public builder.

Mirror nim_agno: annotate and document async generator return.

-async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO OpenAI client configured from nat.
+
+    Applies thinking prompt injection and retry policy if configured in llm_config.
+
+    Args:
+        llm_config: nat OpenAI provider configuration.
+        _builder: nat Builder instance (unused).
+
+    Yields:
+        Any: Configured AGNO OpenAI client.
+    """
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

34-58: Add a docstring to _crewai_thinking_injector and broaden message typing; optionally patch call and avoid double-injection.

  • Add a concise Google-style docstring to the outer helper (prior review also requested this on the public variant).
  • Prefer collections.abc abstractions per guidelines (Sequence/Mapping) for messages.
  • Optionally: include "call" in function_names for broader CrewAI compatibility and guard against injecting the same system message twice.

Apply within this hunk:

-def _crewai_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+def _crewai_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+    """
+    Inject a thinking system prompt into CrewAI LLM client calls.
+    
+    Prepends a system message {"role": "system", "content": system_prompt} to the first
+    positional messages argument for LLM.call/.__call__ invocations while preserving
+    the remaining args/kwargs. Returns the patched client.
+    """
 
-    def injector(messages: list[dict[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
+    def injector(messages: Sequence[Mapping[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
         """
         Inject a system prompt into the messages.
 
         The messages are the first (non-object) argument to the function.
         The rest of the arguments are passed through unchanged.
 
         Args:
             messages: The messages to inject the system prompt into.
             *args: The rest of the arguments to the function.
             **kwargs: The rest of the keyword arguments to the function.
 
         Returns:
             FunctionArgumentWrapper: An object that contains the transformed args and kwargs.
         """
-        new_messages = [{"role": "system", "content": system_prompt}] + messages
+        # Avoid double-injection if the first message already matches.
+        if messages and messages[0].get("role") == "system" and messages[0].get("content") == system_prompt:
+            return FunctionArgumentWrapper(messages, *args, **kwargs)
+        new_messages = [{"role": "system", "content": system_prompt}]
+        new_messages.extend(list(messages))
         return FunctionArgumentWrapper(new_messages, *args, **kwargs)
 
     return patch_with_thinking(
         client,
-        function_names=["call"],
+        function_names=["call", "__call__"],
         system_prompt_injector=injector,
     )

Add the needed imports near the other imports:

from collections.abc import Mapping, Sequence
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)

16-18: Add AsyncIterator import and use it to annotate async generator wrappers’ return types.

Public APIs must have full type hints. The wrapper factories below are async generators (async def ... yield ...). Import AsyncIterator here so we can type their returns.

-from collections.abc import Sequence
+from collections.abc import AsyncIterator, Sequence
🧹 Nitpick comments (11)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

33-59: Make thinking injection idempotent and accept Sequence[Message] instead of list.

  • Prevents duplicate system prompts when upstream already inserted one.
  • Broadens compatibility with any sequence type per repo guidelines.
-def _agno_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+def _agno_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+    """Patch an AGNO client to inject a thinking system prompt on every invocation."""
     from agno.models.message import Message
 
-    def injector(messages: list[Message], *args, **kwargs) -> FunctionArgumentWrapper:
+    def injector(messages: Sequence[Message], *args, **kwargs) -> FunctionArgumentWrapper:
         """
         Inject a system prompt into the messages.
@@
-        new_messages = [Message(role="system", content=system_prompt)] + messages
+        # Idempotency: if the first message already matches, do not duplicate.
+        if len(messages) > 0:
+            first = messages[0]
+            if getattr(first, "role", None) == "system" and getattr(first, "content", None) == system_prompt:
+                return FunctionArgumentWrapper(messages, *args, **kwargs)
+        new_messages = [Message(role="system", content=system_prompt)] + list(messages)
         return FunctionArgumentWrapper(new_messages, *args, **kwargs)
 
     return patch_with_thinking(
         client,
-        function_names=["invoke_stream", "invoke", "ainvoke", "ainvoke_stream"],
+        function_names=["invoke_stream", "invoke", "ainvoke", "ainvoke_stream"],
         system_prompt_injector=injector,
     )

30-30: Optional: factor method names into a single constant to avoid drift.

Keeps patch target names consistent across providers and eases future updates.

 ModelType = TypeVar("ModelType")
 
+AGNO_INVOKE_METHODS: tuple[str, ...] = ("invoke_stream", "invoke", "ainvoke", "ainvoke_stream")

And use it below:

-        function_names=["invoke_stream", "invoke", "ainvoke", "ainvoke_stream"],
+        function_names=list(AGNO_INVOKE_METHODS),

16-16: Optional: module docstring.

Add a brief module docstring per repo docs.

 import os
+"""AGNO builders and thinking/retry patching for nat LLM clients."""
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (6)

72-74: Good placement: thinking before retries; reduce repetition with a helper.

The order (inject thinking, then wrap with retries) is correct. To avoid repeating the isinstance/None check across providers, factor a small helper and call it here.

Apply this local diff:

-    if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
-        llm = _llama_index_thinking_injector(llm, llm_config.thinking_system_prompt)
+    llm = _maybe_inject_thinking(llm, llm_config)

And add this helper (place it near _llama_index_thinking_injector):

def _maybe_inject_thinking(llm: ModelType, llm_config: ThinkingMixin) -> ModelType:
    return (_llama_index_thinking_injector(llm, llm_config.thinking_system_prompt)
            if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None
            else llm)

93-95: Same refactor applies here.

Call the shared helper for consistency and to keep these registration functions lean.

-    if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
-        llm = _llama_index_thinking_injector(llm, llm_config.thinking_system_prompt)
+    llm = _maybe_inject_thinking(llm, llm_config)

117-119: Same refactor applies here (NIM).

Keeps all providers uniform and easier to maintain.

-    if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
-        llm = _llama_index_thinking_injector(llm, llm_config.thinking_system_prompt)
+    llm = _maybe_inject_thinking(llm, llm_config)

141-143: Same refactor applies here (OpenAI).

One-liner helper simplifies future changes (e.g., adding another patched method).

-    if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
-        llm = _llama_index_thinking_injector(llm, llm_config.thinking_system_prompt)
+    llm = _maybe_inject_thinking(llm, llm_config)

35-61: Add a quick test to lock behavior.

Recommend adding/expanding tests to assert:

  • The first message is a system message equal to the computed thinking_system_prompt when thinking is enabled.
  • No duplicate injection occurs if the first message is already the same system prompt.
  • Retry wrapper still wraps patched methods (e.g., monkeypatch chat to count calls and ensure retry happens on transient errors).

I can draft pytest tests with a simple Mock LLM implementing chat/stream_chat/achat/astream_chat that records args, so we don’t bring in llama_index for unit tests. Want me to push those?
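
If helpful, a minimal pytest sketch in that spirit, using plain dict messages as a stand-in for llama_index ChatMessage objects (all names hypothetical):

    def make_injector(system_prompt: str):
        def injector(messages):
            # Idempotent: skip injection when the prompt is already the first message.
            if messages and messages[0] == {"role": "system", "content": system_prompt}:
                return list(messages)
            return [{"role": "system", "content": system_prompt}] + list(messages)
        return injector

    def test_injects_system_prompt_exactly_once():
        injector = make_injector("/think")
        once = injector([{"role": "user", "content": "hi"}])
        twice = injector(once)
        assert once[0] == {"role": "system", "content": "/think"}
        assert twice == once  # no duplicate system message on re-injection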


35-61: Harden the thinking injector and align with provider capabilities

  • In llama_index 0.10.x and 0.11.x, ChatMessage’s role field is typed as MessageRole but Pydantic will coerce a matching string (e.g. "system") into the enum. Both syntaxes work, though using MessageRole.SYSTEM is recommended for clarity and typo safety. (docs.llamaindex.ai)
  • The OpenAI, AzureOpenAI, and NVIDIA LLM wrappers all expose chat, stream_chat, achat, and astream_chat. (docs.llamaindex.ai)
  • The Bedrock wrapper supports chat, stream_chat, and achat but does not implement astream_chat (it raises NotImplementedError in v0.10.x). If you intend to inject into async streaming on Bedrock, you’ll need to guard against or omit astream_chat. (aidoczh.com)
  • Retain the duplicate-injection guard and add a concise injector docstring. Also consider adding a module-level docstring after the SPDX header.

Apply this refined diff in packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py:

-def _llama_index_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
-    from llama_index.core.base.llms.types import ChatMessage
+def _llama_index_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+    """Inject a LlamaIndex-compatible system prompt into chat calls before user messages."""
+    from llama_index.core.base.llms.types import ChatMessage, MessageRole
@@
     def injector(messages: Sequence[ChatMessage], *args, **kwargs) -> FunctionArgumentWrapper:
-        new_messages = [ChatMessage(role="system", content=system_prompt)] + list(messages)
+        # Avoid double-injecting if the first message already matches our system prompt.
+        if messages:
+            first = messages[0]
+            if getattr(first, "role", None) in (MessageRole.SYSTEM, "system") \
+               and getattr(first, "content", "").strip() == system_prompt:
+                return FunctionArgumentWrapper(messages, *args, **kwargs)
+        new_messages = [ChatMessage(role=MessageRole.SYSTEM, content=system_prompt)] + list(messages)
         return FunctionArgumentWrapper(new_messages, *args, **kwargs)

And add at top of module (just after the SPDX header):

"""LlamaIndex LLM client registrations for nat with optional thinking system-prompt injection."""
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

120-132: Minor: typo in variable name and comment clarity.

Rename nvidai_api_key -> nvidia_api_key; tighten comment to CrewAI context (not LiteLLM).

-            nvidai_api_key = os.getenv("NVIDIA_API_KEY")
+            nvidia_api_key = os.getenv("NVIDIA_API_KEY")
 
-            if (nvidai_api_key is not None):
-                # Transfer the key to the correct environment variable for LiteLLM
-                os.environ["NVIDIA_NIM_API_KEY"] = nvidai_api_key
+            if nvidia_api_key is not None:
+                # Transfer the key to the CrewAI env var expected by its NIM integration
+                os.environ["NVIDIA_NIM_API_KEY"] = nvidia_api_key
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)

37-41: Use consistent, stable BaseMessage import path.

LangChain exposes BaseMessage from langchain_core.messages. Importing from .messages (not .messages.base) is more conventional and consistent with the adjacent imports.

-from langchain_core.messages.base import BaseMessage
+from langchain_core.messages import BaseMessage
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 029578d and 26bfe3a.

📒 Files selected for processing (7)
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4 hunks)
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4 hunks)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (5 hunks)
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (5 hunks)
  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (3 hunks)
  • src/nat/llm/utils/thinking.py (1 hunks)
  • tests/nat/llm/utils/test_thinking.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • src/nat/llm/utils/thinking.py
  • tests/nat/llm/utils/test_thinking.py
🧰 Additional context used
📓 Path-based instructions (9)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
🧬 Code graph analysis (4)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (2)
  • FunctionArgumentWrapper (32-51)
  • patch_with_thinking (91-178)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (2)
  • FunctionArgumentWrapper (32-51)
  • patch_with_thinking (91-178)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (5)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/azure_openai_llm.py (2)
  • azure_openai_llm (55-57)
  • AzureOpenAIModelConfig (30-51)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (2)
  • FunctionArgumentWrapper (32-51)
  • patch_with_thinking (91-178)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (2)
  • FunctionArgumentWrapper (32-51)
  • patch_with_thinking (91-178)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (14)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)

91-96: Thinking injection before retry patch is correct.

Order of patches ensures the retried call includes the injected system message. Nice.


118-125: OpenAI: thinking injection before retry is also correct.

Consistent sequencing with the NIM builder. LGTM.

packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (4)

16-18: LGTM: typing imports are appropriate for the new wrapper.

Using Sequence and TypeVar here keeps the injector’s API precise without over-constraining it. No issues.


23-23: LGTM: gating via ThinkingMixin is correct.

This import aligns with the PR objective: feature-gate thinking by model identity and explicit config opt-in.


28-29: LGTM: correct utilities for argument wrapping and patching.

These are the right hooks to prepend a system message while preserving original signatures.


32-33: LGTM: generic ModelType is a good choice.

Keeps patching return types faithful to the original LLM instance across providers.

packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (8)

17-17: Import of TypeVar is appropriate and scoped.

No issues; used to parameterize ModelType below.


23-23: ThinkingMixin import aligns with new PR wiring.

Matches config classes inheriting ThinkingMixin; good.


27-28: Correct utilities imported for thinking injection.

FunctionArgumentWrapper and patch_with_thinking are the right abstractions here.


31-33: ModelType TypeVar setup looks good.

Keeps return type fidelity of the patched client.


96-98: Thinking injection gating is correct and ordered before retry.

The isinstance check plus None guard is clean and avoids overhead when unset. Ordering before retry preserves original semantics.
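
For readers skimming, the gating-plus-ordering pattern amounts to something like the following sketch (the thinking and retry patch callables stand in for whatever helpers the wrappers already use):

from nat.data_models.thinking_mixin import ThinkingMixin


def _apply_thinking_then_retry(client, llm_config, thinking_patch, retry_patch):
    """Inject thinking first so that retried calls also carry the system prompt."""
    if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
        client = thinking_patch(client)
    return retry_patch(client)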


134-136: Thinking injection gating for NIM is consistent.

Matches Azure/OpenAI pattern; good reuse of the helper.


158-160: Thinking injection gating for OpenAI is consistent.

Keeps provider behavior uniform.


54-58: CrewAI LLM patch covers the correct entrypoint (call) across supported versions

I checked the official CrewAI docs and code examples—every LLM invocation goes through the call(...) method, not __call__ or chat. In the runtime stack traces (e.g. in crew_agent_executor.py: answer = self.llm.call(...)) and in the “Connecting to LLMs” guide, only call is referenced for completions (github.com, docs.crewai.com). There is no __call__ alias or chat method exposed by the LLM class in any released version (github.com, docs.crewai.com).

Therefore, using

patch_with_thinking(
    client,
    function_names=["call"],
    system_prompt_injector=injector,
)

will reliably wrap the CrewAI LLM entrypoint, and no additional method names are needed.

Signed-off-by: Will Killian <[email protected]>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (3)

80-84: Include batch/abatch to future-proof thinking injection across LangChain clients.

Some chat models expose batch/abatch; patch them now to avoid gaps if used later.

     return patch_with_thinking(
         client,
-        function_names=["invoke", "ainvoke", "stream", "astream"],
+        function_names=["invoke", "ainvoke", "stream", "astream", "batch", "abatch"],
         system_prompt_injector=injector,
     )

Run this to see if batch/abatch are currently used (helps prioritize urgency):

#!/bin/bash
# Audit for potential call sites that would bypass injection if not patched.
rg -nP -C2 '\b(a?batch|a?invoke|a?stream)\s*\(' --type=py

16-18: Add AsyncIterator import and annotate async-generator wrappers’ return types.

Public APIs must have return type hints; these four factory wrappers are async generators (they yield clients). Import AsyncIterator and annotate the four functions accordingly.

Apply this diff for the import:

-from collections.abc import Sequence
+from collections.abc import AsyncIterator, Sequence

Then update the wrapper signatures and add concise docstrings (example shown for all four):

# aws_bedrock_langchain (line ~88)
async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[object]:
    """
    Build and yield a LangChain ChatBedrockConverse client with optional thinking and retry wrappers.

    Args:
        llm_config: AWS Bedrock model config.
        _builder: Workflow builder (unused).

    Yields:
        The configured ChatBedrockConverse client.
    """

# azure_openai_langchain (line ~107)
async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:
    """
    Build and yield a LangChain AzureChatOpenAI client with optional thinking and retry wrappers.

    Args:
        llm_config: Azure OpenAI model config.
        _builder: Workflow builder (unused).

    Yields:
        The configured AzureChatOpenAI client.
    """

# nim_langchain (line ~126)
async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[object]:
    """
    Build and yield a LangChain ChatNVIDIA client with optional thinking and retry wrappers.

    Args:
        llm_config: NIM model config.
        _builder: Workflow builder (unused).

    Yields:
        The configured ChatNVIDIA client.
    """

# openai_langchain (line ~145)
async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:
    """
    Build and yield a LangChain ChatOpenAI client with optional thinking and retry wrappers.

    Args:
        llm_config: OpenAI model config.
        _builder: Workflow builder (unused).

    Yields:
        The configured ChatOpenAI client.
    """

58-60: Raise on unsupported LanguageModelInput instead of silently passing through.

Docstring says ValueError is raised for invalid inputs, but the else branch currently passes them through unchanged, which can hide misuses and skip the thinking injection.

Apply this diff:

         Raises:
             ValueError: If the messages are not a valid type for LanguageModelInput.
         """
@@
-        else:
-            return FunctionArgumentWrapper(messages, *args, **kwargs)
+        else:
+            raise ValueError(
+                f"Unsupported LanguageModelInput: {type(messages)}. "
+                "Expected str | BaseMessage | PromptValue | Sequence[BaseMessage]."
+            )

Also applies to: 77-78

🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)

35-43: Add a brief docstring to the injector for clarity.

Not required, but helpful for discoverability and to satisfy internal documentation checks.

 def _langchain_thinking_injector(client: ModelType, system_prompt: str) -> ModelType:
+    """
+    Patch a LangChain chat model client to prepend a thinking SystemMessage to inputs.
+
+    Args:
+        client: The LangChain chat model instance to patch.
+        system_prompt: The system prompt to inject (e.g., "/think" or "/no_think").
+
+    Returns:
+        The same client instance patched so invoke/ainvoke/stream/astream receive inputs
+        with the thinking SystemMessage prepended.
+    """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 26bfe3a and ea3387d.

📒 Files selected for processing (2)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (5 hunks)
  • src/nat/llm/utils/thinking.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/nat/llm/utils/thinking.py
🧰 Additional context used
🧬 Code graph analysis (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (2)
  • FunctionArgumentWrapper (32-51)
  • patch_with_thinking (91-178)
🔇 Additional comments (3)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (3)

23-23: LGTM: imports, localized LangChain deps, and ModelType alias are appropriate.

  • ThinkingMixin import, use of FunctionArgumentWrapper/patch_with_thinking, and the ModelType TypeVar look good.
  • Localizing LangChain imports inside the injector keeps optional deps soft.

Also applies to: 28-30, 32-33, 37-42


61-74: Good: handles BaseMessage/PromptValue/str/Sequence without in-place mutation.

  • Properly supports single BaseMessage and Sequence[BaseMessage], fixing prior gaps.
  • Builds new lists; avoids mutating PromptValue-derived lists.

94-96: LGTM: Thinking injection gated by config and only when a prompt is available.

Conditionally wrapping the client when ThinkingMixin.thinking_system_prompt is not None is correct and keeps behavior non-breaking for unsupported models.

Also applies to: 113-115, 132-134, 157-159

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (7)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

37-59: Add a docstring to _patch_llm_based_on_config.

The function _patch_llm_based_on_config lacks a docstring; per the coding guidelines, all public APIs require Google-style docstrings (its signature already carries complete type hints).

Apply this fix:

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Apply thinking and retry patches to a CrewAI LLM client based on config.
+    
+    Args:
+        client: The CrewAI LLM client to patch.
+        llm_config: Configuration containing thinking and retry settings.
+    
+    Returns:
+        The patched client with thinking and retry behavior applied.
+    """
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (3)

38-39: Add docstring to public function.

The function _patch_llm_based_on_config requires a Google-style docstring per coding guidelines.

Apply this fix:

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Apply thinking and retry patches to a LangChain LLM client based on config.
+    
+    Args:
+        client: The LangChain LLM client to patch.
+        llm_config: Configuration containing thinking and retry settings.
+    
+    Returns:
+        The patched client with thinking and retry behavior applied.
+    """

91-96: Consider adding batch/abatch to function_names list.

Many LangChain chat models also expose batch and abatch methods. Consider including them for complete coverage of thinking injection.

                 function_names=[
                     "invoke",
                     "ainvoke",
                     "stream",
                     "astream",
+                    "batch",
+                    "abatch",
                 ],

84-84: Potential issue: Invalid inputs should raise an error.

The else branch returns the messages unchanged rather than raising an error for unsupported types. This could hide bugs where invalid input types are passed.

Apply this fix:

             else:
-                return FunctionArgumentWrapper(messages, *args, **kwargs)
+                raise ValueError(
+                    f"Unsupported LanguageModelInput: {type(messages)}. "
+                    "Expected str | BaseMessage | PromptValue | Sequence[BaseMessage]."
+                )
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

16-18: Add missing typing imports to support return annotations.

You’ll need AsyncIterator and Any for the provider factory return types. Recommend importing AsyncIterator from collections.abc per guidelines.

-from typing import TypeVar
+from typing import Any, TypeVar
+from collections.abc import AsyncIterator

70-74: Add return type and a minimal docstring to the public provider factory.

Public APIs require return hints and Google-style docstrings. Returning an async generator → AsyncIterator[Any].

-async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_agno(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO Nvidia client configured from nat, optionally patched for thinking and retries.
+
+    Yields:
+        The Nvidia client instance patched per llm_config.
+    """

103-106: Add return type and a minimal docstring to the OpenAI provider factory.

Mirror the NIM function’s annotations and docstring style.

-async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO OpenAI client configured from nat, optionally patched for thinking and retries.
+
+    Yields:
+        The OpenAIChat client instance patched per llm_config.
+    """
🧹 Nitpick comments (4)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (1)

42-48: Add docstring to the inject method.

The inject method overrides the base class method and should include a docstring explaining its specific behavior.

Apply this fix:

         @override
         def inject(self, messages: Sequence[ChatMessage], *args, **kwargs) -> FunctionArgumentWrapper:
+            """
+            Inject a system prompt as the first message in the sequence.
+            
+            Args:
+                messages: Sequence of ChatMessage objects to inject the prompt into.
+                *args: Additional positional arguments to pass through.
+                **kwargs: Additional keyword arguments to pass through.
+                
+            Returns:
+                FunctionArgumentWrapper containing the modified messages and other args.
+            """
             new_messages = [ChatMessage(role="system", content=self.system_prompt)] + list(messages)
             return FunctionArgumentWrapper(new_messages, *args, **kwargs)
src/nat/llm/utils/thinking.py (1)

97-100: Remove unnecessary return statement in generator.

The explicit return statement after yield from is unnecessary and can be removed.

Apply this fix:

         def _gen(obj: object, *call_args, **call_kwargs) -> Iterable[Any]:
             new_args = injector.inject(*call_args, **call_kwargs)
             yield from fn(obj, *new_args.args, **new_args.kwargs)
-            return
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)

36-43: Add a concise docstring; remove redundant import in nested class.

  • _patch_llm_based_on_config is a small public-ish helper used by two providers; a short docstring improves readability.
  • Line 42 re-imports Message inside the class body; it’s already imported on Line 38. Drop the duplicate.
-def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """Patch an AGNO client based on llm_config (thinking and retries) and return the patched client."""
@@
-    class AgnoThinkingInjector(BaseThinkingInjector):
-
-        from agno.models.message import Message
+    class AgnoThinkingInjector(BaseThinkingInjector):

86-88: Typo in local variable name.

nvidai_api_key → nvidia_api_key. This goes away if you remove the block above, but flagging for completeness.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ea3387d and 8ef99c3.

📒 Files selected for processing (7)
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3 hunks)
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4 hunks)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (5 hunks)
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (5 hunks)
  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (3 hunks)
  • src/nat/llm/utils/thinking.py (1 hunks)
  • tests/nat/llm/utils/test_thinking.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/nat/llm/utils/test_thinking.py
🧰 Additional context used
📓 Path-based instructions (11)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to-date whenever a file is changed.

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/utils/thinking.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a
    tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ so namespace packages resolve correctly

Files:

  • src/nat/llm/utils/thinking.py
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/llm/utils/thinking.py
🧬 Code graph analysis (6)
packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (10)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/aws_bedrock_llm.py (1)
  • AWSBedrockModelConfig (29-52)
src/nat/llm/azure_openai_llm.py (2)
  • azure_openai_llm (55-57)
  • AzureOpenAIModelConfig (30-51)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)
  • _patch_llm_based_on_config (36-66)
  • inject (45-47)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)
  • _patch_llm_based_on_config (37-59)
  • inject (42-44)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (2)
  • _patch_llm_based_on_config (38-68)
  • inject (45-47)
packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (2)
  • _patch_llm_based_on_config (35-86)
  • inject (44-67)
src/nat/llm/utils/thinking.py (6)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (1)
  • inject (45-47)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)
  • inject (42-44)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)
  • inject (49-84)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (1)
  • inject (45-47)
packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (1)
  • inject (44-67)
tests/nat/llm/utils/test_thinking.py (3)
  • inject (40-41)
  • inject (46-47)
  • inject (52-53)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)
  • _patch_llm_based_on_config (37-59)
  • inject (42-44)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
  • Message (399-402)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
🔇 Additional comments (17)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)

46-52: LGTM! Thinking injection properly configured.

The implementation correctly checks for ThinkingMixin and non-null thinking_system_prompt before applying the patch. The CrewAIThinkingInjector properly prepends the system message to the messages list.


97-97: LGTM! Centralized patching improves maintainability.

The refactoring to use _patch_llm_based_on_config centralizes the thinking and retry logic, making the code more maintainable and consistent across all CrewAI wrappers.

packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (2)

59-67: Well-designed handling of ChatHistory system messages.

The implementation thoughtfully handles both cases - when system_message is null (sets it directly) and when it exists (prepends a new system message to the messages list). This preserves existing system messages while ensuring the thinking prompt is injected.
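
For readers unfamiliar with the Semantic Kernel types, the branching described above is roughly the following. This is a sketch only: it assumes ChatHistory exposes a mutable system_message and a messages list as described, and that ChatMessageContent and AuthorRole are importable from semantic_kernel.contents in the pinned version:

from semantic_kernel.contents import AuthorRole, ChatHistory, ChatMessageContent


def inject_thinking(history: ChatHistory, system_prompt: str) -> ChatHistory:
    """Set the system message when empty, otherwise prepend an extra system message."""
    if history.system_message is None:
        history.system_message = system_prompt
    else:
        history.messages.insert(0, ChatMessageContent(role=AuthorRole.SYSTEM, content=system_prompt))
    return history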


101-101: LGTM! Clean refactoring to centralized patching.

The change simplifies the wrapper functions and consolidates the patching logic effectively.

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)

68-85: Excellent handling of LanguageModelInput types.

The implementation comprehensively handles all valid LanguageModelInput types:

  • BaseMessage: Single message case properly handled
  • PromptValue: Correctly extracts and prepends to messages
  • str: Converts to HumanMessage with system prompt
  • Sequence[BaseMessage]: Validates all elements and prepends system message

The explicit type checking ensures robust error handling. Well done!
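
Concretely, those four branches reduce to something like this simplified sketch (not the code under review):

from collections.abc import Sequence

from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
from langchain_core.prompt_values import PromptValue


def prepend_system(messages, system_prompt: str) -> list[BaseMessage]:
    """Normalize any LanguageModelInput into a message list led by the thinking prompt."""
    system = SystemMessage(content=system_prompt)
    if isinstance(messages, BaseMessage):
        return [system, messages]
    if isinstance(messages, PromptValue):
        return [system, *messages.to_messages()]
    if isinstance(messages, str):
        return [system, HumanMessage(content=messages)]
    if isinstance(messages, Sequence) and all(isinstance(m, BaseMessage) for m in messages):
        return [system, *messages]
    raise ValueError(f"Unsupported LanguageModelInput: {type(messages)}")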

packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (2)

49-60: LGTM! Clean and consistent implementation.

The thinking injection is properly applied with appropriate function names for LlamaIndex's chat interface methods.


38-39: Add docstring to public function.

The function _patch_llm_based_on_config requires a Google-style docstring per coding guidelines.

Apply this fix:

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Apply thinking and retry patches to a LlamaIndex LLM client based on config.
+    
+    Args:
+        client: The LlamaIndex LLM client to patch.
+        llm_config: Configuration containing thinking and retry settings.
+    
+    Returns:
+        The patched client with thinking and retry behavior applied.
+    """

Likely an incorrect or invalid review comment.

src/nat/llm/utils/thinking.py (3)

56-82: LGTM! Well-designed abstraction for thinking injection.

The BaseThinkingInjector class provides a clean abstraction with:

  • Clear separation of concerns
  • Proper use of dataclass decorator
  • Abstract method with comprehensive docstring
  • Type hints on all parameters
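
For orientation, the shape being praised is roughly the following; field names are inferred from how the injector is used elsewhere in this review rather than copied from the source:

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ThinkingInjectorSketch(ABC):
    """Carries the prompt to inject and the client methods whose arguments get rewritten."""

    system_prompt: str
    function_names: list[str] = field(default_factory=list)

    @abstractmethod
    def inject(self, *args, **kwargs):
        """Return the call arguments with the system prompt prepended to the messages."""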

107-116: Excellent handling of different function types.

The implementation correctly identifies and wraps sync, async, generator, and async generator functions with appropriate wrappers. This ensures thinking injection works correctly regardless of the function type.
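
The dispatch being described usually boils down to a few inspect checks; a condensed sketch of the idea, not the file's exact code:

import inspect


def select_wrapper(fn, make_sync, make_async, make_gen, make_async_gen):
    """Pick a wrapper factory based on the kind of callable being patched."""
    if inspect.isasyncgenfunction(fn):
        return make_async_gen(fn)
    if inspect.iscoroutinefunction(fn):
        return make_async(fn)
    if inspect.isgeneratorfunction(fn):
        return make_gen(fn)
    return make_sync(fn)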


192-204: Good error handling with informative logging.

The patch application logic appropriately tries instance-level patching first, then falls back to class-level with clear logging of failures. This helps with debugging patching issues.
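
Stripped of logging detail, that fallback looks something like the sketch below; the real implementation's exception handling and wrapper binding may differ:

import logging
import types

logger = logging.getLogger(__name__)


def patch_method(client: object, name: str, wrapper) -> None:
    """Bind the wrapper on the instance when possible, otherwise patch the class."""
    try:
        setattr(client, name, types.MethodType(wrapper, client))
    except Exception as exc:  # e.g. pydantic models that reject unknown attributes
        logger.info("Instance-level patch of %s failed (%s); patching %s instead", name, exc, type(client).__name__)
        setattr(type(client), name, wrapper)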

packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (7)

44-47: Thinking injector logic looks correct.

Prepending a system Message is consistent with the ThinkingMixin contract and mirrors the CrewAI injector pattern. The override marker is appropriate.


60-66: LGTM: Retry patch composition after thinking.

Order of composition (thinking first, then retry) is sensible and consistent with other backends.


99-99: LGTM: Centralized patching usage.

Deferring to _patch_llm_based_on_config keeps provider factories lean and consistent across backends.


114-116: LGTM: OpenAI client creation and centralized patching.

Mapping model→id and deferring patching keeps things clean.


79-92: Remove dead env var remapping block in llm.py

The existing code in packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (around lines 79–92) checks for NVIDIA_API_KEY and then re‐reads and re‐sets the same variable—making the entire block a no-op and rendering the comment misleading. We should either remove this block or update it based on the actual environment variable the AGNO client expects.

• File: packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
• Lines: 79–92

Suggested change:

-    # Because Agno uses a different environment variable for the API key, we need to set it here manually
-    if ("api_key" not in config_obj or config_obj["api_key"] is None):
-        if ("NVIDIA_API_KEY" in os.environ):
-            # Dont need to do anything. User has already set the correct key
-            pass
-        else:
-            nvidai_api_key = os.getenv("NVIDIA_API_KEY")
-            if (nvidai_api_key is not None):
-                # Transfer the key to the correct environment variable
-                os.environ["NVIDIA_API_KEY"] = nvidai_api_key
+    # If config.api_key is unset, the AGNO client will automatically read NVIDIA_API_KEY from the environment.

If the AGNO Nvidia client actually expects a different environment variable name (e.g. AGNO_NVIDIA_API_KEY), please confirm the exact name so we can add a precise remapping.


93-99: Consider optionally passing api_key from the config into the Nvidia client

This makes it possible for users to supply their API key without relying solely on environment variables:

     # Create Nvidia instance with conditional base_url
     kwargs = {"id": config_obj.get("id")}
     if config_obj.get("base_url") is not None:
         kwargs["base_url"] = config_obj.get("base_url")
+    if config_obj.get("api_key") is not None:
+        kwargs["api_key"] = config_obj.get("api_key")

     client = Nvidia(**kwargs)  # type: ignore[arg-type]

• Please confirm that the Nvidia constructor in the version of agno you’re pinning accepts an api_key keyword argument before merging.


49-58: AGNO invocation methods correctly covered; no changes needed

Verification of AGNO v0.5+ shows that all four invocation entry points—invoke, ainvoke, invoke_stream, and ainvoke_stream—are defined in agno/models/base.py and consistently implemented across all model-specific clients. The function_names list in your AgnoThinkingInjector therefore accurately targets every supported invocation path.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (6)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

87-99: Annotate return type and add a short docstring for the public factory.

Mirror the NIM path: add return type and docstring.

 @register_llm_client(config_type=OpenAIModelConfig, wrapper_type=LLMFrameworkEnum.AGNO)
-async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Create and yield an AGNO OpenAI client configured from NAT.
+
+    Applies thinking injection (if enabled) and retry logic.
+
+    Args:
+        llm_config: OpenAI model configuration.
+        _builder: Workflow builder (unused here).
+
+    Yields:
+        The configured and patched OpenAI client instance.
+    """

16-16: Add missing typing/ABC imports to support required annotations.

To add return type hints and use Sequence in signatures, extend imports accordingly.

-from typing import TypeVar
+from typing import TypeVar, Any
+from collections.abc import AsyncIterator, Sequence

69-84: Annotate return type and add a short docstring for the public factory.

Public APIs must have Python 3.11+ type hints and Google-style docstrings.

 @register_llm_client(config_type=NIMModelConfig, wrapper_type=LLMFrameworkEnum.AGNO)
-async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_agno(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Create and yield an AGNO Nvidia client configured from NAT.
+
+    Applies thinking injection (if enabled) and retry logic.
+
+    Args:
+        llm_config: NIM model configuration.
+        _builder: Workflow builder (unused here).
+
+    Yields:
+        The configured and patched Nvidia client instance.
+    """
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (3)

62-97: Add return type hints and Google-style docstring to the public Azure entrypoint.

Public APIs require explicit return types and docstrings. The function is an async generator yielding the client.

Example update (outside this hunk):

from collections.abc import AsyncIterator

@register_llm_client(config_type=AzureOpenAIModelConfig, wrapper_type=LLMFrameworkEnum.CREWAI)
async def azure_openai_crewai(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[ModelType]:
    """
    Yield a CrewAI LLM client configured for Azure OpenAI and patched with thinking/retry behaviors.

    Args:
        llm_config: Azure OpenAI model configuration (API endpoint, deployment, version, etc.).
        _builder: NAT builder (unused here).

    Yields:
        The configured CrewAI LLM client instance.
    """
    ...
    yield _patch_llm_based_on_config(client, llm_config)

Minor sanity check: ensure llm_config.api_version is a non-empty str before os.environ["AZURE_API_VERSION"] = llm_config.api_version; otherwise raise a clear ValueError.
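
For example, a guard placed just before the environment assignment could look like this sketch:

if not isinstance(llm_config.api_version, str) or not llm_config.api_version.strip():
    raise ValueError("AzureOpenAIModelConfig.api_version must be a non-empty string for the CrewAI wrapper")
os.environ["AZURE_API_VERSION"] = llm_config.api_version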


100-126: Add return type hints and docstring to the public NIM entrypoint.

Same requirement as Azure/OpenAI entrypoints.

Example (outside this hunk):

from collections.abc import AsyncIterator

@register_llm_client(config_type=NIMModelConfig, wrapper_type=LLMFrameworkEnum.CREWAI)
async def nim_crewai(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[ModelType]:
    """
    Yield a CrewAI LLM client configured for NVIDIA NIM and patched with thinking/retry behaviors.

    Args:
        llm_config: NIM model configuration (model name, keys, etc.).
        _builder: NAT builder (unused here).

    Yields:
        The configured CrewAI LLM client instance.
    """
    ...

128-139: Add return type hints and docstring to the public OpenAI entrypoint.

Mirror the Azure/NIM suggestions for consistency and to satisfy lint/type checks.

Example (outside this hunk):

from collections.abc import AsyncIterator

@register_llm_client(config_type=OpenAIModelConfig, wrapper_type=LLMFrameworkEnum.CREWAI)
async def openai_crewai(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[ModelType]:
    """
    Yield a CrewAI LLM client configured for OpenAI and patched with thinking/retry behaviors.

    Args:
        llm_config: OpenAI model configuration.
        _builder: NAT builder (unused here).

    Yields:
        The configured CrewAI LLM client instance.
    """
    ...
🧹 Nitpick comments (8)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (5)

15-16: Add a concise module docstring per repo guidelines.

Public modules require a Google-style docstring with a one-line summary.

 # limitations under the License.
 
+"""AGNO plugin wrappers for NIM and OpenAI clients with optional thinking and retry patching."""
+
 from typing import TypeVar

35-47: Tidy injector: drop duplicate import and generalize parameter type to Sequence.

  • Remove the redundant import inside the inner class (Line 41).
  • Accept Sequence[Message] for broader compatibility and convert to list when constructing the new message list.
 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
 
     from agno.models.message import Message
 
     class AgnoThinkingInjector(BaseThinkingInjector):
-
-        from agno.models.message import Message
-
         @override
-        def inject(self, messages: list[Message], *args, **kwargs) -> FunctionArgumentWrapper:
-            new_messages = [Message(role="system", content=self.system_prompt)] + messages
+        def inject(self, messages: Sequence[Message], *args, **kwargs) -> FunctionArgumentWrapper:
+            new_messages = [Message(role="system", content=self.system_prompt)] + list(messages)
             return FunctionArgumentWrapper(new_messages, *args, **kwargs)

35-65: Document the central patch function for maintainability.

Add a short Google-style docstring clarifying ordering (thinking before retry) and parameters/return.

-def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """Patch an AGNO client based on the NAT LLM config.
+
+    Applies thinking injection first (if configured) and then retry logic.
+
+    Args:
+        client: The AGNO client instance to patch.
+        llm_config: LLM configuration that may include thinking/retry options.
+
+    Returns:
+        The patched client instance.
+    """

74-76: Minor: avoid unnecessary f-string for id.

Model name is already a string.

-        "id": f"{llm_config.model_name}",
+        "id": llm_config.model_name,

91-94: Consider pruning None-valued fields in one pass.

Optional: instead of per-key pruning like base_url, drop all None from the dumped config. This reduces constructor noise and potential arg-type ignores.

-    config_obj = {
-        "id": llm_config.model_name,
-        **llm_config.model_dump(exclude={"type", "model_name"}, by_alias=True),
-    }
+    dumped = llm_config.model_dump(exclude={"type", "model_name"}, by_alias=True)
+    config_obj = {"id": llm_config.model_name, **{k: v for k, v in dumped.items() if v is not None}}

Note: verify against AGNO constructors so we don't inadvertently drop required-but-None defaults.

packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (3)

17-17: Broaden typing imports to prepare for collections.abc types.

Per guidelines, prefer typing/collections.abc abstractions (e.g., Sequence, Mapping). Let's import them here.

+from collections.abc import Mapping
+from collections.abc import Sequence
 from typing import TypeVar

39-45: Generalize CrewAIThinkingInjector input type and avoid assuming a list.

CrewAI calls often accept sequences of message dicts. Using Sequence[Mapping[str, str]] improves interoperability and aligns with the guidelines. Cast to list on write.

-        def inject(self, messages: list[dict[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
-            new_messages = [{"role": "system", "content": self.system_prompt}] + messages
+        def inject(self, messages: Sequence[Mapping[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
+            new_messages = [{"role": "system", "content": self.system_prompt}] + list(messages)
             return FunctionArgumentWrapper(new_messages, *args, **kwargs)

117-122: Ensure NVIDIA_NIM_API_KEY is set even when config_obj includes api_key.

If CrewAI/LiteLLM reads the NIM key from env only, providing api_key in config but not exporting the env var could fail. Safest is to export env if not already set, preferring the explicit config value.

-    if ("api_key" not in config_obj or config_obj["api_key"] is None):
-
-        if ("NVIDIA_NIM_API_KEY" in os.environ):
-            # Dont need to do anything. User has already set the correct key
-            pass
-        else:
-            nvidia_api_key = os.getenv("NVIDIA_API_KEY")
-
-            if (nvidia_api_key is not None):
-                # Transfer the key to the correct environment variable for LiteLLM
-                os.environ["NVIDIA_NIM_API_KEY"] = nvidia_api_key
+    # Ensure LiteLLM sees the correct API key for NIM
+    if "NVIDIA_NIM_API_KEY" not in os.environ:
+        if config_obj.get("api_key"):
+            os.environ["NVIDIA_NIM_API_KEY"] = str(config_obj["api_key"])
+        else:
+            nvidia_api_key = os.getenv("NVIDIA_API_KEY")
+            if nvidia_api_key:
+                os.environ["NVIDIA_NIM_API_KEY"] = nvidia_api_key

If CrewAI reliably honors the api_key argument directly, feel free to skip this change; otherwise, this makes the behavior robust.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8ef99c3 and 53ee5a5.

📒 Files selected for processing (2)
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (1 hunks)
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
🧬 Code graph analysis (2)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (5)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)
  • _patch_llm_based_on_config (35-65)
  • inject (44-46)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (2)
  • _patch_llm_based_on_config (38-68)
  • inject (45-47)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
  • _patch_llm_based_on_config (38-105)
  • inject (49-84)
packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py (2)
  • _patch_llm_based_on_config (35-86)
  • inject (44-67)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (5)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)
  • _patch_llm_based_on_config (37-59)
  • inject (42-44)
🔇 Additional comments (6)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

59-65: Thinking before retries: good composition order.

Applying thinking injection prior to the retry wrapper is the correct order so retries re-run the fully prepared call. LGTM.
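
A minimal sketch of that ordering, assuming this PR's thinking utilities (BaseThinkingInjector, FunctionArgumentWrapper, patch_with_thinking, ThinkingMixin) and their usage as shown elsewhere in this review; the retry helper below is an illustrative stand-in, not the toolkit's actual API:

from nat.data_models.thinking_mixin import ThinkingMixin
from nat.llm.utils.thinking import BaseThinkingInjector
from nat.llm.utils.thinking import FunctionArgumentWrapper
from nat.llm.utils.thinking import patch_with_thinking


class SketchInjector(BaseThinkingInjector):
    """Illustrative injector that prepends the system prompt as a plain dict message."""

    def inject(self, messages, *args, **kwargs) -> FunctionArgumentWrapper:
        new_messages = [{"role": "system", "content": self.system_prompt}, *messages]
        return FunctionArgumentWrapper(new_messages, *args, **kwargs)


def patch_with_retry(client, retries: int):
    """Stand-in for the toolkit's retry patching helper (name illustrative)."""
    return client


def _patch_llm_based_on_config(client, llm_config):
    # 1) Inject the thinking system prompt into every patched call.
    if isinstance(llm_config, ThinkingMixin) and llm_config.thinking_system_prompt is not None:
        client = patch_with_thinking(
            client,
            SketchInjector(system_prompt=llm_config.thinking_system_prompt,
                           function_names=["invoke", "ainvoke"]))
    # 2) Wrap the already-injected surface with retries, so each retry re-runs
    #    the fully prepared (prompt-injected) call.
    return patch_with_retry(client, retries=3)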


91-98: Confirm OpenAIChat/Nvidia constructor signatures

I wasn’t able to locate the OpenAIChat or Nvidia class definitions in the repo, so please manually verify that the fields you’re passing in config_obj exactly match each constructor’s signature. In particular:

  • Does OpenAIChat expect an id or a model keyword (or both)?
  • Does the Nvidia class (in agno.models.nvidia) use id or model for the model identifier?
  • Are api_key and base_url accepted as-is, or do the constructors expect openai_api_key/openai_api_base (or other names)?
  • Once you confirm the exact parameter names, update config_obj accordingly and remove any # type: ignore[arg-type] comments.

If you need help locating the class files or determining their __init__ signatures, let me know!
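
For a quick local check, a snippet like this (import paths as referenced in this comment) prints each constructor's parameters:

import inspect

from agno.models.nvidia import Nvidia
from agno.models.openai import OpenAIChat

for cls in (Nvidia, OpenAIChat):
    params = inspect.signature(cls.__init__).parameters
    # Look for 'id' vs 'model', and for 'api_key'/'base_url' vs provider-prefixed names.
    print(cls.__name__, sorted(p for p in params if p != "self"))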


48-57: Action Required: Verify AGNO client invocation methods and constructor signatures

Before merging this change, please double-check that both the NVIDIA and OpenAI AGNO clients implement the exact methods you’re patching, and that their constructors use the parameters you expect:

• On the client instances returned by agno.models.nvidia.Nvidia and agno.models.openai.OpenAIChat, ensure all four of these methods exist and accept a list of Message objects (or equivalent) as their first argument:
invoke(messages: list[Message], …)
invoke_stream(messages: list[Message], …)
ainvoke(messages: list[Message], …)
ainvoke_stream(messages: list[Message], …)

• Verify in the source or API reference that those signatures are correct—particularly that the first parameter is a sequence of Message (not, for example, a single string or dict).

• Confirm that the constructors for each class accept an id argument (to select the model) and, for NVIDIA, a base_url override—rather than a top-level model field.

Relevant documentation for reference:
• NVIDIA model params (includes id, api_key, base_url): docs.agno.com/models/nvidia (docs.agno.com)
• OpenAIChat model params (includes id, api_key, base_url): docs.agno.com/models/openai (docs.agno.com)
• Example of passing a list of Message objects: docs.agno.com/examples/concepts/others/input_as_messages_list (docs.agno.com)
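
A small introspection sketch (same assumed import paths) to confirm the four entry points exist and take messages as their first argument:

import inspect

from agno.models.nvidia import Nvidia
from agno.models.openai import OpenAIChat

for cls in (Nvidia, OpenAIChat):
    for name in ("invoke", "invoke_stream", "ainvoke", "ainvoke_stream"):
        fn = getattr(cls, name, None)
        if not callable(fn):
            print(f"{cls.__name__}.{name} is missing")
            continue
        args = [p for p in inspect.signature(fn).parameters if p != "self"]
        print(cls.__name__, name, "first parameter:", args[:1])  # expect ['messages']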

packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (3)

22-33: Centralized patching and typing scaffold look good.

  • Import set and ModelType TypeVar mirror the pattern used in other providers.
  • _patch_llm_based_on_config(...) -> ModelType is consistent and keeps the wrapper generic.

Also applies to: 34-37


53-59: Thinking injection before retry wrapping is correct.

Patching thinking first ensures retries see identical transformed inputs and avoids re-injecting on each retry.


46-52: No async or streaming methods to patch—call is the only public call method

  • Introspection of crewai.LLM shows only a call method; there are no acall, stream, or astream methods to cover.
  • Patching only "call" with patch_with_thinking therefore fully covers the available call surface.

No changes required.

@willkill07 willkill07 force-pushed the wkk_nemotron_thinking_support branch from 53ee5a5 to 9cfe66d on August 26, 2025 17:27

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

78-95: Bug: Azure config reads excluded fields from config_obj, breaking explicit config values.

You exclude api_key, azure_endpoint, and azure_deployment from config_obj (Lines 69-76), but then attempt to read them from config_obj (Lines 78, 82, 89). This causes explicit values set on llm_config to be ignored, forcing reliance on env vars and potentially raising errors when env vars are absent.

Fix by sourcing these from llm_config (and only falling back to env vars), while still excluding them from the kwargs passed to LLM.

Apply this diff:

@@
-    api_key = config_obj.get("api_key") or os.environ.get("AZURE_OPENAI_API_KEY") or os.environ.get("AZURE_API_KEY")
+    api_key = llm_config.api_key or os.environ.get("AZURE_OPENAI_API_KEY") or os.environ.get("AZURE_API_KEY")
     if api_key is None:
         raise ValueError("Azure API key is not set")
     os.environ["AZURE_API_KEY"] = api_key
-    api_base = (config_obj.get("azure_endpoint") or os.environ.get("AZURE_OPENAI_ENDPOINT")
+    api_base = (llm_config.azure_endpoint or os.environ.get("AZURE_OPENAI_ENDPOINT")
                 or os.environ.get("AZURE_API_BASE"))
     if api_base is None:
         raise ValueError("Azure endpoint is not set")
     os.environ["AZURE_API_BASE"] = api_base
 
     os.environ["AZURE_API_VERSION"] = llm_config.api_version
-    model = config_obj.get("azure_deployment") or os.environ.get("AZURE_MODEL_DEPLOYMENT")
+    model = llm_config.azure_deployment or os.environ.get("AZURE_MODEL_DEPLOYMENT")
     if model is None:
         raise ValueError("Azure model deployment is not set")
 
     config_obj["model"] = model
♻️ Duplicate comments (4)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

37-45: Add docstrings for the patch helper and injector for maintainability.

Internal helpers still benefit from concise docstrings. This mirrors the earlier feedback about documenting the thinking injector.

Apply this diff:

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Patch a CrewAI LLM client with thinking and retry behavior based on the LLM config.
+    
+    Args:
+        client: The instantiated CrewAI LLM client.
+        llm_config: Provider config that may include ThinkingMixin and/or RetryMixin.
+    Returns:
+        The patched client (same instance), ready to be yielded.
+    """
 
     class CrewAIThinkingInjector(BaseThinkingInjector):
+        """CrewAI-specific thinking injector that prepends a system message."""
 
         @override
         def inject(self, messages: list[dict[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
             new_messages = [{"role": "system", "content": self.system_prompt}] + messages
             return FunctionArgumentWrapper(new_messages, *args, **kwargs)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

16-16: Add missing typing imports for async generator annotations.

To add return type hints on the public factories, import AsyncIterator and Any. Prefer collections.abc for ABCs per guidelines.

-from typing import TypeVar
+from typing import TypeVar, Any
+from collections.abc import AsyncIterator, Sequence

69-84: Add return type and docstring to public factory (nim_agno).

Per repo guidelines, public APIs require explicit return type hints and Google-style docstrings. This is an async generator.

-async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_agno(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO Nvidia client configured from nat, patched for thinking and retries if enabled.
+
+    Yields:
+        Any: The configured AGNO Nvidia client.
+    """

87-98: Add return type and docstring to public factory (openai_agno).

Same requirement as above for the OpenAI path.

-async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """Yield an AGNO OpenAI client configured from nat, patched for thinking and retries if enabled.
+
+    Yields:
+        Any: The configured AGNO OpenAI client.
+    """
🧹 Nitpick comments (6)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)

41-45: Guard against duplicate system prompts and broaden input type slightly.

If upstream already prepends the same thinking directive, this will add it twice. Also, CrewAI messages can be a list of dicts but widening to Mapping/Sequence improves robustness.

Apply this diff:

-        def inject(self, messages: list[dict[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
-            new_messages = [{"role": "system", "content": self.system_prompt}] + messages
-            return FunctionArgumentWrapper(new_messages, *args, **kwargs)
+        def inject(self, messages: list[dict[str, str]] | None, *args, **kwargs) -> FunctionArgumentWrapper:
+            msgs = messages or []
+            if msgs:
+                first = msgs[0]
+                if isinstance(first, dict) and first.get("role") == "system" and first.get("content") == self.system_prompt:
+                    # Already injected; return as-is.
+                    return FunctionArgumentWrapper(msgs, *args, **kwargs)
+            new_messages = [{"role": "system", "content": self.system_prompt}, *msgs]
+            return FunctionArgumentWrapper(new_messages, *args, **kwargs)

62-95: Add concise Google-style docstrings to public entrypoints.

The three registered LLM entrypoints are public APIs. Short docstrings describing config, env var behavior, and yielded object will aid users and satisfy the docs guideline.

I can draft these if helpful; let me know your preferred phrasing for env var precedence per provider.

Also applies to: 100-118, 121-133

packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (4)

15-16: Add a concise module docstring.

Public modules require a Google-style docstring. Place it after the license header.

 # limitations under the License.
 
+"""AGNO plugin: build and patch AGNO LLM clients with thinking and retry support driven by nat configs."""
+
 from typing import TypeVar

35-39: Document the central patcher for maintainability.

Add a short Google-style docstring to clarify what the helper composes and returns.

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
-
+    """Apply thinking and retry patches to an AGNO client according to llm_config.
+
+    Args:
+        client: The AGNO client instance to patch.
+        llm_config: nat LLM config that may implement ThinkingMixin and/or RetryMixin.
+
+    Returns:
+        The patched client (same instance), with thinking/retry behavior applied.
+    """

41-46: Remove duplicate import and generalize injector signature to Sequence.

  • The import of Message inside the class body (Line 41) is redundant because Message is already imported in the outer scope of this function.
  • Accept a Sequence instead of list to be more flexible per guidelines, and avoid list-only concatenation.
-        from agno.models.message import Message
-
         @override
-        def inject(self, messages: list[Message], *args, **kwargs) -> FunctionArgumentWrapper:
-            new_messages = [Message(role="system", content=self.system_prompt)] + messages
+        def inject(self, messages: Sequence[Message], *args, **kwargs) -> FunctionArgumentWrapper:
+            new_messages = [Message(role="system", content=self.system_prompt)] + list(messages)
             return FunctionArgumentWrapper(new_messages, *args, **kwargs)

52-57: Minor: factor the AGNO method list into a constant.

Keeps the list DRY and discoverable.

-                                 function_names=[
-                                     "invoke_stream",
-                                     "invoke",
-                                     "ainvoke",
-                                     "ainvoke_stream",
-                                 ]))
+                                 function_names=AGNO_THINKING_METHODS))

Add near the top of the file (supporting snippet outside the selected lines):

AGNO_THINKING_METHODS: list[str] = ["invoke_stream", "invoke", "ainvoke", "ainvoke_stream"]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 53ee5a5 and 9cfe66d.

📒 Files selected for processing (2)
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (1 hunks)
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
🧬 Code graph analysis (2)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (5)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/azure_openai_llm.py (2)
  • azure_openai_llm (55-57)
  • AzureOpenAIModelConfig (30-51)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)
  • _patch_llm_based_on_config (37-59)
  • inject (42-44)
🔇 Additional comments (6)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4)

46-57: Correct ordering: think-first then retry wrapping is right.

Thinking injection precedes retry patching, ensuring request transformation happens before any retries are attempted. This avoids retrying with mismatched payloads. Good call.


110-115: Nice compatibility bridge for NIM API key.

Gracefully maps NVIDIA_API_KEY into NVIDIA_NIM_API_KEY only when neither config nor env provides it. This prevents surprising overrides.


88-90: Verify api_version presence on config.

If api_version can be None/empty, setting AZURE_API_VERSION unconditionally could lead to confusing runtime behavior.

Please confirm the field is required at the model level. If optional, guard and raise a clear error:

-    os.environ["AZURE_API_VERSION"] = llm_config.api_version
+    if not llm_config.api_version:
+        raise ValueError("Azure API version is not set")
+    os.environ["AZURE_API_VERSION"] = llm_config.api_version

48-51: Please verify alternate CrewAI LLM entrypoints for thinking-prompt injection

The current injector only wraps the call method. If your CrewAI version exposes other request-sending methods (e.g., a streaming endpoint named stream or any alternate surface), those will bypass the thinking prompt and need to be included as well.

– Locate the LLM methods in your version (for example, by running dir(crewai.LLM) or checking the official docs).
– Expand the function_names list accordingly. For example, if there’s a stream method:

- function_names=["call"],
+ function_names=["call", "stream"],

Please confirm which methods on crewai.LLM send requests (such as call, stream, etc.) and that the intended environment variables (Azure/OpenAI/NIM) are honored across all of them.
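
For example, a one-off introspection like the following lists the public callables on crewai.LLM so the function_names list can be confirmed:

import crewai

public_methods = [name for name in dir(crewai.LLM)
                  if not name.startswith("_") and callable(getattr(crewai.LLM, name))]
print(public_methods)  # confirm whether only 'call' is exposed or a streaming variant exists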

packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)

78-79: Safe base_url pruning LGTM.

Switching to a guarded pop avoids KeyError when base_url is absent. This addresses the earlier concern cleanly.


49-57: Confirmed instantiation is valid

The BaseThinkingInjector is decorated with @dataclass and declares exactly the fields system_prompt: str and function_names: list[str], so the dataclass-generated __init__ will accept those keyword arguments without error. No custom __init__ is defined in AgnoThinkingInjector, so it inherits this correct signature.
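
A standalone illustration of why the keyword construction works (field shape only, not the actual class body):

from dataclasses import dataclass


@dataclass
class BaseThinkingInjector:
    system_prompt: str
    function_names: list[str]


class AgnoThinkingInjector(BaseThinkingInjector):
    # No custom __init__, so the dataclass-generated one is inherited unchanged.
    pass


injector = AgnoThinkingInjector(system_prompt="<thinking prompt>", function_names=["invoke", "ainvoke"])
assert injector.function_names == ["invoke", "ainvoke"]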

@willkill07 willkill07 changed the title feat: support for nemotron thinking configuration feat: Support for Nemotron thinking configuration Aug 26, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (1)

62-67: Add return type hints and brief docstrings for public async generators.

Per repo guidelines, public APIs need explicit return types and Google-style docstrings. This was flagged earlier and still applies.

Example updates (outside the changed ranges):

from collections.abc import AsyncIterator

@register_llm_client(config_type=AzureOpenAIModelConfig, wrapper_type=LLMFrameworkEnum.CREWAI)
async def azure_openai_crewai(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[ModelType]:
    """Yield a CrewAI LLM client configured for Azure OpenAI, patched with thinking/retry."""
    ...

@register_llm_client(config_type=NIMModelConfig, wrapper_type=LLMFrameworkEnum.CREWAI)
async def nim_crewai(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[ModelType]:
    """Yield a CrewAI LLM client configured for NVIDIA NIM, patched with thinking/retry."""
    ...

@register_llm_client(config_type=OpenAIModelConfig, wrapper_type=LLMFrameworkEnum.CREWAI)
async def openai_crewai(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[ModelType]:
    """Yield a CrewAI LLM client configured for OpenAI, patched with thinking/retry."""
    ...

To quickly verify in-repo:

#!/bin/bash
# Check public CrewAI wrappers have AsyncIterator return types and docstrings.
rg -nPU $'@register_llm_client\\(config_type=.*CREWAI\\)\\s*\\nasync def\\s+\\w+\\(' packages/nvidia_nat_crewai/src |
  awk -F: '{print $1":"$2}' | while read -r defline; do
    file="${defline%%:*}"; line="${defline##*:}"
    sed -n "$((line)),+5p" "$file"
  done

Happy to push a small follow-up PR for the annotations and docstrings if you prefer.

Also applies to: 100-105, 121-126

🧹 Nitpick comments (5)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (5)

37-39: Add a concise docstring and clarify patch order.

This helper is part of the public plugin surface; add a short Google-style docstring and state that thinking is applied before retry so each retry reuses the injected prompt.

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Patch a CrewAI LLM client based on the provided config.
+
+    Applies thinking injection first (if enabled) and then retry behavior, so
+    retries re-invoke the underlying call with the injected system prompt.
+
+    Args:
+        client: The CrewAI LLM client instance to patch.
+        llm_config: Provider-specific LLM configuration carrying thinking/retry knobs.
+
+    Returns:
+        The patched client with thinking and retry semantics applied.
+    """

41-44: Make the injector input-robust and align with typing/collections.abc guidance.

CrewAI might pass tuples or other Sequences; concatenating with a list will raise. Prefer Sequence[Mapping[str, str]] and build a new list. Also handle None gracefully.

-        def inject(self, messages: list[dict[str, str]], *args, **kwargs) -> FunctionArgumentWrapper:
-            new_messages = [{"role": "system", "content": self.system_prompt}] + messages
-            return FunctionArgumentWrapper(new_messages, *args, **kwargs)
+        def inject(
+            self,
+            messages: Sequence[Mapping[str, str]] | None,
+            *args,
+            **kwargs,
+        ) -> FunctionArgumentWrapper:
+            system_message = {"role": "system", "content": self.system_prompt}
+            if messages is None:
+                new_messages = [system_message]
+            else:
+                new_messages = [system_message, *list(messages)]
+            return FunctionArgumentWrapper(new_messages, *args, **kwargs)

Add the missing imports in the typing block (see comment on Line 17).


78-87: Set both legacy and new Azure env var names for compatibility.

Some libs read AZURE_OPENAI_* while others read AZURE_API_*; set both to the resolved values to reduce surprises.

     os.environ["AZURE_API_KEY"] = api_key
+    # Keep both variants for compatibility across SDKs/plugins.
+    os.environ["AZURE_OPENAI_API_KEY"] = api_key
@@
     os.environ["AZURE_API_BASE"] = api_base
+    # Keep both variants for compatibility across SDKs/plugins.
+    os.environ["AZURE_OPENAI_ENDPOINT"] = api_base

89-93: Propagate deployment to ENV as well (compat fallback).

Some integrations look for an env var when the parameter isn’t threaded through; keeping it in ENV helps interop.

     config_obj["model"] = model
+    # Provide a compatibility env var for consumers that expect it.
+    os.environ["AZURE_MODEL_DEPLOYMENT"] = model

1-15: Consider adding a brief module docstring beneath the license header.

Keeps API docs cohesive and meets the “public module docstrings” guidance.

Example (outside the changed ranges):

"""CrewAI plugin LLM wrappers: build clients from nat configs and patch with thinking/retry semantics."""
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9cfe66d and 732bd41.

📒 Files selected for processing (1)
  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (9)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py
🧬 Code graph analysis (1)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (6)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/azure_openai_llm.py (2)
  • azure_openai_llm (55-57)
  • AzureOpenAIModelConfig (30-51)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
  • _patch_llm_based_on_config (38-105)
  • inject (49-84)
🔇 Additional comments (2)
packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (2)

53-59: LGTM: Thinking-before-retry ordering is correct.

Applying thinking first ensures every retry reuses the same injected system prompt, which is the expected behavior.


46-51: Include streaming method for thinking injection

CrewAI’s LLM likely exposes both a call and a streaming API. To ensure the thinking injector also wraps streamed responses, please confirm the available method names on crewai.LLM (via documentation or introspection) and update the patch accordingly:

• File: packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py (lines 46–51)
– Extend function_names to include the streaming entry point if it exists.

Suggested diff:

-                function_names=["call"],
+                function_names=["call", "stream"],

If the streaming method is named differently (e.g., "astream", "chat_stream"), replace "stream" with the correct identifier.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/nat/llm/aws_bedrock_llm.py (1)

49-52: Add explicit return type for provider factory and fix article grammar

Public API functions require 3.11+ type hints; also “A AWS” → “An AWS”.

Apply this diff:

+from collections.abc import AsyncGenerator
@@
-@register_llm_provider(config_type=AWSBedrockModelConfig)
-async def aws_bedrock_model(llm_config: AWSBedrockModelConfig, _builder: Builder):
+@register_llm_provider(config_type=AWSBedrockModelConfig)
+async def aws_bedrock_model(
+    llm_config: AWSBedrockModelConfig, _builder: Builder
+) -> AsyncGenerator[LLMProviderInfo, None]:
@@
-    yield LLMProviderInfo(config=llm_config, description="A AWS Bedrock model for use with an LLM client.")
+    yield LLMProviderInfo(config=llm_config, description="An AWS Bedrock model for use with an LLM client.")
tests/nat/builder/test_builder.py (4)

20-22: Wrong BaseModel import — should be from pydantic, not openai

These test helper classes are pydantic models; importing BaseModel from openai is incorrect and will break under pyright and at runtime.

Apply this diff:

-from openai import BaseModel
-from pydantic import ConfigDict
+from pydantic import BaseModel
+from pydantic import ConfigDict

238-242: Invalid return type annotation for supported_pipeline_types

-> [PipelineTypeEnum] is a list literal, not a type. Use a proper generic, and prefer collections.abc per guidelines.

Apply this diff:

+from collections.abc import Sequence
@@
-            def supported_pipeline_types(self) -> [PipelineTypeEnum]:
-                return [PipelineTypeEnum.AGENT_EXECUTION]
+            def supported_pipeline_types(self) -> Sequence[PipelineTypeEnum]:
+                return [PipelineTypeEnum.AGENT_EXECUTION]

609-672: Two async tests won’t run: missing test_ prefix

get_retriever() and get_retriever_config() won’t be collected by pytest; coverage is silently lost.

Apply this diff:

-async def get_retriever():
+async def test_get_retriever():
@@
-async def get_retriever_config():
+async def test_get_retriever_config():

723-766: Tautological assertion and variable shadowing in test_built_config

workflow_config is reassigned to workflow.config, making assert workflow_config.workflow == workflow_config.workflow a no-op. Use distinct names and assert against the originally constructed workflow config.

Apply this diff:

-async def test_built_config():
+async def test_built_config():
@@
-    workflow_config = FunctionReturningFunctionConfig()
+    workflow_fn_config = FunctionReturningFunctionConfig()
@@
-        await builder.set_workflow(workflow_config)
+        await builder.set_workflow(workflow_fn_config)
@@
-        workflow = builder.build()
-
-        workflow_config = workflow.config
-
-        assert workflow_config.general == general_config
-        assert workflow_config.functions == {"function1": function_config}
-        assert workflow_config.workflow == workflow_config.workflow
-        assert workflow_config.llms == {"llm1": llm_config}
-        assert workflow_config.embedders == {"embedder1": embedder_config}
-        assert workflow_config.memory == {"memory1": memory_config}
-        assert workflow_config.retrievers == {"retriever1": retriever_config}
-        assert workflow_config.object_stores == {"object_store1": object_store_config}
-        assert workflow_config.ttc_strategies == {"ttc_strategy": ttc_config}
+        workflow = builder.build()
+        built_cfg = workflow.config
+
+        assert built_cfg.general == general_config
+        assert built_cfg.functions == {"function1": function_config}
+        assert built_cfg.workflow == workflow_fn_config
+        assert built_cfg.llms == {"llm1": llm_config}
+        assert built_cfg.embedders == {"embedder1": embedder_config}
+        assert built_cfg.memory == {"memory1": memory_config}
+        assert built_cfg.retrievers == {"retriever1": retriever_config}
+        assert built_cfg.object_stores == {"object_store1": object_store_config}
+        assert built_cfg.ttc_strategies == {"ttc_strategy": ttc_config}
packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py (1)

105-116: Add type hints and a docstring for remove_items.

Public APIs must be fully typed and documented. Annotate **kwargs and return type, and document expected keys.

Apply these diffs:

-    async def remove_items(self, **kwargs):
+    async def remove_items(self, **kwargs: Any) -> None:
+        """
+        Remove items by memory_id or purge all for a user.
+        
+        Keyword Args:
+            memory_id (str): ID of a specific memory to delete.
+            user_id (str): User whose memories should be deleted.
+        """

Add the missing import near the top of the file:

 import asyncio
+from typing import Any
♻️ Duplicate comments (12)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (3)

85-93: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=AzureOpenAIModelConfig, wrapper_type=LLMFrameworkEnum.LLAMA_INDEX)
-async def azure_openai_llama_index(llm_config: AzureOpenAIModelConfig, _builder: Builder):
+async def azure_openai_llama_index(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an Azure OpenAI llama_index client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The Azure OpenAI model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched AzureOpenAI client configured for llama_index.
+    """

95-103: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=NIMModelConfig, wrapper_type=LLMFrameworkEnum.LLAMA_INDEX)
-async def nim_llama_index(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_llama_index(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an NVIDIA NIM llama_index client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The NIM model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched NVIDIA client configured for llama_index.
+    """

105-113: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=OpenAIModelConfig, wrapper_type=LLMFrameworkEnum.LLAMA_INDEX)
-async def openai_llama_index(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_llama_index(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an OpenAI llama_index client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The OpenAI model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched OpenAI client configured for llama_index.
+    """
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (3)

16-16: Import missing typing/ABC symbols for proper type annotations.

To support proper type annotations for async generators, add Any and AsyncIterator to the imports.

-from typing import TypeVar
+from typing import TypeVar, Any
+from collections.abc import AsyncIterator

68-83: Add return type hint and docstring to public API function.

Public APIs must include return type hints and Google-style docstrings per repo guidelines.

 @register_llm_client(config_type=NIMModelConfig, wrapper_type=LLMFrameworkEnum.AGNO)
-async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_agno(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an AGNO Nvidia client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The NIM model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched Nvidia client configured for AGNO.
+    """

86-101: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=OpenAIModelConfig, wrapper_type=LLMFrameworkEnum.AGNO)
-async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an AGNO OpenAI client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The OpenAI model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched OpenAIChat client configured for AGNO.
+    """
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (6)

16-17: Add AsyncIterator import to properly type-hint async generator returns.

Public APIs must have full type hints. Import AsyncIterator and Any to properly annotate the async generator functions.

 from collections.abc import Sequence
+from collections.abc import AsyncIterator
 from typing import TypeVar
+from typing import Any

91-96: Consider adding batch/abatch to patched function names.

Many LangChain chat models expose batch/abatch methods. If users call these methods, thinking won't be injected. Consider future-proofing by including them in the patch list.

                 function_names=[
                     "invoke",
                     "ainvoke",
                     "stream",
                     "astream",
+                    "batch",
+                    "abatch",
                 ],

118-126: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=AzureOpenAIModelConfig, wrapper_type=LLMFrameworkEnum.LANGCHAIN)
-async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder):
+async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an Azure OpenAI LangChain client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The Azure OpenAI model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched AzureChatOpenAI client configured for LangChain.
+    """

128-140: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=NIMModelConfig, wrapper_type=LLMFrameworkEnum.LANGCHAIN)
-async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an NVIDIA NIM LangChain client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The NIM model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched ChatNVIDIA client configured for LangChain.
+    """

142-151: Add return type hint and docstring to public API function.

 @register_llm_client(config_type=OpenAIModelConfig, wrapper_type=LLMFrameworkEnum.LANGCHAIN)
-async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an OpenAI LangChain client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The OpenAI model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched ChatOpenAI client configured for LangChain.
+    """

84-84: Fix: else branch silently passes unsupported inputs without raising ValueError.

The docstring states a ValueError is raised for invalid inputs, but the else branch passes them through unchanged. This hides misuse of the API.

             else:
-                return FunctionArgumentWrapper(messages, *args, **kwargs)
+                raise ValueError(
+                    f"Unsupported LanguageModelInput: {type(messages)}. "
+                    "Expected str | BaseMessage | PromptValue | Sequence[BaseMessage]."
+                )
🧹 Nitpick comments (15)
src/nat/llm/aws_bedrock_llm.py (1)

16-19: Missing module-level docstring

Public modules should have a concise Google-style docstring.

Apply this diff:

@@
 
+"""AWS Bedrock LLM provider configuration and registration."""
+
 from pydantic import AliasChoices
tests/nat/builder/test_builder.py (2)

110-112: Fixture naming/style per guidelines

Autouse fixture should follow the fixture_ prefix convention; add name to keep the public name unchanged.

Apply this diff:

-@pytest.fixture(scope="module", autouse=True)
-async def _register():
+@pytest.fixture(scope="module", autouse=True, name="_register")
+async def fixture_register():

796-812: Fixture naming/style per guidelines

Add name= and use fixture_ prefix for consistency; leaves test references unchanged.

Apply this diff:

-@pytest.fixture
-def caplog_fixture(caplog):
+@pytest.fixture(name="caplog_fixture")
+def fixture_caplog_fixture(caplog):
@@
-@pytest.fixture
-def mock_component_data():
+@pytest.fixture(name="mock_component_data")
+def fixture_mock_component_data():
packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py (5)

17-23: Drop import-time warning suppression; prefer typing-only import for AsyncMemoryClient.

Since we only need AsyncMemoryClient for type hints, import it under TYPE_CHECKING to avoid import-time side effects and eliminate the need to manage Pydantic warnings. As long as the annotation stays unevaluated at runtime (quote it, or add `from __future__ import annotations`), this is safe and cleaner.

Apply this diff:

-import warnings
-
-from pydantic.warnings import PydanticDeprecatedSince20
-
-with warnings.catch_warnings():
-    warnings.filterwarnings("ignore", category=PydanticDeprecatedSince20)
-    from mem0 import AsyncMemoryClient
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from mem0 import AsyncMemoryClient
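
A minimal sketch of the resulting pattern, assuming deferred annotation evaluation; the class name and constructor below are illustrative, not the actual editor code:

from __future__ import annotations  # keeps the AsyncMemoryClient hint unevaluated at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from mem0 import AsyncMemoryClient


class EditorSketch:
    """Hypothetical wrapper; the real class lives in mem0_editor.py."""

    def __init__(self, client: AsyncMemoryClient) -> None:
        self._client = client  # resolved by type checkers only; mem0 is never imported at runtime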

30-33: Docstring branding: use “nat”, not “NAT”, in documentation strings.

Per guidelines, use lowercase “nat” in docs. Docstrings count as documentation.

Apply this diff:

-    """
-    Wrapper class that implements NAT interfaces for Mem0 Integrations Async.
-    """
+    """
+    Wrapper class that implements nat interfaces for Mem0 integration (async).
+    """

44-48: Use Sequence for input type and tighten the docstring.

Prefer Sequence over list for inputs you iterate but don’t mutate; update the docstring to Google style.

Apply these diffs:

-    async def add_items(self, items: list[MemoryItem]) -> None:
+    async def add_items(self, items: Sequence[MemoryItem]) -> None:
@@
-        """
-        Insert Multiple MemoryItems into the memory.
-        Each MemoryItem is translated and uploaded.
-        """
+        """
+        Insert multiple MemoryItem objects into Mem0.
+
+        Args:
+            items (Sequence[MemoryItem]): Items to add for the given user(s).
+        """

Add the missing import near the top of the file:

 import asyncio
+from collections.abc import Sequence

54-60: Avoid mutating caller-provided metadata.

pop on item_meta mutates memory_item.metadata. Copy first to avoid surprising side effects.

Apply this diff:

-            item_meta = memory_item.metadata
+            item_meta = dict(memory_item.metadata)
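
A toy illustration of why the copy matters (not the editor code): popping from the original dict is visible to the caller, while popping from a copy is not.

metadata = {"tags": ["demo"], "user_id": "u-123"}

alias = metadata
alias.pop("user_id")
assert "user_id" not in metadata  # the caller's dict was mutated

metadata = {"tags": ["demo"], "user_id": "u-123"}
copied = dict(metadata)
copied.pop("user_id")
assert "user_id" in metadata  # the caller's dict is untouched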

86-86: Be explicit when required kwargs are missing.

Using kwargs.pop("user_id") raises a KeyError without context. Raise a clear ValueError instead.

Apply this diff:

-        user_id = kwargs.pop("user_id")  # Ensure user ID is in keyword arguments
+        user_id = kwargs.pop("user_id", None)  # Ensure user ID is in keyword arguments
+        if user_id is None:
+            raise ValueError("search requires 'user_id' in kwargs.")
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (3)

16-17: Missing type annotation import for async generator return types.

Public APIs require complete type hints per coding guidelines. Import AsyncIterator to properly annotate the async generator functions' return types.

 from collections.abc import Sequence
+from collections.abc import AsyncIterator
 from typing import TypeVar
+from typing import Any

38-69: Add docstring to public function.

Public functions require Google-style docstrings per coding guidelines.

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Patch a llama_index LLM client with thinking and retry capabilities.
+
+    Args:
+        client: The llama_index LLM client to patch.
+        llm_config: The LLM configuration containing thinking and retry settings.
+
+    Returns:
+        The patched client with optional thinking injection and retry logic.
+    """

71-83: Add return type hint and docstring to public API function.

Public APIs must include return type hints and Google-style docstrings per repo guidelines.

 @register_llm_client(config_type=AWSBedrockModelConfig, wrapper_type=LLMFrameworkEnum.LLAMA_INDEX)
-async def aws_bedrock_llama_index(llm_config: AWSBedrockModelConfig, _builder: Builder):
+async def aws_bedrock_llama_index(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an AWS Bedrock llama_index client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The AWS Bedrock model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched Bedrock client configured for llama_index.
+    """
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)

35-65: Add docstring to public function.

Public functions require Google-style docstrings per coding guidelines.

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Patch an AGNO client with thinking and retry capabilities.
+
+    Args:
+        client: The AGNO client to patch.
+        llm_config: The LLM configuration containing thinking and retry settings.
+
+    Returns:
+        The patched client with optional thinking injection and retry logic.
+    """

41-41: Remove duplicate import statement.

The Message import on Line 41 duplicates the import already present on Line 37.

-    from agno.models.message import Message
-
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)

38-105: Add docstring to public function.

Public functions require Google-style docstrings per coding guidelines.

 def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Patch a LangChain chat model client with thinking and retry capabilities.
+
+    Args:
+        client: The LangChain chat model client to patch.
+        llm_config: The LLM configuration containing thinking and retry settings.
+
+    Returns:
+        The patched client with optional thinking injection and retry logic.
+    """

108-116: Add return type hint and docstring to public API function.

Public APIs must include return type hints and Google-style docstrings per repo guidelines.

 @register_llm_client(config_type=AWSBedrockModelConfig, wrapper_type=LLMFrameworkEnum.LANGCHAIN)
-async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder):
+async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[Any]:
+    """
+    Yield an AWS Bedrock LangChain client with optional thinking and retry patches.
+
+    Args:
+        llm_config: The AWS Bedrock model configuration.
+        _builder: The builder instance (unused).
+
+    Yields:
+        A patched ChatBedrockConverse client configured for LangChain.
+    """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 732bd41 and 09711c2.

📒 Files selected for processing (9)
  • docs/source/extend/integrating-aws-bedrock-models.md (0 hunks)
  • docs/source/workflows/llms/index.md (0 hunks)
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (1 hunks)
  • packages/nvidia_nat_agno/tests/test_llm.py (5 hunks)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2 hunks)
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (1 hunks)
  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py (1 hunks)
  • src/nat/llm/aws_bedrock_llm.py (2 hunks)
  • tests/nat/builder/test_builder.py (4 hunks)
💤 Files with no reviewable changes (2)
  • docs/source/extend/integrating-aws-bedrock-models.md
  • docs/source/workflows/llms/index.md
🧰 Additional context used
📓 Path-based instructions (13)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_agno/tests/test_llm.py
  • tests/nat/builder/test_builder.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_agno/tests/test_llm.py
  • tests/nat/builder/test_builder.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_agno/tests/test_llm.py
  • tests/nat/builder/test_builder.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_agno/tests/test_llm.py
  • tests/nat/builder/test_builder.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:

    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_agno/tests/test_llm.py
  • tests/nat/builder/test_builder.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/llm/aws_bedrock_llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:
  • This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two digit version (ex: ~=1.0).
  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py
  • packages/nvidia_nat_agno/tests/test_llm.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
  • packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Unit tests live in tests/ (or examples/*/tests) and use markers defined in pyproject.toml (e2e, integration)
Use pytest with pytest-asyncio for asynchronous tests
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints in tests
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration to skip by default

Files:

  • tests/nat/builder/test_builder.py
tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py:
  • Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
  • Test functions should be named using the test_ prefix, using snake_case.
  • Any frequently repeated code should be extracted into pytest fixtures.
  • Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the fixture_ prefix, using snake_case. Example:

    @pytest.fixture(name="my_fixture")
    def fixture_my_fixture():
        pass

Files:

  • tests/nat/builder/test_builder.py
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ so namespace packages resolve correctly

Files:

  • src/nat/llm/aws_bedrock_llm.py
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/llm/aws_bedrock_llm.py
🧬 Code graph analysis (5)
tests/nat/builder/test_builder.py (1)
src/nat/cli/type_registry.py (1)
  • register_ttc_strategy (731-738)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
src/nat/llm/aws_bedrock_llm.py (2)
src/nat/data_models/thinking_mixin.py (1)
  • ThinkingMixin (28-66)
src/nat/data_models/top_p_mixin.py (1)
  • TopPMixin (24-43)
packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (3)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (2)
  • _patch_llm_based_on_config (35-65)
  • inject (44-46)
packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py (5)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/nim_llm.py (1)
  • NIMModelConfig (31-41)
src/nat/llm/openai_llm.py (2)
  • openai_llm (45-47)
  • OpenAIModelConfig (30-41)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
  • _patch_llm_based_on_config (38-105)
  • inject (49-84)
🔇 Additional comments (10)
src/nat/llm/aws_bedrock_llm.py (3)

26-27: Thinking/TopP mixins import: LGTM

Imports align with the new gated-field machinery and keep provider configs consistent with others in the PR.


30-30: Extending AWSBedrockModelConfig with TopPMixin and ThinkingMixin: LGTM

The base mixin order matches the pattern used elsewhere and should enable top-p gating and thinking prompt injection without further changes here.


39-39: max_tokens field tweak: LGTM

Constraint and description are clear and consistent with other providers.

tests/nat/builder/test_builder.py (5)

102-104: Rename to TTTCStrategyConfig: LGTM

Naming is consistent (PascalCase) and the registered name remains “test_ttc_strategy”, preserving external behavior.


222-224: Updated TTC strategy registration: LGTM

Decorator and function signature now reference TTTCStrategyConfig consistently.


678-687: Add TTC strategy tests (normal, error, duplicate): LGTM

Good coverage of success and error paths.


692-694: Config instance rename usage: LGTM

Local var cfg is clear and correctly passed to builder.


734-734: Built-config TTC strategy inclusion: LGTM

Ensures TTC strategy config is captured in the final workflow config.

packages/nvidia_nat_mem0ai/src/nat/plugins/mem0ai/mem0_editor.py (1)

19-23: No fallback needed for Pydantic v1 – v2.x is enforced
The root pyproject.toml pins pydantic==2.10.*, and even in examples we require pydantic ~=2.10.0,<2.11.0. That guarantees runtime under Pydantic 2.x (where PydanticDeprecatedSince20 exists), so environments with Pydantic v1 are out of scope and would already fail dependency resolution. The proposed try/except fallback for v1 can be safely ignored.

Likely an incorrect or invalid review comment.

packages/nvidia_nat_agno/tests/test_llm.py (1)

49-51: LGTM! Clear and focused verification approach.

The refactored assertions properly verify the call arguments by extracting the kwargs and checking specific values, which is more maintainable than rigid assert_called_once_with checks.
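
A hedged sketch of the assertion style being described; the mock name and keyword arguments are placeholders, not the actual test code:

from unittest.mock import MagicMock

mock_chat = MagicMock()
mock_chat(model="nvidia/llama-3.3-nemotron-super-49b-v1", temperature=0.0)

# Instead of a rigid assert_called_once_with(...), pull out the kwargs and
# assert only the values the test actually cares about.
_, kwargs = mock_chat.call_args
assert kwargs["model"].startswith("nvidia/")
assert kwargs["temperature"] == 0.0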

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/source/extend/integrating-aws-bedrock-models.md (1)

20-20: Fix product naming per style guide.

First mention should read “NVIDIA NeMo Agent toolkit.”

Apply:

-The NeMo Agent toolkit supports integration with multiple LLM providers, including AWS Bedrock.
+The NVIDIA NeMo Agent toolkit supports integration with multiple LLM providers, including AWS Bedrock.
src/nat/data_models/thinking_mixin.py (1)

59-67: Fix: avoid TypeError when model keys are None; add explicit fallback return.

re.match() on a None-valued key (e.g., azure_deployment=None) will raise. Cast to str only when value is truthy, and return None explicitly when no match.

Apply:

-        if self.thinking is None:
-            return None
-        for key in _MODEL_KEYS:
-            if hasattr(self, key):
-                if _NVIDIA_NEMOTRON_REGEX.match(getattr(self, key)):
-                    return "/think" if self.thinking else "/no_think"
-                elif _LLAMA_NEMOTRON_REGEX.match(getattr(self, key)):
-                    return f"detailed thinking {'on' if self.thinking else 'off'}"
+        if self.thinking is None:
+            return None
+        for key in _MODEL_KEYS:
+            if not hasattr(self, key):
+                continue
+            value = getattr(self, key)
+            if value is None:
+                continue
+            text = str(value)
+            if _NVIDIA_NEMOTRON_REGEX.match(text):
+                return "/think" if self.thinking else "/no_think"
+            if _LLAMA_NEMOTRON_REGEX.match(text):
+                return f"detailed thinking {'on' if self.thinking else 'off'}"
+        return None
🧹 Nitpick comments (6)
docs/source/extend/integrating-aws-bedrock-models.md (1)

50-56: Wrap code-like values in backticks; fix LangChain casing; add missing punctuation.

Vale/Sphinx/style consistency.

Apply:

-* `top_p`: The top-p value to use for the model. This field is ignored for LlamaIndex. (0.0 to 1.0, default: 1.0)
-* `context_size`: The maximum number of tokens available for input. This is only required for LlamaIndex. This field is ignored for Langchain. (must be > 0, default: 1024)
-* `region_name`: AWS region where your Bedrock service is hosted (default: "None")
-* `base_url`: Custom Bedrock endpoint URL (default: None, needed if you don't want to use the default us-east-1 endpoint)
-* `credentials_profile_name`: AWS credentials profile name from ~/.aws/credentials or ~/.aws/config files (default: None)
-* `max_retries`: The maximum number of retries for the request
+* `top_p`: The top-p value to use for the model. This field is ignored for `LlamaIndex`. (range: `0.0`–`1.0`, default: `1.0`)
+* `context_size`: The maximum number of tokens available for input. This is only required for `LlamaIndex`. This field is ignored for `LangChain`. (must be `> 0`, default: `1024`)
+* `region_name`: AWS region where your Bedrock service is hosted (default: `None`)
+* `base_url`: Custom Bedrock endpoint URL (default: `None`, needed if you don't want to use the default `us-east-1` endpoint)
+* `credentials_profile_name`: AWS credentials profile name from `~/.aws/credentials` or `~/.aws/config` files (default: `None`)
+* `max_retries`: The maximum number of retries for the request.
docs/source/extend/gated-fields.md (1)

80-84: Tighten wording and code formatting for ThinkingMixin entry.

Minor grammar and clarity.

Apply:

-- {py:class}`~nat.data_models.thinking_mixin.ThinkingMixin`
-  - Field: `thinking: bool | None`
-  - Default when supported: `None` (use model default)
-  - Only currently supported on Nemotron models
+- {py:class}`~nat.data_models.thinking_mixin.ThinkingMixin`
+  - Field: `thinking: bool | None`
+  - Default when supported: `None` (defer to the model’s default).
+  - Currently only supported on Nemotron models.
src/nat/data_models/thinking_mixin.py (2)

16-21: Add module docstring per guidelines.

Apply:

+"""Thinking mixin for Nemotron-like models. Provides gated config and system prompts."""
+
 import re

23-24: Refactor Nemotron regexes for comprehensive model‐ID coverage

A repo-wide grep shows Nemotron IDs in tests, docs, examples, and code spanning formats like

  • “nvidia/nvidia-nemotron-8b”, “nvidia/nvidia-some-nemotron”
  • “nvidia/nemotron” (API example)
  • “nvidia/llama-nemotron”, “NVIDIA/LLaMa-3.1-Nemotron”
  • “nvidia/llama3-nemotron”, “nvidia/llama3.1-nemotron-nano-4b-v1.1”
  • “nvidia/llama-3.3-nemotron-super-49b-v1”

The current patterns only catch the first class ("nvidia/nvidia…nemotron") and most llama variants, but they miss the direct "nvidia/nemotron" form and any other "nvidia/<repo>" ID containing "nemotron" that starts with neither "nvidia/nvidia" nor "nvidia/llama". To cover all observed cases while still distinguishing LLaMa from other Nemotron models, update thinking_mixin.py lines 23–24:

- _NVIDIA_NEMOTRON_REGEX = re.compile(r"^nvidia/nvidia.*nemotron", re.IGNORECASE)
- _LLAMA_NEMOTRON_REGEX  = re.compile(r"^nvidia/llama.*nemotron",  re.IGNORECASE)
+ _NVIDIA_NEMOTRON_REGEX = re.compile(r"^nvidia/(?!llama)[^/]*nemotron", re.IGNORECASE)
+ _LLAMA_NEMOTRON_REGEX  = re.compile(r"^nvidia/llama[^/]*nemotron",    re.IGNORECASE)
  • The first regex now matches any nvidia/<repo> containing “nemotron” except llama variants (avoiding false positives on LLaMa).
  • The second explicitly matches all LLaMa-prefixed Nemotron repos.

This simplifies and generalizes matching across code, tests, docs, and examples.
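
A quick sanity check of the proposed patterns against the model IDs listed above (illustrative; the real constants live in thinking_mixin.py):

import re

nvidia_nemotron = re.compile(r"^nvidia/(?!llama)[^/]*nemotron", re.IGNORECASE)
llama_nemotron = re.compile(r"^nvidia/llama[^/]*nemotron", re.IGNORECASE)

assert nvidia_nemotron.match("nvidia/nemotron")
assert nvidia_nemotron.match("nvidia/nvidia-nemotron-8b")
assert llama_nemotron.match("nvidia/llama-3.3-nemotron-super-49b-v1")
assert llama_nemotron.match("NVIDIA/LLaMa-3.1-Nemotron")
assert not nvidia_nemotron.match("nvidia/llama3-nemotron")  # handled by the llama pattern instead
assert llama_nemotron.match("nvidia/llama3-nemotron")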

src/nat/data_models/gated_field_mixin.py (2)

165-178: Clarify “seen” semantics in code comments/tests.

When any keyed attribute is present but no pattern matches in supported-mode, the field is treated as unsupported; if no keys are present, it’s treated as supported. Add a brief comment to reduce surprises and ensure tests assert this behavior.

Apply:

         """Check if a specific field is supported based on its configuration and keys."""
-        seen = False
+        # Semantics:
+        # - supported-mode: any match -> True; keys seen with no match -> False; no keys seen -> True
+        # - unsupported-mode: any match -> False; otherwise -> True (incl. no keys seen)
+        seen = False

23-31: Make mixin config immutable.

Prevents accidental mutation of gating rules at runtime.

Apply:

-@dataclass
+@dataclass(frozen=True)
 class GatedFieldMixinConfig:
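
The practical effect of frozen=True, shown on a toy dataclass (not the real GatedFieldMixinConfig):

from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class _ToyConfig:
    keys: tuple[str, ...]

cfg = _ToyConfig(keys=("model_name",))
try:
    cfg.keys = ("serving_name",)  # type: ignore[misc]
except FrozenInstanceError:
    pass  # mutation of gating rules is rejected at runtime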
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 09711c2 and 502cc50.

📒 Files selected for processing (9)
  • docs/source/extend/gated-fields.md (3 hunks)
  • docs/source/extend/integrating-aws-bedrock-models.md (2 hunks)
  • docs/source/workflows/llms/index.md (1 hunks)
  • src/nat/data_models/gated_field_mixin.py (6 hunks)
  • src/nat/data_models/temperature_mixin.py (1 hunks)
  • src/nat/data_models/thinking_mixin.py (1 hunks)
  • src/nat/data_models/top_p_mixin.py (1 hunks)
  • src/nat/llm/aws_bedrock_llm.py (2 hunks)
  • tests/nat/data_models/test_gated_field_mixin.py (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/nat/llm/aws_bedrock_llm.py
  • docs/source/workflows/llms/index.md
🧰 Additional context used
📓 Path-based instructions (13)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • src/nat/data_models/gated_field_mixin.py
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ so namespace packages resolve correctly

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • src/nat/data_models/gated_field_mixin.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • tests/nat/data_models/test_gated_field_mixin.py
  • src/nat/data_models/gated_field_mixin.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • docs/source/extend/gated-fields.md
  • tests/nat/data_models/test_gated_field_mixin.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/integrating-aws-bedrock-models.md
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • src/nat/data_models/gated_field_mixin.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • docs/source/extend/gated-fields.md
  • tests/nat/data_models/test_gated_field_mixin.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/integrating-aws-bedrock-models.md
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • docs/source/extend/gated-fields.md
  • tests/nat/data_models/test_gated_field_mixin.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/integrating-aws-bedrock-models.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:

    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • docs/source/extend/gated-fields.md
  • tests/nat/data_models/test_gated_field_mixin.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/integrating-aws-bedrock-models.md
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • src/nat/data_models/gated_field_mixin.py
docs/source/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.md: Use the official naming: first use 'NVIDIA NeMo Agent toolkit'; subsequent references 'NeMo Agent toolkit'; in headings use 'NeMo Agent Toolkit' (capital T)
Never use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation; update to the current naming unless explicitly referring to deprecated names
'AIQ Blueprint' is the intended name for the blueprint in docs; do not change it
Documentation sources must be Markdown files under docs/source
Surround code entities with backticks in documentation to avoid Vale false-positives
Keep docs in sync with code; documentation pipeline must pass (no Sphinx errors or broken links)

Files:

  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Unit tests live in tests/ (or examples/*/tests) and use markers defined in pyproject.toml (e2e, integration)
Use pytest with pytest-asyncio for asynchronous tests
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints in tests
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration to skip by default

Files:

  • tests/nat/data_models/test_gated_field_mixin.py
tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py:
  • Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
  • Test functions should be named using the test_ prefix, using snake_case.
  • Any frequently repeated code should be extracted into pytest fixtures.
  • Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the fixture_ prefix, using snake_case. Example:

    @pytest.fixture(name="my_fixture")
    def fixture_my_fixture():
        pass

Files:

  • tests/nat/data_models/test_gated_field_mixin.py
🧬 Code graph analysis (4)
src/nat/data_models/top_p_mixin.py (1)
src/nat/data_models/gated_field_mixin.py (1)
  • GatedFieldMixin (34-242)
src/nat/data_models/temperature_mixin.py (1)
src/nat/data_models/gated_field_mixin.py (1)
  • GatedFieldMixin (34-242)
src/nat/data_models/thinking_mixin.py (1)
src/nat/data_models/gated_field_mixin.py (1)
  • GatedFieldMixin (34-242)
tests/nat/data_models/test_gated_field_mixin.py (1)
src/nat/data_models/gated_field_mixin.py (1)
  • GatedFieldMixin (34-242)
🪛 LanguageTool
docs/source/extend/gated-fields.md

[grammar] ~80-~80: There might be a mistake here.
Context: ... supported on GPT-5 models - {py:class}~nat.data_models.thinking_mixin.ThinkingMixin - Field: thinking: bool | None - Defau...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...hinking_mixin.ThinkingMixin - Field:thinking: bool | None - Default when supported:None` (use mode...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...en supported: None (use model default) - Only currently supported on Nemotron mod...

(QB_NEW_EN)

docs/source/extend/integrating-aws-bedrock-models.md

[grammar] ~51-~51: There might be a mistake here.
Context: ... Langchain. (must be > 0, default: 1024) * region_name: AWS region where your Bedrock service ...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...rock service is hosted (default: "None") * base_url: Custom Bedrock endpoint URL (default: ...

(QB_NEW_EN)


[grammar] ~53-~53: There might be a mistake here.
Context: ...t to use the default us-east-1 endpoint) * credentials_profile_name: AWS credentials profile name from ~/.a...

(QB_NEW_EN)


[grammar] ~54-~54: There might be a mistake here.
Context: ...s or ~/.aws/config files (default: None) * max_retries: The maximum number of retries for the ...

(QB_NEW_EN)

🔇 Additional comments (7)
tests/nat/data_models/test_gated_field_mixin.py (2)

73-79: LGTM: migrated to non-generic GatedFieldMixin.

Switching from GatedFieldMixin[T] to GatedFieldMixin aligns with the updated implementation.

Also applies to: 89-94, 103-108, 118-123, 133-138, 148-153, 163-168, 180-185, 197-202, 213-218, 227-232, 243-248, 251-256


190-192: Verified: No leftover generic usages of GatedFieldMixin
Ran the suggested ripgrep check (rg -nP '\bGatedFieldMixin\s*\[' -C2) and confirmed there are no occurrences of GatedFieldMixin[…] remaining.
All changes look correct—approving the update to the assertion.

src/nat/data_models/temperature_mixin.py (1)

24-31: LGTM: remove generic parameterization.

Inheritance change to non-generic GatedFieldMixin is consistent with the new mixin API.

src/nat/data_models/top_p_mixin.py (1)

24-31: LGTM: remove generic parameterization.

Matches the refactor in GatedFieldMixin and keeps field typing on top_p.

docs/source/extend/gated-fields.md (1)

44-51: LGTM: docs match non-generic API.

Examples now use GatedFieldMixin without type args; consistent with code.

Also applies to: 58-66

src/nat/data_models/thinking_mixin.py (1)

28-35: LGTM: gated field wiring and field definition are consistent with GatedFieldMixin usage.

Defaulting to None and excluding the field from export aligns with intent; gating will block non-supported models from setting it.

Also applies to: 42-47

src/nat/data_models/gated_field_mixin.py (1)

34-45: LGTM: removal of generics and switch to object | None resolves variance issues and simplifies API.

Matches prior feedback; keeps runtime behavior intact.

Also applies to: 56-63

@willkill07 willkill07 force-pushed the wkk_nemotron_thinking_support branch from 502cc50 to c69e9f6 Compare August 27, 2025 12:45
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/source/extend/integrating-aws-bedrock-models.md (1)

20-21: Fix first mention to comply with naming guideline.

Per docs guidelines, first mention must be “NVIDIA NeMo Agent toolkit.”

Apply:

-The NeMo Agent toolkit supports integration with multiple LLM providers, including AWS Bedrock. This documentation provides a comprehensive guide on how to integrate AWS Bedrock models into your NeMo Agent toolkit workflow. To view the full list of supported LLM providers, run `nat info components -t llm_provider`.
+The NVIDIA NeMo Agent toolkit supports integration with multiple LLM providers, including AWS Bedrock. This documentation provides a comprehensive guide on how to integrate AWS Bedrock models into your NeMo Agent toolkit workflow. To view the full list of supported LLM providers, run `nat info components -t llm_provider`.
♻️ Duplicate comments (3)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (3)

86-98: Also patch batch/abatch to future-proof thinking injection.

LangChain chat models may expose batch/abatch; include them to avoid gaps.

                 function_names=[
                     "invoke",
                     "ainvoke",
                     "stream",
                     "astream",
+                    "batch",
+                    "abatch",
                 ],

16-18: Add AsyncIterator import and annotate async generator returns.

Public APIs must have return type hints; these wrappers are async generators. Import AsyncIterator and annotate the four functions’ returns.

-from collections.abc import Sequence
+from collections.abc import Sequence, AsyncIterator
@@
-async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder):
+async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[object]:
@@
-async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder):
+async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:
@@
-async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder):
+async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[object]:
@@
-async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder):
+async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:

Also applies to: 108-116, 118-126, 128-140, 142-151


67-85: Raise on unsupported LanguageModelInput instead of silent pass-through.

Docstring promises a ValueError, but the else branch returns the original input unchanged, hiding misuse and skipping injection.

         elif isinstance(messages, Sequence):
             if all(isinstance(m, BaseMessage) for m in messages):
                 new_messages = [system_message, *list(messages)]
                 return FunctionArgumentWrapper(new_messages, *args, **kwargs)
             raise ValueError(
                 "Unsupported sequence element types for LanguageModelInput; expected Sequence[BaseMessage].")
-        else:
-            return FunctionArgumentWrapper(messages, *args, **kwargs)
+        else:
+            raise ValueError(
+                f"Unsupported LanguageModelInput: {type(messages)}. "
+                "Expected str | BaseMessage | PromptValue | Sequence[BaseMessage]."
+            )
🧹 Nitpick comments (9)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)

38-45: Add a brief docstring for the config-driven patcher.

Documenting intent and behavior of this shared patch point helps maintainers.

-def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+def _patch_llm_based_on_config(client: ModelType, llm_config: LLMBaseConfig) -> ModelType:
+    """
+    Apply config-driven patches (thinking, retries) to a LangChain chat client.
+
+    Args:
+        client: Concrete LangChain chat client instance.
+        llm_config: LLM configuration (may include ThinkingMixin/RetryMixin).
+
+    Returns:
+        The patched client with requested behaviors applied.
+    """

108-116: Add concise docstrings to exported wrapper functions.

Plugin entrypoints are public APIs; add Google-style docstrings.

 async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """Yield a LangChain ChatBedrockConverse client patched per llm_config."""
@@
 async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """Yield a LangChain AzureChatOpenAI client patched per llm_config."""
@@
 async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """Yield a LangChain ChatNVIDIA client patched per llm_config."""
@@
 async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder) -> AsyncIterator[object]:
+    """Yield a LangChain ChatOpenAI client patched per llm_config."""

Also applies to: 118-126, 128-140, 142-151

docs/source/extend/gated-fields.md (1)

80-84: Clarify gating behavior when keys are present but no patterns match.

Docs state the no-keys case but omit the keys-present/no-match case. Suggest adding one sentence to “How It Works” so it matches the implementation.

Apply this patch to the “Behavior” list:

-  - No detection keys present → applies `default_if_supported`.
+  - No detection keys present → applies `default_if_supported`.
+  - Detection keys present but no patterns match:
+    - With `supported` mode → treated as unsupported.
+    - With `unsupported` mode → treated as supported.
src/nat/data_models/gated_field_mixin.py (4)

34-54: Non-generic GatedFieldMixin is coherent, but broadens types.

Moving default_if_supported to object | None resolves variance issues but loses static type safety for defaults. Acceptable if we rely on runtime constraints in mixins; otherwise consider adding a lightweight runtime type check for the default against the target field type.


96-104: Avoid double-validation: per-field validator is redundant with combined validator.

Both a per-field validator and a combined validator run, causing duplicate work and potentially duplicate errors. Create only the combined validator.

Apply this diff:

-        # Create and store validator
-        validator = cls._create_gated_field_validator(field_name, default_if_supported, unsupported, supported, keys)
-        validator_name = f"_gated_field_validator_{field_name}"
-        setattr(cls, validator_name, validator)
+        # Defer to the combined validator; it also handles the single-mixin case.

130-137: Add return type to factory for the validator function.

Annotate the return to satisfy strict type-checking guidelines.

Apply this diff:

-    def _create_gated_field_validator(
     def _create_gated_field_validator(
         cls,
         field_name: str,
         default_if_supported: object | None,
         unsupported: Sequence[Pattern[str]] | None,
         supported: Sequence[Pattern[str]] | None,
         keys: Sequence[str],
-    ):
+    ) -> Callable[[object], object]:

And add the import (outside this hunk):

from collections.abc import Callable  # Sequence is already imported from this module
214-231: Type-annotate the combined validator inner function.

Minor typing polish to align with repo standards.

Apply this diff:

-        def combined_gated_field_validator(self):
+        def combined_gated_field_validator(self) -> object:
docs/source/extend/integrating-aws-bedrock-models.md (2)

40-41: Consider documenting Thinking configuration in the example.

AWS Bedrock now supports ThinkingMixin per PR; add the thinking option to the YAML example to keep docs in sync. Please confirm the exact schema before we propose a precise diff (boolean vs structured config).


55-56: Clarify max_retries default.

State the default value (if any) and units/behavior (e.g., backoff policy) to avoid ambiguity.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 502cc50 and c69e9f6.

📒 Files selected for processing (11)
  • docs/source/extend/gated-fields.md (3 hunks)
  • docs/source/extend/integrating-aws-bedrock-models.md (2 hunks)
  • docs/source/workflows/llms/index.md (1 hunks)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (3 hunks)
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py (1 hunks)
  • src/nat/data_models/gated_field_mixin.py (6 hunks)
  • src/nat/data_models/temperature_mixin.py (1 hunks)
  • src/nat/data_models/thinking_mixin.py (1 hunks)
  • src/nat/data_models/top_p_mixin.py (1 hunks)
  • src/nat/llm/aws_bedrock_llm.py (2 hunks)
  • tests/nat/data_models/test_gated_field_mixin.py (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • docs/source/workflows/llms/index.md
  • src/nat/data_models/top_p_mixin.py
  • src/nat/data_models/temperature_mixin.py
  • src/nat/data_models/thinking_mixin.py
  • src/nat/llm/aws_bedrock_llm.py
  • tests/nat/data_models/test_gated_field_mixin.py
  • packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py
🧰 Additional context used
📓 Path-based instructions (13)
{src/**/*.py,packages/*/src/**/*.py,examples/*/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' only for env var prefixes and informal code comments (never in documentation)

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code in packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Format Python code with yapf (PEP8 base, column_limit=120) run second in the toolchain
Run ruff (ruff check --fix) using configuration in pyproject.toml; fix warnings unless explicitly ignored; ruff is a linter only (not for formatting)
Respect naming: snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
Prefer typing/collections.abc abstractions (e.g., Sequence over list)
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Prefer httpx with SSL verification enabled by default for HTTP requests; follow OWASP Top-10
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
**/*.{py,md,sh,yml,yaml,toml,json,ini,cfg,txt,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces (no tabs) and ensure every file ends with a single trailing newline

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{src/**/*.py,packages/*/src/**/*.py}: All public APIs require Python 3.11+ type hints on parameters and return values
Provide Google-style docstrings for every public module, class, function and CLI command
Docstring first line must be a concise description ending with a period

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.{py,sh,md,yml,yaml,toml,json,ini,txt,cfg,rst,pyi}: Every file must start with the standard SPDX Apache-2.0 header
All source files must include the SPDX Apache-2.0 header template

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
**/*.{py,md}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived by setuptools-scm; never hard-code versions in code or documentation

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:

    def my_function(param1: int, param2: str) -> bool:
        pass

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
  • src/nat/data_models/gated_field_mixin.py
  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:
  • This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two digit version (ex: ~=1.0).
  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ so namespace packages resolve correctly

Files:

  • src/nat/data_models/gated_field_mixin.py
src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/data_models/gated_field_mixin.py
docs/source/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.md: Use the official naming: first use 'NVIDIA NeMo Agent toolkit'; subsequent references 'NeMo Agent toolkit'; in headings use 'NeMo Agent Toolkit' (capital T)
Never use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation; update to the current naming unless explicitly referring to deprecated names
'AIQ Blueprint' is the intended name for the blueprint in docs; do not change it
Documentation sources must be Markdown files under docs/source
Surround code entities with backticks in documentation to avoid Vale false-positives
Keep docs in sync with code; documentation pipeline must pass (no Sphinx errors or broken links)

Files:

  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/extend/gated-fields.md
  • docs/source/extend/integrating-aws-bedrock-models.md
🧬 Code graph analysis (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2)
src/nat/data_models/thinking_mixin.py (2)
  • ThinkingMixin (28-66)
  • thinking_system_prompt (49-66)
src/nat/llm/utils/thinking.py (4)
  • BaseThinkingInjector (57-81)
  • FunctionArgumentWrapper (34-53)
  • patch_with_thinking (121-215)
  • inject (70-81)
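
The code graph above lists the pieces of the thinking-injection path. As a rough illustration of how an injector of this kind can work, here is a minimal, self-contained Python sketch; `SimpleThinkingInjector` and `patch_client` are hypothetical stand-ins, and the actual `BaseThinkingInjector`/`patch_with_thinking` signatures in `src/nat/llm/utils/thinking.py` may well differ (for instance, they likely also handle async and streaming callables).

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


@dataclass
class SimpleThinkingInjector:
    # System prompt to prepend, e.g. "/think" or "/no_think" for Nemotron-style models.
    system_prompt: str
    # Client method names whose inputs should receive the prompt (hypothetical defaults).
    function_names: tuple[str, ...] = ("invoke", "ainvoke", "stream", "astream")

    def inject(self, messages: list[dict[str, str]], *args: Any, **kwargs: Any) -> tuple[tuple, dict]:
        # Build a new argument tuple with the system message prepended; the input is never mutated.
        patched = [{"role": "system", "content": self.system_prompt}, *messages]
        return (patched, *args), kwargs


def patch_client(client: Any, injector: SimpleThinkingInjector) -> Any:
    """Wrap each named method so every call sees the thinking system prompt."""
    for name in injector.function_names:
        original = getattr(client, name, None)
        if original is None:
            continue

        def make_wrapper(fn: Callable) -> Callable:
            @wraps(fn)
            def wrapper(messages, *args, **kwargs):
                new_args, new_kwargs = injector.inject(messages, *args, **kwargs)
                return fn(*new_args, **new_kwargs)

            return wrapper

        setattr(client, name, make_wrapper(original))
    return client
```

A retry wrapper would then be applied on top of the patched client, so each attempt re-runs the already-injected call without double-prepending the prompt (see the retry-ordering comment below).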
🪛 LanguageTool
docs/source/extend/gated-fields.md

[grammar] ~80-~80: There might be a mistake here.
Context: ... supported on GPT-5 models - {py:class}`~nat.data_models.thinking_mixin.ThinkingMixin` - Field: `thinking: bool | None` - Defau...

(QB_NEW_EN)


[grammar] ~81-~81: There might be a mistake here.
Context: ...hinking_mixin.ThinkingMixin` - Field: `thinking: bool | None` - Default when supported: `None` (use mode...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...en supported: None (use model default) - Only currently supported on Nemotron mod...

(QB_NEW_EN)

docs/source/extend/integrating-aws-bedrock-models.md

[grammar] ~51-~51: There might be a mistake here.
Context: ... LangChain. (must be > 0, default: 1024) * region_name: AWS region where your Bedrock service ...

(QB_NEW_EN)


[grammar] ~52-~52: There might be a mistake here.
Context: ...rock service is hosted (default: "None") * base_url: Custom Bedrock endpoint URL (default: ...

(QB_NEW_EN)


[grammar] ~53-~53: There might be a mistake here.
Context: ...t to use the default us-east-1 endpoint) * credentials_profile_name: AWS credentials profile name from ~/.a...

(QB_NEW_EN)


[grammar] ~54-~54: There might be a mistake here.
Context: ...s or ~/.aws/config files (default: None) * max_retries: The maximum number of retries for the ...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/source/extend/integrating-aws-bedrock-models.md

50-50: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


51-51: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


52-52: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


53-53: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


54-54: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)


55-55: Unordered list style
Expected: dash; Actual: asterisk

(MD004, ul-style)

🔇 Additional comments (6)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (3)

40-45: LGTM: Localizing LangChain imports.

Scoping third-party imports inside the patcher reduces import-time dependencies for the plugin.


68-80: LGTM: Handles BaseMessage, PromptValue, str, and Sequence[BaseMessage] without in-place mutation.

This corrects prior gaps and prevents input mutation.
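
As a rough sketch of the non-mutating input handling praised here, the following assumes a helper (`with_thinking_prompt`, a hypothetical name) that normalizes the four accepted input shapes into a fresh message list; the actual plugin code may differ, and treating a bare `str` as a human message is an assumption.

```python
from collections.abc import Sequence

from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
from langchain_core.prompt_values import PromptValue


def with_thinking_prompt(value: BaseMessage | PromptValue | str | Sequence[BaseMessage],
                         system_prompt: str) -> list[BaseMessage]:
    """Return a new message list with the thinking prompt prepended; the input is left untouched."""
    system_message = SystemMessage(content=system_prompt)
    if isinstance(value, BaseMessage):
        return [system_message, value]
    if isinstance(value, PromptValue):
        return [system_message, *value.to_messages()]
    if isinstance(value, str):
        # Assumption: a bare string is treated as a human message.
        return [system_message, HumanMessage(content=value)]
    # Must come after the str check, since str is also a Sequence.
    return [system_message, *value]
```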


99-105: LGTM: Centralized retry patching after thinking.

Order ensures clean re-injection per attempt without double-prepending original args.

docs/source/extend/gated-fields.md (2)

44-51: Generic parameter removal from examples looks correct.

Switching from GatedFieldMixin[float] to GatedFieldMixin aligns with the updated non-generic implementation.


58-66: Consistent update to non-generic mixin.

AzureOnlyMixin example correctly drops the generic parameter to match code.
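
For readers unfamiliar with the pattern, a gated field ties a config field to the models that actually accept it. The following is a minimal Pydantic sketch of the idea, not the toolkit's `GatedFieldMixin` implementation; the pattern set and the default value are illustrative only.

```python
import re
from typing import ClassVar

from pydantic import BaseModel, ConfigDict, model_validator


class TemperatureGate(BaseModel):
    """Illustrative stand-in for a gated temperature field."""

    model_config = ConfigDict(protected_namespaces=())

    # Hypothetical rule: every model except GPT-5 variants supports temperature.
    UNSUPPORTED_PATTERNS: ClassVar[tuple[re.Pattern[str], ...]] = (re.compile(r"gpt-5"), )

    model_name: str
    temperature: float | None = None

    @model_validator(mode="after")
    def _gate_temperature(self) -> "TemperatureGate":
        supported = not any(p.search(self.model_name) for p in self.UNSUPPORTED_PATTERNS)
        if not supported and self.temperature is not None:
            raise ValueError(f"temperature is not supported by model {self.model_name!r}")
        if supported and self.temperature is None:
            self.temperature = 0.0  # analogue of default_if_supported
        return self
```

Note that nothing in this sketch depends on the field's value type, which is consistent with dropping the generic parameter from the documented examples.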

src/nat/data_models/gated_field_mixin.py (1)

165-178: Confirm intended fallback semantics match docs.

Implementation returns:

  • True when no keys were seen.
  • With keys: True if any supported matches; False if any unsupported matches; else default to (unsupported is not None).

This matches the proposed doc note; please confirm that supported-mode defaulting to “unsupported” on no-match is desired.
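
A plain-Python paraphrase of the fallback semantics described above (not the source code; names are illustrative):

```python
import re


def is_field_supported(model_name: str,
                       supported: tuple[re.Pattern[str], ...] | None,
                       unsupported: tuple[re.Pattern[str], ...] | None) -> bool:
    """Paraphrase of the described gating fallback, for reference while reviewing."""
    if not supported and not unsupported:
        # No keys were seen: the field is allowed.
        return True
    if supported and any(p.search(model_name) for p in supported):
        return True
    if unsupported and any(p.search(model_name) for p in unsupported):
        return False
    # No pattern matched: fall back based on whether an unsupported list exists.
    return unsupported is not None
```

Under this reading, a supported-only gate that fails to match returns False, while an unsupported-only gate that fails to match returns True, which is the "defaulting to unsupported" behavior the comment asks to confirm.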

Co-authored-by: David Gardner <[email protected]>
Signed-off-by: Will Killian <[email protected]>
@willkill07
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 30885ff into NVIDIA:develop Aug 27, 2025
15 checks passed
@willkill07 willkill07 deleted the wkk_nemotron_thinking_support branch October 23, 2025 18:16

Labels

  • feature request: New feature or request
  • non-breaking: Non-breaking change
