@dagardner-nv dagardner-nv commented Mar 17, 2025

Fixes #41

@dagardner-nv dagardner-nv self-assigned this Mar 17, 2025
@dagardner-nv dagardner-nv marked this pull request as draft March 17, 2025 18:27
@dagardner-nv dagardner-nv changed the title from "Firs pass at setting up issue templates" to "First pass at setting up issue templates" Mar 17, 2025
@cwharris cwharris marked this pull request as ready for review March 18, 2025 14:12
@dagardner-nv
Contributor Author

/merge

@dagardner-nv dagardner-nv merged commit 41047a6 into NVIDIA:develop Mar 31, 2025
1 check passed
@dagardner-nv dagardner-nv deleted the david-ci-first-start branch March 31, 2025 15:35
ericevans-nv referenced this pull request in ericevans-nv/agent-iq Apr 14, 2025
* First pass at adding issue templates, and rapids bots

Signed-off-by: David Gardner <[email protected]>

* Fix team name

Signed-off-by: David Gardner <[email protected]>

* Fix team name

Signed-off-by: David Gardner <[email protected]>

* First pass at a minimal CI for a PR

Signed-off-by: David Gardner <[email protected]>

* add ops-bots.yaml

Signed-off-by: Christopher Harris <[email protected]>

* rename ops-bot.yaml -> ops-bots.yaml

Signed-off-by: Christopher Harris <[email protected]>

* rename ops-bots.yaml -> ops-bot.yaml

Signed-off-by: Christopher Harris <[email protected]>

---------

Signed-off-by: David Gardner <[email protected]>
Signed-off-by: Christopher Harris <[email protected]>
Co-authored-by: Christopher Harris <[email protected]>
yczhang-nv referenced this pull request in yczhang-nv/NeMo-Agent-Toolkit Apr 21, 2025
Signed-off-by: Yuchen Zhang <[email protected]>
yczhang-nv referenced this pull request in yczhang-nv/NeMo-Agent-Toolkit May 8, 2025
Signed-off-by: Yuchen Zhang <[email protected]>
Mahanth-Maha pushed a commit to Mahanth-Maha/AIQToolkit-NuGinie that referenced this pull request Jul 9, 2025
AnuradhaKaruppiah pushed a commit to AnuradhaKaruppiah/oss-agentiq that referenced this pull request Aug 4, 2025
copy-pr-bot bot pushed a commit that referenced this pull request Aug 13, 2025
scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025
rapids-bot bot pushed a commit that referenced this pull request Oct 10, 2025
1. Add intermediate_manager instance tracking similar to observability
2. Add configurable tracemalloc profiling to track the top memory consumers
3. Add a debug endpoint for dumping stats
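
A minimal sketch of how item 2 might work with the standard-library `tracemalloc` module: record a baseline snapshot, then compare later snapshots against it and report the largest growth. `GrowthTracker` and its method names are hypothetical and chosen for illustration; this is not the profiler added by this commit.

```python
import tracemalloc
from typing import Optional


class GrowthTracker:
    """Hypothetical helper: reports the largest allocation growth since a baseline."""

    def __init__(self, top_n: int = 10) -> None:
        self.top_n = top_n
        self._baseline: Optional[tracemalloc.Snapshot] = None

    def start(self) -> None:
        # Start tracing only if it is not already on, then record the baseline.
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        self._baseline = tracemalloc.take_snapshot()

    def top_growth(self) -> list[str]:
        # Without a baseline, or with tracing off, there is nothing to compare.
        if self._baseline is None or not tracemalloc.is_tracing():
            return []
        current = tracemalloc.take_snapshot()
        # compare_to() returns StatisticDiff entries, largest differences first.
        stats = current.compare_to(self._baseline, "lineno")
        return [str(stat) for stat in stats[: self.top_n]]

    def memory_mb(self) -> tuple[float, float]:
        # get_traced_memory() returns (current, peak) in bytes.
        current, peak = tracemalloc.get_traced_memory()
        return current / (1024 * 1024), peak / (1024 * 1024)
```

Each string returned by `top_growth()` uses the standard `StatisticDiff` format (`file:line: size=... (+...), count=... (+...), average=...`), which is the shape of the "TOP 10 MEMORY GROWTH" lines in the sample output.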

Sample Usage:
```
nat mcp serve --config_file examples/getting_started/simple_calculator/configs/config.yml --enable_memory_profiling True --memory_profile_interval 10 --memory_profile_log_level=INFO
```

Start the client and run eval against that endpoint to simulate multiple users:
```
nat serve --config_file examples/MCP/simple_calculator_mcp/configs/config-mcp-client.yml 
nat eval --config_file examples/evaluation_and_profiling/simple_calculator_eval/configs/config-tunable-rag-eval.yml --endpoint http://localhost:8000 --reps=2
```

Sample output (an intentional resource leak was introduced to produce this reference output; a regular workflow is not expected to show this growth):
```
================================================================================
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:166 - MEMORY PROFILE AFTER 20 REQUESTS:
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:167 -   Current Memory: 2.95 MB
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:168 -   Peak Memory: 7.35 MB
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:169 - 
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:170 - NAT COMPONENT INSTANCES:
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:171 -   IntermediateStepManagers: 1 active (0 outstanding steps)
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:174 -   BaseExporters: 0 active (0 isolated)
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:175 -   Subject (event streams): 1 instances
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:182 - TOP 10 MEMORY GROWTH SINCE BASELINE:
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #1: /home/devcontainers/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/linecache.py:139: size=753 KiB (+753 KiB), count=7950 (+7950), average=97 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #2: <frozen importlib._bootstrap_external>:757: size=704 KiB (+704 KiB), count=5558 (+5558), average=130 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #3: <frozen abc>:123: size=188 KiB (+188 KiB), count=2460 (+2460), average=78 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #4: /home/devcontainers/dev/forks/nat/examples/getting_started/simple_calculator/src/nat_simple_calculator/register.py:118: size=98.1 KiB (+98.1 KiB), count=10 (+10), average=10041 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #5: <frozen abc>:106: size=67.9 KiB (+67.9 KiB), count=238 (+238), average=292 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #6: /home/devcontainers/dev/forks/nat/examples/getting_started/simple_calculator/src/nat_simple_calculator/register.py:128: size=48.9 KiB (+48.9 KiB), count=10 (+10), average=5007 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #7: /home/devcontainers/dev/forks/nat/examples/getting_started/simple_calculator/src/nat_simple_calculator/register.py:112: size=37.7 KiB (+37.7 KiB), count=11 (+11), average=3509 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #8: /home/devcontainers/dev/forks/nat/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py:40: size=30.3 KiB (+30.3 KiB), count=346 (+346), average=90 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #9: /home/devcontainers/dev/forks/nat/.venv/lib/python3.12/site-packages/pydantic/main.py:253: size=26.0 KiB (+26.0 KiB), count=167 (+167), average=159 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:189 -   #10: /home/devcontainers/dev/forks/nat/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py:37: size=24.4 KiB (+24.4 KiB), count=500 (+500), average=50 B
2025-10-09 19:53:34 - INFO     - nat.front_ends.mcp.memory_profiler:191 - ===============================================================================
```
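
The "NAT COMPONENT INSTANCES" counts in the log above could be gathered with weak-reference tracking, so that counting instances does not itself keep them alive. The sketch below assumes that approach; `IntermediateStepManagerSketch` and its attributes are hypothetical stand-ins, not the toolkit's actual classes.

```python
import weakref


class IntermediateStepManagerSketch:
    """Hypothetical stand-in used only to illustrate instance tracking."""

    # WeakSet entries disappear once an instance is garbage collected,
    # so the registry never extends an instance's lifetime.
    _active: "weakref.WeakSet" = weakref.WeakSet()

    def __init__(self) -> None:
        self.outstanding_steps: dict[str, object] = {}
        type(self)._active.add(self)

    @classmethod
    def active_count(cls) -> int:
        # Number of instances that are still alive.
        return len(cls._active)

    @classmethod
    def outstanding_count(cls) -> int:
        # Total unfinished steps across all live instances.
        return sum(len(m.outstanding_steps) for m in cls._active)
```

A count that keeps rising across requests while the server is otherwise idle would point at instances that are never released.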

You can also get aggregate stats via the `debug/memory/stats` endpoint on the MCP server:
```
curl -s http://localhost:9901/debug/memory/stats | jq
{
  "enabled": true,
  "request_count": 16,
  "current_memory_mb": 3.41,
  "peak_memory_mb": 7.75,
  "active_intermediate_managers": 1,
  "outstanding_steps": 0,
  "active_exporters": 0,
  "isolated_exporters": 0,
  "subject_instances": 0
}
```
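
For scripted checks, the same endpoint can be polled from Python and two polls compared. This is a rough sketch that assumes the URL and field names shown in the `curl` example; `fetch_stats` and `stat_deltas` are hypothetical helper names, not part of the toolkit.

```python
import json
from urllib.request import urlopen

# URL taken from the curl example; adjust host and port for your server.
STATS_URL = "http://localhost:9901/debug/memory/stats"


def fetch_stats(url: str = STATS_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON stats payload."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def stat_deltas(before: dict, after: dict) -> dict:
    """Return after - before for every numeric field present in both polls."""
    deltas = {}
    for key, new in after.items():
        old = before.get(key)
        # bool is a subclass of int, so skip flags such as "enabled".
        if isinstance(new, bool) or not isinstance(new, (int, float)):
            continue
        if isinstance(old, (int, float)) and not isinstance(old, bool):
            deltas[key] = round(new - old, 2)
    return deltas
```

Calling `fetch_stats()` before and after an eval run and passing both results to `stat_deltas` shows which counters grew between the two polls.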
## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.



## Summary by CodeRabbit

* **New Features**
  * Optional memory profiling for the MCP front end with an enable flag, configurable interval/top‑N, and a new debug endpoint exposing current memory stats.
  * Per-call profiling hooks integrated into function registration and invocation flows.

* **Improvements**
  * Runtime visibility now includes active manager and outstanding-step counts plus exporter/subject counts.
  * Safer baseline management and defensive handling when tracing is unavailable; configurable per-request logging.

* **Tests**
  * Comprehensive tests for profiler behavior, metrics, and error handling.

Authors:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Will Killian (https://github.com/willkill07)
  - Yuchen Zhang (https://github.com/yczhang-nv)

URL: #961
elliott-davis pushed a commit to elliott-davis/NeMo-Agent-Toolkit that referenced this pull request Oct 30, 2025
copy-pr-bot bot pushed a commit that referenced this pull request Nov 7, 2025
…config-011CUfXjUjdaDKiPKh5kVWUc

refactor: Move function intercepts configuration from decorator to YAML
Development

Successfully merging this pull request may close these issues.

Missing issue templates

3 participants