Update evaluation docs #2

AnuradhaKaruppiah · 2025-03-15T19:39:26Z

Fix the config file used in the concepts README: config.yml => eval_config.yml
Fix the format for "--skip-workflow" to specify the dataset
Miscellaneous cleanup

Signed-off-by: Anuradha Karuppiah <[email protected]>

AnuradhaKaruppiah · 2025-03-16T02:19:24Z

CI is not set up yet, but I ran Vale locally.

1. Add intermediate_manager instance tracking similar to observability
2. Add configurable tracemalloc to track top users
3. Add a debug endpoint for dumping stats

Sample Usage:
```
nat mcp serve --config_file examples/getting_started/simple_calculator/configs/config.yml --enable_memory_profiling True --memory_profile_interval 10 --memory_profile_log_level=INFO
```

Start the client and run eval against that endpoint to simulate multiple users:
```
nat serve --config_file examples/MCP/simple_calculator_mcp/configs/config-mcp-client.yml
nat eval --config_file examples/evaluation_and_profiling/simple_calculator_eval/configs/config-tunable-rag-eval.yml --endpoint http://localhost:8000 --reps=2
```

Sample Output (an intentional resource leak was used to generate this reference output; this is not expected with a regular workflow):
```
================================================================================
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:166 - MEMORY PROFILE AFTER 20 REQUESTS:
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:167 - Current Memory: 2.95 MB
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:168 - Peak Memory: 7.35 MB
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:169 -
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:170 - NAT COMPONENT INSTANCES:
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:171 - IntermediateStepManagers: 1 active (0 outstanding steps)
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:174 - BaseExporters: 0 active (0 isolated)
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:175 - Subject (event streams): 1 instances
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:182 - TOP 10 MEMORY GROWTH SINCE BASELINE:
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #1: /home/devcontainers/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/linecache.py:139: size=753 KiB (+753 KiB), count=7950 (+7950), average=97 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #2: <frozen importlib._bootstrap_external>:757: size=704 KiB (+704 KiB), count=5558 (+5558), average=130 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #3: <frozen abc>:123: size=188 KiB (+188 KiB), count=2460 (+2460), average=78 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #4: /home/devcontainers/dev/forks/nat/examples/getting_started/simple_calculator/src/nat_simple_calculator/register.py:118: size=98.1 KiB (+98.1 KiB), count=10 (+10), average=10041 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #5: <frozen abc>:106: size=67.9 KiB (+67.9 KiB), count=238 (+238), average=292 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #6: /home/devcontainers/dev/forks/nat/examples/getting_started/simple_calculator/src/nat_simple_calculator/register.py:128: size=48.9 KiB (+48.9 KiB), count=10 (+10), average=5007 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #7: /home/devcontainers/dev/forks/nat/examples/getting_started/simple_calculator/src/nat_simple_calculator/register.py:112: size=37.7 KiB (+37.7 KiB), count=11 (+11), average=3509 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #8: /home/devcontainers/dev/forks/nat/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py:40: size=30.3 KiB (+30.3 KiB), count=346 (+346), average=90 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #9: /home/devcontainers/dev/forks/nat/.venv/lib/python3.12/site-packages/pydantic/main.py:253: size=26.0 KiB (+26.0 KiB), count=167 (+167), average=159 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:189 - #10: /home/devcontainers/dev/forks/nat/.venv/lib/python3.12/site-packages/uvicorn/protocols/http/httptools_impl.py:37: size=24.4 KiB (+24.4 KiB), count=500 (+500), average=50 B
2025-10-09 19:53:34 - INFO - nat.front_ends.mcp.memory_profiler:191 - ===============================================================================
```

You can also get aggregate stats via the `debug/memory/stats` endpoint on the MCP server:
```
curl -s http://localhost:9901/debug/memory/stats |jq
{
  "enabled": true,
  "request_count": 16,
  "current_memory_mb": 3.41,
  "peak_memory_mb": 7.75,
  "active_intermediate_managers": 1,
  "outstanding_steps": 0,
  "active_exporters": 0,
  "isolated_exporters": 0,
  "subject_instances": 0
}
```

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit
* **New Features**
  * Optional memory profiling for the MCP front end with an enable flag, configurable interval/top‑N, and a new debug endpoint exposing current memory stats.
  * Per-call profiling hooks integrated into function registration and invocation flows.
* **Improvements**
  * Runtime visibility now includes active manager and outstanding-step counts plus exporter/subject counts.
  * Safer baseline management and defensive handling when tracing is unavailable; configurable per-request logging.
* **Tests**
  * Comprehensive tests for profiler behavior, metrics, and error handling.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- Will Killian (https://github.com/willkill07)
- Yuchen Zhang (https://github.com/yczhang-nv)

URL: #961
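The "memory growth since baseline" lines in the sample output have the shape that Python's standard `tracemalloc` module prints for snapshot differences. The following is a minimal sketch of that general technique, not the toolkit's actual profiler: the class name `RequestMemoryProfiler`, its `interval` and `top_n` parameters, and the `on_request_complete` method are all hypothetical names chosen for illustration.

```python
import tracemalloc
from typing import List, Optional


class RequestMemoryProfiler:
    """Hypothetical helper (not the toolkit's class): every `interval`
    requests, report current/peak traced memory and the top allocation
    growth relative to a baseline snapshot."""

    def __init__(self, interval: int = 10, top_n: int = 10) -> None:
        self.interval = interval
        self.top_n = top_n
        self.request_count = 0
        # Start tracing only if nothing else has started it already.
        if not tracemalloc.is_tracing():
            tracemalloc.start()
        # Later snapshots are compared against this baseline.
        self.baseline = tracemalloc.take_snapshot()

    def on_request_complete(self) -> Optional[List[str]]:
        """Count one request; return report lines on every `interval`-th call."""
        self.request_count += 1
        if self.request_count % self.interval != 0:
            return None
        current, peak = tracemalloc.get_traced_memory()  # both in bytes
        lines = [
            f"MEMORY PROFILE AFTER {self.request_count} REQUESTS:",
            f"Current Memory: {current / (1024 * 1024):.2f} MB",
            f"Peak Memory: {peak / (1024 * 1024):.2f} MB",
        ]
        snapshot = tracemalloc.take_snapshot()
        # compare_to returns StatisticDiff entries, largest changes first.
        growth = snapshot.compare_to(self.baseline, "lineno")
        for rank, stat in enumerate(growth[: self.top_n], start=1):
            lines.append(f"#{rank}: {stat}")
        return lines


if __name__ == "__main__":
    profiler = RequestMemoryProfiler(interval=5, top_n=3)
    leaked = []  # intentional leak so that growth shows up in the report
    for _ in range(10):
        leaked.append(bytearray(50_000))
        report = profiler.on_request_complete()
        if report:
            print("\n".join(report))
```

Comparing against a fixed baseline, as assumed here, surfaces allocations that keep growing across requests; comparing consecutive snapshots instead would highlight only recent growth.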

AnuradhaKaruppiah requested review from dagardner-nv and mdemoret-nv March 15, 2025 19:39

AnuradhaKaruppiah force-pushed the eval-doc-fixes branch from 5627c17 to 8978f46 Compare March 15, 2025 19:46

Update evaluation docs

98b1377

1. Fix the config file used in the concepts README: config.yml => eval_config.yml
2. Fix the format for "--skip-workflow" to specify the dataset
3. Miscellaneous cleanup

Signed-off-by: Anuradha Karuppiah <[email protected]>

AnuradhaKaruppiah force-pushed the eval-doc-fixes branch from 8978f46 to 98b1377 Compare March 15, 2025 19:50

sean-javiya-nvidia approved these changes Mar 16, 2025

Fix vale warnings

9a4fe25

Signed-off-by: Anuradha Karuppiah <[email protected]>

AnuradhaKaruppiah merged commit 1e4026b into NVIDIA:develop Mar 16, 2025

AnuradhaKaruppiah referenced this pull request in AnuradhaKaruppiah/oss-agentiq Aug 4, 2025

Merge pull request #2 from AnuradhaKaruppiah/eval-doc-fixes

0a35658

Update evaluation docs

copy-pr-bot bot pushed a commit that referenced this pull request Aug 13, 2025

Merge pull request #2 from dagardner-nv/david-yuchen-update-naming-do…

a2179cf

…cs-p2 Vale spelling fixes

scheckerNV pushed a commit to scheckerNV/aiq-factory-reset that referenced this pull request Aug 22, 2025

Merge pull request NVIDIA#2 from AnuradhaKaruppiah/eval-doc-fixes

04b1d31

Update evaluation docs

dagardner-nv added a commit to dagardner-nv/NeMo-Agent-Toolkit that referenced this pull request Nov 3, 2025

Indent table to appear under item NVIDIA#2, add note about entering v…

484a305

…alues exactly Signed-off-by: David Gardner <[email protected]>

copy-pr-bot bot pushed a commit that referenced this pull request Nov 17, 2025

Merge pull request #2 from dagardner-nv/wkk_fix-azure-openai-missing-…

8fee073

…api_version-dg Parameterize the api_version argument
