
Conversation

Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah commented Aug 18, 2025

Description

When an evaluation run is started, the eval_call id is pushed onto the call stack so that subsequent traces are grouped underneath it. This lets the user easily debug the predictions, scores, and traces for each run; a rough sketch of the idea follows the list below.

  • This PR is a workaround; eval (score) exporting will be migrated to the observability exporter_manager in the future.
  • The workaround only works for local eval; remote eval is not yet handled.
  • This PR also adds a change to wait for the trace export to finish before the evaluation run completes.
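
The grouping mechanism can be pictured with the minimal sketch below. This is an illustration only, not the actual `eval_trace_ctx.py` code; it assumes a `contextvars`-based context, and the names `eval_call_scope` and `current_eval_call` are made up for the example.

```python
# Illustrative sketch (assumed names), not the toolkit's implementation.
import contextvars
from contextlib import contextmanager

# Holds the id of the currently active evaluation call, if any.
_current_eval_call: contextvars.ContextVar = contextvars.ContextVar(
    "current_eval_call", default=None)


@contextmanager
def eval_call_scope(eval_call_id):
    """Push the eval call id for the duration of a run so traces created
    inside the block can attach to it as their parent."""
    token = _current_eval_call.set(eval_call_id)
    try:
        yield
    finally:
        _current_eval_call.reset(token)


def current_eval_call():
    """Return the parent eval call id for the running evaluation, or None."""
    return _current_eval_call.get()
```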

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

@AnuradhaKaruppiah AnuradhaKaruppiah added the DO NOT MERGE (PR should not be merged; see PR for details) label Aug 18, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah self-assigned this Aug 18, 2025
Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah changed the base branch from release/1.2 to develop August 18, 2025 21:24
@AnuradhaKaruppiah AnuradhaKaruppiah added the improvement (Improvement to existing functionality) and non-breaking (Non-breaking change) labels and removed the DO NOT MERGE (PR should not be merged; see PR for details) label Aug 19, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements grouping of workflow traces under parent evaluation calls in Weave by introducing a trace context management system. The changes enable better debugging by organizing predictions, scores, and traces hierarchically under each evaluation run.

Key changes:

  • Adds an evaluation trace context system with a Weave-specific implementation
  • Integrates the trace context into the evaluation workflow to group traces under the parent evaluation call
  • Adds a task-waiting mechanism to ensure trace exports complete before the evaluation finishes (a rough sketch follows)
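
The task-waiting piece can be sketched roughly as follows; `wait_for_tasks()` is a placeholder for whatever the now-public waiting method on the exporters is actually called, so treat the names as assumptions.

```python
# Rough sketch with assumed method names, not the actual base_exporter API.
import asyncio


async def wait_for_trace_exports(exporters):
    """Wait for every exporter to drain its pending export tasks so no
    traces are dropped when the evaluation run finishes."""
    await asyncio.gather(*(exporter.wait_for_tasks() for exporter in exporters))
```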

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/nat/eval/utils/eval_trace_ctx.py | New evaluation trace context classes for framework-agnostic trace coordination |
| src/nat/eval/utils/weave_eval.py | Updated to accept and use the trace context for evaluation-call propagation |
| src/nat/eval/evaluate.py | Integrated the trace context and added export-task waiting for local workflows |
| src/nat/builder/workflow.py | Added a method to expose all exporters for task waiting |
| src/nat/observability/exporter/base_exporter.py | Made the task-waiting method public for external access |
| packages/nvidia_nat_weave/src/nat/plugins/weave/weave_exporter.py | Updated comments to reflect the new export coordination approach |
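
Putting the pieces together, the local-eval flow described for evaluate.py might look roughly like the sketch below, reusing the `eval_call_scope` and `wait_for_trace_exports` helpers from the earlier sketches. All names here are illustrative assumptions, not the toolkit's actual API.

```python
# Illustrative end-to-end flow (assumed names), not the real evaluate.py.
async def run_local_evaluation(items, run_one_item, eval_call_id, exporters):
    # Group every per-item trace under the parent evaluation call.
    with eval_call_scope(eval_call_id):
        results = [await run_one_item(item) for item in items]

    # Block until all exporters have flushed their export tasks before
    # declaring the evaluation run finished.
    await wait_for_trace_exports(exporters)
    return results
```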


Contributor

@mpenn mpenn left a comment


Overall, this looks good. Just some minor nitpick comments.

Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit aec0975 into NVIDIA:develop Aug 21, 2025
17 of 18 checks passed
saglave pushed a commit to snps-scm13/SNPS-NeMo-Agent-Toolkit that referenced this pull request Sep 2, 2025
…#663)

Authors:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Matthew Penn (https://github.com/mpenn)

URL: NVIDIA#663
Signed-off-by: Sangharsh Aglave <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-weave-eval-fixes branch September 18, 2025 23:58