
Conversation

Contributor

@AnuradhaKaruppiah AnuradhaKaruppiah commented Aug 18, 2025

Description

When an evaluation run is started, the eval_call id is pushed onto the call stack so that subsequent traces are grouped underneath it. This lets the user easily debug the predictions, scores, and traces for each run; a rough sketch of the idea follows the list below.

  • This PR is a workaround; eval (score) exporting will be migrated to the observability exporter_manager in the future.
  • The workaround only works for local eval; remote eval is not yet handled.
  • This PR also adds a change to wait for the trace export to finish before the evaluation run completes.
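
The grouping mechanism can be pictured with the minimal sketch below. This is an illustration only, not the actual `eval_trace_ctx.py` code; it assumes a `contextvars`-based context, and the names `eval_call_scope` and `current_eval_call` are made up for the example.

```python
# Illustrative sketch (assumed names), not the toolkit's implementation.
import contextvars
from contextlib import contextmanager

# Holds the id of the currently active evaluation call, if any.
_current_eval_call: contextvars.ContextVar = contextvars.ContextVar(
    "current_eval_call", default=None)


@contextmanager
def eval_call_scope(eval_call_id):
    """Push the eval call id for the duration of a run so traces created
    inside the block can attach to it as their parent."""
    token = _current_eval_call.set(eval_call_id)
    try:
        yield
    finally:
        _current_eval_call.reset(token)


def current_eval_call():
    """Return the parent eval call id for the running evaluation, or None."""
    return _current_eval_call.get()
```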

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

@AnuradhaKaruppiah AnuradhaKaruppiah added the DO NOT MERGE (PR should not be merged; see PR for details) label Aug 18, 2025
@AnuradhaKaruppiah AnuradhaKaruppiah self-assigned this Aug 18, 2025
Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah changed the base branch from release/1.2 to develop August 18, 2025 21:24
@AnuradhaKaruppiah AnuradhaKaruppiah added the improvement (Improvement to existing functionality) and non-breaking (Non-breaking change) labels and removed the DO NOT MERGE (PR should not be merged; see PR for details) label Aug 19, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR implements grouping of workflow traces under parent evaluation calls in Weave by introducing a trace context management system. The changes enable better debugging by organizing predictions, scores, and traces hierarchically under each evaluation run.

Key changes:

  • Adds an evaluation trace context system with a Weave-specific implementation
  • Integrates the trace context into the evaluation workflow to group traces under the parent evaluation call
  • Adds a task-waiting mechanism to ensure trace exports complete before the evaluation finishes (a rough sketch follows)
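
The task-waiting piece can be sketched roughly as follows; `wait_for_tasks()` is a placeholder for whatever the now-public waiting method on the exporters is actually called, so treat the names as assumptions.

```python
# Rough sketch with assumed method names, not the actual base_exporter API.
import asyncio


async def wait_for_trace_exports(exporters):
    """Wait for every exporter to drain its pending export tasks so no
    traces are dropped when the evaluation run finishes."""
    await asyncio.gather(*(exporter.wait_for_tasks() for exporter in exporters))
```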

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/nat/eval/utils/eval_trace_ctx.py | New evaluation trace context classes for framework-agnostic trace coordination |
| src/nat/eval/utils/weave_eval.py | Updated to accept and use the trace context for evaluation-call propagation |
| src/nat/eval/evaluate.py | Integrated the trace context and added export-task waiting for local workflows |
| src/nat/builder/workflow.py | Added a method to expose all exporters for task waiting |
| src/nat/observability/exporter/base_exporter.py | Made the task-waiting method public for external access |
| packages/nvidia_nat_weave/src/nat/plugins/weave/weave_exporter.py | Updated comments to reflect the new export coordination approach |
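
Putting the pieces together, the local-eval flow described for evaluate.py might look roughly like the sketch below, reusing the `eval_call_scope` and `wait_for_trace_exports` helpers from the earlier sketches. All names here are illustrative assumptions, not the toolkit's actual API.

```python
# Illustrative end-to-end flow (assumed names), not the real evaluate.py.
async def run_local_evaluation(items, run_one_item, eval_call_id, exporters):
    # Group every per-item trace under the parent evaluation call.
    with eval_call_scope(eval_call_id):
        results = [await run_one_item(item) for item in items]

    # Block until all exporters have flushed their export tasks before
    # declaring the evaluation run finished.
    await wait_for_trace_exports(exporters)
    return results
```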


Contributor

@mpenn mpenn left a comment


Overall, this looks good. Just some minor nitpick comments.

Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
Signed-off-by: Anuradha Karuppiah <[email protected]>
@AnuradhaKaruppiah
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit aec0975 into NVIDIA:develop Aug 21, 2025
17 of 18 checks passed
saglave pushed a commit to snps-scm13/SNPS-NeMo-Agent-Toolkit that referenced this pull request Sep 2, 2025
…#663)

Authors:
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
  - Matthew Penn (https://github.com/mpenn)

URL: NVIDIA#663
Signed-off-by: Sangharsh Aglave <[email protected]>
@AnuradhaKaruppiah AnuradhaKaruppiah deleted the ak-weave-eval-fixes branch September 18, 2025 23:58