When looking at scores for metrics like `Equivalence` and `Groundedness` in the report, there is currently no way to see the grounding context that was used for the evaluation. We have heard feedback that this makes it harder for anyone viewing the report to assess or debug why the corresponding score was low or high.
This issue tracks the following changes to address this -
- Add a new `Dictionary<string, string>?` property on `EvaluationMetric` to store the contexts (if any) that were used as part of the evaluation.
- Update `GroundednessEvaluator` and `EquivalenceEvaluator` to store their respective contexts in this property.
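A minimal sketch of what this could look like. The property name `Context` and the dictionary keys shown here are assumptions for illustration only; the actual names would be decided during implementation:

```csharp
public class EvaluationMetric
{
    // Hypothetical property name; the issue only specifies the type.
    // Each entry maps a context label to the context text that the
    // evaluator used when producing this metric's score.
    public Dictionary<string, string>? Context { get; set; }
}
```

An evaluator such as `GroundednessEvaluator` would then populate this property on the metrics it produces, e.g. storing the grounding context it was given under a well-known key, so that report viewers can see exactly what text the score was judged against.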