When looking at scores for metrics like `Equivalence` and `Groundedness` in the report, there is currently no way to see the grounding context that was used for the evaluation. We have heard feedback that this makes it harder for anyone viewing the report to assess or debug why the corresponding score was low or high.
This issue tracks the following changes to address this -
- Add a new `Dictionary<string, string>?` property on `EvaluationMetric` to store the contexts (if any) that were used as part of the evaluation.
- Update `GroundednessEvaluator` and `EquivalenceEvaluator` to store their respective contexts in this property.
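A minimal sketch of what this could look like. The property name `Context` and the dictionary keys shown here are assumptions for illustration only; the actual names would be decided during implementation:

```csharp
public class EvaluationMetric
{
    // Hypothetical property name; the issue only specifies the type.
    // Each entry maps a context label to the context text that the
    // evaluator used when producing this metric's score.
    public Dictionary<string, string>? Context { get; set; }
}
```

An evaluator such as `GroundednessEvaluator` would then populate this property on the metrics it produces, e.g. storing the grounding context it was given under a well-known key, so that report viewers can see exactly what text the score was judged against.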