feat(experiments): wire eval_model config field to evaluator construction #2113
Closed
Labels: enhancement (New feature or request), experiments (Autonomous self-experimentation)
Description
Problem
In crates/zeph-core/src/agent/experiment_cmd.rs:115-117, there is a documented TODO:
// TODO(#eval-model): eval_model config field is not yet wired to evaluator construction.
// Both the agent path and runner.rs use the agent's own provider as the judge.
// Wire eval_model to create a separate judge provider in a follow-up PR.
let evaluator = Evaluator::new(Arc::clone(&provider_arc), benchmark, config.eval_budget_tokens)?;
Impact
When experiments.eval_model is set in config, it is silently ignored. The agent's own provider always acts as the experiment judge/evaluator, which can produce biased scores when the agent being tested and the judge are the same model.
Expected behavior
When experiments.eval_model is set, create a separate LLM provider instance for the evaluator so that the judge is independent from the agent under test.
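A minimal sketch of the expected behavior follows. `LlmProvider`, `StaticProvider`, `ExperimentConfig`, `select_judge_provider`, and `judge_is_independent` are all hypothetical names used for illustration; they are not zeph-core's actual API. The idea is that the provider returned by the selection step would be passed to `Evaluator::new` in place of `provider_arc`. Treating a blank `eval_model` the same as an unset one is also an assumption of this sketch.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for the crate's provider abstraction.
trait LlmProvider {
    // Model identifier this provider sends requests to.
    fn model_name(&self) -> &str;
}

// Hypothetical concrete provider, used only for illustration.
struct StaticProvider {
    model: String,
}

impl LlmProvider for StaticProvider {
    fn model_name(&self) -> &str {
        &self.model
    }
}

// Hypothetical subset of the experiments config section.
struct ExperimentConfig {
    eval_model: Option<String>,
}

// Returns the provider that should act as the experiment judge.
// If `eval_model` is set and non-blank, a separate provider is built for it
// via `build_provider`. Otherwise the agent's own provider is reused, which
// matches the current behavior described in the TODO.
fn select_judge_provider<F>(
    config: &ExperimentConfig,
    agent_provider: &Arc<dyn LlmProvider>,
    build_provider: F,
) -> Result<Arc<dyn LlmProvider>, String>
where
    F: Fn(&str) -> Result<Arc<dyn LlmProvider>, String>,
{
    match config.eval_model.as_deref().map(str::trim) {
        Some(model) if !model.is_empty() => build_provider(model),
        _ => Ok(Arc::clone(agent_provider)),
    }
}

// True when the judge targets a different model than the agent under test.
fn judge_is_independent(judge: &dyn LlmProvider, agent: &dyn LlmProvider) -> bool {
    judge.model_name() != agent.model_name()
}
```

The provider factory is injected as a closure so that the selection logic can be exercised without network access. In the real code, the factory would be whatever already builds providers from config; this sketch does not assume its name or signature.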
Notes
- The feature is gated behind #[cfg(feature = "experiments")] and enabled = false in the default config, so there is no production impact today.
- Found during static code analysis in CI-71 (2026-03-22).