feat(experiments): wire eval_model config field to evaluator construction #2113

@bug-ops

Problem

In crates/zeph-core/src/agent/experiment_cmd.rs:115-117, there is a documented TODO:

// TODO(#eval-model): eval_model config field is not yet wired to evaluator construction.
// Both the agent path and runner.rs use the agent's own provider as the judge.
// Wire eval_model to create a separate judge provider in a follow-up PR.
let evaluator = Evaluator::new(Arc::clone(&provider_arc), benchmark, config.eval_budget_tokens)?;

Impact

When experiments.eval_model is set in config, it is silently ignored. The agent's own provider always acts as the experiment judge/evaluator, which can produce biased scores when the agent being tested and the judge are the same model.

Expected behavior

When experiments.eval_model is set, create a separate LLM provider instance for the evaluator so that the judge is independent from the agent under test.
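The selection logic described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual API: the `Provider` trait, `StubProvider`, the `ExperimentConfig` struct shape, the `judge_provider` helper, and the `build` factory closure are all hypothetical names introduced here. The sketch also assumes that an unset or blank `eval_model` should fall back to the agent's own provider, which matches today's behavior.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the project's LLM provider abstraction.
trait Provider: Send + Sync {
    fn model_name(&self) -> &str;
}

/// Hypothetical provider used only to make this sketch self-contained.
struct StubProvider {
    model: String,
}

impl Provider for StubProvider {
    fn model_name(&self) -> &str {
        &self.model
    }
}

/// Hypothetical subset of the experiments config; only `eval_model` matters here.
struct ExperimentConfig {
    eval_model: Option<String>,
}

/// Chooses the provider that should act as judge.
///
/// If `eval_model` is set to a non-blank value, `build` is asked to construct a
/// dedicated provider for that model. Otherwise the agent's provider is reused
/// (assumed fallback, mirroring the current behavior).
fn judge_provider<F>(
    config: &ExperimentConfig,
    agent: &Arc<dyn Provider>,
    build: F,
) -> Result<Arc<dyn Provider>, String>
where
    F: Fn(&str) -> Result<Arc<dyn Provider>, String>,
{
    let requested = config
        .eval_model
        .as_deref()
        .map(str::trim)
        .filter(|model| !model.is_empty());

    match requested {
        // A separate judge keeps scoring independent from the agent under test.
        Some(model) => build(model),
        // No eval_model configured: keep using the agent's own provider.
        None => Ok(Arc::clone(agent)),
    }
}
```

At the call site quoted in the TODO, the result of such a helper would presumably take the place of `Arc::clone(&provider_arc)` as the first argument to `Evaluator::new`, assuming that constructor's signature stays as shown. Propagating the factory error, rather than silently falling back, would make a misconfigured `eval_model` visible instead of ignored.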

Notes

  • The feature is gated behind #[cfg(feature = "experiments")] and enabled = false in the default config, so there is no production impact today
  • Found during static code analysis in CI-71 (2026-03-22)

Labels: enhancement (New feature or request), experiments (Autonomous self-experimentation)
