
Please add a flag to log the score for each sample (akin to Eleuther's LM Evaluation Harness) #215

Description

@RylanSchaeffer

Hi! I've been using EleutherAI's LM Evaluation Harness, and I'd also like to run some code tasks using your Big Code Evaluation Harness. We need the scores for each sample in each benchmark, and the LM Evaluation Harness has a helpful flag, `log_samples`, that enables logging the per-sample scores.
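For reference, a typical invocation on our side looks roughly like this (a minimal sketch using lm-eval's CLI; the model and task names are just placeholders):

```bash
# Evaluate a model and write per-sample records alongside the aggregate results.
# --log_samples saves each sample's inputs, outputs, and scores under --output_path.
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m \
    --tasks hellaswag \
    --output_path results/ \
    --log_samples
```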

As best I can tell (and please correct me if I'm wrong), the Big Code Evaluation Harness doesn't have a similar flag. If my understanding is correct, could one please be added?

Thank you!
