Contributor

@shchur shchur commented Sep 24, 2025

Issue #, if available:

Description of changes:

  • Add leakage_imputation_model that replaces errors where trained_on_this_dataset==True.
  • Sort leaderboard by win_rate instead of skill_score for consistency with pairwise_comparison.
  • Bump version to 0.6.1.
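As a rough illustration of the first bullet, the imputation step could look like the sketch below. It assumes that `errors_df` has one row per task and one column per model, and that `training_corpus_overlap_df` uses the same layout with `True` marking a model that was trained on that task's dataset. The function name `impute_leaked_errors` is hypothetical and is not taken from this PR.

```python
import pandas as pd


def impute_leaked_errors(
    errors_df: pd.DataFrame,
    training_corpus_overlap_df: pd.DataFrame,
    leakage_imputation_model: str,
) -> pd.DataFrame:
    """Hypothetical helper: overwrite leaked errors with the imputation model's errors."""
    if leakage_imputation_model not in errors_df.columns:
        raise ValueError(
            f"Results for leakage_imputation_model '{leakage_imputation_model}' are missing."
        )
    # Align the overlap flags with the error table; anything missing counts as "no overlap".
    overlap = training_corpus_overlap_df.reindex(
        index=errors_df.index, columns=errors_df.columns
    ).eq(True)
    imputed = errors_df.copy()
    for model in imputed.columns:
        mask = overlap[model]
        # Where the model saw the dataset during training, use the imputation model's error.
        imputed.loc[mask, model] = errors_df.loc[mask, leakage_imputation_model]
    return imputed


errors_df = pd.DataFrame(
    {"model_a": [1.0, 2.0], "baseline": [3.0, 4.0]}, index=["task1", "task2"]
)
overlap_df = pd.DataFrame({"model_a": [True, False]}, index=["task1", "task2"])
result = impute_leaked_errors(errors_df, overlap_df, "baseline")
```

In this toy example only `model_a` on `task1` is flagged, so only that single entry is replaced by the `baseline` error; all other entries are left unchanged.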

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur shchur requested a review from abdulfatir September 24, 2025 13:56
Collaborator

@abdulfatir abdulfatir left a comment

Looks good. Unrelated to this specific PR, but should we maybe add some tests for the analysis stuff?

training_corpus_overlap_df = training_corpus_overlap_df.fillna(False).astype(bool)
if leakage_imputation_model not in errors_df.columns:
    raise ValueError(
        f"Results for leakage_imputation_model '{leakage_imputation_model}' are missing for some tasks."
Collaborator

Is the error message correct? Why some tasks?

Contributor Author

Thanks, that was indeed a typo.
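For context, the check under review only tests whether the model's column exists at all, which is why "for some tasks" is misleading. A minimal sketch of how the two situations could be told apart is given below. It assumes `errors_df` has tasks as rows and models as columns; the function name `check_imputation_model` and both message wordings are hypothetical, not the text that was merged.

```python
import pandas as pd


def check_imputation_model(errors_df: pd.DataFrame, leakage_imputation_model: str) -> None:
    """Hypothetical validation: separate 'no results at all' from 'results missing for some tasks'."""
    if leakage_imputation_model not in errors_df.columns:
        # The model has no column, so its results are missing for every task.
        raise ValueError(
            f"Results for leakage_imputation_model '{leakage_imputation_model}' are missing."
        )
    # The column exists; collect the tasks where its error is NaN.
    is_missing = errors_df[leakage_imputation_model].isna()
    missing_tasks = errors_df.index[is_missing].tolist()
    if missing_tasks:
        raise ValueError(
            f"Results for leakage_imputation_model '{leakage_imputation_model}' "
            f"are missing for some tasks: {missing_tasks}"
        )


example_df = pd.DataFrame({"baseline": [1.0, float("nan")]}, index=["task1", "task2"])
```

With `example_df`, calling the check for a model named `"other"` raises the first error, while calling it for `"baseline"` raises the second and lists `task2`.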

Contributor Author

Regarding the unit tests: good point, I've added some basic ones for the core analysis methods.

@shchur shchur merged commit 4fbe5d4 into main Sep 25, 2025
3 checks passed
@shchur shchur deleted the add-leakage-imputation-model branch September 25, 2025 12:23