Skip to content

FIX compute precision-recall at 100% recall#23214

Merged
jeremiedbb merged 11 commits intoscikit-learn:mainfrom
stephanecollot:fix_precision_recall_curve
May 2, 2022
Merged

FIX compute precision-recall at 100% recall#23214
jeremiedbb merged 11 commits intoscikit-learn:mainfrom
stephanecollot:fix_precision_recall_curve

Conversation

@stephanecollot
Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Fixes #23213

What does this implement/fix? Explain your changes.

Remove the unnecessary dropping.

Any other comments?

Full disclosure, this PR modifies precision_recall_curve() that is only used by _binary_uninterpolated_average_precision() that is only used by average_precision_score()

precision, recall, _ = precision_recall_curve(

I think average_precision_score() should not be impacted by this change, and its is tested 54 times in unit tests

@glemaitre
Copy link
Copy Markdown
Member

You will need to fix failing tests. Basically, it should be docstring test in most cases.
We also need an additional non-regression test: I think that we should be a PR curve and check that the last point is equivalent to a decision rule that always predicts the positive class.

@stephanecollot
Copy link
Copy Markdown
Contributor Author

@glemaitre I added 3 commits:

  • fixing existing unit tests
  • fixing existing doc tests
  • adding regression test (I check that it is failing if I revert the change)

@glemaitre
Copy link
Copy Markdown
Member

Please add an entry to the change log at doc/whats_new/v1.1.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@glemaitre glemaitre changed the title Fixes precision recall at 100% recall FIX compute precision-recall at 100% recall Apr 26, 2022
... y_true, y_scores)
>>> precision
array([0.66666667, 0.5 , 1. , 1. ])
array([0.5 , 0.66666667, 0.5 , 1. , 1. ])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update the docstring of thresholds to define properly n_thresholds.

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from these 2 changes, LGTM

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. LGTM

@stephanecollot
Copy link
Copy Markdown
Contributor Author

stephanecollot commented Apr 29, 2022

It seems the doc is failing, but I cannot understand why:

Sphinx Warnings in affected files
doc/whats_new/v1.1.rst:105: WARNING: Unexpected indentation.

nothing wrong at line 105.

@glemaitre
Copy link
Copy Markdown
Member

For sure this is not linked with your change. We can ignore it.

@glemaitre
Copy link
Copy Markdown
Member

I merge main into your branch. The problem was solved in main by the following PR: #23246

Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @stephanecollot

@jeremiedbb jeremiedbb merged commit 32c53bc into scikit-learn:main May 2, 2022
@stephanecollot
Copy link
Copy Markdown
Contributor Author

It went very smoothly, I'm happy I did this first contribution, thank you @glemaitre.
I will most probably continue to contribute by adding uncertainty band on precision-recall curves soon.

@glemaitre
Copy link
Copy Markdown
Member

@stephanecollot I started a POC on the subject: #21211

The idea is to use cross-validation to get uncertainty bounds. I will probably find time to carry on some work in the next release. However, we will need to breakdown the PR into smaller PR to facilitate the review process:

  1. Add the return_indices parameter to cross_validate
  2. Then add a new from_cv_results class method for each display

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request May 19, 2022
glemaitre added a commit that referenced this pull request May 19, 2022
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: jeremiedbb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

precision_recall_curve() is not returning the full curve at high recall

3 participants