
ENH implement CalibrationCurveDisplay.from_cv_results #21211

Draft
glemaitre wants to merge 4 commits into scikit-learn:main from glemaitre:is/calibrated_display_cv_results

Conversation

@glemaitre
Member

@glemaitre glemaitre commented Oct 1, 2021

This PR intends to add the capability of plotting the uncertainty of the different curves (calibration, precision-recall, ROC, etc.) by using the results of cross-validation (i.e. the output of cross_validate).

TODO:

  • add a parameter return_indices in cross_validate to store the train-test indices. It is the safest way to keep track of the train-test splits in the case of stochastic splitting strategies.
  • add a method from_cv_results in the plotting display to take advantage of the CV computation.
  • add unit test for from_cv_results
  • add unit test for the new keyword parameters in CalibrationDisplay
  • add unit test for the new strategy of binning in calibration_curve

Usage example

# %%
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=10_000, weights=[0.1, 0.9], random_state=42, class_sep=1
)
sample_weight = np.zeros_like(y, dtype=np.float64)
sample_weight[y == 0] = 0.1
sample_weight[y == 1] = 0.9

# %%
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test, sw_train, sw_test = train_test_split(
    X, y, sample_weight, random_state=42
)

# %%
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import CalibratedClassifierCV

calibration_method = "isotonic"
models = {
    "LR no weights": LogisticRegression(),
    "LR class weights": LogisticRegression(class_weight="balanced"),
    "Calibrated LR no weights": CalibratedClassifierCV(
        LogisticRegression(),
        method=calibration_method,
    ),
    "Calibrated LR class weights": CalibratedClassifierCV(
        LogisticRegression(class_weight="balanced"),
        method=calibration_method,
    ),
    "Calibrated LR sample weights": CalibratedClassifierCV(
        LogisticRegression(),
        method=calibration_method,
    ),
    "Calibrated LR class and sample weights": CalibratedClassifierCV(
        LogisticRegression(class_weight="balanced"),
        method=calibration_method,
    ),
}

# %%
import matplotlib.pyplot as plt
from sklearn.calibration import CalibrationDisplay
from sklearn.metrics import balanced_accuracy_score

fig, ax = plt.subplots()

calibration_display_params = {
    "n_bins": 20,
    "strategy": "quantile",
}
for name, model in models.items():
    if "sample weights" in name:
        model.fit(X_train, y_train, sample_weight=sw_train)
    else:
        model.fit(X_train, y_train)

    score = balanced_accuracy_score(y_test, model.predict(X_test))
    CalibrationDisplay.from_estimator(
        model,
        X_test,
        y_test,
        name=name + f" - {score:.3f}",
        ax=ax,
        **calibration_display_params,
    )
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5), title="Model - Balanced Accuracy")
_ = fig.suptitle(f"Using {calibration_method} calibration")

# %%
from sklearn.model_selection import cross_validate
from sklearn.model_selection import KFold

cv_results = {}
cv = KFold(n_splits=5)
for name, model in models.items():
    if "sample weights" in name:
        fit_params = {"sample_weight": sample_weight}
    else:
        fit_params = {}
    cv_results[name] = cross_validate(
        model,
        X,
        y,
        cv=cv,
        fit_params=fit_params,
        scoring="balanced_accuracy",
        return_estimator=True,
        return_indices=True,
    )

# %%
fig, ax = plt.subplots()
for name, results in cv_results.items():
    CalibrationDisplay.from_cv_results(
        results, X, y, ax=ax, name=name, **calibration_display_params
    )
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

# %%
fig, ax = plt.subplots()
for name, results in cv_results.items():
    CalibrationDisplay.from_cv_results(
        results,
        X,
        y,
        ax=ax,
        name=name,
        plot_uncertainty_style="fill_between",
        **calibration_display_params,
    )
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

# %%
fig, ax = plt.subplots()
for name, results in cv_results.items():
    CalibrationDisplay.from_cv_results(
        results,
        X,
        y,
        ax=ax,
        name=name,
        plot_uncertainty_style="lines",
        **calibration_display_params,
    )
ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))


@glemaitre glemaitre changed the title from "ENH use uncertainty estimate" to "ENH use cv_results in the different curve display to add confidence intervals" Oct 1, 2021
@glemaitre glemaitre marked this pull request as draft October 1, 2021 12:45
@ogrisel ogrisel self-requested a review October 19, 2021 09:18
calibrated classifier.

plot_uncertainty_style : {"errorbar", "fill_between", "lines"}, \
default="errorbar"
Member

@ogrisel ogrisel Oct 21, 2021


I think the default should be plot_uncertainty_style="lines" as it's the easiest to understand without being misled. For plot_uncertainty_style="errorbar" and plot_uncertainty_style="fill_between", one needs to know that they are based on the raw standard deviation (as opposed to a pseudo confidence interval based on the standard error of the mean, for instance).
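
The std-vs-SEM distinction raised here can be made concrete with a toy computation (the names fold_values, std, and sem are illustrative, not part of the proposed API):

```python
import numpy as np

rng = np.random.default_rng(0)
# pretend these are one calibration-curve point measured on 5 CV splits
fold_values = rng.normal(loc=0.8, scale=0.05, size=5)

std = fold_values.std(ddof=1)          # raw spread across folds
sem = std / np.sqrt(fold_values.size)  # standard error of the mean
```

With 5 splits the SEM band is sqrt(5) times narrower than the raw-std band, so the two styles convey very different uncertainty ranges.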

Member

@ogrisel ogrisel Oct 21, 2021


We could also accept plot_uncertainty_style=None to only plot the mean CV calibration curve without any uncertainty markers on the plot.

Member


Also plot_uncertainty_style="shade" or plot_uncertainty_style="shaded_area" might be easier to understand than plot_uncertainty_style="fill_between".
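
Whatever the final name, this style maps onto matplotlib's Axes.fill_between primitive. A minimal sketch of the shaded-band idea (illustrative data, not the proposed API):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, for scripted use
import matplotlib.pyplot as plt

prob_pred = np.linspace(0, 1, 20)
mean_curve = prob_pred  # a perfectly calibrated mean curve, for illustration
std_curve = np.full_like(prob_pred, 0.05)

fig, ax = plt.subplots()
ax.plot(prob_pred, mean_curve)
# the "fill_between" / "shaded area" style: a band of +/- one standard deviation
band = ax.fill_between(
    prob_pred, mean_curve - std_curve, mean_curve + std_curve, alpha=0.3
)
```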

Member

@ogrisel ogrisel Dec 11, 2025


I think that for the first iteration, I would rather only implement the "lines" strategy and not the others and remove this parameter from the public API.

For the record, this is the strategy followed when adding the from_cv_results method to RocCurveDisplay.

This way, we don't have to anticipate how the from_cv_results feature will interact or not with the orthogonal feature request to add fixed model uncertainty that results from the finite size sampling of the validation/calibration sets.

default="errorbar"
Style to plot the uncertainty information. Possibilities are:

- "errorbar": error bars representing one standard deviation;
Member


two standard deviations: 1 above and 1 below.

Member


I assume (I did not check ;))

Member


I checked and I think I am right:

import numpy as np
import matplotlib.pyplot as plt

# yerr=1 draws symmetric error bars: each bar spans from y - 1 to y + 1,
# i.e. two standard deviations in total (one above, one below)
plt.errorbar(np.arange(5), np.ones(5), np.ones(5))

[figure: error bars spanning from 0 to 2 around each point at y=1]

Comment on lines +173 to +174
return_indices : bool, default=False
Whether to return the train-test indices selected for each split.
Member


Coming from #21664, I agree return_indices is useful. (I wanted to do something like this recently).

@adrinjalali
Member

@glemaitre this seems cool to be continued!

@glemaitre
Member Author

Yep, this is also part of the CZI proposal on inspection. This would be my next effort after the threshold-tuning classifier.

@ogrisel ogrisel changed the title from "ENH use cv_results in the different curve display to add confidence intervals" to "ENH implement CalibrationCurveDisplay.from_cv_results" Dec 11, 2025
@ogrisel
Member

ogrisel commented Dec 11, 2025

@AnneBeyer @lucyleeow: I think @glemaitre won't have time to revive this PR for the foreseeable future, but I think it's a very interesting feature.

Feel free to take over this work in a new PR synced with the current state of the main branch.

@lucyleeow
Member

lucyleeow commented Dec 12, 2025

I'm slowly adding from_cv_results to various displays. #30508 is still waiting for review, and #32235 is sort of waiting on #30508, as they are a bit intertwined, with a fair number of merge conflicts.
I'm not sure it is worth starting this until at least #30508 is merged, due to merge conflicts, waiting for reviews, etc.

@ogrisel ogrisel added this to Labs Jan 26, 2026
@ogrisel ogrisel moved this to In progress in Labs Jan 26, 2026
@lucyleeow lucyleeow moved this from Discussion to Todo in Visualization and displays Feb 27, 2026
@adrinjalali
Member

Since #30508 is merged, can this continue? Maybe @antoinebaker, @AnneBaker, @StefanieSenger?

@adrinjalali adrinjalali moved this from In progress to Todo in Labs Mar 9, 2026
@StefanieSenger
Member

I think we meant to tag @AnneBeyer, instead of @AnneBaker. :)

@AnneBeyer
Contributor

Yes, I have this on my to-do list.

@lucyleeow
Member

I do think DetCurveDisplay would be an easier place to start and a better introduction, though. Partly because I have a draft PR #32235, and partly because CalibrationCurveDisplay is a little bit different from the other binary displays.
