
CalibratedClassifierCV fails silently with large confidence scores #26766

@sktin

Description


With the default method='sigmoid', CalibratedClassifierCV may fail silently, producing an AUC score of 0.5, apparently when the underlying classifier outputs large confidence scores. This happens in particular with SGDClassifier. You may question the quality of SGDClassifier; that is a separate question. But if you change method to 'isotonic', the AUC is correct, which shows that despite the suspiciously large confidence scores, SGDClassifier does rank the samples correctly. Code to reproduce:

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
# Setting up a trivially easy classification problem
r = 0.67
N = 1000
y_train = np.array([1]*int(N*r)+[0]*(N-int(N*r)))
X_train = 1e5*y_train.reshape((-1,1)) + np.random.default_rng(42).normal(size=N)
model = CalibratedClassifierCV(SGDClassifier(loss='squared_hinge',random_state=42))
print('Logistic calibration: ', cross_val_score(model,X_train,y_train,scoring='roc_auc').mean())
model = CalibratedClassifierCV(SGDClassifier(loss='squared_hinge',random_state=42),method='isotonic')
print('Isotonic calibration: ', cross_val_score(model,X_train,y_train,scoring='roc_auc').mean())

Expected output:

Logistic calibration:  0.5
Isotonic calibration:  1.0
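
A plausible explanation for the precision-loss warnings shown further down: with decision values F on the order of 1e5, the logistic function inside the Platt objective, expit(-(A * F + B)), saturates to exactly 0.0 or 1.0 in double precision for any non-negligible slope A, so the BFGS line search almost immediately runs into flat or infinite objective values. A quick check of the saturation:

from scipy.special import expit

# expit saturates to exact 0.0 / 1.0 in float64 long before |x| reaches 1e5
print(expit(-40.0))  # ~4.25e-18, already negligible
print(expit(-1e5))   # 0.0 exactly (exp(-1e5) underflows)
print(expit(1e5))    # 1.0 exactly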

The logistic calibration code appears to fail silently at this line:

AB_ = fmin_bfgs(objective, AB0, fprime=grad, disp=False)

Because disp=False, the caller never learns that fmin_bfgs has failed. If you turn the output back on (disp=True), you can see that it fails without completing a single iteration, so the logistic calibration ends up predicting a constant probability. The script below monkey-patches sklearn.calibration._sigmoid_calibration with a copy that sets disp=True:

from math import log
import numpy as np
from scipy.special import expit, xlogy
from scipy.optimize import fmin_bfgs
from sklearn.utils import column_or_1d

def _sigmoid_calibration_debug(predictions, y, sample_weight=None):
    """Probability Calibration with sigmoid method (Platt 2000)
    Parameters
    ----------
    predictions : ndarray of shape (n_samples,)
        The decision function or predict proba for the samples.
    y : ndarray of shape (n_samples,)
        The targets.
    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights. If None, then samples are equally weighted.
    Returns
    -------
    a : float
        The slope.
    b : float
        The intercept.
    References
    ----------
    Platt, "Probabilistic Outputs for Support Vector Machines"
    """
    predictions = column_or_1d(predictions)
    y = column_or_1d(y)

    F = predictions  # F follows Platt's notations

    # Bayesian priors (see Platt end of section 2.2):
    # It corresponds to the number of samples, taking into account the
    # `sample_weight`.
    mask_negative_samples = y <= 0
    if sample_weight is not None:
        prior0 = (sample_weight[mask_negative_samples]).sum()
        prior1 = (sample_weight[~mask_negative_samples]).sum()
    else:
        prior0 = float(np.sum(mask_negative_samples))
        prior1 = y.shape[0] - prior0
    T = np.zeros_like(y, dtype=np.float64)
    T[y > 0] = (prior1 + 1.0) / (prior1 + 2.0)
    T[y <= 0] = 1.0 / (prior0 + 2.0)
    T1 = 1.0 - T

    def objective(AB):
        # From Platt (beginning of Section 2.2)
        P = expit(-(AB[0] * F + AB[1]))
        loss = -(xlogy(T, P) + xlogy(T1, 1.0 - P))
        if sample_weight is not None:
            return (sample_weight * loss).sum()
        else:
            return loss.sum()

    def grad(AB):
        # gradient of the objective function
        P = expit(-(AB[0] * F + AB[1]))
        TEP_minus_T1P = T - P
        if sample_weight is not None:
            TEP_minus_T1P *= sample_weight
        dA = np.dot(TEP_minus_T1P, F)
        dB = np.sum(TEP_minus_T1P)
        return np.array([dA, dB])

    AB0 = np.array([0.0, log((prior0 + 1.0) / (prior1 + 1.0))])
    AB_ = fmin_bfgs(objective, AB0, fprime=grad, disp=True) # Turn on outputs
    return AB_[0], AB_[1]

import sklearn.calibration

# Monkey-patch the library so CalibratedClassifierCV uses the debug copy
sklearn.calibration._sigmoid_calibration = _sigmoid_calibration_debug

model = CalibratedClassifierCV(SGDClassifier(loss='squared_hinge',random_state=42))
print('Logistic calibration: ', cross_val_score(model,X_train,y_train,scoring='roc_auc').mean())

Expected output:

Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 100.908259
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 100.908259
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 100.908259
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 100.908259
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 100.908259
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 101.623705
         Iterations: 0
         Function evaluations: 15
         Gradient evaluations: 3
Logistic calibration:  0.5
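
Incidentally, even without switching optimizers, the failure is detectable programmatically: fmin_bfgs reports a warnflag when called with full_output=True, so the calibration code could at least warn the user instead of silently returning a constant fit. A minimal sketch (assuming objective, grad and AB0 as defined inside _sigmoid_calibration above; this is not what scikit-learn currently does):

import warnings
from scipy.optimize import fmin_bfgs

# full_output=True returns (xopt, fopt, gopt, Bopt, func_calls, grad_calls, warnflag)
out = fmin_bfgs(objective, AB0, fprime=grad, disp=False, full_output=True)
AB_, warnflag = out[0], out[6]
if warnflag != 0:
    warnings.warn(f"Sigmoid calibration did not converge (warnflag={warnflag}); "
                  "calibrated probabilities may be constant.")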

Perhaps a more robust solution would be to replace fmin_bfgs with a derivative-free minimization algorithm such as scipy.optimize.minimize with method='Nelder-Mead'.
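
For illustration, a minimal sketch of what the call inside _sigmoid_calibration could look like with that change (assuming objective and AB0 as defined above; Nelder-Mead does not use the analytic gradient, so grad would no longer be needed):

import warnings
from scipy.optimize import minimize

# Sketch only: derivative-free Nelder-Mead in place of BFGS, plus an explicit
# convergence check so a failed fit is no longer silent.
res = minimize(objective, AB0, method="Nelder-Mead")
if not res.success:
    warnings.warn(f"Sigmoid calibration did not converge: {res.message}")
AB_ = res.x  # then, as before: return AB_[0], AB_[1]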
