Overflow in matthews_corrcoef on 32-bit numpy #2806

@aldanor

Example:

import sklearn.metrics
import numpy

def matthews_corrcoef(y_true, y_predicted):
    conf_matrix = sklearn.metrics.confusion_matrix(y_true, y_predicted)
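    # confusion_matrix rows are true labels, columns are predicted labels
    true_pos = conf_matrix[1,1]
    false_pos = conf_matrix[0,1]  # true 0, predicted 1
    false_neg = conf_matrix[1,0]  # true 1, predicted 0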
    n_points = conf_matrix.sum()*1.0
    pos_rate = (true_pos + false_neg) / n_points
    activity = (true_pos + false_pos) / n_points
    mcc_numerator = true_pos / n_points - pos_rate * activity
    mcc_denominator = activity * pos_rate * (1 - activity) * (1 - pos_rate)
    return mcc_numerator / numpy.sqrt(mcc_denominator)

def random_ys(n_points):
    x_true = numpy.random.sample(n_points)
    x_pred = x_true + 0.2 * (numpy.random.sample(n_points) - 0.5)
    y_true = (x_true > 0.5) * 1.0
    y_pred = (x_pred > 0.5) * 1.0
    return y_true, y_pred

for n_points in [10, 100, 1000]:
    y_true, y_pred = random_ys(n_points)
    mcc_safe = matthews_corrcoef(y_true, y_pred)
    mcc_unsafe = sklearn.metrics.matthews_corrcoef(y_true, y_pred)
    try:
        assert(abs(mcc_safe - mcc_unsafe) < 1e-8)
    except AssertionError:
        print('Error: mcc_safe=%s, mcc_unsafe=%s, n_points=%s' % (
            mcc_safe, mcc_unsafe, n_points))

This runs fine on a 64-bit Unix box, but fails on a 32-bit Windows machine due to overflows (and afaik getting 64-bit numpy to work on Windows is next to impossible).
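
For illustration, here is a minimal sketch of how an overflow like this can arise. It assumes the failing step is a product of confusion-matrix marginals held in 32-bit integers; I have not confirmed that this is what sklearn.metrics.matthews_corrcoef does internally. The counts and the helper `mcc_from_counts` are made up for the example.

```python
import numpy as np

# Hypothetical confusion-matrix counts for about 1000 samples.
tp, fp, fn, tn = 480, 20, 25, 475

# The four marginals that appear under the square root in the MCC formula,
# stored as int32 (an assumption about the integer width in use).
marginals = np.array([tp + fp, tp + fn, tn + fp, tn + fn], dtype=np.int32)

# Accumulating the product in int32 wraps around silently,
# because the true value does not fit in 32 bits.
wrapped = np.prod(marginals, dtype=np.int32)

# Casting to float64 before multiplying gives the true product.
exact = np.prod(marginals.astype(np.float64))

print("int32 product:  ", wrapped)
print("float64 product:", exact)


def mcc_from_counts(tp, fp, fn, tn):
    """MCC from raw counts, doing all arithmetic in float64 (illustrative helper)."""
    tp, fp, fn, tn = (np.float64(v) for v in (tp, fp, fn, tn))
    denominator = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator


print("MCC:", mcc_from_counts(tp, fp, fn, tn))
```

With marginals around 500, the product is on the order of 500^4, which exceeds the int32 range. That would be consistent with the mismatch showing up only for the larger n_points values in the loop above.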
