Skip to content

Conversation

@lesteve
Copy link
Member

@lesteve lesteve commented Jan 20, 2023

What does this implement/fix? Explain your changes.

Looks like the test was meant to compare feature importances with n_jobs=1 and n_jobs=2 but we were actually not calling .fit ...

Unless I am very tired and I am missing something subtle of course ...

@lesteve lesteve added No Changelog Needed Quick Review For PRs that are quick to review labels Jan 20, 2023
@lesteve
Copy link
Member Author

lesteve commented Jan 20, 2023

There is an failure on ARM which does not seem related to this PR ...

__________________________ test_logreg_l1_sparse_data __________________________
[gw3] linux -- Python 3.9.15 /miniconda/envs/testenv/bin/python3.9

    def test_logreg_l1_sparse_data():
        # Because liblinear penalizes the intercept and saga does not, we do not
        # fit the intercept to make it possible to compare the coefficients of
        # the two models at convergence.
        rng = np.random.RandomState(42)
        n_samples = 50
        X, y = make_classification(n_samples=n_samples, n_features=20, random_state=0)
        X_noise = rng.normal(scale=0.1, size=(n_samples, 3))
        X_constant = np.zeros(shape=(n_samples, 2))
        X = np.concatenate((X, X_noise, X_constant), axis=1)
        X[X < 1] = 0
        X = sparse.csr_matrix(X)
    
        lr_liblinear = LogisticRegression(
            penalty="l1",
            C=1.0,
            solver="liblinear",
            fit_intercept=False,
            multi_class="ovr",
            tol=1e-10,
        )
>       lr_liblinear.fit(X, y)

/miniconda/envs/testenv/lib/python3.9/site-packages/sklearn/linear_model/tests/test_logistic.py:1018: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/miniconda/envs/testenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:1216: in fit
    self.coef_, self.intercept_, self.n_iter_ = _fit_liblinear(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

X = <50x25 sparse matrix of type '<class 'numpy.float64'>'
	with 149 stored elements in Compressed Sparse Row format>
y = array([1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0,
       0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0,
       1, 1, 1, 1, 0, 0])
C = 1.0, fit_intercept = False, intercept_scaling = 1, class_weight = None
penalty = 'l1', dual = False, verbose = 0, max_iter = 100, tol = 1e-10
random_state = None, multi_class = 'ovr', loss = 'logistic_regression'
epsilon = 0.1
sample_weight = array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

    def _fit_liblinear(
        X,
        y,
        C,
        fit_intercept,
        intercept_scaling,
        class_weight,
        penalty,
        dual,
        verbose,
        max_iter,
        tol,
        random_state=None,
        multi_class="ovr",
        loss="logistic_regression",
        epsilon=0.1,
        sample_weight=None,
    ):
        if loss not in ["epsilon_insensitive", "squared_epsilon_insensitive"]:
            enc = LabelEncoder()
            y_ind = enc.fit_transform(y)
            classes_ = enc.classes_
            if len(classes_) < 2:
                raise ValueError(
                    "This solver needs samples of at least 2 classes"
                    " in the data, but the data contains only one"
                    " class: %r"
                    % classes_[0]
                )
    
            class_weight_ = compute_class_weight(class_weight, classes=classes_, y=y)
        else:
            class_weight_ = np.empty(0, dtype=np.float64)
            y_ind = y
        liblinear.set_verbosity_wrap(verbose)
        rnd = check_random_state(random_state)
        if verbose:
            print("[LibLinear]", end="")
    
        # LinearSVC breaks when intercept_scaling is <= 0
        bias = -1.0
        if fit_intercept:
            if intercept_scaling <= 0:
                raise ValueError(
                    "Intercept scaling is %r but needs to be greater "
                    "than 0. To disable fitting an intercept,"
                    " set fit_intercept=False." % intercept_scaling
                )
            else:
                bias = intercept_scaling
    
        libsvm.set_verbosity_wrap(verbose)
        libsvm_sparse.set_verbosity_wrap(verbose)
        liblinear.set_verbosity_wrap(verbose)
    
        # Liblinear doesn't support 64bit sparse matrix indices yet
        if sp.issparse(X):
            _check_large_sparse(X)
    
        # LibLinear wants targets as doubles, even for classification
        y_ind = np.asarray(y_ind, dtype=np.float64).ravel()
        y_ind = np.require(y_ind, requirements="W")
    
        sample_weight = _check_sample_weight(sample_weight, X, dtype=np.float64)
    
        solver_type = _get_liblinear_solver_type(multi_class, penalty, loss, dual)
        raw_coef_, n_iter_ = liblinear.train_wrap(
            X,
            y_ind,
            sp.isspmatrix(X),
            solver_type,
            tol,
            bias,
            C,
            class_weight_,
            max_iter,
            rnd.randint(np.iinfo("i").max),
            epsilon,
            sample_weight,
        )
        # Regarding rnd.randint(..) in the above signature:
        # seed for srand in range [0..INT_MAX); due to limitations in Numpy
        # on 32-bit platforms, we can't get to the UINT_MAX limit that
        # srand supports
        n_iter_max = max(n_iter_)
        if n_iter_max >= max_iter:
>           warnings.warn(
                "Liblinear failed to converge, increase the number of iterations.",
                ConvergenceWarning,
            )
E           sklearn.exceptions.ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.

/miniconda/envs/testenv/lib/python3.9/site-packages/sklearn/svm/_base.py:1244: ConvergenceWarning
=============================== warnings summary ===============================

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! LGTM

@thomasjpfan thomasjpfan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Jan 20, 2023
@thomasjpfan
Copy link
Member

As for the random ARM failure, I opened #25446 to stabilize the tests in test_logistic.

@lesteve
Copy link
Member Author

lesteve commented Jan 23, 2023

Actually from Olivier's comment in another issue, feature_importances_ is actually a property computing stuff in parallel, so this is fine ... closing this one.

The code is here.

@lesteve lesteve closed this Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ensemble No Changelog Needed Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants