
Conversation

@thibsej (Contributor) commented Feb 26, 2019

Reference Issues/PRs

This PR works on #11000 by preserving the input dtype in LDA.

cross reference: #8769 (comment)

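For context, the user-visible effect targeted here can be sketched as follows (a minimal illustration assuming the post-fix behavior; this snippet is not code from the PR):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# float32 input should yield float32 fitted attributes after this change
X = np.random.RandomState(0).rand(20, 3).astype(np.float32)
y = np.array([0] * 10 + [1] * 10)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_.dtype)  # float32 with this PR; float64 before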

@thibsej thibsej marked this pull request as ready for review February 26, 2019 13:59
"'lsqr', and 'eigen').".format(self.solver))
if self.classes_.size == 2: # treat binary case as a special case
self.coef_ = np.array(self.coef_[1, :] - self.coef_[0, :], ndmin=2)
my_type = np.float32 if (X.dtype == np.float32) else np.float64
@thibsej (Contributor Author):

This line is necessary when X.dtype is int32 or int64.
Should we keep the number of bits small, i.e. perform the cast from int32 to float32, or from int32 to float64?

ping @glemaitre
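For reference, the two casting conventions under discussion could be sketched like this (illustration only; both helper names are made up, and neither is the PR's final code):

import numpy as np

def target_dtype_keep_width(X):
    # option A: keep 32-bit inputs (float32, int32) in 32 bits
    return np.float32 if X.dtype in (np.float32, np.int32) else np.float64

def target_dtype_float_only(X):
    # option B: only float32 stays 32-bit; all integer dtypes go to float64
    return np.float32 if X.dtype == np.float32 else np.float64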

@thibsej thibsej changed the title Add float32 support for Linear Discriminant Analysis [MRG] Add float32 support for Linear Discriminant Analysis Feb 26, 2019
"'lsqr', and 'eigen').".format(self.solver))
if self.classes_.size == 2: # treat binary case as a special case
self.coef_ = np.array(self.coef_[1, :] - self.coef_[0, :], ndmin=2)
my_type = np.float32 if (X.dtype in [np.float32, np.int32]) else np.float64
Member:

You should already have called check_X_y and/or as_float_array above. Then you can just use X.dtype to specify the dtype of the arrays you allocate.

Member:

Indeed, and the check_X_y call at line 430 probably needs to be changed to use dtype=[np.float32, np.float64].
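A minimal sketch of what that suggestion does (relying on check_array's documented dtype-list handling): an input dtype already in the list is preserved, and anything else is cast to the first entry.

import numpy as np
from sklearn.utils import check_X_y

y = np.array([0, 0, 1, 1])

# a dtype already in the list is preserved
X32 = np.zeros((4, 2), dtype=np.float32)
X_checked, _ = check_X_y(X32, y, dtype=[np.float32, np.float64])
print(X_checked.dtype)  # float32

# a dtype not in the list is cast to the first entry
X_int = np.zeros((4, 2), dtype=np.int32)
X_checked, _ = check_X_y(X_int, y, dtype=[np.float32, np.float64])
print(X_checked.dtype)  # float32

Note that the order matters: dtype=[np.float64, np.float32] would instead send integer input to float64, which is the convention the discussion below converges on.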

assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)

@pytest.mark.parametrize("data_type, expected_type",[
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float32), (np.int64, np.float64)])
Member:

PEP 8 violations, please check.

@GaelVaroquaux (Member)

No additional comments on top of those @agramfort and I made above.

"""
# FIXME: Future warning to be removed in 0.23
X, y = check_X_y(X, y, ensure_min_samples=2, estimator=self)
X = as_float_array(X)
@thibsej (Contributor Author):

This line is necessary to get float32 coefficients when X is of type int32; removing it gives float64 coefficients.
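A quick illustration of the behavior relied on here (following as_float_array's documented casting rules, where integer types of at most 32 bits are cast to float32):

import numpy as np
from sklearn.utils import as_float_array

X_int32 = np.array([[1, 2], [3, 4]], dtype=np.int32)
print(as_float_array(X_int32).dtype)  # float32

X_int64 = X_int32.astype(np.int64)
print(as_float_array(X_int64).dtype)  # float64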

Member:

This is strange because the next check_X_y will convert to float anyway. It seems unnecessary, doesn't it?

Member:

Oh, I see why. X will be converted to float64, and you expect it to be converted to float32 because X is of type int32. I would say there is no real reason for that; int32 can default to float64. I am fine with that.

@massich what is the mechanism in the other estimators that you modified?

Contributor:

There was no mechanism set. I have a vague idea that we talked about it long ago, but I can't recall that we agreed on anything. To me it makes sense that someone who willingly creates their data in 32 bits wants to keep it in 32 bits. But if such a user exists, they will scream at us, or would have done so already. So I have no objection to defaulting to float64.

Member:

An int32 is a "very big" int, hence it makes sense to convert it to float64. People who want to save memory with ints use int16. Indeed, an int covers a larger range of exactly representable integers than a float of the same memory cost.
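To make the precision argument concrete (a small illustration, not from the PR): float32 represents integers exactly only up to 2**24, while every int32 value fits exactly in a float64.

import numpy as np

x = np.int32(2**24 + 1)    # 16777217, well within the int32 range
print(np.float32(x) == x)  # False: float32 rounds it to 16777216
print(np.float64(x) == x)  # True: float64 is exact for every int32 value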

@glemaitre (Member)

CI are failing.

@glemaitre glemaitre self-requested a review February 26, 2019 16:36
@glemaitre (Member)

It looks good. Could you add an entry in the what's new, since we changed the behavior of LDA?
I am still unsure about the as_float_array call.

from .covariance import ledoit_wolf, empirical_covariance, shrunk_covariance
from .utils.multiclass import unique_labels
from .utils import check_array, check_X_y
from .utils import check_array, check_X_y, as_float_array
Member:

Suggested change:
- from .utils import check_array, check_X_y, as_float_array
+ from .utils import check_array, check_X_y
+ from .utils import as_float_array

"""
# FIXME: Future warning to be removed in 0.23
X, y = check_X_y(X, y, ensure_min_samples=2, estimator=self)
X = as_float_array(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strange becasue the next check_X_y will convert to float anyway. It seems unecessary, isn't it?



@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float32),
@thibsej (Contributor Author):

The convention for int32 is casting to float64.

@glemaitre (Member)

@thibsej Could you add an entry to the what's new?


@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float64),
(np.int64, np.float64)])
Contributor:

I find this one more readable:

@pytest.mark.parametrize("data_type, expected_type", [
  (np.float32, np.float32),
  (np.float64, np.float64),
  (np.int32, np.float64),
  (np.int64, np.float64)
])

# Check value consistency between types
rtol = 1e-6
assert_allclose(clf_32.coef_, clf_64.coef_.astype(np.float32),
rtol=rtol)
Contributor:

Do not use astype; assert_allclose should be able to handle it:

        assert_allclose(clf_32.coef_, clf_64.coef_, rtol=rtol)
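Putting the review comments together, the tests under discussion might look roughly like this (a sketch: the fixture data and the test names are assumptions, not code copied from the PR):

import numpy as np
import pytest
from numpy.testing import assert_allclose
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def make_data():
    X = np.random.RandomState(0).rand(20, 3)
    y = np.array([0] * 10 + [1] * 10)
    return X, y

@pytest.mark.parametrize("data_type, expected_type", [
    (np.float32, np.float32),
    (np.float64, np.float64),
    (np.int32, np.float64),
    (np.int64, np.float64)
])
def test_lda_dtype_match(data_type, expected_type):
    # fitted attributes should follow the (converted) input dtype
    X, y = make_data()
    clf = LinearDiscriminantAnalysis()
    clf.fit(X.astype(data_type), y)
    assert clf.coef_.dtype == expected_type

def test_lda_numeric_consistency_float32_float64():
    # check value consistency between types
    X, y = make_data()
    clf_32 = LinearDiscriminantAnalysis().fit(X.astype(np.float32), y)
    clf_64 = LinearDiscriminantAnalysis().fit(X.astype(np.float64), y)
    assert_allclose(clf_32.coef_, clf_64.coef_, rtol=1e-6)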

@massich (Contributor) commented Feb 26, 2019

@thibsej Could you add an entry to what's new

You should search for this section in doc/whats_new/v0.21.rst (if it's not there, create it), then add the following entry:

:mod:`sklearn.discriminant_analysis`
....................................


- |Enhancement| :class:`discriminant_analysis.LinearDiscriminantAnalysis` now
  preserves ``float32`` and ``float64`` dtypes. :issue:`8769` and
  :issue:`11000` by :user:`Thibault Sejourne <thibsej>`.

@massich (Contributor) left a comment

LGTM once the nitpicks are addressed.

@GaelVaroquaux GaelVaroquaux changed the title [MRG] Add float32 support for Linear Discriminant Analysis [MRG+1] Add float32 support for Linear Discriminant Analysis Feb 26, 2019
@GaelVaroquaux GaelVaroquaux changed the title [MRG+1] Add float32 support for Linear Discriminant Analysis [MRG+2] Add float32 support for Linear Discriminant Analysis Feb 26, 2019
@GaelVaroquaux (Member)

LGTM once the tests pass.

@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32),
(np.float64, np.float64),
(np.int32, np.float64),
Member:

Going from int32 to float64 is not what was done in the isotonic regression isofused branch this morning. We should be consistent here.

Member:

I am +1 for int32 -> float64; that is the semantics chosen for check_array, and @ogrisel was also in favor of this conversion. I would think that we should correct the isotonic regression then.

Member:

+1. Merging. This is a net improvement. We'll nitpick later :D

@glemaitre (Member)

We should probably make this a common test.
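A rough sketch of what such a common test could look like (hypothetical: the check name, the attribute list, and the wiring into the common-test machinery are all assumptions, not existing scikit-learn API):

import numpy as np

def check_fit_preserves_float_dtype(name, Estimator):
    # hypothetical common check: float fitted attributes should match X.dtype
    rng = np.random.RandomState(0)
    X = rng.rand(30, 3)
    y = np.array([0] * 15 + [1] * 15)
    for dtype in (np.float32, np.float64):
        est = Estimator().fit(X.astype(dtype), y)
        for attr in ("coef_", "intercept_", "means_"):
            if hasattr(est, attr):
                assert getattr(est, attr).dtype == dtype, (name, attr)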

@GaelVaroquaux GaelVaroquaux merged commit 415fd83 into scikit-learn:master Feb 27, 2019
@massich (Contributor) commented Feb 27, 2019 via email

@thibsej thibsej deleted the float32_support_lda branch February 27, 2019 09:20
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
…learn#13273)

* [skip ci] Empty commit to trigger PR

* Add dtype testing

* Fix: dtype testing

* Fix test_estimators[OneVsRestClassifier-check_estimators_dtypes]

* TST refactor using parametrize + Add failing test for int32

* Fix for int32

* Fix code according to review + Fix PEP8 violation

* Fix dtype for int32 and complex

* Fix pep8 violation

* Update whatsnew + test COSMIT
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019