
Conversation

@takoika (Contributor) commented Jan 2, 2022

Reference Issues/PRs

This is part of #11000.

What does this implement/fix? Explain your changes.

This PR makes SparseRandomProjection and GaussianRandomProjection preserve a numpy.float32 dtype passed as input.
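As a quick usage sketch of the intended behaviour after this change (illustrative data shapes and parameter values, not taken from the PR itself):

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

X = np.random.RandomState(0).rand(50, 1000).astype(np.float32)
grp = GaussianRandomProjection(n_components=10, random_state=0)
X_new = grp.fit_transform(X)

print(grp.components_.dtype)  # float32 (previously upcast to float64)
print(X_new.dtype)            # float32
```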

Any other comments?

@takoika (Contributor, Author) commented Jan 2, 2022

In tutorial.rst, GaussianRandomProjection is used in an example whose expected output assumes the input dtype is not preserved, which is why the doctests fail in CI.

@thomasjpfan (Member) left a comment

Thank you for the PR, @takoika!

-    self.components_ = self._make_random_matrix(self.n_components_, n_features)
+    self.components_ = self._make_random_matrix(
+        self.n_components_, n_features
+    ).astype(X.dtype)
@thomasjpfan (Member) commented Jan 2, 2022

We can prevent a copy for the np.float64 case:

Suggested change:
-    ).astype(X.dtype)
+    ).astype(X.dtype, copy=False)

Although there is still a memory copy for float32, which is technically a regression for that case.
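To illustrate the copy semantics behind this suggestion, a minimal NumPy sketch (hypothetical array names, not code from the PR):

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)

# Matching dtype: with copy=False, astype returns the input array itself, no allocation.
same = a.astype(np.float64, copy=False)
print(same is a)  # True

# Non-matching dtype: a new float32 array must be allocated even with copy=False.
cast = a.astype(np.float32, copy=False)
print(cast is a)  # False
```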

A better solution would be to use Generator.standard_normal, which supports different dtype outputs. This way we do not need casting. To go down this path, we would need to support Generators, which is a long term goal: #20669
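For reference, a minimal sketch of that Generator-based approach (assuming NumPy's default_rng; shapes are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# standard_normal accepts a dtype argument, so the projection matrix can be
# generated directly in float32, with no cast and no extra copy.
components = rng.standard_normal(size=(10, 1000), dtype=np.float32)
print(components.dtype)  # float32
```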

In the short term, we need to be content with this memory copy regression for float32. Practically, the n_components by n_features matrix is not too big, so I think it should be okay. I would still wait to see what others think.

@ogrisel (Member) commented Jan 3, 2022

> In the short term, we need to be content with this memory copy regression for float32. Practically, the n_components by n_features matrix is not too big, so I think it should be okay. I would still wait to see what others think.

I agree we can probably live with it. Avoiding a copy of X, with shape (n_samples, n_features), is likely to yield a net memory efficiency improvement when X.dtype == np.float32 in most practical cases.
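As a rough worked example of that trade-off (hypothetical sizes, not from the discussion):

```python
# With n_samples=100_000, n_features=1_000, n_components=100 (all illustrative):
x_float32_mb = 100_000 * 1_000 * 4 / 1e6    # 400 MB for X kept as float32, vs 800 MB if upcast to float64
components_copy_mb = 100 * 1_000 * 4 / 1e6  # 0.4 MB extra for the float32 cast of components_
print(x_float32_mb, components_copy_mb)
```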

@ogrisel (Member) left a comment

Thanks for the PR. LGTM once @thomasjpfan's suggestion above and mine below are taken into account.

@takoika requested a review from thomasjpfan on January 4, 2022 at 16:19
@thomasjpfan (Member) left a comment

Thanks for the update!

@takoika requested a review from thomasjpfan on January 6, 2022 at 10:54
@thomasjpfan (Member) left a comment

Minor comment, otherwise LGTM

@thomasjpfan changed the title from "ENH Preserving dtype for np.float32 in SparseRandomProjection and GaussianRandomProjection" to "ENH Preserving dtype for np.float32 in RandomProjection" on Jan 7, 2022
@thomasjpfan merged commit 8b6b519 into scikit-learn:main on Jan 7, 2022