@wallygauze wallygauze commented Mar 1, 2017

Reference Issue

Fixes #8484

What does this implement/fix? Explain your changes.

  • Added handling of n_components == None in the _fit method
  • Extended the relevant ValueError in the _fit_full method to include n_samples as an upper limit
  • In _fit_truncated, extended both the ValueError raised when (not 1 <= n_components <= n_features) and the ValueError raised when (svd_solver == 'arpack' and n_components == n_features), so that they also account for n_samples
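
The bounds described above can be sketched as a standalone function. This is a minimal illustration, not the code in pca.py: the name `check_n_components` and the error messages are hypothetical, and only integer values of n_components are considered.

```python
def check_n_components(n_components, n_samples, n_features, svd_solver="full"):
    """Raise ValueError if an integer n_components is out of range.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # The upper limit is the smaller of the two dimensions, not n_features alone.
    limit = min(n_samples, n_features)
    if not 1 <= n_components <= limit:
        raise ValueError(
            "n_components=%r must be between 1 and "
            "min(n_samples, n_features)=%r" % (n_components, limit)
        )
    # With 'arpack', n_components must be strictly below the limit.
    if svd_solver == "arpack" and n_components == limit:
        raise ValueError(
            "n_components=%r must be strictly less than "
            "min(n_samples, n_features)=%r with svd_solver='arpack'"
            % (n_components, limit)
        )
    return n_components
```

For a matrix with 6 samples and 8 features, `check_n_components(7, 6, 8)` raises a ValueError under this sketch, whereas a check against n_features alone would let it through.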

Documentation changes:

  • n_components parameter documentation, and mentions of it elsewhere in the parameters section

  • n_components_ attribute documentation

  • Extra, unrelated to the issue: corrected documentation for the explained_variance_ratio_ attribute

@wallygauze wallygauze changed the title Fixed issue 8484 [MRG] Fixed issue 8484 Mar 1, 2017
@wallygauze wallygauze changed the title [MRG] Fixed issue 8484 [MRG] Fixing issue 8484 Mar 1, 2017
codecov bot commented Mar 6, 2017

Codecov Report

Merging #8486 into master will increase coverage by <.01%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##           master    #8486      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%     
==========================================
  Files         342      342              
  Lines       61000    61023      +23     
==========================================
+ Hits        58246    58269      +23     
  Misses       2754     2754

| Impacted Files | Coverage Δ | |
|---|---|---|
| sklearn/decomposition/tests/test_pca.py | 100% <100%> (ø) | ⬆️ |
| sklearn/decomposition/pca.py | 94.55% <91.66%> (+0.05%) | ⬆️ |
| sklearn/ensemble/gradient_boosting.py | 95.79% <0%> (ø) | ⬆️ |
| sklearn/feature_selection/univariate_selection.py | 99.46% <0%> (ø) | ⬆️ |
| sklearn/utils/estimator_checks.py | 93.36% <0%> (+0.08%) | ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a36c852...df6b90c.

lesteve commented Mar 6, 2017

You have flake8 failures that you need to understand. Also use a descriptive title please.

@wallygauze wallygauze changed the title [MRG] Fixing issue 8484 [MRG] Limiting n_components by both n_features and n_samples instead of just n_features Mar 6, 2017
wallygauze commented Mar 7, 2017

Since my pull request introduces no change in coverage, I believe this codecov fail is irrelevant.

@agramfort
Member

please add a non-regression test. Start from the gist in the issue:

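import numpy as np
from sklearn.decomposition import PCA

# 6 samples, 8 features
X = np.array([[-1, -1, 3, 4, -1, -1, 3, 4],
              [-2, -1, 5, -1, -1, 3, 4, 2],
              [-3, -2, 1, -1, -1, 3, 4, 1],
              [1, 1, 4, -1, -1, 3, 4, 2],
              [2, 1, 0, -1, -1, 3, 4, 2],
              [3, 2, 10, -1, -1, 3, 4, 10]])

# n_components=7 is below n_features (8) but above n_samples (6)
pca = PCA(n_components=7, svd_solver="arpack")

pca.fit(X)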

@wallygauze
Contributor Author

While running the tests, I realised that pca.py cannot handle n_components being left unset when the solver is 'arpack': with that solver, n_components must be strictly less than min(n_samples, n_features), so the default (exactly min(n_samples, n_features)) would fail.
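
One possible way to resolve the default is sketched below as a standalone helper. This is an assumption about how the None case could be handled, not the code in this pull request; the name `resolve_n_components` is hypothetical.

```python
def resolve_n_components(n_components, n_samples, n_features, svd_solver):
    """Pick a default when n_components is None (hypothetical helper)."""
    if n_components is not None:
        # The user chose a value; leave it for the range checks to validate.
        return n_components
    limit = min(n_samples, n_features)
    # 'arpack' needs strictly fewer components than `limit`,
    # so the default steps down by one for that solver.
    return limit - 1 if svd_solver == "arpack" else limit
```

With 6 samples and 8 features, this sketch yields 5 for 'arpack' and 6 for the other solvers.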

explained is greater than the percentage specified by n_components
n_components cannot be equal to n_features for svd_solver == 'arpack'.
explained is greater than the percentage specified by n_components.
if svd_solver == 'arpack', the number of components must be strictly
Member

You need a capital letter.

# matrix X) raise errors.
X = np.array([[0, 1, 0], [1, 0, 0]])
for solver in solver_list:
    for n_components in [-1, 3]:
Member

You should use assert_raises_regex rather than only assert_raises, so that the error message is checked as well.
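
A minimal sketch of what checking the message looks like. It uses `assertRaisesRegex` from the standard library's unittest, which plays the same role here as the helper named above. `fit_stub` and its error message are hypothetical stand-ins for the real fit call, so that the example runs on its own.

```python
import unittest


def fit_stub(n_components, n_samples, n_features):
    """Stand-in for a fit call that validates n_components (hypothetical)."""
    limit = min(n_samples, n_features)
    if not 1 <= n_components <= limit:
        raise ValueError(
            "n_components=%d must be between 1 and "
            "min(n_samples, n_features)=%d" % (n_components, limit)
        )


class TestNComponentsErrors(unittest.TestCase):
    def test_out_of_range_message(self):
        # X in the reviewed test has shape (2, 3), so the limit is 2.
        for n_components in [-1, 3]:
            # Fails if no ValueError is raised or if the message does not match.
            with self.assertRaisesRegex(ValueError, r"must be between 1 and"):
                fit_stub(n_components, n_samples=2, n_features=3)
```

Matching on the message guards against the test passing because some unrelated ValueError was raised.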

@wallygauze
Contributor Author

Thank you @glemaitre.

@wallygauze
Contributor Author

There was seemingly a problem with Travis until recently, so I am assuming that is why no checks were performed. I am going to open a new pull request, so all checks are run.

@wallygauze wallygauze closed this Apr 13, 2017
@glemaitre
Member

You don't have to close the pull request.
You should amend the commit and force-push.
That will be enough.

wallygauze commented Apr 13, 2017

@glemaitre I did that, but it seemed I could no longer reopen this pull request, because amending the last commit rewrote history (which makes sense). Did I misunderstand your advice? Perhaps I should have pushed an empty commit on top instead. Regardless, I have already opened a new pull request.

glemaitre commented Apr 13, 2017 via email

Development

Successfully merging this pull request may close these issues.

n_components in PCA explicitly limited by n_features only