-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
[MRG] Limiting n_components by both n_features and n_samples instead of just n_features #8486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8486 +/- ##
==========================================
+ Coverage 95.48% 95.48% +<.01%
==========================================
Files 342 342
Lines 61000 61023 +23
==========================================
+ Hits 58246 58269 +23
Misses 2754 2754
Continue to review full report at Codecov.
|
|
You have flake8 failures that you need to understand. Also use a descriptive title please. |
|
Since my pull request introduces no change in coverage, I believe this codecov fail is irrelevant. |
|
please add a non-regression test. Start from the gist in the issue: |
|
While running the tests, I realised that pca.py can't handle the user not entering the number of components if the solver is 'arpack' : with that particular solver, n_components must be strictly less than min(n_samples, n_features), so the default (exactly min(n_samples, n_features)) would fail. |
sklearn/decomposition/pca.py
Outdated
| explained is greater than the percentage specified by n_components | ||
| n_components cannot be equal to n_features for svd_solver == 'arpack'. | ||
| explained is greater than the percentage specified by n_components. | ||
| if svd_solver == 'arpack', the number of components must be strictly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need a capital letter.
| # matrix X) raise errors. | ||
| X = np.array([[0, 1, 0], [1, 0, 0]]) | ||
| for solver in solver_list: | ||
| for n_components in [-1, 3]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use assert_raises_regex to check also the message return and not only assert_raises.
|
Thank you @glemaitre . |
|
There was seemingly a problem with Travis until recently, so I am assuming that is why no checks were performed. I am going to open a new pull request, so all checks are run. |
|
you don't have to close |
|
@glemaitre I did that, but it seemed I could no longer reopen this pull-request because amending the last commit rewrote history (which makes sense). Did I misunderstand what you advised? I perhaps should have pushed an empty commit on top. Regardless, I have opened a new pull-request already |
|
You should have reopen the PR and then do the amend and push. I think that if you push somewhere else you cannot reopen. Anyway it is fine. Guillaume Lemaitre INRIA Saclay Ile-de-France / Equipe [email protected] - https://glemaitre.github.io/
|
Reference Issue
Fixes #8484
What does this implement/fix? Explain your changes.
Documentation changes:
n_component parameter documentation, & mentions elsewhere in parameters section
n_components_ attribute documentation
unrelated (to issue) extra: corrected documentation for explained_variance_ratio_ attribute