Throw when PCA generates invalid eigenvectors#5349
Merged
antoniovs1029 merged 1 commit intodotnet:masterfrom Aug 17, 2020
Merged
Throw when PCA generates invalid eigenvectors#5349antoniovs1029 merged 1 commit intodotnet:masterfrom
antoniovs1029 merged 1 commit intodotnet:masterfrom
Conversation
antoniovs1029
commented
Aug 15, 2020
| { | ||
| _eigenVectors[i] = new VBuffer<float>(eigenVectors[i].Length, eigenVectors[i]); | ||
| _meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in mean); | ||
| Host.CheckParam(_eigenVectors[i].GetValues().All(FloatUtils.IsFinite), |
Contributor
Author
There was a problem hiding this comment.
The same check is done when deserializing, so it's ok to also add it in the constructor:
machinelearning/src/Microsoft.ML.PCA/PcaTrainer.cs
Lines 506 to 509 in a769365
The custom message was requested by the customer that opened the original issue on NimbusML.
Codecov Report
@@ Coverage Diff @@
## master #5349 +/- ##
==========================================
- Coverage 74.04% 73.97% -0.08%
==========================================
Files 1019 1019
Lines 189949 189975 +26
Branches 20429 20429
==========================================
- Hits 140651 140531 -120
- Misses 43786 43918 +132
- Partials 5512 5526 +14
Flags with carried forward coverage won't be shown. Click here to find out more.
|
harishsk
approved these changes
Aug 16, 2020
harishsk
approved these changes
Aug 16, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes microsoft/NimbusML#497
As discussed there, the problem is that when PCA generates eigenvectors with NaN values, a cryptic exception is thrown on NimbusML during prediction and not during training. It's thrown during prediction because to do prediction NimbusML saves the model to disk and loads it back, and during deserialization there's a check that prevents loading eigenvectors that contain NaNs.
In this PR I'm adding an exception to the constructor of PcaModelParameters so that a more readable exception is thrown during training of either NimbusML or ML.NET, so there's no need to wait until prediction for NimbusML to throw it.