Skip to content

Throw when PCA generates invalid eigenvectors#5349

Merged
antoniovs1029 merged 1 commit intodotnet:masterfrom
antoniovs1029:isPCANaN
Aug 17, 2020
Merged

Throw when PCA generates invalid eigenvectors#5349
antoniovs1029 merged 1 commit intodotnet:masterfrom
antoniovs1029:isPCANaN

Conversation

@antoniovs1029
Copy link
Copy Markdown
Contributor

@antoniovs1029 antoniovs1029 commented Aug 15, 2020

Fixes microsoft/NimbusML#497

As discussed there, the problem is that when PCA generates eigenvectors with NaN values, a cryptic exception is thrown on NimbusML during prediction and not during training. It's thrown during prediction because to do prediction NimbusML saves the model to disk and loads it back, and during deserialization there's a check that prevents loading eigenvectors that contain NaNs.

In this PR I'm adding an exception to the constructor of PcaModelParameters so that a more readable exception is thrown during training of either NimbusML or ML.NET, so there's no need to wait until prediction for NimbusML to throw it.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner August 15, 2020 05:57
{
_eigenVectors[i] = new VBuffer<float>(eigenVectors[i].Length, eigenVectors[i]);
_meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in mean);
Host.CheckParam(_eigenVectors[i].GetValues().All(FloatUtils.IsFinite),
Copy link
Copy Markdown
Contributor Author

@antoniovs1029 antoniovs1029 Aug 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same check is done when deserializing, so it's ok to also add it in the constructor:

var vi = ctx.Reader.ReadFloatArray(_dimension);
Host.CheckDecode(vi.All(FloatUtils.IsFinite));
_eigenVectors[i] = new VBuffer<float>(_dimension, vi);
_meanProjected[i] = VectorUtils.DotProduct(in _eigenVectors[i], in _mean);

The custom message was requested by the customer that opened the original issue on NimbusML.

@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 15, 2020

Codecov Report

Merging #5349 into master will decrease coverage by 0.07%.
The diff coverage is 96.15%.

@@            Coverage Diff             @@
##           master    #5349      +/-   ##
==========================================
- Coverage   74.04%   73.97%   -0.08%     
==========================================
  Files        1019     1019              
  Lines      189949   189975      +26     
  Branches    20429    20429              
==========================================
- Hits       140651   140531     -120     
- Misses      43786    43918     +132     
- Partials     5512     5526      +14     
Flag Coverage Δ
#Debug 73.97% <96.15%> (-0.08%) ⬇️
#production 69.74% <100.00%> (-0.10%) ⬇️
#test 87.68% <95.65%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
test/Microsoft.ML.Tests/AnomalyDetectionTests.cs 99.29% <95.65%> (-0.71%) ⬇️
src/Microsoft.ML.PCA/PcaTrainer.cs 81.87% <100.00%> (+0.16%) ⬆️
...c/Microsoft.ML.FastTree/Utils/ThreadTaskManager.cs 79.48% <0.00%> (-20.52%) ⬇️
...osoft.ML.KMeansClustering/KMeansPlusPlusTrainer.cs 83.60% <0.00%> (-7.27%) ⬇️
src/Microsoft.ML.Data/Training/TrainerUtils.cs 66.86% <0.00%> (-3.82%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 83.19% <0.00%> (-3.37%) ⬇️
...crosoft.ML.StandardTrainers/Standard/SdcaBinary.cs 85.23% <0.00%> (-3.33%) ⬇️
...L.AutoML/TrainerExtensions/TrainerExtensionUtil.cs 84.71% <0.00%> (-1.66%) ⬇️
src/Microsoft.ML.Sweeper/AsyncSweeper.cs 71.42% <0.00%> (-1.37%) ⬇️
...crosoft.ML.StandardTrainers/Optimizer/Optimizer.cs 71.96% <0.00%> (-1.16%) ⬇️
... and 10 more

@antoniovs1029 antoniovs1029 merged commit 1b1872f into dotnet:master Aug 17, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

'PcaAnomalyDetector' meets error "System.FormatException: 'One of the identified items was in an invalid format.'" in predicting

2 participants