Skip to content

Conversation

@refluster
Copy link

@refluster refluster commented Apr 5, 2017

sklearn.neighbors.LocalOutlierFactor has an ability to judge the outliers and return as bool, but we cannot get the level of outlier for each data, indicates inlier closed or obvious outlier. So in this pull request, decision_function() api is added to provide the outlier level as Local Outlier Factor.

Reference Issue

What does this implement/fix? Explain your changes.

add decision_function api into sklearn.neighbors.LocalOutlierFactor

Any other comments?

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Apr 5, 2017 via email

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Apr 5, 2017 via email

@refluster
Copy link
Author

Thank you so much for the comment! I'll fix it.

@refluster refluster changed the title [MRG] Add score api to LocalOutlierFactor [MRG] Add decision_function api to LocalOutlierFactor Apr 5, 2017
@refluster
Copy link
Author

Fix is done

@albertcthomas
Copy link
Contributor

There was a discussion on this in the LOF PR. It was decided to keep decision_function on new samples as a private method. You can access the decision function values on the training data by considering the opposite of the negative_outlier_factor_. cc @ngoix @amueller

@GaelVaroquaux
Copy link
Member

GaelVaroquaux commented Apr 6, 2017 via email

@ngoix
Copy link
Contributor

ngoix commented Apr 6, 2017

Both decision function and predict methods are private and accessible through _decision_function() and _predict() methods. It was decided so because these two methods somehow extend the use of LOF prediction to "new data". LOF originally performs both training and testing on the same dataset, ie only performs fit_predict(X).
The main problem of making these methods public is that then fit_predict(X) would be different than fit(X).predict(X).

@agramfort
Copy link
Member

i agree with @ngoix

fit_predict(X) different than fit(X).predict(X) is bad.

@jnothman
Copy link
Member

jnothman commented Apr 6, 2017 via email

@refluster
Copy link
Author

Thanks for the comments! I understand it and am sorry that I could not care such matters. The decision_function should not be merged so I would close this pull request.
Thank you.

@refluster refluster closed this Apr 7, 2017
@albertcthomas
Copy link
Contributor

Should we make it explicit somewhere (in the code or documentation) that predict is private because fit_predict(X) would be different than fit(X).predict(X), And that decision_function was made private because predict was made private ?

I think it could be good for people investigating the code (as I did when looking at this PR) to know why.

@refluster refluster reopened this Apr 7, 2017
@refluster
Copy link
Author

refluster commented Apr 7, 2017

Thanks @albertcthomas for the advice. Comment regarding to keeping the method private was inserted to the code.

@refluster
Copy link
Author

The reason that decision_function is not published is now added. Please let me know if the explanation is wrong.

Also, the samples in X are not considered in the neighborhood of any
point.
point. To avoid training and testing on the different dataset, this
method is kept private as well as _predict().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say: " This method is kept private as it is an extension of LOF to the train-test setting, which is not described in the original paper."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @ngoix maybe take over so we can move on.

@ngoix
Copy link
Contributor

ngoix commented Jun 6, 2017

These two lines have been included in PR #9015. This PR can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants