-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
DOC rework plot_document_classification_20newsgroups.py example #22928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I triggered the CI again because the artifacts were not generated |
…arn into doc_classification
ogrisel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, here is a first pass of feedback.
I also think you should append your name in the list of authors of this file in the header comment.
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
ogrisel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice refactoring. Here is a batch of feedback!
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
jjerphan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments.
Co-authored-by: Julien Jerphanion <[email protected]>
|
Thank you for your contribution, @ArturoAmorQ. 🙂 |
ogrisel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a final review and here are a few more nitpicks but otherwise LGTM.
This example is much more interesting to read that it used to be :) Thank you very much for this contribution @ArturoAmorQ !
…it-learn#22928) Co-authored-by: Jérémie du Boisberranger <[email protected]> Co-authored-by: Olivier Grisel <[email protected]> Co-authored-by: Julien Jerphanion <[email protected]>
Co-authored-by: Jérémie du Boisberranger <[email protected]> Co-authored-by: Olivier Grisel <[email protected]> Co-authored-by: Julien Jerphanion <[email protected]>
Reference Issues/PRs
What does this implement/fix? Explain your changes.
The 20newsgroups classification example is our only document classification
example and it can/should be more didactic. Therefore, this PR reworks it by:
ConfusionMatrixDisplayto visualize resultsAny other comments?
The demo uses
RidgeClassifieras it has a relatively low training time, but any classifier with acoef_attribute may work.This example uses
TfidfVectorizeronly. A comparison of its performance with respect toHashingVectorizerwill be craft in another example.