DOC Add HGBDT to "user_guide" reference in RF #26322
Conversation
Thanks for addressing this last point in the issue :) here are a few comments. Please let me know if you need help or more details addressing my comments.
@ArturoAmorQ Please feel free to correct me in case I missed any point.
It is not necessary to add code. For that purpose I opened #26320. What I had in mind is comparing the computational cost of the algorithms themselves, for instance, building shallow (HGBT) vs deep (RF) trees. Mention that in RF the trees are independent and can be built separately, whereas in HGBT the corrections are built successively. Mention the efficiency of binning, etc. And for the last point I meant plain text, just not the note formatting. Please let me know if it's clearer now.
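The comparison described in this comment can be made concrete with a small timing sketch. This is not code from the PR; the dataset size and parameters below are illustrative assumptions, and actual timings will vary by machine and data.

```python
# Minimal sketch of the RF-vs-HGBT fitting-cost comparison described above.
# The dataset and parameters are illustrative assumptions, not from the PR.
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)

models = {
    # RF: deep, independent trees; fits can be parallelized via n_jobs.
    "RandomForestClassifier": RandomForestClassifier(n_jobs=-1, random_state=0),
    # HGBT: shallow trees built successively on binned (discretized) features.
    "HistGradientBoostingClassifier": HistGradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    tic = perf_counter()
    model.fit(X, y)
    print(f"{name} fit in {perf_counter() - tic:.2f}s")
```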
ArturoAmorQ
left a comment
Another comment: Please try to keep lines below 80 characters to comply with the PEP 8 style convention
doc/modules/ensemble.rst (outdated):

    implementation combines classifiers by averaging their probabilistic
    prediction, instead of letting each classifier vote for a single class.

    Comparison of Histogram-based Gradient Boosting Trees over Random Forests
    in terms of computational cost:
Instead of a subtitle, here you can introduce the goal of the following bullet points like this:

    - Comparison of Histogram-based Gradient Boosting Trees over Random Forests in terms of computational cost:
    + A competitive alternative to random forests are
    + :ref:`histogram_based_gradient_boosting` (HGBT) models:
Made the changes as requested.
doc/modules/ensemble.rst (outdated):

    - Building shallow trees: HGBT can be computationally more efficient than
      RF when building shallow trees because it only needs to consider a
      limited number of bins to construct the splits, whereas RF builds deep
      trees that require more iterations to reach optimal splits.

    - Sequential boosting: In HGBT, the trees are built sequentially, with
      each tree correcting the mistakes of the previous trees. This can be
      more computationally efficient than RF, where all the trees are built
      independently and in parallel.
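To make the binning point in the first bullet concrete, here is a hedged sketch of the scikit-learn parameters involved; the values shown are the library defaults, and the dataset is illustrative.

```python
# Sketch of the "limited number of bins" point: HGBT discretizes each
# feature into at most max_bins integer-coded bins, so split finding only
# scans a bounded set of candidate thresholds per feature.
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=10_000, n_features=10, random_state=0)

hgbt = HistGradientBoostingRegressor(
    max_iter=100,       # number of boosting iterations, i.e. trees
    max_leaf_nodes=31,  # keeps each individual tree shallow
    max_bins=255,       # upper bound on candidate split points per feature
    random_state=0,
)
hgbt.fit(X, y)
```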
The phrase "This can be more computationally efficient than RF" is a bit vague, I suggest giving a deeper explanation in this regard:

    - - Sequential boosting: In HGBT, the trees are built sequentially, with each
    -   tree correcting the mistakes of the previous trees. This can be more
    -   computationally efficient than RF, where all the trees are built
    -   independently and in parallel.
    + - Sequential boosting: In HGBT, the trees are built sequentially, with each
    +   tree correcting the mistakes of the previous trees, which allows them to
    +   iteratively improve the model's performance using fewer trees. In contrast,
    +   RF uses a majority vote to predict the outcome, which can require a larger
    +   number of trees to achieve the same level of accuracy.
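One way to probe the "fewer trees" wording in this suggestion is a sketch like the following; the tree counts are illustrative and no particular outcome is claimed, since the result depends on the data.

```python
# Illustrative probe of the "fewer trees for comparable accuracy" wording;
# the tree counts are arbitrary and results depend on the dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)

# 50 boosted trees, each one correcting its predecessors ...
hgbt = HistGradientBoostingClassifier(max_iter=50, random_state=0)
# ... versus 500 independent trees combined by averaging probabilities.
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)

for name, model in [("HGBT (50 trees)", hgbt), ("RF (500 trees)", rf)]:
    scores = cross_val_score(model, X, y, cv=3)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```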
Made the changes as requested. Thanks for noticing it!
adrinjalali
left a comment
This is a good addition. I wonder if we want to refer to xgboost here or not, since that's what people know.
adrinjalali
left a comment
This is certainly an improvement.
Learning slowly.
ArturoAmorQ
left a comment
Please note that you can accept suggestions directly from GitHub. This makes the review process easier.
Co-authored-by: Arturo Amor <[email protected]>
Sure, will keep that in mind.
ArturoAmorQ
left a comment
Thanks for your effort @amay1212! It certainly LGTM :)
Reference Issues/PRs
Addresses comment number 2 from @ArturoAmorQ in issue #26220
What does this implement/fix? Explain your changes.
This updates the documentation of HGBT in the RF section.
Any other comments?