Description
Describe the workflow you want to enable
This issue references this TODO comment:
scikit-learn/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py
Lines 65 to 67 in be89eb7
    # TODO: Ideally this should be computed in parallel over the leaves using something
    # similar to _update_raw_predictions(), but this requires a cython version of
    # median().
Describe your proposed solution
Some APIs (like std::nth_element) can find the median, or any given quantile, of a contiguous buffer in O(n) time on average. The catch is that they partially reorder a data structure to do so: either the buffer itself, or a separate structure (such as an array of indices) when a custom comparator is used.
Could it be used there?
Moreover, why can't we compute the median in parallel here?
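To make the two questions above concrete, here is a minimal C++ sketch; it is not scikit-learn's code. The names `median_copy` and `leaf_medians`, and the per-leaf index layout, are hypothetical and only serve the illustration. The selection runs on a private copy, so the shared input buffer is never mutated.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Median of `values`. The vector is taken by value, so std::nth_element
// reorders a private copy and the caller's data is left untouched.
// (Hypothetical helper, not part of scikit-learn.)
double median_copy(std::vector<double> values) {
    if (values.empty()) throw std::invalid_argument("median of an empty range");
    const std::size_t n = values.size();
    const std::size_t mid = n / 2;
    // Average O(n): afterwards values[mid] holds the element that a full sort
    // would place there, and everything before it is <= values[mid].
    std::nth_element(values.begin(), values.begin() + mid, values.end());
    const double upper = values[mid];
    if (n % 2 == 1) return upper;
    // Even length: the lower middle is the largest element left of `mid`.
    const double lower = *std::max_element(values.begin(), values.begin() + mid);
    return 0.5 * (lower + upper);
}

// Assumed layout: leaf_samples[k] lists the indices of the samples in leaf k.
std::vector<double> leaf_medians(
        const std::vector<double>& residuals,
        const std::vector<std::vector<std::size_t>>& leaf_samples) {
    std::vector<double> out(leaf_samples.size());
    // Each iteration only reads `residuals` and writes its own out[k],
    // so the iterations are independent of one another.
    for (std::size_t k = 0; k < leaf_samples.size(); ++k) {
        std::vector<double> buf;
        buf.reserve(leaf_samples[k].size());
        for (std::size_t i : leaf_samples[k]) buf.push_back(residuals[i]);
        out[k] = median_copy(std::move(buf));
    }
    return out;
}
```

Because every leaf works on its own scratch buffer, nothing in this sketch prevents running the outer loop across threads. The cost is one temporary copy per leaf, which is the price of not mutating the shared array.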
Additional context
Follow-up to this discussion: https://github.com/scikit-learn/scikit-learn/pull/20811/files/4df17828e439ad09a7784e99cc3d8d956eb50fe0#r781036357
/cc @NicolasHug who might be interested in following this issue as he initially worked on the update and authored this comment.