Skip to content

FIX Adjust builder to support a different split record#16

Closed
thomasjpfan wants to merge 3 commits intoneurodata:obliqueprfrom
thomasjpfan:fix_oblique_tree
Closed

FIX Adjust builder to support a different split record#16
thomasjpfan wants to merge 3 commits intoneurodata:obliqueprfrom
thomasjpfan:fix_oblique_tree

Conversation

@thomasjpfan
Copy link
Copy Markdown

This is an example of how to get builder to work with the different types of split records.

This is functionally but I do not like the malloc + free in the builder.

All the setup.py language="c++" changes were needed to get the PR to compile on my machine

CC @adam2392

@thomasjpfan
Copy link
Copy Markdown
Author

thomasjpfan commented Mar 14, 2022

The constraint on SplitRecord being a struct really constraints the trees. Also I think I broke something when it comes to the algorithm itself.

I do not have any more bandwidth to think about this for the next few weeks. (Prioritizing other items for the next release)

@thomasjpfan
Copy link
Copy Markdown
Author

thomasjpfan commented Mar 15, 2022

BTW I think there is a way to make this work with subclassing. We'll have to define a cdef class _Record that is returned by a Splitter and is initialized by the builder (where we are doing malloc). Interally, the splitter would use it's own struct to do splitting, but when the information is passed back up to the TreeBuilder, it will be in the cdef class.

It's very similar to how we do it in HistGradientBoosting:

https://github.com/scikit-learn/scikit-learn/blob/fb4dbfd837483ac3daf06dfc28871bfcfb65c4ab/sklearn/ensemble/_hist_gradient_boosting/splitting.pyx#L32

Note how there is a split_info_struct and a SplitInfo class and how their fields are exactly the same.

One would need to experiment to see if this idea actually works.

@adam2392
Copy link
Copy Markdown
Collaborator

BTW I think there is a way to make this work with subclassing. We'll have to define a cdef class _Record that is returned by a Splitter. Interally, the splitter would use it's own struct to do splitting, but when the information is passed back up to the TreeBuilder, it will be in the cdef class.
One would need to experiment to see if this idea actually works.

Thanks for the info! I can work on it tomorrow.

Also, your idea of the malloc/free works! I fixed the bug it introduced to the algo and confirmed the performance is the same as before for both RF and OF.

If the subclassing works, then I'll lyk and begin working on unit tests. Otw, I have a working commit in the draft PR using your idea.

My colleague will also work on an in-depth example to append to the existing RF example.

@adam2392
Copy link
Copy Markdown
Collaborator

adam2392 commented Mar 15, 2022

(Notes for myself) Main challenges I foresee:

@thomasjpfan thomasjpfan closed this Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants