Fix KFoldCV with RandomForest
#3941
Merged
@Patschkowski pointed out in #3909 that using `KFoldCV` with `RandomForest<>` did not seem to work. I thought it would be simple to figure out why, but I actually had to dive in fairly deep. As it turned out, the fix was simple: `KFoldCV` uses the class `MetaInfoExtractor`, which determines various information about a given machine learning algorithm, such as which `Train()` variants it supports. To determine which variants are available, `MetaInfoExtractor` uses the `HAS_METHOD_FORM()` macro in `sfinae_utility.hpp`, which allows a method either to match a fixed given form or to have additional extra arguments. The number of extra arguments is limited to 7... but `RandomForest::Train()` has 8 extra hyperparameters after the training data and labels! Therefore `MetaInfoExtractor` does not work with `RandomForest<>`.

The solution is to increase the maximum number of allowed extra arguments to 10.
I then added tests to `cv_test.cpp` to ensure that `RandomForest<>` works correctly both with `MetaInfoExtractor` and with `KFoldCV`.

I also noticed that the labels and weights types were not templatized for `RandomForest<>`, so I also generalized those.