Simple k-fold cross validation #5
Description
K-fold cross validation (CV) is a preferred way to evaluate the performance of a statistical model. CV is better than a single training/test split because every sample in the dataset is used for validation exactly once, which improves the estimate of the out-of-sample error.
SmartCore does not have a method for CV yet, and this is a shame, because any good ML framework should have one.
I think we could start with a simple replica of Scikit-learn's sklearn.model_selection.KFold. Later on we can add a replica of StratifiedKFold.
If you are not familiar with CV, I would start by reading about it here and here. Next, I would look at Scikit-learn's implementation and design a function or a struct that does the same for SmartCore.
We do not have to reproduce the KFold class exactly; one way to do it is to write an iterator that yields K pairs of (train, test) sets. Also, it might be helpful to see how the train/test split is implemented in SmartCore.
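To make the proposal concrete, here is a minimal sketch of what such a splitter could look like in Rust. The `k_fold` function and its signature are hypothetical, not part of the actual SmartCore API; it mirrors scikit-learn's `KFold` with `shuffle=False`, where the first `n % k` folds receive one extra sample.

```rust
/// Hypothetical k-fold splitter: yields `k` (train_indices, test_indices)
/// pairs over `n` samples, mimicking sklearn.model_selection.KFold
/// with shuffle=False. Names are illustrative, not SmartCore's API.
fn k_fold(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(k >= 2 && k <= n, "k must be in 2..=n");
    let fold_size = n / k;
    let remainder = n % k;
    let mut folds = Vec::with_capacity(k);
    let mut start = 0;
    for i in 0..k {
        // As in scikit-learn, the first `remainder` folds get one extra sample.
        let size = fold_size + if i < remainder { 1 } else { 0 };
        let test: Vec<usize> = (start..start + size).collect();
        // The training set is every index outside the test fold.
        let train: Vec<usize> = (0..n)
            .filter(|&j| j < start || j >= start + size)
            .collect();
        folds.push((train, test));
        start += size;
    }
    folds
}

fn main() {
    // 10 samples, 3 folds: fold sizes are 4, 3, 3.
    for (train, test) in k_fold(10, 3) {
        println!("train={:?} test={:?}", train, test);
    }
}
```

A real implementation would likely return a lazy `Iterator` instead of a `Vec`, so callers can consume one fold at a time without allocating all index sets up front.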