Simple k-fold cross validation #5

@VolodymyrOrlov

Description

K-fold cross validation (CV) is a preferred way to evaluate the performance of a statistical model. CV is better than a single training/test split because every sample in the dataset is eventually used for validation, which improves the estimate of out-of-sample error.

SmartCore does not have a method for CV, and this is a shame, because any good ML framework must have it.

I think we could start with a simple replica of Scikit-learn's sklearn.model_selection.KFold. Later on we can add a replica of StratifiedKFold.

If you are not familiar with CV, I would start by reading about it here and here. Next, I would look at Scikit-learn's implementation and design a function or class that does the same for SmartCore.

We do not have to reproduce class KFold exactly; one way to do it is to write an iterator that yields K pairs of (train, test) sets. It might also be helpful to see how train/test split is implemented in SmartCore.
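The iterator idea above could be sketched roughly like this. Note that the `KFold` struct and `split` method are hypothetical names, not SmartCore's actual API; the fold sizing follows scikit-learn's convention, where the first `n % k` folds get one extra sample:

```rust
// A minimal sketch, assuming a hypothetical `KFold` struct (not SmartCore's
// actual API). `split` returns an iterator that yields one
// (train_indices, test_indices) pair per fold.
struct KFold {
    n_splits: usize,
}

impl KFold {
    /// Yield `n_splits` (train, test) index pairs over `n` samples.
    fn split(&self, n: usize) -> impl Iterator<Item = (Vec<usize>, Vec<usize>)> {
        let n_splits = self.n_splits;
        let fold_size = n / n_splits;
        let remainder = n % n_splits;
        // Precompute fold boundaries; the first `remainder` folds are one larger,
        // mirroring scikit-learn's KFold.
        let mut bounds = Vec::with_capacity(n_splits + 1);
        let mut start = 0;
        for i in 0..n_splits {
            bounds.push(start);
            start += fold_size + if i < remainder { 1 } else { 0 };
        }
        bounds.push(n);
        (0..n_splits).map(move |k| {
            let (lo, hi) = (bounds[k], bounds[k + 1]);
            // The test fold is the k-th contiguous block; train is everything else.
            let test: Vec<usize> = (lo..hi).collect();
            let train: Vec<usize> = (0..n).filter(|&i| i < lo || i >= hi).collect();
            (train, test)
        })
    }
}

fn main() {
    // 10 samples, 3 folds -> test folds of sizes 4, 3, 3.
    let kf = KFold { n_splits: 3 };
    for (train, test) in kf.split(10) {
        println!("train: {:?} test: {:?}", train, test);
    }
}
```

A caller would then fit a model on the `train` indices and score it on the `test` indices of each fold, averaging the K scores. A real implementation would also want an option to shuffle indices before splitting, as scikit-learn's `KFold(shuffle=True)` does.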
