
Add Purged K-Fold Cross Validation Class #3830

@Patschkowski

Description


What is the desired addition or change?

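The proposal is to add a third Cross Validation (CV) class that, after the k-fold splits, purges and embargoes data from the training sets to prevent information leakage between the training sets and the test set when working with time series. Purging removes training observations whose labels overlap in time with the labels of the test set; the embargo additionally removes a number of training observations immediately following the test set. The algorithm removes data from the training set as shown in the following picture (the "overlap" and "embargo" regions are removed).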

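[Figure: k-fold split of a time series, with the "overlap" and "embargo" regions removed from the training set around the test fold]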
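The purge and embargo rules for a single test fold can be sketched as follows. This is a minimal Python sketch, not mlpack code: `purged_train_indices` and its parameters are hypothetical names, and it assumes each observation i carries a label interval [t0[i], t1[i]] (the times at which its label starts and ends), that observations are sorted by t0, that the test fold is a non-empty contiguous block of indices, and that the embargo is given as a number of observations.

```python
def purged_train_indices(t0, t1, test_start, test_end, embargo):
    """Return training indices for the test fold [test_start, test_end).

    Hypothetical sketch: t0/t1 are label start/end times per observation,
    sorted by t0; embargo is a count of observations (assumption).
    """
    n = len(t0)
    # Time span covered by the labels of the test fold.
    test_t0 = min(t0[test_start:test_end])
    test_t1 = max(t1[test_start:test_end])
    before, after = [], []
    for i in range(n):
        if test_start <= i < test_end:
            continue  # part of the test fold itself
        # Purge: drop training labels whose interval overlaps the test span.
        if t0[i] <= test_t1 and t1[i] >= test_t0:
            continue
        if i >= test_end:
            after.append(i)
        else:
            before.append(i)
    # Embargo: drop the first `embargo` surviving observations after the fold.
    return before + after[embargo:]
```

With ten observations whose labels each look two steps ahead (t0 = 0..9, t1[i] = t0[i] + 2), testing on observations 4 and 5 with an embargo of one observation leaves [0, 1, 9] for training: 2, 3, 6 and 7 are purged because their labels overlap the test labels in time, and 8 is dropped by the embargo.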

What is the motivation for this feature?

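Time series are not IID: new samples depend on previous samples, so unless this is prevented, information leaks from the training sets into the test set. For example, a training label computed over a window that reaches into the test period shares information with the test labels.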

If applicable, describe how this feature would be implemented.

mlpack currently has two CV implementations, KFoldCV and SimpleCV, both of which build on functionality from CVBase. The proposal is to add a third class, PurgedKFoldCV, that implements the purging and embargo described above.
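The split logic such a class would need can be sketched independently of mlpack's interfaces. The Python below is a hypothetical illustration, not mlpack code and not its API: `purged_kfold_splits` and its parameters are invented names, and it assumes for simplicity that the label of observation i spans observations i through i + horizon. Folds are contiguous and not shuffled, since shuffling would destroy the time ordering that purging relies on.

```python
def purged_kfold_splits(n, k, horizon, embargo):
    """Yield (train, test) index lists for k contiguous, unshuffled folds.

    Hypothetical sketch: assumes the label of observation i spans
    observations i .. i + horizon, and embargo is a count of observations.
    """
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # Fold sizes: the first n % k folds get one extra observation.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        end = start + size
        test = list(range(start, end))
        # Purge before the fold: labels of [start - horizon, start) reach into it.
        lo = max(0, start - horizon)
        # Purge after the fold: [end, end + horizon) overlaps the test labels;
        # then embargo the next `embargo` observations.
        hi = min(n, end + horizon + embargo)
        train = [i for i in range(n) if i < lo or i >= hi]
        yield train, test
        start = end
```

Under this fixed-horizon assumption purging reduces to index arithmetic: the `horizon` observations before each test fold and the `horizon` observations after it are purged, and `embargo` further observations after that are dropped. A real implementation would instead take per-observation label end times, as in the single-fold sketch for the general case.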

Additional information?

Literature reference: M. López de Prado (2018): "Advances in Financial Machine Learning", pp. 105-109
