Description
What is the desired addition or change?
The proposal is to add a third cross-validation (CV) class that purges and embargoes samples from the training sets after the k-fold split, to avoid information leakage from the training sets into the test set for time series. For each test fold, the algorithm removes two regions from the training set: the "overlap" (training samples whose labels overlap in time with the test fold) and the "embargo" (training samples immediately following the test fold).
What is the motivation for this feature?
Time series samples are not independent and identically distributed (IID). New samples depend on previous samples, so information leaks from the training sets into the test set unless this is explicitly prevented.
If applicable, describe how this feature would be implemented.
mlpack has two implementations of CV: KFoldCV and SimpleCV. Both use functionality from CVBase. The proposal is to add a third class, PurgedKFoldCV, that implements the functionality described above.
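To make the idea concrete, here is a minimal sketch of the index selection such a class could perform. It is not mlpack code: `PurgedTrainIndices` and its parameters are hypothetical names, and it assumes samples are ordered in time, folds are contiguous, and every label spans a fixed number of subsequent samples (`overlap`), which is a simplification of the general case in the reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): returns the training indices for
// test fold `fold` out of `k` contiguous folds over `n` time-ordered samples.
// Assumption: each label depends on the next `overlap` samples, so training
// samples within `overlap` of the test fold on either side are purged, and a
// further `embargo` samples after the test fold are dropped.
std::vector<std::size_t> PurgedTrainIndices(const std::size_t n,
                                            const std::size_t k,
                                            const std::size_t fold,
                                            const std::size_t overlap,
                                            const std::size_t embargo)
{
  assert(k >= 2 && k <= n && fold < k);

  // Test fold covers the half-open range [begin, end).
  const std::size_t begin = fold * n / k;
  const std::size_t end = (fold + 1) * n / k;

  // Excluded range [lo, hi): purged samples, the test fold, and the embargo.
  const std::size_t lo = (begin > overlap) ? begin - overlap : 0;
  const std::size_t hi = std::min(n, end + overlap + embargo);

  std::vector<std::size_t> train;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (i < lo || i >= hi)
      train.push_back(i);
  }
  return train;
}
```

With `n = 20`, `k = 5`, `fold = 2`, `overlap = 1` and `embargo = 2`, the test fold is samples 8 to 11 and samples 7 to 14 are excluded from training. A real PurgedKFoldCV would presumably apply a selection like this inside the fold-splitting logic that KFoldCV already has, but how that fits with CVBase is left open here.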
Additional information?
Literature reference: M. López de Prado (2018): "Advances in Financial Machine Learning", pp. 105-109
