[timeseries] Add first-class support for missing values #3886

@shchur

Currently, TimeSeriesPredictor handles missing values by first imputing them via forward- and backward-filling, and then training all models as if the data contained no missing values. This strategy may lead to poor accuracy on datasets with a large fraction of missing values, since the training data will include regions of constant values produced by forward-filling.
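
To illustrate the problem, here is a minimal pandas sketch on a toy series. It is not the actual TimeSeriesPredictor preprocessing code; it only assumes that imputation amounts to a forward-fill followed by a backward-fill, as described above.

```python
import numpy as np
import pandas as pd

# Toy series with a gap of four missing values in the middle.
series = pd.Series([1.0, 2.0, np.nan, np.nan, np.nan, np.nan, 7.0, 8.0])

# Forward-fill first, then backward-fill to cover any leading NaNs.
imputed = series.ffill().bfill()

# The gap (positions 2..5) becomes a flat run equal to the last observed value.
flat_run = imputed.iloc[2:6]
print(imputed.tolist())
print(flat_run.nunique())  # number of distinct values inside the former gap
```

A model trained on `imputed` sees the flat run as if it were real observations, which is the source of the accuracy concern.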

A better alternative is to keep the missing values in the data, represented as NaN, and let the models handle them. This requires the following modifications to the code:

  • Update metric implementations to handle missing values in target ([timeseries] Ensure that all metrics handle missing values in the target #3966)
  • Update preprocessing logic in TimeSeriesPredictor._check_and_prepare_data_frame
  • Make sure that all models can handle missing values. That is, every model should train normally and produce forecasts that contain no NaN values, even if the training data contains NaNs.
    • GluonTS models (DeepAR, TFT, PatchTST, DLinear)
    • StatsForecast models (AutoETS, AutoARIMA, Theta, intermittent demand models)
    • Baseline models (Naive, SeasonalNaive, Average, SeasonalAverage, Zero)
    • MLForecast models (DirectTabular, RecursiveTabular)
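
As a rough sketch of what the requirements in the list above could look like in practice, the snippet below shows a metric that ignores missing targets and a naive baseline that tolerates NaNs in its history. The function names `masked_mae` and `naive_forecast` are hypothetical and are not part of AutoGluon; the fallback to 0.0 for a fully missing series is an assumption made for illustration.

```python
import numpy as np


def masked_mae(y_true, y_pred):
    """Mean absolute error over observed (non-NaN) target values only.

    Hypothetical helper, not AutoGluon's metric implementation.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    observed = ~np.isnan(y_true)
    if not observed.any():
        # Nothing to score against; report NaN rather than guessing.
        return float("nan")
    return float(np.mean(np.abs(y_true[observed] - y_pred[observed])))


def naive_forecast(history, prediction_length):
    """Repeat the last observed value for every future step.

    Hypothetical helper. Falls back to 0.0 (an assumption) when the
    whole history is missing, so the forecast never contains NaN.
    """
    history = np.asarray(history, dtype=float)
    observed = history[~np.isnan(history)]
    last_value = observed[-1] if observed.size > 0 else 0.0
    return np.full(prediction_length, last_value)


# Usage: the trailing NaN in the history is skipped, and the NaN in the
# target is excluded from the error computation.
forecast = naive_forecast([1.0, 2.0, np.nan], prediction_length=3)
error = masked_mae([3.0, np.nan, 2.0], forecast)
```

Masking in the metric and skipping NaNs in the model are independent choices; a real implementation may prefer a different fallback or a model-specific way of treating gaps.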
