[timeseries] Add native support for missing values #3995

shchur · 2024-03-21T14:48:57Z

Issue #, if available: fixes #3886

Description of changes:

- Instead of imputing all missing values in the target column in `TimeSeriesPredictor._check_and_prepare_data_frame`, we let each model use its own logic for handling missing values:
  - GluonTS models (DeepAR, TFT, PatchTST, DLinear) and some local models (Average, SeasonalAverage, NPTS, Naive, SeasonalNaive) handle missing values natively.
  - Other local models (AutoETS, AutoCES, AutoARIMA, Theta, intermittent models) impute missing values first.
  - MLForecast models use a mix of the two strategies: missing values are imputed, but rows that originally contained NaN values are not used for training.
- Model properties (e.g., whether a model can handle missing values) are stored using the `_get_tags()` mechanism.
- `TimeSeriesPredictor` now removes time series consisting only of NaN values from `train_data` during `fit()`.
- Missing values in covariates are still always imputed inside `TimeSeriesFeatureGenerator`.
- Add missing values to `DUMMY_TS_DATAFRAME` used in the tests to ensure that NaN support works in all scenarios.
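A minimal sketch of the tag-based dispatch described above, assuming a hypothetical `allow_nan` tag returned by `_get_tags()` and forward fill followed by backward fill as the imputation for models that cannot consume NaNs. `SketchModel`, `NativeNaNModel` and `preprocess` are illustrative names, not AutoGluon's actual classes or API; the real tag names and imputation method may differ.

```python
import numpy as np
import pandas as pd


class SketchModel:
    """Hypothetical model base class (not AutoGluon's AbstractTimeSeriesModel)."""

    def _get_tags(self) -> dict:
        # Assumed tag: by default a model cannot consume NaNs in the target.
        return {"allow_nan": False}

    def preprocess(self, data: pd.DataFrame, target: str = "target") -> pd.DataFrame:
        if self._get_tags().get("allow_nan", False):
            # Models with native NaN support receive the raw data unchanged.
            return data
        data = data.copy()
        # Impute within each series: forward fill, then backward fill leading NaNs.
        data[target] = data.groupby(level="item_id")[target].transform(lambda s: s.ffill().bfill())
        return data


class NativeNaNModel(SketchModel):
    """Hypothetical model that handles missing values natively."""

    def _get_tags(self) -> dict:
        return {**super()._get_tags(), "allow_nan": True}


# Two toy series in long format, indexed by (item_id, timestamp).
index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2024-01-01", periods=3, freq="D")],
    names=["item_id", "timestamp"],
)
df = pd.DataFrame({"target": [np.nan, 1.0, np.nan, 2.0, np.nan, 3.0]}, index=index)

imputed = SketchModel().preprocess(df)
raw = NativeNaNModel().preprocess(df)
print(imputed["target"].tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]
```

Note that a series consisting only of NaNs stays all-NaN after forward/backward filling, so such series need separate treatment before fitting.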
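The mixed strategy for MLForecast models can be sketched as follows: the target is imputed so that lag features are defined, but rows whose original target was NaN are excluded from training. `make_training_rows` is a hypothetical helper written for illustration, and forward/backward filling is an assumed imputation method; this is not the MLForecast or AutoGluon implementation.

```python
import numpy as np
import pandas as pd


def make_training_rows(data: pd.DataFrame, lags=(1, 2), target: str = "target") -> pd.DataFrame:
    """Hypothetical helper: lag features from imputed values, rows kept only where the target was observed."""
    observed = data[target].notna()
    # Impute within each series so that lag features are defined despite gaps.
    imputed = data.groupby(level="item_id")[target].transform(lambda s: s.ffill().bfill())
    features = pd.DataFrame({"y": imputed}, index=data.index)
    for lag in lags:
        features[f"lag_{lag}"] = imputed.groupby(level="item_id").shift(lag)
    # Keep rows with an originally observed target and a complete lag history.
    keep = observed & features.notna().all(axis=1)
    return features[keep]


index = pd.MultiIndex.from_product(
    [["A"], pd.date_range("2024-01-01", periods=5, freq="D")],
    names=["item_id", "timestamp"],
)
data = pd.DataFrame({"target": [1.0, np.nan, 3.0, 4.0, 5.0]}, index=index)
rows = make_training_rows(data)
# The NaN row is dropped, but its imputed value (1.0) still feeds lag_1 of the next row.
print(rows)
```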
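Removing time series that consist only of NaN values from `train_data` could look like this sketch, assuming long-format data indexed by `(item_id, timestamp)` with a `target` column. `drop_all_nan_series` is a hypothetical name; the predictor's actual implementation (and whether it logs the dropped items) may differ.

```python
import numpy as np
import pandas as pd


def drop_all_nan_series(data: pd.DataFrame, target: str = "target") -> pd.DataFrame:
    """Hypothetical helper: remove every item whose target has no observed values."""
    return data.groupby(level="item_id").filter(lambda group: group[target].notna().any())


index = pd.MultiIndex.from_product(
    [["A", "B", "C"], pd.date_range("2024-01-01", periods=2, freq="D")],
    names=["item_id", "timestamp"],
)
train_data = pd.DataFrame({"target": [1.0, np.nan, np.nan, np.nan, np.nan, 2.0]}, index=index)
cleaned = drop_all_nan_series(train_data)
print(cleaned.index.get_level_values("item_id").unique().tolist())  # ['A', 'C']
```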
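One way a seasonal naive forecaster can handle missing values natively is to use, for each seasonal phase, the last observed value at that phase, falling back to the last observed value overall when a phase has no observations. The function below is a hypothetical NumPy sketch of this idea, not the `SeasonalNaive` model's actual code; the fallback rule in particular is an assumption.

```python
import numpy as np


def seasonal_naive_forecast(y, season_length: int, horizon: int) -> np.ndarray:
    """Hypothetical NaN-aware seasonal naive forecast for a single series."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    observed = y[~np.isnan(y)]
    # Assumed fallback: last observed value overall (NaN if the series is all-NaN).
    fallback = observed[-1] if observed.size else np.nan
    forecast = np.empty(horizon)
    for h in range(horizon):
        # History positions sharing the seasonal phase of future step n + h.
        same_phase = y[np.arange((n + h) % season_length, n, season_length)]
        same_phase = same_phase[~np.isnan(same_phase)]
        forecast[h] = same_phase[-1] if same_phase.size else fallback
    return forecast


history = [1.0, 2.0, 3.0, 4.0, np.nan, 6.0]
print(seasonal_naive_forecast(history, season_length=3, horizon=4))  # [4. 2. 6. 4.]
```

The second forecast step uses 2.0 from two seasons back because the most recent value at that phase is missing.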

To do:

- Add tests
- Add missing values support to MLForecast models
- Benchmark the new NaN handling strategy on datasets with missing values

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

yinweisu · 2024-03-21T15:00:23Z

Previous CI Run	Current CI Run

yinweisu · 2024-03-24T14:58:49Z

Previous CI Run	Current CI Run

review-notebook-app · 2024-03-25T15:56:05Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

yinweisu · 2024-03-25T16:05:25Z

Previous CI Run	Current CI Run
huggingface-hub==0.21.4	huggingface-hub==0.22.0
lightning-utilities==0.11.0	lightning-utilities==0.11.1
filelock==3.13.1	filelock==3.13.2

canerturkmen

LGTM overall, thanks a lot for this! Dropped a few minor comments and questions.

timeseries/src/autogluon/timeseries/models/abstract/abstract_timeseries_model.py

timeseries/src/autogluon/timeseries/models/local/npts.py

timeseries/src/autogluon/timeseries/utils/features.py

timeseries/src/autogluon/timeseries/models/local/abstract_local_model.py

timeseries/src/autogluon/timeseries/predictor.py

timeseries/src/autogluon/timeseries/models/autogluon_tabular/mlforecast.py

timeseries/tests/unittests/models/test_models.py

yinweisu · 2024-03-26T13:26:45Z

Previous CI Run	Current CI Run
flatbuffers==24.3.7	flatbuffers==24.3.25

github-actions · 2024-03-26T16:01:42Z

Job PR-3995-7ff79e4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3995/7ff79e4/index.html

yinweisu · 2024-03-26T23:49:41Z

Previous CI Run	Current CI Run

canerturkmen

LGTM! Only one comment.

canerturkmen · 2024-03-27T08:50:05Z

timeseries/tests/unittests/test_features.py

+    if known_covariates_names == []:
+        assert known_covariates_transformed is None
+    else:
+        assert not known_covariates_transformed[known_covariates_names].isna().any(axis=None)


This looks great and I'll probably use it for feature importance.

Should we also test the filling logic, at least for the 'median' and 'mode' scenarios?

shchur: Thanks, I added a test for that in test_learner to ensure that this logic works after loading from disk.
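The 'median' and 'mode' filling logic discussed here can be sketched as an imputer that learns fill values at fit time and reuses them on new data, so the same values apply after the object is saved and loaded again. `CovariateImputer` and its methods are hypothetical names, not `TimeSeriesFeatureGenerator`'s API; taking the first of the tied modes is an assumed tie-breaking rule (`pandas.Series.mode` returns tied modes in sorted order).

```python
import pickle

import numpy as np
import pandas as pd


class CovariateImputer:
    """Hypothetical imputer: median for real-valued columns, mode for categorical columns."""

    def fit(self, data: pd.DataFrame, real_columns, categorical_columns) -> "CovariateImputer":
        self.fill_values_ = {}
        for col in real_columns:
            self.fill_values_[col] = data[col].median()
        for col in categorical_columns:
            modes = data[col].mode()  # tied modes come back sorted
            if not modes.empty:
                # Deterministic tie-break: take the smallest of the tied modes.
                self.fill_values_[col] = modes.iloc[0]
        return self

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return data.fillna(self.fill_values_)


train = pd.DataFrame({"price": [1.0, np.nan, 3.0, 10.0], "promo": ["b", "a", None, None]})
imputer = CovariateImputer().fit(train, real_columns=["price"], categorical_columns=["promo"])

# Simulate saving to and loading from disk before transforming new data.
restored = pickle.loads(pickle.dumps(imputer))
new_data = pd.DataFrame({"price": [np.nan, 2.0], "promo": [None, "b"]})
out = restored.transform(new_data)
print(out)
```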

yinweisu · 2024-03-27T16:30:19Z

Previous CI Run	Current CI Run

yinweisu · 2024-03-28T16:05:42Z

Previous CI Run	Current CI Run

yinweisu · 2024-03-28T19:12:15Z

Previous CI Run	Current CI Run

shchur

Thanks @canerturkmen, I've added tests for mode/median imputation and added missing values to DUMMY_TS_DATAFRAME so that the tests cover more settings with NaNs.
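Adding missing values to a dummy test dataframe can be done reproducibly with a seeded random mask, as in this sketch. `add_missing_values`, the 20% missing fraction and the toy dataframe are assumptions for illustration; the actual `DUMMY_TS_DATAFRAME` construction in the test suite may differ.

```python
import numpy as np
import pandas as pd


def add_missing_values(data: pd.DataFrame, target: str = "target", frac: float = 0.2, seed: int = 0) -> pd.DataFrame:
    """Hypothetical test helper: set a reproducible random subset of target values to NaN."""
    rng = np.random.default_rng(seed)
    mask = rng.random(len(data)) < frac
    data = data.copy()  # leave the caller's dataframe untouched
    data.loc[mask, target] = np.nan
    return data


index = pd.MultiIndex.from_product(
    [["A", "B"], pd.date_range("2024-01-01", periods=10, freq="D")],
    names=["item_id", "timestamp"],
)
dummy = pd.DataFrame({"target": np.arange(20, dtype=float)}, index=index)
dummy_with_nans = add_missing_values(dummy)
print(int(dummy_with_nans["target"].isna().sum()))
```

Fixing the seed keeps the NaN positions identical across test runs, so failures stay reproducible.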

shchur · 2024-03-28T18:49:35Z

pyproject.toml


 [tool.ruff]
-ignore = [
+lint.ignore = [


This fixes a deprecation warning from ruff: the top-level `ignore` setting is deprecated in favor of `lint.ignore`.

shchur · 2024-03-28T18:53:28Z

timeseries/tests/unittests/test_features.py


shchur · 2024-03-28T19:45:43Z

@canerturkmen I just finished benchmarking on 12 datasets with missing values (3 folds each): this PR branch has a 65% win rate vs. the current master branch, so I think we are good to merge once the problems with the tests are resolved.
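For reference, a win rate like this one is commonly computed over all paired (dataset, fold) results, i.e. 12 x 3 = 36 comparisons here. The sketch below assumes higher scores are better and that ties count as half a win; both conventions are assumptions, since the comment does not say how the 65% was computed. `win_rate` is a hypothetical helper and the scores in the usage example are made up.

```python
def win_rate(scores_a, scores_b, higher_is_better: bool = True) -> float:
    """Fraction of paired (dataset, fold) results where A beats B; ties count as half a win (assumed)."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("expected two non-empty score lists of equal length")
    wins = 0.0
    for a, b in zip(scores_a, scores_b):
        if a == b:
            wins += 0.5
        elif (a > b) == higher_is_better:
            wins += 1.0
    return wins / len(scores_a)


# Made-up scores for four (dataset, fold) pairs, for illustration only.
branch = [0.9, 0.5, 0.7, 0.4]
master = [0.8, 0.6, 0.7, 0.3]
print(win_rate(branch, master))  # 0.625
```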

yinweisu · 2024-03-28T21:38:34Z

Previous CI Run	Current CI Run

shchur · 2024-03-29T07:04:25Z

timeseries/tests/unittests/models/test_local.py

    assert dict_equal_primitive(model._local_model_args, loaded_model._local_model_args)


-@pytest.mark.parametrize("model_class", TESTABLE_MODELS)


We already have the same test here https://github.com/autogluon/autogluon/blob/master/timeseries/tests/unittests/models/test_models.py#L269.

yinweisu · 2024-03-29T07:11:14Z

Previous CI Run	Current CI Run

yinweisu · 2024-03-29T08:40:52Z

Previous CI Run	Current CI Run

github-actions · 2024-03-29T11:09:26Z

Job PR-3995-e50709a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3995/e50709a/index.html

canerturkmen

LGTM! 🚀

shchur force-pushed the nan-values-ts-models branch from bdf53bc to 640f677 March 25, 2024 12:49

shchur changed the title ~~WIP: [timeseries] Add native support for missing values~~ [timeseries] Add native support for missing values Mar 25, 2024

canerturkmen added the "module: timeseries" label (related to the timeseries module) Mar 25, 2024

canerturkmen reviewed Mar 26, 2024

canerturkmen approved these changes Mar 27, 2024

shchur force-pushed the nan-values-ts-models branch from 7ff79e4 to bdf8f67 March 27, 2024 16:09

shchur force-pushed the nan-values-ts-models branch from bdf8f67 to b055879 March 28, 2024 15:31

shchur commented Mar 28, 2024

shchur added 10 commits March 29, 2024 06:28

Add native support for missing values

b59a1b1

Add missing values support to MLF models

551da11

Fix imputation

9c596fb

Add comment

38de1d0

Update tests

2a0edb1

Update predictor tests

0ab7ba3

Add tests

db7484a

Test if model input is preprocessed

ff1c15b

Update logic for NaN handling

f954163

Update install instructions

5ada722

shchur added 11 commits March 29, 2024 06:28

Fix isort

76b37a9

Fix pydantic version range & fix covariate imputation logic

62e5c7b

Assert that target is not imputed

eb2310d

Remove pydantic dependency

d8b6e2d

Use missing data in tests

e3270ae

Fix accidental changes + edge cases for snaive

0cfb8e3

Add test for NaN imputation

f9fc0fe

Fix tests

a05207c

Fix unintended changes

df18cc8

Fix tests that assume that data contains no NaNs

f1bc59f

Fix tests

51aeea5

shchur force-pushed the nan-values-ts-models branch from 9122bbc to 51aeea5 March 29, 2024 07:02

shchur commented Mar 29, 2024

shchur requested a review from canerturkmen March 29, 2024 07:05

Fix ties in mode computation

e50709a

canerturkmen approved these changes Mar 29, 2024

shchur merged commit d0d1fa9 into autogluon:master Mar 29, 2024

shchur deleted the nan-values-ts-models branch April 3, 2024 09:51

LennartPurucker pushed a commit to LennartPurucker/autogluon that referenced this pull request Jun 1, 2024

[timeseries] Add native support for missing values (autogluon#3995)

83599bf
