-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[timeseries] Add native support for missing values #3995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
1 similar comment
|
bdf53bc to
640f677
Compare
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
|
canerturkmen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall thanks a lot for this! Dropped a few minor comments and questions.
timeseries/src/autogluon/timeseries/models/abstract/abstract_timeseries_model.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/models/abstract/abstract_timeseries_model.py
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/models/abstract/abstract_timeseries_model.py
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/models/local/abstract_local_model.py
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/models/local/abstract_local_model.py
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/models/autogluon_tabular/mlforecast.py
Show resolved
Hide resolved
|
|
canerturkmen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Only one comment.
| if known_covariates_names == []: | ||
| assert known_covariates_transformed is None | ||
| else: | ||
| assert not known_covariates_transformed[known_covariates_names].isna().any(axis=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great and I'll probably use it for feature importance.
Should we also test the filling logic, at least for the 'median' and 'mode' scenarios?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added a test for that in test_learner to ensure that this logic works after loading from disk
7ff79e4 to
bdf8f67
Compare
|
bdf8f67 to
b055879
Compare
|
1 similar comment
|
shchur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @canerturkmen, I've added the tests for mode/median imputation + added missing values to DUMMY_TS_DATAFRAME so that we consider more settings with NaNs
|
|
||
| [tool.ruff] | ||
| ignore = [ | ||
| lint.ignore = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a deprecation warning from ruff
| if known_covariates_names == []: | ||
| assert known_covariates_transformed is None | ||
| else: | ||
| assert not known_covariates_transformed[known_covariates_names].isna().any(axis=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, added a test for that in test_learner to ensure that this logic works after loading from disk
|
@canerturkmen just finished some benchmarking on 12 datasets with missing values / 3 folds each: This PR branch has 65% win rate vs. current |
|
9122bbc to
51aeea5
Compare
| assert dict_equal_primitive(model._local_model_args, loaded_model._local_model_args) | ||
|
|
||
|
|
||
| @pytest.mark.parametrize("model_class", TESTABLE_MODELS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
canerturkmen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚀
Issue #, if available: fixes #3886
Description of changes:
targetcolumn inTimeSeriesPredictor._check_and_prepare_data_frame, we let each model use its own logic for handling missing values._get_tags()mechanismTimeSeriesPredictornow removes time series consisting of only NaN values fromtrain_dataduringfit()TimeSeriesFeatureGeneratorDUMMY_TS_DATAFRAMEused in the tests to ensure that NaN support works in all scenariosTo do:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.