-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[timeseries] add feature importance to TimeSeriesPredictor #4033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[timeseries] add feature importance to TimeSeriesPredictor #4033
Conversation
|
shchur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very elegant!
The main points that I want to discuss:
- Moving all feature importance logic to Trainer
- Using statistics of train data for masking
- Use full train_data/tuning_data by default
- Fixing seed for determinism
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
| ) | ||
|
|
||
| if features is None: | ||
| features = feature_importance_transform.covariate_metadata.all_features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible that some features are present in data.columns or data.static_features.columns but are missing from covariate_metadata.all_features because TimeSeriesFeatureGenerator removes non-informative features from the data (e.g., duplicate feature, feature consisting of all constant values, feature that takes different value for each row, unrecognized dtype time datetime).
We should manually check the columns in original data and assign feature importance of 0 to columns that are missing from transformed data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what is done in Tabular I believe
|
Innixma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added initial review
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
Thanks for the review @Innixma ! This was the first cut trying to pin down the design. I'll base my replies on that. |
|
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
|
|
|
1 similar comment
|
|
timeseries/src/autogluon/timeseries/models/autogluon_tabular/mlforecast.py
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
timeseries/src/autogluon/timeseries/trainer/abstract_trainer.py
Outdated
Show resolved
Hide resolved
|
c517448 to
68b8095
Compare
|
shchur
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for taking care of this massive feature! 🚀
timeseries/src/autogluon/timeseries/models/abstract/abstract_timeseries_model.py
Show resolved
Hide resolved
|
Issue #, if available:
#3924
Description of changes:
This draft PR introduces the high level design for
TimeSeriesPredictor.feature_importance.Additional to-dos:
(stretch) Revisit prediction caching logic to prevent models that don't use features from re-inferringBy submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.