-
Notifications
You must be signed in to change notification settings - Fork 13
Add multivariate <-> univariate conversion support #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add multivariate <-> univariate conversion support #27
Conversation
| adapter: Literal["pandas", "datasets", "gluonts", "nixtla", "darts", "autogluon"] = "pandas", | ||
| *, | ||
| as_univariate: bool = False, | ||
| univariate_target_column: str = "target", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like the ugliest hack here, but we need to assign the target some name when using
fev.convert_input_data(task, adapter="datasets", as_univariate=True)One option is to always rename the target to univariate_target_column, even if the underlying task is univariate. This could simplify the downstream usage for univariate models (the target will always have the same name).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think that this is ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the code to always rename the target to univariate_target_column if as_univariate=True.
| past_df, | ||
| id_column=task.id_column, | ||
| timestamp_column=task.timestamp_column, | ||
| past_df.rename(columns={target_column: cls.target_column}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change for the AutoGluonAdapter, but I hope that this won't affect many users.
abdulfatir
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @shchur! Left some minor comments.
| *, | ||
| target_column: str | list[str], | ||
| id_column: str, | ||
| timestamp_column: str, | ||
| static_columns: list[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change the API from Task to this? Wouldn't Task already provide all this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main purpose is to avoid the situation where task.target_column is a list[str] of the multivariate targets, but inside convert_input_data we convert past to a univariate dataset with target set to univariate_target_column (so columns listed in task.target_column are missing from past). In this case, we would need to additionally provide univariate_target_column, which feels hacky to me.
Also the usual "principle of least knowledge" argument for good software design. This way we don't need to worry that some unexpected changes to Task break the adapters.
src/fev/adapters.py
Outdated
| # We cannot apply generate_univariate_targets_from_multivariate to future since it does not contain target cols, | ||
| # so we just repeat each entry and insert the IDs from past | ||
| original_column_order = future.column_names | ||
| future = future.select([i for i in range(len(future)) for _ in range(len(task.target_columns_list))]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm, this is doing:
[0, 0, ..., 1, 1, ..., ..., h-1, h-1, ... where each forecast index is repeated |target_columns_list| times?
| adapter: Literal["pandas", "datasets", "gluonts", "nixtla", "darts", "autogluon"] = "pandas", | ||
| *, | ||
| as_univariate: bool = False, | ||
| univariate_target_column: str = "target", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think that this is ugly.
Issue #, if available:
This PR makes it easier to evaluate univariate models on multivariate tasks without modifying the task definition or using the
genearte_univariate_targets_fromtask attribute.Description of changes:
generate_univariate_targets_from_multivariate- converts a multivariate dataset to univariatecombine_univariate_predictions_to_multivariate- combines univariate predictions into multivariate predictions format (DatasetDict)fev.convert_input_datamethod to allow performing the multivariate -> univariate conversion before passing the data to the actual adapter.DatasetsAdapterthat returns the data in original format. This makes it possible to convert data to univariate format with justBy submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.