Skip to content

Conversation

@shchur
Copy link
Contributor

@shchur shchur commented Aug 1, 2025

Issue #, if available:

This PR makes it easier to evaluate univariate models on multivariate tasks without modifying the task definition or using the genearte_univariate_targets_from task attribute.

Description of changes:

  • Add two utils methods:
    • generate_univariate_targets_from_multivariate - converts a multivariate dataset to univariate
    • combine_univariate_predictions_to_multivariate - combines univariate predictions into multivariate predictions format (DatasetDict)
  • Extend the fev.convert_input_data method to allow performing the multivariate -> univariate conversion before passing the data to the actual adapter.
  • Add a new adapter DatasetsAdapter that returns the data in original format. This makes it possible to convert data to univariate format with just
    past, future = fev.convert_input_data(task, adapter="datasets", as_univariate=True)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@shchur shchur requested a review from abdulfatir August 1, 2025 13:06
adapter: Literal["pandas", "datasets", "gluonts", "nixtla", "darts", "autogluon"] = "pandas",
*,
as_univariate: bool = False,
univariate_target_column: str = "target",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like the ugliest hack here, but we need to assign the target some name when using

fev.convert_input_data(task, adapter="datasets", as_univariate=True)

One option is to always rename the target to univariate_target_column, even if the underlying task is univariate. This could simplify the downstream usage for univariate models (the target will always have the same name).

What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think that this is ugly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the code to always rename the target to univariate_target_column if as_univariate=True.

past_df,
id_column=task.id_column,
timestamp_column=task.timestamp_column,
past_df.rename(columns={target_column: cls.target_column}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change for the AutoGluonAdapter, but I hope that this won't affect many users.

Copy link
Collaborator

@abdulfatir abdulfatir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shchur! Left some minor comments.

Comment on lines +27 to +31
*,
target_column: str | list[str],
id_column: str,
timestamp_column: str,
static_columns: list[str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the API from Task to this? Wouldn't Task already provide all this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose is to avoid the situation where task.target_column is a list[str] of the multivariate targets, but inside convert_input_data we convert past to a univariate dataset with target set to univariate_target_column (so columns listed in task.target_column are missing from past). In this case, we would need to additionally provide univariate_target_column, which feels hacky to me.

Also the usual "principle of least knowledge" argument for good software design. This way we don't need to worry that some unexpected changes to Task break the adapters.

# We cannot apply generate_univariate_targets_from_multivariate to future since it does not contain target cols,
# so we just repeat each entry and insert the IDs from past
original_column_order = future.column_names
future = future.select([i for i in range(len(future)) for _ in range(len(task.target_columns_list))])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, this is doing:

[0, 0, ..., 1, 1, ..., ..., h-1, h-1, ... where each forecast index is repeated |target_columns_list| times?

adapter: Literal["pandas", "datasets", "gluonts", "nixtla", "darts", "autogluon"] = "pandas",
*,
as_univariate: bool = False,
univariate_target_column: str = "target",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really think that this is ugly.

@shchur shchur merged commit 61cc25e into main Aug 4, 2025
@shchur shchur deleted the convert_multivariate_to_univariate_within_adapter branch August 4, 2025 08:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants