Skip to content

Conversation

@shchur
Copy link
Contributor

@shchur shchur commented Aug 20, 2025

Issue #, if available:

Description of changes:

  • Remove the TaskGenerator class
  • Allow including multiple rolling windows in a single Task
  • Add a EvaluationWindow class that corresponds to a single cutoff in the original dataset

To do:

  • Update docs
  • Update implementations of all Task / EvaluationWindow methods
  • Update unit tests
  • Update adapters

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -1 +1 @@
__version__ = "0.6.0rc2"
__version__ = "1.0.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need to bump the version to v1.0.0 given how breaking the changes are

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a new branch pre-v1.0.0. We can merge all PRs before the next release into this branch to avoid making the documentation on main out-of-sync with the latest stable release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we need to bump it? Are we sure fev is now stable? If not, I'd stick to 0.7 as per semver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, is __about__.py standard / meaningfully consumed by setuptools or similar? why not just define in init.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I moved this to __init__.py. Using __about__.py has the advantage of being able to check the version without loading the full package, but it's probably not worth it.

@shchur shchur changed the title [WIP] Refactor Task class to include multiple rolling window evaluations Refactor Task class to include multiple rolling window evaluations Aug 25, 2025
@shchur shchur changed the base branch from main to pre-v1.0.0 August 25, 2025 10:31
if as_univariate:
if univariate_target_column in past.column_names and univariate_target_column != task.target_column:
# Raise error if column called `univariate_target_column` already exists and it's not the *only* target column
if univariate_target_column in past.column_names and window.target_columns_list != [univariate_target_column]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we hit this often? If yes, maybe it makes sense to change the default to something which would lead to fewer collisions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only hit this if there is a columns called target in the dataset that is not actually the only target column. This looks like a really rare case to me, so I wouldn't worry about too much.

Copy link
Contributor

@canerturkmen canerturkmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great! Dropping some comments.

data loading.
window_step_size : int | str | None
Step size between consecutive evaluation windows. Must be an integer if `initial_cutoff` is an integer.
Can be an integer or pandas offset string (e.g., 'D', '15min') if `initial_cutoff` is a timestamp.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. if this pandas candy is there, as a user I expect it to be there for horizon as well for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making horizon: int | str can be really annoying for the users since then they won't be able to set MyModel(prediction_length=task.horizon) anymore, and will require some complex pandas offset manipulations.

In contrast, I don't think that user will ever want to access task.window_step_size since this parameter just configures which windows will be produced by iter_windows().

@@ -1 +1 @@
__version__ = "0.6.0rc2"
__version__ = "1.0.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we need to bump it? Are we sure fev is now stable? If not, I'd stick to 0.7 as per semver.

future: datasets.Dataset,
*,
target_column: str | list[str],
target_columns_list: list[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if at all possible I would avoid type encodings in variables (i.e., ..._list) unless in local variables where they clearly serve to disambiguate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name here was chosen to mimic the task property Task.target_columns_list

fev/src/fev/task.py

Lines 727 to 733 in 9a835d0

@property
def target_columns_list(self) -> list[str]:
"""A list including names of all target columns for this task."""
if isinstance(self.target_column, list):
return self.target_column
else:
return [self.target_column]

I think the current naming where target_column can be of type str | list[str] is a bit counterintuitive. How about we rename as follows in a follow-up PR?

  1. Task.target_column -> target: str | list[str]
  2. Task.target_columns_list -> target_columns: list[str]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up to you. I understand the problem, but the solution doesn't help me.

raise ValueError("`window_step_size` must be an int if `initial_cutoff` is an int")
assert self.window_step_size >= 1
max_allowed_cutoff = -self.horizon - (self.num_windows - 1) * self.window_step_size
if self.initial_cutoff < 0 and self.initial_cutoff > max_allowed_cutoff:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite follow this check. So if I understand correctly I can only specify initial cutoff to allow for at least the required number of steps (roughly horizon * nr_windows) in the future. if so, why is it not checked for timestamp as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of timestamp-based cutoffs we cannot perform this check during Task.__post_init__ since we don't yet know which timestamps are available in the dataset.

If there is not enough data to generate the splits with a timestamp-based initial_cutoff, an error will be raised once the user accesses EvaluationWindow.get_input_data() for the window where not enough data is available.

For an integer-based negative initial_cutoff, we can immediately tell if the task configuration is invalid during __post_init__, so this check here is just to save user's time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

Copy link
Contributor

@canerturkmen canerturkmen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Target branch should likely be master. Otherwise, LGTM!

@shchur
Copy link
Contributor Author

shchur commented Aug 26, 2025

Target branch should likely be master. Otherwise, LGTM!

@canerturkmen I will merge pre-v1.0.0 into master after we are done with the refactor to avoid breaking the documentation for the existing users of the latest stable release.

@shchur shchur merged commit e308267 into pre-v1.0.0 Aug 26, 2025
@shchur shchur deleted the tasks-and-splits branch August 28, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants