Conversation

@LennartPurucker (Collaborator) commented Oct 21, 2023

Rel.: #2779

Description of changes:

This PR adds support for dynamic stacking. The idea of dynamic stacking is to avoid stacked overfitting (a.k.a. stacked information leakage) by fitting AutoGluon at least twice. All fits except the last are done on a subset of the training data, with a holdout set used to determine whether stacked overfitting occurs on the provided data. Afterwards, we run the last fit with or without multi-layer stacking, depending on whether stacked overfitting occurred in the earlier fits.

We regard dynamic stacking as a baseline solution for avoiding stacked overfitting. Nevertheless, it is so far the best solution we have benchmarked, as it is the most consistent (which is very important for AutoML systems). Moreover, because this approach produces a holdout set, we have an empirical source of truth for determining whether stacked overfitting occurred. While this source of truth may not necessarily generalize to the test data, it has so far been better than any heuristic alternative or adjustment to the out-of-fold predictions.

The default version of the code in this PR correctly determines whether stacked overfitting occurs for each fold of the AutoML benchmark with an accuracy of ~74% (balanced accuracy: ~69%; preliminary numbers). This translates into better predictive performance on average.
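As a rough, self-contained sketch of the detection idea described above (toy code; the function name, the example scores, and the exact comparison rule are illustrative assumptions, not AutoGluon's actual implementation):

```python
def stacked_overfitting_occurred(val_l1, val_stack, holdout_l1, holdout_stack):
    """Toy detection rule (illustrative, not AutoGluon's exact criterion).

    Compares the best layer-1 model against the best stacked model.
    Stacked overfitting: stacking looks better on the (potentially
    leaked) validation scores but fails to improve on the untouched
    holdout set. Assumes higher scores are better.
    """
    stacking_wins_val = val_stack > val_l1
    stacking_wins_holdout = holdout_stack > holdout_l1
    return stacking_wins_val and not stacking_wins_holdout


# The surrounding flow: fit AutoGluon with stacking on a subset of the
# training data, apply a check like the one above using the holdout
# split, then refit on all data with stacking enabled only if no
# stacked overfitting was detected.
use_stacking = not stacked_overfitting_occurred(
    val_l1=0.80, val_stack=0.85,         # stacking "wins" on validation...
    holdout_l1=0.78, holdout_stack=0.74  # ...but loses on holdout: leakage
)
```

Here `use_stacking` ends up `False`, so the final fit would disable multi-layer stacking for this dataset.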

Code Example

Script (outdated, see tests for newest examples)
import numpy as np
import openml
import pandas as pd
from autogluon.tabular import TabularPredictor


def get_data(tid: int, fold: int):
    # Get Task and dataset from OpenML and return split data
    oml_task = openml.tasks.get_task(
        tid,
        download_splits=True,
        download_data=True,
        download_qualities=False,
        download_features_meta_data=False,
    )

    train_ind, test_ind = oml_task.get_train_test_split_indices(fold)
    X, *_ = oml_task.get_dataset().get_data(dataset_format="dataframe")

    return (
        X.iloc[train_ind, :].reset_index(drop=True),
        X.iloc[test_ind, :].reset_index(drop=True),
        oml_task.target_name,
        oml_task.task_type != "Supervised Classification",
    )


def _print_lb(leaderboard, task_id):
    with pd.option_context("display.max_rows", None, "display.max_columns", None, "display.width", 1000):
        print(leaderboard[leaderboard["model"].str.endswith("L1")].sort_values(by="score_val", ascending=True))
        print(leaderboard[~leaderboard["model"].str.endswith("L1")].sort_values(by="score_val", ascending=False))


def _run(task_id, metric, fold=0, print_res=True, enable_test=False):
    train_data, test_data, label, regression = get_data(task_id, fold)
    n_max_cols = 100
    n_max_train_instances = 500
    n_max_test_instances = 200

    # Sub sample instances
    train_data = train_data.sample(n=min(len(train_data), n_max_train_instances), random_state=0).reset_index(drop=True)
    test_data = test_data.sample(n=min(len(test_data), n_max_test_instances), random_state=0).reset_index(drop=True)

    # Sub sample columns
    cols = list(train_data.columns)
    cols.remove(label)
    if len(cols) > n_max_cols:
        cols = list(np.random.RandomState(42).choice(cols, replace=False, size=n_max_cols))
    train_data = train_data[cols + [label]]
    test_data = test_data[cols + [label]]

    # Run AutoGluon
    print(f"### task {task_id} and fold {fold} and shape {(train_data.shape, test_data.shape)}.")
    predictor = TabularPredictor(eval_metric=metric, label=label, verbosity=2)
    predictor.fit(
        train_data=train_data,
        hyperparameters={
            "FASTAI": [{}],
            "NN_TORCH": [{}],
            "RF": [{}],
            "XT": [{}],
            "GBM": [{}],
        },
        num_bag_sets=2,
        num_bag_folds=2,
        num_gpus=0,
        num_stack_levels=1,
        fit_weighted_ensemble=True,
        dynamic_stacking=True,
        time_limit=240,
        ds_args=dict(use_holdout=True, detection_time_frac=1 / 4, holdout_frac=1 / 9),
    )
    leaderboard = predictor.leaderboard(train_data, silent=True)[["model", "score_test", "score_val"]].sort_values(by="model").reset_index(drop=True)

    if print_res:
        _print_lb(leaderboard, task_id)

    return leaderboard


if __name__ == "__main__":
    _run(359955, "roc_auc")
    _run(146217, "log_loss")
    _run(359938, "mse")

TODOs and Open Questions for Merge

  • Determine how to include this in presets and decide on default split ratio

Write tests:

  • No time limit and time limit
  • Extra fit works with dynamic stacking at first fit
  • Holdout, CV, repeated CV, and custom validation data

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@LennartPurucker LennartPurucker marked this pull request as ready for review October 26, 2023 21:07
@Innixma Innixma added this to the 1.0 Release milestone Oct 27, 2023
@Innixma Innixma added enhancement New feature or request module: tabular priority: 0 Maximum priority labels Oct 27, 2023
@Innixma Innixma self-requested a review October 27, 2023 18:48
@Innixma (Contributor) left a comment:

Added initial review

raise ValueError("Unsupported validation procedure during dynamic stacking!")

set_logger_verbosity(ag_fit_kwargs["verbosity"])
org_learner = copy.deepcopy(self._learner)
@Innixma (Contributor):
Why are we creating a copy of the learner instead of a copy of the predictor? (both are probably fine, but wanted to know the reasoning)

@LennartPurucker (Collaborator, Author):
To my understanding, the only part of the predictor that is affected by a fit, and thus needed to reset it to its original state, is the learner. I think copying the predictor would be more expensive (it would copy unneeded state) than copying the learner. But for safety, we could also copy the predictor and reset the object that way.
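The snapshot-and-restore pattern under discussion, sketched with toy classes (the real `Predictor` and learner objects in AutoGluon are far more complex; this only illustrates the idea of copying the learner rather than the whole predictor):

```python
import copy


class Learner:
    """Toy stand-in: here, as in the discussion, the learner holds the fit state."""
    def __init__(self):
        self.models = []


class Predictor:
    def __init__(self):
        self._learner = Learner()

    def fit(self, n_models):
        # Fitting mutates only the learner's state.
        self._learner.models += [f"model_{i}" for i in range(n_models)]


predictor = Predictor()

# Snapshot only the learner before the trial fit...
org_learner = copy.deepcopy(predictor._learner)
predictor.fit(n_models=3)  # trial fit mutates the learner

# ...then restore it, resetting the predictor without paying the cost
# of deep-copying the entire predictor object.
predictor._learner = org_learner
```

After the restore, `predictor._learner.models` is empty again, as if the trial fit never happened.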

@Innixma (Contributor) commented Nov 2, 2023

Benchmark results look good! Thanks for the amazing contribution @LennartPurucker!

@Innixma (Contributor) left a comment:
LGTM!

@Innixma Innixma merged commit 29dd8d2 into autogluon:master Nov 2, 2023
LennartPurucker added a commit to LennartPurucker/autogluon that referenced this pull request Jun 1, 2024