Conversation

@LennartPurucker (Collaborator) commented Oct 21, 2023

Rel.: #2779

Description of changes:

This PR adds support for dynamic stacking. The idea of dynamic stacking is to avoid stacked overfitting (a.k.a. stacked information leakage) by fitting AutoGluon at least twice. All fits except the last are done on a subset of the training data, with a holdout set used to determine whether stacked overfitting occurs on the provided data. Afterwards, we run the last fit with or without multi-layer stacking, depending on whether stacked overfitting occurred in the earlier fits.

We regard dynamic stacking as a baseline solution for avoiding stacked overfitting. Nevertheless, it is so far the best solution we have benchmarked, as it is the most consistent (which is very important for AutoML systems). Moreover, because this approach produces a holdout set, we have an empirical source of truth for determining whether stacked overfitting occurred. While this source of truth may not necessarily generalize to the test data, it has so far been better than any heuristic alternative or adjustment to the out-of-fold predictions.

The default version of the code in this PR correctly determines whether stacked overfitting occurs for each fold of the AutoML benchmark with an accuracy of ~74% (balanced accuracy: ~69%; preliminary numbers). This translates into better predictive performance on average.
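As a rough, self-contained sketch of the detection idea described above (toy code; the function name, the example scores, and the exact comparison rule are illustrative assumptions, not AutoGluon's actual implementation):

```python
def stacked_overfitting_occurred(val_l1, val_stack, holdout_l1, holdout_stack):
    """Toy detection rule (illustrative, not AutoGluon's exact criterion).

    Compares the best layer-1 model against the best stacked model.
    Stacked overfitting: stacking looks better on the (potentially
    leaked) validation scores but fails to improve on the untouched
    holdout set. Assumes higher scores are better.
    """
    stacking_wins_val = val_stack > val_l1
    stacking_wins_holdout = holdout_stack > holdout_l1
    return stacking_wins_val and not stacking_wins_holdout


# The surrounding flow: fit AutoGluon with stacking on a subset of the
# training data, apply a check like the one above using the holdout
# split, then refit on all data with stacking enabled only if no
# stacked overfitting was detected.
use_stacking = not stacked_overfitting_occurred(
    val_l1=0.80, val_stack=0.85,         # stacking "wins" on validation...
    holdout_l1=0.78, holdout_stack=0.74  # ...but loses on holdout: leakage
)
```

Here `use_stacking` ends up `False`, so the final fit would disable multi-layer stacking for this dataset.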

Code Example

Script (outdated, see tests for newest examples)
import numpy as np
import openml
import pandas as pd
from autogluon.tabular import TabularPredictor


def get_data(tid: int, fold: int):
    # Get Task and dataset from OpenML and return split data
    oml_task = openml.tasks.get_task(
        tid,
        download_splits=True,
        download_data=True,
        download_qualities=False,
        download_features_meta_data=False,
    )

    train_ind, test_ind = oml_task.get_train_test_split_indices(fold)
    X, *_ = oml_task.get_dataset().get_data(dataset_format="dataframe")

    return (
        X.iloc[train_ind, :].reset_index(drop=True),
        X.iloc[test_ind, :].reset_index(drop=True),
        oml_task.target_name,
        oml_task.task_type != "Supervised Classification",
    )


def _print_lb(leaderboard, task_id):
    with pd.option_context("display.max_rows", None, "display.max_columns", None, "display.width", 1000):
        print(leaderboard[leaderboard["model"].str.endswith("L1")].sort_values(by="score_val", ascending=True))
        print(leaderboard[~leaderboard["model"].str.endswith("L1")].sort_values(by="score_val", ascending=False))


def _run(task_id, metric, fold=0, print_res=True, enable_test=False):
    train_data, test_data, label, regression = get_data(task_id, fold)
    n_max_cols = 100
    n_max_train_instances = 500
    n_max_test_instances = 200

    # Sub sample instances
    train_data = train_data.sample(n=min(len(train_data), n_max_train_instances), random_state=0).reset_index(drop=True)
    test_data = test_data.sample(n=min(len(test_data), n_max_test_instances), random_state=0).reset_index(drop=True)

    # Sub sample columns
    cols = list(train_data.columns)
    cols.remove(label)
    if len(cols) > n_max_cols:
        cols = list(np.random.RandomState(42).choice(cols, replace=False, size=n_max_cols))
    train_data = train_data[cols + [label]]
    test_data = test_data[cols + [label]]

    # Run AutoGluon
    print(f"### task {task_id} and fold {fold} and shape {(train_data.shape, test_data.shape)}.")
    predictor = TabularPredictor(eval_metric=metric, label=label, verbosity=2)
    predictor.fit(
        train_data=train_data,
        hyperparameters={
            "FASTAI": [{}],
            "NN_TORCH": [{}],
            "RF": [{}],
            "XT": [{}],
            "GBM": [{}],
        },
        num_bag_sets=2,
        num_bag_folds=2,
        num_gpus=0,
        num_stack_levels=1,
        fit_weighted_ensemble=True,
        dynamic_stacking=True,
        time_limit=240,
        ds_args=dict(use_holdout=True, detection_time_frac=1 / 4, holdout_frac=1 / 9),
    )
    leaderboard = predictor.leaderboard(train_data, silent=True)[["model", "score_test", "score_val"]].sort_values(by="model").reset_index(drop=True)

    if print_res:
        _print_lb(leaderboard, task_id)

    return leaderboard


if __name__ == "__main__":
    _run(359955, "roc_auc")
    _run(146217, "log_loss")
    _run(359938, "mse")

TODOs and Open Questions for Merge

  • Determine how to include this in presets and decide on default split ratio

Write tests:

  • No time limit and time limit
  • Extra fit works with dynamic stacking at first fit
  • Holdout, CV, repeated CV, and custom validation data

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@LennartPurucker LennartPurucker marked this pull request as ready for review October 26, 2023 21:07
@Innixma Innixma added this to the 1.0 Release milestone Oct 27, 2023
@Innixma Innixma added enhancement New feature or request module: tabular priority: 0 Maximum priority labels Oct 27, 2023
@Innixma Innixma self-requested a review October 27, 2023 18:48
@Innixma (Contributor) left a comment:

Added initial review

raise ValueError("Unsupported validation procedure during dynamic stacking!")

set_logger_verbosity(ag_fit_kwargs["verbosity"])
org_learner = copy.deepcopy(self._learner)
@Innixma (Contributor):
Why are we creating a copy of the learner instead of a copy of the predictor? (both are probably fine, but wanted to know the reasoning)

@LennartPurucker (Collaborator, Author):
To my understanding, the only part of the predictor that is affected by a fit, and thus needed to reset it to its original state, is the learner. I think copying the predictor would be more expensive (it would copy unneeded state) than copying the learner. But for safety, we could also copy the predictor and reset the object that way.
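The snapshot-and-restore pattern under discussion, sketched with toy classes (the real `Predictor` and learner objects in AutoGluon are far more complex; this only illustrates the idea of copying the learner rather than the whole predictor):

```python
import copy


class Learner:
    """Toy stand-in: here, as in the discussion, the learner holds the fit state."""
    def __init__(self):
        self.models = []


class Predictor:
    def __init__(self):
        self._learner = Learner()

    def fit(self, n_models):
        # Fitting mutates only the learner's state.
        self._learner.models += [f"model_{i}" for i in range(n_models)]


predictor = Predictor()

# Snapshot only the learner before the trial fit...
org_learner = copy.deepcopy(predictor._learner)
predictor.fit(n_models=3)  # trial fit mutates the learner

# ...then restore it, resetting the predictor without paying the cost
# of deep-copying the entire predictor object.
predictor._learner = org_learner
```

After the restore, `predictor._learner.models` is empty again, as if the trial fit never happened.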

@Innixma (Contributor) commented Nov 2, 2023

Benchmark results look good! Thanks for the amazing contribution @LennartPurucker!

@Innixma (Contributor) left a comment:
LGTM!

@Innixma Innixma merged commit 29dd8d2 into autogluon:master Nov 2, 2023
LennartPurucker added a commit to LennartPurucker/autogluon that referenced this pull request Jun 1, 2024