Closed
Labels: Needs Triage, bug, module: tabular
Description
Describe the bug
AutoGluon Tabular uses the Ray package for parallelism. With a large dataset that has many columns (e.g. 10,000), a Ray worker fails with an error raised by CatBoost. Only CatBoost is affected; all other algorithms work well.
The log shows where the problem starts:
Will train 2 folds in parallel instead (Estimated 28.90% memory usage per fold, 57.80%/80.00% total)
In this case, AutoGluon's estimate was wrong, and the two models ended up taking >100% of memory instead of 57.80%, causing the out of memory exception.
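The arithmetic behind that log line can be checked with a short sketch. It parses the message quoted in this report and recomputes how many folds fit under the 80% cap. The helper names `parse_memory_estimate` and `max_parallel_folds` are hypothetical, and the floor-division rule is an assumption that happens to reproduce the logged numbers; it is not AutoGluon's actual implementation.

```python
import re

# Log line copied from this report.
LOG_LINE = (
    "Memory not enough to fit 8 folds in parallel. Will train 2 folds in parallel "
    "instead (Estimated 28.90% memory usage per fold, 57.80%/80.00% total)."
)

PATTERN = re.compile(
    r"fit (?P<requested>\d+) folds in parallel\. "
    r"Will train (?P<parallel>\d+) folds in parallel instead "
    r"\(Estimated (?P<per_fold>[\d.]+)% memory usage per fold, "
    r"(?P<total>[\d.]+)%/(?P<cap>[\d.]+)% total\)"
)


def parse_memory_estimate(line):
    """Extract the fold counts and memory percentages from the log message."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a fold-memory estimate")
    parsed = {key: float(value) for key, value in match.groupdict().items()}
    parsed["requested"] = int(parsed["requested"])
    parsed["parallel"] = int(parsed["parallel"])
    return parsed


def max_parallel_folds(per_fold_pct, cap_pct, requested):
    """Assumed rule: as many folds as fit under the cap, at least one."""
    return max(1, min(requested, int(cap_pct // per_fold_pct)))


est = parse_memory_estimate(LOG_LINE)
folds = max_parallel_folds(est["per_fold"], est["cap"], est["requested"])

# Per-fold usage above this value makes the parallel folds exceed 100% of memory.
breakeven_pct = 100.0 / est["parallel"]

print(f"folds in parallel: {folds}")
print(f"estimated total: {est['per_fold'] * folds:.2f}% of memory")
print(f"out of memory once a fold really needs more than {breakeven_pct:.2f}%")
```

Under this reading, the estimate only has to be off by a factor of about 1.73 (28.90% estimated versus more than 50% actual per fold) for two parallel folds to exhaust memory.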
To Reproduce
from autogluon.tabular import TabularPredictor

# dados_treino_n2_maior_igual_17 is the training DataFrame (not included in this report).
predictor = TabularPredictor(
    label="n2_maior_igual_17",
    eval_metric="log_loss",
    path="modelos/n2_maior_igual_17/",
).fit(
    dados_treino_n2_maior_igual_17,
    presets="best_quality",
    excluded_model_types=["KNN", "XT", "RF"],
    ds_args={"enable_ray_logging": False},
    ag_args_fit={
        "early_stop": None,
        "colsample_bylevel": 1.0,
    },
    time_limit=8 * 3600,
    refit_full=True,
    calibrate=True,
)
Screenshots / Logs
Error:
Fitting model: CatBoost_BAG_L1 ... Training model for up to 9610.46s of the 16674.71s of remaining time.
Memory not enough to fit 8 folds in parallel. Will train 2 folds in parallel instead (Estimated 28.90% memory usage per fold, 57.80%/80.00% total).
Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy (2 workers, per: cpus=8, gpus=0, memory=28.90%)
Warning: Exception caused CatBoost_BAG_L1 to fail during training... Skipping this model.
ray::_ray_fit() (pid=1932, ip=127.0.0.1)
File "python\ray\_raylet.pyx", line 1883, in ray._raylet.execute_task
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 413, in _ray_fit
fold_model.fit(X=X_fold, y=y_fold, X_val=X_val_fold, y_val=y_val_fold, time_limit=time_limit_fold, **resources, **kwargs_fold)
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\abstract\abstract_model.py", line 925, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\tabular\models\catboost\catboost_model.py", line 243, in _fit
self.model.fit(X, **fit_final_kwargs)
File "C:\Users\celes\anaconda3\Lib\site-packages\catboost\core.py", line 5245, in fit
self._fit(X, y, cat_features, text_features, embedding_features, None, graph, sample_weight, None, None, None, None, baseline, use_best_model,
File "C:\Users\celes\anaconda3\Lib\site-packages\catboost\core.py", line 2410, in _fit
self._train(
File "C:\Users\celes\anaconda3\Lib\site-packages\catboost\core.py", line 1790, in _train
self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
File "_catboost.pyx", line 5017, in _catboost._CatBoost._train
File "_catboost.pyx", line 5066, in _catboost._CatBoost._train
_catboost.CatBoostError: bad allocation
Detailed Traceback:
Traceback (most recent call last):
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\tabular\trainer\abstract_trainer.py", line 2160, in _train_and_save
model = self._train_single(**model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\tabular\trainer\abstract_trainer.py", line 2047, in _train_single
model = model.fit(X=X, y=y, X_val=X_val, y_val=y_val, X_test=X_test, y_test=y_test, total_resources=total_resources, **model_fit_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\abstract\abstract_model.py", line 925, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\stacker_ensemble_model.py", line 270, in _fit
return super()._fit(X=X, y=y, time_limit=time_limit, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\bagged_ensemble_model.py", line 390, in _fit
self._fit_folds(
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\bagged_ensemble_model.py", line 847, in _fit_folds
fold_fitting_strategy.after_all_folds_scheduled()
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 690, in after_all_folds_scheduled
self._run_parallel(X, y, X_pseudo, y_pseudo, model_base_ref, time_limit_fold, head_node_id)
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 631, in _run_parallel
self._process_fold_results(finished, unfinished, fold_ctx)
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 587, in _process_fold_results
raise processed_exception
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 550, in _process_fold_results
fold_model, pred_proba, time_start_fit, time_end_fit, predict_time, predict_1_time, predict_n_size, fit_num_cpus, fit_num_gpus = self.ray.get(finished)
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\worker.py", line 2771, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\ray_private\worker.py", line 919, in get_objects
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(CatBoostError): ray::_ray_fit() (pid=1932, ip=127.0.0.1)
File "python\ray\_raylet.pyx", line 1883, in ray._raylet.execute_task
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\ensemble\fold_fitting_strategy.py", line 413, in _ray_fit
fold_model.fit(X=X_fold, y=y_fold, X_val=X_val_fold, y_val=y_val_fold, time_limit=time_limit_fold, **resources, **kwargs_fold)
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\core\models\abstract\abstract_model.py", line 925, in fit
out = self._fit(**kwargs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\celes\anaconda3\Lib\site-packages\autogluon\tabular\models\catboost\catboost_model.py", line 243, in _fit
self.model.fit(X, **fit_final_kwargs)
File "C:\Users\celes\anaconda3\Lib\site-packages\catboost\core.py", line 5245, in fit
self._fit(X, y, cat_features, text_features, embedding_features, None, graph, sample_weight, None, None, None, None, baseline, use_best_model,
File "C:\Users\celes\anaconda3\Lib\site-packages\catboost\core.py", line 2410, in _fit
self._train(
File "C:\Users\celes\anaconda3\Lib\site-packages\catboost\core.py", line 1790, in _train
self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
File "_catboost.pyx", line 5017, in _catboost._CatBoost._train
File "_catboost.pyx", line 5066, in _catboost._CatBoost._train
_catboost.CatBoostError: bad allocation
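Because the failure comes from two folds being fitted at once, one possible mitigation is to ask for sequential fold fitting. The sketch below only builds the keyword arguments. It assumes the installed AutoGluon version accepts `ag_args_ensemble={"fold_fitting_strategy": "sequential_local"}`, which should be checked against the `TabularPredictor.fit` documentation. The helper `with_sequential_folds` is hypothetical and not part of AutoGluon.

```python
# Assumption: "fold_fitting_strategy" with the value "sequential_local" is a
# supported ag_args_ensemble option in the installed AutoGluon version.
SEQUENTIAL_BAGGING = {"fold_fitting_strategy": "sequential_local"}


def with_sequential_folds(fit_kwargs):
    """Return a copy of fit_kwargs that requests one fold at a time."""
    merged = dict(fit_kwargs)
    ensemble_args = dict(merged.get("ag_args_ensemble") or {})
    ensemble_args.update(SEQUENTIAL_BAGGING)
    merged["ag_args_ensemble"] = ensemble_args
    return merged


base_kwargs = {"presets": "best_quality", "time_limit": 8 * 3600}
fit_kwargs = with_sequential_folds(base_kwargs)
print(fit_kwargs)

# Hypothetical usage with the predictor from the reproduction:
# predictor.fit(dados_treino_n2_maior_igual_17, **fit_kwargs)
```

Sequential fitting trades training speed for a lower peak memory footprint, so it may not fit within the same time limit.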
Installed Versions
Latest AutoGluon, CatBoost, and Ray
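Since "latest" changes over time, exact version numbers make a report like this easier to reproduce. A minimal sketch using only the standard library follows. The distribution names in the default tuple are assumptions about how the packages were installed, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(
    packages=("autogluon", "autogluon.tabular", "autogluon.core", "catboost", "ray"),
):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for package, version in installed_versions().items():
    print(f"{package}: {version}")
```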