RBDT does not support xgboost inference #15197
Description
RBDT's handling of xgboost models was never properly tested, neither when writing them to ROOT format nor when using them for inference afterwards. The functionality was never actually exercised because the xgboost package was not installed on the CI nodes, and the only available test silently skipped itself when the package was not found:
root/tmva/tmva/test/CMakeLists.txt, lines 34 to 41 in 05203ae:
```cmake
# Disabled because RBDT doesn't support the imbalanced tree structure of
# XGBoost models.
# if(dataframe)
#   find_python_module(xgboost QUIET)
#   if (PY_XGBOOST_FOUND)
#     ROOT_ADD_PYUNITTEST(rbdt_xgboost rbdt_xgboost.py)
#   endif()
# endif()
```
The test had to be disabled again right after the package was installed on some nodes, since it immediately triggered failures, as documented in PR #15183.
As of today, all platforms are running RBDT tests with xgboost models for the first time. The Alma9 platform highlights a severe problem in the following three tests:
```
TEST FAILURES:
 296:test-tcollbm
1314:tutorial-tmva-tmva101_Training-py
1315:tutorial-tmva-tmva102_Testing-py
```
Running one of them inside a debugger shows that there is an issue at exit time (which does not depend on the version of xgboost):
```
(ROOT-CI) [root@663cf5af2db5 tmva]# gdb --args python tmva102_Testing.py
[...]
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f1251f78110 in __do_global_dtors_aux () from /lib64/libopenblaso.so.0
(gdb) bt
#0  0x00007f1251f78110 in __do_global_dtors_aux () from /lib64/libopenblaso.so.0
#1  0x00007f1279ebef9e in _dl_fini () at dl-fini.c:142
#2  0x00007f12796574dd in __run_exit_handlers (status=0, listp=0x7f12797fa838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true,
    run_dtors=run_dtors@entry=true) at exit.c:116
#3  0x00007f1279657620 in __GI_exit (status=<optimized out>) at exit.c:146
#4  0x00007f127963feb7 in __libc_start_call_main (main=main@entry=0x557932280160 <main>, argc=argc@entry=2, argv=argv@entry=0x7ffe94d9b838)
    at ../sysdeps/nptl/libc_start_call_main.h:74
#5  0x00007f127963ff60 in __libc_start_main_impl (main=0x557932280160 <main>, argc=2, argv=0x7ffe94d9b838, init=<optimized out>, fini=<optimized out>,
    rtld_fini=<optimized out>, stack_end=0x7ffe94d9b828) at ../csu/libc-start.c:389
#6  0x0000557932280095 in _start ()
```
The same stack trace is produced for all three tests. This may or may not be linked to wlav/cppyy#227 and the fact that complex Python packages (xgboost in this case, but pytorch is also notorious for this behaviour) seem not to handle C++ symbols well.
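One way to confirm that the crash is confined to process teardown is to terminate before any exit handlers run: if the segfault disappears, the failure lies entirely in the destructor ordering at `_dl_fini` time. This is only a diagnostic sketch, not a fix; the helper name `exit_without_teardown` is made up and is not part of ROOT or xgboost:

```python
import os
import sys

def exit_without_teardown(status=0):
    """Terminate the process immediately after flushing stdio.

    os._exit() skips atexit handlers and shared-library global destructors,
    i.e. exactly the __run_exit_handlers / _dl_fini frames that appear in the
    backtrace above. If a script crashes with exit() but runs cleanly with
    this helper, the bug is in teardown, not in the workload itself.
    """
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(status)  # never returns; no Python or C++ cleanup runs
```

The obvious downside is that skipping teardown also skips legitimate cleanup (profilers, coverage, buffered file writes), so it is only useful to narrow down where the crash happens.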
Our current understanding of the situation with RBDT is the following:
- The implementation currently in master never actually worked, on multiple levels (it was largely untested, and the tutorials did not check the correctness of their results)
- Even if it had worked, it lacked the ability to handle imbalanced trees, which are quite common in xgboost models
- The re-implementation of the feature available at [TMVA] Fix RBDT evaluation of XGBoost models by merging RBDT with FastForest #15173 vastly improves the functionality
- There seems to be a major issue with complex Python packages and C++ that we apparently cannot solve and that will keep at least one CI platform permanently red if we leave things as they are
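To illustrate the imbalanced-tree point above: a layout that assumes a complete binary tree (children of node `i` at `2i+1` and `2i+2`) forces every branch to have the same depth, while XGBoost routinely prunes branches. A FastForest-style flattened layout with explicit child indices handles both. The sketch below is purely illustrative; none of these names correspond to ROOT's actual data structures:

```python
def eval_tree(features, cut_feature, cut_value, left, right, leaf_value):
    """Walk one flattened decision tree.

    A non-negative child index points at another inner node; a negative
    index -i - 1 points at leaf i. Because children are stored explicitly,
    subtrees of different depths can coexist in the same arrays.
    """
    node = 0
    while True:
        branch = left if features[cut_feature[node]] < cut_value[node] else right
        child = branch[node]
        if child < 0:
            return leaf_value[-child - 1]
        node = child

# An imbalanced tree: the root's left branch is a leaf, while its right
# branch descends one more level before reaching a leaf.
cut_feature = [0, 1]       # feature index tested at each inner node
cut_value = [0.5, 0.0]     # threshold at each inner node
left = [-1, -2]            # node 0 -> leaf 0; node 1 -> leaf 1
right = [1, -3]            # node 0 -> node 1; node 1 -> leaf 2
leaf_value = [1.0, 2.0, 3.0]

# eval_tree([0.2, 0.0], ...)  -> 1.0  (leaf reached after one split)
# eval_tree([0.9, -1.0], ...) -> 2.0  (leaf reached after two splits)
```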
Reproducer
At least on Alma9, the following is the smallest reproducer of the aforementioned segfault:
```python
import xgboost
import ROOT
import numpy as np

np.random.seed(1234)


def create_dataset(num_events, num_features, num_outputs, dtype=np.float32):
    x = np.random.normal(0.0, 1.0, (num_events, num_features)).astype(dtype=dtype)
    if num_outputs == 1:
        y = np.random.normal(0.0, 1.0, (num_events)).astype(dtype=dtype)
    else:
        y = np.random.choice(
            a=range(num_outputs), size=(num_events), p=[1.0 / float(num_outputs)] * num_outputs
        ).astype(dtype=dtype)
    return x, y


def _test_XGBBinary(label):
    """
    Compare response of XGB classifier and TMVA tree inference system.
    """
    x, y = create_dataset(1000, 10, 2)
    xgb = xgboost.XGBClassifier(n_estimators=100, max_depth=3)
    xgb.fit(x, y)
    ROOT.TMVA.Experimental.SaveXGBoost(xgb, "myModel", "testXGBBinary{}.root".format(label), num_inputs=10)
    bdt = ROOT.TMVA.Experimental.RBDT("myModel", "testXGBBinary{}.root".format(label))
    y_xgb = xgb.predict_proba(x)[:, 1].squeeze()
    y_bdt = bdt.Compute(x).squeeze()
    np.testing.assert_array_almost_equal(y_xgb, y_bdt)


if __name__ == "__main__":
    _test_XGBBinary("default")
```
ROOT version
Any
Installation method
build from source
Operating system
Alma9 (others could be affected)
Additional context
No response