This is the official implementation of the paper RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases.
RDB2G-Bench is an easy-to-use framework for benchmarking graph-based analysis and prediction tasks by converting relational database data into graphs.
git clone https://github.com/chlehdwon/RDB2G-Bench.git
cd RDB2G-Bench
pip install -e .
Also, install the additional PyG dependencies. The example below is for torch 2.1.0 with CUDA 12.1.
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
You can skip this part if you don't want to reproduce our dataset.
Comprehensive documentation and detailed guides are available at our documentation site.
You can also check the examples/ directory for complete usage examples and tutorials.
from rdb2g_bench.dataset.dataset import load_rdb2g_bench
bench = load_rdb2g_bench("./results")
result = bench['rel-f1']['driver-top3'][0] # Access by graph index
test_metric = result['test_metric'] # Test performance
params = result['params'] # Model parameters
train_time = result['train_time'] # Train time
from rdb2g_bench.dataset.node_worker import run_gnn_node_worker
results = run_gnn_node_worker(
    dataset_name="rel-f1",
    task_name="driver-top3",
    gnn="GraphSAGE",
    epochs=20,
    lr=0.005
)
from rdb2g_bench.dataset.link_worker import run_idgnn_link_worker
results = run_idgnn_link_worker(
    dataset_name="rel-avito",
    task_name="user-ad-visit",
    gnn="GraphSAGE",
    epochs=20,
    lr=0.001
)
from rdb2g_bench.benchmark.bench_runner import run_benchmark
results = run_benchmark(
    dataset="rel-f1",
    task="driver-top3",
    gnn="GraphSAGE",
    budget_percentage=0.05,
    method="all",
    num_runs=10,
    seed=0
)
Before using the LLM-based baseline, set your API key:
export ANTHROPIC_API_KEY="YOUR_API_KEY"
from rdb2g_bench.benchmark.llm.llm_runner import run_llm_baseline
results = run_llm_baseline(
    dataset="rel-f1",
    task="driver-top3",
    gnn="GraphSAGE",
    budget_percentage=0.05,
    model="claude-3-5-sonnet-latest",
    temperature=0.8,
    seed=42
)
rdb2g_bench/
├── benchmark/ # Core benchmarking functionality
│   ├── llm/ # LLM-based baseline methods
│   └── baselines/ # Other baseline methods
├── common/ # Shared utilities and search spaces
├── dataset/ # Dataset loading and processing
└── __init__.py # Package initialization
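Returning to the pre-computed results loaded with `load_rdb2g_bench` above: each record exposes `test_metric`, `params`, and `train_time`. The following is a minimal sketch, not part of the package, of how one might rank candidate graph models by test performance. It assumes records behave like plain mappings with a `test_metric` key; the helper `rank_records`, the flag `higher_is_better`, and the sample values are hypothetical and used for illustration only.

```python
from typing import Any, List, Mapping, Sequence


def rank_records(
    records: Sequence[Mapping[str, Any]],
    higher_is_better: bool = True,
) -> List[int]:
    """Return record indices ordered from best to worst test_metric.

    Hypothetical helper: it only assumes each record has a 'test_metric' key.
    """
    return sorted(
        range(len(records)),
        key=lambda i: records[i]["test_metric"],
        reverse=higher_is_better,
    )


# Placeholder records for illustration only; these are NOT real benchmark numbers.
example_records = [
    {"test_metric": 0.71, "train_time": 12.0},
    {"test_metric": 0.78, "train_time": 30.0},
    {"test_metric": 0.74, "train_time": 8.0},
]

order = rank_records(example_records)
best = example_records[order[0]]
print("best graph index:", order[0])
print("test_metric:", best["test_metric"], "train_time:", best["train_time"])
```

Whether a higher or lower `test_metric` is better depends on the task's metric, so set `higher_is_better=False` for error-style metrics.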
The dataset construction and implementation of RDB2G-Bench is based on the RelBench framework.
If you use RDB2G-Bench in your research, please cite:
@inproceedings{choi2025rdb2gbench,
title={RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases},
author={Dongwon Choi and Sunwoo Kim and Juyeon Kim and Kyungho Kim and Geon Lee and Shinhwan Kang and Myunghwan Kim and Kijung Shin},
year={2025},
booktitle={NeurIPS},
}
or
@article{choi2025rdb2gbench,
title={RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases},
author={Dongwon Choi and Sunwoo Kim and Juyeon Kim and Kyungho Kim and Geon Lee and Shinhwan Kang and Myunghwan Kim and Kijung Shin},
year={2025},
url={https://arxiv.org/abs/2506.01360},
}
This project is distributed under the MIT License as specified in the LICENSE file.
