GitHub - chlehdwon/RDB2G-Bench: [NeurIPS 2025 Datasets and Benchmarks] Source code for the paper RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases

This is the official implementation of the paper RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases.

RDB2G-Bench is an easy-to-use framework for benchmarking graph-based analysis and prediction tasks on relational databases by converting their data into graphs.

🚀 Installation

git clone https://github.com/chlehdwon/RDB2G-Bench.git
cd RDB2G-Bench
pip install -e .

Also install the additional PyG dependencies. The example below is for torch 2.1.0 with CUDA 12.1.

pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.1.0+cu121.html

You can skip this step if you do not plan to reproduce our dataset.
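For other torch and CUDA combinations, the wheel index URL follows the same pattern as the example above. The helper below is a minimal sketch that builds the install command from a version and a CUDA tag; the function names are hypothetical and not part of RDB2G-Bench, and it assumes PyG publishes wheels for the combination you pass in, which you should verify on the PyG wheel index.

```python
# Hypothetical helpers (not part of RDB2G-Bench) that rebuild the
# install command shown above for a given torch / CUDA combination.
PYG_PACKAGES = [
    "pyg_lib",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
]


def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Return the PyG wheel index URL, e.g. cuda_tag="cu121" for CUDA 12.1."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def pyg_install_command(torch_version: str, cuda_tag: str) -> str:
    """Return the full pip command for the extra PyG dependencies."""
    index = pyg_wheel_index(torch_version, cuda_tag)
    return f"pip install {' '.join(PYG_PACKAGES)} -f {index}"


print(pyg_install_command("2.1.0", "cu121"))
```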

⚡ Package Usage

Comprehensive documentation and detailed guides are available at our documentation site.

You can also check the examples/ directory for complete usage examples and tutorials.

Download Pre-computed Datasets

from rdb2g_bench.dataset.dataset import load_rdb2g_bench

bench = load_rdb2g_bench("./results")

result = bench['rel-f1']['driver-top3'][0]  # Access by graph index
test_metric = result['test_metric']         # Test performance
params = result['params']                   # Model parameters
train_time = result['train_time']           # Train time
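Once several entries have been read this way, a common next step is to compare them. The sketch below picks the best entry by `test_metric` from a list of records that carry the same three keys as above. The `best_result` helper and the record values are illustrative stand-ins, not real benchmark numbers, and whether higher or lower is better depends on the task's metric.

```python
from typing import Any, Iterable, Mapping


def best_result(
    records: Iterable[Mapping[str, Any]],
    higher_is_better: bool = True,
) -> Mapping[str, Any]:
    """Return the record with the best test_metric (hypothetical helper)."""
    records = list(records)
    if not records:
        raise ValueError("no records to compare")
    pick = max if higher_is_better else min
    return pick(records, key=lambda r: r["test_metric"])


# Stand-in records with the same keys as the entries above.
# The values are made up for illustration only.
records = [
    {"test_metric": 0.71, "params": 120_000, "train_time": 35.2},
    {"test_metric": 0.78, "params": 310_000, "train_time": 58.9},
    {"test_metric": 0.74, "params": 95_000, "train_time": 29.4},
]

best = best_result(records)
print(best["test_metric"], best["params"], best["train_time"])
```

For regression tasks scored with an error metric, pass `higher_is_better=False`.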

Reproduce Datasets for Classification & Regression Tasks

from rdb2g_bench.dataset.node_worker import run_gnn_node_worker

results = run_gnn_node_worker(
    dataset_name="rel-f1",
    task_name="driver-top3",
    gnn="GraphSAGE",
    epochs=20,
    lr=0.005
)

Reproduce Datasets for Recommendation Tasks

from rdb2g_bench.dataset.link_worker import run_idgnn_link_worker

results = run_idgnn_link_worker(
    dataset_name="rel-avito",
    task_name="user-ad-visit",
    gnn="GraphSAGE",
    epochs=20,
    lr=0.001
)

Run Benchmarks

from rdb2g_bench.benchmark.bench_runner import run_benchmark

results = run_benchmark(
    dataset="rel-f1",
    task="driver-top3",
    gnn="GraphSAGE",
    budget_percentage=0.05,
    method="all",
    num_runs=10,
    seed=0
)
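To give an intuition for `budget_percentage`, the sketch below runs a random search over a list of pre-computed scores, assuming the budget is the fraction of candidate graph models a search method is allowed to evaluate. That interpretation, the `random_search` function, and the synthetic scores are assumptions made for illustration; they do not reproduce the internals of `run_benchmark`.

```python
import random
from typing import Sequence, Tuple


def random_search(
    scores: Sequence[float],
    budget_percentage: float,
    seed: int,
) -> Tuple[int, float]:
    """Evaluate a random subset of candidates and return the best one.

    scores[i] stands in for the pre-computed metric of candidate graph i
    (higher is assumed to be better in this sketch).
    """
    if not 0 < budget_percentage <= 1:
        raise ValueError("budget_percentage must be in (0, 1]")
    rng = random.Random(seed)
    # Assumed meaning of the budget: how many candidates may be evaluated.
    budget = max(1, int(len(scores) * budget_percentage))
    sampled = rng.sample(range(len(scores)), budget)
    best_idx = max(sampled, key=lambda i: scores[i])
    return best_idx, scores[best_idx]


# Synthetic scores for 200 hypothetical candidate graphs.
score_rng = random.Random(0)
synthetic_scores = [score_rng.random() for _ in range(200)]

# Repeat the search with different seeds, mirroring num_runs above.
for run_seed in range(3):
    idx, score = random_search(synthetic_scores, budget_percentage=0.05, seed=run_seed)
    print(f"seed={run_seed} best_index={idx} best_score={score:.3f}")
```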

Run LLM-based Baseline

Before using the LLM-based baseline, set your Anthropic API key:

export ANTHROPIC_API_KEY="YOUR_API_KEY"

from rdb2g_bench.benchmark.llm.llm_runner import run_llm_baseline

results = run_llm_baseline(
    dataset="rel-f1",
    task="driver-top3",
    gnn="GraphSAGE",
    budget_percentage=0.05,
    model="claude-3-5-sonnet-latest",
    temperature=0.8,
    seed=42
)
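Because the baseline depends on the environment variable set above, it can help to fail early when the key is missing. The check below is a small sketch using only the standard library; `require_api_key` is a hypothetical helper, not a function provided by RDB2G-Bench.

```python
import os


def require_api_key(name: str = "ANTHROPIC_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    key = os.environ.get(name, "").strip()
    # Treat the placeholder from the export example as "not set".
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(
            f"{name} is not set; export it before running the LLM-based baseline"
        )
    return key


# Usage: call this once before run_llm_baseline(...).
# require_api_key()
```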

📁 Package Structure

rdb2g_bench/
├── benchmark/         # Core benchmarking functionality
│   ├── llm/           # LLM-based baseline methods
│   └── baselines/     # Other baseline methods
├── common/            # Shared utilities and search spaces  
├── dataset/           # Dataset loading and processing
└── __init__.py        # Package initialization

📖 Reference

The dataset construction and implementation of RDB2G-Bench are based on the RelBench framework.

📝 Citation

If you use RDB2G-Bench in your research, please cite:

@inproceedings{choi2025rdb2gbench,
    title={RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases}, 
    author={Dongwon Choi and Sunwoo Kim and Juyeon Kim and Kyungho Kim and Geon Lee and Shinhwan Kang and Myunghwan Kim and Kijung Shin},
    year={2025},
    booktitle={NeurIPS},
}

or

@article{choi2025rdb2gbench,
    title={RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases}, 
    author={Dongwon Choi and Sunwoo Kim and Juyeon Kim and Kyungho Kim and Geon Lee and Shinhwan Kang and Myunghwan Kim and Kijung Shin},
    year={2025},
    url={https://arxiv.org/abs/2506.01360}, 
}

⚖️ License

This project is distributed under the MIT License as specified in the LICENSE file.
