Code for the paper Hypergraph Pattern Machine: Compositional Tokenization for Higher-Order Interactions.
Hypergraphs model higher-order relations that drive real-world decisions such as drug prescriptions, where a regime determines whether a drug should be dropped, kept, or excluded — a structural signal that existing methods cannot capture because they only propagate messages over observed hyperedges. HGPM closes this gap by tokenizing the subsets around each target entity into a bounded inclusion DAG, labeling every adjacent-order edge as compositional, emergent, or inhibitory, and encoding the sequence with an inclusion-aware Transformer pretrained via masked subset reconstruction. On eight node-classification benchmarks and two real drug-interaction corpora (HODDI, JADER), HGPM matches or exceeds state-of-the-art methods and, in a clinical case study, correctly distinguishes a side-effect-suppressing drug addition that feature-similarity baselines miss.
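The sketch below is a toy illustration of what "a bounded inclusion DAG around a target entity" could look like. It is not HGPM's code or definitions. The following are all assumptions made for this example:

- the helper names `inclusion_dag`, `label_edge` and `inclusion_mask`;
- the `max_order` bound;
- the choice to draw subsets only from observed hyperedges;
- the outcome-based rule for "compositional / emergent / inhibitory";
- the subset-containment attention mask.

The real implementation lives in the `HGPM/` package.

```python
from itertools import combinations


def inclusion_dag(target, hyperedges, max_order=3):
    """Collect every subset that contains `target`, has at most `max_order`
    entities, and lies inside some observed hyperedge; then link each subset
    to its supersets exactly one order higher (adjacent-order inclusion edges)."""
    nodes = set()
    for edge in hyperedges:
        if target not in edge:
            continue
        others = sorted(edge - {target})
        for k in range(max_order):  # k extra entities -> subset of order k + 1
            for combo in combinations(others, k):
                nodes.add(frozenset((target, *combo)))
    # Deterministic token order: by subset size, then by sorted members.
    tokens = sorted(nodes, key=lambda s: (len(s), sorted(s)))
    edges = [(a, b) for a in tokens for b in tokens
             if len(b) == len(a) + 1 and a < b]
    return tokens, edges


def label_edge(small, large, outcomes):
    """HYPOTHETICAL rule, not the paper's definition: compare the outcome set of
    `large` with the union of outcomes of `small` and the single added entity."""
    (added,) = large - small
    parts = outcomes.get(small, set()) | outcomes.get(frozenset({added}), set())
    whole = outcomes.get(large, set())
    if whole - parts:  # something appears only in the combination
        return "emergent"
    if parts - whole:  # something expected from the parts disappears
        return "inhibitory"
    return "compositional"  # the combination is exactly the sum of its parts


def inclusion_mask(tokens):
    """HYPOTHETICAL attention mask: token i may attend to token j only when
    one subset contains the other."""
    return [[a <= b or b <= a for b in tokens] for a in tokens]


if __name__ == "__main__":
    # Toy data with abstract entities and outcomes (not real drugs or effects).
    hyperedges = [frozenset({"A", "B"}), frozenset({"A", "C", "D"})]
    outcomes = {
        frozenset({"A"}): {"x"}, frozenset({"B"}): {"y"}, frozenset({"C"}): {"y"},
        frozenset({"A", "B"}): {"x", "y"},
        frozenset({"A", "C"}): {"x", "y", "z"},
        frozenset({"A", "D"}): set(),
    }
    tokens, edges = inclusion_dag("A", hyperedges, max_order=2)
    for small, large in edges:
        print(sorted(small), "->", sorted(large), label_edge(small, large, outcomes))
```

On the toy data, the script prints three edges out of `{A}`:

- `{A} -> {A, B}` is labeled compositional;
- `{A} -> {A, C}` is labeled emergent;
- `{A} -> {A, D}` is labeled inhibitory.

Bounding the order keeps the token count manageable: a hyperedge with `n` other members contributes at most `sum(C(n, k) for k < max_order)` subsets.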
- `HGPM/`: the Python package (data, model, task drivers, utils)
- `config/`: YAML configs for every dataset (`config/{graph,drug}/*.yaml`)
- `data/`: drop-in root for the released datasets, plus prepare scripts
```bash
conda create -n hgpm -c conda-forge python=3.11 -y
conda activate hgpm
pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```

For a different CUDA build, swap the `--index-url` for the matching wheel index listed on the PyTorch installation page.
Datasets are distributed separately. Download both archives from the Google Drive folder and unzip them at the repository root.
| File | SHA256 |
|---|---|
| `hgpm_graph_data_package.zip` | `E5AE5124A88C4E56418217AB29C5C8FC48E332BD183F8DBF6EADF3EB97312342` |
| `hgpm_drug_data_package.zip` | `02D62F0692408F157A5390F1D67A54417FB58170B2DCDC0A3E8EC515A01734E5` |
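The archives can be checked against the published digests before unzipping. The script below is a small standard-library sketch. It assumes it is run from the directory holding the downloaded archives.

Note on case: the table lists upper-case hex, while `hashlib`, `sha256sum` (Linux) and `shasum -a 256` (macOS) print lower-case. The comparison is therefore case-insensitive.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the checksum table (upper-case hex as published).
EXPECTED_SHA256 = {
    "hgpm_graph_data_package.zip": "E5AE5124A88C4E56418217AB29C5C8FC48E332BD183F8DBF6EADF3EB97312342",
    "hgpm_drug_data_package.zip": "02D62F0692408F157A5390F1D67A54417FB58170B2DCDC0A3E8EC515A01734E5",
}


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    # hexdigest() is lower-case; compare case-insensitively.
    return sha256_of(path).lower() == expected.lower()


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: missing")
        else:
            print(f"{name}: {'OK' if verify(path, expected) else 'MISMATCH'}")
```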
After unzipping, `data/graph/{protocols,dags}/` and `data/drug/{hoddi,jader}/` should be populated. See `data/README.md` for details.
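A quick way to confirm the unzip landed in the right place is sketched below.

- **Assumptions.** It is run from the repository root. The helper name `missing_or_empty` is hypothetical.
- **What it checks.** It only tests that each expected directory exists and is non-empty.
- **What it does not check.** It does not validate file contents; `data/README.md` is the reference for those.

```python
from pathlib import Path

# Directories this README says should be populated after unzipping.
EXPECTED_DIRS = [
    "data/graph/protocols",
    "data/graph/dags",
    "data/drug/hoddi",
    "data/drug/jader",
]


def missing_or_empty(root="."):
    """Return the expected directories that are absent or contain no entries."""
    root = Path(root)
    problems = []
    for rel in EXPECTED_DIRS:
        d = root / rel
        if not d.is_dir() or not any(d.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty()
    print("data layout OK" if not problems else f"missing or empty: {problems}")
```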
Graph benchmarks cover four homophilic datasets (`citeseer`, `pubmed`, `cora_ca`, `dblp_ca`) and four heterophilic ones (`congress`, `senate`, `walmart`, `house`). Run pretraining, then finetuning; for example, on Citeseer:

```bash
python -m HGPM.main_graph_pretrain --config config/graph/citeseer_pretrain.yaml
python -m HGPM.main_graph_finetune --config config/graph/citeseer_finetune.yaml
```

Drug benchmarks: HODDI uses a single pretraining stage; JADER follows a two-stage curriculum (semantic warm-up, then regime prediction):
```bash
# HODDI
python -m HGPM.main_drug_pretrain --config config/drug/hoddi_pretrain.yaml
python -m HGPM.main_drug_finetune --config config/drug/hoddi_finetune.yaml

# JADER
python -m HGPM.main_drug_pretrain --config config/drug/jader_pretrain_stage1.yaml
python -m HGPM.main_drug_pretrain --config config/drug/jader_pretrain_stage2.yaml
python -m HGPM.main_drug_finetune --config config/drug/jader_finetune.yaml
```

Append `--smoke` to any command for a fast end-to-end check: a few minibatches pass through data loading, pretraining, finetuning, and evaluation.
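To chain the stages in the right order, the commands above can be wrapped in a small driver. This is a minimal sketch under a few assumptions:

- The script name `run_all.py` and the `PIPELINES` / `build_commands` helpers are hypothetical.
- It must be launched from the repository root so that `HGPM` is importable and the config paths resolve.
- It only lists the configs that appear in this README. Other graph datasets would need their own config paths added.

```python
import subprocess
import sys

# Stage order taken from the README commands; JADER runs two pretraining
# stages (semantic warm-up, then regime prediction) before finetuning.
PIPELINES = {
    "citeseer": [
        ("HGPM.main_graph_pretrain", "config/graph/citeseer_pretrain.yaml"),
        ("HGPM.main_graph_finetune", "config/graph/citeseer_finetune.yaml"),
    ],
    "hoddi": [
        ("HGPM.main_drug_pretrain", "config/drug/hoddi_pretrain.yaml"),
        ("HGPM.main_drug_finetune", "config/drug/hoddi_finetune.yaml"),
    ],
    "jader": [
        ("HGPM.main_drug_pretrain", "config/drug/jader_pretrain_stage1.yaml"),
        ("HGPM.main_drug_pretrain", "config/drug/jader_pretrain_stage2.yaml"),
        ("HGPM.main_drug_finetune", "config/drug/jader_finetune.yaml"),
    ],
}


def build_commands(name, smoke=False):
    """Return the argv lists for one pipeline, optionally with --smoke appended."""
    cmds = []
    for module, config in PIPELINES[name]:
        cmd = [sys.executable, "-m", module, "--config", config]
        if smoke:
            cmd.append("--smoke")
        cmds.append(cmd)
    return cmds


def run(name, smoke=False):
    for cmd in build_commands(name, smoke):
        print("+", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed stage halts the stages that follow it.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    args = sys.argv[1:]
    smoke = "--smoke" in args
    names = [a for a in args if a != "--smoke"]
    if not names:
        print(f"usage: python run_all.py [--smoke] {{{','.join(PIPELINES)}}} ...")
    for name in names:
        run(name, smoke=smoke)
```

For example, `python run_all.py jader --smoke` would run the three JADER stages in order on a few minibatches each. Calling the CLI through `subprocess` rather than importing the entry points keeps each stage in a fresh process, exactly as when the commands are typed by hand.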
```bibtex
@misc{hgpm2026,
  title         = {Hypergraph Pattern Machine: Compositional Tokenization for Higher-Order Interactions},
  author        = {Zhao, Kyrie and Wang, Zehong and Ma, Tianyi and Wu, Fang and
                   Tang, Xiangru and Li\`{o}, Pietro and Wang, Sheng and Ye, Yanfang},
  year          = {2026},
  eprint        = {2605.16527},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  doi           = {10.48550/arXiv.2605.16527},
  url           = {https://arxiv.org/abs/2605.16527}
}
```

Released under the MIT License.
