This repository provides the official implementation of MARIOH, a supervised method for reconstructing hyperedges in hypergraphs by leveraging edge multiplicity. MARIOH integrates several key components: a theoretically guaranteed filtering step to identify true size-2 hyperedges, a multiplicity-aware classifier for scoring hyperedge candidates, and a bidirectional search strategy that explores both high- and low-confidence cliques. These components work together to achieve accurate and efficient hypergraph reconstruction. For further details, please refer to our accompanying research paper.
- `main.py`: The entry point script to run the hyperedge reconstruction pipeline.
- `params.py`: Parameter dictionaries for various datasets and modes (reduced or preserved); a hypothetical sketch of the structure follows this list.
- `utils/`: A directory containing modularized code for data processing, feature extraction, graph operations, model training, evaluation, and input/output utilities.
- `data/`: A directory that should contain the dataset-specific training and testing files.
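For orientation, the entries in `params.py` are plain Python dictionaries keyed by dataset and mode. The sketch below is purely illustrative: the key names (`lr`, `epochs`, `hidden_dim`) are assumptions, not the actual schema shipped in the repository.

```python
# Hypothetical shape of a params.py entry; consult the shipped file for
# the real key names and values.
params = {
    "hschool": {
        "reduced":   {"lr": 1e-3, "epochs": 100, "hidden_dim": 64},
        "preserved": {"lr": 1e-3, "epochs": 100, "hidden_dim": 64},
    },
}
```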
- Python version: 3.8+ recommended
- Dependencies: `numpy`, `networkx`, `torch`, `joblib`, `argparse`
- Additional Python dependencies can be installed via `pip install -r requirements.txt`. Adjust `requirements.txt` or the installation commands as needed for your environment.
Place your datasets in the `data/` directory. Each dataset should have its own subdirectory, for example:

```
data/
|-- {dataset_name}/
    |-- train.txt      # Training data (reduced mode)
    |-- test.txt       # Testing data (reduced mode)
    |-- train_dup.txt  # Training data (preserved mode)
    +-- test_dup.txt   # Testing data (preserved mode)
```
- Reduced mode uses `train.txt` and `test.txt`.
- Preserved mode uses `train_dup.txt` and `test_dup.txt`.
Please refer to the related publication for details on dataset formats and preprocessing steps.
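The paper remains the authoritative description of the input format; as a rough aid, the sketch below assumes each line of a train/test file lists one hyperedge as comma-separated node IDs, matching the convention used for the output files. The helper name `load_hyperedges` is ours, not part of the repository.

```python
# Minimal reader sketch, assuming one comma-separated hyperedge per line
# (a format assumption; see the paper for the authoritative description).
def load_hyperedges(path):
    hyperedges = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            hyperedges.append([int(v) for v in line.split(",")])
    return hyperedges

# Example: edges = load_hyperedges("data/hschool/train.txt")
```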
To run the pipeline, navigate to the directory containing `main.py` and execute:

```
python main.py --data {dataset_name} --gpu 0 --seed 42 --output_dir output
```

- `--data {dataset_name}`: Name of the dataset folder located under `data/`.
- `--gpu {int}`: GPU device number. If no GPU is available or you wish to run on CPU, set `--gpu` to a non-existent GPU ID (e.g., `--gpu 99`) and the pipeline will fall back to CPU (see the sketch after this list).
- `--seed {int}`: Random seed for reproducibility.
- `--output_dir {path}`: Directory in which to store the output hyperedge predictions and results.
- `--preserved`: Optional flag. If set, the pipeline runs in "preserved" mode using `train_dup.txt` and `test_dup.txt`; if omitted, it runs in "reduced" mode using `train.txt` and `test.txt`.
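The CPU fallback described for `--gpu` behaves roughly like the following PyTorch device-selection pattern. This is an illustrative sketch of the idea, not the repository's exact code, and `select_device` is a hypothetical helper name.

```python
import torch

def select_device(gpu_id: int) -> torch.device:
    # Fall back to CPU when CUDA is unavailable or the requested
    # device index does not exist (e.g., --gpu 99).
    if torch.cuda.is_available() and gpu_id < torch.cuda.device_count():
        return torch.device(f"cuda:{gpu_id}")
    return torch.device("cpu")

# Example: device = select_device(99)  # -> cpu on most machines
```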
Reduced mode (default):

```
python main.py --data hschool --gpu 0 --seed 123 --output_dir output
```

Preserved mode:

```
python main.py --data hschool --gpu 0 --seed 123 --output_dir output --preserved
```

In these examples, the code will:
- Load and preprocess the graph data.
- Extract features and prepare a training dataset.
- Train a classifier network with the best parameters specified in `params.py`.
- Use the trained classifier to reconstruct hyperedges in the test graph.
- Save the reconstructed hyperedges to `output/reconstructed_hyp_reduced/{dataset_name}_{seed}.txt` (in reduced mode) or `output/reconstructed_hyp_preserved/{dataset_name}_{seed}.txt` (in preserved mode).
- Output Files: The final reconstructed hyperedges are stored as comma-separated node IDs, one hyperedge per line.
- Evaluation Metrics: The code prints evaluation metrics such as Jaccard similarity and multiset Jaccard similarity during execution; these assess the quality of the reconstruction relative to the ground truth (a generic sketch of both measures follows this list).
- Performance & Reproducibility: By setting the random seed (`--seed`) and controlling hyperparameters through `params.py`, you can reproduce the experimental results reported in the associated research paper.
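For reference, a generic sketch of both similarity measures is given below, following their textbook definitions over collections of hyperedges (encoded here as frozensets of node IDs). The repository's own implementation in `utils/evaluation.py` is authoritative and may differ in how it pairs and aggregates hyperedges.

```python
from collections import Counter

def jaccard(a, b):
    # Set Jaccard |A ∩ B| / |A ∪ B| between two collections of
    # hashable items (e.g., hyperedges encoded as frozensets).
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def multiset_jaccard(a, b):
    # Multiset Jaccard: respects duplicates via min/max multiplicities.
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    return sum((ca & cb).values()) / union if union else 1.0

# Hyperedges as frozensets of node IDs:
pred = [frozenset({1, 2, 3}), frozenset({4, 5})]
true = [frozenset({1, 2, 3}), frozenset({4, 6})]
print(jaccard(pred, true))  # 1 common / 3 total = 0.333...
```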
- Modify `params.py` to add or change hyperparameters for different datasets.
- Adjust or add dataset loaders in `utils/data_processing.py` if your input format differs.
- Add new evaluation metrics in `utils/evaluation.py` (a hypothetical example follows).
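For instance, a new metric added to `utils/evaluation.py` could follow the pattern below; the function name `exact_match_rate` and its criterion are hypothetical, chosen only to illustrate the extension point.

```python
def exact_match_rate(predicted, ground_truth):
    # Hypothetical metric: fraction of ground-truth hyperedges that are
    # recovered exactly (as node sets) among the predictions.
    pred_sets = {frozenset(e) for e in predicted}
    true_sets = [frozenset(e) for e in ground_truth]
    if not true_sets:
        return 1.0
    return sum(e in pred_sets for e in true_sets) / len(true_sets)
```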