The benchmark framework is divided into three parts: dataset construction, document ranking, and LLM generation. Each part is a standalone file that can be executed, but this repo already includes the constructed splits and ranked files if you want to skip ahead in the pipeline. Please refer to data/dataset_template.ipynb for an example of how the data is constructed to ensure product, neighbor, and user size distributions. Please refer to notebook/ranking.ipynb for how the files are ranked. This framework is not set up to run everything at once. master_generation.py was converted to a CLI to run files, but it requires your own API key or endpoint.
### Clone the Repository
```bash
gh repo clone PGraphRAG-benchmark/PGraphRAG
cd PGraphRAG-benchmark/PGraphRAG
```
### Install Dependencies
To install the necessary dependencies, run:
```bash
pip install -r requirements.txt
```
Note that this is not necessary to run your own LLM models; we ran Llama-3.1-8B-Instruct on our own hardware and GPT-4o-mini through Azure cloud services.
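Since master_generation.py requires your own API key or endpoint, here is a minimal sketch of how a GPT-4o-mini call through Azure could be wired up with the openai Python package. The environment variable names and deployment name are illustrative assumptions, not part of this repo's configuration.

```python
# Sketch: calling gpt-4o-mini through Azure OpenAI.
# Env var names and the deployment name are assumptions; substitute your own.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],         # assumed env var
    api_version="2024-07-01-preview",                   # any supported version
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], # assumed env var
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # your Azure deployment name
    messages=[{"role": "user", "content": "Write a short product review."}],
)
print(response.choices[0].message.content)
```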
### Dataset Construction
Includes files to construct a dataset for the PGraph Framework. The GraphConstruction script processes a data-split JSON and forms the graph network. This is a required step for document ranking; graph construction is handled internally by the ranking script.
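As a rough illustration of what building a graph from a split looks like, here is a hedged sketch of a bipartite user-product graph in networkx. The JSON field names ("user_id", "product_id", "review_text") and the file path are assumptions; the GraphConstruction script defines the real schema.

```python
# Sketch: build a bipartite user-product graph from a data-split JSON.
# Field names and path are illustrative assumptions, not the repo's schema.
import json
import networkx as nx

def build_graph(split_path):
    with open(split_path) as f:
        records = json.load(f)

    graph = nx.Graph()
    for r in records:
        user, product = r["user_id"], r["product_id"]
        graph.add_node(user, bipartite="user")
        graph.add_node(product, bipartite="product")
        # Each review becomes an edge attribute linking a user to a product.
        graph.add_edge(user, product, review=r["review_text"])
    return graph

g = build_graph("./data/amazon_dev_split.json")  # hypothetical path
# A user's "neighbors" are other users who reviewed the same products.
```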
### Ranking
Contains files for ranking. The ranking script returns each profile as a dictionary, on which generations are run according to the tuned settings.
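The exact schema is defined by the ranking script, but based on the mode options documented below, a returned profile plausibly resembles the following hedged sketch. Only the "user_ratings" and "neighbor_ratings" keys are taken from this README; everything else is an illustrative guess.

```python
# Hedged sketch of a ranked profile entry. Only "user_ratings" and
# "neighbor_ratings" appear in this README; the rest is illustrative.
profile = {
    "user_ratings": [             # the target user's own reviews, ranked
        {"product": "B00EXAMPLE", "review": "Great battery life."},
    ],
    "neighbor_ratings": [         # reviews from graph neighbors, ranked
        {"product": "B00EXAMPLE", "review": "Battery died in a week."},
    ],
}
```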
### Generation
The master_generation.py script is used to generate outputs for dataset tasks using LLMs such as Llama-3.1-8B-Instruct or GPT.
```bash
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt
```
This example uses GPT to generate review text on the dev split of Amazon reviews (User Product Review Generation), ranked by BM25, on all modes and all k values.
- --input: File path to the ranking file. Required.
- --model: Model to use for generation. Required. Valid options:
  - llama: Llama-3.1-8B-Instruct
  - gpt: gpt-4o-mini-20240718
- --mode: Mode(s) to generate on. Optional; by default all modes are run (see the sketch after this list). Valid options:
  - none: Retrieves nothing for the prompt.
  - random: Retrieves a random review from the dataset for the prompt.
  - user: Retrieves "user_ratings" for the prompt.
  - neighbor: Retrieves "neighbor_ratings" for the prompt.
  - both: Retrieves both "user_ratings" and "neighbor_ratings" for the prompt.
- --k: K-value(s) (top-k retrieved reviews) to generate on. Optional; by default all k values (1, 2, 4) are run.
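To make the interaction between --mode, --k, and the profile keys concrete, here is a hedged sketch of how the retrieved context for a prompt could be assembled. It mirrors the documented options only; master_generation.py's actual prompt construction may differ.

```python
import random

# Sketch: assemble retrieval context for one profile given a mode and k.
# Mirrors the documented CLI options; not the script's actual implementation.
def build_context(profile, mode, k, dataset=None):
    if mode == "none":
        return []                                 # retrieve nothing
    if mode == "random":
        return [random.choice(dataset)]           # one random review
    if mode == "user":
        return profile["user_ratings"][:k]        # top-k own reviews
    if mode == "neighbor":
        return profile["neighbor_ratings"][:k]    # top-k neighbor reviews
    if mode == "both":
        return profile["user_ratings"][:k] + profile["neighbor_ratings"][:k]
    raise ValueError(f"unknown mode: {mode}")
```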
Additional usage examples:
```bash
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt --mode both neighbor
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt --k 1
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt --mode none both --k 2 4
```

### Evaluation
The master_eval.py script evaluates batches of output files.
```bash
python master_eval.py --ranking ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --results ./results/amazon_dev_reviewText_GPT_bm25
```
This evaluates all output files in the given results directory against gold labels taken from the specified ranking file.
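master_eval.py's metrics and file layout are not spelled out here, but as an illustration of the evaluation pattern, a hedged sketch comparing one output file against gold labels with ROUGE (via the rouge-score package, an assumed dependency) might look like this. The JSON field names are assumptions.

```python
# Hedged sketch of scoring generations against gold labels with ROUGE.
# File layout and field names are assumptions; master_eval.py defines them.
import json
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

with open("./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json") as f:
    gold = {ex["id"]: ex["reviewText"] for ex in json.load(f)}  # assumed keys

with open("./results/amazon_dev_reviewText_GPT_bm25/output.json") as f:
    outputs = json.load(f)  # assumed: list of {"id": ..., "generated": ...}

for ex in outputs:
    scores = scorer.score(gold[ex["id"]], ex["generated"])
    print(ex["id"], scores["rougeL"].fmeasure)
```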
### Citation
For reference, please cite the following:
```bibtex
@misc{pgraphrag,
      title={Personalized Graph-Based Retrieval for Large Language Models},
      author={S. Au and C.J. Dimacali and O. Pedirappagari and N. Park and F. Dernoncourt and Y. Wang and N. Kanakaris and H. Deilamsalehy and R.A. Rossi and N.K. Ahmed},
      year={2025},
      eprint={2501.02157},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.02157},
}
```