The benchmark framework is divided into three parts: dataset construction, document ranking, and LLM generation. Each part is a standalone file that can be executed, but this repo already includes the constructed splits and ranked files if you want to skip ahead in the pipeline. Please refer to data/dataset_template.ipynb for an example of how the data is constructed to ensure product, neighbor, and user size distributions. Please refer to notebook/ranking.ipynb for how the files are ranked. This framework is not set up to run everything at once. master_generation.py was converted to a CLI to run files, but it requires your own API key or endpoint.
### Clone the Repository
```bash
gh repo clone PGraphRAG-benchmark/PGraphRAG
cd PGraphRAG-benchmark/PGraphRAG
```
### Install Dependencies
To install the necessary dependencies, run:
```bash
pip install -r requirements.txt
```
Note that this is not necessary to run your own LLM models; we ran Llama-3.1-8B-Instruct on our own hardware and GPT-4o-mini through Azure cloud services.
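Since master_generation.py requires your own API key or endpoint, here is a minimal sketch of how a GPT-4o-mini call through Azure could be wired up with the openai Python package. The environment variable names and deployment name are illustrative assumptions, not part of this repo's configuration.

```python
# Sketch: calling gpt-4o-mini through Azure OpenAI.
# Env var names and the deployment name are assumptions; substitute your own.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],         # assumed env var
    api_version="2024-07-01-preview",                   # any supported version
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"], # assumed env var
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # your Azure deployment name
    messages=[{"role": "user", "content": "Write a short product review."}],
)
print(response.choices[0].message.content)
```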
### Dataset Construction
Includes files to construct a dataset for the PGraph Framework. The GraphConstruction script processes a data-split JSON and forms the graph network. This is a required step for document ranking; graph construction is handled internally by the ranking script.
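As a rough illustration of what building a graph from a split looks like, here is a hedged sketch of a bipartite user-product graph in networkx. The JSON field names ("user_id", "product_id", "review_text") and the file path are assumptions; the GraphConstruction script defines the real schema.

```python
# Sketch: build a bipartite user-product graph from a data-split JSON.
# Field names and path are illustrative assumptions, not the repo's schema.
import json
import networkx as nx

def build_graph(split_path):
    with open(split_path) as f:
        records = json.load(f)

    graph = nx.Graph()
    for r in records:
        user, product = r["user_id"], r["product_id"]
        graph.add_node(user, bipartite="user")
        graph.add_node(product, bipartite="product")
        # Each review becomes an edge attribute linking a user to a product.
        graph.add_edge(user, product, review=r["review_text"])
    return graph

g = build_graph("./data/amazon_dev_split.json")  # hypothetical path
# A user's "neighbors" are other users who reviewed the same products.
```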
### Ranking
Contains files for ranking. The ranking script returns each profile as a dictionary, on which generations are run according to the tuned settings.
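The exact schema is defined by the ranking script, but based on the mode options documented below, a returned profile plausibly resembles the following hedged sketch. Only the "user_ratings" and "neighbor_ratings" keys are taken from this README; everything else is an illustrative guess.

```python
# Hedged sketch of a ranked profile entry. Only "user_ratings" and
# "neighbor_ratings" appear in this README; the rest is illustrative.
profile = {
    "user_ratings": [             # the target user's own reviews, ranked
        {"product": "B00EXAMPLE", "review": "Great battery life."},
    ],
    "neighbor_ratings": [         # reviews from graph neighbors, ranked
        {"product": "B00EXAMPLE", "review": "Battery died in a week."},
    ],
}
```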
### Generation
The master_generation.py script is used to generate outputs for dataset tasks using LLMs such as Llama-3.1-8B-Instruct or GPT.
```bash
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt
```
This example uses GPT to generate review text on the dev split of Amazon reviews (User Product Review Generation), ranked by BM25, on all modes and all k values.
- --input: File path to the ranking file. Required.
- --model: Model to use for generation. Required. Valid options:
  - llama: Llama-3.1-8B-Instruct
  - gpt: gpt-4o-mini-20240718
- --mode: Mode(s) to generate on. Optional; by default all modes are run (see the sketch after this list). Valid options:
  - none: Retrieves nothing for the prompt.
  - random: Retrieves a random review from the dataset for the prompt.
  - user: Retrieves "user_ratings" for the prompt.
  - neighbor: Retrieves "neighbor_ratings" for the prompt.
  - both: Retrieves both "user_ratings" and "neighbor_ratings" for the prompt.
- --k: K-value(s) (top-k retrieved reviews) to generate on. Optional; by default all k values (1, 2, 4) are run.
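To make the interaction between --mode, --k, and the profile keys concrete, here is a hedged sketch of how the retrieved context for a prompt could be assembled. It mirrors the documented options only; master_generation.py's actual prompt construction may differ.

```python
import random

# Sketch: assemble retrieval context for one profile given a mode and k.
# Mirrors the documented CLI options; not the script's actual implementation.
def build_context(profile, mode, k, dataset=None):
    if mode == "none":
        return []                                 # retrieve nothing
    if mode == "random":
        return [random.choice(dataset)]           # one random review
    if mode == "user":
        return profile["user_ratings"][:k]        # top-k own reviews
    if mode == "neighbor":
        return profile["neighbor_ratings"][:k]    # top-k neighbor reviews
    if mode == "both":
        return profile["user_ratings"][:k] + profile["neighbor_ratings"][:k]
    raise ValueError(f"unknown mode: {mode}")
```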
Additional usage examples:
```bash
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt --mode both neighbor
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt --k 1
python master_generation.py --input ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --model gpt --mode none both --k 2 4
```

### Evaluation
The master_eval.py script evaluates batches of output files.
```bash
python master_eval.py --ranking ./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json --results ./results/amazon_dev_reviewText_GPT_bm25
```
This evaluates all output files in the given results directory against gold labels taken from the specified ranking file.
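master_eval.py's metrics and file layout are not spelled out here, but as an illustration of the evaluation pattern, a hedged sketch comparing one output file against gold labels with ROUGE (via the rouge-score package, an assumed dependency) might look like this. The JSON field names are assumptions.

```python
# Hedged sketch of scoring generations against gold labels with ROUGE.
# File layout and field names are assumptions; master_eval.py defines them.
import json
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

with open("./data/Rankings/Amazon/amazon_dev_reviewText_bm25.json") as f:
    gold = {ex["id"]: ex["reviewText"] for ex in json.load(f)}  # assumed keys

with open("./results/amazon_dev_reviewText_GPT_bm25/output.json") as f:
    outputs = json.load(f)  # assumed: list of {"id": ..., "generated": ...}

for ex in outputs:
    scores = scorer.score(gold[ex["id"]], ex["generated"])
    print(ex["id"], scores["rougeL"].fmeasure)
```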
### Citation
For reference, please cite the following:
```bibtex
@misc{pgraphrag,
      title={Personalized Graph-Based Retrieval for Large Language Models},
      author={S. Au and C.J. Dimacali and O. Pedirappagari and N. Park and F. Dernoncourt and Y. Wang and N. Kanakaris and H. Deilamsalehy and R.A. Rossi and N.K. Ahmed},
      year={2025},
      eprint={2501.02157},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.02157},
}
```