CaseGNN & LEXA

Code for CaseGNN (ECIR 2024 paper):

Title: CaseGNN: Graph Neural Networks for Legal Case Retrieval with Text-Attributed Graphs

Authors: Yanran Tang, Ruihong Qiu, Yilun Liu, Xue Li and Zi Huang

And for LEXA (an extension of CaseGNN):

Title: LEXA: Legal Case Retrieval via Graph Contrastive Learning with Contextualised LLM Embeddings

Authors: Yanran Tang, Ruihong Qiu, Yilun Liu, Xue Li and Zi Huang

Installation

Requirements are listed in /requirements.txt.

Dataset

Datasets can be downloaded from COLIEE2022 and COLIEE2023.

Specifically, the downloaded COLIEE2022 folders task1_train_files_2022 and task1_test_files_2022 should be put into /PromptCase/task1_train_2022/ and /PromptCase/task1_test_2022/ respectively.

The label files task1_train_labels_2022.json and task1_test_labels_2022.json should be put into the folder /label/.

COLIEE2023 folders should be set up in the same way.

The final project layout is as follows:

```
$ ./CaseGNN/
.
├── DATASET
│   └── data_load.py
├── Graph_generation
│   ├── graph
│   │   ├── graph_bin_2022
│   │   └── graph_bin_2023
│   └── TACG.py
├── Information_extraction  
│   ├── coliee2022_ie    
│   ├── coliee2023_ie
│   ├── lexnlp             
│   ├── stanford-openie
│   ├── create_structured_csv.py
│   ├── knowledge_graph.py
│   └── relation_extractor.py             
├── label 
│   ├── hard_neg_top50_train_2022.json
│   ├── hard_neg_top50_train_2023.json
│   ├── task1_test_labels_2022.json            
│   ├── task1_test_labels_2023.json 
│   ├── task1_train_labels_2022.json 
│   ├── task1_train_labels_2023.json 
│   ├── test_2022_candidate_with_yearfilter.json
│   └── test_2023_candidate_with_yearfilter.json     
├── PromptCase
│   ├── preprocessing
│   │   ├── openaiAPI.py
│   │   ├── process.py
│   │   └── reference.py
│   ├── promptcase_embedding
│   ├── PromptCase_embedding_generation.py
│   ├── task1_test_2022
│   │   └── task1_test_files_2022
│   ├── task1_test_2023
│   │   └── task1_test_files_2023
│   ├── task1_train_2022
│   │   └── task1_train_files_2022
│   └── task1_train_2023
│       └── task1_train_files_2023
├── CaseGNN2022_run.sh
├── CaseGNN2023_run.sh
├── CaseGNN++2022_run.sh
├── CaseGNN++2023_run.sh
├── LegalFeatureExtraction.sh
├── RelationExtraction.sh
├── PromptcaseEmbeddingGeneration.sh
├── TACG.sh
├── main.py
├── model.py
├── train.py
├── main_casegnn2plus.py
├── model_casegnn2plus.py
├── train_casegnn2plus.py
├── EUGATConv.py
├── torch_metrics.py
├── requirements.txt
└── README.md          
```
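Before running the preparation scripts, it can help to confirm that the downloaded COLIEE files sit where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper names `expected_paths` and `missing_paths` are hypothetical, and the sketch only checks the dataset folders and label files named in the Dataset section.

```python
from pathlib import Path


def expected_paths(root, year):
    """Return the dataset paths described in this README for one COLIEE year."""
    root = Path(root)
    paths = []
    for split in ("train", "test"):
        # e.g. PromptCase/task1_train_2022/task1_train_files_2022
        paths.append(
            root / "PromptCase" / f"task1_{split}_{year}" / f"task1_{split}_files_{year}"
        )
        # e.g. label/task1_train_labels_2022.json
        paths.append(root / "label" / f"task1_{split}_labels_{year}.json")
    return paths


def missing_paths(root, year):
    """List the expected paths that do not exist under root."""
    return [p for p in expected_paths(root, year) if not p.exists()]


if __name__ == "__main__":
    # Run from the project root; prints one line per missing folder or file.
    for year in (2022, 2023):
        for path in missing_paths(".", year):
            print(f"missing: {path}")
```

An empty output means every expected dataset folder and label file was found for both years.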

Data Preparation

1. Information Extraction

1. Legal Feature Extraction
- PromptCase preprocessing is used to extract the facts and issues from the cases.
- Run . ./LegalFeatureExtraction.sh to generate files in the following three folders:
  - /PromptCase/task1_test_2022/processed/,
  - /PromptCase/task1_test_2022/processed_new/, which contains the legal issues of the cases,
  - /PromptCase/task1_test_2022/summary_test_2022_txt/, which contains the legal facts of the cases.
- For COLIEE2023, follow the same process after changing --data 2022 to --data 2023 in LegalFeatureExtraction.sh.
2. Relation Extraction
- Run . ./RelationExtraction.sh.
- The final relation triplets are saved in the folder /Information_extraction/coliee2022_ie/coliee2022train(or test)_sum(or fact)/result/.
- For COLIEE2023, follow the same process after changing --data 2022 to --data 2023 in RelationExtraction.sh.
- Relation extraction is based on knowledge_graph_from_unstructured_text and lexnlp.

Note: Legal feature extraction must be done first, since relation extraction is based on the extracted legal features.

The extracted information can also be downloaded here.
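The "train(or test)" and "sum(or fact)" notation in the relation triplet path can be spelled out as concrete folder names. The sketch below is an illustration only: the helper `relation_result_dirs` is hypothetical, and it assumes that all four split and kind combinations are produced for a given year.

```python
from itertools import product
from pathlib import Path


def relation_result_dirs(year, root="Information_extraction"):
    """Expand coliee{year}train(or test)_sum(or fact)/result/ into explicit paths."""
    base = Path(root) / f"coliee{year}_ie"
    return [
        base / f"coliee{year}{split}_{kind}" / "result"
        for split, kind in product(("train", "test"), ("sum", "fact"))
    ]


for folder in relation_result_dirs(2022):
    print(folder.as_posix())
```

Passing 2023 instead of 2022 gives the corresponding COLIEE2023 folders under coliee2023_ie.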

2. PromptCase Embedding Generation

PromptCase is used to generate the case embedding, which serves as the feature of the virtual global node.
- Run . ./PromptcaseEmbeddingGeneration.sh.
- The generated case embeddings and the corresponding index list of cases are saved in the folder /PromptCase/promptcase_embedding/.
- For COLIEE2023, follow the same process after changing --data 2022 to --data 2023 in PromptcaseEmbeddingGeneration.sh.

The generated PromptCase embeddings can also be downloaded here.

3. TACG Construction

TACG construction uses the results of Information Extraction and PromptCase Embedding Generation. Please ensure that the folders coliee2022_ie/coliee2022train(or test)_sum(or fact)/result/ and /PromptCase/promptcase_embedding/ have been generated or downloaded.
- Run . ./TACG.sh.
- The TACG graphs are saved in the folder /Graph_generation/graph/.
- For COLIEE2023, follow the same process after changing --data 2022 to --data 2023 in TACG.sh.
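The data preparation stages depend on each other, so the order in which the scripts run matters. The sketch below lists them in the order this README presents them and can print or execute the commands. It is an assumption-laden illustration: `build_commands` and `run_pipeline` are hypothetical helpers, the README sources each script with `. ./script.sh` whereas this sketch launches them with `bash`, and every script still needs its `--data` value set to the same year beforehand.

```python
import subprocess

# Order follows the README: legal features first, then relations,
# then PromptCase embeddings, and TACG construction last.
STAGES = [
    "LegalFeatureExtraction.sh",
    "RelationExtraction.sh",
    "PromptcaseEmbeddingGeneration.sh",
    "TACG.sh",
]


def build_commands(stages=STAGES):
    """Turn script names into argument lists (assumes the scripts work under bash)."""
    return [["bash", f"./{name}"] for name in stages]


def run_pipeline(dry_run=True):
    """Print the commands, or run them one by one and stop at the first failure."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With `dry_run=True` nothing is executed, which makes it easy to review the order before starting a long run.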

Model Training

1. CaseGNN Model Training

Run . ./CaseGNN2022_run.sh and . ./CaseGNN2023_run.sh for COLIEE2022 and COLIEE2023, respectively.

2. CaseGNN++ Model Training (LEXA without LLMs)

Run . ./CaseGNN++2022_run.sh and . ./CaseGNN++2023_run.sh for COLIEE2022 and COLIEE2023, respectively.

Augmentation can be applied to:

- Positive samples only (--pos_aug)
- Random negative samples only (--ran_aug)
- Both positive and random negative samples (--pos_aug --ran_aug)
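The three augmentation settings above can be read as two independent switches. The sketch below shows that reading with `argparse`. It assumes that `--pos_aug` and `--ran_aug` are boolean flags that take no value, which is how they are written above; the actual argument definitions in main_casegnn2plus.py may differ, and `build_parser` and `describe` are hypothetical names.

```python
import argparse


def build_parser():
    """Illustrative parser with the two augmentation switches (assumed boolean)."""
    parser = argparse.ArgumentParser(description="Augmentation switches (sketch)")
    parser.add_argument("--pos_aug", action="store_true",
                        help="augment positive samples")
    parser.add_argument("--ran_aug", action="store_true",
                        help="augment random negative samples")
    return parser


def describe(args):
    """Map the flag combination to the setting it selects."""
    if args.pos_aug and args.ran_aug:
        return "positive and random negative samples"
    if args.pos_aug:
        return "positive samples only"
    if args.ran_aug:
        return "random negative samples only"
    return "no augmentation"


if __name__ == "__main__":
    print(describe(build_parser().parse_args()))
```

Under this reading, omitting both flags would correspond to training without augmentation.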

3. LEXA Model (🤗Hugging Face)

```python
from transformers import AutoModel, AutoTokenizer

# Load the LEXA checkpoint and its tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("AnnaStudy/LEXA-8B", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("AnnaStudy/LEXA-8B")

# Input text describing the case.
case_txt = "The following contains key components of a legal case. Legal facts..."

# Tokenize, truncating to at most 2048 tokens.
tokenized = tokenizer(case_txt, return_tensors='pt', padding=True, truncation=True, max_length=2048)
outputs = model(**tokenized)
# Use the hidden state at the last position as the case embedding.
case_embedding = outputs.last_hidden_state[:, -1]
```
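The snippet above takes the hidden state at position -1. For a single input text this is the last real token. When several texts of different lengths are batched with `padding=True`, position -1 can be a padding token if the tokenizer pads on the right; which side the LEXA-8B tokenizer pads on is not stated here. The sketch below illustrates mask-aware last-token pooling under that assumption, with plain Python lists standing in for tensors. The helper names are hypothetical and not part of the repository.

```python
def last_token_index(mask_row):
    """Index of the last position whose attention mask is 1."""
    positions = [i for i, m in enumerate(mask_row) if m == 1]
    if not positions:
        raise ValueError("attention mask has no real tokens")
    return positions[-1]


def last_token_pool(hidden_states, attention_mask):
    """Pick one vector per sequence: the hidden state of its last real token."""
    return [
        seq[last_token_index(mask)]
        for seq, mask in zip(hidden_states, attention_mask)
    ]


# Toy batch: 2 sequences, 3 positions each, hidden size 2.
hidden = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],  # no padding
    [[1.0, 1.1], [1.2, 1.3], [9.9, 9.9]],  # last position is padding
]
mask = [[1, 1, 1], [1, 1, 0]]

pooled = last_token_pool(hidden, mask)
print(pooled)  # [[0.5, 0.6], [1.2, 1.3]]
```

The same indexing works for left padding, because it looks for the last position with mask value 1 rather than assuming a fixed position.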

Cite

If you find this repo useful, please cite:

```
@article{LEXA,
  author       = {Yanran Tang and
                  Ruihong Qiu and
                  Xue Li and
                  Zi Huang},
  title        = {LEXA: Legal Case Retrieval via Graph Contrastive Learning with Contextualised LLM Embeddings},
  journal      = {CoRR},
  volume       = {abs/2405.11791},
  year         = {2025}
}

@inproceedings{CaseGNN,
  author       = {Yanran Tang and
                  Ruihong Qiu and
                  Yilun Liu and
                  Xue Li and
                  Zi Huang},
  title        = {CaseGNN: Graph Neural Networks for Legal Case Retrieval with Text-Attributed
                  Graphs},
  booktitle    = {ECIR},
  year         = {2024}
}

@inproceedings{PromptCase,
  author       = {Yanran Tang and
                  Ruihong Qiu and
                  Xue Li},
  title        = {Prompt-Based Effective Input Reformulation for Legal Case Retrieval},
  booktitle    = {ADC},
  year         = {2023}
}
```
