This repository contains the code for our EMNLP 2025 paper: A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations.
Please cite our paper if you find our code useful:
@inproceedings{zhao2025necessary,
title={A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations},
author={Zhao, Lingjun and Daum{\'e} III, Hal},
booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year={2025},
organization={Association for Computational Linguistics}
}
- Create the conda environment:
conda env create -n new_env_name -f environment.yml
- Download the SFT model (fine-tuned from Llama2-13B on the TripAdvisor dataset) from here, unzip it, and put it in the trained_models folder.
- Compute the PEX score by specifying the review to be classified, the model's prediction, and its explanation (a sketch for calling this script from Python appears after this list). For example:
python3 compute_pex_score.py \
--model_name llama2_13b_finetune_review \
--review "Striking architecture is only the beginning of what can only be described as one of the best hotel experiences of my professional life. The attention to detail in the room and the bathroom is remarkable and the bed is the best night's sleep I've ever experienced in a hotel. Gorgeous LCD tv and Bose wave machine added to my enjoyment. Sofitel is worth the extra \$'s." \
--prediction "Truthful" \
--explanation "[reason1] Specific details: The reviewer provides specific details about the room and the bathroom, such as the remarkable bathroom, which suggests that they experienced it firsthand. [reason2] Enhanced experience: The reviewer suggests that their experience was enhanced by the architecture, in-room technology, and bathroom amenities, which suggests that the hotel's features impressed them."
The output will be displayed as:
PEX score: 4.25
- (Optional) Run the following script to compute PEX consistency on the validation set:
bash scripts/compute_pex.sh
The output will be saved to test_models/compute_pex_example/val_explanations_adjusted_woe.json
The PEX consistency score is stored under the adjusted_woe_score key for each explanation.
Your output should look like the example output file in data/val_explanations_pex.jsonl.
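The scores in these files can be inspected with a few lines of Python. The sketch below reads the example file data/val_explanations_pex.jsonl and averages the adjusted_woe_score values; only that key and the file path are taken from this README, so any other fields in the records are not assumed here.

import json

# Minimal sketch: load PEX consistency scores (the adjusted_woe_score key)
# from the example JSONL output and report their average.
def load_pex_scores(path: str = "data/val_explanations_pex.jsonl") -> list[float]:
    scores = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # adjusted_woe_score holds the PEX consistency of each explanation.
            scores.append(float(record["adjusted_woe_score"]))
    return scores

if __name__ == "__main__":
    scores = load_pex_scores()
    print(f"{len(scores)} explanations, mean PEX consistency: {sum(scores) / len(scores):.3f}")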
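If you prefer to call the scorer from Python rather than the shell, the sketch below wraps the same compute_pex_score.py invocation with subprocess and parses the "PEX score:" line from stdout. The script name, flags, and output format are taken from the example above; the helper name and error handling are illustrative.

import re
import subprocess

# Minimal sketch: invoke compute_pex_score.py and parse its "PEX score: <float>" output.
def pex_score(review: str, prediction: str, explanation: str,
              model_name: str = "llama2_13b_finetune_review") -> float:
    cmd = [
        "python3", "compute_pex_score.py",
        "--model_name", model_name,
        "--review", review,
        "--prediction", prediction,
        "--explanation", explanation,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    match = re.search(r"PEX score:\s*([-+]?\d*\.?\d+)", out)
    if match is None:
        raise ValueError(f"Could not find a PEX score in the output:\n{out}")
    return float(match.group(1))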
This example uses the TripAdvisor hotel review dataset (Negative deceptive opinion spam, Ott et al., 2013) and the Llama-2 13B model. The steps are similar for other datasets and models.
- Supervised fine-tuning to improve prediction accuracy
bash scripts/finetune_review.sh
- Generate free-text explanations and compute PEX consistency on the validation and test sets
bash scripts/generate_and_compute_pex.sh
bash scripts/generate_and_compute_pex_testset.sh
- Sample explanations for training the DPO model and compute their PEX consistency (a hedged sketch of turning these scores into preference pairs appears after this list)
bash scripts/sample_review_explanation.sh
bash scripts/sample_review_explanation_trainset.sh
- Train the DPO model, run inference to generate explanations, and compute PEX consistency
bash scripts/train_infer_dpo.sh
- Evaluate faithfulness by training student models
(a) Use explanations generated by the DPO model
bash scripts/finetune_student_model.sh
(b) Use explanations generated by the SFT model
bash scripts/finetune_student_model_sft.sh
(c) No explanation: use only the prediction label as the teaching signal
bash scripts/finetune_student_model_no_explanation.sh
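For reference, the sketch below shows one plausible way the sampled explanations and their PEX consistency scores could be turned into (chosen, rejected) preference pairs of the kind DPO training expects. The pairing logic actually used in the paper lives in scripts/train_infer_dpo.sh and the code it calls; apart from adjusted_woe_score, the field names (prompt, explanation) and file paths here are hypothetical.

import json
from collections import defaultdict

# Hedged sketch: for each input, take the sampled explanation with the highest
# PEX consistency (adjusted_woe_score) as "chosen" and the lowest as "rejected".
# Field names other than adjusted_woe_score are hypothetical; see
# scripts/train_infer_dpo.sh for the pipeline used in the paper.
def build_preference_pairs(sampled_path: str, pairs_path: str) -> None:
    by_prompt = defaultdict(list)
    with open(sampled_path) as f:
        for line in f:
            record = json.loads(line)
            # "prompt" and "explanation" are assumed field names.
            by_prompt[record["prompt"]].append(record)

    with open(pairs_path, "w") as out:
        for prompt, candidates in by_prompt.items():
            if len(candidates) < 2:
                continue  # need at least two samples to form a pair
            ranked = sorted(candidates, key=lambda r: r["adjusted_woe_score"])
            pair = {
                "prompt": prompt,
                "chosen": ranked[-1]["explanation"],   # highest PEX consistency
                "rejected": ranked[0]["explanation"],  # lowest PEX consistency
            }
            out.write(json.dumps(pair) + "\n")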