torch==2.5.1
transformers==4.46.2
datasets==3.1.0
numpy==1.26.4
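For example, the pinned versions above can be installed directly with pip:
pip install torch==2.5.1 transformers==4.46.2 datasets==3.1.0 numpy==1.26.4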
Download the Hugging Face checkpoints of the LLMs (Llama2, Llama3, Mistral, and Qwen2.5) to ./models/xxx_hf/, e.g., ./models/llama3_hf/8bf/, ./models/llama2_hf/13bf/, etc.
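One way to fetch a checkpoint into the expected directory is the huggingface-cli tool from huggingface_hub. The repo IDs and target paths below are illustrative examples only (substitute the checkpoints you need), and gated models such as Llama require huggingface-cli login first:
huggingface-cli download meta-llama/Meta-Llama-3-8B --local-dir ./models/llama3_hf/8bf/
huggingface-cli download Qwen/Qwen2.5-7B-Instruct --local-dir ./models/qwen2.5_hf/7bf/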
We provide shell script templates ./run_cmd/run_xxx.sh for the different types of models to reproduce the experimental results in our paper.
Run this command to evaluate T5 (T5-large or T5-3B):
sh ./run_cmd/run_t5.sh
Run this command to evaluate GPT-3.5 or GPT-4:
sh ./run_cmd/run_gpt.sh
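Since evaluating GPT-3.5/GPT-4 goes through the OpenAI API, you will likely need to supply an API key before running the script (exactly how the key is read is an assumption; check run_gpt.sh), e.g.:
export OPENAI_API_KEY="your-key-here"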
Run this command to evaluate the small LLMs (Llama3, Llama2, Mistral, and Qwen2.5):
sh ./run_cmd/run_llama.sh
If you use our code, please cite our paper:
@inproceedings{pan-etal-2024-llms,
    title = "Are {LLM}s Good Zero-Shot Fallacy Classifiers?",
    author = "Pan, Fengjun and
      Wu, Xiaobao and
      Li, Zongrui and
      Luu, Anh Tuan",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.794/",
    doi = "10.18653/v1/2024.emnlp-main.794",
    pages = "14338--14364"
}
