Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data [Paper] [Project Website]
We provide the questions of quantitative reasoning with data (QRData) in benchmark/QRData.json. It contains 411 questions with the following keys.
- data_description
- question
- answer
- data_files: a list of names of data files
- meta_data: a dict containing reference, keywords, question_type, and multiple_choices (the possible choices if question_type is 'multiple_choice')
Data files related to the questions are in benchmark/data.zip.
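As a rough illustration of how the question file might be read, here is a minimal Python sketch. It assumes the top level of the JSON file is a list of question objects with the keys listed above; that layout is an assumption, not something this README states. The helper names `load_questions` and `count_question_types` are hypothetical and are not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_questions(path):
    """Load benchmark questions from a JSON file.

    Assumption: the file's top level is a list of question objects (dicts).
    """
    with Path(path).open(encoding="utf-8") as f:
        questions = json.load(f)
    if not isinstance(questions, list):
        raise ValueError("expected a JSON list of question objects")
    return questions


def count_question_types(questions):
    """Count questions by their meta_data['question_type'] value."""
    return Counter(q["meta_data"]["question_type"] for q in questions)
```

For example, `count_question_types(load_questions("benchmark/QRData.json"))` would give the number of questions per question type, provided the file matches the assumed layout.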
Questions of quantitative reasoning with text (QRText) are in benchmark/QRText.json.
Some numerical questions in QRText contain measurement errors. We will release a corrected version in the future.
The evaluation script is in benchmark/eval.py.
Please cite our paper if this repository inspires your work.
@inproceedings{liu-etal-2024-llms,
title = "Are {LLM}s Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data",
author = "Liu, Xiao and
Wu, Zirui and
Wu, Xueqing and
Lu, Pan and
Chang, Kai-Wei and
Feng, Yansong",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand and virtual meeting",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-acl.548",
pages = "9215--9235",
}