This repo contains the code and data for the paper:
Analyzing the Role of Semantic Representations in the Era of Large Language Models (2023)
Zhijing Jin*, Yuen Chen*, Fernando Gonzalez Adauto*, Jiayi Zhang, Jiarui Liu, Julian Michael, Bernhard Schölkopf, Mona Diab (*: Co-first author)
- `code/`: contains the code for Tasks 0-8 described below.
- `data/`: for the source data, please download the data files from this Google Drive folder (containing the CSVs for all the datasets) to the local `data/` folder. The existing files in the local `data/` folder contain the AMRs of all datasets parsed with AMR3-structbart-L, the text input for prompt generation, and the input for Task 2 and default Task 6.
We use the `transition-amr-parser` library to get AMRs from sentences. The script we use for parsing is `code/predict_amr.py`.
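For reference, here is a minimal sketch of parsing a single sentence, adapted from the `transition-amr-parser` README (the exact API may differ across versions of the library; the checkpoint name mirrors the AMR3-structbart-L model mentioned above):

```python
from transition_amr_parser.parse import AMRParser

# Load the pretrained AMR3-structbart-L checkpoint (downloaded to the cache if needed).
parser = AMRParser.from_pretrained('AMR3-structbart-L')

# Tokenize and parse one sentence, then print the AMR in Penman notation.
tokens, positions = parser.tokenize('The girl travels and visits places.')
annotations, machines = parser.parse_sentence(tokens)
print(machines.get_amr().to_penman(jamr=False, isi=True))
```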
To use the `efficiency` package, which automatically saves GPT queries into a cache, install it with:

```bash
pip install efficiency
```

The script `code/general_request_chatbot.py` is used to call the OpenAI API and get the LLMs' inference performance for the selected task.
- Pass the input data file, the AMR file, the dataset, the AMR flag, and the model version as arguments to the script. For example:

```bash
python code/general_request_chatbot.py --data_file data/classifier_inputs/updated_data_input_classifier_input.csv --amr_file data/corrected_amrs.csv --dataset logic --amr_cot --model_version gpt4
```

- To get the LLMs' responses on the SPIDER dataset, run:

```bash
python code/general_request_spider.py --amr_cot --model_version gpt4
```

- The outputs are stored in a CSV file at `data/outputs/{model_version}/requests_direct_{dataset}.csv`.
- To get the results for all the datasets, run:
```bash
python code/eval_gpt.py --data_file {file_to_evaluate} --dataset {dataset}
```

For example:

```bash
python code/eval_gpt.py --data_file data/outputs/gpt-4-0613/requests_direct_logic.csv --dataset logic
```

To train a binary classifier to predict when AMRs help and when LLMs fail (a short sketch of such a classifier follows these steps):
- Install the required packages:

```bash
pip install -r code/BERTBinaryClassification/requirements.txt
```
- Download this data folder from Google Drive and put it under the `code/BERTBinaryClassification` directory.
- Run `code/BERTBinaryClassification/train.ipynb`.
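As referenced above, the following is a minimal, illustrative sketch of the kind of BERT binary classifier the notebook trains; the checkpoint, label convention, and training loop here are assumptions for illustration, not the notebook's exact settings:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed label convention: 1 = AMR helps the LLM on this example, 0 = it does not.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

texts = ["Example input where adding the AMR helped.",
         "Example input where adding the AMR did not help."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss
loss.backward()  # in practice, wrap this in an optimizer loop or use the HF Trainer
```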
We generate the features from the Text Characterization Toolkit (Simig et al., 2022; this repo), as well as our own proposed features.
(In the current implementation, we assume the text-characterization-toolkit is located at `../text-characterization-toolkit`, i.e., in the same parent directory as this repo.)
```bash
python code/get_features.py --dataset paws --output_dir ../data/featured
```

We combine all datasets into one CSV file and compute the correlation between linguistic features (features that are present for more than 90% of the data) and AMR helpfulness:
```bash
python code/combine_features.py
```
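As a rough illustration of that correlation step (the file path and the `amr_helpful` label column are hypothetical placeholders; the real names are defined in `code/combine_features.py`):

```python
import pandas as pd

df = pd.read_csv("data/featured/combined.csv")  # hypothetical combined file
features = df.select_dtypes("number").drop(columns=["amr_helpful"], errors="ignore")
features = features.loc[:, features.notna().mean() > 0.9]  # keep features >90% of rows have
print(features.corrwith(df["amr_helpful"]).sort_values(ascending=False))
```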
We fit traditional machine learning methods, such as logistic regression, decision tree, random forest, XGBoost, and ensemble models, to predict AMR helpfulness using linguistic features:

```bash
python code/train_basics.py
```
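The sketch below shows what such baselines look like with scikit-learn; the actual feature set, label column, and hyperparameters used in `code/train_basics.py` may differ, and an `xgboost.XGBClassifier` can be added to the dictionary in the same way:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("data/featured/combined.csv")  # hypothetical combined file
X = df.select_dtypes("number").drop(columns=["amr_helpful"], errors="ignore").fillna(0)
y = df["amr_helpful"]  # hypothetical label column: 1 = AMR helps, 0 = it does not
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
models["ensemble"] = VotingClassifier(list(models.items()), voting="hard")
for name, clf in models.items():
    print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))
```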
To run the ablation study:

```bash
python amr_cot_ablation.py --dataset entity_recog_gold --cut_col amr --ratio 0.5 --output_dir data/ablation --model_version gpt-4-0613
```

The output is stored in a CSV file at `{output_dir}/{dataset}_{model_version}_{cut_col}.csv`.

To plot the results, run the following code:
```bash
python code/plot_ablation.py --data_file ./data/ablation/entity_recog_gold_gpt-4-0613_text.csv --cut_col amr
```

The plot is stored in `data/ablation/{dataset}_{model_version}_{cut_col}.png`, and the summary CSV in `data/ablation/{dataset}_{model_version}_{cut_col}_summary.csv`.
As an intermediate step in constructing the GoldAMR-ComposedSlang dataset, we let gpt-3.5-turbo-0613 identify candidate slang usages:
```bash
python create_slang.py
```
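As a rough sketch of the prompting pattern (the actual prompt wording and post-processing live in `create_slang.py`; this example uses the `openai` v1 client and is only illustrative):

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
sentence = "That concert was fire, no cap."
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0613",  # model named above; swap in a current model if it has been retired
    messages=[{
        "role": "user",
        "content": f"List any slang expressions used in the following sentence, one per line:\n{sentence}",
    }],
)
print(response.choices[0].message.content)
```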
We annotate 50 samples from the PAWS dataset and ask human annotators to evaluate the correctness of the LLMs' reasoning over AMRs based on the following criteria:

- The commonalities and differences between the two AMRs are correctly identified.
- Drawing on the commonalities and differences, the LLMs can correctly infer the relationship between the two sentences.
The annotation results can be found here.
For coding and data questions,
- Please first open a GitHub issue.
- For a speedier response, please link your GitHub issue when emailing any of the student authors on this paper: Yuen Chen, Fernando Gonzalez, and Jiarui Liu.
- We will reply to your email and answer directly on the GitHub issue, so that others with similar questions can benefit.
For future collaborations or further requests,
- Feel free to email Zhijing Jin and Yuen Chen.