Library of baseline solvers for the AI2 Reasoning Challenge (ARC) Set (http://data.allenai.org/arc/). These solvers retrieve relevant sentences from a large text corpus (ARC_Corpus.txt in the dataset), and use two types of models to predict the correct answer.
- An entailment-based model that computes the entailment score for each (retrieved sentence, question + answer choice as an assertion) pair and scores each answer choice based on the highest-scoring sentence.
- A reading comprehension model (BiDAF) that converts the retrieved sentences into a paragraph per question. The model is used to predict the best answer span and each answer choice is scored based on the overlap with the predicted span.
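
The two scoring schemes described above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's implementation: the function names, the hard-coded entailment scores, and the overlap measure (the fraction of a choice's tokens that appear in the predicted span) are all assumptions made for the example.

```python
from typing import Dict, List


def score_by_entailment(scores_per_choice: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each choice by its highest-scoring retrieved sentence."""
    return {label: max(scores, default=0.0)
            for label, scores in scores_per_choice.items()}


def score_by_span_overlap(predicted_span: str, choices: Dict[str, str]) -> Dict[str, float]:
    """Score each choice by token overlap with the predicted answer span.

    Assumption: overlap = fraction of the choice's tokens found in the span.
    """
    span_tokens = set(predicted_span.lower().split())
    result = {}
    for label, text in choices.items():
        choice_tokens = text.lower().split()
        if not choice_tokens:
            result[label] = 0.0
            continue
        hits = sum(1 for tok in choice_tokens if tok in span_tokens)
        result[label] = hits / len(choice_tokens)
    return result


def best_choice(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical entailment scores: one score per retrieved sentence, per choice.
entailment_scores = {
    "A": [0.12, 0.81, 0.40],
    "B": [0.35, 0.22],
    "C": [0.05],
    "D": [0.30, 0.10],
}
entailment_choice = best_choice(score_by_entailment(entailment_scores))

# Hypothetical predicted span from a reading comprehension model.
choices = {
    "A": "dry palms",
    "B": "wet palms",
    "C": "palms covered with oil",
    "D": "palms covered with lotion",
}
span_choice = best_choice(score_by_span_overlap("rubbing dry palms together", choices))
```

In both cases the solver's prediction is simply the label with the highest score.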
- Create the arc_solvers environment using Anaconda:

      conda create -n arc_solvers python=3.6

- Activate the environment:

      source activate arc_solvers

- Install the requirements in the environment:

      sh scripts/install_requirements.sh

- Install PyTorch as per instructions on http://pytorch.org/. Command as of Feb. 26, 2018:

      conda install pytorch torchvision -c pytorch

- Download the data and models into the data/ folder. This will also build the ElasticSearch index (assumes ElasticSearch 6+ is running on the ES_HOST machine defined in the script):

      sh scripts/download_data.sh

- Download and prepare embeddings. This will download glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/ and convert it to glove.840B.300d.txt.gz, which is readable by AllenNLP:

      sh download_and_prepare_glove.sh

Run the entailment-based baseline solvers against a question set using scripts/evaluate_solver.sh.
For example, to evaluate the DGEM model on the Challenge Set, run:

    sh scripts/evaluate_solver.sh \
      data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
      data/ARC-V1-Models-Aug2018/dgem/

Change dgem to decompatt to test the Decomposable Attention model.
To evaluate the BiDAF model, use the evaluate_bidaf.sh script:

    sh scripts/evaluate_bidaf.sh \
      data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
      data/ARC-V1-Models-Aug2018/bidaf/

A further baseline, the multiple-choice QA model with max attention (max_att in the paths below), implements an attention interaction between the context-encoded representations of the question and the choices. The model is described here.
To train the model, download the data and word embeddings (see Setup data/models above).
Evaluate the trained model:

    python arc_solvers/run.py evaluate \
      --archive_file data/ARC-V1-Models-Aug2018/max_att/model.tar.gz \
      --evaluation_data_file data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl

Or train a new model:

    python arc_solvers/run.py train \
      -s trained_models/qa_multi_question_to_choices/serialization/ \
      arc_solvers/training_config/qa/multi_choice/reader_qa_multi_choice_max_att_ARC_Chellenge_full.json

To run the baseline solvers against a new question set, create a file using the JSONL format. For example:

    {
      "id": "Mercury_SC_415702",
      "question": {
        "stem": "George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?",
        "choices": [
          {"text": "dry palms", "label": "A"},
          {"text": "wet palms", "label": "B"},
          {"text": "palms covered with oil", "label": "C"},
          {"text": "palms covered with lotion", "label": "D"}
        ]
      },
      "answerKey": "A"
    }

The example is pretty-printed for readability; in the JSONL file itself, each question must occupy a single line. Run the evaluation scripts on this new file using the same commands as above.
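
As a rough sketch of how such a file could be produced and checked, the snippet below writes the example question as one JSON object per line and reads it back. The output file name my_questions.jsonl and the validate helper are hypothetical; only the field names come from the example above.

```python
import json
import os
import tempfile

question = {
    "id": "Mercury_SC_415702",
    "question": {
        "stem": ("George wants to warm his hands quickly by rubbing them. "
                 "Which skin surface will produce the most heat?"),
        "choices": [
            {"text": "dry palms", "label": "A"},
            {"text": "wet palms", "label": "B"},
            {"text": "palms covered with oil", "label": "C"},
            {"text": "palms covered with lotion", "label": "D"},
        ],
    },
    "answerKey": "A",
}


def validate(q: dict) -> None:
    """Hypothetical sanity check: the answer key must be one of the choice labels."""
    labels = [choice["label"] for choice in q["question"]["choices"]]
    if q["answerKey"] not in labels:
        raise ValueError(f"answerKey {q['answerKey']!r} not among labels {labels}")


# Hypothetical output location; any path ending in .jsonl would do.
path = os.path.join(tempfile.mkdtemp(), "my_questions.jsonl")

# JSONL: one complete JSON object per line, no pretty-printing.
with open(path, "w", encoding="utf-8") as f:
    for q in [question]:
        validate(q)
        f.write(json.dumps(q) + "\n")

# Read the file back, skipping blank lines.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]
```

The resulting path can then be passed as the question-file argument of the evaluation scripts.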
To run a new entailment model (implemented using AllenNLP), you need to:
- Create a Predictor that converts the input JSON to an Instance expected by your entailment model. See DecompAttPredictor for an example.
- Add your custom predictor to the predictor overrides. For example, if your new model is registered using my_awesome_model and the predictor is registered using my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to the predictor_overrides.
- Run the evaluate_solver.sh script with your learned model in my_awesome_model/model.tar.gz:

      sh scripts/evaluate_solver.sh \
        data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
        my_awesome_model/

To run a new reading comprehension (RC) model (implemented using AllenNLP), you need to:
- Create a Predictor that converts the input JSON to an Instance expected by your RC model. See BidafQaPredictor for an example.
- Add your custom predictor to the predictor overrides. For example, if your new model is registered using my_awesome_model and the predictor is registered using my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to the predictor_overrides.
- Run the evaluate_bidaf.sh script with your learned model in my_awesome_model/model.tar.gz:

      sh scripts/evaluate_bidaf.sh \
        data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
        my_awesome_model/
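
The role of the predictor overrides in the steps above can be pictured as a dictionary lookup from the registered model name to the registered predictor name. The sketch below is plain Python and does not call AllenNLP; resolve_predictor, DEFAULT_PREDICTORS, and its single entry are hypothetical names used only for illustration, and the repository's actual lookup may differ.

```python
from typing import Dict, Optional

# Hypothetical default mapping from model name to predictor name.
DEFAULT_PREDICTORS: Dict[str, str] = {
    "some_builtin_model": "some_builtin_predictor",
}


def resolve_predictor(model_name: str,
                      overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the predictor name for a model, letting overrides win over defaults."""
    merged = {**DEFAULT_PREDICTORS, **(overrides or {})}
    if model_name not in merged:
        raise KeyError(f"No predictor registered for model {model_name!r}")
    return merged[model_name]


# The entry described in the text: model name -> predictor name.
predictor_overrides = {"my_awesome_model": "my_awesome_predictor"}
chosen = resolve_predictor("my_awesome_model", predictor_overrides)
```

Without the override entry, the lookup for my_awesome_model would fail in this sketch, which is why a custom model needs its predictor added to the mapping.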