ARC-Solvers

Library of baseline solvers for the AI2 Reasoning Challenge (ARC) Set (http://data.allenai.org/arc/). These solvers retrieve relevant sentences from a large text corpus (ARC_Corpus.txt in the dataset) and use two types of models to predict the correct answer:

An entailment-based model that computes the entailment score for each (retrieved sentence, question+answer choice as an assertion) pair and scores each answer choice based on the highest-scoring sentence.
A reading comprehension model (BiDAF) that converts the retrieved sentences into a paragraph per question. The model is used to predict the best answer span and each answer choice is scored based on the overlap with the predicted span.
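
The two scoring rules described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the example scores, and the token-overlap measure are assumptions made for illustration, and the actual solvers may tokenize and compute overlap differently.

```python
from typing import Dict, List


def score_by_max_entailment(entailment_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each answer choice by its highest-scoring retrieved sentence.

    `entailment_scores` maps a choice label to the entailment scores of all
    (retrieved sentence, question + choice) pairs for that choice.
    A choice with no retrieved sentences gets 0.0 (an assumption).
    """
    return {label: max(scores, default=0.0)
            for label, scores in entailment_scores.items()}


def score_by_span_overlap(predicted_span: str, choices: Dict[str, str]) -> Dict[str, float]:
    """Score each choice by the fraction of its tokens found in the predicted span.

    Whitespace tokenization and this particular overlap ratio are
    illustrative assumptions, not the measure used by the library.
    """
    span_tokens = set(predicted_span.lower().split())
    result = {}
    for label, text in choices.items():
        choice_tokens = text.lower().split()
        if not choice_tokens:
            result[label] = 0.0
            continue
        matched = sum(1 for tok in choice_tokens if tok in span_tokens)
        result[label] = matched / len(choice_tokens)
    return result
```

In both cases the predicted answer would be the choice with the highest score.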

Setup environment

Create the arc_solvers environment using Anaconda

conda create -n arc_solvers python=3.6

Activate the environment

source activate arc_solvers

Install the requirements in the environment:

sh scripts/install_requirements.sh

Install pytorch as per instructions on http://pytorch.org/. Command as of Feb. 26, 2018:

conda install pytorch torchvision -c pytorch

Setup data/models

Download the data and models into the data/ folder. This will also build the ElasticSearch index (it assumes ElasticSearch 6+ is running on the ES_HOST machine defined in the script).

sh scripts/download_data.sh

Download and prepare the embeddings. This will download glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/ and convert it to glove.840B.300d.txt.gz, which is readable by AllenNLP.

sh scripts/download_and_prepare_glove.sh

Running baseline models

Run the entailment-based baseline solvers against a question set using scripts/evaluate_solver.sh

Running a pre-trained DGEM model

For example, to evaluate the DGEM model on the Challenge Set, run:

sh scripts/evaluate_solver.sh \
	data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
	data/ARC-V1-Models-Aug2018/dgem/

Change dgem to decompatt to test the Decomposable Attention model.

Running a pre-trained BiDAF model

To evaluate the BiDAF model, use the evaluate_bidaf.sh script

 sh scripts/evaluate_bidaf.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    data/ARC-V1-Models-Aug2018/bidaf/

Training and evaluating the BiLSTM Max-out with Question to Choices Max Attention

This model implements an attention interaction between the context-encoded representations of the question and the choices. The model is described here.

To train the model, download the data and word embeddings (see Setup data/models above).

Evaluate the trained model:

python arc_solvers/run.py evaluate \
    --archive_file data/ARC-V1-Models-Aug2018/max_att/model.tar.gz \
    --evaluation_data_file data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl

Alternatively, train a new model:

python arc_solvers/run.py train \
    -s trained_models/qa_multi_question_to_choices/serialization/ \
    arc_solvers/training_config/qa/multi_choice/reader_qa_multi_choice_max_att_ARC_Chellenge_full.json

Running against a new question set

To run the baseline solvers against a new question set, create a file using the JSONL format, with one JSON object per line. For example (pretty-printed here for readability):

{
    "id":"Mercury_SC_415702",
    "question": {
       "stem":"George wants to warm his hands quickly by rubbing them. Which skin surface will produce the most heat?",
       "choices":[
                  {"text":"dry palms","label":"A"},
                  {"text":"wet palms","label":"B"},
                  {"text":"palms covered with oil","label":"C"},
                  {"text":"palms covered with lotion","label":"D"}
                 ]
    },
    "answerKey":"A"
}

Run the evaluation scripts on this new file using the same commands as above.
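
A question file in this format can be produced with a short script. The sketch below is illustrative only: the helper names `validate_question` and `write_jsonl` are hypothetical, and the checks reflect the fields visible in the example above rather than a schema confirmed by the solvers.

```python
import json
from typing import Iterable


def validate_question(record: dict) -> None:
    """Check that a record has the fields shown in the example question.

    The required fields are inferred from the example, not from a
    published schema (an assumption).
    """
    for key in ("id", "question"):
        if key not in record:
            raise ValueError(f"missing top-level field: {key}")
    question = record["question"]
    if "stem" not in question or not question.get("choices"):
        raise ValueError("question needs a 'stem' and a non-empty 'choices' list")
    labels = []
    for choice in question["choices"]:
        if not {"text", "label"} <= set(choice):
            raise ValueError("each choice needs 'text' and 'label'")
        labels.append(choice["label"])
    # answerKey is only checked when present.
    if "answerKey" in record and record["answerKey"] not in labels:
        raise ValueError("answerKey does not match any choice label")


def write_jsonl(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line, as JSONL requires."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            validate_question(record)
            handle.write(json.dumps(record) + "\n")


EXAMPLE = {
    "id": "Mercury_SC_415702",
    "question": {
        "stem": "George wants to warm his hands quickly by rubbing them. "
                "Which skin surface will produce the most heat?",
        "choices": [
            {"text": "dry palms", "label": "A"},
            {"text": "wet palms", "label": "B"},
            {"text": "palms covered with oil", "label": "C"},
            {"text": "palms covered with lotion", "label": "D"},
        ],
    },
    "answerKey": "A",
}
```

Calling `write_jsonl([EXAMPLE], "my_questions.jsonl")` (a hypothetical file name) writes the example as a single line, and the resulting path can be passed to the evaluation scripts in place of the ARC test file.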

Running a new Entailment-based model

To run a new entailment model (implemented using AllenNLP), you need to

Create a Predictor that converts the input JSON to an Instance expected by your entailment model. See DecompAttPredictor for an example.
Add your custom predictor to the predictor overrides. For example, if your new model is registered using my_awesome_model and the predictor is registered using my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to the predictor_overrides.
Run the evaluate_solver.sh script with your learned model in my_awesome_model/model.tar.gz:

 sh scripts/evaluate_solver.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    my_awesome_model/

Running a new Reading Comprehension model

To run a new reading comprehension (RC) model (implemented using AllenNLP), you need to

Create a Predictor that converts the input JSON to an Instance expected by your RC model. See BidafQaPredictor for an example.
Add your custom predictor to the predictor overrides. For example, if your new model is registered using my_awesome_model and the predictor is registered using my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to the predictor_overrides.
Run the evaluate_bidaf.sh script with your learned model in my_awesome_model/model.tar.gz:

 sh scripts/evaluate_bidaf.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    my_awesome_model/
