language guided program synthesis results for LARC

This repository is the official implementation for the language-guided program synthesis experiments in Communicating Natural Programs to Humans and Machines (Section 5, Executing Natual Programs). This repository and branch is a static branch designed to reproduce the results in the paper. The full LARC dataset can be found here: https://github.com/samacqua/LARC

Getting Started: Dependencies and Requirements

The experiments in the paper were conducted on an academic supercomputing cluster (Ubuntu machines, 24 parallel CPUs per experiment.) The following setup has been tested on both Ubuntu and Mac (OS Catalina). The codebase is implemented in both Python and OCaml.

Install Python 3.7.7 and the Python requirements.

We test our implementation on 3.7.7. On Linux, you can create a fresh install (including pip) with:

sudo apt-get update; sudo apt-get install software-properties-common; 
sudo add-apt-repository ppa:deadsnakes/ppa; sudo apt-get update; 
sudo apt-get install python3.7; sudo apt install python3.7-dev; 
sudo apt install python3-pip; python3.7 -m pip install pip;
pip install --upgrade setuptools;

Install the requirements.

pip install -r requirements.txt

Download the T5 pre-trained language model. At an interactive prompt, run:

> import transformers; from transformers import T5EncoderModel; T5EncoderModel.from_pretrained('t5-small')

Build the OCaml binaries.

The repository contains prebuilt OCaml binaries that should run on most Linux-based machines. However, to build the OCaml binaries from scratch, you can run the following from the root of the repo.

Install OCaml.

sudo apt install ocaml ; 
sudo apt install opam ; 
opam init; opam update; 
opam switch 4.06.1+flambda ;
eval `opam config env`;

Install the OCaml requirements.

opam depext ppx_jane core re2 yojson vg cairo2 camlimages menhir ocaml-protoc zmq; 
opam install ppx_jane core re2 yojson vg cairo2 camlimages menhir ocaml-protoc zmq;

Run the following from the directory root to build the binaries.

make clean; make

LARC Dataset

The program synthesis experiments use LARC (Language-annotated Abstraction and Reasoning Corpus) (Sec. 3 of the paper), a dataset containing human descriptions of the ARC[https://github.com/fchollet/ARC] (Chollet 2019) inductive reasoning tasks gathered via a 2-participant communication game. The full language dataset collection procedure is described in the main paper and released separately. This repository also contains the following pre-processed versions of this dataset, which can be used directly to reproduce the results of the reported experiments:

ARC tasks. Our experiments use the training dataset (n=400 tasks) from the original ARC repository. JSON files containing the input/output examples for each task and a task_id are in arc_data/data/training/<task_id>.json.
Human natural language task annotations. Sentence-parsed annotations for each task from the human experiment are in data/arc/language/sentences/language.json, in a dict mapping from task_id to an array of strings for each task.
Natural language DSL function annotations. Additional expert-written annotations for each function in the DSL, used in the pseudo-annotation training procedure (Sec. 5.1), are available in data/arc/primitiveNamesToDescriptions.json. This maps each function in the DSL to both a human readable name and a human readable short gloss (stored in a [name, gloss] tuple); we use the gloss for our experiments. During training and execution, these natural language annotations are used along with the formal DSL, implemented in OCaml, in solvers/arc.ml.

Training and Evaluation.

The script to train and evaluate any of the reported models on the paper (including calling the LARC dataloader) is located at bin/arc.py.

A full list of commandline arguments (and descriptions of their functions) can be found by running

python bin/arc.py -h

By default, as the algorithm is iterative, the training scripts will both run the algorithm for a specified number of iterations, and evaluate on a held-out test task every n iterations (where n is an adjustable argument.)

Running the commands below will produce fairly verbose log outputs that include evaluation metrics, and the location of the model checkpoint. In particular, running

grep 'checkpoint' [LOG_FILE]

will print out the path of the checkpoints at each iteration, and running

grep 'Hits' [LOG_FILE]

will print out the held-out task evaluation metrics.

It is also possible to resume and evaluate a model checkpoint from any iteration in training. By default, the scripts write timestamped checkpoints to a directory titled experimentOutputs (the exact directory appears with other informaiton in the output.)

The --resume [CHECKPOINT_PATH] commandline argument will resume training from a checkpoint, including to re-run evaluation tasks.

For additional information on the command line output (and default scripts to graph the outputs from the checkpoints), see docs/EC_README.md.

Command line arguments for experiments

To train and evaluate the best-performing model in the paper (Table 1, IO + NL, initialized from tasks solved after a 1hr initial enumeration from the base DSL), run:

python bin/arc.py --enumerationTimeout 720 --no-dsl --testingTimeout 0 --iterations 5 --taskBatchSize 200 --testEvery 1 --recognitionSteps 10000 --biasOptimal --contextual --no-cuda --CPUs 24 --featureExtractor LMPseudoTranslationFeatureExtractor --Helmholtz 0.5 --no-background-helmholtz --no-consolidation --taskReranker randomShuffle --seed 2  --preload_frontiers experimentOutputs/arc/2021-09-30T12:23:24.378667/arc_aic=1.0_arity=0_ET=10_t_zero=1_it=1_MF=10_noConsolidation=True_pc=10_RW=False_solver=ocaml_STM=True_L=1.0_batch=200_TRR=randomShuffle_K=2_topkNotMAP=False_UET=3600_DSL=True_rec=False.pickle'

Name		Name	Last commit message	Last commit date
Latest commit History 2,414 Commits
.vscode		.vscode
PBE_Strings_Track		PBE_Strings_Track
arc_data		arc_data
bin		bin
data		data
docs		docs
dreamcoder		dreamcoder
failures		failures
language_experiments		language_experiments
neural_seq		neural_seq
pinn @ 1878ef5		pinn @ 1878ef5
plots		plots
pregex @ b5eab11		pregex @ b5eab11
prototypical-networks		prototypical-networks
pyccg @ c465a23		pyccg @ c465a23
recognitionModels		recognitionModels
requirements		requirements
rust_compressor		rust_compressor
solvers		solvers
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
Makefile		Makefile
Readme.md		Readme.md
__init__.py		__init__.py
graphs		graphs
jsonDebug		jsonDebug
logReports		logReports
official_experiments		official_experiments
official_figures		official_figures
runtests		runtests
singularity		singularity
solver		solver
taskRankGraphs		taskRankGraphs
test.png		test.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

language guided program synthesis results for LARC

Getting Started: Dependencies and Requirements

Install Python 3.7.7 and the Python requirements.

Build the OCaml binaries.

LARC Dataset

Training and Evaluation.

Command line arguments for experiments

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

language guided program synthesis results for LARC

Getting Started: Dependencies and Requirements

Install Python 3.7.7 and the Python requirements.

Build the OCaml binaries.

LARC Dataset

Training and Evaluation.

Command line arguments for experiments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages