This repository contains evaluation scripts for **miniCodeProps: a Minimal Benchmark for Proving Code Properties** ([arXiv:2406.11915](https://arxiv.org/abs/2406.11915)).
```bibtex
@misc{lohn2024minicodepropsminimalbenchmarkproving,
  title={miniCodeProps: a Minimal Benchmark for Proving Code Properties},
  author={Evan Lohn and Sean Welleck},
  year={2024},
  eprint={2406.11915},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2406.11915},
}
```
## Setup instructions (adapted from minictx-eval)
- Python 3
- PyTorch
- Required Python packages (specified in `requirements.txt`):

  ```bash
  pip install -r requirements.txt
  ```

- Lean 4
- **Install Lean 4**

  If you are running Linux, you only need to install elan by following these instructions; there is no need to install a code editor (i.e. no need to follow the instructions for installing VS Code). To ensure your path is updated, run

  ```bash
  source $HOME/.elan/env
  ```

  as prompted by the installer. If you are evaluating in an environment with VS Code, follow the instructions on the Lean 4 installation page to set up Lean 4.
- **Set up and build your target Lean project**

  ```bash
  cd MiniCodePropsLeanSrc
  lake exe cache get
  lake build
  ```
- **Set up and build the Lean REPL**

  Build the REPL project (we include a copy of the REPL, at a suitable Lean version, in the `/repl` directory of this repository):

  ```bash
  cd repl
  lake build
  ```
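Once the REPL is built, a quick smoke test can confirm it responds to commands. The following is a minimal sketch, assuming the REPL reads one JSON command on stdin followed by a blank line and replies with a JSON object terminated the same way (verify against your REPL version); `repl_wrapper.py`, described below, wraps this same interaction.

```python
# Smoke-test the freshly built REPL (sketch: the stdin/stdout JSON protocol
# described above is an assumption -- verify against your REPL version).
import json
import subprocess

proc = subprocess.Popen(
    ["lake", "exe", "repl"],
    cwd="repl",  # the bundled REPL checkout
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# Ask Lean to elaborate a trivial theorem.
proc.stdin.write(json.dumps({"cmd": "theorem t : 1 + 1 = 2 := by rfl"}) + "\n\n")
proc.stdin.flush()

# Collect the response up to the blank line and parse it.
lines = []
while (line := proc.stdout.readline()).strip():
    lines.append(line)
print(json.loads("".join(lines)))  # a response with an "env" key means success

proc.terminate()
```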
Open a script in the `script/` directory and verify that the paths and parameters are correctly set for your setup. The script contains the following variables:

- `TASK`: The task name (e.g. `full_proof`).
- `MAX_ITERS`: The maximum number of iterations (default: `100`).
- `NUM_SAMPLES`: The number of samples (default: `32`).
- `TEMPERATURES`: The temperatures for sampling (default: `0.0`).
- `DATASET`: The name of the dataset (default: `minicodeprops`).
- `DATA`: The path to the dataset file (default: `data/minicodeprops_bench_ps.jsonl`), which can be inspected as sketched below.
- `MODEL`: The model name (default: `gpt-4o`).
- `NAME`: The name for the output directory.
- `OUTPUT_DIR`: The directory where the output will be saved.
- `REPL_PATH`: The path to the REPL project.
- `LEAN_PATH`: The path to the Lean project environment.
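Before a run, it can help to peek at the benchmark file and confirm the `DATA` path is right. A minimal sketch, assuming only that the file is standard JSONL (the per-entry field names are printed rather than assumed):

```python
# Peek at the benchmark file (sketch: assumes standard JSONL; entry fields
# are printed rather than assumed, since the schema is not documented here).
import json

with open("data/minicodeprops_bench_ps.jsonl") as f:
    entries = [json.loads(line) for line in f if line.strip()]

print(f"{len(entries)} benchmark entries")
print("fields:", sorted(entries[0].keys()))
```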
Make the script executable and run it:

```bash
chmod +x script/gpt4.sh
./script/gpt4.sh
```

```bash
chmod +x script/bfs.sh
./script/bfs.sh
```

The output will be saved in the `${OUTPUT_DIR}` directory.
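After a run finishes, a short script can summarize what landed in the output directory. This is a sketch only: the directory name and the per-result `success` field are assumptions, not the documented output schema, so adjust it to whatever your run actually writes.

```python
# Summarize a finished run (sketch: the output path and the "success"
# field are assumptions -- inspect your run's files for the real schema).
import json
from pathlib import Path

output_dir = Path("output/minicodeprops-gpt-4o")  # hypothetical ${OUTPUT_DIR}
results = [json.loads(p.read_text()) for p in output_dir.glob("*.json")]
solved = sum(bool(r.get("success")) for r in results)
print(f"solved {solved}/{len(results)} properties")
```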
The evaluation code interacts with the Lean compiler through the Lean REPL. `repl_wrapper.py` provides a Python interface for driving the REPL directly.
Create a new thread by calling `InteractiveThread(thread_id, repl_path, lean_env_path)`, where:

- `thread_id`: Any number
- `repl_path`: The path to the REPL directory
- `lean_env_path`: The path to the Lean project containing the environment you want to test
Example:

```python
from repl_wrapper import InteractiveThread

thread = InteractiveThread(1, repl_path, lean_env_path)
thread.start()

cmd = {'cmd': 'import MiniF2F.Minif2fImport\n open BigOperators Real Nat Topology'}
output = thread.submit_and_receive(cmd)

thread.close()
thread.join()
```

`thread.submit_and_receive` takes a dictionary as input and returns the REPL's output as a dictionary.
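As an illustration of how the wrapper fits into an evaluation loop, here is a hedged sketch that submits a candidate proof and decides whether Lean accepted it. The response keys (`messages`, `severity`, `sorries`) follow the Lean REPL's usual output shape; treat them as assumptions and check them against your REPL version.

```python
# Check a candidate proof with the wrapper (sketch: the response keys
# "messages" / "severity" / "sorries" follow the Lean REPL's usual output
# shape, but verify them against your REPL version).
from repl_wrapper import InteractiveThread

repl_path = "repl"                      # bundled REPL checkout
lean_env_path = "MiniCodePropsLeanSrc"  # built target project

thread = InteractiveThread(1, repl_path, lean_env_path)
thread.start()

candidate = "theorem my_add_comm (a b : Nat) : a + b = b + a := by omega"
output = thread.submit_and_receive({'cmd': candidate})

# Accept the proof only if Lean reports no errors and no remaining sorries.
errors = [m for m in output.get('messages', []) if m.get('severity') == 'error']
print('accepted' if not errors and not output.get('sorries') else 'rejected')

thread.close()
thread.join()
```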
The implementation adapts the minictx-eval repository from **miniCTX: Neural Theorem Proving with (Long-)Contexts** ([arXiv:2408.03350](https://arxiv.org/abs/2408.03350)):
```bibtex
@misc{hu2024minictxneuraltheoremproving,
  title={miniCTX: Neural Theorem Proving with (Long-)Contexts},
  author={Jiewen Hu and Thomas Zhu and Sean Welleck},
  year={2024},
  eprint={2408.03350},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2408.03350},
}
```