
minicodeprops-eval

This repository contains the evaluation scripts for the paper miniCodeProps: a Minimal Benchmark for Proving Code Properties:

@misc{lohn2024minicodepropsminimalbenchmarkproving,
      title={miniCodeProps: a Minimal Benchmark for Proving Code Properties}, 
      author={Evan Lohn and Sean Welleck},
      year={2024},
      eprint={2406.11915},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2406.11915}, 
}

Setup instructions (adapted from minictx-eval)

Requirements

  • Python 3

  • PyTorch

  • Required Python packages (specified in requirements.txt)

    pip install -r requirements.txt
  • Lean 4

Setup Environment

  1. Install Lean 4

    If you are running Linux, you only need to install elan by following the official installation instructions; there is no need to install a code editor (i.e., no need to follow the steps for installing VS Code). To ensure your PATH is updated, run source $HOME/.elan/env as prompted by the installer. If you are evaluating in an environment with VS Code, follow the instructions on the Lean 4 installation page to set up Lean 4.

  2. Set up and build your target Lean project

    cd MiniCodePropsLeanSrc
    lake exe cache get
    lake build
  3. Set up and build the Lean REPL

    Build the REPL project (a copy of the REPL, pinned to a suitable Lean version, is included in the repl/ directory of this repository):

    cd repl
    lake build

Running the Evaluation

Edit the Script

Open a script in the script/ directory and verify that the paths and parameters match your setup. Each script contains the following variables:

  • TASK: The task name (e.g. full_proof).
  • MAX_ITERS: The maximum number of iterations (default: 100).
  • NUM_SAMPLES: The number of samples (default: 32).
  • TEMPERATURES: The temperatures for sampling (default: 0.0).
  • DATASET: The name of the dataset (default: minicodeprops).
  • DATA: The path to the dataset file (default: data/minicodeprops_bench_ps.jsonl); see the loading sketch after this list.
  • MODEL: The model name (default: gpt-4o).
  • NAME: The name for the output directory.
  • OUTPUT_DIR: The directory where the output will be saved.
  • REPL_PATH: The path to the REPL project.
  • LEAN_PATH: The path to the Lean project environment.
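
Since the dataset is a JSON Lines file, each line holds one benchmark entry. A minimal sketch of loading it in Python (the per-entry field names depend on the benchmark format and are not assumed here):

import json

# Each line of the JSONL benchmark file is one JSON object.
with open("data/minicodeprops_bench_ps.jsonl") as f:
    benchmark = [json.loads(line) for line in f]

print(f"Loaded {len(benchmark)} benchmark entries")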

Run the Script

Make the scripts executable and run them:

chmod +x script/gpt4.sh
./script/gpt4.sh
chmod +x script/bfs.sh
./script/bfs.sh

Output

The output will be saved in the ${OUTPUT_DIR} directory.
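
To sanity-check a run, you can list what was produced. A minimal sketch (the exact file names and formats depend on the task):

import os

output_dir = "output"  # hypothetical value; set to the OUTPUT_DIR used by your script
for name in sorted(os.listdir(output_dir)):
    size = os.path.getsize(os.path.join(output_dir, name))
    print(f"{name}: {size} bytes")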

repl_wrapper.py

The evaluation code interacts with the Lean compiler using the Lean REPL. repl_wrapper.py provides a Python interface to interact with the Lean REPL directly.

Usage

Create a new thread by calling InteractiveThread(thread_id, repl_path, lean_env_path), where:

  • thread_id: An arbitrary numeric identifier for the thread
  • repl_path: The path to the REPL directory
  • lean_env_path: The path to the Lean project containing the environment you want to test

Example:

from repl_wrapper import InteractiveThread

# repl_path and lean_env_path should point at the REPL directory and the
# target Lean project, as described above.
thread = InteractiveThread(1, repl_path, lean_env_path)
thread.start()

# Submit a command to the REPL; replace the import with one that exists
# in your target Lean project.
cmd = {'cmd': 'import MiniF2F.Minif2fImport\n  open BigOperators Real Nat Topology'}
output = thread.submit_and_receive(cmd)

thread.close()
thread.join()

thread.submit_and_receive takes a dictionary as input and returns the REPL's output as a dictionary.
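
This makes it straightforward to check candidate proofs. A minimal sketch, assuming the response follows the Lean REPL's JSON protocol, where problems appear as entries with severity 'error' in a messages list (field names may vary across REPL versions):

from repl_wrapper import InteractiveThread

thread = InteractiveThread(1, repl_path, lean_env_path)
thread.start()

# Submit a self-contained candidate theorem and proof as one command.
result = thread.submit_and_receive({'cmd': 'theorem one_add_one : 1 + 1 = 2 := by rfl'})

# Assumption: the REPL reports problems as messages with severity 'error'.
errors = [m for m in (result or {}).get('messages', []) if m.get('severity') == 'error']
print('proof accepted' if result is not None and not errors else 'proof rejected')

thread.close()
thread.join()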

Credits

The implementation adapts the minictx-eval repository from miniCTX: Neural Theorem Proving with (Long-)Contexts:

@misc{hu2024minictxneuraltheoremproving,
      title={miniCTX: Neural Theorem Proving with (Long-)Contexts}, 
      author={Jiewen Hu and Thomas Zhu and Sean Welleck},
      year={2024},
      eprint={2408.03350},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.03350}, 
}
