Before running LTD-Bench, make sure Xvfb is installed in your Linux environment, as it may be required for the Hard-level generation tasks.
You can install it, together with Ghostscript, using the commands below that match your package manager (apt-get or yum).
apt-get install xvfb
apt-get install ghostscript
or
yum install xorg-x11-server-Xvfb
yum install ghostscript
Then you need to run Xvfb
Xvfb :1 -screen 0 800x600x24 &
Set up your Python environment
pip install -r requirements.txt
Set up the model configuration in the "run.sh" file, including your model_id, API_BASE_URL and API_KEY.
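A missing setting is easier to catch before inference starts than partway through a run. The sketch below is one possible way to check that the three values are filled in. It assumes they are plain shell variables with the names given above; the actual "run.sh" may define them differently. The helper `require_var` and all values shown are hypothetical placeholders, not part of LTD-Bench.

```shell
#!/bin/sh
# Hypothetical helper: fail early if a required setting is empty.
# Assumes the settings are plain shell variables named as in the text;
# the real run.sh may define them differently.
require_var() {
    # $1 is the variable name; eval reads its current value indirectly.
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "missing setting: $1" >&2
        return 1
    fi
}

# Placeholder values for illustration only; substitute your own.
model_id="your-model-id"
API_BASE_URL="https://example.invalid/v1"
API_KEY="your-api-key"

for name in model_id API_BASE_URL API_KEY; do
    require_var "$name" || exit 1
done
echo "configuration looks complete"
```

Placing a check like this near the top of a script stops it with a clear message when a value has been left blank.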
Then you can start running model inference!
sh run.sh
Set up your GPT-4.1 configuration in the "run_eval.sh" file, including your OPENAI_BASE_URL and OPENAI_KEY.
Then you can run GPT-4.1 automatic evaluation.
sh run_eval.sh
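If either script fails on a fresh machine, it may help to confirm that the prerequisites listed at the top are actually available. The following is a minimal pre-flight sketch, not part of LTD-Bench: the helper `have_cmd` is hypothetical, `gs` is the Ghostscript executable, and the DISPLAY hint assumes that the rendering code reads the display number from the DISPLAY environment variable and that it should match the `:1` passed to Xvfb.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of LTD-Bench itself.
have_cmd() {
    # command -v is the POSIX way to test whether a program is on PATH.
    command -v "$1" >/dev/null 2>&1
}

status=0
for tool in Xvfb gs pip; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "not found: $tool" >&2
        status=1
    fi
done

# Assumption: programs that need a display read it from DISPLAY, and :1
# matches the display number passed to Xvfb earlier.
if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is unset; consider: export DISPLAY=:1" >&2
fi

if [ "$status" -eq 0 ]; then
    echo "all tools found"
fi
```

The check only reports what is missing; it does not install anything or start Xvfb.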