Feng-Hong/WINO-DLLM
WINO-DLLM

This repository provides scripts and instructions to evaluate WINO on LLaDA and MMaDA.

Evaluation of WINO on LLaDA

  1. Installation

We recommend using uv for dependency and virtual environment management.
pipx install uv # or pip install uv
cd LLaDA
uv venv --python 3.11 dev
source dev/bin/activate
uv pip install -r requirements.txt
  2. Prepare the Model and Datasets

Before running inference or evaluation, please download the following models and datasets from Hugging Face into the specified local directories (e.g., ./LLaDA/models/ and ./LLaDA/data/).

You may use either huggingface-cli or the Python datasets library to complete the download.

| Model Name | Hugging Face Repo | Local Path |
| --- | --- | --- |
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./LLaDA/models/LLaDA-8B-Instruct/ |

| Dataset Name | Hugging Face Repo | Local Path |
| --- | --- | --- |
| GSM8K | openai/gsm8k | ./LLaDA/data/gsm8k/ |
| MATH-500 | HuggingFaceH4/MATH-500 | ./LLaDA/data/math500/ |
| HumanEval | openai/openai_humaneval | ./LLaDA/data/humaneval/ |
| ai2_arc | allenai/ai2_arc | ./LLaDA/data/ai2_arc/ |

Datasets not listed above are already included in the ./LLaDA/data/ directory.
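The downloads can also be scripted with the `huggingface_hub` Python library; the following is a minimal sketch in which the repo IDs and local paths are taken from the tables above, and `snapshot_download` mirrors each repo into its directory:

```python
# Sketch: mirror the models and datasets listed above into the expected
# local directories via huggingface_hub's snapshot_download.
MODELS = {
    "GSAI-ML/LLaDA-8B-Instruct": "./LLaDA/models/LLaDA-8B-Instruct/",
}

DATASETS = {
    "openai/gsm8k": "./LLaDA/data/gsm8k/",
    "HuggingFaceH4/MATH-500": "./LLaDA/data/math500/",
    "openai/openai_humaneval": "./LLaDA/data/humaneval/",
    "allenai/ai2_arc": "./LLaDA/data/ai2_arc/",
}


def download_all():
    # Imported here so the path tables can be inspected even without the
    # huggingface_hub package installed.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in MODELS.items():
        snapshot_download(repo_id, local_dir=local_dir)
    for repo_id, local_dir in DATASETS.items():
        snapshot_download(repo_id, repo_type="dataset", local_dir=local_dir)


if __name__ == "__main__":
    download_all()
```

Equivalently, `huggingface-cli download <repo> --local-dir <path>` (with `--repo-type dataset` for the datasets) achieves the same result from the shell.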

  3. Quick Demo

Please make sure to set the correct model path in generate.py.

python generate.py
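"Setting the model path" typically amounts to an edit like the sketch below; the variable name and loading calls here are assumptions, so check generate.py for the actual lines:

```python
# Hypothetical sketch -- the real variable name and loading call live in
# generate.py; point the path at the local download from step 2.
MODEL_PATH = "./LLaDA/models/LLaDA-8B-Instruct"


def load_model(path: str = MODEL_PATH):
    # Imported lazily so MODEL_PATH can be inspected without transformers
    # installed; LLaDA checkpoints are loaded with trust_remote_code=True.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
    model = AutoModel.from_pretrained(path, trust_remote_code=True)
    return tokenizer, model
```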
  4. Evaluation

To evaluate WINO on a benchmark such as GSM8K, first configure the model and data paths in the corresponding config file, then run:

CUDA_VISIBLE_DEVICES=0 python eval.py --config ./configs/gsm8k.yaml

All available config files can be found in the ./LLaDA/configs/ directory.
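Each config pairs a benchmark's data path with the model path. The key names in the sketch below are assumptions, not the repository's actual schema; open the shipped config before editing:

```yaml
# Hypothetical sketch only -- key names are assumptions; see the shipped
# ./LLaDA/configs/gsm8k.yaml for the actual schema.
model_path: ./LLaDA/models/LLaDA-8B-Instruct
data_path: ./LLaDA/data/gsm8k
```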

Evaluation of WINO on MMaDA

We evaluate WINO using lmms-eval.

To run the evaluation, follow these steps:

  1. Install MMaDA dependencies
cd MMaDA
# pipx install uv
uv venv --python 3.11 dev
source dev/bin/activate
uv pip install -r requirements.txt

A quick inference demo can be performed after this step.

python generate_demo.py
  2. Install lmms-eval dependencies
cd lmms_eval
uv pip install -e .
  3. Set the required environment variables

Some environment variables are required for certain tasks to run.
export OPENAI_API_KEY="<YOUR_API_KEY>"
export HF_HOME="<Path to HF cache>" 
export HF_TOKEN="<YOUR_HF_TOKEN>"
export HF_HUB_ENABLE_HF_TRANSFER="1"

Once all dependencies are installed and your API key is set, you can run the evaluation script directly:

cd ..
# Evaluate MMaDA on the six reported multimodal benchmarks
bash scripts/eval_baseline.sh
# Evaluate WINO on the six reported multimodal benchmarks
bash scripts/eval_wino.sh
