This repository provides scripts and instructions to evaluate WINO on LLaDA and MMaDA.
- Installation

We recommend using uv for dependency and virtual environment management.

```shell
pipx install uv  # or pip install uv
cd LLaDA
uv venv --python 3.11 dev
source dev/bin/activate
uv pip install -r requirements.txt
```

- Prepare Model and Datasets
Before running inference or evaluation, please download the following models and datasets from Hugging Face into the specified local directories (e.g., ./LLaDA/models/ and ./LLaDA/data/).
You may use either huggingface-cli or the Python datasets library to complete the download.
| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./LLaDA/models/LLaDA-8B-Instruct/ |
| Dataset Name | Hugging Face Repo | Local Path |
|---|---|---|
| GSM8K | openai/gsm8k | ./LLaDA/data/gsm8k/ |
| MATH-500 | HuggingFaceH4/MATH-500 | ./LLaDA/data/math500/ |
| HumanEval | openai/openai_humaneval | ./LLaDA/data/humaneval/ |
| ai2_arc | allenai/ai2_arc | ./LLaDA/data/ai2_arc/ |
Datasets not listed above are already included in the ./LLaDA/data/ directory.
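If you prefer to script the downloads, the tables above can be turned into huggingface-cli commands. The sketch below only builds and prints the commands (repo IDs and local paths are copied from the tables); it does not execute them, so you can review the commands before running:

```python
# Build huggingface-cli download commands for the models and datasets
# listed in the tables above. The commands are printed, not executed.

MODELS = {
    "GSAI-ML/LLaDA-8B-Instruct": "./LLaDA/models/LLaDA-8B-Instruct/",
}
DATASETS = {
    "openai/gsm8k": "./LLaDA/data/gsm8k/",
    "HuggingFaceH4/MATH-500": "./LLaDA/data/math500/",
    "openai/openai_humaneval": "./LLaDA/data/humaneval/",
    "allenai/ai2_arc": "./LLaDA/data/ai2_arc/",
}

def download_command(repo_id: str, local_dir: str, repo_type: str = "model") -> str:
    """Return a huggingface-cli command that mirrors repo_id into local_dir."""
    cmd = f"huggingface-cli download {repo_id} --local-dir {local_dir}"
    if repo_type != "model":
        # Datasets need an explicit repo type; models are the default.
        cmd += f" --repo-type {repo_type}"
    return cmd

if __name__ == "__main__":
    for repo, path in MODELS.items():
        print(download_command(repo, path))
    for repo, path in DATASETS.items():
        print(download_command(repo, path, repo_type="dataset"))
```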
- Quick Demo

Make sure to set the correct model path in generate.py, then run:

```shell
python generate.py
```

- Evaluation
To evaluate WINO on a benchmark such as GSM8K, first configure the model and data paths in the corresponding config file, then run:

```shell
CUDA_VISIBLE_DEVICES=0 python eval.py --config ./configs/gsm8k.yaml
```

All available config files can be found in the ./LLaDA/configs/ directory.
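To sweep every benchmark instead of one, the same command can be generated for each config file. A minimal sketch, assuming eval.py is invoked from the LLaDA directory (the run_all helper is illustrative, not part of the repo):

```python
import glob
import os
import subprocess

def eval_command(config_path: str) -> list[str]:
    """Argument list for a single eval.py run on one benchmark config."""
    return ["python", "eval.py", "--config", config_path]

def run_all(configs_dir: str = "./configs", gpu: int = 0) -> None:
    """Run eval.py sequentially on every YAML config, pinned to one GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    for cfg in sorted(glob.glob(os.path.join(configs_dir, "*.yaml"))):
        subprocess.run(eval_command(cfg), env=env, check=True)
```

Running each benchmark as a separate subprocess keeps one failing config from aborting GPU state for the rest; `check=True` still stops the sweep on the first error.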
For MMaDA, we evaluate WINO using lmms-eval. To run the evaluation, follow these steps:
- Install MMaDA dependencies

```shell
cd MMaDA
# pipx install uv
uv venv --python 3.11 dev
source dev/bin/activate
uv pip install -r requirements.txt
```

A quick inference demo can be run after this step:

```shell
python generate_demo.py
```

- Install lmms-eval dependencies
```shell
cd lmms_eval
uv pip install -e .
```

- Set necessary environment variables

Some environment variables are required for certain tasks to run.
```shell
export OPENAI_API_KEY="<YOUR_API_KEY>"
export HF_HOME="<Path to HF cache>"
export HF_TOKEN="<YOUR_HF_TOKEN>"
export HF_HUB_ENABLE_HF_TRANSFER="1"
```

Once all dependencies are installed and the variables above are set, you can run the evaluation scripts directly:
```shell
cd ..
# Evaluating MMaDA on the reported six multimodal benchmarks
bash scripts/eval_baseline.sh
# Evaluating WINO on the reported six multimodal benchmarks
bash scripts/eval_wino.sh
```
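A missing environment variable typically surfaces only partway through a run, so a small preflight check before launching the scripts above can fail fast. A sketch, with the variable list mirroring the exports in this section:

```python
import os

# Variables the evaluation expects, per the exports in this README.
REQUIRED_VARS = ["OPENAI_API_KEY", "HF_HOME", "HF_TOKEN", "HF_HUB_ENABLE_HF_TRANSFER"]

def missing_env_vars(required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Set these before running evaluation: {', '.join(missing)}")
```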