Create a conda environment and install the dependencies:
conda create -n MM python=3.8
conda activate MM
pip install -r requirements.txt
We host the MJ-Bench dataset on Hugging Face. You should first request access on the dataset page (requests are approved automatically). Then you can load the dataset via:
from datasets import load_dataset
dataset = load_dataset("MJ-Bench/MJ-Bench")
# alternatively, use streaming mode to load examples on the fly
dataset = load_dataset("MJ-Bench/MJ-Bench", streaming=True)
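Because access is gated, make sure you are authenticated with your Hugging Face token before loading. The snippet below is a minimal sketch for inspecting whatever splits and fields the dataset exposes once access is granted; the exact split and field names are defined on the dataset card, not assumed here.

from datasets import load_dataset

# requires prior authentication (e.g. `huggingface-cli login`) since the dataset is gated
dataset = load_dataset("MJ-Bench/MJ-Bench")
print(dataset)                              # lists the available splits and their sizes
first_split = next(iter(dataset.values()))  # pick an arbitrary split
print(first_split[0].keys())                # inspect the fields of one example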
config/config.yaml contains the configuration for the three types of multimodal judges you want to evaluate. You can copy the default configuration to a new file and modify model_path and api_key for your own environment. If you add new models, make sure you also add the corresponding load_model and get_score functions in the files under reward_models/.
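The exact interface is defined by the existing files under reward_models/; the sketch below only illustrates the general shape of the two hooks with hypothetical signatures, not the repository's actual API.

# Hypothetical sketch of the two hooks a new score-based reward model needs
# under reward_models/; follow the signatures used in the existing files.

def load_model(config):
    """Build the model or API client from the fields in config/config.yaml."""
    model_path = config["model_path"]   # e.g. a local checkpoint or HF repo id
    api_key = config.get("api_key")     # only needed for closed-source judges
    # ... instantiate and return your model or client here ...
    raise NotImplementedError

def get_score(model, prompt, image):
    """Return a scalar score for how well `image` matches `prompt`."""
    # ... run the model and map its output to a float ...
    raise NotImplementedError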
To get the inference result from a multimodal judge, simply run
python inference.py --model [MODEL_NAME] --config_path [CONFIG_PATH] --dataset [DATASET] --perspective [PERSPECTIVE] --save_dir [SAVE_DIR] --threshold [THRESHOLD] --multi_image [MULTI_IMAGE] --prompt_template_path [PROMPT_PATH]

where:
- MODEL_NAME is the name of the reward model to evaluate;
- CONFIG_PATH is the path to the configuration file;
- DATASET is the dataset to evaluate on (default is MJ-Bench/MJ-Bench);
- PERSPECTIVE is the data subset to evaluate (e.g. alignment, safety, quality, bias);
- SAVE_DIR is the directory to save the results;
- THRESHOLD is the preference threshold for the score-based RMs, i.e. image_0 is preferred only if score(image_0) - score(image_1) > THRESHOLD (see the sketch below);
- MULTI_IMAGE indicates whether to input multiple images (only closed-source VLMs and some open-source VLMs support this);
- PROMPT_PATH is the path to the prompt template for the VLM judges (it needs to be consistent with MULTI_IMAGE).
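For concreteness, the snippet below restates the THRESHOLD decision rule for score-based reward models described above; it is purely illustrative and not the repository's actual implementation.

# Illustration of the THRESHOLD decision rule for score-based reward models
def prefers_image_0(score_0: float, score_1: float, threshold: float) -> bool:
    """image_0 is preferred only if it beats image_1 by more than the threshold."""
    return (score_0 - score_1) > threshold

# e.g. with THRESHOLD = 0.0, any positive margin counts as a preference for image_0
assert prefers_image_0(0.72, 0.55, threshold=0.0)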
If you find MJ-Bench useful, please cite:

@article{chen2024mj,
title={MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?},
author={Chen, Zhaorun and Du, Yichao and Wen, Zichen and Zhou, Yiyang and Cui, Chenhang and Weng, Zhenzhen and Tu, Haoqin and Wang, Chaoqi and Tong, Zhengwei and Huang, Qinglan and others},
journal={arXiv preprint arXiv:2407.04842},
year={2024}
}
