Commit ccf4fbf
chore: Update lmms-eval to support video evaluations for LLaVA models
1 parent 380a8b5 commit ccf4fbf


README.md

Lines changed: 50 additions & 29 deletions
@@ -1,4 +1,4 @@
-<p align="center" width="100%">
+<p align="center" width="85%">
 <img src="https://i.postimg.cc/g0QRgMVv/WX20240228-113337-2x.png" width="100%" height="70%">
 </p>
 
@@ -11,7 +11,7 @@
 
 # Announcement
 
-- [2024-06] The `lmms-eval/v0.2` has been upgraded to support video evaluations, and other feature updates. Please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/) for more details
+- [2024-06] `lmms-eval/v0.2` has been upgraded to support video evaluations for video models such as LLaVA-NeXT Video and Gemini 1.5 Pro across tasks including EgoSchema, PerceptionTest, VideoMME, and more. Please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/) for more details.
 
 - [2024-03] We have released the first version of `lmms-eval`; please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/) for more details.

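The video evaluations announced above follow the same launcher pattern as the image tasks shown under Multiple Usages below. A minimal sketch, assuming a `llava_vid` model identifier, a `lmms-lab/LLaVA-NeXT-Video-7B` checkpoint, and a `videomme` task name (all three are assumptions; verify them against your installed version before running):

```bash
# Hedged sketch: run a video benchmark with lmms-eval v0.2.
# The model name (llava_vid), checkpoint, and task name (videomme) are
# assumptions; check the supported model and task lists in your install.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava_vid \
    --model_args pretrained=lmms-lab/LLaVA-NeXT-Video-7B \
    --tasks videomme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_next_video_videomme \
    --output_path ./logs/
```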
@@ -67,9 +67,32 @@ conda install openjdk=8
 ```
 You can then check your Java version with `java -version`.
 
+
+<details>
+<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
+<br>
+
+The extensive table below is meant to give readers detailed information about the datasets included in lmms-eval and some dataset-specific details (we remain grateful for any corrections from readers during our evaluation process).
+
+We provide a Google Sheet with the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It is a live sheet that we keep updating with new results.
+
+<p align="center" width="100%">
+<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
+</p>
+
+We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).
+
+</details>
+<br>
+
+
+Development continues on the main branch, and we encourage you to suggest desired features, propose improvements, or ask questions via issues or PRs on GitHub.
+
 # Multiple Usages
+
+**Evaluation of LLaVA on MME**
+
 ```bash
-# Evaluation of LLaVA on MME
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -80,8 +103,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme \
     --output_path ./logs/
+```
+
+**Evaluation of LLaVA on multiple datasets**
 
-# Evaluation of LLaVA on multiple datasets
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -92,8 +118,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-# For other variants llava. Note that `conv_template` is an arg of the init function of llava in `lmms_eval/models/llava.py`
+**For other LLaVA variants, note that `conv_template` is an argument of the init function of llava in `lmms_eval/models/llava.py`.**
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -104,8 +133,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-# Evaluation of larger lmms (llava-v1.6-34b)
+**Evaluation of larger lmms (llava-v1.6-34b)**
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -116,11 +148,17 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
+
+**Evaluation with a set of configurations, supporting evaluation of multiple models and datasets**
 
-# Evaluation with a set of configurations, supporting evaluation of multiple models and datasets
+```bash
 python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
+```
 
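The contents of such a config file are not shown in this README. A hypothetical sketch, with field names mirroring the CLI flags above (the field names and checkpoints are assumptions, not the verified schema; compare against `miscs/example_eval.yaml` in the repository):

```bash
# Hypothetical multi-model, multi-dataset config. The YAML field names and
# checkpoints are assumed from the CLI flags above; verify them against
# miscs/example_eval.yaml in the repository before running.
cat > ./miscs/my_eval.yaml <<'EOF'
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_mme
  output_path: "./logs/"
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,mmbench_en
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_13b_mme_mmbenchen
  output_path: "./logs/"
EOF

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/my_eval.yaml
```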
-# Evaluation with naive model sharding for bigger model (llava-next-72b)
+**Evaluation with naive model sharding for a bigger model (llava-next-72b)**
+
+```bash
 python3 -m lmms_eval \
     --model=llava \
     --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
@@ -130,8 +168,11 @@ python3 -m lmms_eval \
     --log_samples_suffix=llava_qwen \
     --output_path="./logs/" \
     --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
+```
+
+**Evaluation with SGLang for a bigger model (llava-next-72b)**
 
-# Evaluation with SGLang for bigger model (llava-next-72b)
+```bash
 python3 -m lmms_eval \
     --model=llava_sglang \
     --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \
@@ -143,26 +184,6 @@ python3 -m lmms_eval \
     --verbosity=INFO
 ```
 
-<details>
-<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
-<br>
-
-As demonstrated by the extensive table below, we aim to provide detailed information for readers to understand the datasets included in lmms-eval and some specific details about these datasets (we remain grateful for any corrections readers may have during our evaluation process).
-
-We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It's a live sheet, and we are updating it with new results.
-
-<p align="center" width="100%">
-<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
-</p>
-
-We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).
-
-</details>
-<br>
-
-
-Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
-
 ## Supported models
 
 Please check [supported models](lmms_eval/models/__init__.py) for more details.
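For a quick look at what is registered, a minimal sketch, assuming the module exposes an `AVAILABLE_MODELS` mapping (an assumption; verify against your checkout):

```bash
# Hedged sketch: print registered model names. AVAILABLE_MODELS is an
# assumed attribute of lmms_eval/models/__init__.py; confirm it exists
# in your checkout before relying on this.
python3 - <<'EOF'
from lmms_eval.models import AVAILABLE_MODELS
for name in sorted(AVAILABLE_MODELS):
    print(name)
EOF
```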
