Commit ccf4fbf
chore: Update lmms-eval to support video evaluations for LLaVA models
1 parent 380a8b5 commit ccf4fbf


README.md

Lines changed: 50 additions & 29 deletions
@@ -1,4 +1,4 @@
-<p align="center" width="100%">
+<p align="center" width="85%">
 <img src="https://i.postimg.cc/g0QRgMVv/WX20240228-113337-2x.png" width="100%" height="70%">
 </p>
 
@@ -11,7 +11,7 @@
 
 # Announcement
 
-- [2024-06] The `lmms-eval/v0.2` has been upgraded to support video evaluations, and other feature updates. Please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/) for more details
+- [2024-06] `lmms-eval/v0.2` has been upgraded to support video evaluations for video models such as LLaVA-NeXT Video and Gemini 1.5 Pro across tasks including EgoSchema, PerceptionTest, VideoMME, and more. Please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.2/) for more details.
 
 - [2024-03] We have released the first version of `lmms-eval`; please refer to the [blog](https://lmms-lab.github.io/posts/lmms-eval-0.1/) for more details.

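The video evaluations announced above follow the same launcher pattern as the image tasks shown under Multiple Usages below. A minimal sketch, assuming a `llava_vid` model identifier, a `lmms-lab/LLaVA-NeXT-Video-7B` checkpoint, and a `videomme` task name (all three are assumptions; verify them against your installed version before running):

```bash
# Hedged sketch: run a video benchmark with lmms-eval v0.2.
# The model name (llava_vid), checkpoint, and task name (videomme) are
# assumptions; check the supported model and task lists in your install.
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava_vid \
    --model_args pretrained=lmms-lab/LLaVA-NeXT-Video-7B \
    --tasks videomme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_next_video_videomme \
    --output_path ./logs/
```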
@@ -67,9 +67,32 @@ conda install openjdk=8
 ```
 You can then check your Java version with `java -version`.
 
+
+<details>
+<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
+<br>
+
+The extensive table below is meant to give readers detailed information about the datasets included in lmms-eval and some dataset-specific details (we remain grateful for any corrections from readers during our evaluation process).
+
+We provide a Google Sheet with the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It is a live sheet that we keep updating with new results.
+
+<p align="center" width="100%">
+<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
+</p>
+
+We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).
+
+</details>
+<br>
+
+
+Development continues on the main branch, and we encourage you to suggest desired features, propose improvements, or ask questions via issues or PRs on GitHub.
+
 # Multiple Usages
+
+**Evaluation of LLaVA on MME**
+
 ```bash
-# Evaluation of LLaVA on MME
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -80,8 +103,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme \
     --output_path ./logs/
+```
+
+**Evaluation of LLaVA on multiple datasets**
 
-# Evaluation of LLaVA on multiple datasets
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -92,8 +118,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-# For other variants llava. Note that `conv_template` is an arg of the init function of llava in `lmms_eval/models/llava.py`
+**For other LLaVA variants, note that `conv_template` is an argument of the init function of llava in `lmms_eval/models/llava.py`.**
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -104,8 +133,11 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
 
-# Evaluation of larger lmms (llava-v1.6-34b)
+**Evaluation of larger lmms (llava-v1.6-34b)**
+
+```bash
 python3 -m accelerate.commands.launch \
     --num_processes=8 \
     -m lmms_eval \
@@ -116,11 +148,17 @@ python3 -m accelerate.commands.launch \
     --log_samples \
     --log_samples_suffix llava_v1.5_mme_mmbenchen \
     --output_path ./logs/
+```
+
+**Evaluation with a set of configurations, supporting evaluation of multiple models and datasets**
 
-# Evaluation with a set of configurations, supporting evaluation of multiple models and datasets
+```bash
 python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/example_eval.yaml
+```
 
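The contents of such a config file are not shown in this README. A hypothetical sketch, with field names mirroring the CLI flags above (the field names and checkpoints are assumptions, not the verified schema; compare against `miscs/example_eval.yaml` in the repository):

```bash
# Hypothetical multi-model, multi-dataset config. The YAML field names and
# checkpoints are assumed from the CLI flags above; verify them against
# miscs/example_eval.yaml in the repository before running.
cat > ./miscs/my_eval.yaml <<'EOF'
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-7b
  tasks: mme
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_mme
  output_path: "./logs/"
- model: llava
  model_args: pretrained=liuhaotian/llava-v1.5-13b
  tasks: mme,mmbench_en
  batch_size: 1
  log_samples: true
  log_samples_suffix: llava_v1.5_13b_mme_mmbenchen
  output_path: "./logs/"
EOF

python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --config ./miscs/my_eval.yaml
```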
-# Evaluation with naive model sharding for bigger model (llava-next-72b)
+**Evaluation with naive model sharding for a bigger model (llava-next-72b)**
+
+```bash
 python3 -m lmms_eval \
     --model=llava \
     --model_args=pretrained=lmms-lab/llava-next-72b,conv_template=qwen_1_5,device_map=auto,model_name=llava_qwen \
@@ -130,8 +168,11 @@ python3 -m lmms_eval \
     --log_samples_suffix=llava_qwen \
     --output_path="./logs/" \
     --wandb_args=project=lmms-eval,job_type=eval,entity=llava-vl
+```
+
+**Evaluation with SGLang for a bigger model (llava-next-72b)**
 
-# Evaluation with SGLang for bigger model (llava-next-72b)
+```bash
 python3 -m lmms_eval \
     --model=llava_sglang \
     --model_args=pretrained=lmms-lab/llava-next-72b,tokenizer=lmms-lab/llavanext-qwen-tokenizer,conv_template=chatml-llava,tp_size=8,parallel=8 \
@@ -143,26 +184,6 @@ python3 -m lmms_eval \
     --verbosity=INFO
 ```
 
-<details>
-<summary>Comprehensive Evaluation Results of LLaVA Family Models</summary>
-<br>
-
-As demonstrated by the extensive table below, we aim to provide detailed information for readers to understand the datasets included in lmms-eval and some specific details about these datasets (we remain grateful for any corrections readers may have during our evaluation process).
-
-We provide a Google Sheet for the detailed results of the LLaVA series models on different datasets. You can access the sheet [here](https://docs.google.com/spreadsheets/d/1a5ImfdKATDI8T7Cwh6eH-bEsnQFzanFraFUgcS9KHWc/edit?usp=sharing). It's a live sheet, and we are updating it with new results.
-
-<p align="center" width="100%">
-<img src="https://i.postimg.cc/jdw497NS/WX20240307-162526-2x.png" width="100%" height="80%">
-</p>
-
-We also provide the raw data exported from Weights & Biases for the detailed results of the LLaVA series models on different datasets. You can access the raw data [here](https://docs.google.com/spreadsheets/d/1AvaEmuG4csSmXaHjgu4ei1KBMmNNW8wflOD_kkTDdv8/edit?usp=sharing).
-
-</details>
-<br>
-
-
-Our Development will be continuing on the main branch, and we encourage you to give us feedback on what features are desired and how to improve the library further, or ask questions, either in issues or PRs on GitHub.
-
 ## Supported models
 
 Please check [supported models](lmms_eval/models/__init__.py) for more details.
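For a quick look at what is registered, a minimal sketch, assuming the module exposes an `AVAILABLE_MODELS` mapping (an assumption; verify against your checkout):

```bash
# Hedged sketch: print registered model names. AVAILABLE_MODELS is an
# assumed attribute of lmms_eval/models/__init__.py; confirm it exists
# in your checkout before relying on this.
python3 - <<'EOF'
from lmms_eval.models import AVAILABLE_MODELS
for name in sorted(AVAILABLE_MODELS):
    print(name)
EOF
```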
