pip install -r requirements.txt
pip install flash-attn --no-build-isolation
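flash-attn compiles against the PyTorch already installed in the environment (which is why --no-build-isolation is needed), so install PyTorch with CUDA support first. A minimal setup sketch; the environment name and Python version below are arbitrary choices, not repo requirements:

# Illustrative setup; versions are assumptions, not pinned by this repo.
conda create -n foodlmm python=3.9 -y
conda activate foodlmm
pip install torch torchvision                 # flash-attn builds against this torch
pip install -r requirements.txt
pip install flash-attn --no-build-isolation   # no isolation so the build can see torch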
The training data comes from five public datasets: VIREO Food-172, Recipe1M, Nutrition5k, FoodSeg103, and UECFoodPixComplete.
Note: you only need the extracted Nutrition5k images provided with our FoodDialogues dataset; downloading the original Nutrition5k dataset is not required.
Download the datasets from the links above and organize them as follows:
├── dataset
│   ├── FoodSeg103
│   │   ├── category_id.txt
│   │   ├── FoodReasonSeg_test.json
│   │   ├── FoodReasonSeg_train.json
│   │   ├── Images
│   │   │   └── ...
│   │   └── ImageSets
│   │       └── ...
│   ├── UECFOODPIXCOMPLETE
│   │   └── data
│   │       ├── category.txt
│   │       ├── train9000.txt
│   │       ├── test1000.txt
│   │       └── UECFOODPIXCOMPLETE
│   │           ├── train
│   │           │   └── ...
│   │           └── test
│   │               └── ...
│   ├── VireoFood172
│   │   ├── train_id.json
│   │   ├── ingre.json
│   │   ├── foodlist.json
│   │   └── ready_chinese_food
│   │       └── ...
│   ├── Recipe1M
│   │   ├── recipe1m_train_1488.json
│   │   └── images
│   │       └── ...
│   └── Nutrition5k
│       ├── train_id.json
│       ├── cafe_1_id.json
│       ├── cafe_2_id.json
│       ├── dish_metadata_cafe1.csv
│       ├── dish_metadata_cafe2.csv
│       ├── FoodDialogues_train.json
│       ├── FoodDialogues_test.json
│       └── images
│           └── ...
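Before training, you can quickly verify the layout with a one-off shell check like the following (the checked files are taken from the tree above; this snippet is a convenience, not part of the repo):

# Sanity-check that key annotation files are in place.
for f in dataset/FoodSeg103/category_id.txt \
         dataset/UECFOODPIXCOMPLETE/data/category.txt \
         dataset/VireoFood172/train_id.json \
         dataset/Recipe1M/recipe1m_train_1488.json \
         dataset/Nutrition5k/FoodDialogues_train.json; do
  [ -f "$f" ] && echo "ok: $f" || echo "MISSING: $f"
done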
To train FoodLMM, download the pre-trained LISA-7B-v1-explanatory weights and the SAM ViT-H weights, and set their paths in train_config_Stage1.yaml.
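For reference, a download sketch: the SAM URL below is the official segment-anything release, while the Hugging Face repo id for LISA is our best guess and should be checked against the LISA project page.

# SAM ViT-H checkpoint (official segment-anything release URL).
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# LISA-7B-v1-explanatory; the repo id below is assumed, verify it before use.
git lfs install
git clone https://huggingface.co/xinlai/LISA-7B-v1-explanatory
# Then point the corresponding entries in train_config_Stage1.yaml at these two paths.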
deepspeed --master_port=XXX train_ds_Stage1.py --cfg_file=train_config_Stage1.yaml
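Here --master_port only needs to be a free TCP port, and DeepSpeed uses all visible GPUs by default. A concrete example (the port value is arbitrary; --include is the standard DeepSpeed launcher flag for selecting GPUs):

# Any free TCP port works; 24999 is arbitrary.
deepspeed --master_port=24999 train_ds_Stage1.py --cfg_file=train_config_Stage1.yaml
# To restrict training to specific GPUs (e.g. 0 and 1):
deepspeed --include localhost:0,1 --master_port=24999 train_ds_Stage1.py --cfg_file=train_config_Stage1.yaml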
The weight merging process runs automatically. If you cannot find the merged weights under the configured path ('./runs/FoodLMM_S1' by default), try the following commands.
cd ./runs/EXP_NAME/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin
CUDA_VISIBLE_DEVICES="" python merge_lora_weights_and_save_hf_model.py \
  --cfg_file=train_config.yaml \
  --weight="PATH_TO_pytorch_model.bin" \
  --save_path="PATH_TO_SAVED_MODEL"
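Putting both steps together for the default Stage 1 output directory (all paths below are illustrative: the run directory comes from the './runs/FoodLMM_S1' default, while the save path and the choice of config file are our assumptions):

# 1. Collapse the DeepSpeed ZeRO checkpoint shards into a single fp32 file.
cd ./runs/FoodLMM_S1/ckpt_model && python zero_to_fp32.py . ../pytorch_model.bin
cd ../../..
# 2. Merge the LoRA weights into the base model and save it in Hugging Face format.
CUDA_VISIBLE_DEVICES="" python merge_lora_weights_and_save_hf_model.py \
  --cfg_file=train_config_Stage1.yaml \
  --weight="./runs/FoodLMM_S1/pytorch_model.bin" \
  --save_path="./runs/FoodLMM_S1/merged_hf_model"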
Stage 2 training follows the same pattern with its own config file:
deepspeed --master_port=XXX train_ds_Stage2.py --cfg_file=train_config_Stage2.yaml
To chat with the trained model, launch the online demo:
CUDA_VISIBLE_DEVICES=0 python online_demo.py --version='PATH_TO_FoodLMM_Chat'
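For example, pointing the demo at a locally merged Stage 2 model (the path is illustrative; any FoodLMM-Chat checkpoint saved in Hugging Face format should work):

# Substitute wherever you saved the merged Stage 2 model.
CUDA_VISIBLE_DEVICES=0 python online_demo.py --version='./runs/FoodLMM_S2/merged_hf_model'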
If you find this project useful in your research, please consider citing:
@article{yin2023foodlmm,
  title={FoodLMM: A Versatile Food Assistant using Large Multi-modal Model},
  author={Yin, Yuehao and Qi, Huiyan and Zhu, Bin and Chen, Jingjing and Jiang, Yu-Gang and Ngo, Chong-Wah},
  journal={arXiv preprint arXiv:2312.14991},
  year={2023}
}
- This work is built upon LISA, and our datasets are generated from Nutrition5k and FoodSeg103 using GPT-4.