- Requirements
- Prepare the environment
- Finetuning of vision module
- Pretraining of vision module
- Pretraining of vision-language
This software requires a Linux system: Ubuntu 22.04 or Ubuntu 20.04 (other versions have not been tested) and Python 3.8 (other versions have not been tested).
The required Python packages are listed below. They can also be found in requirements.txt.
torch>=1.10.0
torchvision>=0.11.1
timm>=0.4.12
einops>=0.3.2
pandas>=1.3.4
albumentations>=1.1.0
wandb>=0.12.11
flash-attn==2.5.8
ninja>=1.11.1.0
open-clip-torch>=2.11.0
- Open a terminal.
- Clone this repository to the home path.
git clone https://github.com/eyefm/EyeFM.git
- Change the current directory to the project directory
cd EyeFM
- Install PyTorch 1.10.0 (CUDA 11.1)
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
- Install other Python packages
pip install -r requirements.txt
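After installation, a short sanity check (an optional sketch, not part of the repository) confirms that the CUDA build of PyTorch is active:

```python
# Optional sanity check that the CUDA-enabled PyTorch build installed correctly.
import torch
import torchvision

print(torch.__version__)          # expected: 1.10.0+cu111
print(torchvision.__version__)    # expected: 0.11.0+cu111
print(torch.version.cuda)         # expected: 11.1
print(torch.cuda.is_available())  # True when a compatible GPU and driver are present
```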
During fine-tuning, image paths and labels are stored in CSV files: train.csv, val.csv, and test.csv, corresponding to the training, validation, and testing sets, respectively. These files should be placed in the same directory. In each CSV file, the cfp and oct columns give the image paths of the corresponding modalities, while label denotes the associated ground truth. For paired-modality fine-tuning, each row contains paired data from two or more modalities.
├── fine-tuning csv path
│   ├── train.csv
│   ├── val.csv
│   └── test.csv
├── fine-tuning image path
│   ├── cfp
│   │   ├── ffds.png
│   │   └── fs1fsd.png
│   └── oct
│       ├── gdfg.png
│       └── d4fsad.png
| cfp | oct | label |
|---|---|---|
| cfp/ffds.png | oct/gdfg.png | 0 |
| cfp/fs1fsd.png | oct/d4fsad.png | 1 |
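As an illustration, the snippet below (a minimal sketch, not provided by the repository; the directory variables are placeholders) checks that every image referenced in the fine-tuning CSVs exists under the image root:

```python
# Minimal sketch: verify that the fine-tuning CSVs reference existing images.
# Column names ("cfp", "oct", "label") follow the example layout above.
import os
import pandas as pd

csv_root = "path/to/fine-tuning/csv"      # directory holding train.csv / val.csv / test.csv
image_root = "path/to/fine-tuning/image"  # directory holding the cfp/ and oct/ folders

for split in ("train", "val", "test"):
    df = pd.read_csv(os.path.join(csv_root, f"{split}.csv"))
    for column in ("cfp", "oct"):
        missing = [p for p in df[column] if not os.path.exists(os.path.join(image_root, p))]
        if missing:
            print(f"{split}.csv: {len(missing)} missing {column} images, e.g. {missing[0]}")
    print(f"{split}.csv: {len(df)} rows, labels = {sorted(df['label'].unique())}")
```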
To fine-tune EyeFM on downstream vision tasks, run:
cd image_module
python run_finetuning.py --config cfgs/finetune/finetune.yaml --finetune /path/to/checkpoint --in_domains cfp --epochs 50 --batch_size 16 --blr 0.0005 --input_size 224 --data_path path/to/fine-tuning/image --csv_path path/to/fine-tuning/csv/file
- The fine-tuning scripts support both YAML configuration files (e.g., finetune.yaml) and command-line arguments.
- Arguments specified in the configuration file override the default settings, while command-line arguments override both the default and configuration arguments.
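As an illustration of this precedence (a minimal sketch, not the repository's actual argument parser), built-in defaults can be merged with a YAML file and then with command-line values:

```python
# Illustrative sketch of the override order: defaults < YAML config < command line.
# The option names here are examples only.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", default=None)
parser.add_argument("--epochs", type=int, default=None)
parser.add_argument("--blr", type=float, default=None)
cli_args = vars(parser.parse_args())

settings = {"epochs": 100, "blr": 1e-3}               # built-in defaults
if cli_args["config"]:
    with open(cli_args["config"]) as f:
        settings.update(yaml.safe_load(f))            # YAML overrides defaults
settings.update({k: v for k, v in cli_args.items()
                 if k != "config" and v is not None}) # CLI overrides everything
print(settings)
```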
Given an input image, the model will output the probability distribution across different disease categories. For example, when provided with an image of diabetic retinopathy (DR), the model will output the probabilities corresponding to each of the five severity levels.
No retinopathy: 0.0
Mild retinopathy: 0.012
Moderate retinopathy: 0.121
Severe retinopathy: 0.753
Proliferative retinopathy: 0.114
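Such a distribution comes from a softmax over the classification head's logits. The sketch below is illustrative only: the logit values are made up, and only the softmax step reflects standard practice.

```python
# Illustrative sketch: converting class logits into a probability distribution
# over the five diabetic retinopathy grades. The logit values are arbitrary.
import torch

grades = ["No retinopathy", "Mild retinopathy", "Moderate retinopathy",
          "Severe retinopathy", "Proliferative retinopathy"]
logits = torch.tensor([-9.0, 0.5, 2.8, 4.6, 2.7])   # example classification-head output
probs = torch.softmax(logits, dim=0)
for grade, p in zip(grades, probs.tolist()):
    print(f"{grade}: {p:.3f}")
print("Predicted grade:", grades[int(probs.argmax())])
```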
During the pretraining phase, images are stored in a CSV file named train.csv, as shown in the table below. In this file, the column headers cfp, oct, eyephoto, uwf, and ffa correspond to different image modalities: Color Fundus Photography (CFP), Optical Coherence Tomography (OCT), External Eye Photo (EEP), Ultra-Widefield Fundus (UWF), and Fundus Fluorescein Angiography (FFA). Each entry under these columns specifies the relative path of the corresponding image.
| cfp | oct | eyephoto | uwf | ffa |
|---|---|---|---|---|
| cfp/dsg.png | oct/hsdg.png | eyephoto/gsag.png | uwf/shhsd.png | ffa/iqpakg.png |
| cfp/hsdjt.png | oct/asd5fh.png | eyephoto/sd7dslg.png | uwf/lajg.png | ffa/glmz9h.png |
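One way to assemble such a file is sketched below. This is a hypothetical helper, not part of the repository; it assumes the folder names match the column headers and that images are already arranged so that row-wise pairing across modalities is meaningful.

```python
# Hypothetical helper: build a pretraining train.csv from per-modality image folders.
import os
import pandas as pd

image_root = "path/to/pretraining/image"
modalities = ["cfp", "oct", "eyephoto", "uwf", "ffa"]

columns = {m: sorted(os.path.join(m, f) for f in os.listdir(os.path.join(image_root, m)))
           for m in modalities}
n_rows = min(len(paths) for paths in columns.values())    # keep only fully populated rows
df = pd.DataFrame({m: paths[:n_rows] for m, paths in columns.items()})
df.to_csv("path/to/pretraining/csv/train.csv", index=False)
print(df.head())
```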
To complete the pretraining of EyeFM, run:
cd image_module
python run_pretraining.py --config cfgs/pretrain/pretraining.yaml --epochs 800 --batch_size 100 --blr 0.0001 --input_size 224 --data_path path/to/pretraining/image --csv_path path/to/pretraining/csv
- The pretraining scripts support both YAML configuration files (e.g., pretraining.yaml) and command-line arguments.
- Arguments specified in the configuration file override the default settings, while command-line arguments override both the default and configuration arguments.
After the pretraining of the vision module, you can start vision-language joint training.
Images and texts are stored in JSON files: train.json, val.json, and test.json, corresponding to the training, validation, and testing sets, respectively.
These files should be placed in the same directory. In each JSON file, image refers to the image path, while conversations holds the related text.
{
    "id": "FFA1",
    "image": "FFA1.jpg",
    "conversations": [
        {
            "from": "human",
            "value": "What is the expression of the image\n<image>"
        },
        {
            "from": "ophthalmologist",
            "value": "The fluorescein angiography of the left eye shows early microvascular aneurysm-like proliferation starting at the posterior pole and peripheral regions. As the angiography progresses, fluorescein leakage gradually appears, with a cluster of high fluorescence observed at the branch of the temporal artery, followed by rapid fluorescein leakage. A large number of laser spots are seen from the posterior pole to the temporal peripheral area, with gradual fluorescein staining. In the later stages, a patchy, blurry high fluorescence is observed in the macular region, and the fluorescence of the optic disc gradually increases, with the boundary becoming discernible."
        }
    ]
}
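As a quick check before training, the JSON splits can be loaded and inspected with a few lines. This is an illustrative sketch that assumes each file holds a list of records shaped like the example above; the directory variable is a placeholder.

```python
# Illustrative sketch: load the vision-language JSON splits and report basic statistics.
import json
import os

json_root = "path/to/vision-language/json"  # directory holding train.json / val.json / test.json

for split in ("train", "val", "test"):
    with open(os.path.join(json_root, f"{split}.json")) as f:
        samples = json.load(f)
    n_turns = sum(len(s["conversations"]) for s in samples)
    print(f"{split}: {len(samples)} samples, {n_turns} conversation turns")
```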
To ensure smooth adaptation, you can:
- Obtain the original LLaVA-7B weight following the instructions here.
- Obtain the original LLaMA-7B weight following the instructions here.
- Utilize the weights of the vision module obtained from the above steps.
The first stage is trained on 8 A800 GPUs with 40GB memory each, using the following command:
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
train/train_mem.py --model_name_or_path /EyeFM/cn_llava --data_path /EyeFM/ours_cn_all.json --image_folder / --tune_mm_mlp_adapter True --output_dir /EyeFM/output --vision_tower /EyeFM/transfer_image_encoder --mm_vision_select_layer -2 --mm_use_im_start_end True --num_train_epochs 5 --per_device_train_batch_size 2 --per_device_eval_batch_size 1 --gradient_accumulation_steps 8 --evaluation_strategy "no" --save_strategy "steps" --save_steps 1000 --save_total_limit 3 --learning_rate 2e-3 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --model_max_length 1024 --lazy_preprocess True --gradient_checkpointing True --dataloader_num_workers 8 --report_to none --bf16 false --tf32 false
The second stage is trained on 8 A800 GPUs with 40GB memory each, using the following command:
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
train/train_mem.py --model_name_or_path /EyeFM/cn_llava --data_path /EyeFM/ours_cn_all.json --image_folder / --tune_mm_mlp_adapter True --output_dir /EyeFM/output --vision_tower /EyeFM/transfer_image_encoder --mm_vision_select_layer -2 --mm_use_im_start_end True --num_train_epochs 3 --per_device_train_batch_size 1 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --evaluation_strategy "no" --save_strategy "steps" --save_steps 1000 --save_total_limit 3 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type "cosine" --logging_steps 1 --model_max_length 1024 --lazy_preprocess True --gradient_checkpointing True --dataloader_num_workers 8 --report_to none --bf16 false --tf32 false
If you do not have enough GPU memory, you can use LoRA following the instructions here.
We evaluate models on the report generation and VQA tasks using METEOR and ROUGE. The details of the tasks are described in our paper.
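For reference, both metrics can be computed with common open-source packages. The sketch below uses nltk and rouge-score, which are not listed in requirements.txt; the example sentences are placeholders.

```python
# Hedged sketch: scoring a generated report against a reference with METEOR and ROUGE-L.
# Requires nltk (with its wordnet data, via nltk.download("wordnet")) and rouge-score.
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

reference = "the fluorescein angiography of the left eye shows early microvascular leakage"
prediction = "fluorescein angiography of the left eye shows early leakage"

# nltk's meteor_score expects pre-tokenized references and hypothesis.
meteor = meteor_score([reference.split()], prediction.split())
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True).score(reference, prediction)

print(f"METEOR:  {meteor:.3f}")
print(f"ROUGE-L: {rouge['rougeL'].fmeasure:.3f}")
```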