We introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation: it generates accurate diagnostic findings while simultaneously segmenting the corresponding biomedical targets. UniBiomed is built on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which effectively unifies diverse biomedical tasks under universal training to advance grounded interpretation.
Our model is available on Hugging Face.
git clone https://github.com/Luffy03/UniBiomed
cd UniBiomed
conda create -n UniBiomed python=3.10
conda activate UniBiomed
conda install pytorch==2.3.1 torchvision==0.18.1 pytorch-cuda=12.1 cuda -c pytorch -c "nvidia/label/cuda-12.1.0" -c "nvidia/label/cuda-12.1.1"
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.3/index.html
pip install -r requirements.txt
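After installation, you can sanity-check the environment (a minimal sketch; the expected version strings simply mirror the pinned installs above):

# quick environment check after installation
import torch
import mmcv

print(torch.__version__)            # expected: 2.3.1
print(mmcv.__version__)             # expected: 2.1.0
print(torch.cuda.is_available())    # should be True on a GPU machine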
You need to download sam2-hiera-large and place it under the 'pretrained' directory:
./ # project root
pretrained/
├── sam2_hiera_large.pt
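For reference, a minimal download sketch; the URL is the public SAM2 release link and is an assumption here, so verify it against the official SAM2 repository:

# illustrative download of the SAM2 checkpoint into ./pretrained
# (URL is an assumption taken from the public SAM2 release, not from this repo)
import os
import urllib.request

URL = 'https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt'
os.makedirs('pretrained', exist_ok=True)
urllib.request.urlretrieve(URL, 'pretrained/sam2_hiera_large.pt')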
Our curated datasets are available on Hugging Face. Some of the datasets should be downloaded and processed from their original links. The datasets are organized as follows:
./ # project root
data/Biomed
├── CoCaHis
│   ├── train
│   ├── train_mask
│   ├── test
│   ├── test_mask
│   ├── train.json
│   └── test.json
├── MedTrinity
├── MSD
├── ...
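To check that a downloaded dataset matches this layout, a minimal sketch (using the CoCaHis entries shown above):

# sanity-check that one dataset follows the expected layout shown above
from pathlib import Path

root = Path('data/Biomed/CoCaHis')
for name in ('train', 'train_mask', 'test', 'test_mask', 'train.json', 'test.json'):
    status = 'ok' if (root / name).exists() else 'MISSING'
    print(f'{root / name}: {status}')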
Quick start: a demo script is available at example.py, and example inputs are placed in './examples'. The image path and text instruction below are illustrative; replace them with your own.
import argparse
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

def parse_args():
    parser = argparse.ArgumentParser(description='UniBiomed')
    parser.add_argument('--model_path', default='Luffy503/UniBiomed')
    args = parser.parse_args()
    return args

args = parse_args()

# load model
model = AutoModel.from_pretrained(
    args.model_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    args.model_path,
    trust_remote_code=True,
)

# define data input: an image and a text instruction
# (illustrative values - point these at your own data)
image = Image.open('./examples/example.png').convert('RGB')
text = 'Please segment the liver tumor in this image.'
data_dict = {'image': image, 'text': text}

# output
pred_dict = model.predict_forward(**data_dict, tokenizer=tokenizer)
# text description
prediction = pred_dict['prediction']
# segmentation mask
mask = pred_dict['prediction_masks'][0][0]
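As a hedged illustration of inspecting the outputs (assuming the returned mask is a binary (H, W) numpy array, which this snippet does not guarantee):

# print the generated finding and save the mask as an image
# (assumes 'mask' is a binary (H, W) numpy array)
import numpy as np
from PIL import Image

print(prediction)
Image.fromarray(mask.astype(np.uint8) * 255).save('mask.png')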
Run the following command for training (8*H800 GPUs):
bash tools/dist.sh train projects/unibiomed/configs/biomed.py 8
For single-GPU training, simply change the GPU number to 1:
bash tools/dist.sh train projects/unibiomed/configs/biomed_subset.py 1
The seed is set to 42.
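For reference, seed 42 corresponds to seeding along these lines (a minimal sketch; the actual logic lives in the training configs and runner, so treat the details as an assumption):

# illustrative reproducibility seeding with seed 42;
# the repo's training runner handles this via its config
import random
import numpy as np
import torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)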
After training, you need to convert the checkpoint into a Hugging Face model for evaluation. Replace '$your_model$' with the real checkpoint name. The model will be saved to './save_hf'.
PYTHONPATH=. python projects/unibiomed/hf/convert_to_hf.py projects/unibiomed/configs/biomed.py --pth-model ./work_dirs/biomed/$your_model$.pth --save-path ./save_hf
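To sanity-check the conversion, the saved directory should load with the same API as the released model (a minimal sketch, assuming './save_hf' is a standard Hugging Face model directory):

# verify the converted checkpoint loads; mirrors the quick-start loading code
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    './save_hf', torch_dtype=torch.bfloat16, trust_remote_code=True,
).eval()
tokenizer = AutoTokenizer.from_pretrained('./save_hf', trust_remote_code=True)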
Alternatively, you can use our trained model on Hugging Face for evaluation.
For segmentation (replace '$datasetname' with the dataset name):
PYTHONPATH=. python demo/demo_seg2D.py --val_folder /data/Biomed/$datasetname --work-dir ./val_results/$datasetname --model_path Luffy503/UniBiomed
# or one for all
bash demo_seg.sh
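Segmentation quality is typically reported as a Dice score; a minimal sketch of the metric (the repo's evaluation scripts may differ in details such as per-class averaging, so treat this as illustrative):

# illustrative Dice score between predicted and ground-truth binary masks
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)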
For grounded disease recognition:
PYTHONPATH=. python demo/demo_disease.py --data_path ./data/Biomed/Disease/$datasetname --model_path Luffy503/UniBiomed --save_dir ./val_results/Grounded_disease/$datasetname
# eval metrics
python demo/eval_utils/metrics_grounded_disease.py --root ./data/Biomed/Disease/$datasetname --prediction_dir_path ./val_results/Grounded_disease/$datasetname
# or one for all
bash demo_disease.sh
For region understanding:
PYTHONPATH=. python demo/demo_RegionCap.py --data_path ./data/Biomed/Disease/$datasetname --model_path Luffy503/UniBiomed --save_dir ./val_results/region_understand/$datasetname
# or one for all
bash demo_RegionCap.sh
For MedTrinity report generation:
PYTHONPATH=. python demo/demo_Medtrinity.py --model_path Luffy503/UniBiomed
# eval metrics
python demo/eval_utils/metrics_medtrinity.py --root ./data/Biomed/MedTrinity --gt_json_path train.json --prediction_dir_path ./val_results/MedTrinity
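Report generation is commonly scored with n-gram metrics such as BLEU and ROUGE; a hedged sketch using the Hugging Face evaluate library (the repo's metrics scripts remain the authoritative implementation):

# illustrative ROUGE scoring of generated reports against references;
# requires `pip install evaluate rouge_score`
import evaluate

rouge = evaluate.load('rouge')
predictions = ['a small nodule is seen in the left lower lobe']
references = ['there is a small nodule in the left lower lobe']
print(rouge.compute(predictions=predictions, references=references))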
For RadGenome report generation:
PYTHONPATH=. python demo/demo_GRG.py --model_path Luffy503/UniBiomed --save_dir ./val_results/Grounded_Report_Generation/RadGenome
# eval metrics
python demo/eval_utils/metrics_grg.py --root ./data/Biomed/RadGenome --prediction_dir_path ./val_results/Grounded_Report_Generation/RadGenome
Our work is built upon the great work Sa2VA, and we highly appreciate their efforts. We also thank RadGenome, BiomedParse, VoCo, and MedTrinity for providing data preprocessing toolkits.
If you find this repo useful for your research, please consider citing the paper as follows:
@article{wu2025unibiomed,
title={Unibiomed: A universal foundation model for grounded biomedical image interpretation},
author={Wu, Linshan and Nie, Yuxiang and He, Sunan and Zhuang, Jiaxin and Luo, Luyang and Li, Tao and Xie, Zhuoyao and Chen, Dexuan and Zhao, Yinghua and Mahboobani, Neeraj and others},
journal={arXiv preprint arXiv:2504.21336},
year={2025}
}