MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning
Shengyuan Liu1 Liuxin Bao1 Qi Yang2,3 Wanting Geng2,4 Boyun Zheng1 Chenxin Li1 Wenting Chen5 Houwen Peng2✉ Yixuan Yuan1✉
1Chinese University of Hong Kong 2Hunyuan Group, Tencent 3Institute of Automation, the Chinese Academy of Sciences 4Dalian University of Technology 5Stanford University
✉ Corresponding Author.
In this work, we propose MedSAM-Agent, a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process. First, we introduce a hybrid prompting strategy for expert-curated trajectory generation, enabling the model to internalize human-like decision heuristics and adaptive refinement strategies. Furthermore, we develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification with a clinical-fidelity process reward design to promote interaction parsimony and decision efficiency.

- Release the SFT and RL dataset for MedSAM-Agent.
- Release the code of trajectory generation.
- Release the paper, model and the base code for MedSAM-Agent.
- We use Python 3.11 / CUDA 12.9 / torch 2.8.0 for implementation.
- We train our models on 8 NVIDIA H20 GPUs with 96 GB memory each.
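For reference, the interpreter / torch / CUDA versions listed above can be checked with a short snippet (nothing here is specific to this repo; torch is only probed if it is installed):

```python
import sys
import importlib

# Report the interpreter version and, if available, the torch/CUDA versions.
print(f"python: {sys.version_info.major}.{sys.version_info.minor}")
try:
    torch = importlib.import_module("torch")
    print(f"torch:  {torch.__version__}")
    print(f"cuda:   {torch.version.cuda} (available: {torch.cuda.is_available()})")
except ImportError:
    print("torch:  not installed")
```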
```bash
# create environment
conda create -n msagent python=3.11
conda activate msagent
pip install -r requirements.txt
```

We support three segmentation backbones: MedSAM2, SAM, and IMISNet. Please download the checkpoints from:
For SAM2.1 and IMISNet, please also download the dependency repositories and install them:
```bash
cd third_party/
git clone https://github.com/facebookresearch/sam2.git
cd sam2
pip install -e .
```

In this repo, our dataset is based on BioMedParse and UniBioMed. We evaluate our model on 6 modalities and 21 datasets. Details of the dataset split can be found in our paper.
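To confirm the editable install above took effect, a quick importability probe (the `sam2` module name comes from the cloned repo; IMISNet's import name may differ, so it is left out here):

```python
import importlib.util

# Packages installed from third_party/ above; "sam2" is the module name of
# the facebookresearch/sam2 repo. Add IMISNet's import name once known.
packages = ["sam2"]

for pkg in packages:
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'ok' if found else 'missing (re-run pip install -e . in third_party/sam2)'}")
```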
We will release the SFT trajectory dataset and RL training dataset soon.
- Single sample (one image): Run `infer/run_single_inference.py` with your paths:
```bash
cd infer
python run_single_inference.py \
  --img-path infer/demo/BTCV-0-106_CT_abdomen.png \
  --target-description "right kidney in abdomen CT" \
  --model-path /path/to/mllm_model \
  --seg-checkpoint /path/to/MedSAM2_latest.pt \
  --seg-model medsam
```

- Whole-dataset / multi-GPU: Edit the variables at the top of `infer/run_batch_inference.sh`:
  - `MODEL_PATH` (local Qwen checkpoint or `gpt`)
  - `SEG_MODEL` (`medsam`, `sam`, `imisnet`), plus the segmentation checkpoints/configs
  - `DATA_ROOT`, `DATASETS`, `SPLIT`
  - GPU topology (`N_GPUS`, `PROCESSES_PER_GPU`)
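As a sketch only (every path below is a placeholder, and the segmentation checkpoint/config variables are omitted because their exact names live in the script itself), the edited header might look like:

```shell
# Illustrative values only -- replace the paths with your own.
MODEL_PATH=/path/to/mllm_model   # local Qwen checkpoint, or "gpt"
SEG_MODEL=medsam                 # one of: medsam, sam, imisnet
DATA_ROOT=/path/to/data
DATASETS="BTCV"                  # hypothetical dataset selection
SPLIT=test
N_GPUS=8
PROCESSES_PER_GPU=1
```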
```bash
bash run_batch_inference.sh
```

Please follow the instructions in `RL-verl/README.md` to set up the Verl environment.
Note: this setup requires Sglang==0.5.4.
We support two segmentation backbones for RL training: MedSAM2 and IMISNet.
First, start the API server for segmentation model inference. You can choose either MedSAM2 or IMISNet by modifying the variables in RL-verl/api_server/run_api.sh:
```bash
bash RL-verl/api_server/run_api.sh
```
You can modify the following variables in `run.sh`:

- `MODEL`: segmentation backbone, options: `medsam2` or `imisnet`
- `SAVE_CHECKPOINT_DIR`: root directory to save Verl training outputs
- `DATASET_TRAIN`: path to the training dataset parquet file
- `DATASET_VAL`: path to the validation dataset parquet file
- `REF_MODEL_PATH`: path to the base MLLM model (local checkpoint or `Qwen/Qwen3-VL-8B-Instruct`)
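A sketch of these settings (all paths are placeholders):

```shell
# Illustrative values only.
MODEL=medsam2                              # or: imisnet
SAVE_CHECKPOINT_DIR=/path/to/verl_outputs
DATASET_TRAIN=/path/to/rl_train.parquet
DATASET_VAL=/path/to/rl_val.parquet
REF_MODEL_PATH=Qwen/Qwen3-VL-8B-Instruct   # or a local checkpoint path
```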
```bash
bash RL-verl/recipe/medsam_agent/run.sh
```

We greatly appreciate the tremendous effort behind the following projects!
If you find this work helpful for your project, please consider citing our paper.
@misc{liu2026medsamagentempoweringinteractivemedical,
title={MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning},
author={Shengyuan Liu and Liuxin Bao and Qi Yang and Wanting Geng and Boyun Zheng and Chenxin Li and Wenting Chen and Houwen Peng and Yixuan Yuan},
year={2026},
eprint={2602.03320},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.03320},
}
