Skip to content

pixas/MedSSS

Repository files navigation

MedSSS

📃 Paper |🤗 MedSSS-8B-Policy |🤗 MedSSS-8B-PRM | 📚 SFT/PRM Data

💫 News

  • 🔥 [2025/11/08] MedS$^3$ has been accepted as a poster at AAAI 2026 Main!

⚡Introduction

This repository contains the self-evolving pipeline of MedS$^3$, a slow-thinking small medical language models built with a self-evolution pipeline and an innovative soft dual-sided process supervision.

MedSSS

MedS$^3$ is a medical LLM designed for advanced medical reasoning, with reliable intermediate reasoning steps. It can leverage the PRM model to select the most correct response from several outputs. It supports both traditional medical question answering problems, as well as realistic clinical problems. It is built with the following three steps

  • Using Monte-Carlo Tree Search to self-collect correct and incorrect reasoning trajectories.
  • Use SFT to train a policy model in the correct trajectory set and use soft-label two-class classification to train a PRM model in both correct/incorrect internal reasoning steps.
  • Use PRM best-of-N decoding method to generate several candidate responses and use PRM to select the most appropriate one, with the highest PRM score.

We open-sourced our models, data, and code here.

👨‍⚕️ Model

  • Model Access
Backbone Supported Languages Link
MedSSS-8B-Policy LLaMA-3.1-8B English HF Link
MedSSS-8B-PRM LLaMA-3.1-8B English HF Link

Please follow the Huggingface page to deploy the two models

📚 Data

  • Data Access

You can access the detailed step-by-step solution in HF link.

About

Towards Medical Small Language Models with Self-Evolved \\ Slow Thinking

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors