A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
- 2024-05-31: added new paper "Polaris: A Safety-focused LLM Constellation Architecture for Healthcare".
- 2024-05-31: added new paper "Medical mT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain".
- 2024-05-31: added new paper "Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People".
- 2024-05-31: added new paper "LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation".
- 2024-05-31: added new paper "Me LLaMA: Foundation Large Language Models for Medical Applications".
- 2024-05-31: added new paper "BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains".
- 2024-05-31: added new paper "OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)".
- 2024-03-17: added new paper "Health-LLM: Personalized Retrieval-Augmented Disease Prediction System".
- 2024-03-17: added new paper "HealAI: A Healthcare LLM for Effective Medical Documentation".
- 2024-03-17: added new paper "BiMediX: Bilingual Medical Mixture of Experts LLM".
- 2024-03-17: added new paper "JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability".
- 2024-03-17: added new paper "MedChatZH: A Tuning LLM for Traditional Chinese Medicine Consultation".
- 2023-10-18: added new paper "Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue".
- 2023-10-18: added new paper "Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model".
- 2023-10-09: released version 1 of the survey (https://arxiv.org/abs/2310.05694).
-
Introduction
-
What LLMs Can Do for Healthcare? From Fundamental Tasks to Advanced Applications
- NER and RE for Healthcare
- Text Classification for Healthcare
- Semantic Textual Similarity for Healthcare
- Question Answering for Healthcare
- Dialogue System for Healthcare
- Generation of Medical Reports from Images
- Summary
-
From LMs to LLMs for Healthcare
- LMs for Healthcare
- LLMs for Healthcare
-
Train and Use LLM for Healthcare
- Pre-training Methods
- Masked Language Modeling
- Next Word Prediction
- Sequence-to-sequence MLM
- Replaced Token Detection
- Sentence Boundary Detection
- Next Sentence Prediction
- Sentence Order Prediction
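Of the pre-training objectives listed above, masked language modeling (used by the BERT-family models in Table II) is the simplest to illustrate. A minimal sketch with a toy whitespace tokenizer; the masking rate, mask token, and example sentence are illustrative only:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=1):
    """Toy BERT-style masking: hide a fraction of tokens and keep the
    originals as prediction targets (no loss on unmasked positions)."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)   # model must reconstruct this token
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)        # position excluded from the loss
    return inputs, targets

inputs, targets = mask_tokens("the patient was discharged after treatment".split())
```

Real pretraining pipelines additionally replace some selected tokens with random words or leave them unchanged; this sketch omits that detail.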
-
Post-training Methods
- From Predicting Tokens to Following Instructions: Instruction Fine-tuning and Supervised Fine-tuning
- Reinforcement Learning from Human Feedback
- From Human Feedback to AI Feedback
- Summary
-
Usage
- From Fine-tuning to In-context Learning
- From System 1 Deep Learning to System 2 Deep Learning: Chain-of-Thought
- AI Agents
- Summary
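In-context learning, listed above, adapts a model purely through the prompt rather than through weight updates. A hypothetical few-shot prompt for a healthcare routing task (the notes and department labels are made up for illustration):

```python
# Few-shot prompt assembly: the "training data" lives in the prompt itself,
# so the model is conditioned on examples with no fine-tuning step.
examples = [
    ("Patient reports chest pain and shortness of breath.", "cardiology"),
    ("Persistent rash on both forearms.", "dermatology"),
]
query = "Severe headache with visual aura."

prompt = "Route each note to a department.\n\n"
for note, dept in examples:
    prompt += f"Note: {note}\nDepartment: {dept}\n\n"
prompt += f"Note: {query}\nDepartment:"  # the LLM completes this line
```

The completed string would then be sent to any instruction-following LLM; swapping the examples changes the task without touching the model.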
-
Parameter-, Memory-, and Compute-efficient Methods
- Parameter-efficient Methods
- Compute-efficient and Memory-efficient Methods
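Parameter-efficient methods such as LoRA train only a small low-rank correction to each frozen weight matrix. A minimal sketch in pure Python with illustrative dimensions (no training loop; the shapes are assumptions, not from any specific model):

```python
import random

# LoRA-style update: the pretrained weight W (d_out x d_in) stays frozen,
# and only a low-rank correction B @ A with rank r << d is trainable.
rng = random.Random(0)
d_in, d_out, r = 64, 64, 4

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[0.0] * d_in for _ in range(r)]      # trainable down-projection
B = [[0.0] * r for _ in range(d_out)]     # trainable up-projection

def matvec(M, x):
    return [sum(w * v for w, v in zip(row, x)) for row in M]

def forward(x):
    # Effective weight is W + B @ A; gradients flow only to A and B.
    return [wx + bax for wx, bax in zip(matvec(W, x), matvec(B, matvec(A, x)))]

full = d_out * d_in            # 4096 frozen parameters
lora = r * d_in + d_out * r    # 512 trainable parameters, 1/8 of full here
```

Because B starts at zero, the adapted model initially matches the frozen one exactly; at larger, realistic dimensions (e.g. d = 4096, r = 8) the trainable fraction drops well below 1%.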
-
Useful Resources
- OpenBMB
- DeepSpeed Chat
- Training Data
- Summary
-
Evaluation Method
- General NLP Task Evaluation
- Healthcare Evaluation
- Evaluation of Robustness, Bias, and Ethics
- Future Directions for Health Evaluation
- Summary
-
Improving Fairness, Accountability, Transparency, and Ethics
- Fairness
- Accountability
- Transparency
- Ethics
-
Future Work and Conclusion
- Future Work
- Medical Knowledge Enhancement
- Integration with Healthcare Processes
- Effective Interaction with Patients and Doctors
- Hallucinations, Misunderstandings, and Prompt Brittleness
-
Conclusion
Fig. 2. The organizational framework for the content. Sections III, IV, and V cover technical details, while Sections II, VI, and VII are of more value to healthcare professionals.

TABLE I BRIEF SUMMARIZATION OF EXISTING PLMS FOR HEALTHCARE.
TABLE II SUMMARIZATION OF TRAINING DATA AND EVALUATION TASKS FOR EXISTING PLMS FOR HEALTHCARE.
| Model Name | Method | Training Data | Eval task |
|---|---|---|---|
| BioBERT | FT | PubMed, PMC | Biomedical NER, RE, QA |
| BlueBert | FT | PubMed, MIMIC-III | BLUE |
| MIMIC-BERT | FT | MIMIC-III | Biomedical NER |
| BioFLAIR | FT | PubMed | Bio NER |
| Bio-ELECTRA-small | PT | PubMed | Biomedical NER |
| AlphaBERT | FT | Discharge diagnoses | Extractive Summarization Task |
| Spanish-bert | FT | Spanish | Spanish Clinical Case Corpus |
| GreenCovidSQuADBERT | FT | CORD19, PubMed, PMC | NER, QA |
| BEHRT | PT | CPRD, HES | Disease Prediction |
| BioMed-RoBERTa | FT | BIOMED | CHEMPROT, RCT |
| RadBERT | FT | Radiology Report Corpus | Report Coding, Summarization |
| CT-BERT | FT | Tweet | COVID-19 Text Classification |
| French-BERT | FT | French clinical documents | DEFT challenge |
| FS-/RAD-/GER-BERT | FT,PT | Unstructured radiology reports | Chest Radiograph Reports Classification |
| Japanese-BERT | FT | Japanese EHR | Symptoms Classification |
| MC-BERT | FT | Chinese EHR | Chinese Biomedical Evaluation benchmark |
| BioALBERT-ner | FT | PubMed, PMC | Biomedical NER |
| BioMegatron | PT | PubMed | biomedical NER, RE, QA |
| CharacterBERT | Bert | OpenWebText, MIMIC-III, PMC | Medical NER, NLI, RE, SS |
| ClinicalBert | FT | MIMIC-III | Hospital Readmission Prediction |
| Clinical XLNet | FT | MIMIC-III | PMV, Mortality |
| Bio-LM | FT | PubMed, PMC, MIMIC-III | 18 Biomedical NLP Tasks |
| BioBERTpt | FT | Private clinical notes, WMT16 | SemClinBr |
| RoBERTa-MIMIC | FT | i2b2 2010, 2012, n2c2 2018 | i2b2 2010, 2012, N2C2 2018 |
| Clinical KB-ALBERT | FT | MIMIC-III, UMLS | MedNLI, i2b2 2010, 2012 |
| CHMBERT | FT | Medical text data | Disease Prediction |
| PubMedBERT | PT | PubMed | BLURB |
| ouBioBERT | FT | PubMed, Wikipedia | BLUE |
| BERT-EHR | FT | General EHR | Myocardial Infarction, Breast Cancer, Liver Cirrhosis |
| AraBERT | PT | Arabic Wikipedia, OSIAN | Arabic SA, NER, QA |
| ABioNER | FT | Arabic scientific literature | Arabic NER |
| ELECTRAMed | FT | PubMed | Biomedical NER, RE, and QA |
| KeBioLM | FT | PubMed | BLURB |
| SINA-BERT | FT | Online Persian source | Persian QA, SA |
| Med-BERT | FT | General EHR | Disease prediction |
| Galén | FT | Private clinical cases | CodiEsp-D, CodiEsp-P, Cantemist-Coding tasks |
| SciFive | T5 | PubMed, PMC | Biomedical NER, RE, NLI, QA |
| BioELECTRA | PT | PubMed, PMC | BLURB, BLUE |
| UmlsBERT | FT | MIMIC-III | MedNLI, i2b2 2006,2010, 2012, 2014 |
| MedGPT | FT | MIMIC-III, private EHRs | Disorder Prediction |
| MentalBERT | FT | - | Depression, Stress, Suicide Detection |
| CODER | FT | UMLS | MCSM, Medical RE |
| BioLinkBERT | FT | PubMed | BLURB, USMLE |
| BioALBERT | FT | PubMed, PMC, MIMIC-III | 6 BioNLP Tasks |
| BioBART | FT | PubMed | Biomedical EL, NER, QA, Dialogue, Summarization |
| SAPBERT | FT | UMLS | MEL |
| VPP | FT | PubMed | Biomedical NER |
| KAD | FT | MIMIC-CXR | PadChest, ChestXray14, CheXpert and ChestX-Det10 |
TABLE VIII THE STATISTICS OF COMPUTATION COST FOR EXISTING HEALTHCARE LLMS.
| Model Name | Total data size | Epochs | Batch size | GPU type | GPU number | GPU time |
|---|---|---|---|---|---|---|
| Visual Med-Alpaca | 54k data points | 3 | 128 | A100-80G | 4 | 2.51 hours |
| GatorTron | >90 billion words | 10 | - | A100 | 992 | 6 days |
| Galactica | - | - | - | A100-80G | 128 | - |
| ChatDoctor | 100k conversations | 3 | 192 | A100 | 6 | 3 hours |
| DoctorGLM | 3.5G | 1 | 4 | A100-80G | 1 | 8 hours |
| PMC-LLaMA | 75B tokens | 5 | 128 | A100 | 8 | 7 days |
| Visual Med-Alpaca | 44.8MB* (without images) | - | 128 | A100-80G | 4 | 2.51 hours |
| BianQue 1.0 | 9 million samples | 1 | - | RTX 4090 | 8 | 16 days |
| GatorTronGPT | 277B tokens | - | 1,120/560 | A100-80G | 560 | 26 days |
| HuatuoGPT | 226,042 instances | 3 | 128 | A100 | 8 | - |
| LLaVA-Med | 15 million figure-caption pairs | - | - | A100 | 8 | 15 hours |
| Med-Flamingo | 1.3M image-caption pairs | - | 400 | A100-80G | 8 | 6.75 days |
TABLE IX ESTIMATED FLOPS AND TRAINING TOKENS FOR DIFFERENT MODEL SIZES.
| Parameters | FLOPs | FLOPs (in Gopher unit) | Tokens |
|---|---|---|---|
| 400 Million | 1.92e+19 | 1/29,968 | 8.0 Billion |
| 1 Billion | 1.21e+20 | 1/4,761 | 20.2 Billion |
| 10 Billion | 1.23e+22 | 1/46 | 205.1 Billion |
| 67 Billion | 5.76e+23 | 1 | 1.5 Trillion |
| 175 Billion | 3.85e+24 | 6.7 | 3.7 Trillion |
| 280 Billion | 9.90e+24 | 17.2 | 5.9 Trillion |
| 520 Billion | 3.43e+25 | 59.5 | 11.0 Trillion |
| 1 Trillion | 1.27e+26 | 221.3 | 21.2 Trillion |
| 10 Trillion | 1.30e+28 | 22515.9 | 216.2 Trillion |
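The FLOPs column in Table IX tracks the common estimate FLOPs ≈ 6 · N · D (N parameters, D training tokens) used in compute-optimal scaling analyses. A quick sanity check, assuming that approximation holds for these rows:

```python
# Check Table IX rows against the rule of thumb FLOPs ~= 6 * N * D,
# where N is the parameter count and D the number of training tokens.
rows = [  # (parameters, tokens, tabulated FLOPs)
    (400e6, 8.0e9,   1.92e19),
    (1e9,   20.2e9,  1.21e20),
    (10e9,  205.1e9, 1.23e22),
    (67e9,  1.5e12,  5.76e23),
    (175e9, 3.7e12,  3.85e24),
]
for n, d, flops in rows:
    rel_err = abs(6 * n * d - flops) / flops
    assert rel_err < 0.10  # every row agrees to within ~10%
```

The 400M row matches exactly (6 × 4e8 × 8e9 = 1.92e19), which suggests the table was generated from this formula with some rounding at larger scales.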
@misc{he2023survey,
title={A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics},
author={Kai He and Rui Mao and Qika Lin and Yucheng Ruan and Xiang Lan and Mengling Feng and Erik Cambria},
year={2023},
eprint={2310.05694},
archivePrefix={arXiv},
primaryClass={cs.CL}
}