Official implementation for the paper: "MedFactEval and MedAgentBrief: A Framework and Pipeline for Generating and Evaluating Factual Clinical Summaries"
Status: Accepted to the Pacific Symposium on Biocomputing (PSB); see the PSB proceedings for the published paper.
Preprint: https://arxiv.org/abs/2509.05878
This repository provides the complete source code for MedFactEval, a framework for scalable, fact-grounded evaluation of AI-generated clinical text, and MedAgentBrief, a model-agnostic workflow for generating high-quality, factual discharge summaries.
Evaluating the factual accuracy of LLM-generated clinical text remains a critical barrier to clinical adoption: manual expert review is the gold standard, but it cannot scale to continuous quality assurance.
MedFactEval reframes the ambiguous task of "is this a good summary?" into a series of concrete, verifiable questions based on clinician-defined key facts, which are then assessed by a robust LLM Jury for both factual presence and contradictions.
MedAgentBrief provides a multi-step, model-agnostic pipeline for generating high-fidelity clinical summaries from unstructured notes, using structured prompts and expert-informed templates.
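Conceptually, the LLM Jury reduces "is this a good summary?" to one verdict per clinician-defined key fact, aggregated across several independent judge models by majority vote. A minimal sketch of that aggregation step (the judge callables here are hypothetical stand-ins for real model calls, not the repository's API):

```python
from collections import Counter

def jury_verdict(key_fact: str, summary: str, judges) -> str:
    """Ask each judge whether `key_fact` is reflected in `summary` and
    return the majority label, e.g. 'present', 'absent', or 'contradicted'."""
    votes = [judge(key_fact, summary) for judge in judges]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical judges returning fixed labels, for illustration only;
# in practice each would wrap a call to a distinct LLM.
judges = [
    lambda fact, text: "present",
    lambda fact, text: "present",
    lambda fact, text: "absent",
]
print(jury_verdict("Patient discharged on apixaban", "…summary text…", judges))  # -> present
```

The majority vote makes the verdict robust to any single judge's error, which is the motivation for a jury rather than a lone LLM-as-a-judge.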
## Key Features

- **MedFactEval Framework:** Scalable evaluation pipeline using an LLM Jury to assess factual presence and contradictions of AI-generated text against clinician-defined key facts.
- **MedAgentBrief Workflow:** Model-agnostic, multi-step pipeline for generating high-fidelity clinical summaries from unstructured notes, with clear formatting and medical relevance.
- **Single-Prompt Summarizer:** An alternative, streamlined summarization approach using a single, comprehensive prompt for rapid prototyping or baseline comparison.
- **Analysis Code:** Jupyter notebooks and scripts to replicate the meta-evaluation, non-inferiority analysis, and performance trade-offs presented in our paper.
- **Samples:** Anonymized examples of both MedAgentBrief-generated summaries and MedFactEval evaluation reports for reference.
## Repository Structure

```
medfacteval/
├── medfacteval/                     # Core code for the MedFactEval framework and LLM Jury
│   ├── auto_eval.py                 # Main evaluation pipeline and reporting (LLM-as-a-judge approach)
│   ├── agnostic_evaluator_models.py # LLM and model-agnostic evaluation utilities
│   └── multi_evaluator.py           # Multi-model evaluation (LLM Jury approach)
│
├── medagentbrief/                   # Code for the MedAgentBrief generation workflow
│   ├── medagentbrief.py             # Main multi-step summarization pipeline
│   ├── llm_calls.py                 # LLM interaction utilities
│   └── html_summary_generator.py    # HTML report generation (matching tags to citations)
│
├── single_prompt/                   # Single-prompt summarization baseline
│   └── single_prompt_summarizer.py
│
├── analysis/                        # Meta-evaluation, statistical analysis, and figure generation
│   ├── agreements.ipynb             # Reproduces Figure 2
│   ├── non_inferiority.ipynb        # Reproduces Figure 3 and Figure S3
│   └── tradeoffs.ipynb              # Reproduces Figures 1, S1, and S2
│
├── samples/                         # Anonymized MedAgentBrief summaries and MedFactEval reports
│   ├── anonymized_medagentbrief_sumary.html
│   └── anonymized_medfacteval_report.html
│
├── supplementary_materials.pdf      # Supplementary materials (Table S1, Figures S1-S3)
├── requirements.txt                 # Python dependencies
└── README.md
```
## Getting Started

**Clone the repository:**

```shell
git clone https://github.com/yourusername/medfacteval.git
cd medfacteval
```

**Install dependencies:**

```shell
pip install -r requirements.txt
```
**Configure secure LLM API access:**

This project is designed to work with HIPAA-compliant LLM APIs (e.g., Stanford Health Care or Vertex AI) for handling patient health information.

- For MedAgentBrief (`medagentbrief/llm_calls.py`): set your API key by editing the line `lab_key = "insert_your_lab_key_or_load_it_from_env"`, or provide your Google Cloud JSON key via `credentials_path`.
- For MedFactEval: provide your credentials in `medfacteval/agnostic_evaluator_models.py`.

For more details on secure API access, please see our Secure LLM API Access Guide.
**Run MedAgentBrief or MedFactEval:**

See the docstrings in `medagentbrief/medagentbrief.py` and `medfacteval/multi_evaluator.py` for example usage and input formats.
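As an alternative to editing the key directly into the source, a common pattern is to read it from an environment variable so credentials never land in version control. A minimal sketch (the variable name `LAB_API_KEY` is illustrative, not a name this repository defines):

```python
import os

# Read the API key from the environment; fall back to the repository's
# placeholder string if unset. LAB_API_KEY is a hypothetical variable name.
lab_key = os.environ.get("LAB_API_KEY", "insert_your_lab_key_or_load_it_from_env")
```

This keeps secrets out of the codebase, which matters when the keys grant access to HIPAA-compliant endpoints handling patient data.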
## License

This project is licensed under the MIT License. See LICENSE for details.

## Contact

For questions or collaborations, please contact François Grolleau: [email protected]