Official implementation for the paper: "MedFactEval and MedAgentBrief: A Framework and Pipeline for Generating and Evaluating Factual Clinical Summaries"
Status: Accepted to the Pacific Symposium on Biocomputing (PSB); see the PSB proceedings for the published paper.
Preprint: https://arxiv.org/abs/2509.05878
This repository provides the complete source code for MedFactEval, a framework for scalable, fact-grounded evaluation of AI-generated clinical text, and MedAgentBrief, a model-agnostic workflow for generating high-quality, factual discharge summaries.
Evaluating the factual accuracy of LLM-generated clinical text remains a critical barrier to clinical adoption: manual expert review is the gold standard, but it cannot scale to continuous quality assurance.
MedFactEval reframes the ambiguous task of "is this a good summary?" into a series of concrete, verifiable questions based on clinician-defined key facts, which are then assessed by a robust LLM Jury for both factual presence and contradictions.
MedAgentBrief provides a multi-step, model-agnostic pipeline for generating high-fidelity clinical summaries from unstructured notes, using structured prompts and expert-informed templates.
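Conceptually, the LLM Jury reduces "is this a good summary?" to one verdict per clinician-defined key fact, aggregated across several independent judge models by majority vote. A minimal sketch of that aggregation step (the judge callables here are hypothetical stand-ins for real model calls, not the repository's API):

```python
from collections import Counter

def jury_verdict(key_fact: str, summary: str, judges) -> str:
    """Ask each judge whether `key_fact` is reflected in `summary` and
    return the majority label, e.g. 'present', 'absent', or 'contradicted'."""
    votes = [judge(key_fact, summary) for judge in judges]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical judges returning fixed labels, for illustration only;
# in practice each would wrap a call to a distinct LLM.
judges = [
    lambda fact, text: "present",
    lambda fact, text: "present",
    lambda fact, text: "absent",
]
print(jury_verdict("Patient discharged on apixaban", "…summary text…", judges))  # -> present
```

The majority vote makes the verdict robust to any single judge's error, which is the motivation for a jury rather than a lone LLM-as-a-judge.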
## Key Features

- **MedFactEval Framework:** Scalable evaluation pipeline using an LLM Jury to assess factual presence and contradictions of AI-generated text against clinician-defined key facts.
- **MedAgentBrief Workflow:** Model-agnostic, multi-step pipeline for generating high-fidelity clinical summaries from unstructured notes, with clear formatting and medical relevance.
- **Single-Prompt Summarizer:** An alternative, streamlined summarization approach using a single, comprehensive prompt for rapid prototyping or baseline comparison.
- **Analysis Code:** Jupyter notebooks and scripts to replicate the meta-evaluation, non-inferiority analysis, and performance trade-offs presented in our paper.
- **Samples:** Anonymized examples of both MedAgentBrief-generated summaries and MedFactEval evaluation reports for reference.
## Repository Structure

```
medfacteval/
├── medfacteval/                     # Core code for the MedFactEval framework and LLM Jury
│   ├── auto_eval.py                 # Main evaluation pipeline and reporting (LLM-as-a-judge approach)
│   ├── agnostic_evaluator_models.py # LLM and model-agnostic evaluation utilities
│   └── multi_evaluator.py           # Multi-model evaluation (LLM Jury approach)
│
├── medagentbrief/                   # Code for the MedAgentBrief generation workflow
│   ├── medagentbrief.py             # Main multi-step summarization pipeline
│   ├── llm_calls.py                 # LLM interaction utilities
│   └── html_summary_generator.py    # HTML report generation (matching tags to citations)
│
├── single_prompt/                   # Single-prompt summarization baseline
│   └── single_prompt_summarizer.py
│
├── analysis/                        # Meta-evaluation, statistical analysis, and figure generation
│   ├── agreements.ipynb             # Reproduces Figure 2
│   ├── non_inferiority.ipynb        # Reproduces Figure 3 and Figure S3
│   └── tradeoffs.ipynb              # Reproduces Figures 1, S1, and S2
│
├── samples/                         # Anonymized MedAgentBrief summaries and MedFactEval reports
│   ├── anonymized_medagentbrief_sumary.html
│   └── anonymized_medfacteval_report.html
│
├── supplementary_materials.pdf      # Supplementary materials (Table S1, Figures S1-S3)
├── requirements.txt                 # Python dependencies
└── README.md
```
## Getting Started

**Clone the repository:**

```shell
git clone https://github.com/yourusername/medfacteval.git
cd medfacteval
```

**Install dependencies:**

```shell
pip install -r requirements.txt
```
**Configure secure LLM API access:**

This project is designed to work with HIPAA-compliant LLM APIs (e.g., Stanford Health Care or Vertex AI) for handling patient health information.

- For MedAgentBrief (`medagentbrief/llm_calls.py`): set your API key by editing the line `lab_key = "insert_your_lab_key_or_load_it_from_env"`, or provide your Google Cloud JSON key via `credentials_path`.
- For MedFactEval: provide your credentials in `medfacteval/agnostic_evaluator_models.py`.

For more details on secure API access, please see our Secure LLM API Access Guide.
**Run MedAgentBrief or MedFactEval:**

See the docstrings in `medagentbrief/medagentbrief.py` and `medfacteval/multi_evaluator.py` for example usage and input formats.
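As an alternative to editing the key directly into the source, a common pattern is to read it from an environment variable so credentials never land in version control. A minimal sketch (the variable name `LAB_API_KEY` is illustrative, not a name this repository defines):

```python
import os

# Read the API key from the environment; fall back to the repository's
# placeholder string if unset. LAB_API_KEY is a hypothetical variable name.
lab_key = os.environ.get("LAB_API_KEY", "insert_your_lab_key_or_load_it_from_env")
```

This keeps secrets out of the codebase, which matters when the keys grant access to HIPAA-compliant endpoints handling patient data.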
## License

This project is licensed under the MIT License. See LICENSE for details.

## Contact

For questions or collaborations, please contact François Grolleau: [email protected]