The National Health and Nutrition Examination Survey (NHANES) is one of the most comprehensive public health datasets available, spanning over two decades of U.S. health data. But working with it has been frustrating. If you've tried using NHANES before, you've likely hit two major problems: (1) CDC server reliability issues that break reproducible research, and (2) cycle suffix confusion, where finding DEMO, DEMO_B, DEMO_C, all the way through DEMO_L makes data discovery a scavenger hunt.
nhanesdata solves both problems. All datasets are hosted on reliable cloud storage with fast access, and all survey cycles are already merged. Just use read_nhanes("demo") and you get demographics data from 1999-2023 with a year column tracking which cycle each observation belongs to. No CDC server timeouts, no suffix confusion.
All processed datasets are publicly available at https://nhanes.kylegrealis.com/ with no authentication required.
This package builds on the nhanesA package, which provides the foundation for accessing NHANES data through R.
# From CRAN (submitted for approval Feb. 18, 2026)
install.packages("nhanesdata")
# Development version from GitHub
pak::pak("kyleGrealis/nhanesdata")

library(nhanesdata)
# Load any dataset (case-insensitive)
demo <- read_nhanes("demo") # Demographics
bpx <- read_nhanes("BPX") # Blood pressure
trigly <- read_nhanes("TRIGLY") # Triglycerides
# Search for variables
term_search("diabetes") # By keyword
var_search("RIDAGEYR") # By variable name
# Get CDC documentation
get_url("DEMO_J")

All datasets include a year column (survey cycle start year) and seqn (participant ID). Join datasets on both columns:
library(dplyr)
analysis <- read_nhanes("demo") |>
  inner_join(read_nhanes("bpx"), by = c("seqn", "year"))

| Function | Purpose |
|---|---|
| read_nhanes() | Load a pre-merged NHANES dataset from cloud storage |
| create_design() | Create survey design objects with proper weighting for multiple cycles |
| term_search() | Search variables by keyword or phrase |
| var_search() | Search variables by exact name |
| get_url() | Get CDC codebook URL for a specific table |

All functions are case-insensitive.
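
For context on what weighting across multiple cycles usually involves: CDC's analytic guidance for pooling 2-year cycles is to divide each 2-year weight by the number of cycles combined. The base R sketch below applies that rule to a toy data frame. It is not the internals of create_design(); the column names wtmec2yr and wt_pooled and all values are made up for illustration, and it assumes cycles where the plain 2-year weight applies (the 1999-2002 cycles have separate 4-year weights and need different handling).

```r
# Toy data: three participants from each of two 2-year cycles.
# Column names and values are illustrative, not real NHANES records.
toy <- data.frame(
  seqn     = 1:6,
  year     = c(2015, 2015, 2015, 2017, 2017, 2017),
  wtmec2yr = c(12000, 8000, 10000, 9000, 15000, 6000)
)

# Number of distinct cycles being pooled
n_cycles <- length(unique(toy$year))

# Rescale so the pooled weights sum to an average population over the
# combined period instead of counting that population once per cycle
toy$wt_pooled <- toy$wtmec2yr / n_cycles

sum(toy$wtmec2yr)   # weight total before rescaling
sum(toy$wt_pooled)  # weight total after rescaling
```

With the survey package, a rescaled column like this would typically be passed as the weights argument of svydesign(), alongside the PSU and stratum variables and nest = TRUE.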
All standard NHANES datasets are included, except:
- Surplus samples (requires special access)
- Pooled samples (different analysis requirements)
- Special samples (limited availability)
- 2019-2020 cycle data (COVID-19 disruption)
Categories include:
- Questionnaire/Interview: Demographics, health conditions, lifestyle factors, dietary data
- Examination: Physical measurements, body composition, cardiovascular fitness
- Laboratory: Biomarkers, environmental chemicals, infectious disease serology, nutritional status
- Dietary: Dietary recall, supplement use, food frequency questionnaires
See the dataset catalog for the complete list, or browse inst/extdata/datasets.yml in the source.
- The 2019-2020 survey cycle (suffix K) is excluded due to COVID-19 data collection disruptions. See vignette("covid-data-exclusion") for details.
- Variable names match CDC documentation. Always verify definitions with get_url() since variable usage may differ across cycles.
- Data types are automatically harmonized across cycles (integer vs. double, factor vs. character).
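
As a rough picture of what harmonizing types across cycles might look like, here is a minimal base R sketch. It is not the package's actual implementation: the toy data frames, the harmonize_types() helper, and the coercion rule (integer to double, factor to character) are all assumptions for illustration.

```r
# Two toy "cycles" where the same variables arrive with different types
cycle_a <- data.frame(
  seqn     = 1:2,                              # integer
  riagendr = factor(c("Male", "Female")),      # factor
  year     = 2015
)
cycle_b <- data.frame(
  seqn     = c(3, 4),                          # double
  riagendr = c("Female", "Male"),              # character
  year     = 2017,
  stringsAsFactors = FALSE
)

# Hypothetical helper: coerce each column to the more general type
harmonize_types <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) {
      as.character(col)
    } else if (is.integer(col)) {
      as.double(col)
    } else {
      col
    }
  })
  df
}

# Stack the cycles once their column types agree
combined <- rbind(harmonize_types(cycle_a), harmonize_types(cycle_b))
str(combined)
```

Coercing toward the more general type keeps every original value representable, which is why the sketch goes from integer to double and from factor to character and not the other way around.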
library(arrow)
demo <- arrow::read_parquet("https://nhanes.kylegrealis.com/demo.parquet")

This works from any language with Arrow support. Dataset names in URLs are lowercase.
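
Because dataset names in URLs are lowercase, a small helper can build the Parquet URL from a name typed in any case. The function name nhanes_parquet_url() is hypothetical, and the URL pattern is inferred from the single demo.parquet example above, so treat it as an assumption for other datasets.

```r
# Hypothetical helper: build the Parquet URL for a dataset name.
# Assumes every dataset follows the <base>/<lowercase name>.parquet
# pattern seen in the demo example.
nhanes_parquet_url <- function(name,
                               base = "https://nhanes.kylegrealis.com") {
  paste0(base, "/", tolower(name), ".parquet")
}

nhanes_parquet_url("BPX")
```

The resulting string can be passed to arrow::read_parquet() in the same way as the demo URL.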
- Documentation: ?read_nhanes, browseVignettes("nhanesdata")
- Bug reports: GitHub Issues
- CDC NHANES: nhanes.cdc.gov
- nhanesA: Direct interface to the NHANES API
- survey: Complex survey analysis with proper weighting
- srvyr: Tidy survey analysis using dplyr syntax
- gtsummary: Publication-ready summary tables
- sumExtras: Extended summary statistics and helpers
NHANES data is public domain (U.S. government). This processing code is MIT licensed.