
NHANES data curation project. Downloads files using the {nhanesA} R package and stores parquet files on a publicly accessible Cloudflare site.


kyleGrealis/nhanesdata


nhanesdata

R-CMD-check Update NHANES Data Lifecycle: maturing License: MIT CRAN status CRAN downloads

The National Health and Nutrition Examination Survey (NHANES) is one of the most comprehensive public health datasets available, spanning over two decades of U.S. health data. But working with it has been frustrating. If you've tried using NHANES before, you've likely hit two major problems: (1) CDC server reliability issues that break reproducible research, and (2) cycle suffix confusion, where finding DEMO, DEMO_B, DEMO_C, all the way through DEMO_L makes data discovery a scavenger hunt.

nhanesdata solves both problems. All datasets are hosted as parquet files on public cloud storage, and all survey cycles are already merged. Call read_nhanes("demo") and you get demographics data from 1999-2023 (excluding the 2019-2020 cycle), with a year column recording which cycle each observation belongs to. No CDC server timeouts, no suffix confusion.

All processed datasets are publicly available at https://nhanes.kylegrealis.com/ with no authentication required.

Acknowledgments

This package builds on the nhanesA package, which provides the foundation for accessing NHANES data through R.

Installation

# From CRAN (submitted for approval Feb. 18, 2026)
install.packages("nhanesdata")

# Development version from GitHub
pak::pak("kyleGrealis/nhanesdata")

Quick Start

library(nhanesdata)

# Load any dataset (case-insensitive)
demo   <- read_nhanes("demo")    # Demographics
bpx    <- read_nhanes("BPX")     # Blood pressure
trigly <- read_nhanes("TRIGLY")  # Triglycerides

# Search for variables
term_search("diabetes") # By keyword
var_search("RIDAGEYR")  # By variable name

# Get CDC documentation
get_url("DEMO_J")
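
The DEMO_J name passed to get_url() carries a cycle suffix, while the merged datasets use a year column instead. As a rough sketch of how the two relate, the helper below maps a suffixed table name to its cycle start year. cycle_start_year() is a hypothetical name and is not part of nhanesdata; it assumes the plain letter-suffix convention (no suffix for 1999-2000, then _B through _L) and does not handle prefixed names such as the pre-pandemic P_ files.

```r
# Hypothetical helper (not part of nhanesdata): map a CDC table name such as
# "DEMO_J" to the start year of its survey cycle.
cycle_start_year <- function(table_name) {
  nm <- toupper(table_name)
  # Letter suffixes run _B.._L; the 1999-2000 cycle has no suffix, treated as "A"
  suffix <- ifelse(
    grepl("_[B-L]$", nm),
    sub("^.*_([B-L])$", "\\1", nm),
    "A"
  )
  # A = 1999, B = 2001, ..., L = 2021 (one letter per two-year step)
  1999L + 2L * (match(suffix, LETTERS) - 1L)
}

cycle_start_year(c("DEMO", "DEMO_B", "DEMO_J", "DEMO_L"))
# 1999 2001 2017 2021
```

Suffix K maps to 2019, the cycle this package excludes.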

All datasets include a year column (survey cycle start year) and seqn (participant ID). Join datasets on both columns:

library(dplyr)

analysis <- read_nhanes("demo") |>
  inner_join(read_nhanes("bpx"), by = c("seqn", "year"))
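
To see why joining on both columns matters, here is a small sketch on toy data, using base R merge() so it runs without a network connection or extra packages. The data frames are made-up stand-ins for read_nhanes("demo") and read_nhanes("bpx"), and the lowercase variable names ridageyr and bpxsy1 are assumptions about how the merged columns are named.

```r
# Toy stand-ins for read_nhanes("demo") and read_nhanes("bpx"); values are made up
demo <- data.frame(
  seqn = c(1L, 2L, 3L),
  year = c(1999L, 1999L, 2001L),
  ridageyr = c(34, 51, 27)
)
bpx <- data.frame(
  seqn = c(2L, 3L, 4L),
  year = c(1999L, 2001L, 2001L),
  bpxsy1 = c(128, 112, 140)
)

# Base-R counterpart of inner_join(demo, bpx, by = c("seqn", "year")):
# only participants present in both tables are kept
both <- merge(demo, bpx, by = c("seqn", "year"))

# Joining on seqn alone leaves two year columns (year.x and year.y)
seqn_only <- merge(demo, bpx, by = "seqn")

names(both)
names(seqn_only)
```

Including year in the join keys keeps a single year column in the result.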

Functions

Function         Purpose
read_nhanes()    Load a pre-merged NHANES dataset from cloud storage
create_design()  Create survey design objects with proper weighting for multiple cycles
term_search()    Search variables by keyword or phrase
var_search()     Search variables by exact name
get_url()        Get CDC codebook URL for a specific table

All functions treat their input as case-insensitive.
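
For a sense of what "proper weighting for multiple cycles" involves, the sketch below shows the commonly documented rule for pooling 2-year cycles: divide each 2-year weight by the number of cycles combined. This is not necessarily what create_design() does internally. The data is made up, the column name wtmec2yr is an assumption, and the rule as written does not cover the 1999-2002 cycles, which use separate 4-year weights.

```r
# Toy data: three pooled 2-year cycles with made-up weights.
# The column name wtmec2yr is an assumption about how the merged data is named.
pooled <- data.frame(
  seqn = 1:6,
  year = c(2011L, 2011L, 2013L, 2013L, 2015L, 2015L),
  wtmec2yr = c(12000, 8000, 15000, 5000, 9000, 11000)
)

# Divide each 2-year weight by the number of cycles pooled, so the combined
# weights sum to roughly one population rather than three
n_cycles <- length(unique(pooled$year))
pooled$wt_pooled <- pooled$wtmec2yr / n_cycles

sum(pooled$wtmec2yr)   # 60000
sum(pooled$wt_pooled)  # 20000
```

For real analyses, check the CDC analytic guidelines for the cycles you combine.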

Available Datasets

All standard NHANES datasets are included, except:

  • Surplus samples (require special access)
  • Pooled samples (different analysis requirements)
  • Special samples (limited availability)
  • 2019-2020 cycle data (COVID-19 disruption)

Categories include:

  • Questionnaire/Interview: Demographics, health conditions, lifestyle factors, dietary data
  • Examination: Physical measurements, body composition, cardiovascular fitness
  • Laboratory: Biomarkers, environmental chemicals, infectious disease serology, nutritional status
  • Dietary: Dietary recall, supplement use, food frequency questionnaires

See the dataset catalog for the complete list, or browse inst/extdata/datasets.yml in the source.

Important Notes

  • The 2019-2020 survey cycle (suffix K) is excluded due to COVID-19 data collection disruptions. See vignette("covid-data-exclusion") for details.
  • Variable names match CDC documentation. Always verify definitions with get_url() since variable usage may differ across cycles.
  • Data types are automatically harmonized across cycles (integer vs. double, factor vs. character).
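
As an illustration of the type harmonization mentioned above, the sketch below stacks two toy cycles in which the same variables arrive with different types. harmonize() is a hypothetical helper, not the package's code, and the direction of coercion (factor to character, integer to double) is an assumption. All values are made up.

```r
# Toy cycles with made-up values: the same variables arrive with different types
cycle_a <- data.frame(
  seqn = 1:2, year = 1999L,
  riagendr = factor(c("Male", "Female")),
  ridageyr = c(34L, 51L)
)
cycle_b <- data.frame(
  seqn = 3:4, year = 2001L,
  riagendr = c("Female", "Male"),
  ridageyr = c(27, 63),
  stringsAsFactors = FALSE
)

# Hypothetical harmonizer: outside the key columns, factors become character
# and integers become double, so every cycle shares one type per column
harmonize <- function(df, keys = c("seqn", "year")) {
  for (col in setdiff(names(df), keys)) {
    if (is.factor(df[[col]])) df[[col]] <- as.character(df[[col]])
    if (is.integer(df[[col]])) df[[col]] <- as.numeric(df[[col]])
  }
  df
}

merged <- rbind(harmonize(cycle_a), harmonize(cycle_b))
sapply(merged, class)
```

Coercing before stacking means the column type does not depend on which cycle happens to come first.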

Direct Access (Without the Package)

library(arrow)
demo <- arrow::read_parquet("https://nhanes.kylegrealis.com/demo.parquet")

This works from any language with Arrow support. Dataset names in URLs are lowercase.
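
Because dataset names in URLs are lowercase, a small helper can build the URL from a name in any case. nhanes_parquet_url() is a hypothetical name, not a function exported by nhanesdata; it only assumes the URL pattern shown above.

```r
# Hypothetical helper (not exported by nhanesdata): build the public parquet URL
nhanes_parquet_url <- function(dataset) {
  sprintf("https://nhanes.kylegrealis.com/%s.parquet", tolower(dataset))
}

nhanes_parquet_url("DEMO")
# "https://nhanes.kylegrealis.com/demo.parquet"
```

The result can be passed straight to arrow::read_parquet().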

Getting Help

Related Packages

  • nhanesA: Direct interface to the NHANES API
  • survey: Complex survey analysis with proper weighting
  • srvyr: Tidy survey analysis using dplyr syntax
  • gtsummary: Publication-ready summary tables
  • sumExtras: Extended summary statistics and helpers

License

NHANES data is public domain (U.S. government). This processing code is MIT licensed.
