hvtiRutilities provides utility functions for working with clinical research data at the Cleveland Clinic Heart, Vascular and Thoracic Institute (HVTI) Clinical Outcomes Registries and Research (CORR) department. The package simplifies common data preparation tasks when working with SAS datasets in R.
- r_data_types(): Automatically infer and convert data types in a dataset
  - Converts character columns to factors
  - Detects binary numeric variables (0/1) and converts to logical
  - Converts numeric variables with few unique values to factors
  - Handles various NA representations ("NA", "na", etc.)
  - Preserves variable labels from SAS/labelled data
- label_map(): Extract variable labels from labeled datasets
  - Creates a lookup table mapping variable names to their labels
  - Useful for working with SAS datasets that have variable labels
  - Returns a data frame with key (variable name) and label columns
- sample_data(): Generate sample datasets for testing
  - Creates datasets with various column types for testing package functions
  - Useful for examples and unit tests
- generate_survival_data(): Simulate a cardiac surgery survival cohort
  - Generates realistic patient-level data including demographics, pre-operative labs, cardiac function, and surgical variables
  - Survival times from a Weibull model with clinically-motivated linear predictor (LVEF, age, hemoglobin, NYHA class, eGFR)
  - Includes reoperation outcome and administrative censoring up to 15 years
  - Variable labels attached for compatibility with haven and label_map()
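
To make the Weibull description above concrete, here is a minimal base R sketch of drawing survival times from a Weibull proportional hazards model and applying administrative censoring at 15 years. It is not the package's implementation: the covariates, coefficients, shape, and scale below are hypothetical values chosen only for illustration, and the variable names (follow_up, died) are assumptions rather than columns returned by generate_survival_data().

```r
set.seed(1024)
n <- 500

# Hypothetical covariates and coefficients (illustration only)
age  <- rnorm(n, mean = 65, sd = 10)
lvef <- rnorm(n, mean = 50, sd = 10)
lp   <- 0.03 * (age - 65) - 0.02 * (lvef - 50)  # linear predictor

# Weibull proportional hazards model:
#   S(t | x) = exp(-lambda * t^shape * exp(lp))
shape  <- 1.2   # assumed shape parameter
lambda <- 0.02  # assumed baseline scale parameter

# Inverse-transform sampling: set S(t) = U and solve for t
u <- runif(n)
event_time <- (-log(u) / (lambda * exp(lp)))^(1 / shape)

# Administrative censoring: nobody is followed beyond 15 years
max_follow_up <- 15
follow_up <- pmin(event_time, max_follow_up)
died      <- as.integer(event_time <= max_follow_up)

# Observed event rate within the follow-up window
mean(died)
```

Subjects whose simulated event time exceeds 15 years are recorded as censored at exactly 15, which is what administrative censoring means here.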
You can install the development version of hvtiRutilities from GitHub with:
# install.packages("pak")
pak::pak("ehrlinger/hvtiRutilities")

library(hvtiRutilities)
# Create sample data
dta <- sample_data(n = 100)
# Examine original types
str(dta)
# boolean: int (values: 1, 2)
# logical: chr (values: "F", "T")
# char: chr (values: "male", "female")
# Apply automatic type conversion
dta_converted <- r_data_types(dta)
# Examine converted types
str(dta_converted)
# boolean: logi (binary 1/2 → TRUE/FALSE)
# logical: Factor (character → factor)
# char: Factor (character → factor)

# Skip conversion for specific variables
dta_partial <- r_data_types(dta, skip_vars = c("boolean", "char"))
# boolean and char remain unchanged, others are converted

# Convert only variables with fewer than 5 unique values to factors
dta_strict <- r_data_types(dta, factor_size = 5)
# Keep binary variables as factors instead of logical
dta_factors <- r_data_types(dta, binary_factor = TRUE)

# Create labeled data (common with SAS imports)
library(labelled)
dta <- data.frame(
  age = c(25, 30, 35),
  sex = c("M", "F", "M"),
  bp = c(120, 130, 125)
)
var_label(dta$age) <- "Patient Age (years)"
var_label(dta$sex) <- "Patient Sex"
var_label(dta$bp) <- "Systolic Blood Pressure (mmHg)"
# Extract labels as a lookup table
labels <- label_map(dta)
print(labels)
#   key label
# 1 age Patient Age (years)
# 2 sex Patient Sex
# 3 bp  Systolic Blood Pressure (mmHg)
# Use for matching/joining
summary_table <- data.frame(variable = c("age", "bp"))
summary_table$label <- labels$label[match(summary_table$variable, labels$key)]

# Simulate a cardiac surgery cohort (reproducible)
dta <- generate_survival_data(n = 500, seed = 1024)
# Event and reoperation rates
mean(dta$dead) # ~death rate
mean(dta$reop) # ~reoperation rate
# Integrate with the rest of the package
model_data <- r_data_types(
  dta,
  factor_size = 5,
  skip_vars = c("ccfid", "iv_dead", "iv_reop")
)
# Extract variable labels for reporting
lmap <- label_map(model_data)

- Preserves variable labels: All functions maintain SAS/labelled variable attributes
- Handles NA variants: Automatically converts "NA", "na", "Na", "nA" strings to actual NA values
- Type-safe: Returns the same data structure class as input (data.frame, tibble, data.table, etc.)
- Flexible control: Multiple parameters to customize type conversion behavior
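
As a rough illustration of the NA handling and attribute preservation listed above, the following base R sketch treats the four case variants of "NA" as missing values while leaving the column label and the object class intact. The helper normalize_na_strings() is hypothetical and is not part of hvtiRutilities; it only shows the general idea and makes no claim about how r_data_types() is implemented.

```r
# Hypothetical helper (not part of hvtiRutilities): convert the strings
# "NA", "na", "Na", "nA" in character columns to real missing values.
normalize_na_strings <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], function(x) {
      # Subassignment keeps attributes such as a variable label
      x[!is.na(x) & toupper(x) == "NA"] <- NA_character_
      x
    })
  }
  df
}

raw <- data.frame(
  id = 1:4,
  sex = c("M", "na", "F", "NA"),
  stringsAsFactors = FALSE
)
attr(raw$sex, "label") <- "Patient Sex"

clean <- normalize_na_strings(raw)

is.na(clean$sex)          # which entries are now missing
attr(clean$sex, "label")  # label attribute carried through
class(clean)              # same class as the input
```

Restricting the comparison to character columns avoids touching numeric data, and the explicit !is.na() guard leaves values that were already missing unchanged.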
- Package documentation: ?r_data_types, ?label_map, ?generate_survival_data
- Vignettes: vignette("hvtiRutilities"), vignette("survival-data")
- For bug reports and feature requests: GitHub Issues
- For package news and changes: Run hvtiRutilities.news() in R
GPL (>= 3)