Understanding the Limits of Large Datasets

2012, Journal of Cancer Education

Abstract

BACKGROUND-This study investigated missing data in a large cancer dataset to alert educators to the implications and limitations of missing data. The authors examined the California Cancer Registry for missing data across eight common cancer sites, seven sociodemographic and clinical variables, and the top three reporting sources. Gender had no missing data; the proportion missing was 0.1% for age, 2.2% for ethnicity, 7.0% for stage, 36.3% for differentiation, and 42.5% for birthplace. Among reporting sources, reports from hospitals and clinics had the lowest percentages of missing data. CONCLUSIONS-Educators should anticipate the limitations that missing data impose on large datasets to prevent methodological flaws and misinterpretation of research findings.
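
As an illustration of the kind of tabulation the abstract describes, the following is a minimal sketch of computing the percentage of missing values per variable, overall and by reporting source. It is not the authors' method. The field names (`source`, `age`, `stage`, `birthplace`), the toy records, and the choice to treat `None` and empty strings as missing are all assumptions made for the example; they are not taken from the California Cancer Registry.

```python
from collections import defaultdict


def percent_missing(records, variables, missing_values=(None, "")):
    """Return {variable: percentage of records whose value is missing}.

    A value counts as missing if it is absent from the record or equals
    one of `missing_values` (an assumption made for this sketch).
    """
    total = len(records)
    if total == 0:
        return {var: 0.0 for var in variables}
    result = {}
    for var in variables:
        n_missing = sum(1 for r in records if r.get(var) in missing_values)
        result[var] = 100.0 * n_missing / total
    return result


def percent_missing_by_group(records, variables, group_key):
    """Split records by `group_key`, then compute percent missing per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r.get(group_key)].append(r)
    return {g: percent_missing(rs, variables) for g, rs in groups.items()}


# Hypothetical toy records; real registry data would be loaded from a file.
records = [
    {"source": "hospital", "age": 61, "stage": "II", "birthplace": "CA"},
    {"source": "hospital", "age": 54, "stage": "I", "birthplace": None},
    {"source": "lab", "age": 70, "stage": None, "birthplace": None},
    {"source": "lab", "age": None, "stage": None, "birthplace": ""},
]
variables = ["age", "stage", "birthplace"]

overall = percent_missing(records, variables)
by_source = percent_missing_by_group(records, variables, "source")

for var, pct in overall.items():
    print(f"{var}: {pct:.1f}% missing")
```

Reporting the percentages both overall and per group matters because a variable that looks mostly complete in aggregate can still be largely missing within a single reporting source.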