Understanding the Limits of Large Datasets

Duy Hiệp Nguyễn

2012, Journal of Cancer Education

Abstract

BACKGROUND: This study investigated missing data in a large cancer dataset to alert educators to the implications and limitations of missing data. The authors examined the California Cancer Registry for missing data across eight common cancer sites, seven sociodemographic and clinical variables, and the top three reporting sources. The gender variable had no missing data; in ascending order of missingness, it was followed by age (0.1% missing), ethnicity (2.2%), stage (7.0%), differentiation (36.3%), and birthplace (42.5%). Reports from hospitals and clinics had the lowest percentages of missing data. CONCLUSIONS: Educators should anticipate the limitations of missing data in large datasets to prevent methodological flaws and misinterpretations of research findings.
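The kind of tabulation the abstract describes (percent missing per variable, overall and by reporting source) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy records, the field names, the source labels, and the helper functions `percent_missing` and `percent_missing_by` are hypothetical and are not taken from the California Cancer Registry or from the authors' method.

```python
from collections import defaultdict

# Hypothetical toy records; None marks a missing value. Not registry data.
records = [
    {"source": "hospital", "age": 61, "stage": "II", "birthplace": "CA"},
    {"source": "hospital", "age": 54, "stage": None, "birthplace": None},
    {"source": "laboratory", "age": 70, "stage": None, "birthplace": None},
    {"source": "laboratory", "age": None, "stage": "I", "birthplace": None},
]
variables = ["age", "stage", "birthplace"]


def percent_missing(rows, fields):
    """Return {field: percent of rows in which the field is None}."""
    if not rows:
        return {f: 0.0 for f in fields}
    return {
        f: 100.0 * sum(r.get(f) is None for r in rows) / len(rows)
        for f in fields
    }


def percent_missing_by(rows, fields, key):
    """Group rows by `key` and compute percent missing within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return {g: percent_missing(rs, fields) for g, rs in groups.items()}


overall = percent_missing(records, variables)
by_source = percent_missing_by(records, variables, "source")

print(overall)
print(by_source)
```

Reporting missingness per group as well as overall matters because a variable that looks acceptable in aggregate may be far less complete for one reporting source than for another.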

Recai Yucel

CHANCE, 2008

Cancer registries collect information on type of cancer, histological characteristics, stage at diagnosis, patient demographics, initial course of treatment (including surgery, radiotherapy, and chemotherapy), and patient survival (Hewitt and Simone 1999). Such information can be valuable for studying patterns of cancer epidemiology, diagnosis, treatment, and outcome. However, misreporting of registry information is unavoidable, and thus studies based solely on registry data would lead to invalid results.
