2012, Journal of Cancer Education
BACKGROUND-This study investigated missing data in a large cancer dataset, to alert educators to the implications and limitations of missing data. The authors examined the California Cancer Registry for missing data by eight common cancer sites, seven sociodemographic and clinical variables, and the top three reporting sources. The gender variable had no missing data, followed by age (0.1% missing), ethnicity (2.2%), stage (7.0%), differentiation (36.3%), and birthplace (42.5%). Hospitals'/clinics' reports had the lowest percentages of missing data. CONCLUSIONS-Educators should anticipate the limitations of missing data in large datasets to prevent methodological flaws and misinterpretations of research findings.
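The per-variable percentages reported above can be computed with a simple tally. The following is a minimal sketch, assuming records are plain dictionaries and that a value counts as missing when it is `None` or an empty string; the function name `missing_percent` and the toy records are hypothetical and are not taken from the California Cancer Registry.

```python
def missing_percent(records, fields):
    """Return, for each field, the percentage of records where it is missing.

    A value is treated as missing if the key is absent, None, or ''.
    This definition is an assumption for illustration only.
    """
    total = len(records)
    result = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        result[field] = round(100.0 * missing / total, 1) if total else 0.0
    return result


# Hypothetical toy records (not registry data).
records = [
    {"gender": "F", "age": 61, "birthplace": None},
    {"gender": "M", "age": 54, "birthplace": "CA"},
    {"gender": "F", "age": None, "birthplace": ""},
    {"gender": "M", "age": 70, "birthplace": None},
]
summary = missing_percent(records, ["gender", "age", "birthplace"])
```

Running the same tally separately for each reporting source would give the kind of source-level comparison the abstract describes.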
CHANCE, 2008
Cancer registries collect information on type of cancer, histological characteristics, stage at diagnosis, patient demographics, initial course of treatment including surgery, radiotherapy, and chemotherapy, and patient survival (Hewitt and Simone 1999). Such information can be valuable for studying the patterns of cancer epidemiology, diagnosis, treatment, and outcome. However, misreporting on registry information is unavoidable, and thus studies based solely on registry data would lead to invalid results.
Cancer, 1985
Epidemiologic data obtained through the Surveillance, Epidemiology, and End Results (SEER) Program registry are presented. Survival statistics for five major brain tumor types in childhood are presented for the years 1968 through 1979. Further survival statistics are compared in patients treated in community hospitals versus university hospitals. The lack of uniform reporting, absence of pathologic confirmation, and limited number of patients seen throughout the country are emphasized. There is a need for the establishment of a data base composed of the member institutions of the childhood cancer consortiums. This data base would address questions of patterns of failure, benefits of surgical and clinical staging, and the value of a new putative pathologic grading system. Additionally, the opportunity to collect these patients should permit identification of long-term treatment effects. Identification of early morbidity may lead to modification of treatment programs. A data base containing treatment and diagnostic parameters would allow significant cross-correlations and would lead to the design of future studies that are based upon accurate data.
At present, the problem of missing data has received virtually no attention by medical and healthcare education researchers. This is a significant problem for the education research community because when missing data are disregarded or handled inappropriately it can result in serious validity threats. This article discusses the problem of missing data in the context of medical and healthcare education research and recommends appropriate methods for handling missing data.
Online Journal of Public Health Informatics, 2015
Cancer registry data collection involves, at a minimum, collecting data on demographics, tumor characteristics, and treatment. A common, identified, and standardized set of data elements is needed to share data quickly and efficiently with consumers of these data. This project highlights the need to develop common data elements. Surveys were developed for central cancer registries (CCRs) and cancer researchers (CRs) at NCI-designated Cancer Centers in order to understand data needs. Survey questions were developed based on the project focus, an evaluation of the research registries and database responses, and a systematic review of the literature. Questions covered the following topics: 1) Research, 2) Data collection, 3) Database/repository, 4) Use of data, 5) Additional data items, 6) Data requests, 7) New data fields, and 8) Cancer registry data set. A review of the surveys indicates that all cancer registries’ data are used for public health surveillance, an...
Pediatric Blood & Cancer, 2007
International Journal of Medical Informatics, 2018
Several methods have been suggested for evaluation of population-based cancer registries (PBCR) worldwide. However, most of these methods evaluate the data and outputs of the cancer registries. This study aimed to develop a comprehensive tool and protocol for evaluation of inputs, processes and outputs of a PBCR. Methods: The standards of the North American Association of Central Cancer Registries (NAACCR) were used to draft a comprehensive checklist. In addition, the national guidelines of PBCR were used to develop a questionnaire for evaluation of knowledge and practice of the PBCR personnel. Furthermore, a protocol for evaluation of the completeness and validity of the PBCR data was developed according to the International Agency for Research on Cancer (IARC) and the NAACCR guidelines. A 0-4 Likert based score and expert opinions (10 experts) were used to assess validity of the eight questionnaires/checklists. A modified Delphi method was applied to validate the checklists and questionnaires. Questions with a score higher than 3 remained in the final tools. Results: The final package consists of 546 questions including 108 (19.8%) for evaluation of guidelines, 54 (9.9%) for analysis and reports, 87 (15.9%) for governance and infrastructure, 155 (28.4%) for information technology, 21 (3.8%) for personnel knowledge and 121 (22.2%) for their practice. Additionally, data quality indicators were also considered for evaluation of PBCRs. Conclusion: This comprehensive tool can be used to show the gaps and limitations of the PBCR programs and provide informative clues for their improvement.
JMIR cancer, 2018
Cancer registries systematically collect cancer-related data to support cancer surveillance activities. However, cancer data are often unavailable for months to years after diagnosis, limiting its utility. The objective of this study was to identify the barriers to rapid cancer reporting and identify ways to shorten the turnaround time. Certified cancer registrars reporting to the Indiana State Department of Health cancer registry participated in a semistructured interview. Registrars were asked to describe the reporting process, estimate the duration of each step, and identify any barriers that may impact the reporting speed. Qualitative data analysis was performed with the intent of generating recommendations for workflow redesign. The existing and redesigned workflows were simulated for comparison. Barriers to rapid reporting included access to medical records from multiple facilities and the waiting period from diagnosis to treatment. The redesigned workflow focused on facilitat...
This research explores the substantial benefits of multidisciplinary modelling in enhancing comprehension of the complexities of cancer. A synergistic approach is fueled by the combination of data science approaches with medical knowledge, which enables the decoding of the multiple causes of cancer genesis, development and response to therapy. The symbiotic link between data-driven insights and clinical judgment is clarified in this paper. It examines how cancer research is changing because of the fusion of prediction models, biomarker identification, and therapy optimization. Interdisciplinary modelling has implications for diagnosis, prognosis, and therapeutic intervention in addition to data analysis. This research sheds light on the evolution of interdisciplinary modelling's influence on disentangling the intricacies of cancer via real-world case studies and analytical narratives.
North Carolina medical journal, 2014
European Journal of Cancer and Clinical Oncology
The quality of the recorded diagnosis is a major limit to the usefulness of Cancer Registry statistics that is easily overlooked by users of the data. With data from a large population-based cancer registry as an example, we demonstrate how Registry statistics could be improved by wider use of three simple indices, namely (1) the proportion histologically verified (adjusted for age), (2) the proportion of verified cases with an uninformative diagnosis, and (3) the proportion of cases that are staged. We believe that greater awareness of the deficiencies of Cancer Registry statistics will lead to a more critical interpretation of them, and help stimulate efforts to rectify matters.
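The three indices named above reduce to simple proportions. The sketch below is illustrative only: the field names (`histologically_verified`, `diagnosis_uninformative`, `stage`) and the function `quality_indices` are hypothetical, and the age adjustment that the abstract mentions for the first index is deliberately omitted, so the first value is a crude proportion.

```python
def quality_indices(cases):
    """Compute three crude data-quality proportions for a list of case dicts.

    Assumed (hypothetical) fields per case:
      histologically_verified: bool
      diagnosis_uninformative: bool
      stage: a stage value, or None when the case is unstaged
    No age adjustment is applied here.
    """
    total = len(cases)
    verified = [c for c in cases if c["histologically_verified"]]
    uninformative = sum(1 for c in verified if c["diagnosis_uninformative"])
    staged = sum(1 for c in cases if c["stage"] is not None)
    return {
        "prop_verified": len(verified) / total if total else 0.0,
        "prop_uninformative_among_verified": (
            uninformative / len(verified) if verified else 0.0
        ),
        "prop_staged": staged / total if total else 0.0,
    }


# Hypothetical toy cases (not registry data).
cases = [
    {"histologically_verified": True, "diagnosis_uninformative": False, "stage": "II"},
    {"histologically_verified": True, "diagnosis_uninformative": True, "stage": None},
    {"histologically_verified": True, "diagnosis_uninformative": False, "stage": "I"},
    {"histologically_verified": False, "diagnosis_uninformative": False, "stage": None},
]
indices = quality_indices(cases)
```

Note that the second index uses verified cases as its denominator, matching the wording "proportion of verified cases with an uninformative diagnosis".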
Cancer Medicine, 2021
Background: Missing patient-reported outcomes (PRO) data threaten the validity of PRO-specific findings and conclusions from randomized controlled trials (RCTs) by introducing bias due to data missing not at random. Clinical Research Associates (CRAs) are a largely unexplored source for informing understanding of potential causes of missing PRO data. The purpose of this qualitative research was to describe factors that influence missing PRO data, as revealed through the lived experience of CRAs. Methods: Maximum variation sampling was used to select CRAs having a range of experiences with missing PRO data from academic or nonacademic centers in different geographic locations of Canada. Semistructured interviews were audio-recorded, transcribed verbatim, and analyzed according to descriptive phenomenology. Results: Eleven CRAs were interviewed. Analysis revealed several factors that influence missing PRO data that were organized within themes. PROs for routine clinical care compete with PROs for RCTs. Both the ...
Methods in Molecular Biology, 2009
Cancer registries provide systematically collected information on cancer incidence, prevalence, mortality, and survival of different cancers. Aggregated and de-identified patient-level information on cancer is available for analysis from individual cancer registries; nationally from the Surveillance, Epidemiology, and End Results program, the Centers for Disease Control and Prevention, and the North American Association of Central Cancer Registries; and internationally from the International Association of Cancer Registries. Over the past few decades, the type and extent of cancer-related information captured by different cancer registries have been greatly expanded by linkage with other population-based information sources, such as the census data and the Centers for Medicare and Medicaid Services claims data. In addition, sophisticated statistical analytical techniques have been developed that have greatly expanded the traditional purview of cancer registries focused on descriptive epidemiology and disease quantification to a much broader analytical horizon ranging from study of cancer etiology; rare cancers in specific demographic groups; interaction of environmental and genetic factors in causation of cancer; impact of co-morbidities, race, geographic, socioeconomic, and provider-related factors on access, diagnosis, and treatment; outcomes and end results of cancer treatment; and cancer control initiatives to diverse areas of cancer care disparity, public health policy, public health education, and importantly, cost-effectiveness of cancer care. Thus, it is not surprising that cancer registries have increasingly become indispensable parts of local, national, and international cancer control programs, and it is certain that cancer registries will continue to be extraordinary resources of information for clinicians, researchers, scientists, policy makers, and the public in our fight against cancer.
British journal of cancer, 1997
There is a need to evaluate cancer services and provide a baseline on current treatment success and organization. This study shows that this process may be severely hindered by case note destruction or inaccessibility and incomplete information. This is an ongoing problem that needs to be addressed now.
Journal of oncology practice / American Society of Clinical Oncology, 2015
JCO precision oncology, 2020
PURPOSE Our goal was to identify the opportunities and challenges in analyzing data from the American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange (GENIE), a multi-institutional database derived from clinically driven genomic testing, at both the inter- and the intra-institutional level. Inter-institutionally, we identified genotypic differences between primary and metastatic tumors across the 3 most represented cancers in GENIE. Intra-institutionally, we analyzed the clinical characteristics of the Vanderbilt-Ingram Cancer Center (VICC) subset of GENIE to inform the interpretation of GENIE as a whole. METHODS We performed overall cohort matching on the basis of age, ethnicity, and sex of 13,208 patients stratified by cancer type (breast, colon, or lung) and sample site (primary or metastatic). We then determined whether detected variants, at the gene level, were associated with primary or metastatic tumors. We extracted clinical data for the VICC subset from VICC's clinical data warehouse. Treatment exposures were mapped to a 13-class schema derived from the HemOnc ontology. RESULTS Across 756 genes, there were significant differences in all cancer types. In breast cancer, ESR1 variants were over-represented in metastatic samples (odds ratio, 5.91; q < 10⁻⁶). TP53 mutations were over-represented in metastatic samples across all cancers. VICC had a significantly different cancer type distribution than that of GENIE but patients were well matched with respect to age, sex, and sample type. Treatment data from VICC was used for a bipartite network analysis, demonstrating clusters with a mix of histologies and others being more histology specific. CONCLUSION This article demonstrates the feasibility of deriving meaningful insights from GENIE at the inter- and intra-institutional level and illuminates the opportunities and challenges of the data GENIE contains. The results should help guide future development of GENIE, with the goal of fully realizing its potential for accelerating precision medicine.
Pathology & Oncology Research, 2002
Cancer Causes & Control, 2011
Objective-Cancer incidence and mortality statistics provide limited insight regarding the cancer survivor population and its needs. Cancer prevalence statistics enumerate cancer survivors-those currently living with cancer. Commonly used limited-duration prevalence (LDP) methods yield biased estimates of the number of survivors. National estimates may not allow sufficient granularity to inform local survivorship programs. In this study, complete prevalence (CP) methods are applied to actual North Carolina Central Cancer Registry (NCCCR) data to generate better, more informative prevalence estimates than previous methods. Methods-Data included all incident cases for 1995-2007 from the NCCCR and US Census population data. SEER*Stat software was used to calculate 13-year LDP. ComPrev software was used to estimate CP for each cancer site, gender, and race combination. Results-CP methods estimated 362,810 survivors in North Carolina on January 1, 2008, 40% more than LDP estimates of 258,556, with substantial racial, regional, and gender differences in prevalence rankings of several cancers. Conclusion-CP estimates are substantially higher than previous prevalence estimates. This study found previously unrecognized racial, regional, and gender differences. State and local programs may apply these methods using their own data to develop better, more detailed estimates to improve planning for their specific survivor populations' needs.
arXiv (Cornell University), 2022
Missing data is a common concern in health datasets, and its impact on good decision-making processes is well documented. Our study's contribution is a methodology for tackling missing data problems using a combination of synthetic dataset generation, missing data imputation, and deep learning methods. Specifically, we conducted a series of experiments with these objectives: a) generating a realistic synthetic dataset, b) simulating data missingness, c) recovering the missing data, and d) analyzing imputation performance. Our methodology used a Gaussian mixture model whose parameters were learned from a cleaned subset of a real demographic and health dataset to generate the synthetic data. We simulated missingness degrees of 10%, 20%, 30%, and 40% under the missing completely at random (MCAR) scheme. We used an integrated performance analysis framework involving clustering, classification, and direct imputation analysis. Our results show that models trained on synthetic and imputed datasets could make predictions with an accuracy of 83% and 80% on a) an unseen real dataset and b) an unseen reserved synthetic test dataset, respectively. Moreover, the models that used the DAE method for imputation yielded the lowest log loss, an indication of good performance, even though the accuracy measures were slightly lower. In conclusion, our work demonstrates that, using our methodology, one can reverse engineer a solution to resolve missingness on an unseen dataset with missingness. Moreover, though we used a health dataset, our methodology can be utilized in other contexts.
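The missingness-simulation step (b) can be illustrated briefly. This is a minimal sketch of MCAR masking under the assumption that each cell is dropped independently with the same probability; the function `apply_mcar`, the seed handling, and the toy table are hypothetical and are not the authors' implementation.

```python
import random


def apply_mcar(rows, rate, seed=0):
    """Return a copy of rows with each cell set to None with probability `rate`.

    Every cell is masked independently of its value and of every other cell,
    which is the defining property of missing completely at random (MCAR).
    """
    rng = random.Random(seed)
    return [[None if rng.random() < rate else value for value in row] for row in rows]


def missing_fraction(rows):
    """Fraction of cells that are None across the whole table."""
    cells = [value for row in rows for value in row]
    return sum(value is None for value in cells) / len(cells)


# Hypothetical complete table: 1000 rows by 5 columns of integers.
complete = [[i + j for j in range(5)] for i in range(1000)]

# Mask at the four rates named in the abstract.
observed = {rate: missing_fraction(apply_mcar(complete, rate)) for rate in (0.1, 0.2, 0.3, 0.4)}
```

Because masking is random, the observed fraction of missing cells only approximates the requested rate; it converges to that rate as the table grows.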