Papers by Ashok Krishnamurthy

Genome-scale sequencing data have yet to be widely used in clinical medicine. Indeed, it is not clear whether the routine accumulation of massive amounts of largely uninterpretable genomic data will yield a net benefit in terms of improving health. Nevertheless, the use of genomic data is certain to grow, and the medical community will need to consider how much genomic data to store and how to interpret and communicate those data to both patients and providers. To meet these emerging challenges and ultimately facilitate the optimal use of genomic data, we advocate for the development of a two-pronged Genomic Clinical Decision Support System that encompasses the concept of the Clinical Mendeliome and introduces the concept of the Archival Value Criterion. The model we propose is designed to stimulate effective clinical use of genomic data, drive genomic research, and meet both current and future needs in medicine and research.

Evaluating robustness of a generalized linear model when applied to electronic health record data accessed using an Open API
Health Informatics Journal, Apr 1, 2023
The Integrated Clinical and Environmental Exposures Service (ICEES) provides open regulatory-compliant access to clinical data, including electronic health record data, that have been integrated with environmental exposures data. While ICEES has been validated in the context of an asthma use case and several other use cases, the regulatory constraints on the ICEES open application programming interface (OpenAPI) result in data loss when using the service for multivariate analysis. In this study, we investigated the robustness of the ICEES OpenAPI through a comparative analysis, in which we applied a generalized linear model (GLM) to the OpenAPI data and the constraint-free source data to examine factors predictive of asthma exacerbations. Consistent with previous studies, we found that the main predictors identified by both analyses were sex, prednisone, race, obesity, and airborne particulate exposure. Comparison of GLM model fit revealed that data loss impacts model quality, but only with select interaction terms. We conclude that the ICEES OpenAPI supports multivariate analysis, albeit with potential data loss that users should be aware of.

International Journal of Environmental Research and Public Health, Oct 29, 2021
ICEES (Integrated Clinical and Environmental Exposures Service) provides a disease-agnostic, regulatory-compliant approach for openly exposing and analyzing clinical data that have been integrated at the patient level with environmental exposures data. ICEES is equipped with basic features to support exploratory analysis using statistical approaches, such as bivariate chi-square tests. We recently developed a method for using ICEES to generate multivariate tables for subsequent application of machine learning and statistical models. The objective of the present study was to use this approach to identify predictors of asthma exacerbations through the application of three multivariate methods: conditional random forest, conditional tree, and generalized linear model. Among seven potential predictor variables, we found five to be of significant importance using both conditional random forest and conditional tree: prednisone, race, airborne particulate exposure, obesity, and sex. The conditional tree method additionally identified several significant two-way and three-way interactions among the same variables. When we applied a generalized linear model, we identified four significant predictor variables, namely prednisone, race, airborne particulate exposure, and obesity. When ranked in order by effect size, the results were in agreement with the results from the conditional random forest and conditional tree methods as well as the published literature. Our results suggest that the open multivariate analytic capabilities provided by ICEES are valid in the context of an asthma use case and likely will have broad value in advancing open research in environmental and public health.

JAMIA Open, 2021
Objectives: Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department. Methods and Materials: We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 identified SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). We adopted 5 common evaluation measures: accuracy, average precision–recall (AP), area under the receiver operating characteristic curve (AUC-ROC), Hamming loss, and log loss to compare the performance of different methods for MLL classification using ...
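
Hamming loss, one of the evaluation measures listed above, is the fraction of individual label assignments that are wrong, averaged over all samples and all labels. The sketch below is a minimal illustration, assuming each note sentence is labeled with a 0/1 vector over the six SDH domains; the example vectors are hypothetical and are not data from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) positions where the prediction differs from the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    mismatches = 0
    total = 0
    for gold, pred in zip(y_true, y_pred):
        if len(gold) != len(pred):
            raise ValueError("every sample must have the same number of labels")
        mismatches += sum(g != p for g, p in zip(gold, pred))
        total += len(gold)
    return mismatches / total


# Two hypothetical sentences, each labeled over six SDH domains (1 = domain present).
gold = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0]]
pred = [[1, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0]]
print(hamming_loss(gold, pred))  # 2 wrong assignments out of 12
```

Unlike exact-match accuracy, this measure gives partial credit when a classifier gets most of a sentence's domains right, which is why it is commonly reported for multi-label problems.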

JMIR Public Health and Surveillance, 2021
Background: As the world faced the pandemic caused by the novel coronavirus disease 2019 (COVID-19), medical professionals, technologists, community leaders, and policy makers sought to understand how best to leverage data for public health surveillance and community education. Faced with this complex public health problem, North Carolinians relied on data from state, federal, and global health organizations to increase their understanding of the pandemic and guide decision-making. Objective: We aimed to describe the role that stakeholders involved in COVID-19–related data played in managing the pandemic in North Carolina. The study investigated the processes used by organizations throughout the state in using, collecting, and reporting COVID-19 data. Methods: We used an exploratory qualitative study design to investigate North Carolina’s COVID-19 data collection efforts. To better understand these processes, key informant interviews were conducted with employees from organizations that collected...

An approach for open multivariate analysis of integrated clinical and environmental exposures data
Informatics in Medicine Unlocked, 2021
ABSTRACT The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to sensitive patient data that have been integrated with public exposures data. ICEES was designed initially to support dynamic cohort creation and bivariate contingency tests. The objective of the present study was to develop an open approach to support multivariate analyses using existing ICEES functionalities while abiding by all regulatory constraints. We first developed an open approach for generating a multivariate table that maintains contingencies between clinical and environmental variables using programmatic calls to the open ICEES application programming interface. We then applied the approach to data on a large cohort (N = 22,365) of patients with asthma or related conditions and generated an eight-feature table. Due to regulatory constraints, data loss was incurred with the incorporation of each successive feature variable, from a starting sample size of N = 22,365 to a final sample size of N = 4,556 (20.5%), but data loss was < 10% until the addition of the final two feature variables. We then applied a generalized linear model to the resulting dataset and focused on the impact of seven select feature variables on asthma exacerbations, defined as annual emergency department or inpatient visits for respiratory issues. We identified five feature variables—sex, race, obesity, prednisone, and airborne particulate exposure—as significant predictors of asthma exacerbations.
We discuss the advantages and disadvantages of ICEES open multivariate analysis and conclude that, despite limitations, ICEES can provide a valuable resource for open multivariate analysis and can serve as an exemplar for regulatory-compliant informatics solutions to open patient data, with capabilities to explore the impact of environmental exposures on health outcomes.
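
Because a multivariate table of this kind holds counts per combination of feature values rather than patient-level rows, a GLM can in principle be fitted directly on the aggregated rows, with each row weighted by its count. The sketch below is a minimal, hypothetical illustration of that idea: a binomial GLM with a logit link and a single binary predictor, fitted by Newton-Raphson (equivalently, iteratively reweighted least squares). The binary outcome, the logit link, the single predictor, and the counts are all assumptions for illustration; the study's actual model specification is not reproduced here.

```python
import math


def fit_binomial_glm(table, iterations=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x on aggregated rows (x, events, total) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector (g0, g1) and the Fisher information matrix
        # [[i00, i01], [i01, i11]] over the rows of the count table.
        g0 = g1 = 0.0
        i00 = i01 = i11 = 0.0
        for x, events, total in table:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events - total * p
            w = total * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Solve the 2x2 system (information) * delta = score by its closed-form inverse.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


# Hypothetical count table: (binary exposure x, patients with the outcome, patients in the cell).
table = [(0, 30, 200), (1, 60, 150)]
b0, b1 = fit_binomial_glm(table)
print(b0, b1, math.exp(b1))  # intercept, log odds ratio, odds ratio
```

With one binary predictor the fit reproduces the log odds in each cell, so the estimated odds ratio equals the odds ratio computed directly from the table; with more features the same weighting idea applies, but cells suppressed or dropped under regulatory constraints are simply absent from the fit.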

Journal of the American Medical Informatics Association
Research increasingly relies on interrogating large-scale data resources. The NIH National Heart, Lung, and Blood Institute developed the NHLBI BioData Catalyst® (BDC), a community-driven ecosystem where researchers, including bench and clinical scientists, statisticians, and algorithm developers, find, access, share, store, and compute on large-scale datasets. This ecosystem provides secure, cloud-based workspaces, user authentication and authorization, search, tools and workflows, applications, and new innovative features to address community needs, including exploratory data analysis, genomic and imaging tools, tools for reproducibility, and improved interoperability with other NIH data science platforms. BDC offers straightforward access to large-scale datasets and computational resources that support precision medicine for heart, lung, blood, and sleep conditions, leveraging separately developed and managed platforms to maximize flexibility based on researcher needs, expertise,...

JAMA Network Open
Importance: The COVID-19 pandemic has been associated with an increase in mental health diagnoses among adolescents, though the extent of the increase, particularly for severe cases requiring hospitalization, has not been well characterized. Large-scale federated informatics approaches provide the ability to efficiently and securely query health care data sets to assess and monitor hospitalization patterns for mental health conditions among adolescents. Objective: To estimate changes in the proportion of hospitalizations associated with mental health conditions among adolescents following onset of the COVID-19 pandemic. Design, Setting, and Participants: This retrospective, multisite cohort study of adolescents 11 to 17 years of age who were hospitalized with at least 1 mental health condition diagnosis between February 1, 2019, and April 30, 2021, used patient-level data from electronic health records of 8 children’s hospitals in the US and France. Main Outcomes and Measures: Change in the mo...

Tissue clearing methods allow every cell in the mouse brain to be imaged without physical sectioning. However, the computational tools currently available for cell quantification in cleared tissue images have been limited to counting sparse cell populations in stereotypical mice. Here we introduce NuMorph, a group of image analysis tools to quantify all nuclei and nuclear markers within the mouse cortex after tissue clearing and imaging by a conventional light-sheet microscope. We applied NuMorph to investigate two distinct mouse models: a Topoisomerase 1 (Top1) conditional knockout model with severe neurodegenerative deficits and a Neurofibromin 1 (Nf1) conditional knockout model with a more subtle brain overgrowth phenotype. In each case, we identified differential effects of gene deletion on individual cell-type counts and distribution across cortical regions that manifest as alterations of gross brain morphology. These results underline the value of 3D whole-brain imaging approaches...

Frontiers in Artificial Intelligence
Research on rare diseases has received increasing attention, in part due to the realized profitability of orphan drugs. Biomedical informatics holds promise in accelerating translational research on rare disease, yet challenges remain, including the lack of diagnostic codes for rare diseases and privacy concerns that prevent research access to electronic health records when few patients exist. The Integrated Clinical and Environmental Exposures Service (ICEES) provides regulatory-compliant open access to electronic health record data that have been integrated with environmental exposures data, as well as analytic tools to explore the integrated data. We describe a proof-of-concept application of ICEES to examine demographics, clinical characteristics, environmental exposures, and health outcomes among a cohort of patients enriched for phenotypes associated with cystic fibrosis (CF), idiopathic bronchiectasis (IB), and primary ciliary dyskinesia (PCD). We then focus on a subset of pa...

Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science
Clinical and Translational Science
Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph‐based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these “knowledge graphs” (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open‐access model for standardization across biomedical KGs has left the task of reconciling data sources to downstream consumers. Biolink Model is an open‐source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object‐oriented classification and gr...

Bioinformatics
Motivation: As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. Results: Developed through the National Heart, Lung and Blood Institute’s (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15,911 study variables from public datasets. On a manually curated search dataset, Dug’s total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch’s total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. Availabili...

NuMorph: Tools for cortical cellular phenotyping in tissue-cleared whole-brain images
Cell Reports, 2021
SUMMARY Tissue-clearing methods allow every cell in the mouse brain to be imaged without physical sectioning. However, the computational tools currently available for cell quantification in cleared tissue images have been limited to counting sparse cell populations in stereotypical mice. Here, we introduce NuMorph, a group of analysis tools to quantify all nuclei and nuclear markers within the mouse cortex after clearing and imaging by light-sheet microscopy. We apply NuMorph to investigate two distinct mouse models: a Topoisomerase 1 (Top1) model with severe neurodegenerative deficits and a Neurofibromin 1 (Nf1) model with a more subtle brain overgrowth phenotype. In each case, we identify differential effects of gene deletion on individual cell-type counts and distribution across cortical regions that manifest as alterations of gross brain morphology. These results underline the value of whole-brain imaging approaches, and the tools are widely applicable for studying brain structure phenotypes at cellular resolution.

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019
Image-based cell counting is a fundamental yet challenging task with wide applications in biological research. In this paper, we propose a novel deep network designed to universally solve this problem for various cell types. Specifically, we first extend the segmentation network U-Net with a Self-Attention module, yielding SAU-Net, for cell counting. Second, we design an online version of Batch Normalization to mitigate the generalization gap caused by data augmentation in small datasets. We evaluate the proposed method on four public cell counting benchmarks: the synthetic fluorescence microscopy (VGG) dataset, the Modified Bone Marrow (MBM) dataset, the human subcutaneous adipose tissue (ADI) dataset, and the Dublin Cell Counting (DCC) dataset. Our method surpasses the current state-of-the-art performance on the three real datasets (MBM, ADI, and DCC) and achieves competitive results on the synthetic dataset (VGG).

JMIR Medical Informatics, 2019
Background: In a multisite clinical research collaboration, institutions may or may not use the same common data model (CDM) to store clinical data. To overcome this challenge, we proposed to use Health Level 7’s Fast Healthcare Interoperability Resources (FHIR) as a meta-CDM—a single standard to represent clinical data. Objective: In this study, we aimed to create an open-source application termed the Clinical Asset Mapping Program for FHIR (CAMP FHIR) to efficiently transform clinical data to FHIR for supporting source-agnostic CDM-to-FHIR mapping. Methods: Mapping with CAMP FHIR involves (1) mapping each source variable to its corresponding FHIR element and (2) mapping each item in the source data’s value sets to the corresponding FHIR value set item for variables with strict value sets. To date, CAMP FHIR has been used to transform 108 variables from the Informatics for Integrating Biology & the Bedside (i2b2) and Patient-Centered Outcomes Research Network data models to fields acr...
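
The two mapping steps described in the Methods can be pictured as two lookup tables applied to each source record: one from source variable to FHIR element, and one from source value-set codes to FHIR codes for variables with strict value sets. The sketch below is a minimal, hypothetical illustration; the source variable names and source codes are invented for the example, and this is not CAMP FHIR's actual configuration format or code. The target values "male", "female", and "unknown" come from the FHIR administrative-gender value set, and Patient.gender and Patient.birthDate are standard FHIR Patient elements.

```python
# Step 1 (hypothetical): source variable -> FHIR element path.
VARIABLE_MAP = {
    "sex_cd": "Patient.gender",
    "birth_date": "Patient.birthDate",
}

# Step 2 (hypothetical): source value-set item -> FHIR value-set item,
# defined only for variables with strict value sets.
VALUE_SET_MAP = {
    "sex_cd": {"M": "male", "F": "female", "U": "unknown"},
}


def map_record(source_record):
    """Return a flat {FHIR element path: value} dict for one source record."""
    mapped = {}
    for variable, value in source_record.items():
        element = VARIABLE_MAP.get(variable)
        if element is None:
            continue  # no FHIR target defined for this variable; skipped in this sketch
        value_set = VALUE_SET_MAP.get(variable)
        if value_set is not None:
            if value not in value_set:
                raise ValueError(f"no FHIR code for {variable}={value!r}")
            value = value_set[value]
        mapped[element] = value
    return mapped


print(map_record({"sex_cd": "F", "birth_date": "1980-04-02", "zip": "27514"}))
```

A real transformation would assemble nested FHIR resources (for example, JSON Patient resources) rather than a flat dictionary; the flat form here only makes the two mapping steps easy to see.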

Clinical Annotation Research Kit (CLARK): A Computable Phenotyping Tool Using Machine Learning (Preprint)
Introduction: Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that enables clinical and translational researchers to use machine-learning-based NLP for computable phenotyping without requiring deep informatics expertise. Methods: CLARK enables non-expert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-f...
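
The idea of user-specified features feeding a machine learning classifier can be illustrated roughly by turning a few text patterns into a count vector for each note; those vectors would then be the input to a classifier. The sketch below is a minimal, hypothetical illustration using Python regular expressions; the feature names and patterns are invented, and this is not CLARK's actual feature syntax or implementation.

```python
import re

# Hypothetical user-defined features: feature name -> compiled regular expression.
FEATURES = {
    "cough": re.compile(r"\bcough(?:ing|s)?\b", re.IGNORECASE),
    "fever": re.compile(r"\bfebrile\b|\bfever\b", re.IGNORECASE),
    "negated_fever": re.compile(r"\b(?:no|denies)\s+fever\b", re.IGNORECASE),
}


def featurize(note):
    """Count how often each feature pattern matches in one clinical note."""
    return [len(pattern.findall(note)) for pattern in FEATURES.values()]


notes = [
    "Pt reports coughing at night. Denies fever.",
    "Febrile overnight, fever 39C.",
]
vectors = [featurize(n) for n in notes]
print(vectors)  # one count vector per note, in FEATURES order
```

Keeping a separate pattern for a negated mention (here, "denies fever") lets a downstream classifier learn that it should weigh against the plain "fever" feature, which simple keyword counts alone cannot express.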

Mechanical Engineering Research, 2016
An elementary computational framework, as a first step to an eventual comprehensive model of human movement, is presented. Such a model, in conjunction with anatomical, physiological, and experimental studies, should provide a means of verifying theoretical, experimental, and heuristic models of human movement. For this purpose, a three-dimensional three-link humanoid model and a two-link planar arm model are presented to explore responses to simple external forces. Such models are useful for a variety of current applications in art, science, engineering, sports, and medicine. The models are subjected to kinesthetic, auditory, and visual inputs, and the goal is to create desired behavior. The models are flexible, modular, and expandable for inclusion of more segments, muscles, and sensory and central nervous system (CNS) processing. Three computer simulations are presented: rhythmic maneuvers of the three-link model in response to periodic motion of a platform; the planar arm producing visua...
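
For a two-link planar arm, the basic geometric relation between joint angles and hand position is standard forward kinematics. The sketch below is a minimal illustration, assuming the shoulder angle is measured from the x-axis and the elbow angle is measured relative to the upper arm; the link lengths are illustrative values, not parameters from the paper. It covers only kinematics, not the dynamics, muscle, or sensory processing described in the abstract.

```python
import math


def forward_kinematics(theta1, theta2, l1=0.30, l2=0.35):
    """Return (elbow_xy, hand_xy) for a planar two-link arm with the shoulder at the origin.

    theta1: shoulder angle from the x-axis, in radians
    theta2: elbow angle relative to the upper arm, in radians
    l1, l2: upper-arm and forearm lengths (illustrative values, in meters)
    """
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    hand = (
        elbow[0] + l2 * math.cos(theta1 + theta2),
        elbow[1] + l2 * math.sin(theta1 + theta2),
    )
    return elbow, hand


# Upper arm pointing straight up, forearm bent back to horizontal.
elbow, hand = forward_kinematics(math.pi / 2, -math.pi / 2)
print(elbow, hand)
```

A simulation of responses to external forces would add the arm's equations of motion on top of this geometry; the kinematic map is the piece that converts simulated joint angles into positions that can be plotted or compared with visual targets.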

eGEMs (Generating Evidence & Methods to improve patient outcomes), 2016
Introduction: In genomics and other fields, it is now possible to capture and store large amounts of data in electronic medical records (EMRs). However, it is not clear if the routine accumulation of massive amounts of (largely uninterpretable) data will yield any health benefits to patients. Nevertheless, the use of large-scale medical data is likely to grow. To meet emerging challenges and facilitate optimal use of genomic data, our institution initiated a comprehensive planning process that addresses the needs of all stakeholders (e.g., patients, families, healthcare providers, researchers, technical staff, administrators). Our experience with this process and a key genomics research project contributed to the proposed framework. Framework: We propose a two-pronged Genomic Clinical Decision Support System (CDSS) that encompasses the concept of the “Clinical Mendeliome” as a patient-centric list of genomic variants that are clinically actionable and introduces the concept of the “A...

Climatologies Based on the Weather Research and Forecast (WRF) Model
2009 DoD High Performance Computing Modernization Program Users Group Conference, 2009

2006 HPCMP Users Group Conference (HPCMP-UGC'06), 2006
At the heart of Network Centric Warfare is the ability of all assets on the battlefield to communicate and coordinate their actions. Therefore, as these systems are being developed, they must be tested and evaluated together along with other assets in a networked environment. The key requirement for conducting this type of Test and Evaluation (i.e., distributed testing) is having the necessary expertise to combine networking, security, high performance computing (HPC), and simulation experience as needed. The Army began preparation for testing in a distributed environment more than a decade ago when the Army Test and Evaluation Command created the Virtual Proving Ground. An outgrowth of this technology investment was a series of increasingly complex distributed test events or exercises whose purpose was to provide technology integration points and demonstrate and document the capabilities and methodologies for conducting distributed testing. The experience gained in performing these exercises over the past ten years raises important questions regarding interoperability of network-centric assets, performance of spatially separated systems (especially those involving Hardware-in-the-Loop (HWIL) assets), and high-bandwidth requirements such as video and audio streaming feeds. This paper discusses a few of these issues as observed in the most recent tests at the US Army Redstone Technical Test Center (RTTC). The latest exercise, Distributed Test Event 5 (DTE-5),