Premature coronary artery disease (CAD) in younger adults often arises from underrecognized risk factors such as elevated lipoprotein(a) (Lp(a)), a genetically determined lipoprotein with atherogenic and prothrombotic properties. We report a 45-year-old male with untreated hypertension, prior ischemic stroke, and significant tobacco use, who presented with exertional angina. Laboratory evaluation showed mildly elevated low-density lipoprotein cholesterol (LDL-C; 142 mg/dL), borderline low high-density lipoprotein cholesterol (HDL-C; 38 mg/dL), and markedly elevated Lp(a) (180 mg/dL). Coronary angiography revealed a chronic total occlusion of the proximal left anterior descending (LAD) artery, 90% stenosis of the left circumflex (LCx) artery, and Rentrop grade 3 collateral flow from a codominant right coronary artery. Due to financial constraints, revascularization was deferred. The patient was managed with high-intensity statins, dual antiplatelet therapy, beta-blockers, angiotensin-converting enzyme (ACE) inhibitors, and lifestyle modification. Over follow-up, he showed marked symptomatic improvement, an improved left ventricular ejection fraction, and partial reversal of diastolic dysfunction. This case highlights the importance of Lp(a) screening in premature CAD and demonstrates that intensive medical therapy can stabilize high-risk patients when revascularization is not feasible.
Predicting material stability is essential for accelerating the discovery of advanced materials in renewable energy, aerospace, and catalysis. Traditional approaches, such as Density Functional Theory (DFT), are accurate but computationally expensive and unsuitable for high-throughput screening. This study introduces a machine learning (ML) framework trained on high-dimensional data from the Open Quantum Materials Database (OQMD) to predict formation energy, a key stability metric. Among the evaluated models, deep learning outperformed Gradient Boosting Machines and Random Forest, achieving an R² of up to 0.88. Feature importance analysis identified thermodynamic, electronic, and structural properties as the primary drivers of stability, offering interpretable insights into material behavior. Compared to DFT, the proposed ML framework significantly reduces computational costs, enabling the rapid screening of thousands of compounds. These results highlight ML's transformative potential in materials discovery, with direct applications in energy storage, semiconductors, and catalysis.
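As a rough illustration of the kind of workflow the study describes, the sketch below fits a gradient-boosting baseline to pre-featurized formation-energy data and reports R² on a held-out split. The file name and column names (oqmd_features.csv, formation_energy) are assumptions for the example, not the authors' actual pipeline or descriptors.

# Minimal sketch: gradient-boosting baseline for formation-energy prediction.
# Input file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("oqmd_features.csv")          # assumed pre-featurized OQMD export
X = df.drop(columns=["formation_energy"])      # descriptor columns
y = df["formation_energy"]                     # target, e.g. eV/atom

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=4)
model.fit(X_train, y_train)

print("R^2 on held-out compounds:", r2_score(y_test, model.predict(X_test)))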
This study explores advanced data-driven methodologies for forecasting electricity demand and integrating renewable energy resources, with a focus on Cornell University's campus infrastructure. Leveraging historical data from energy management systems and regional meteorological records, we developed predictive models to analyze energy consumption patterns and renewable energy generation potential. Techniques such as Long Short-Term Memory (LSTM) networks, ARIMA, Random Forest, and Generative Adversarial Networks (GANs) were employed to capture temporal dependencies and enhance forecasting accuracy. Clustering algorithms, including k-means and Expectation-Maximization (EM), provided insights into energy usage behaviors across different building types and climatic conditions. Our findings reveal significant seasonal and hourly trends in solar and wind energy generation, with complementary patterns that support hybrid renewable energy systems. Predictive models demonstrated high accuracy, enabling the estimation of additional renewable capacity and the design of energy storage solutions to mitigate intermittency challenges. The study highlights the scalability of these methods to other campuses or urban settings and their potential to contribute to carbon neutrality goals. By integrating machine learning with renewable energy management, this research advances the development of sustainable, efficient, and resilient energy systems.
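A minimal sketch of one forecasting component mentioned above: an LSTM trained to predict the next hourly demand value from the previous 24 readings. The synthetic demand series, window length, and layer sizes are assumptions for illustration, not the study's actual configuration.

# Minimal sketch: next-hour demand forecasting with an LSTM (Keras).
# The `demand` array is a synthetic stand-in for real campus meter data.
import numpy as np
from tensorflow import keras

def make_windows(series, lookback=24):
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X)[..., None], np.array(y)   # shape (n, lookback, 1), (n,)

demand = np.random.rand(1000).astype("float32")   # placeholder hourly kWh series
X, y = make_windows(demand)

model = keras.Sequential([
    keras.layers.Input(shape=(24, 1)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
next_hour = model.predict(X[-1:])                 # forecast from the latest window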
A review of challenges and solutions for using machine learning approaches for missing data, 2024
Missing data poses significant challenges to the reliability of statistical analyses and predictive modeling across diverse research fields. This paper provides an in-depth review of both traditional and machine learning imputation techniques, enabling researchers to navigate the complexities of missing data with greater efficacy. We evaluate simple imputation methods, such as mean, median, and mode, and delve into more sophisticated strategies including regression-based, hot- and cold-deck, and probabilistic models like Gaussian Mixture Models and K-Nearest Neighbors. Furthermore, the paper explores cutting-edge machine learning approaches like Random Forest, Multiple Imputation by Chained Equations, and deep learning models such as autoencoders and Generative Adversarial Networks. Our comprehensive analysis highlights the effectiveness of each method, tailored to the various missing data mechanisms (MCAR, MAR, and NMAR), providing actionable insights for researchers to enhance data integrity and improve the outcomes of their studies.
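For a concrete starting point, the snippet below applies two of the reviewed strategies, K-Nearest Neighbors imputation and MICE-style iterative imputation, to a small synthetic matrix with missing entries; the data and parameters are purely illustrative.

# Minimal sketch: two scikit-learn imputers on a toy matrix with NaNs.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import KNNImputer, IterativeImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [7.0, 8.0, 9.0],
              [np.nan, 5.0, 4.0]])

knn = KNNImputer(n_neighbors=2)
mice = IterativeImputer(max_iter=10, random_state=0)

print(knn.fit_transform(X))    # NaNs replaced by averages of nearest rows
print(mice.fit_transform(X))   # NaNs replaced by chained-regression estimates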
Warfarin, a commonly prescribed anticoagulant, poses significant dosing challenges due to its narrow therapeutic range and high variability in patient responses. This study applies advanced machine learning techniques to improve the accuracy of international normalized ratio (INR) predictions using the MIMIC-III dataset, addressing the critical issue of missing data. By leveraging dimensionality reduction methods such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), and advanced imputation techniques including denoising autoencoders (DAE) and generative adversarial networks (GAN), we achieved significant improvements in predictive accuracy. The integration of these methods substantially reduced prediction errors compared to traditional approaches. This research demonstrates the potential of machine learning (ML) models to provide more personalized and precise dosing strategies that reduce the risks of adverse drug events. Our method could integrate into clinical workflows to enhance anticoagulation therapy in cases of missing data, with potential applications in other complex medical treatments.
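A minimal sketch of the denoising-autoencoder idea used for imputation: missing entries are masked and zero-filled, the network learns to reconstruct complete rows, and its outputs replace the missing values. The feature count, architecture, and simulated missingness below are assumptions for illustration, not the study's MIMIC-III setup.

# Minimal sketch: denoising-autoencoder imputation on synthetic rows (Keras).
import numpy as np
from tensorflow import keras

n_features = 20
X = np.random.rand(500, n_features).astype("float32")    # stand-in for lab/vitals rows
mask = np.random.rand(*X.shape) < 0.1                    # simulate 10% missingness
X_missing = np.where(mask, 0.0, X)                       # zero-fill the missing cells

dae = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(n_features),
])
dae.compile(optimizer="adam", loss="mse")
dae.fit(X_missing, X, epochs=10, batch_size=32, verbose=0)   # reconstruct complete rows

X_imputed = np.where(mask, dae.predict(X_missing), X)        # keep observed, fill missing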
This survey rigorously explores contemporary clustering algorithms within the machine learning paradigm, focusing on five primary methodologies: centroid-based, hierarchical, density-based, distribution-based, and graph-based clustering. Through the lens of recent innovations such as deep embedded clustering and spectral clustering, we analyze the strengths, limitations, and the breadth of application domains, ranging from bioinformatics to social network analysis. Notably, the survey introduces novel contributions by integrating clustering techniques with dimensionality reduction and proposing advanced ensemble methods to enhance stability and accuracy across varied data structures. This work uniquely synthesizes the latest advancements and offers new perspectives on overcoming traditional challenges like scalability and noise sensitivity, thus providing a comprehensive roadmap for future research and practical applications in data-intensive environments.
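To make the contrast between two of the surveyed families concrete, the sketch below runs centroid-based (k-means) and density-based (DBSCAN) clustering on the same synthetic two-moons data; the parameters are illustrative rather than tuned recommendations.

# Minimal sketch: k-means vs. DBSCAN on non-convex clusters.
from sklearn.cluster import KMeans, DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# k-means tends to split the moons with a roughly linear boundary, while
# DBSCAN recovers each crescent because it follows density rather than
# distance to a centroid.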
Amphiphilic copolymers (AP) represent a class of novel antibiofouling materials whose chemistry and composition can be tuned to optimize their performance. However, the enormous chemistry-composition design space associated with AP makes their performance optimization laborious; it is not experimentally feasible to assess and validate all possible AP compositions even with the use of rapid screening methodologies. To address this constraint, a robust model development paradigm is reported, yielding a versatile machine learning approach that accurately predicts biofilm formation by Pseudomonas aeruginosa on a library of AP. The model excels in extracting underlying patterns in a "pooled" dataset from various experimental sources, thereby expanding the design space accessible to the model to a much larger selection of AP chemistries and compositions. The model is used to screen virtual libraries of AP for identification of best-performing candidates for experimental validation. Initia...
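A minimal sketch of the screening loop described above, under the assumption of a pooled tabular dataset of copolymer compositions labeled by biofilm outcome: fit a classifier, score a virtual library, and shortlist top candidates for experimental validation. The file and column names are hypothetical, not the reported model.

# Minimal sketch: pooled-data classifier used to rank a virtual AP library.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

train = pd.read_csv("pooled_ap_experiments.csv")    # hypothetical pooled dataset
X = train.drop(columns=["low_biofilm"])             # composition/chemistry features
y = train["low_biofilm"]                            # 1 = low biofilm formation observed

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

virtual = pd.read_csv("virtual_ap_library.csv")     # candidate compositions, same features
scores = clf.predict_proba(virtual)[:, 1]           # predicted probability of low biofilm
virtual["score"] = scores
shortlist = virtual.sort_values("score", ascending=False).head(20)   # for lab validation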