Papers by Davide Ballabio

Chemistry Proceedings, 2024
Alkaloids are naturally occurring metabolites with a wide variety of pharmacological activities and applications in science, particularly in medicinal chemistry as anti-inflammatory drugs. Because they can be labelled as active or inactive compounds against the inflammatory biological response, the aim of this work was to calibrate quantitative structure-activity relationships (QSARs) using machine learning classifiers to predict anti-inflammatory activity based on the molecular structures of alkaloids. A dataset of 100 alkaloids (58 active and 42 inactive) was retrieved from two systematic reviews. Molecules were properly curated, and the molecular geometries of the compounds were optimized using the semi-empirical method (PM3) to calculate molecular descriptors, binary fingerprints (extended-connectivity fingerprints and path fingerprints) and MACCS (Molecular ACCess System) structural keys. Then, we calibrated the QSAR models using well-known linear and non-linear machine learning classifiers, i.e., partial least squares discriminant analysis (PLSDA), random forests (RF), adaptive boosting (AdaBoost), k-nearest neighbors (kNN), N-nearest neighbors (N3) and binned nearest neighbors (BNN). For validation purposes, the dataset was randomly split into a training set and a test set in a 70:30 ratio. When using molecular descriptors, genetic algorithms-variable subset selection (GAs-VSS) was used for supervised feature selection. During the calibration of the models, a five-fold Venetian blinds cross-validation was used to optimize the classifier parameters and to control the presence of overfitting. The performance of the models was quantified by means of the non-error rate (NER) statistical parameter.
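The non-error rate is usually defined as the average of the per-class sensitivities, i.e. a class-balanced accuracy, which matters here because the active and inactive classes are unequal (58 vs. 42). The Python sketch below implements that definition; the function name and the example labels are hypothetical, and it is not taken from the authors' code.

```python
from collections import Counter


def non_error_rate(y_true, y_pred):
    """Non-error rate (NER): mean of the per-class sensitivities."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)  # number of samples in each true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    sensitivities = [correct[c] / totals[c] for c in totals]
    return sum(sensitivities) / len(sensitivities)


# Hypothetical labels: 1 = active, 0 = inactive
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(non_error_rate(y_true, y_pred))  # 0.625 = (3/4 + 1/2) / 2
```

Unlike plain accuracy (4/6 here), NER gives each class the same weight regardless of its size.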

Green Analytical Chemistry, 2025
In this study, the application framework of Condensed Phase Membrane Introduction Mass Spectrometry (CP-MIMS), a direct mass spectrometry technique, is extended to the real-time monitoring of migration processes from food contact materials (FCMs), with a focus on Bisphenol A (BPA) as a re-emerging contaminant. The whole instrumental system was designed to meet important requirements in terms of signal stability, low noise and ease of handling. A dedicated MATLAB app was developed for the semi-automated processing of the instrumental output. A full factorial experimental design was applied to optimize five response variables by varying the acceptor phase flow rate and composition, stirring, temperature, and membrane length. The CP-MIMS method was validated in tap water and food simulants, obtaining detection limits in the 0.8-6 µg/kg range. The method proved suitable for determining BPA below the specific migration limit established by the EU (0.05 mg/kg), with the added advantages of real-time analysis of BPA migration from FCMs (not yet explored in the literature), high sample throughput and compliance with green analytical chemistry principles. The applicability of the method was demonstrated by performing migration tests on plastic articles and acquiring the migration profile of BPA over time for the samples that showed detectable release of BPA. Excellent trueness was demonstrated by comparison with a confirmatory liquid chromatography-high resolution mass spectrometry method. This study provides important insights into the role of CP-MIMS in achieving valuable temporal resolution in the study of dynamic processes, such as the release of compounds from FCMs.
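To make the full factorial design concrete, the sketch below enumerates every run for the five factors listed above and estimates a main effect. It assumes two coded levels (-1/+1) per factor, which the abstract does not state, and the factor names and the synthetic response are hypothetical illustrations rather than the study's data.

```python
from itertools import product

# Factor names paraphrase the abstract; two coded levels per factor is an assumption.
FACTORS = ["flow_rate", "acceptor_composition", "stirring", "temperature", "membrane_length"]
LEVELS = (-1, 1)


def full_factorial(factors, levels=LEVELS):
    """All combinations of coded levels, one dict per experimental run."""
    return [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]


def main_effect(design, responses, factor, low=-1, high=1):
    """Mean response at the high level minus mean response at the low level."""
    hi = [r for run, r in zip(design, responses) if run[factor] == high]
    lo = [r for run, r in zip(design, responses) if run[factor] == low]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


design = full_factorial(FACTORS)
print(len(design))  # 32 runs = 2 ** 5

# Synthetic response that depends on temperature only, to check the estimator
responses = [10 + 3 * run["temperature"] for run in design]
print(main_effect(design, responses, "temperature"))  # 6.0
print(main_effect(design, responses, "stirring"))  # 0.0
```

Because every level combination appears exactly once, the design is balanced and each main effect can be estimated independently of the others; with more levels per factor the run count grows as levels ** 5.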
Science of The Total Environment, 2024

Chemometrics and Intelligent Laboratory Systems, 2024
Clustering is an unsupervised machine learning methodology widely used in several sciences to find groups of similar patterns in complex data. The results generated by clustering algorithms generally depend on user-defined input parameters, such as the number of expected clusters, which can have a great impact on the homogeneity of the identified clusters. Cluster validity indices (CVIs) are an effective method for determining the optimal number of clusters, i.e., the one that best fits the natural partition of a dataset, and they require neither underlying assumptions nor a priori knowledge about the true structure of the dataset. Since 1965, many cluster validity indices have been proposed in the literature and used in a wide range of applications. In this paper, the performance of 68 cluster validity indices was evaluated on 21 real-life research datasets and simulated datasets. For each dataset, the CVIs were compared on the same partition, obtained with the k-means clustering algorithm. Multivariate chemometric methods were applied to reveal mutual relationships among the indices and to select those that are most effective in terms of accuracy and reliability.
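As an illustration of how a CVI picks the number of clusters, the sketch below runs k-means for several values of k and keeps the k that maximizes the Calinski-Harabasz index, one classic CVI. The choice of this particular index, the deterministic farthest-point initialisation and the synthetic data are all assumptions made for the example; the sketch does not reproduce the paper's comparison of 68 indices.

```python
def sqdist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation; returns labels."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        # next seed: the point farthest from its nearest existing seed
        centroids.append(list(max(X, key=lambda p: min(sqdist(p, c) for c in centroids))))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in X]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            new.append(centroid(members) if members else centroids[j])  # empty cluster keeps its centroid
        if new == centroids:
            break
        centroids = new
    return labels


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each divided by its degrees of freedom."""
    clusters = sorted(set(labels))
    n, k = len(X), len(clusters)
    overall = centroid(X)
    between = within = 0.0
    for j in clusters:
        members = [p for p, lab in zip(X, labels) if lab == j]
        c = centroid(members)
        between += len(members) * sqdist(c, overall)
        within += sum(sqdist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def choose_k(X, k_values):
    """Run k-means for each candidate k and keep the k with the highest index value."""
    scores = {k: calinski_harabasz(X, kmeans(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Synthetic data: three tight, well-separated groups of five points each
OFFSETS = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2), (-0.2, -0.1)]
CENTRES = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
X = [(cx + dx, cy + dy) for cx, cy in CENTRES for dx, dy in OFFSETS]

best_k, scores = choose_k(X, range(2, 6))
print(best_k)  # 3
```

Different CVIs encode different notions of compactness and separation, which is why they can disagree on real data and why a systematic comparison such as the one described above is useful.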

Journal of Cheminformatics, 2024
Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, their structural motifs differ from those of typical drug-like compounds, e.g., a wider range of molecular weights, multiple stereocenters and a higher fraction of sp3-hybridized carbons. This makes the encoding of natural products via molecular fingerprints difficult, thus restricting their use in cheminformatics studies. To tackle this issue, we explored over 30 years of research to systematically evaluate which molecular fingerprint provides the best performance on the natural product chemical space. We considered 20 molecular fingerprints from four different sources, which we then benchmarked on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as the data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
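Pairwise similarity between binary fingerprints is most often measured with the Tanimoto (Jaccard) coefficient, the number of shared on-bits divided by the number of on-bits in either fingerprint. The sketch below represents each fingerprint as a set of on-bit indices; the bit sets are invented for illustration, and the example only shows how two encodings of the same molecules can yield very different similarity values, not which encoding is better.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two all-zero fingerprints score 0
    return len(a & b) / union


def similarity_matrix(fps):
    """Pairwise Tanimoto similarities for a list of fingerprints."""
    return [[tanimoto(x, y) for y in fps] for x in fps]


# Hypothetical on-bits for the same two molecules under two different encodings
encoding_1 = [{1, 2, 3, 4}, {1, 2, 3, 5}]
encoding_2 = [{10, 11}, {12, 13}]
print(similarity_matrix(encoding_1)[0][1])  # 0.6
print(similarity_matrix(encoding_2)[0][1])  # 0.0
```

Comparing such matrices across encodings (for example, by correlating their off-diagonal entries) is one simple way to quantify how differently two fingerprints view the same chemical space.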

Chemometrics and Intelligent Laboratory Systems, 2024
High-level data fusion approaches, also known as consensus approaches, combine the predictions of individual models to increase reliability and overcome the limitations of single models. Consensus strategies are frequently applied in the framework of Quantitative Structure-Activity Relationships (QSARs) to reduce the uncertainty in the prediction of molecular activities and to improve the accuracy of model outcomes. However, specific regions of the chemical space may be systematically associated with low accuracy, and there even consensus modelling cannot improve prediction reliability by combining the outcomes of individual models. In this study, a new heuristic metric is proposed to assess the degree of accuracy of consensus predictions across the chemical space. This metric can assist in mapping prediction reliability and in delineating a safe zone, where consensus predictions are expected to be more accurate. The new metric is calculated with kernel-based potential functions and can be used for both classification and regression consensus modelling. Four case studies, including extensive datasets for consensus modelling, were used to test the proposed approach. The results demonstrated that a potential can be associated with regions of the chemical space as a function of the accuracy of consensus modelling, and that it can be used to map prediction reliability and to define specific regions where predictions are expected to be more reliable.
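The general idea of a kernel-based potential can be sketched as follows. This is a hypothetical formulation, not the metric defined in the paper: each training molecule contributes a Gaussian kernel centred on its descriptors, weighted +1 if the consensus predicted it correctly and -1 otherwise, so the sum is positive where correct predictions dominate. The kernel width, the zero threshold for the "safe zone" and the toy data are all assumptions.

```python
import math


def gaussian_kernel(x, y, width):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * width ** 2))


def reliability_potential(query, train_X, consensus_correct, width=1.0):
    """Hypothetical potential: +kernel for correctly predicted training molecules, -kernel otherwise."""
    return sum((1.0 if ok else -1.0) * gaussian_kernel(query, x, width)
               for x, ok in zip(train_X, consensus_correct))


def in_safe_zone(query, train_X, consensus_correct, width=1.0, threshold=0.0):
    """Treat regions whose potential exceeds a (hypothetical) threshold as reliable."""
    return reliability_potential(query, train_X, consensus_correct, width) > threshold


# Hypothetical 2-D descriptor space with known consensus outcomes
train_X = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
consensus_correct = [True, True, False]
print(in_safe_zone((0.2, 0.0), train_X, consensus_correct))  # True
print(in_safe_zone((5.0, 5.0), train_X, consensus_correct))  # False
```

The kernel width controls how far each training molecule's outcome propagates through the chemical space: a small width gives a patchy map, a large width smooths it.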
Toxics, 2024
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chemometrics and Intelligent Laboratory Systems, 2023
Advanced Materials Interfaces, 2023
Journal of Vacuum Science and Technology A, 2023
Food Research International, 2023
Microchemical Journal, 2023
Advanced Materials Interfaces, 2023
Separations, 2023
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Journal of Agricultural and Food Chemistry, 2022
Molecules, 2023
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Frontiers in Plant Science, 2022
were found to be deterministic. However, the PCA revealed that classification between samples (BP vs. non-BP fruit) was not possible by univariate analysis (individual elements or the K/Ca ratio). Therefore, a multivariate classification approach was applied: for all the cultivars evaluated ('Granny Smith', 'Fuji' and 'Brookfield'), the classification measures (sensitivity, specificity and balanced precision) of the PLS-DA models ranged from 0.76 to 0.92, both on the full training samples and with both validation procedures (Venetian blinds and Monte Carlo). The results of this work indicate that applying this technology at the individual fruit level is essential for understanding the factors that determine this disorder and can improve BP prediction in intact fruit.
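The two validation procedures mentioned above differ in how samples are held out. In Venetian blinds cross-validation, sample i is left out in fold i mod k, so every sample is tested exactly once; in Monte Carlo cross-validation, the data are repeatedly split at random, so a sample may be tested several times or never. The sketch below generates index splits for both; the function names, the test fraction and the seed are illustrative assumptions, not the settings used in the study.

```python
import random


def venetian_blinds_splits(n_samples, n_folds):
    """Venetian blinds: sample i is left out in fold i % n_folds."""
    splits = []
    for f in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == f]
        train = [i for i in range(n_samples) if i % n_folds != f]
        splits.append((train, test))
    return splits


def monte_carlo_splits(n_samples, test_fraction, n_repeats, seed=0):
    """Monte Carlo CV: repeated random train/test partitions."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * n_samples))
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((sorted(idx[n_test:]), sorted(idx[:n_test])))
    return splits


for train, test in venetian_blinds_splits(10, 3):
    print("Venetian blinds test fold:", test)
```

Because Venetian blinds assigns samples by position, the order of the rows matters: if samples are sorted by class or by acquisition date, each fold inherits that structure.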

Separations, 2022
The study concerns the photodegradation of the antidepressant escitalopram (ESC), the S-enantiomer of racemic citalopram, in both ultrapure and surface water, considering the contribution of indirect photolysis due to the presence of nitrate and bicarbonate. The effect of nitrate and bicarbonate concentrations was investigated by full factorial design, and only the nitrate concentration had a significant effect on the degradation. ESC photodegradation follows pseudo-first-order kinetics (half-life = 62.4 h in ultrapure water and 48.4 h in lake water). The generation of transformation products (TPs) was monitored with a purpose-developed and validated HPLC-MS/MS method. Fourteen TPs were identified in ultrapure water (one of them, at m/z 261, for the first time), and two further TPs at m/z 327 (also found for the first time in this study) were identified only in the presence of nitrate. Several TPs were the same as those formed during the photodegradation of citalopram. The photodegradation pathway of ESC and its degradation mechanism in water are proposed. The method was successfully applied to the analysis of surface water samples, in which ESC was determined at a few dozen ng L−1, together with TP2, TP5 and TP12. Finally, a preliminary in silico evaluation of the toxicological profile and environmental behavior of the TPs was carried out with computational models; two TPs (TP4 and TP10) were identified as being of potential concern, as they were predicted to be mutagenic by an Ames test model.
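For pseudo-first-order kinetics, C(t) = C0 exp(-kt), so the half-life is t1/2 = ln 2 / k and the rate constant follows directly from the reported half-lives. The sketch below converts the two half-lives from the abstract into rate constants and the fraction of ESC remaining after an arbitrarily chosen 24 h; the function names are hypothetical.

```python
import math


def rate_constant(t_half):
    """Pseudo-first-order kinetics, C(t) = C0 * exp(-k * t), so k = ln(2) / t_half."""
    return math.log(2) / t_half


def fraction_remaining(t, k):
    """Fraction C(t) / C0 still present after time t."""
    return math.exp(-k * t)


HALF_LIVES_H = {"ultrapure water": 62.4, "lake water": 48.4}  # half-lives reported in the abstract

for matrix, t_half in HALF_LIVES_H.items():
    k = rate_constant(t_half)
    print(f"{matrix}: k = {k:.4f} 1/h, C/C0 after 24 h = {fraction_remaining(24, k):.2f}")
```

The shorter half-life in lake water corresponds to a larger rate constant, consistent with the additional contribution of indirect photolysis in the natural matrix.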
Journal of Pharmaceutical and Biomedical Analysis, 2022