Energies
The methods used in chemical engineering rely strongly on a solid grasp of the thermodynamic properties of complex systems. It is difficult to describe the behavior of ions and molecules in complex systems and to make reliable predictions of their thermodynamic properties across a wide range of conditions. Deep learning (DL), which can account for intricate interactions beyond the scope of traditional mathematical functions, appears to be an effective solution to this problem. In this brief Perspective, we provide an overview of DL and review several of its possible applications within the realm of chemical engineering. DL approaches that anticipate the molecular thermodynamic properties of a broad range of systems from already available data are also described, with numerous cases serving as illustrations.
Digital Discovery
pyRISM combines physics-based calculations and deep learning to rapidly predict solvation free energy in different solvents and temperatures without reparameterization.
Journal of Chemical Theory and Computation, 2022
Machine learning thermodynamic perturbation theory (MLPT) is a promising approach to compute finite temperature properties when the goal is to compare several different levels of ab initio theory and/or to apply highly expensive computational methods. Indeed, starting from a production molecular dynamics trajectory, this method can estimate properties at one or more target levels of theory from only a small number of additional fixed-geometry calculations, which are used to train a machine learning model. However, as MLPT is based on thermodynamic perturbation theory (TPT), inaccuracies might arise when the starting point trajectory samples a configurational space which has a small overlap with that of the target approximations of interest. By considering case studies of molecules adsorbed in zeolites and several different density functional theory approximations, in this work we assess the accuracy of MLPT for ensemble total energies and enthalpies of adsorption. The problematic cases that were found are analyzed and it is shown that, even without knowing exact reference results, pathological cases for MLPT can be detected by considering a coefficient that measures the statistical imbalance induced by the TPT reweighting. For the most pathological examples we recover target level results within chemical accuracy by applying a machine learning-based Monte Carlo (MLMC) resampling. Finally, based on the ideas developed in this work, we assess and confirm the accuracy of recently published MLPT-based enthalpies of adsorption at the random phase approximation level, whose high computational cost would completely hinder a direct molecular dynamics simulation.
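The TPT reweighting and imbalance diagnostic described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code: the imbalance coefficient here is a standard effective-sample-size ratio (1.0 means perfect configurational overlap, values near 0 flag the pathological cases where MLMC resampling is needed), assumed as a stand-in for the paper's diagnostic.

```python
import numpy as np

def tpt_reweight(E_source, E_target, kT=2.494):
    """Estimate a target-level ensemble average by reweighting
    source-level samples (thermodynamic perturbation theory).
    Energies in kJ/mol; kT defaults to roughly 300 K."""
    dE = np.asarray(E_target) - np.asarray(E_source)
    w = np.exp(-(dE - dE.min()) / kT)   # shift for numerical stability
    w /= w.sum()
    E_avg = float(np.sum(w * np.asarray(E_target)))
    # Effective-sample-size ratio: 1.0 = uniform weights, -> 0 pathological.
    imbalance = 1.0 / (len(w) * float(np.sum(w**2)))
    return E_avg, imbalance
```

A constant energy shift between levels leaves the weights uniform (imbalance 1.0), while a strongly geometry-dependent shift concentrates the weight on a few frames and drives the coefficient toward zero.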
Prediction of aqueous solubilities or hydration free energies is an extensively studied area in machine learning applications to chemistry, since water is the sole solvent in living systems. However, few machine learning studies have addressed non-aqueous solutions, despite the fact that the solvation mechanism plays an important role in various chemical reactions. Here, we introduce a novel machine-learning-based quantitative structure-property prediction method which predicts solvation free energies for various organic solute and solvent…
Chemical Science, 2022
Machine learning techniques, including neural networks, are popular tools for chemical, physical and materials applications seeking viable alternative methods for analyzing the structure and energetics of systems ranging from crystals to biomolecules. Efforts are less abundant for the prediction of kinetics and dynamics. Here we explore the ability of three well-established recurrent neural network architectures to reproduce and forecast the energetics of a liquid solution of ethyl acetate containing a macromolecular polymer-lipid aggregate at ambient conditions. Data models from three recurrent neural networks, ERNN, LSTM and GRU, are trained and tested on half-million-point time series of the macromolecular aggregate's potential energy and its interaction energy with the solvent, obtained from molecular dynamics simulations. Our exhaustive analyses show that the recurrent architectures investigated generate data models that reproduce the time series excellently, although their capability of yielding short- or long-term energetics forecasts with the expected statistical distribution of time points is limited. We propose an in silico protocol that extracts time patterns from the original series and uses these patterns to create an ensemble of artificial network models trained on an ensemble of time series seeded by the additional time patterns. The energetics forecasts improve, predicting a band of forecasted time series with a spread of values consistent with the span of the molecular dynamics energy fluctuations. Although the distribution of points from the band of energy forecasts is not optimal, the proposed in silico protocol provides useful estimates of the solvated macromolecular aggregate's fate.
Given the growing application of artificial networks in materials design, the data-based protocol presented here expands the range of science areas where supervised machine learning serves as a decision-making tool, aiding the simulation practitioner in assessing when long simulations are worth continuing.
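The ensemble-of-seeds idea behind the protocol can be illustrated compactly. As a hedged sketch, a least-squares autoregressive model stands in for the recurrent networks (ERNN/LSTM/GRU) used in the paper; the function names and the AR order are mine. The point is the band: forecasts started from different seed windows spread out, and that spread approximates the fluctuation range of the underlying energy series.

```python
import numpy as np

def fit_ar(series, order):
    """Least-squares autoregressive model, a linear stand-in here
    for the paper's recurrent networks."""
    X = np.stack([series[i:i + order] for i in range(len(series) - order)])
    y = series[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_band(series, order=8, horizon=50, n_seeds=20):
    """Ensemble of forecasts started from different seed windows;
    the spread approximates the energy-fluctuation band."""
    coef = fit_ar(series, order)
    rng = np.random.default_rng(0)
    starts = rng.integers(order, len(series) - order, size=n_seeds)
    band = []
    for s in starts:
        window = list(series[s - order:s])
        path = []
        for _ in range(horizon):
            nxt = float(np.dot(coef, window[-order:]))
            path.append(nxt)
            window.append(nxt)
        band.append(path)
    band = np.array(band)
    return band.mean(axis=0), band.std(axis=0)
```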
The Journal of Physical Chemistry A
arXiv: Chemical Physics, 2017
Neural networks are being used to make new types of empirical chemical models as inexpensive as force fields, but with accuracy close to the ab initio methods used to build them. Besides modeling potential energy surfaces, neural nets can provide qualitative insights and make qualitative chemical trends quantitatively predictable. In this work we present a neural network that predicts the energies of molecules as a sum of bond energies. The network learns the total energies of the popular GDB9 dataset to a competitive MAE of 0.94 kcal/mol. The method is naturally linearly scaling and applicable to molecules of nanoscopic size. More importantly, it gives chemical insight into the relative strengths of bonds as a function of their molecular environment, despite being trained only on total energy information. We show that the network makes predictions of relative bond strengths in good agreement with measured trends and human predictions. We show that DIM-NN learns the same heuristic t...
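The linear picture behind attributing total energy to bonds can be made concrete. This is a toy NumPy sketch, not DIM-NN: per-bond-type increments are recovered by least squares from total energies alone, which is the simplest version of learning bond strengths without ever seeing per-bond labels. The bond types and count vectors are hypothetical.

```python
import numpy as np

BOND_TYPES = ["C-H", "C-C", "C=O", "O-H"]   # hypothetical minimal basis

def fit_bond_energies(bond_counts, total_energies):
    """Least-squares bond increments from total energies only.
    bond_counts: (n_molecules, n_bond_types) matrix of bond counts."""
    coef, *_ = np.linalg.lstsq(np.asarray(bond_counts, float),
                               np.asarray(total_energies, float), rcond=None)
    return dict(zip(BOND_TYPES, coef))

def molecule_energy(counts, bond_energies):
    """Total energy as a sum of per-bond contributions."""
    return sum(n * bond_energies[b] for b, n in zip(BOND_TYPES, counts))
```

The neural network replaces the fixed per-type increment with an environment-dependent one, which is where the chemical insight about relative bond strengths comes from.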
2020
Recent advances in machine learning technologies and their chemical applications have led to the development of diverse structure-property-relationship-based prediction models for various chemical properties. The free energy of solvation is one of them and plays a dominant role as a fundamental measure of solvation chemistry. Here, we introduce a novel machine-learning-based solvation model, which calculates the target solvation free energy from pairwise atomistic interactions. The novelty of our proposed solvation model lies in its rather simple architecture: two encoding functions extract vector representations of the atomic and molecular features from a given chemical structure, while the inner product between two atomistic features calculates their interactions, instead of black-box perceptron networks. The cross-validation results on 6,493 experimental measurements for 952 organic solutes and 147 organic solvents achieve an outstanding performance, which is 0.2 kcal/mol in ...
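The inner-product interaction term is simple enough to sketch directly. In the real model the encoders produce the atomic feature vectors from the molecular graph; the lookup table of embeddings below is a hypothetical placeholder, and the function name is mine.

```python
import numpy as np

# Hypothetical learned atomic feature vectors (stand-ins for the
# encoders' output).
EMBED = {"C": np.array([0.5, -0.2, 0.1]),
         "H": np.array([0.1,  0.3, 0.0]),
         "O": np.array([-0.4, 0.2, 0.6])}

def solvation_energy(solute_atoms, solvent_atoms, embed=EMBED):
    """Solvation free energy as a sum of inner products between every
    solute-atom and solvent-atom feature vector, with no perceptron
    network on top of the interaction."""
    U = np.array([embed[a] for a in solute_atoms])
    V = np.array([embed[a] for a in solvent_atoms])
    return float(np.sum(U @ V.T))            # sum_i sum_j u_i . v_j
```

Because each pairwise contribution is an explicit dot product, the individual atom-atom interaction terms can be inspected, which is what makes the architecture interpretable compared with a black-box readout.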
The Journal of Physical Chemistry Letters
Activity coefficients, which are a measure of the non-ideality of liquid mixtures, are a key property in chemical engineering, with relevance to modeling chemical and phase equilibria as well as transport processes. Although experimental data on thousands of binary mixtures are available, prediction methods are needed to calculate the activity coefficients in the many relevant mixtures that have not been explored to date. In this report, we propose a probabilistic matrix factorization model for predicting the activity coefficients in arbitrary binary mixtures. Although no physical descriptors for the considered components were used, our method outperforms the state-of-the-art method that has been refined over three decades, while requiring much less training effort. This opens perspectives to novel methods for predicting physico-chemical properties of binary mixtures, with the potential to revolutionize modeling and simulation in chemical engineering.
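The core mechanism, completing a sparse solute-by-solvent matrix from latent vectors alone, can be sketched with plain SGD matrix factorization. The paper's model is Bayesian (probabilistic); this point-estimate NumPy version keeps only the central idea, and all names and hyperparameters are mine.

```python
import numpy as np

def factorize(observed, n_solutes, n_solvents, rank=2, lr=0.05, epochs=3000):
    """SGD matrix factorization: approximate the sparse matrix of
    (log) activity coefficients as y_ij ~ u_i . v_j, with learned
    latent vectors u_i per solute and v_j per solvent.
    observed: list of (i, j, value) triples."""
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n_solutes, rank))
    V = 0.1 * rng.standard_normal((n_solvents, rank))
    for _ in range(epochs):
        for i, j, y in observed:
            err = U[i] @ V[j] - y
            gU, gV = err * V[j], err * U[i]   # gradients before updating
            U[i] -= lr * gU
            V[j] -= lr * gV
    return U, V
```

Missing entries are then predicted as `U[i] @ V[j]`, which is how the model extrapolates to mixtures that were never measured, without any physical descriptors.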
HAL (Le Centre pour la Communication Scientifique Directe), 2022
Phase equilibrium calculations are an essential part of numerical simulations of multi-component multi-phase flow in porous media, accounting for the largest share of the computational time. In this work, we introduce a GPU-enabled, fast, and parallel framework, PTFlash, that vectorizes the algorithms required for isothermal two-phase flash calculations using PyTorch, and can facilitate a wide range of downstream applications. In addition, to further accelerate PTFlash, we design two task-specific neural networks, one for predicting the stability of given mixtures and the other for providing estimates of the distribution coefficients, which are trained offline and help shorten computation time by sidestepping stability analysis and reducing the number of iterations needed to reach convergence. The evaluation of PTFlash was conducted on three case studies involving hydrocarbons, CO2 and N2, for which the phase equilibrium was tested over a large range of temperature, pressure and composition conditions, using the Soave-Redlich-Kwong (SRK) equation of state. We compare PTFlash with an in-house thermodynamic library, Carnot, written in C++ and performing flash calculations one by one on CPU. Results show speed-ups on large-scale calculations of up to two orders of magnitude, while maintaining perfect agreement with the reference solution provided by Carnot.
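PTFlash itself is PyTorch on GPU; the style of batched vectorization it relies on can be illustrated in NumPy with the Rachford-Rice equation of an isothermal two-phase flash, sum_i z_i (K_i - 1) / (1 + V (K_i - 1)) = 0, solved by bisection for many mixtures simultaneously. This is a sketch with my own function and variable names, not PTFlash code.

```python
import numpy as np

def rachford_rice(z, K, iters=60):
    """Vectorized Rachford-Rice: solve for the vapor fraction V of each
    mixture in a batch by bisection on [0, 1].
    z, K: arrays of shape (n_mixtures, n_components) holding feed
    compositions and equilibrium (distribution) coefficients."""
    dK = K - 1.0
    lo = np.zeros(z.shape[0])
    hi = np.ones(z.shape[0])
    for _ in range(iters):
        V = 0.5 * (lo + hi)
        g = np.sum(z * dK / (1.0 + V[:, None] * dK), axis=1)
        lo = np.where(g > 0, V, lo)   # g(V) is strictly decreasing in V
        hi = np.where(g > 0, hi, V)
    return 0.5 * (lo + hi)
```

Every iteration updates all mixtures at once with array operations, which is exactly the pattern that maps well onto a GPU; the neural-network stability and K-value estimates described above serve to skip or warm-start such iterative loops.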
The Journal of Physical Chemistry A
Machine learning provides promising new methods for accurate yet rapid prediction of molecular properties, including thermochemistry, which is an integral component of many computer simulations, particularly automated reaction mechanism generation. Often, very large datasets with tens of thousands of molecules are required for training the models, but most datasets of experimental or high-accuracy quantum mechanical quality are much smaller. To overcome these limitations, we calculate new high-level datasets and derive bond additivity corrections to significantly improve enthalpies of formation. We adopt a transfer learning technique to train neural network models that achieve good performance even with a relatively small set of high-accuracy data. The training data for the entropy model is carefully selected so that important conformational effects are captured. The resulting models are generally applicable thermochemistry predictors for organic compounds with oxygen and nitrogen heteroatoms that approach experimental and coupled cluster accuracy while only requiring molecular graph inputs. Due to their versatility and the ease of adding new training data, they are poised to replace conventional estimation methods for thermochemical parameters in reaction mechanism generation. Since high-accuracy data is often sparse, similar transfer learning approaches are expected to be useful for estimating many other molecular properties.
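The transfer-learning recipe, pretrain on abundant lower-level data, then refit a small part of the model on scarce high-accuracy data, can be sketched with linear models. This is a schematic NumPy stand-in for the neural networks in the paper; "DFT" and "coupled cluster" here just label the large cheap set and the small accurate set, and the functions are mine.

```python
import numpy as np

def pretrain(X, y):
    """Stage 1: fit on the large, cheaper data set (DFT-quality)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fine_tune(w, X_small, y_small):
    """Stage 2: keep the pretrained representation X @ w frozen and
    refit only a scale and shift on the small high-accuracy set."""
    f = X_small @ w
    A = np.stack([f, np.ones_like(f)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, y_small, rcond=None)
    return lambda X: a * (X @ w) + b
```

Because only two parameters are refit, a handful of expensive reference values suffices, which is the same economy that lets the paper's models approach coupled-cluster accuracy from a relatively small high-level data set.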
Journal of Integrative Bioinformatics
We present a flexible deep convolutional neural network method for the analysis of arbitrarily sized graph structures representing molecules. This method, which makes use of the Lipinski module of RDKit, an open-source cheminformatics package, enables the incorporation of any global molecular information (such as molecular charge and molecular weight) and local information (such as atom hybridization and bond orders). In this paper, we show that this method significantly outperforms another recently proposed method based on deep convolutional neural networks on several of the datasets studied. Several best practices for training deep convolutional neural networks on chemical datasets are also highlighted within the article, such as how to select the information to be included in the model, how to prevent overfitting, and how unbalanced classes in the data can be handled.
Quantitative prediction of reaction properties, such as activation energies, has been limited due to a lack of available training data. Such predictions would be useful for computer-assisted reaction mechanism generation and organic synthesis planning. We develop a template-free deep learning model to predict activation energy given reactant and product graphs, and train the model on a new, diverse data set of gas-phase quantum chemistry reactions. We demonstrate that our model achieves accurate predictions and agrees with an intuitive understanding of chemical reactivity. With the continued generation of quantitative chemical reaction data and the development of methods that leverage such data, we expect many more methods for reactivity prediction to become available in the near future.
In the space of only a few years, deep generative modeling has revolutionized how we think of artificial creativity, yielding autonomous systems which produce original images, music, and text. Inspired by these successes, researchers are now applying deep generative modeling techniques to the generation and optimization of molecules; in our review we found 45 papers on the subject published in the past two years. These works point to a future where such systems will be used to generate lead molecules, greatly reducing resources spent downstream synthesizing and characterizing bad leads in the lab. In this review we survey the increasingly complex landscape of models and representation schemes that have been proposed. The four classes of techniques we describe are recursive neural networks, autoencoders, generative adversarial networks, and reinforcement learning. After first discussing some of the mathematical fundamentals of each technique, we draw high-level connections and comparisons with other techniques and expose the pros and cons of each. Several important high-level themes emerge from this work, including the shift away from the SMILES string representation of molecules towards more sophisticated representations such as graph grammars and 3D representations, the importance of reward function design, the need for better standards for benchmarking and testing, and the benefits of adversarial training and reinforcement learning over maximum-likelihood-based training. The average cost to bring a new drug to market is now well over one billion USD [1], with an average time from discovery to market of 13 years [2]. Outside of pharmaceuticals the average time from discovery to commercial production can be even longer; for instance, for energetic molecules it is 25 years [3]. A critical first step in molecular discovery is generating a pool of candidates for computational study or synthesis and characterization.
This is a daunting task because the space of possible molecules is enormous: the number of potential drug-like compounds has been estimated to be between 10^23 and 10^60 [4], while the number of all compounds that have been synthesized is on the order of 10^8. Heuristics, such as Lipinski's "rule of five" for pharmaceuticals [5], can help narrow the space of possibilities, but the task remains daunting. High-throughput screening (HTS) [6] and high-throughput virtual screening (HTVS) [7] techniques have made larger parts of chemical space accessible to computational and experimental study. Machine learning has been shown to be capable of yielding rapid and accurate predictions for many properties of interest and is being integrated into screening pipelines, since it is orders of magnitude faster than traditional computational chemistry methods [8]. Techniques for the interpretation and "inversion" of a machine learning model can illuminate structure-property relations that have been learned by the model, which can in turn be used to guide the design of new lead molecules [9,10]. However, even with these new techniques, bad leads still waste limited supercomputer and laboratory resources, so minimizing the number of bad leads generated at the start of the pipeline remains a key priority. The focus of this review is on the use of deep learning techniques for the targeted generation of molecules and guided exploration of chemical space. We note that machine learning (and more broadly artificial intelligence) is having an impact on accelerating other parts of the chemical discovery pipeline as well, via machine-learning-accelerated ab initio simulation [8], machine-learning-based reaction prediction [11,12], deep-learning-based synthesis planning [13], and the development of high-throughput "self-driving" robotic laboratories [14,15].
Deep neural networks, which are often defined as networks with more than three layers, have been around for many decades but until recently were difficult to train and fell behind other techniques for classification and regression. By most accounts, the deep learning revolution in machine learning began in 2012, when deep-neural-network-based models began to win several different competitions for the first time. First came a demonstration by Cireşan et al. of how deep neural networks could achieve near-human performance on the task of handwritten digit classification [16]. Next came groundbreaking work by Krizhevsky et al. which showed how deep convolutional networks achieved superior performance on the ImageNet image classification challenge [17]. Finally, around the same time in 2012, a multitask neural network developed by Dahl et al. won the "Merck Molecular Activity Challenge" to predict the molecular activities of molecules at 15 different sites in the body, beating out more traditional machine learning approaches such as boosted decision trees [18]. One of the key technical advances published that year, and used by both Krizhevsky et al. and Dahl et al., was a novel regularization trick called "dropout".
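The dropout trick is small enough to show in full. This is the standard "inverted dropout" formulation (a minimal NumPy sketch, assuming the common convention of rescaling survivors at training time so that inference needs no change):

```python
import numpy as np

def dropout(x, p_drop, rng, train=True):
    """Inverted dropout: during training, zero each activation with
    probability p_drop and rescale the survivors by 1/(1 - p_drop),
    so the expected activation is unchanged and no rescaling is
    needed at inference time."""
    if not train:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask
```

Randomly silencing units prevents co-adaptation between them, acting as an implicit ensemble of thinned networks, which is why it proved so effective as a regularizer in the 2012 systems mentioned above.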
Nanotechnology, 2017
Single-molecule experiments typically produce large datasets, which can be challenging to analyse. Hypothesis-driven data exploration, informed by an expectation of the signal characteristics, can lead to interpretation bias or loss of information. The latest developments in machine learning, so-called deep learning (DL) methods, offer an interesting new avenue to address such challenges. A DL algorithm 'learns' certain data features towards an optimal prediction (e.g., a base call in DNA sequencing). In some applications, such as speech and image recognition, DL has been able to outperform conventional machine learning strategies and even human performance. However, to date DL has not been applied in single-molecule science, to the best of our knowledge. We illustrate how it can be used, and discuss some of the strengths and weaknesses of the approach. In particular, a 'deep' neural network has many features of a 'black box', which marks a significant departure from hypothesis-led data interpretation. Single-molecule sensing, sequencing and characterisation based on electric or optical techniques can generate very large datasets, in some cases in a (semi-)automated fashion [1-4]. This data is typically one-dimensional, in the sense that the conductance, intensity or current is being recorded as a function of a single independent variable, for example time or electrode distance, as illustrated in Fig. 1. The analysis can be very challenging, not only because of the sheer size of the dataset, but also due to insufficient knowledge about the characteristic signal features and poor signal-to-noise ratio (S/N). Some hypothesis-driven data exploration and selection is often required: led by an expectation that may, for example, be informed by a physical model or even intuition, the experimenter inspects the data either manually or
ArXiv, 2022
Currently, machine learning techniques including neural networks are popular tools for materials and chemical scientists, with applications that may provide viable alternative methods in the analysis of structure and energetics of systems ranging from crystals to biomolecules. However, efforts are less abundant for the prediction of kinetics and dynamics. Here we explore the ability of three well-established recurrent neural network architectures to forecast the energetics of a macromolecular polymer-lipid aggregate solvated in ethyl acetate at ambient conditions. The solvated 4-macromolecule aggregate is considered to be a pre-micellar formation of finite life span. Data models generated from three recurrent neural networks, ERNN, LSTM and GRU, are trained and tested on nanoseconds-long time series of the intra-macromolecule potential energy and its interaction energy with the solvent, generated from molecular dynamics and containing half a million points. Our exhaustive analyses con...
The Journal of Physical Chemistry Letters
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electronic structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained on a large data set of density functional theory calculations can be significantly improved by retraining on a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
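The active-learning query step described above reduces to a simple loop: score candidate configurations by the disagreement of an ensemble of models, then send the most uncertain ones for new reference calculations. A minimal sketch (function names and the ensemble-std criterion are my assumptions; real workflows use committee or dropout uncertainties in the same role):

```python
import numpy as np

def select_by_uncertainty(pool, models, n_pick):
    """Active-learning query: rank candidate configurations by the
    standard deviation of an ensemble of potentials and return the
    indices of the n_pick most uncertain ones.
    models: list of callables mapping a configuration to an energy."""
    preds = np.array([[m(x) for x in pool] for m in models])
    uncertainty = preds.std(axis=0)          # ensemble disagreement
    return np.argsort(uncertainty)[::-1][:n_pick]
```

New reference data are then computed only for the selected configurations and the ensemble is retrained, focusing expensive quantum calculations exactly where the model is least reliable.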
Computational Materials Science, 2021
Simulating reasonable timescales for any long physical process using molecular dynamics (MD) is a major challenge in computational physics. In this study, we have implemented an approach based on a multi-fidelity physics-informed neural network (MPINN) to achieve long-range MD simulation results over a large sample space at significantly less computational cost. The fidelity of our present multi-fidelity study is based on the integration timestep size of the MD simulations. While MD simulations with larger timesteps produce results with a lower level of accuracy, they can provide enough computationally cheap training data for the MPINN to learn an accurate relationship between these low-fidelity results and high-fidelity MD results obtained using smaller simulation timesteps. We have performed two benchmark studies, involving one- and two-component Lennard-Jones systems, to determine the optimum percentage of high-fidelity training data required to achieve accurate results with high computational saving. The results show that important system properties such as system energy per atom, system pressure and diffusion coefficients can be determined with high accuracy while saving 68% of the computational cost. Finally, as a demonstration of the applicability of our present methodology in practical MD studies, we have studied the viscosity of argon-copper nanofluid and its variation with temperature and volume fraction by MD simulation using MPINN, and compared the results with numerous previous studies and theoretical models. Our results indicate that MPINN can predict accurate nanofluid viscosity over a wide range of sample space with a significantly smaller number of MD simulations. Our present methodology is the first implementation of MPINN in conjunction with MD simulation for predicting nanoscale properties. This can pave pathways to investigate more complex engineering problems that demand long-range MD simulations.
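The two-level structure of such a multi-fidelity surrogate, a model fitted to abundant low-fidelity data plus a correction fitted to scarce high-fidelity data, can be sketched without any neural network. Plain polynomial fits stand in for the MPINN components here; the function and data names are mine.

```python
import numpy as np

def multifidelity_predict(x_lo, y_lo, x_hi, y_hi, x_query):
    """Two-level multi-fidelity surrogate: a trend fitted to many
    cheap low-fidelity points (e.g. large-timestep MD) plus a
    low-order correction fitted to few accurate high-fidelity points
    (small-timestep MD)."""
    f_lo = np.poly1d(np.polyfit(x_lo, y_lo, deg=3))    # low-fidelity trend
    resid = y_hi - f_lo(x_hi)                          # discrepancy at HF points
    f_corr = np.poly1d(np.polyfit(x_hi, resid, deg=1)) # cheap correction
    return f_lo(x_query) + f_corr(x_query)
```

Because the correction only has to capture the (usually smooth) discrepancy between fidelities, a handful of expensive small-timestep runs suffices, which is the source of the computational savings reported above.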
2019
Recent advances in artificial intelligence, along with the development of large datasets of energies calculated using quantum mechanical (QM)/density functional theory (DFT) methods, have enabled prediction of accurate molecular energies at reasonably low computational cost. However, the machine learning models reported so far require the atomic positions obtained from geometry optimizations using high-level QM/DFT methods as input in order to predict the energies, and do not allow for geometry optimization. In this paper, a transferable and molecule-size-independent machine learning model (BAND NN), based on a chemically intuitive representation inspired by molecular mechanics force fields, is presented. The model predicts the atomization energies of equilibrium and non-equilibrium structures as a sum of energy contributions from bonds (B), angles (A), nonbonds (N) and dihedrals (D) with remarkable accuracy. The robustness of the proposed model is further validated by calculations ...
Journal of Chemical Information and Modeling, 2021
Solvation free energy is a fundamental property that influences various chemical and biological processes, such as reaction rates, protein folding, drug binding, and bioavailability of drugs. In this work, we present a deep learning method based on graph networks to accurately predict solvation free energies of small organic molecules. The proposed model, comprising three phases, namely, message passing, interaction, and prediction, is able to predict solvation free energies in any generic organic solvent with a mean absolute error of 0.16 kcal/mol. In terms of accuracy, the current model outperforms all of the proposed machine learning-based models so far. The atomic interactions predicted in an unsupervised manner are able to explain the trends of free energies consistent with chemical wisdom. Further, the robustness of the machine learning-based model has been tested thoroughly, and its capability to interpret the predictions has been verified with several examples.
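The message-passing phase of such a graph network can be sketched schematically: each atom repeatedly aggregates features from its bonded neighbours and updates its own state, and a permutation-invariant readout yields a molecule-level embedding from which a head would predict the solvation free energy. This NumPy sketch is a generic message-passing block, not the paper's architecture; weights and the tanh update are illustrative assumptions.

```python
import numpy as np

def message_pass(h, adj, W_msg, W_upd, steps=3):
    """Schematic message passing on a molecular graph.
    h: (n_atoms, d) atom features; adj: (n_atoms, n_atoms) 0/1 bond
    adjacency; W_msg, W_upd: (d, d) weight matrices."""
    for _ in range(steps):
        msg = adj @ (h @ W_msg)          # sum messages from bonded neighbours
        h = np.tanh(h @ W_upd + msg)     # update each atom's state
    return h.sum(axis=0)                 # permutation-invariant readout
```

Because the same weights are applied at every atom and the readout is a sum, the output is independent of atom ordering, which is the property that lets one model handle any generic organic solute or solvent.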