1
|
Chemla Y, Levin I, Fan Y, Johnson AA, Coley CW, Voigt CA. Hyperspectral reporters for long-distance and wide-area detection of gene expression in living bacteria. Nat Biotechnol 2025:10.1038/s41587-025-02622-y. [PMID: 40216953 DOI: 10.1038/s41587-025-02622-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Accepted: 02/27/2025] [Indexed: 04/27/2025]
Abstract
Genetically encoded reporters are suitable for short-distance imaging in the laboratory but not for scanning wide outdoor areas from a distance. Here we introduce hyperspectral reporters (HSRs) designed for hyperspectral imaging cameras that are commonly mounted on unmanned aerial vehicles and satellites. HSR genes encode enzymes that produce a molecule with a unique absorption signature that can be reliably distinguished in hyperspectral images. Quantum mechanical simulations of 20,170 metabolites identified candidate HSRs, leading to the selection of biliverdin IXα and bacteriochlorophyll a for their distinct absorption spectra and biosynthetic feasibility. These genes were integrated into chemical sensor circuits in soil (Pseudomonas putida) and aquatic (Rubrivivax gelatinosus) bacteria. The bacteria were detectable outdoors under ambient light from up to 90 m in a single 4,000-m² hyperspectral image taken using fixed and unmanned aerial vehicle-mounted cameras. The dose-response functions of the chemical sensors were measured remotely. HSRs enable large-scale studies and applications in ecology, agriculture, environmental monitoring, forensics and defense.
Affiliation(s)
- Yonatan Chemla
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Itai Levin
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yueyang Fan
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Anna A Johnson
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Christopher A Voigt
- Synthetic Biology Center, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
| |
|
2
|
Alfonso-Ramos JE, Adamo C, Brémond É, Stuyver T. Improving the Reliability of, and Confidence in, DFT Functional Benchmarking through Active Learning. J Chem Theory Comput 2025; 21:1752-1761. [PMID: 39893680 DOI: 10.1021/acs.jctc.4c01729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
Validating the performance of exchange-correlation functionals is vital to ensure the reliability of density functional theory (DFT) calculations. Typically, these validations involve benchmarking data sets. Currently, such data sets are usually assembled in an unprincipled manner, suffering from uncontrolled chemical bias, and limiting the transferability of benchmarking results to a broader chemical space. In this work, a data-efficient solution based on active learning is explored to address this issue. Focusing, as a proof of principle, on pericyclic reactions, we start from the BH9 benchmarking data set and design a chemical reaction space around this initial data set by combinatorially combining reaction templates and substituents. Next, a surrogate model is trained to predict the standard deviation of the activation energies computed across a selection of 20 distinct DFT functionals. With this model, the designed chemical reaction space is explored, enabling the identification of challenging regions, i.e., regions with large DFT functional divergence, for which representative reactions are subsequently acquired as additional training points. Remarkably, it turns out that the function mapping the molecular structure to functional divergence is readily learnable; convergence is reached upon the acquisition of fewer than 100 reactions. With our final updated model, a more challenging (and arguably more representative) pericyclic benchmarking data set is curated, and we demonstrate that the functional performance has changed significantly compared to the original BH9 subset.
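The divergence target described in this abstract (the standard deviation of activation energies across functionals) can be illustrated with a minimal Python sketch. The reaction names and barrier values below are hypothetical, the use of the population standard deviation is an assumption (the abstract does not say which estimator was used), and the surrogate model and acquisition loop are not shown.

```python
from statistics import pstdev


def functional_divergence(barriers_by_functional):
    """Spread of one reaction's activation energy across DFT functionals.

    Assumption: population standard deviation; the abstract does not
    specify the estimator.
    """
    return pstdev(barriers_by_functional)


def rank_by_divergence(reactions):
    """Return reaction names ordered from most to least divergent."""
    scored = {name: functional_divergence(values) for name, values in reactions.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical barriers (kcal/mol), one value per functional.
reactions = {
    "rxn_a": [20.0, 21.0, 19.0, 20.0],
    "rxn_b": [15.0, 25.0, 10.0, 30.0],
}
ranking = rank_by_divergence(reactions)
```

In an active learning setting, reactions near the top of such a ranking would be the candidates for additional reference calculations.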
Affiliation(s)
- Javier E Alfonso-Ramos
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, i-CLeHS, 75 005 Paris, France
| | - Carlo Adamo
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, i-CLeHS, 75 005 Paris, France
| | - Éric Brémond
- Université Paris Cité, CNRS, ITODYS, 75 013 Paris, France
| | - Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, i-CLeHS, 75 005 Paris, France
| |
|
3
|
Morán-González L, Betten JE, Kneiding H, Balcells D. AABBA Graph Kernel: Atom-Atom, Bond-Bond, and Bond-Atom Autocorrelations for Machine Learning. J Chem Inf Model 2024; 64:8756-8769. [PMID: 39580812 PMCID: PMC11632777 DOI: 10.1021/acs.jcim.4c01583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Revised: 11/03/2024] [Accepted: 11/15/2024] [Indexed: 11/26/2024]
Abstract
Graphs are one of the most natural and powerful representations available for molecules; natural because they have an intuitive correspondence to skeletal formulas, the language used by chemists worldwide, and powerful, because they are highly expressive both globally (molecular topology) and locally (atom and bond properties). Graph kernels are used to transform molecular graphs into fixed-length vectors, which, based on their capacity of measuring similarity, can be used as fingerprints for machine learning (ML). To date, graph kernels have mostly focused on the atomic nodes of the graph. In this work, we developed a graph kernel based on atom-atom, bond-bond, and bond-atom (AABBA) autocorrelations. The resulting vector representations were tested on regression ML tasks on a data set of transition metal complexes; a benchmark motivated by the higher complexity of these compounds relative to organic molecules. In particular, we tested different flavors of the AABBA kernel in the prediction of the energy barriers and bond distances of the Vaska's complex data set (Friederich et al., Chem. Sci., 2020, 11, 4584). For a variety of ML models, including neural networks, gradient boosting machines, and Gaussian processes, we showed that AABBA outperforms the baseline including only atom-atom autocorrelations. Dimensionality reduction studies also showed that the bond-bond and bond-atom autocorrelations yield many of the most relevant features. We believe that the AABBA graph kernel can accelerate the exploration of large chemical spaces and inspire novel molecular representations in which both atomic and bond properties play an important role.
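As an illustration of the atom-atom baseline mentioned in this abstract, the following Python sketch computes a generic graph autocorrelation vector: for each depth d, it sums the product of an atomic property over all atom pairs separated by d bonds. This is an assumed, simplified form (ordered pairs, a single property, no bond-bond or bond-atom terms) and not the authors' AABBA implementation; the function names are hypothetical.

```python
from collections import deque


def graph_distances(n_atoms, bonds):
    """Breadth-first search from every atom; returns one {atom: distance} dict per source."""
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    distances = []
    for source in range(n_atoms):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        distances.append(dist)
    return distances


def atom_atom_autocorrelation(properties, bonds, max_depth):
    """Fixed-length vector: entry d sums P_i * P_j over ordered pairs at bond distance d."""
    n = len(properties)
    distances = graph_distances(n, bonds)
    vector = []
    for depth in range(max_depth + 1):
        total = 0.0
        for i in range(n):
            for j in range(n):
                if distances[i].get(j) == depth:
                    total += properties[i] * properties[j]
        vector.append(total)
    return vector


# Toy example: water, with O at index 0 bonded to two H atoms; property = atomic number.
water = atom_atom_autocorrelation([8, 1, 1], [(0, 1), (0, 2)], max_depth=2)
```

The vector length depends only on `max_depth`, not on the size of the molecule, which is what makes such a representation usable as a fingerprint.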
Affiliation(s)
- Lucía Morán-González
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
- Centre for Materials Science and Nanotechnology, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
| | - Jørn Eirik Betten
- Simula Research Laboratory, Kristian Augusts Gate 23, 0164 Oslo, Norway
| | - Hannes Kneiding
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
| | - David Balcells
- Hylleraas Centre for Quantum Molecular Sciences, Department of Chemistry, University of Oslo, P.O. Box 1033 0315 Oslo, Norway
| |
|
4
|
Gould T, Chan B, Dale SG, Vuckovic S. Identifying and embedding transferability in data-driven representations of chemical space. Chem Sci 2024; 15:11122-11133. [PMID: 39027290 PMCID: PMC11253166 DOI: 10.1039/d4sc02358g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 06/02/2024] [Indexed: 07/20/2024] Open
Abstract
Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable; and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles; one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.
Affiliation(s)
- Tim Gould
- Queensland Micro- and Nanotechnology Centre, Griffith University Nathan Qld 4111 Australia
| | - Bun Chan
- Graduate School of Engineering, Nagasaki University Bunkyo 1-14 Nagasaki 852-8521 Japan
| | - Stephen G Dale
- Queensland Micro- and Nanotechnology Centre, Griffith University Nathan Qld 4111 Australia
- Institute of Functional Intelligent Materials, National University of Singapore 4 Science Drive 2 Singapore 117544
| | - Stefan Vuckovic
- Department of Chemistry, University of Fribourg Fribourg Switzerland
| |
|
5
|
Avagliano D, Skreta M, Arellano-Rubach S, Aspuru-Guzik A. DELFI: a computer oracle for recommending density functionals for excited states calculations. Chem Sci 2024; 15:4489-4503. [PMID: 38516092 PMCID: PMC10952086 DOI: 10.1039/d3sc06440a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 02/05/2024] [Indexed: 03/23/2024] Open
Abstract
Density functional theory (DFT) is the workhorse of computational quantum chemistry. One of its main limitations is that choosing the right functional is a non-trivial task left for human experts. The choice is particularly hard for excited state calculations when using its time-dependent formulation (TD-DFT). This is due to the approximations of the method, but also because the photophysical properties of a molecule are defined by a manifold of states that all need to be properly described. This includes not only the relative energy of the states, but also capturing the correct character, order, and intensity of the transitions. In this work, we developed a neural network to recommend functionals to be used on molecules for TD-DFT calculations, by simultaneously considering all these properties for a manifold of states. This was possible by developing a scoring system to define the accuracy of an excited state's calculation against a higher-accuracy reference. The scoring system is generalizable to any level of theory; we here applied it to evaluate the performance of common functionals of different rungs against a higher accuracy method on a large set of organic molecules. The results are collected in a database that we released and made open, providing four million data points to the community for future applications. The scoring system assigns a value between zero and one hundred to each functional for each molecule, transforming the complicated task of learning photophysical properties into a simpler regression task. We used the dataset to train a graph attention neural network to predict the scores for unseen molecules. We call this oracle DELFI (Data-driven EvaLuation of Functionals by Inference), which can be used to quickly screen and predict the ranking of functionals to calculate the optical properties of organic molecules. 
We validated DELFI in two in silico experiments: choosing a common functional for a series of spiropyran-merocyanine isomers and a unique functional to screen a large dataset of over 50 000 organic photovoltaic molecules, for which an extensive benchmark would be unfeasible. A corresponding web application allows DELFI to be easily run and the results to be analyzed, alleviating the hurdle of choosing the right functional for TD-DFT calculations.
Affiliation(s)
- Davide Avagliano
- Department of Chemistry, University of Toronto 80 St. George Street Toronto ON M5S 3H6 Canada
- Department of Computer Science, University of Toronto 40 St. George Street Toronto ON M5S 2E4 Canada
| | - Marta Skreta
- Department of Computer Science, University of Toronto 40 St. George Street Toronto ON M5S 2E4 Canada
- Vector Institute for Artificial Intelligence 661 University Ave. Suite 710 ON M5G 1M1 Toronto Canada
| | | | - Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto 80 St. George Street Toronto ON M5S 3H6 Canada
- Department of Computer Science, University of Toronto 40 St. George Street Toronto ON M5S 2E4 Canada
- Vector Institute for Artificial Intelligence 661 University Ave. Suite 710 ON M5G 1M1 Toronto Canada
- Department of Materials Science & Engineering, University of Toronto 184 College St Toronto M5S 3E4 Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto 200 College St ON M5S 3E5 Toronto Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR) 66118 University Ave. M5G 1M1 Toronto Canada
- Acceleration Consortium 80 St George St M5S 3H6 Toronto Canada
| |
|
6
|
Jose A, Devijver E, Jakse N, Poloni R. Informative Training Data for Efficient Property Prediction in Metal-Organic Frameworks by Active Learning. J Am Chem Soc 2024; 146:6134-6144. [PMID: 38404041 DOI: 10.1021/jacs.3c13687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
In recent data-driven approaches to material discovery, scenarios where target quantities are expensive to compute and measure are often overlooked. In such cases, it becomes imperative to construct a training set that includes the most diverse, representative, and informative samples. Here, a novel regression tree-based active learning algorithm is employed for such a purpose. It is applied to predict the band gap and adsorption properties of metal-organic frameworks (MOFs), a novel class of materials that results from the virtually infinite combinations of their building units. Simpler and low dimensional descriptors, such as those based on stoichiometric and geometric properties, are used to compute the feature space for this model owing to their ability to better represent MOFs in the low data regime. The partitions given by a regression tree constructed on the labeled part of the data set are used to select new samples to be added to the training set, thereby limiting its size while maximizing the prediction quality. Tests on the QMOF, hMOF, and dMOF data sets reveal that our method constructs small training data sets to learn regression models that predict the target properties more efficiently than existing active learning approaches, and with lower variance. Specifically, our active learning approach is highly beneficial when labels are unevenly distributed in the descriptor space and when the label distribution is imbalanced, which is often the case for real world data. The regions defined by the tree help in revealing patterns in the data, thereby offering a unique tool to efficiently analyze complex structure-property relationships in materials and accelerate materials discovery.
Affiliation(s)
- Ashna Jose
- SIMaP, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
| | - Emilie Devijver
- LiG, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
| | - Noel Jakse
- SIMaP, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
| | - Roberta Poloni
- SIMaP, Grenoble-INP, CNRS, University of Grenoble Alpes, Grenoble 38042, France
| |
|
7
|
Zhang Z, Li J, Wang YG. Modeling Interfacial Dynamics on Single Atom Electrocatalysts: Explicit Solvation and Potential Dependence. Acc Chem Res 2024; 57:198-207. [PMID: 38166366 DOI: 10.1021/acs.accounts.3c00589] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 01/04/2024]
Abstract
Conspectus: Single atom electrocatalysts, with noble metal-free composition, maximal atom efficiency, and exceptional reactivity toward various energy and environmental applications, have become a research hot spot in the recent decade. Their simplicity and the isolated nature of the atomic structure of their active site have also made them an ideal model catalyst system for studying reaction mechanisms and activity trends. However, the state of the single atom active sites during electrochemical reactions may not be as simple as is usually assumed. To the contrary, the single atom electrocatalysts have been reported to be under greater influence from interfacial dynamics, with solvent and electrolyte ions perpetually interacting with the electrified active center under an applied electrode potential. These complexities render the activity trends and reaction mechanisms derived from simplistic models dubious. In this Account, with a few popular single atom electrocatalysis systems, we show how the change in electrochemical potential induces nontrivial variation in the free energy profile of elemental electrochemical reaction steps, demonstrate how the active centers with different electronic structure features can induce different solvation structures at the interface even for the same reaction intermediate of the simplest electrochemical reaction, and discuss the implication of the complexities on the kinetics and thermodynamics of the reaction system to better address the activity and selectivity trends. We also venture into more intriguing interfacial phenomena, such as alternative reaction pathways and intermediates that are favored and stabilized by solvation and polarization effects, long-range interfacial dynamics across the region far beyond the contact layer, and the dynamic activation or deactivation of single atom sites under operation conditions. 
We show the necessity of including realistic aspects (explicit solvent, electrolyte, and electrode potential) into the model to correctly capture the physics and chemistry at the electrochemical interface and to understand the reaction mechanisms and reactivity trends. We also demonstrate how the popular simplistic design principles fail and how they can be revised by including the kinetics and interfacial factors in the model. All of these rich dynamics and chemistry would remain hidden or overlooked otherwise. We believe that the complexity at an electrochemical interface is not a curse but a blessing in that it enables deeper understanding and finer control of the potential-dependent free energy landscape of electrochemical reactions, which opens up new dimensions for further design and optimization of single atom electrocatalysts and beyond. Limitations of current methods and challenges faced by the theoretical and experimental communities are discussed, along with the possible solutions awaiting development in the future.
Affiliation(s)
- Zisheng Zhang
- Department of Chemistry and Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States
| | - Jun Li
- Department of Chemistry and Key Laboratory of Organic Optoelectronics & Molecular Engineering of Ministry of Education, Tsinghua University, Beijing 100084, China
| | | |
|
8
|
Duan C, Du Y, Jia H, Kulik HJ. Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model. Nat Comput Sci 2023; 3:1045-1055. [PMID: 38177724 DOI: 10.1038/s43588-023-00563-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/12/2023] [Accepted: 11/03/2023] [Indexed: 01/06/2024]
Abstract
Transition state search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D transition state structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures (reactant, transition state and product) in an elementary reaction. Provided reactant and product, this model generates a transition state structure in seconds instead of hours, which is typically required when performing quantum-chemistry-based optimizations. The generated transition state structures achieve a median of 0.08 Å root mean square deviation compared to the true transition state. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction barrier estimation (2.6 kcal mol⁻¹) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision usefulness for our approach in constructing large reaction networks with unknown mechanisms.
Affiliation(s)
- Chenru Duan
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US.
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US.
| | - Yuanqi Du
- Department of Computer Science, Cornell University, Ithaca, NY, US
| | - Haojun Jia
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
| | - Heather J Kulik
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, US
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, US
| |
|
9
|
Casetti N, Alfonso-Ramos JE, Coley CW, Stuyver T. Combining Molecular Quantum Mechanical Modeling and Machine Learning for Accelerated Reaction Screening and Discovery. Chemistry 2023; 29:e202301957. [PMID: 37526059 DOI: 10.1002/chem.202301957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/30/2023] [Accepted: 07/31/2023] [Indexed: 08/02/2023]
Abstract
Molecular quantum mechanical modeling, accelerated by machine learning, has opened the door to high-throughput in silico screening campaigns of complex properties, such as the activation energies of chemical reactions and absorption/emission spectra of materials and molecules. Here, we present an overview of the main principles, concepts, and design considerations involved in such hybrid computational quantum chemistry/machine learning screening workflows, with a special emphasis on some recent examples of their successful application. We end with a brief outlook of further advances that will benefit the field.
Affiliation(s)
- Nicholas Casetti
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Javier E Alfonso-Ramos
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
| | - Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts, 02139, United States
| | - Thijs Stuyver
- Ecole Nationale Supérieure de Chimie de Paris, Université PSL, CNRS, Institute of Chemistry for Life and Health Sciences, 75005, Paris, France
| |
|
10
|
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues, such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
11
|
Cytter Y, Nandy A, Duan C, Kulik HJ. Insights into the deviation from piecewise linearity in transition metal complexes from supervised machine learning models. Phys Chem Chem Phys 2023; 25:8103-8116. [PMID: 36876903 DOI: 10.1039/d3cp00258f] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/25/2023]
Abstract
Virtual high-throughput screening (VHTS) and machine learning (ML) with density functional theory (DFT) suffer from inaccuracies from the underlying density functional approximation (DFA). Many of these inaccuracies can be traced to the lack of derivative discontinuity that leads to a curvature in the energy with electron addition or removal. Over a dataset of nearly one thousand transition metal complexes typical of VHTS applications, we computed and analyzed the average curvature (i.e., deviation from piecewise linearity) for 23 density functional approximations spanning multiple rungs of "Jacob's ladder". While we observe the expected dependence of the curvatures on Hartree-Fock exchange, we note limited correlation of curvature values between different rungs of "Jacob's ladder". We train ML models (i.e., artificial neural networks or ANNs) to predict the curvature and the associated frontier orbital energies for each of these 23 functionals and then interpret differences in curvature among the different DFAs through analysis of the ML models. Notably, we observe spin to play a much more important role in determining the curvature of range-separated and double hybrids in comparison to semi-local functionals, explaining why curvature values are weakly correlated between these and other families of functionals. Over a space of 187.2k hypothetical compounds, we use our ANNs to pinpoint DFAs for which representative transition metal complexes have near-zero curvature with low uncertainty, demonstrating an approach to accelerate screening of complexes with targeted optical gaps.
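The notion of deviation from piecewise linearity used in this abstract can be illustrated with a short Python sketch: the energy at a fractional electron count N + x is compared with the straight line connecting the integer end points E(N) and E(N+1). The quadratic toy energy and its parameters are hypothetical and stand in for a real fractional-occupation DFT calculation; this is not the authors' workflow for computing average curvature.

```python
def piecewise_linear_energy(e_n, e_n_plus_1, x):
    """Straight-line interpolation between integer electron counts (0 <= x <= 1)."""
    return (1.0 - x) * e_n + x * e_n_plus_1


def deviation_from_linearity(energy, x):
    """Energy at fractional charge x minus the piecewise-linear reference."""
    return energy(x) - piecewise_linear_energy(energy(0.0), energy(1.0), x)


def toy_energy(x, slope=-3.0, curvature=2.0):
    """Hypothetical model: E(x) = slope * x + (curvature / 2) * x**2, arbitrary units."""
    return slope * x + 0.5 * curvature * x * x


# A positive (convex) curvature places E(x) below the linear reference.
mid_deviation = deviation_from_linearity(toy_energy, 0.5)
```

For this quadratic model the deviation equals -curvature * x * (1 - x) / 2, so zero curvature recovers exact piecewise linearity.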
Affiliation(s)
- Yael Cytter
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Aditya Nandy
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Chenru Duan
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Heather J Kulik
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
12
|
Vuckovic S. Using AI to navigate through the DFA zoo. Nat Comput Sci 2023; 3:6-7. [PMID: 38177964 DOI: 10.1038/s43588-022-00393-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
Affiliation(s)
- Stefan Vuckovic
- Physical and Theoretical Chemistry, University of Wuppertal, Wuppertal, Germany.
| |
|