1
Manrique-Castano D, Bhaskar D, ElAli A. Dissecting glial scar formation by spatial point pattern and topological data analysis. Sci Rep 2024; 14:19035. [PMID: 39152163 PMCID: PMC11329771 DOI: 10.1038/s41598-024-69426-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 08/05/2024] [Indexed: 08/19/2024]
Abstract
Glial scar formation represents a fundamental response to central nervous system (CNS) injuries. It is mainly characterized by a well-defined spatial rearrangement of reactive astrocytes and microglia. The mechanisms underlying glial scar formation have been extensively studied, yet quantitative descriptors of the spatial arrangement of reactive glial cells remain limited. Here, we present a novel approach using point pattern analysis (PPA) and topological data analysis (TDA) to quantify spatial patterns of reactive glial cells after experimental ischemic stroke in mice. We provide open and reproducible tools using R and Julia to quantify spatial intensity, cell covariance and conditional distribution, cell-to-cell interactions, and short/long-scale arrangement, which collectively disentangle the arrangement patterns of the glial scar. This approach unravels a substantial divergence in the distribution of GFAP+ and IBA1+ cells after injury that conventional analysis methods cannot fully characterize. PPA and TDA are valuable tools for studying the complex spatial arrangement of reactive glia and other nervous cells following CNS injuries and have potential applications for evaluating glial-targeted restorative therapies.
Affiliation(s)
- Daniel Manrique-Castano
- Neuroscience Axis, Research Center of CHU de Québec-Université Laval, Quebec City, QC, Canada.
- Department of Psychiatry and Neuroscience, Faculty of Medicine, Université Laval, Quebec City, QC, Canada.
- Ayman ElAli
- Neuroscience Axis, Research Center of CHU de Québec-Université Laval, Quebec City, QC, Canada.
- Department of Psychiatry and Neuroscience, Faculty of Medicine, Université Laval, Quebec City, QC, Canada.
2
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. J Chem Inf Model 2024; 64:5242-5252. [PMID: 38912752 DOI: 10.1021/acs.jcim.4c00552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here, we present a persistent homology (PH)-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information on lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of dipalmitoyl-phosphatidylcholine membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by PH filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistent homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g., sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
Affiliation(s)
- Andres S Arango
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hyun Park
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Emad Tajkhorshid
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
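The sphere-growing filtration described in the abstract of this entry can be illustrated in its simplest form. The sketch below is not the MembTDA code: it is a minimal, assumed example that tracks only connected components (H0) of a toy point cloud, using pairwise distances as the scale at which growing spheres start to intersect. The function name `h0_persistence` and the coordinates in `toy_points` are hypothetical.

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (H0).

    Every point is born at scale 0. A component dies at the pairwise
    distance at which it merges into another component. The last
    surviving component never dies, so its death is math.inf.
    """
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the component representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by length: the order in which spheres
    # grown around the points would begin to intersect.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )

    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j       # two components merge here
            bars.append((0.0, length))    # so one H0 bar ends at this scale
    bars.append((0.0, math.inf))          # the component that never dies
    return bars


# Hypothetical toy coordinates (not lipid data): two tight clusters far apart.
toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
bars = h0_persistence(toy_points)
```

Short bars correspond to points that merge quickly within a cluster, while the single long finite bar reflects the gap between the two clusters. A full persistent homology fingerprint would also track loops and voids (H1, H2), which this sketch leaves out.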
3
Zhang Y, Xing S, Wei L, Shi T. Utilizing Machine Learning Models for Predicting Diamagnetic Susceptibility of Organic Compounds. ACS OMEGA 2024; 9:14368-14374. [PMID: 38560008 PMCID: PMC10976355 DOI: 10.1021/acsomega.3c10469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 02/03/2024] [Accepted: 03/08/2024] [Indexed: 04/04/2024]
Abstract
This research is centered on examining the magnetic characteristics of organic molecules, with a particular emphasis on magnetic susceptibility, an essential physical property that provides insights into molecular microstructures and reaction processes. Traditional approaches for determining and calculating magnetic susceptibility are generally inefficient and demanding. To overcome these challenges, we have introduced a novel approach using quantitative structure-property relationships, which efficiently elucidates the relationship between the structural properties of molecules and their molar magnetic susceptibility. In our study, we utilized a comprehensive database comprising molar magnetic susceptibility data for 382 organic molecules. We applied six distinct molecular fingerprinting methods (RDKit Fingerprint, Morgan Fingerprint, MACCS Keys, atom pair fingerprint, Avalon Fingerprint, and topology fingerprint) as feature inputs for training seven different machine learning models, namely random forest, AdaBoost, gradient boosting, extra trees, elastic net, support vector machine, and multilayer perceptron (MLP). Our findings revealed that the integration of the atom pair fingerprint with the MLP model yielded R² values of 0.88 and 0.90 in the validation and test sets, respectively, showcasing exceptional predictive accuracy. This advancement significantly expedites research and development processes related to the magnetic properties of organic molecules. Moreover, this predictive method is expected to considerably reduce both experimental and computational expenses while maintaining high accuracy. This development represents a breakthrough in the rapid screening and prediction of properties for various compounds, offering a new and efficient pathway in this field of study.
Affiliation(s)
- Yining Zhang
- Xinjiang Laboratory of Phase Transitions and Microstructures in Condensed Matter Physics, College of Physical Science and Technology, Yili Normal University, Yining 835000, China
- Sijie Xing
- Alibaba Cloud Big Data Application College, Zhuhai College of Science and Technology, Zhuhai 519041, China
- Lai Wei
- Xinjiang Laboratory of Phase Transitions and Microstructures in Condensed Matter Physics, College of Physical Science and Technology, Yili Normal University, Yining 835000, China
- Tongfei Shi
- Xinjiang Laboratory of Phase Transitions and Microstructures in Condensed Matter Physics, College of Physical Science and Technology, Yili Normal University, Yining 835000, China
- School of Chemical Engineering and Light Industry, Guangdong University of Technology, Guangzhou 510006, People’s Republic of China
4
Wang W, Yan Z, Wang L, Xu S. Topological Characteristics of the Pore Network in the Tight Sandstone Using Persistent Homology. ACS OMEGA 2024; 9:11589-11596. [PMID: 38496948 PMCID: PMC10938304 DOI: 10.1021/acsomega.3c08847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/12/2024] [Accepted: 02/13/2024] [Indexed: 03/19/2024]
Abstract
Tight sandstone reservoirs have become important areas for unconventional reservoir development, and their pore network is a key feature for identifying tight sandstone, which affects fluid migration path and reservoir development efficiency. However, the connectivity characteristics of the pore network at different scales have remained unclear owing to the numerous pores and uneven pore shape. Here, using pore size distributions from many hundreds of tight sandstone samples and subsequent topological data analysis, we construct the topological structure of the pore network in the Yanchang Formation tight sandstone of the Ordos Basin in China and visualize the topological characteristics of the pore network with distances. We show that there are three connected groups within the pore structure of the tight sandstone. The topology of the pore network resides on a trident ring manifold, suggesting that the pore network in the tight sandstone encompasses three obvious dominant connection paths. One prominent bar on the H0 dimension in the barcode indicates a two-point connection from nanoscale to microscale in the pore network. Three prominent bars with varying durations on the H1 dimension indicate the presence of three separate multipoint connections within a limited extent in the pore network. Connectivity of combined pores is good and controlled by the topological structure of the pore network. This demonstration of pore connections on a trident ring manifold provides a population-level visualization of the pore network in the tight sandstone.
Affiliation(s)
- Wei Wang
- College of Chemistry and Chemical Engineering, Yulin University, Yulin 719000, Shaanxi, P. R. China
- Zhiyong Yan
- No. 1 Oil Production Plant, Petrochina Changqing Oilfield Company, Yan’an 716000, Shaanxi, P. R. China
- Lina Wang
- No. 1 Oil Production Plant, Petrochina Changqing Oilfield Company, Yan’an 716000, Shaanxi, P. R. China
- Shuang Xu
- No. 1 Oil Production Plant, Petrochina Changqing Oilfield Company, Yan’an 716000, Shaanxi, P. R. China
5
Ju CW, Shen Y, French EJ, Yi J, Bi H, Tian A, Lin Z. Accurate Electronic and Optical Properties of Organic Doublet Radicals Using Machine Learned Range-Separated Functionals. J Phys Chem A 2024. [PMID: 38382058 DOI: 10.1021/acs.jpca.3c07437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Luminescent organic semiconducting doublet-spin radicals are unique and emergent optical materials because their fluorescent quantum yields (Φfl) are not compromised by the spin-flipping intersystem crossing (ISC) into a dark high-spin state. The multiconfigurational nature of these radicals challenges their electronic structure calculations in the framework of single-reference density functional theory (DFT) and introduces room for method improvement. In the present study, we extended our earlier development of ML-ωPBE [J. Phys. Chem. Lett., 2021, 12, 9516-9524], a range-separated hybrid (RSH) exchange-correlation (XC) functional constructed using the stacked ensemble machine learning (SEML) algorithm, from closed-shell organic semiconducting molecules to doublet-spin organic semiconducting radicals. We assessed its performance for a new test set of 64 doublet-spin radicals from five categories while placing all previously compiled 3926 closed-shell molecules in the new training set. Interestingly, ML-ωPBE agrees with the nonempirical OT-ωPBE functional regarding the prediction of the molecule-dependent range-separation parameter (ω), with a small mean absolute error (MAE) of 0.0197 a₀⁻¹, but reduces the computational cost by 2.46 orders of magnitude. This result demonstrates an outstanding domain adaptation capacity of ML-ωPBE for diverse organic semiconducting species. To further assess the predictive power of ML-ωPBE in experimental observables, we also applied it to evaluate absorption and fluorescence energies (Eabs and Efl) using linear-response time-dependent DFT (TDDFT), and we compared its behavior with nine popular XC functionals. For most radicals, ML-ωPBE reproduces experimental measurements of Eabs and Efl with small MAEs of 0.299 and 0.254 eV, only marginally different from those of OT-ωPBE. Our work illustrates a successful extension of the SEML framework from closed-shell molecules to doublet-spin radicals and will open an avenue for calculating optical properties for organic semiconductors using single-reference TDDFT.
Affiliation(s)
- Cheng-Wei Ju
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Pritzker School of Molecular Engineering, The University of Chicago, Chicago, Illinois 60637, United States
- Yili Shen
- Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Ethan J French
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Charlestown, Massachusetts 02129, United States
- Jun Yi
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Department of Chemistry, Wake Forest University, Winston-Salem, North Carolina 27109, United States
- Hongshan Bi
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Aaron Tian
- Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003, United States
- Zhou Lin
- Department of Chemistry, University of Massachusetts, Amherst, Massachusetts 01003, United States
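The two summary quantities quoted in the abstract of this entry, a mean absolute error and a cost saving stated in orders of magnitude, can be made concrete with a short worked example. This is only an illustrative sketch: the function names are hypothetical and the ω values in the lists are invented placeholders, not data from the paper. Only the figure 2.46 is taken from the abstract.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def ratio_from_orders_of_magnitude(orders):
    """Convert a base-10 'orders of magnitude' figure into a plain ratio."""
    return 10.0 ** orders


# Invented placeholder values for a range-separation parameter (NOT paper data).
omega_predicted = [0.20, 0.25, 0.18]
omega_reference = [0.22, 0.24, 0.15]
mae = mean_absolute_error(omega_predicted, omega_reference)

# The abstract reports a cost reduction of 2.46 orders of magnitude.
cost_ratio = ratio_from_orders_of_magnitude(2.46)
```

Here `cost_ratio` evaluates to roughly 288, so a saving of 2.46 orders of magnitude means the cheaper calculation costs about 1/288 of the reference one.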
6
Kirkland JK, Kumawat J, Shaban Tameh M, Tolman T, Lambert AC, Lief GR, Yang Q, Ess DH. Machine Learning Models for Predicting Zirconocene Properties and Barriers. J Chem Inf Model 2024; 64:775-784. [PMID: 38259142 DOI: 10.1021/acs.jcim.3c01575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Zr metallocenes have significant potential to be highly tunable polyethylene catalysts through modification of the aromatic ligand framework. Here we report the development of multiple machine learning models using a large library (>700 systems) of DFT-calculated zirconocene properties and barriers for ethylene polymerization. We show that very accurate machine learning models are possible for HOMO-LUMO gaps of precatalysts but the performance significantly depends on the machine learning algorithm and type of featurization, such as fingerprints, Coulomb matrices, smooth overlap of atomic positions, or persistence images. Surprisingly, the description of the bonding hapticity, the number of direct connections between Zr and the ligand aromatic carbons, only has a moderate influence on the performance of most models. Despite robust models for HOMO-LUMO gaps, these types of machine learning models based on structure connectivity type features perform poorly in predicting ethylene migratory insertion barrier heights. Therefore, we developed several relatively robust and accurate machine learning models for barrier heights that are based on quantum-chemical descriptors (QCDs). The quantitative accuracy of these models depends on which potential energy surface structure the QCDs were harvested from. From this, a Hammett-type principle naturally emerged: QCDs from the π-coordination complexes provide much better descriptions of the transition states than QCDs from other potential energy surface structures. Feature importance analysis of the QCDs provides several fundamental principles that influence zirconocene catalyst reactivity.
Affiliation(s)
- Justin K Kirkland
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Jugal Kumawat
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Maliheh Shaban Tameh
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Tyson Tolman
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Allison C Lambert
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
- Graham R Lief
- Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Qing Yang
- Research and Technology, Chevron Phillips Chemical Company, Highways 60 & 123, Bartlesville, Oklahoma 74003, United States
- Daniel H Ess
- Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84604, United States
7
Gao H, Zhong S, Dangayach R, Chen Y. Understanding and Designing a High-Performance Ultrafiltration Membrane Using Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17831-17840. [PMID: 36790106 PMCID: PMC10666290 DOI: 10.1021/acs.est.2c05404] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/26/2022] [Revised: 02/04/2023] [Accepted: 02/06/2023] [Indexed: 06/18/2023]
Abstract
Ultrafiltration (UF) as one of the mainstream membrane-based technologies has been widely used in water and wastewater treatment. Increasing demand for clean and safe water requires the rational design of UF membranes with antifouling potential, while maintaining high water permeability and removal efficiency. This work employed a machine learning (ML) method to establish and understand the correlation of five membrane performance indices as well as three major performance-determining membrane properties with membrane fabrication conditions. The loading of additives, specifically nanomaterials (A_wt %), at loading amounts of >1.0 wt % was found to be the most significant feature affecting all of the membrane performance indices. The polymer content (P_wt %), molecular weight of the pore maker (M_Da), and pore maker content (M_wt %) also made considerable contributions to predicting membrane performance. Notably, M_Da was more important than M_wt % for predicting membrane performance. The feature analysis of ML models in terms of membrane properties (i.e., mean pore size, overall porosity, and contact angle) provided an unequivocal explanation of the effects of fabrication conditions on membrane performance. Our approach can provide practical aid in guiding the design of fit-for-purpose separation membranes through data-driven virtual experiments.
Affiliation(s)
- Haiping Gao
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Shandong Provincial Key Laboratory of Water Pollution Control and Resource Reuse, School of Environmental Science and Engineering, Shandong University, Qingdao, Shandong 266237, China
- Shifa Zhong
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- School of Ecological and Environmental Sciences, East China Normal University, Shanghai 200241, China
- Raghav Dangayach
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Yongsheng Chen
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
8
Chen J, Woldring DR, Huang F, Huang X, Wei GW. Topological deep learning based deep mutational scanning. Comput Biol Med 2023; 164:107258. [PMID: 37506452 PMCID: PMC10528359 DOI: 10.1016/j.compbiomed.2023.107258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 06/28/2023] [Accepted: 07/08/2023] [Indexed: 07/30/2023]
Abstract
High-throughput deep mutational scanning (DMS) experiments have significantly impacted protein engineering, drug discovery, immunology, cancer biology, and evolutionary biology by enabling the systematic understanding of protein functions. However, the mutational space associated with proteins is astronomically large, making it overwhelming for current experimental capabilities. Therefore, alternative methods for DMS are imperative. We propose a topological deep learning (TDL) paradigm to facilitate in silico DMS. We utilize a new topological data analysis (TDA) technique based on the persistent spectral theory, also known as persistent Laplacian, to capture both topological invariants and the homotopic shape evolution of data. To validate our TDL-DMS model, we use SARS-CoV-2 datasets and show excellent accuracy and reliability for binding interface mutations. This finding is significant for SARS-CoV-2 variant forecasting and designing effective antibodies and vaccines. Our proposed model is expected to have a significant impact on drug discovery, vaccine design, precision medicine, and protein engineering.
Affiliation(s)
- Jiahui Chen
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Daniel R Woldring
- Department of Chemical Engineering, Michigan State University, East Lansing, MI 48824, USA
- Faqing Huang
- Department of Chemistry and Biochemistry, University of Southern Mississippi, Hattiesburg, MS 39406, USA
- Xuefei Huang
- Department of Chemistry, Michigan State University, MI 48824, USA; Department of Biomedical Engineering, Michigan State University, East Lansing, MI 48824, USA; The Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.
9
Bailey T, Jackson A, Berbece RA, Wu K, Hondow N, Martin E. Gradient Boosted Machine Learning Model to Predict H2, CH4, and CO2 Uptake in Metal-Organic Frameworks Using Experimental Data. J Chem Inf Model 2023; 63:4545-4551. [PMID: 37463276 PMCID: PMC10428209 DOI: 10.1021/acs.jcim.3c00135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Indexed: 07/20/2023]
Abstract
Predictive screening of metal-organic framework (MOF) materials for their gas uptake properties has been previously limited by using data from a range of simulated sources, meaning the final predictions are dependent on the performance of these original models. In this work, experimental gas uptake data has been used to create a Gradient Boosted Tree model for the prediction of H2, CH4, and CO2 uptake over a range of temperatures and pressures in MOF materials. The descriptors used in this database were obtained from the literature, with no computational modeling needed. This model was repeated 10 times, showing an average R² of 0.86 and a mean absolute error (MAE) of ±2.88 wt % across the runs. This model will provide gas uptake predictions for a range of gases, temperatures, and pressures as a one-stop solution, with the data provided being based on previous experimental observations in the literature, rather than simulations, which may differ from their real-world results. The objective of this work is to create a machine learning model for the inference of gas uptake in MOFs. The basis of model development is experimental as opposed to simulated data to realize its applications by practitioners. The real-world nature of this research materializes in a focus on the application of algorithms as opposed to the detailed assessment of the algorithms.
Affiliation(s)
- Tom Bailey
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, U.K.
- Adam Jackson
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, U.K.
- Kejun Wu
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, U.K.
- Zhejiang Provincial Key Laboratory of Advanced Chemical Engineering Manufacture Technology, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Nicole Hondow
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, U.K.
- Elaine Martin
- School of Chemical and Process Engineering, University of Leeds, Leeds LS2 9JT, U.K.
10
Angelis D, Sofos F, Karakasidis TE. Artificial Intelligence in Physical Sciences: Symbolic Regression Trends and Perspectives. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2023; 30:1-21. [PMID: 37359747 PMCID: PMC10113133 DOI: 10.1007/s11831-023-09922-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/04/2022] [Accepted: 03/27/2023] [Indexed: 06/28/2023]
Abstract
Symbolic regression (SR) is a machine learning regression method, built on genetic programming principles, that integrates techniques and processes from heterogeneous scientific fields and is capable of providing analytical equations purely from data. This remarkable characteristic diminishes the need to incorporate prior knowledge about the investigated system. SR can uncover deep relations and clarify ambiguous ones, yielding expressions that are generalizable, applicable, and explainable across most scientific, technological, economic, and social domains. In this review, the current state of the art is documented, technical and physical characteristics of SR are presented, the available programming techniques are investigated, fields of application are explored, and future perspectives are discussed. Supplementary Information: The online version contains supplementary material available at 10.1007/s11831-023-09922-z.
Affiliation(s)
- Dimitrios Angelis
- Condensed Matter Physics Laboratory, Department of Physics, University of Thessaly, Lamia, 35100 Greece
- Filippos Sofos
- Condensed Matter Physics Laboratory, Department of Physics, University of Thessaly, Lamia, 35100 Greece
- Theodoros E. Karakasidis
- Condensed Matter Physics Laboratory, Department of Physics, University of Thessaly, Lamia, 35100 Greece
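The core idea in the abstract of this entry, recovering an analytical equation purely from data, can be sketched in a few lines. Note the substitution: the abstract describes SR as based on genetic programming, whereas the example below uses plain exhaustive search over a tiny, hand-picked expression grammar, which is only feasible at toy scale. All names (`LEAVES`, `BINARY_OPS`, `candidate_expressions`, `best_fit`) and the data are assumptions made for illustration.

```python
import itertools
import operator

# Building blocks: simple functions of x, and binary operators to combine them.
LEAVES = {
    "x": lambda x: x,
    "x*x": lambda x: x * x,
    "1": lambda x: 1.0,
}
BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def candidate_expressions():
    """Yield (formula_string, callable) for each leaf and each 'leaf op leaf'."""
    for name, fn in LEAVES.items():
        yield name, fn
    for (ln, lf), (on, of), (rn, rf) in itertools.product(
        LEAVES.items(), BINARY_OPS.items(), LEAVES.items()
    ):
        # Default arguments freeze the current functions inside the lambda.
        yield f"({ln} {on} {rn})", (lambda x, lf=lf, of=of, rf=rf: of(lf(x), rf(x)))


def best_fit(xs, ys):
    """Return the candidate with the smallest sum of squared errors."""
    def sse(fn):
        return sum((fn(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(candidate_expressions(), key=lambda pair: sse(pair[1]))


# Synthetic data generated from y = x^2 + x (an assumed ground truth).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x + x for x in xs]
formula, model = best_fit(xs, ys)
```

On this synthetic data the search returns the string `(x + x*x)`, an explicit formula rather than a black-box predictor. Genetic programming replaces the exhaustive loop with mutation and crossover over expression trees so that much larger grammars can be explored.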
11
Abstract
Path homology, proposed by S.-T. Yau and his co-workers, provides a new mathematical model for directed graphs and networks. Persistent path homology (PPH) extends path homology with a filtration to deal with asymmetric structures. However, PPH is constrained to purely topological persistence and cannot track the homotopic shape evolution of data during filtration. To overcome this limitation, the persistent path Laplacian (PPL) is introduced to capture the shape evolution of data. PPL's harmonic spectra fully recover PPH's topological persistence, and its non-harmonic spectra reveal the homotopic shape evolution of data during filtration.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
12
SuHAN: Substructural hierarchical attention network for molecular representation. J Mol Graph Model 2023; 119:108401. [PMID: 36584590 DOI: 10.1016/j.jmgm.2022.108401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/16/2022] [Accepted: 12/23/2022] [Indexed: 12/26/2022]
Abstract
Recently, molecular representation and property exploration combined with neural networks have come to play a critical role in drug design and discovery. However, previous research in molecular representation relies heavily on manual feature extraction based on biological experiments, which may introduce noise into the molecular information and is costly in time and money. In this paper, a novel method named Substructural Hierarchical Attention Network (SuHAN) is proposed to discover inherent characteristics of molecules for representation learning. Specifically, SuHAN is composed of two cascaded layers: an atom-level layer and a substructure-level layer. A molecule in SMILES format is divided into several substructural fragments by predefined partition rules, and the fragments are then fed into the atom-level layer and the substructure-level layer successively to obtain feature representations from two perspectives: the atomic view and the substructural view. In this way, prominent structural features that may be omitted in global extraction are captured from a fine-grained viewpoint and fused to reconstruct a representative pattern in an overall view. Experiments on biophysics and physiology datasets demonstrate that our model is competitive, with a significant improvement in both accuracy and stability. We confirmed that the substructural segments and progressive hierarchical networks lead to an effective molecular representation for downstream tasks. These results provide a novel perspective on reconstructing the overall pattern from locally prominent structures.
13
Chen D, Liu J, Wu J, Wei GW, Pan F, Yau ST. Path Topology in Molecular and Materials Sciences. J Phys Chem Lett 2023; 14:954-964. [PMID: 36688834 PMCID: PMC10799224 DOI: 10.1021/acs.jpclett.2c03706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
The structures of molecules and materials determine their functions. Understanding the structure and function relationship is the holy grail of molecular and materials sciences. However, the rational design of molecules and materials with desirable functions remains a grand challenge despite decades of efforts. A major obstacle is the lack of an intrinsic mathematical characteristic that attributes to a specific function. This work introduces persistent path topology (PPT) to effectively characterize directed networks extracted from functional units, such as constitutional isomers, cis-trans isomers, chiral molecules, Jahn-Teller isomerism, and high-entropy alloy catalysts. Path homology (PH) theory is utilized to decipher the role of mirror-symmetric sublattices that hinder the formation of periodic unit cells in amorphous solids. Topological perturbation analysis (TPA) is proposed to reveal the critical target in the blood coagulation system. The proposed topological tools can be directly applied to systems biology, omics sciences, topological materials, and machine learning study of molecular and materials sciences.
Affiliation(s)
- Dong Chen
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jian Liu
- School of Mathematical Sciences, Hebei Normal University, Hebei 050024, China
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Feng Pan
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, China
- Shing-Tung Yau
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China
14
Zhang W, Huang W, Tan J, Guo Q, Wu B. Heterogeneous catalysis mediated by light, electricity and enzyme via machine learning: Paradigms, applications and prospects. CHEMOSPHERE 2022; 308:136447. [PMID: 36116627 DOI: 10.1016/j.chemosphere.2022.136447] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2022] [Revised: 09/08/2022] [Accepted: 09/11/2022] [Indexed: 06/15/2023]
Abstract
The energy crisis and environmental pollution have become bottlenecks of sustainable human development. Therefore, there is an urgent need to develop new catalysts for energy production and environmental remediation. Owing to the high cost of blind screening and limited computing resources, traditional experimental methods and theoretical calculations struggle to meet these requirements. In the past decades, computer science has made great progress, especially in the field of machine learning (ML). As a new research paradigm, ML greatly accelerates theoretical calculation methods such as first-principles calculations and molecular dynamics, and helps establish the physical picture of heterogeneous catalytic processes for energy and environment. This review first summarizes the general research paradigms of ML in the discovery of catalysts. Then, the latest progress of ML in light-, electricity- and enzyme-mediated heterogeneous catalysis is reviewed from the perspective of catalytic performance, operating conditions and reaction mechanism. General guidelines of ML for heterogeneous catalysis are proposed. Finally, the existing problems and future development trends of ML in heterogeneous catalysis mediated by light, electricity and enzyme are summarized. We expect that this review will facilitate the interaction between ML and heterogeneous catalysis, and illuminate the development prospects of heterogeneous catalysis.
Affiliation(s)
- Wentao Zhang
- Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, 518055, People's Republic of China
- Wenguang Huang
- South China Institute of Environmental Sciences, Ministry of Ecology and Environment of PRC, Guangzhou, 510655, People's Republic of China
- Jie Tan
- South China Institute of Environmental Sciences, Ministry of Ecology and Environment of PRC, Guangzhou, 510655, People's Republic of China
- Qingwei Guo
- South China Institute of Environmental Sciences, Ministry of Ecology and Environment of PRC, Guangzhou, 510655, People's Republic of China
- Bingdang Wu
- School of Environmental Science and Engineering, Suzhou University of Science and Technology, Suzhou, 215009, People's Republic of China; Key Laboratory of Suzhou Sponge City Technology, Suzhou, 215002, People's Republic of China
15
Hayashi S, Koseki J, Shimamura T. Bayesian statistical method for detecting structural and topological diversity in polymorphic proteins. Comput Struct Biotechnol J 2022; 20:6519-6525. [DOI: 10.1016/j.csbj.2022.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 11/17/2022] [Accepted: 11/18/2022] [Indexed: 11/22/2022] Open
16
Kuntz D, Wilson AK. Machine learning, artificial intelligence, and chemistry: how smart algorithms are reshaping simulation and the laboratory. PURE APPL CHEM 2022. [DOI: 10.1515/pac-2022-0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Machine learning and artificial intelligence are increasingly gaining in prominence through image analysis, language processing, and automation, to name a few applications. Machine learning is also making profound changes in chemistry. From revisiting decades-old analytical techniques for the purpose of creating better calibration curves, to assisting and accelerating traditional in silico simulations, to automating entire scientific workflows, to being used as an approach to deduce underlying physics of unexplained chemical phenomena, machine learning and artificial intelligence are reshaping chemistry, accelerating scientific discovery, and yielding new insights. This review provides an overview of machine learning and artificial intelligence from a chemist’s perspective and focuses on a number of examples of the use of these approaches in computational chemistry and in the laboratory.
Affiliation(s)
- David Kuntz
- Department of Chemistry, University of North Texas, Denton, TX 76201, USA
- Angela K. Wilson
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
17
Chen Z, Bononi FC, Sievers CA, Kong WY, Donadio D. UV-Visible Absorption Spectra of Solvated Molecules by Quantum Chemical Machine Learning. J Chem Theory Comput 2022; 18:4891-4902. [PMID: 35913220 DOI: 10.1021/acs.jctc.1c01181] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/30/2022]
Abstract
Predicting UV-visible absorption spectra is essential to understand photochemical processes and design energy materials. Quantum chemical methods can deliver accurate calculations of UV-visible absorption spectra, but they are computationally expensive, especially for large systems or when one computes line shapes from thermal averages. Here, we present an approach to predict UV-visible absorption spectra of solvated aromatic molecules by quantum chemistry (QC) and machine learning (ML). We show that an ML model, trained on high-level QC calculations of the excitation energies of a set of aromatic molecules, can accurately predict the line shape of the lowest-energy UV-visible absorption band of several related molecules with less than 0.1 eV deviation with respect to reference experimental spectra. Applying linear decomposition analysis to the excitation energies, we show that our ML models probe vertical excitations of these aromatic molecules primarily by learning the atomic environment of their phenyl rings, which aligns with the physical origin of the π → π* electronic transition. Our study provides an effective workflow that combines ML with quantum chemical methods to accelerate the calculation of UV-visible absorption spectra for various molecular systems.
Affiliation(s)
- Zekun Chen
- Department of Chemistry, University of California Davis 95616, California, United States
- Fernanda C Bononi
- Department of Chemistry, University of California Davis 95616, California, United States
- Charles A Sievers
- Department of Chemistry, University of California Davis 95616, California, United States
- Wang-Yeuk Kong
- Department of Chemistry, University of California Davis 95616, California, United States
- Davide Donadio
- Department of Chemistry, University of California Davis 95616, California, United States
18
Zhang Z, Cheng M, Xiao X, Bi K, Song T, Hu KQ, Dai Y, Zhou L, Liu C, Ji X, Shi WQ. Machine-Learning-Guided Identification of Coordination Polymer Ligands for Crystallizing Separation of Cs/Sr. ACS APPLIED MATERIALS & INTERFACES 2022; 14:33076-33084. [PMID: 35801670 DOI: 10.1021/acsami.2c05272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Separation of Cs/Sr is one of many coordination-chemistry-centered processes in the grand scheme of spent nuclear fuel reprocessing, a critical link for a sustainable nuclear energy industry. To deploy a crystallizing Cs/Sr separation technology, we planned to systematically screen and identify candidate ligands that can efficiently and selectively bind to Sr²⁺ and form coordination polymers. Therefore, we mined the Cambridge Structural Database for characteristic structural information and developed a machine-learning-guided methodology for ligand evaluation. The optimized machine-learning model, correlating the molecular structures of the ligands with the predicted coordinative properties, generated a ranking list of potential compounds for Cs/Sr selective crystallization. The Sr²⁺ sequestration capability and selectivity over Cs⁺ of the promising ligands identified (squaric acid and chloranilic acid) were subsequently confirmed experimentally, with commendable performances, corroborating the artificial-intelligence-guided strategy.
Affiliation(s)
- Zhiyuan Zhang
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Min Cheng
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Xinyi Xiao
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Kexin Bi
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Ting Song
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Kong-Qiu Hu
- Laboratory of Nuclear Energy Chemistry, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, People's Republic of China
- Yiyang Dai
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Li Zhou
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Chong Liu
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Xu Ji
- School of Chemical Engineering, Sichuan University, Chengdu 610065, People's Republic of China
- Wei-Qun Shi
- Laboratory of Nuclear Energy Chemistry, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, People's Republic of China
19
Branco S, Carvalho JG, Reis MS, Lopes NV, Cabral J. 0-Dimensional Persistent Homology Analysis Implementation in Resource-Scarce Embedded Systems. SENSORS (BASEL, SWITZERLAND) 2022; 22:3657. [PMID: 35632064 PMCID: PMC9144123 DOI: 10.3390/s22103657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 06/15/2023]
Abstract
Persistent Homology (PH) analysis is a powerful tool for understanding many relevant topological features of a given dataset. PH allows finding clusters, noise, and relevant connections in the dataset. Therefore, it can provide a better view of the problem and a way of perceiving whether a given dataset is equal to another, whether a given sample is relevant, and how the samples occupy the feature space. However, PH involves reducing the problem to its simplicial complex space, which is computationally expensive, even though PH is an essential add-on for Resource-Scarce Embedded Systems (RSES). Because of this complexity, implementing PH in such tiny devices is considerably complicated by their lack of memory and processing power. This paper shows the implementation of 0-Dimensional Persistent Homology Analysis in a set of well-known RSES, using a technique that reduces the memory footprint and processing power needs of the 0-Dimensional PH algorithm. The results are positive and show that RSES can be equipped with this real-time data analysis tool.
Affiliation(s)
- Sérgio Branco
- Algoritmi Center, University of Minho, 4800-058 Guimarães, Portugal
- CEiiA—Centro de Engenharia, Av. D. Afonso Henriques 1825, 4450-017 Matosinhos, Portugal
- João G. Carvalho
- Algoritmi Center, University of Minho, 4800-058 Guimarães, Portugal
- DTx—Digital Transformation CoLab, University of Minho, 4800-058 Guimarães, Portugal
- Marco S. Reis
- CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, Pólo II—Pinhal de Marrocos, 3030-790 Coimbra, Portugal
- Nuno V. Lopes
- DTx—Digital Transformation CoLab, University of Minho, 4800-058 Guimarães, Portugal
- Jorge Cabral
- Algoritmi Center, University of Minho, 4800-058 Guimarães, Portugal
- CEiiA—Centro de Engenharia, Av. D. Afonso Henriques 1825, 4450-017 Matosinhos, Portugal
20
Howarth A, Goodman JM. The DP5 probability, quantification and visualisation of structural uncertainty in single molecules. Chem Sci 2022; 13:3507-3518. [PMID: 35432857 PMCID: PMC8943899 DOI: 10.1039/d1sc04406k] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/11/2021] [Accepted: 02/24/2022] [Indexed: 12/22/2022] Open
Abstract
Whenever a new molecule is made, a chemist will justify the proposed structure by analysing the NMR spectra. The widely-used DP4 algorithm will choose the best match from a series of possibilities, but draws no conclusions from a single candidate structure. Here we present the DP5 probability, a step-change in the quantification of molecular uncertainty: given one structure and one ¹³C NMR spectrum, DP5 gives the probability of the structure being correct. We show the DP5 probability can rapidly differentiate between structure proposals indistinguishable by NMR to an expert chemist. We also show in a number of challenging examples that the DP5 probability may prevent incorrect structures being published and later reassigned. DP5 will prove extremely valuable in fields such as discovery-driven automated chemical synthesis and drug development. Alongside the DP4-AI package, DP5 can help guide synthetic chemists when resolving the most subtle structural uncertainty. The DP5 system is available at https://github.com/Goodman-lab/DP5.
Affiliation(s)
- Alexander Howarth
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Jonathan M Goodman
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
21
Zulkepli NFS, Noorani MSM, Razak FA, Ismail M, Alias MA. Hybridization of hierarchical clustering with persistent homology in assessing haze episodes between air quality monitoring stations. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2022; 306:114434. [PMID: 35065362 DOI: 10.1016/j.jenvman.2022.114434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2021] [Revised: 11/24/2021] [Accepted: 01/02/2022] [Indexed: 06/14/2023]
Abstract
Haze has been a major issue afflicting Southeast Asian countries, including Malaysia, for the past few decades. Hierarchical agglomerative cluster analysis (HACA) is commonly used to evaluate the spatial behavior between areas in which pollutants interact. Typically, in HACA, the Euclidean distance acts as the dissimilarity measure and air quality monitoring stations are grouped according to this measure, thus revealing the most polluted areas. In this study, a framework for the hybridization of the HACA technique is proposed by considering the topological similarity (Wasserstein distance) between stations to evaluate the spatial patterns of the areas affected by haze episodes. For this, a tool from topological data analysis (TDA), namely persistent homology, is used to extract essential topological features hidden in the dataset. The performance of the proposed method is compared with that of traditional HACA and evaluated based on its ability to categorize areas according to the exceedance level of particulate matter (PM10). Results show that the additional topological features yield better accuracy than the case that does not consider topological features. Cluster validity indices are computed to verify the results, and the proposed method outperforms the traditional method, suggesting a practical alternative approach for assessing the similarity in air pollution behaviors based on topological characterizations.
Affiliation(s)
- Mohd Salmi Md Noorani
- Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Fatimah Abdul Razak
- Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Munira Ismail
- Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Mohd Almie Alias
- Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
22
Harper DR, Nandy A, Arunachalam N, Duan C, Janet JP, Kulik HJ. Representations and strategies for transferable machine learning improve model performance in chemical discovery. J Chem Phys 2022; 156:074101. [DOI: 10.1063/5.0082964] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Affiliation(s)
- Daniel R Harper
- Massachusetts Institute of Technology, United States of America
- Aditya Nandy
- Massachusetts Institute of Technology, United States of America
- Chenru Duan
- Massachusetts Institute of Technology, United States of America
- Heather J. Kulik
- Dept of Chemical Engineering, Massachusetts Institute of Technology, United States of America
23
Wei X, Wei GW. Homotopy continuation for the spectra of persistent Laplacians. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2021; 3:677-700. [PMID: 35822080 PMCID: PMC9273002 DOI: 10.3934/fods.2021017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/31/2023]
Abstract
The p-persistent q-combinatorial Laplacian defined for a pair of simplicial complexes is a generalization of the q-combinatorial Laplacian. Given a filtration, the spectra of persistent combinatorial Laplacians not only recover the persistent Betti numbers of persistent homology but also provide extra multiscale geometrical information about the data. Paired with machine learning algorithms, the persistent Laplacian has many potential applications in data science. Seeking different ways to find the spectrum of an operator is an active research topic, and it becomes especially interesting when ideas originate from multiple fields. In this work, we explore an alternative approach for the spectrum of persistent Laplacians. As the eigenvalues of a persistent Laplacian matrix are the roots of its characteristic polynomial, one may attempt to find the roots of the characteristic polynomial by homotopy continuation, thus resolving the spectrum of the corresponding persistent Laplacian. We consider a set of simple polytopes and small molecules to prove the principle that algebraic topology, combinatorial graphs, and algebraic geometry can be integrated to understand the shape of data.
Affiliation(s)
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
24
Fite S, Nitecki O, Gross Z. Custom Tokenization Dictionary, CUSTODI: A General, Fast, and Reversible Data-Driven Representation and Regressor. J Chem Inf Model 2021; 61:3285-3291. [PMID: 34180231 DOI: 10.1021/acs.jcim.1c00563] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/20/2023]
Abstract
Custom tokenization dictionary (CUSTODI) is introduced as a novel way for tackling the problem of molecular representations, and especially the challenge of molecular property prediction. Herein, the motivational theory and the actual representation and model are presented and shown to have performance that is in line with benchmark methodologies. The uniqueness of CUSTODI is its applicability on small training sets and the developed theory suggests its possible use for a-priori estimation of future fit quality on any given dataset, regardless of the method used for fitting.
Affiliation(s)
- Shachar Fite
- Schulich Faculty of Chemistry, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Omri Nitecki
- Schulich Faculty of Chemistry, Technion-Israel Institute of Technology, Haifa 32000, Israel
- Zeev Gross
- Schulich Faculty of Chemistry, Technion-Israel Institute of Technology, Haifa 32000, Israel
25
Smith CPA, Landman NH, Bardin J, Kruta I. New evidence from exceptionally "well-preserved" specimens sheds light on the structure of the ammonite brachial crown. Sci Rep 2021; 11:11862. [PMID: 34088905 PMCID: PMC8178333 DOI: 10.1038/s41598-021-89998-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/01/2021] [Accepted: 05/05/2021] [Indexed: 02/05/2023] Open
Abstract
Ammonite soft body remains are rarely preserved. One of the biggest enigmas is the morphology of the ammonite brachial crown, which has, up till now, never been recovered. Recently, mysterious hook-like structures have been reported in multiple specimens of Scaphitidae, a large family of heteromorph Late Cretaceous ammonites. A previous examination of these structures revealed that they belong to the ammonites. Their nature, however, remained elusive. Here, we exploit tomographic data to study their arrangement in space in order to clarify this matter. After using topological data analyses and comparing their morphology, number, and distribution to other known cephalopod structures, in both extant and extinct taxa, we conclude that these hook-like structures represent part of the brachial crown armature. Therefore, it appears that there are at least three independent evolutionary origins of hooks: in belemnoids, oegopsids, and now in ammonites. Finally, we propose for the first time a hypothetical reconstruction of an ammonite brachial crown.
Affiliation(s)
- C. P. A. Smith
- Biogéosciences, UMR 6282, Université Bourgogne Franche-Comté-CNRS-EPHE, 6 boulevard Gabriel, 21000 Dijon, France
- N. H. Landman
- Division of Paleontology (Invertebrates), American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024 USA
- J. Bardin
- CR2P – Centre de Recherche en Paléontologie, Paris, UMR 7207, Sorbonne Université-MNHN-CNRS, 4 place Jussieu, 75005 Paris, France
- I. Kruta
- Division of Paleontology (Invertebrates), American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024 USA
- CR2P – Centre de Recherche en Paléontologie, Paris, UMR 7207, Sorbonne Université-MNHN-CNRS, 4 place Jussieu, 75005 Paris, France
26
Jeon J, Kang S, Kim HU. Predicting biochemical and physiological effects of natural products from molecular structures using machine learning. Nat Prod Rep 2021; 38:1954-1966. [PMID: 34047331 DOI: 10.1039/d1np00016k] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/30/2022]
Abstract
Covering: 2016 to 2021. Discovery of novel natural products has been greatly facilitated by advances in genome sequencing, genome mining and analytical techniques. As a result, the volume of data for natural products has increased over the years and has started to serve as an ingredient for developing machine learning models. In the past few years, a number of machine learning models have been developed to examine various aspects of a molecule by effectively processing its molecular structure. Understanding of the biological effects of natural products can benefit from such machine learning approaches. In this context, this Highlight reviews recent studies on machine learning models developed to infer various biological effects of molecules. Particular attention is paid to molecular featurization, or computational representation of a molecular structure, which is an essential process during the development of a machine learning model. Technical challenges associated with the use of machine learning for natural products are further discussed.
Affiliation(s)
- Junhyeok Jeon
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Seongmo Kang
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; KAIST Institute for Artificial Intelligence, KAIST, Daejeon 34141, Republic of Korea; and BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon 34141, Republic of Korea
27
Mirth J, Zhai Y, Bush J, Alvarado EG, Jordan H, Heim M, Krishnamoorthy B, Pflaum M, Clark A, Z Y, Adams H. Representations of energy landscapes by sublevelset persistent homology: An example with n-alkanes. J Chem Phys 2021; 154:114114. [PMID: 33752361 DOI: 10.1063/5.0036747] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022] Open
Abstract
Encoding the complex features of an energy landscape is a challenging task, and often, chemists pursue the most salient features (minima and barriers) along a highly reduced space, i.e., two or three dimensions. Even though disconnectivity graphs or merge trees summarize the connectivity of the local minima of an energy landscape via the lowest-barrier pathways, there is much information to be gained by also considering the topology of each connected component at different energy thresholds (or sublevelsets). We propose sublevelset persistent homology as an appropriate tool for this purpose. Our computations on the configuration phase space of n-alkanes from butane to octane allow us to conjecture, and then prove, a complete characterization of the sublevelset persistent homology of the alkane CₘH₂ₘ₊₂ Potential Energy Landscapes (PELs), for all m, in all homological dimensions. We further compare both the analytical configurational PELs and sampled data from molecular dynamics simulation using the united and all-atom descriptions of the intramolecular interactions. In turn, this supports the application of distance metrics to quantify sampling fidelity and lays the foundation for future work regarding new metrics that quantify differences between the topological features of high-dimensional energy landscapes.
Affiliation(s)
- Joshua Mirth
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
- Yanqin Zhai
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Johnathan Bush
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
- Enrique G Alvarado
- Department of Mathematics and Statistics, Washington State University, Pullman, Washington 99164, USA
- Howie Jordan
- Department of Mathematics, University of Colorado, Boulder, Colorado 80309, USA
- Mark Heim
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
- Bala Krishnamoorthy
- Department of Mathematics and Statistics, Washington State University, Vancouver, Washington 98686, USA
- Markus Pflaum
- Department of Mathematics, University of Colorado, Boulder, Colorado 80309, USA
- Aurora Clark
- Department of Chemistry, Washington State University, Pullman, Washington 99164, USA
- Y Z
- Department of Nuclear, Plasma, and Radiological Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Henry Adams
- Department of Mathematics, Colorado State University, Fort Collins, Colorado 80524, USA
28
Wang R, Zhao R, Ribando-Gros E, Chen J, Tong Y, Wei GW. HERMES: Persistent spectral graph software. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2021; 3:67-97. [PMID: 34485918 PMCID: PMC8411887 DOI: 10.3934/fods.2021006] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 04/17/2023]
Abstract
Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacian matrices (PLMs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLMs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLMs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
- Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Emily Ribando-Gros
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Department of Electrical and Computer Engineering, Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
29
Shen WX, Zeng X, Zhu F, Wang YL, Qin C, Tan Y, Jiang YY, Chen YZ. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-021-00301-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/04/2023]
30
Townsend J, Vogiatzis KD. Transferable MP2-Based Machine Learning for Accurate Coupled-Cluster Energies. J Chem Theory Comput 2020; 16:7453-7461. [DOI: 10.1021/acs.jctc.0c00927] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Affiliation(s)
- Jacob Townsend
- Department of Chemistry, University of Tennessee, Knoxville, Tennessee 37996, United States