1
Su Z, Tong Y, Wei GW. Hodge Decomposition of Single-Cell RNA Velocity. J Chem Inf Model 2024; 64:3558-3568. [PMID: 38572676 PMCID: PMC11035094 DOI: 10.1021/acs.jcim.4c00132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 03/21/2024] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
RNA velocity can capture cell dynamic information in biological processes; yet a comprehensive analysis of cell state transitions and their associated chemical and biological processes is still lacking. In this work, we apply the Hodge decomposition, coupled with discrete exterior calculus (DEC), to unveil cell dynamics by examining the decomposed curl-free, divergence-free, and harmonic components of the RNA velocity field in a low-dimensional representation, such as a UMAP or a t-SNE representation. Decomposition results show that the decomposed components distinctly reveal key cell dynamic features such as cell cycle, bifurcation, and cell lineage differentiation, regardless of the choice of low-dimensional representation. The consistency across different representations demonstrates that the Hodge decomposition is a reliable and robust way to extract these cell dynamic features, offering unique analysis and insightful visualization of single-cell RNA velocity fields.
Affiliation(s)
- Zhe Su
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yiying Tong
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
2
Cottrell S, Wang R, Wei GW. PLPCA: Persistent Laplacian-Enhanced PCA for Microarray Data Analysis. J Chem Inf Model 2024; 64:2405-2420. [PMID: 37738663 PMCID: PMC10999748 DOI: 10.1021/acs.jcim.3c01023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/24/2023]
Abstract
Over the years, Principal Component Analysis (PCA) has served as the baseline approach for dimensionality reduction in gene expression data analysis. Its primary objective is to identify a subset of disease-causing genes from a vast pool of thousands of genes. However, PCA possesses inherent limitations that hinder its interpretability, introduce class ambiguity, and fail to capture complex geometric structures in the data. Although these limitations have been partially addressed in the literature by incorporating various regularizers, such as graph Laplacian regularization, existing PCA based methods still face challenges related to multiscale analysis and capturing higher-order interactions in the data. To address these challenges, we propose a novel approach called Persistent Laplacian-enhanced Principal Component Analysis (PLPCA). PLPCA amalgamates the advantages of earlier regularized PCA methods with persistent spectral graph theory, specifically persistent Laplacians derived from algebraic topology. In contrast to graph Laplacians, persistent Laplacians enable multiscale analysis through filtration and can incorporate higher-order simplicial complexes to capture higher-order interactions in the data. We evaluate and validate the performance of PLPCA using ten benchmark microarray data sets that exhibit a wide range of dimensions and data imbalance ratios. Our extensive studies over these data sets demonstrate that PLPCA provides up to 12% improvement to the current state-of-the-art PCA models on five evaluation metrics for classification tasks after dimensionality reduction.
Affiliation(s)
- Sean Cottrell
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
3
Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024; 64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]
Abstract
Artificial Neural Networks (ANNs) are transforming how we understand chemical mixtures, providing an expressive view of the chemical space and multiscale processes. Their hybridization with physical knowledge can bridge the gap between predictivity and understanding of the underlying processes. This overview explores recent progress in ANNs, particularly their potential in the 'recomposition' of chemical mixtures. Graph-based representations reveal patterns among mixture components, and deep learning models excel in capturing complexity and symmetries when compared to traditional Quantitative Structure-Property Relationship models. Key components, such as Hamiltonian networks and convolution operations, play a central role in representing multiscale mixtures. The integration of ANNs with Chemical Reaction Networks and Physics-Informed Neural Networks for inverse chemical kinetic problems is also examined. The combination of sensors with ANNs shows promise in optical and biomimetic applications. A common ground is identified in the context of statistical physics, where ANN-based methods iteratively adapt their models by blending their initial states with training data. The concept of mixture recomposition unveils a reciprocal inspiration between ANNs and reactive mixtures, highlighting learning behaviors influenced by the training environment.
Affiliation(s)
- Andre Nicolle
  - Aramco Fuel Research Center, Rueil-Malmaison 92852, France
- Sili Deng
  - Massachusetts Institute of Technology, Cambridge 02139, Massachusetts, United States
- Matthias Ihme
  - Stanford University, Stanford 94305, California, United States
- Emad Al Ibrahim
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
- Aamir Farooq
  - King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
4
Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. Comput Biol Med 2024; 169:107918. [PMID: 38194782 PMCID: PMC10922365 DOI: 10.1016/j.compbiomed.2024.107918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 12/21/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]
Abstract
Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task, as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves on the state of the art by up to 15%.
Affiliation(s)
- JunJie Wee
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
  - Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA.
5
Cottrell S, Hozumi Y, Wei GW. K-Nearest-Neighbors Induced Topological PCA for Single Cell RNA-Sequence Data Analysis. ARXIV 2023:arXiv:2310.14521v1. [PMID: 37961744 PMCID: PMC10635285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L2,1 norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration via the varying of a distance threshold, we introduced kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins. 
For example, tPCA provides up to 628%, 78%, and 149% improvements to UMAP, tSNE, and NMF, respectively on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements to UMAP, tSNE, and NMF, respectively on clustering in the ARI metric.
Affiliation(s)
- Sean Cottrell
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yuta Hozumi
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
6
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. Brief Bioinform 2023; 24:bbad289. [PMID: 37580175 PMCID: PMC10516362 DOI: 10.1093/bib/bbad289] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/08/2023] [Revised: 07/14/2023] [Accepted: 07/26/2023] [Indexed: 08/16/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, 48824 MI, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824 MI, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824 MI, USA
7
Qiu Y, Wei GW. Artificial intelligence-aided protein engineering: from topological data analysis to deep protein language models. ARXIV 2023:arXiv:2307.14587v1. [PMID: 37547662 PMCID: PMC10402185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Protein engineering is an emerging field in biotechnology that has the potential to revolutionize various areas, such as antibody design, drug discovery, food security, ecology, and more. However, the mutational space involved is too vast to be handled through experimental means alone. Leveraging accumulative protein databases, machine learning (ML) models, particularly those based on natural language processing (NLP), have considerably expedited protein engineering. Moreover, advances in topological data analysis (TDA) and artificial intelligence-based protein structure prediction, such as AlphaFold2, have made more powerful structure-based ML-assisted protein engineering strategies possible. This review aims to offer a comprehensive, systematic, and indispensable set of methodological components, including TDA and NLP, for protein engineering and to facilitate their future development.
Affiliation(s)
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, 48824, MI, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, 48824, MI, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, 48824, MI, USA
8
Merkurjev E, Nguyen DD, Wei GW. Multiscale Laplacian Learning. APPL INTELL 2023; 53:15727-15746. [PMID: 38031564 PMCID: PMC10686291 DOI: 10.1007/s10489-022-04333-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/08/2022] [Indexed: 11/29/2022]
Abstract
Machine learning has greatly influenced many fields, including science. However, despite the tremendous accomplishments of machine learning, one of the key limitations of most existing machine learning approaches is their reliance on large labeled sets, and thus, data with limited labeled samples remains a challenge. Moreover, the performance of machine learning methods is often severely hindered in the case of diverse data, usually associated with smaller data sets or data associated with areas of study where the size of the data sets is constrained by high experimental cost and/or ethics. These challenges call for innovative strategies for dealing with these types of data. In this work, the aforementioned challenges are addressed by integrating graph-based frameworks, semi-supervised techniques, multiscale structures, and modified and adapted optimization procedures. This results in two innovative multiscale Laplacian learning (MLL) approaches for machine learning tasks, such as data classification, and for tackling data with limited samples, diverse data, and small data sets. The first approach, multikernel manifold learning (MML), integrates manifold learning with multikernel information and incorporates a warped kernel regularizer using multiscale graph Laplacians. The second approach, the multiscale MBO (MMBO) method, introduces multiscale Laplacians to the modification of the famous classical Merriman-Bence-Osher (MBO) scheme, and makes use of fast solvers. We demonstrate the performance of our algorithms experimentally on a variety of benchmark data sets, and compare them favorably to the state-of-the-art approaches.
Affiliation(s)
- Duc Duy Nguyen
  - Department of Mathematics, University of Kentucky, KY 40506, USA
- Guo-Wei Wei
  - Department of Mathematics, Department of Biochemistry and Molecular Biology, Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
9
Wei X, Chen J, Wei GW. Persistent topological Laplacian analysis of SARS-CoV-2 variants. ARXIV 2023:arXiv:2301.10865v2. [PMID: 36748007 PMCID: PMC9900960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
Topological data analysis (TDA) is an emerging field in mathematics and data science. Its central technique, persistent homology, has had tremendous success in many science and engineering disciplines. However, persistent homology has limitations, including its inability to handle heterogeneous information, such as multiple types of geometric objects; being qualitative rather than quantitative, e.g., counting a 5-member ring the same as a 6-member ring, and a failure to describe non-topological changes, such as homotopic changes in protein-protein binding. Persistent topological Laplacians (PTLs), such as persistent Laplacian and persistent sheaf Laplacian, were proposed to overcome the limitations of persistent homology. In this work, we examine the modeling and analysis power of PTLs in the study of the protein structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike receptor binding domain (RBD). First, we employ PTLs to study how the RBD mutation-induced structural changes of RBD-angiotensin-converting enzyme 2 (ACE2) binding complexes are captured in the changes of spectra of the PTLs among SARS-CoV-2 variants. Additionally, we use PTLs to analyze the binding of RBD and ACE2-induced structural changes of various SARS-CoV-2 variants. Finally, we explore the impacts of computationally generated RBD structures on a topological deep learning paradigm and predictions of deep mutational scanning datasets for the SARS-CoV-2 Omicron BA.2 variant. Our results indicate that PTLs have advantages over persistent homology in analyzing protein structural changes and provide a powerful new TDA tool for data science.
Affiliation(s)
- Xiaoqi Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
10
Zhu Z, Dou B, Cao Y, Jiang J, Zhu Y, Chen D, Feng H, Liu J, Zhang B, Zhou T, Wei GW. TIDAL: Topology-Inferred Drug Addiction Learning. J Chem Inf Model 2023; 63:1472-1489. [PMID: 36826415 DOI: 10.1021/acs.jcim.3c00046] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 02/25/2023]
Abstract
Drug addiction is a global public health crisis, and the design of antiaddiction drugs remains a major challenge due to intricate mechanisms. Since experimental drug screening and optimization are too time-consuming and expensive, there is an urgent need to develop innovative artificial intelligence (AI) methods for addressing the challenge. We tackle this challenge by topology-inferred drug addiction learning (TIDAL) built from integrating multiscale topological Laplacians, deep bidirectional transformer, and ensemble-assisted neural networks (EANNs). Multiscale topological Laplacians are a novel class of algebraic topology tools that embed molecular topological invariants and algebraic invariants into their harmonic spectra and nonharmonic spectra, respectively. These invariants complement sequence information extracted from a bidirectional transformer. We validate the proposed TIDAL framework on 22 drug addiction related, 4 hERG, and 12 DAT data sets, which suggests that the proposed TIDAL is a state-of-the-art framework for the modeling and analysis of drug addiction data. We carry out cross-target analysis of the current drug addiction candidates to flag their side effects and identify their repurposing potential. Our analysis reveals drug-mediated linear and bilinear target correlations. Finally, TIDAL is applied to shed light on relative efficacy, repurposing potential, and potential side effects of 12 existing antiaddiction medications. Our results suggest that TIDAL provides a new computational strategy for pressingly needed antisubstance addiction drug development.
Affiliation(s)
- Zailiang Zhu
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, P. R. China
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Yukang Cao
  - School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Dong Chen
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Hongsong Feng
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, 430200, P. R. China
- Tianshou Zhou
  - Key Laboratory of Computational Mathematics, Guangdong Province, and School of Mathematics, Sun Yat-sen University, Guangzhou, 510006, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
11
Abstract
Path homology, proposed by S.-T. Yau and his co-workers, provides a new mathematical model for directed graphs and networks. Persistent path homology (PPH) extends path homology with filtration to deal with asymmetric structures. However, PPH is constrained to purely topological persistence and cannot track the homotopic shape evolution of data during filtration. To overcome this limitation of PPH, the persistent path Laplacian (PPL) is introduced to capture the shape evolution of data. PPL's harmonic spectra fully recover PPH's topological persistence, and its non-harmonic spectra reveal the homotopic shape evolution of data during filtration.
Affiliation(s)
- Rui Wang
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
12
Chen D, Liu J, Wu J, Wei GW, Pan F, Yau ST. Path Topology in Molecular and Materials Sciences. J Phys Chem Lett 2023; 14:954-964. [PMID: 36688834 PMCID: PMC10799224 DOI: 10.1021/acs.jpclett.2c03706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
The structures of molecules and materials determine their functions. Understanding the structure and function relationship is the holy grail of molecular and materials sciences. However, the rational design of molecules and materials with desirable functions remains a grand challenge despite decades of efforts. A major obstacle is the lack of an intrinsic mathematical characteristic that attributes to a specific function. This work introduces persistent path topology (PPT) to effectively characterize directed networks extracted from functional units, such as constitutional isomers, cis-trans isomers, chiral molecules, Jahn-Teller isomerism, and high-entropy alloy catalysts. Path homology (PH) theory is utilized to decipher the role of mirror-symmetric sublattices that hinder the formation of periodic unit cells in amorphous solids. Topological perturbation analysis (TPA) is proposed to reveal the critical target in the blood coagulation system. The proposed topological tools can be directly applied to systems biology, omics sciences, topological materials, and machine learning study of molecular and materials sciences.
Affiliation(s)
- Dong Chen
  - School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jian Liu
  - School of Mathematical Sciences, Hebei Normal University, Hebei, 050024, China
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Jie Wu
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Feng Pan
  - School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen 518055, China
- Shing-Tung Yau
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing 101408, China
  - Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China
13
Liu J, Xia KL, Wu J, Yau SST, Wei GW. Biomolecular Topology: Modelling and Analysis. ACTA MATHEMATICA SINICA, ENGLISH SERIES 2022; 38:1901-1938. [PMID: 36407804 PMCID: PMC9640850 DOI: 10.1007/s10114-022-2326-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 07/12/2022] [Indexed: 05/25/2023]
Abstract
With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, nonlinearity, and entanglement of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, molecular assembly, to others at the macromolecular level, pose a severe challenge in their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originating from biomolecular systems. More specifically, biomolecular topology encompasses topological structures, properties, and relations that emerge from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and the topology-based machine learning models.
Affiliation(s)
- Jian Liu
  - School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 P. R. China
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Ke-Lin Xia
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 639798 Singapore
- Jie Wu
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
  - Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
- Stephen Shing-Toung Yau
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
  - Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
- Guo-Wei Wei
  - Department of Mathematics & Department of Biochemistry and Molecular Biology & Department of Electrical and Computer Engineering, Michigan State University, Wells Hall 619 Red Cedar Road, East Lansing, MI 48824-1027 USA
14
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jiahui Chen
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Limei Cheng
  - Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
- Jaclyn Frishcosy
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuta Huzumi
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuchi Qiu
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Tom Schluckbier
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Xiaoqi Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
15
Alfarraj A, Wei GW. Geometric algebra generation of molecular surfaces. J R Soc Interface 2022; 19:20220117. [PMID: 35414214 PMCID: PMC9006026 DOI: 10.1098/rsif.2022.0117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Geometric algebra is a powerful framework that unifies mathematics and physics. Since its revival in the 1960s, it has attracted great attention and has been exploited in fields like physics, computer science and engineering. This work introduces a geometric algebra method for molecular surface generation that uses the Clifford-Fourier transform (CFT), a generalization of the classical Fourier transform. Notably, the classical Fourier transform and the CFT differ in the derivative property in [Formula: see text] for k even. This distinction is due to the non-commutativity of the geometric product of pseudoscalars with multivectors and has significant consequences in applications. We use the CFT in [Formula: see text] to benefit from the derivative property in solving partial differential equations (PDEs). The CFT is used to carry out the mode decomposition process in the PDE transform. Two different initial cases are proposed for constructing the initial shapes in the present method. The proposed method is first applied to small molecules and proteins. To validate the method, the generated molecular surfaces are compared with surfaces obtained from other definitions. Applications to protein electrostatic surface potentials and solvation free energies are considered. This work opens the door for further applications of geometric algebra and the CFT in biological sciences.
Affiliation(s)
- Azzam Alfarraj
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Mathematics, King Fahd University of Petroleum and Minerals, Dhahran 31261, KSA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
16
Chen J, Wang R, Wei GW. SARS-CoV-2 becoming more infectious as revealed by algebraic topology and deep learning. Communications in Information and Systems 2021; 21:31-36. [PMID: 34675755 DOI: 10.4310/cis.2021.v21.n1.a2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to tremendous human fatality and economic loss. SARS-CoV-2 infectivity is a key reason for the widespread viral transmission, but its rigorous experimental measurement is essentially impossible due to the ongoing genome evolution around the world. We show that artificial intelligence (AI) and algebraic topology (AT) offer an accurate and efficient alternative to the experimental determination of viral infectivity. AI and AT analysis indicates that the ongoing mutations make SARS-CoV-2 more infectious.
Affiliation(s)
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Rui Wang
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
17
UMAP-assisted K-means clustering of large-scale SARS-CoV-2 mutation datasets. Comput Biol Med 2021; 131:104264. [PMID: 33647832 PMCID: PMC7897976 DOI: 10.1016/j.compbiomed.2021.104264] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 12/30/2020] [Revised: 02/05/2021] [Accepted: 02/06/2021] [Indexed: 12/16/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a devastating worldwide effect. Understanding the evolution and transmission of SARS-CoV-2 is of paramount importance for controlling, combating and preventing COVID-19. Due to the rapid growth in both the number of SARS-CoV-2 genome sequences and the number of unique mutations, the phylogenetic analysis of SARS-CoV-2 genome isolates faces an emergent large-data challenge. We introduce a dimension-reduced K-means clustering strategy to tackle this challenge. We examine the performance and effectiveness of three dimension-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using four benchmark datasets, we found that UMAP is the best-suited technique due to its stable, reliable, and efficient performance, its ability to improve clustering accuracy, especially for large Jaccard distance-based datasets, and its superior clustering visualization. UMAP-assisted K-means clustering enables us to shed light on increasingly large datasets from SARS-CoV-2 genome isolates.
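The dimension-reduced K-means strategy described in this abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' code. The function names (`jaccard_distance`, `pca_project`, `kmeans`, `cluster_isolates`), the toy mutation labels, and the farthest-first initialisation are all assumptions made for illustration. PCA is used as the reduction step only because UMAP and t-SNE need external libraries; the abstract reports UMAP as the better choice.

```python
import math

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def distance_features(samples):
    """Row i holds the Jaccard distances from sample i to every sample."""
    return [[jaccard_distance(s, t) for t in samples] for s in samples]

def pca_project(rows, n_components=2, iters=200):
    """Project rows onto leading principal components (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    X = [[x - m for x, m in zip(r, mean)] for r in rows]
    comps = []
    for _ in range(n_components):
        v = [1.0 / (j + 1) for j in range(d)]  # fixed, non-random starting vector
        for _step in range(iters):
            Xv = [sum(a * b for a, b in zip(r, v)) for r in X]
            w = [sum(X[i][j] * Xv[i] for i in range(n)) for j in range(d)]  # X^T X v
            for c in comps:  # remove directions that were already extracted
                proj = sum(a * b for a, b in zip(w, c))
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in any new direction
                v = [0.0] * d
                break
            v = [a / norm for a in w]
        comps.append(v)
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in X]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initial centres."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_isolates(mutation_sets, k, n_components=2):
    """Jaccard features -> PCA reduction -> K-means labels."""
    reduced = pca_project(distance_features(mutation_sets), n_components)
    return kmeans(reduced, k)

# Toy data (hypothetical): each isolate is the set of mutations it carries.
isolates = [{"m1", "m2"}, {"m1", "m2", "m3"}, {"m1", "m2", "m4"},
            {"m5", "m6"}, {"m5", "m6", "m7"}, {"m5", "m6", "m7", "m8"}]
print(cluster_isolates(isolates, k=2))
```

Replacing `pca_project` with a UMAP or t-SNE embedding from an external library would leave the rest of the pipeline unchanged, which is the comparison the abstract describes.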
18
Wang R, Zhao R, Ribando-Gros E, Chen J, Tong Y, Wei GW. HERMES: Persistent spectral graph software. Foundations of Data Science (Springfield, Mo.) 2021; 3:67-97. [PMID: 34485918 PMCID: PMC8411887 DOI: 10.3934/fods.2021006] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 04/17/2023]
Abstract
Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacian matrices (PLMs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLMs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLMs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.
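The statement that the null space of a Laplacian recovers Betti numbers can be checked on a small example. The following standard-library Python sketch is not HERMES and does not use its API; every function name here is hypothetical. It is also a simplification: it builds the ordinary combinatorial Laplacians of a Vietoris-Rips complex at a single filtration radius (not a persistent Laplacian between two radii) and counts their zero eigenvalues exactly as matrix nullities. The non-harmonic spectra mentioned in the abstract would need an eigenvalue solver and are omitted.

```python
from fractions import Fraction
from itertools import combinations
from math import dist

def rips_complex(points, radius):
    """Vertices, edges and triangles of the Vietoris-Rips complex at one radius."""
    n = len(points)
    vertices = [(i,) for i in range(n)]
    edges = [e for e in combinations(range(n), 2)
             if dist(points[e[0]], points[e[1]]) <= radius]
    edge_set = set(edges)
    triangles = [t for t in combinations(range(n), 3)
                 if all(f in edge_set for f in combinations(t, 2))]
    return vertices, edges, triangles

def boundary_matrix(faces, simplices):
    """Rows index (q-1)-faces, columns index q-simplices, entries are +1/-1/0."""
    row = {f: i for i, f in enumerate(faces)}
    B = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for k in range(len(s)):
            B[row[s[:k] + s[k + 1:]]][j] = (-1) ** k  # drop the k-th vertex
    return B

def transpose(M, n_cols):
    return [[r[j] for r in M] for j in range(n_cols)]

def matmul(A, B, n_rows, n_cols):
    """A (n_rows x k) times B (k x n_cols); works when k is zero."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(n_cols)] for i in range(n_rows)]

def rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in M]
    n_cols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(n_cols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            if rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def betti_numbers(points, radius):
    """(beta_0, beta_1) as nullities of L0 = B1 B1^T and L1 = B1^T B1 + B2 B2^T."""
    V, E, T = rips_complex(points, radius)
    B1, B2 = boundary_matrix(V, E), boundary_matrix(E, T)
    B1t, B2t = transpose(B1, len(E)), transpose(B2, len(T))
    L0 = matmul(B1, B1t, len(V), len(V))
    down = matmul(B1t, B1, len(E), len(E))
    up = matmul(B2, B2t, len(E), len(E))
    L1 = [[a + b for a, b in zip(r, s)] for r, s in zip(down, up)]
    return len(V) - rank(L0), len(E) - rank(L1)

# Corners of a unit square, sampled at three filtration radii.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for r in (0.5, 1.0, 1.5):
    print(r, betti_numbers(square, r))
```

At radius 1.0 only the four sides are present, so the complex is a single loop; at radius 1.5 the diagonals and all triangles appear and the loop is filled in.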
Affiliation(s)
- Rui Wang
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Rundong Zhao
  - Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Emily Ribando-Gros
  - Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Yiying Tong
  - Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
- Guo-Wei Wei
  - Department of Mathematics, Department of Electrical and Computer Engineering, Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
|