1
Kleczynski M, Bergonzo C, Kearsley AJ. Spatial and Sequential Topological Analysis of Molecular Dynamics Simulations of IgG1 Fc Domains. J Chem Theory Comput 2025; 21:4884-4897. [PMID: 40261915 PMCID: PMC12079798 DOI: 10.1021/acs.jctc.5c00161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2025] [Revised: 04/05/2025] [Accepted: 04/10/2025] [Indexed: 04/24/2025]
Abstract
Monoclonal antibodies are utilized in a wide range of biomedical applications. The NIST monoclonal antibody is a resource for developing analysis methods for monoclonal antibody based biopharmaceutical platforms. Techniques from topological data analysis quantify structural features such as loops and tunnels which are not easily measured by classical data analysis methods. In this paper, we introduce the Gaussian CROCKER column differences (GCCD) matrix, which augments standard topological data analysis summaries with biological sequence information. We use GCCD matrices to successfully differentiate between glycosylated and aglycosylated conformations from molecular dynamics simulations of the NIST monoclonal antibody Fc domain. We are optimistic that other researchers will be able to utilize GCCD matrices to quantify multiscale spatial and sequential features.
Affiliation(s)
- Melinda Kleczynski
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Christina Bergonzo
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
- Institute for Bioscience and Biotechnology Research, Rockville, Maryland 20850, United States
- Anthony J. Kearsley
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
2
Sarra GD, Jha S, Roudi Y. The role of oscillations in grid cells' toroidal topology. PLoS Comput Biol 2025; 21:e1012776. [PMID: 39879234 DOI: 10.1371/journal.pcbi.1012776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Accepted: 01/07/2025] [Indexed: 01/31/2025] Open
Abstract
Persistent homology applied to the activity of grid cells in the Medial Entorhinal Cortex suggests that this activity lies on a toroidal manifold. By analyzing real data and a simple model, we show that neural oscillations play a key role in the appearance of this toroidal topology. To quantitatively monitor how changes in spike trains influence the topology of the data, we first define a robust measure for the degree of toroidality of a dataset. Using this measure, we find that small perturbations (~100 ms) of spike times have little influence on both the toroidality and the hexagonality of the ratemaps. Jittering spikes by ~100-500 ms, however, destroys the toroidal topology, while still having little impact on grid scores. These critical jittering time scales fall in the range of the periods of oscillations between the theta and eta bands. We thus hypothesized that these oscillatory modulations of neuronal spiking play a key role in the appearance and robustness of toroidal topology, and that the hexagonal spatial selectivity is not sufficient. We confirmed this hypothesis using a simple model for the activity of grid cells, consisting of an ensemble of independent rate-modulated Poisson processes. When these rates were modulated by oscillations, the network behaved similarly to the real data in exhibiting toroidal topology, even when the positions of the fields were perturbed. In the absence of oscillations, this similarity was substantially lower. Furthermore, we find that the experimentally recorded spike trains indeed exhibit temporal modulations at the eta and theta bands, and that the ratio of the power in the eta band to that of the theta band, [Formula: see text], correlates with the critical jittering time at which the toroidal topology disappears.
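The two ingredients named in this abstract, a rate-modulated Poisson process and spike-time jittering, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sinusoidal rate, the 8 Hz modulation frequency, the Gaussian jitter distribution, and the function names `oscillating_rate`, `sample_poisson_spikes`, and `jitter_spike_times`.

```python
import numpy as np

def oscillating_rate(t, base_hz=5.0, freq_hz=8.0, depth=0.8):
    """Hypothetical firing rate: a baseline modulated by one sinusoid."""
    return base_hz * (1.0 + depth * np.sin(2.0 * np.pi * freq_hz * t))

def sample_poisson_spikes(rate_fn, duration_s, rate_max_hz, rng):
    """Sample an inhomogeneous Poisson process by thinning.

    Candidates come from a homogeneous process at rate_max_hz; each is
    kept with probability rate_fn(t) / rate_max_hz.
    """
    n = rng.poisson(rate_max_hz * duration_s)
    candidates = np.sort(rng.uniform(0.0, duration_s, size=n))
    keep = rng.uniform(0.0, rate_max_hz, size=n) < rate_fn(candidates)
    return candidates[keep]

def jitter_spike_times(spikes, sigma_s, rng):
    """Shift every spike by independent Gaussian noise (assumed jitter model)."""
    return np.sort(spikes + rng.normal(0.0, sigma_s, size=spikes.shape))

rng = np.random.default_rng(0)
# The peak rate is base_hz * (1 + depth) = 9 Hz, which bounds the thinning step.
spikes = sample_poisson_spikes(oscillating_rate, 60.0, 9.0, rng)
jittered = jitter_spike_times(spikes, 0.3, rng)  # 300 ms jitter
```

Jittering preserves the spike count but smears the oscillatory timing, and jittered spikes may fall slightly outside the original recording window. The toroidality measure itself is defined in the paper and is not reproduced here.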
Affiliation(s)
- Giovanni di Sarra
- Kavli Institute for Systems Neuroscience and Centre for Algorithms in the Cortex, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Siddharth Jha
- W.M. Keck Center for Neurophysics, Department of Physics and Astronomy, University of California Los Angeles, Los Angeles, California, United States of America
- Yasser Roudi
- Kavli Institute for Systems Neuroscience and Centre for Algorithms in the Cortex, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Department of Mathematics, King's College London, London, United Kingdom
3
Che M, Galaz-García F, Guijarro L, Membrillo Solis IA. Metric geometry of spaces of persistence diagrams. Journal of Applied and Computational Topology 2024; 8:2197-2246. [PMID: 39524153 PMCID: PMC11541355 DOI: 10.1007/s41468-024-00189-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 03/11/2024] [Accepted: 07/12/2024] [Indexed: 11/16/2024]
Abstract
Persistence diagrams are objects that play a central role in topological data analysis. In the present article, we investigate the local and global geometric properties of spaces of persistence diagrams. In order to do this, we construct a family of functors $\mathcal{D}_p$, $1 \le p \le \infty$, that assign, to each metric pair $(X, A)$, a pointed metric space $\mathcal{D}_p(X, A)$. Moreover, we show that $\mathcal{D}_\infty$ is sequentially continuous with respect to the Gromov-Hausdorff convergence of metric pairs, and we prove that $\mathcal{D}_p$ preserves several useful metric properties, such as completeness and separability, for $p \in [1, \infty)$, and geodesicity and non-negative curvature in the sense of Alexandrov, for $p = 2$. For the latter case, we describe the metric of the space of directions at the empty diagram. We also show that the Fréchet mean set of a Borel probability measure on $\mathcal{D}_p(X, A)$, $1 \le p \le \infty$, with finite second moment and compact support is non-empty. As an application of our geometric framework, we prove that the space of Euclidean persistence diagrams, $\mathcal{D}_p(\mathbb{R}^{2n}, \Delta_n)$, $1 \le n$ and $1 \le p < \infty$, has infinite covering, Hausdorff, asymptotic, Assouad, and Assouad-Nagata dimensions.
Affiliation(s)
- Mauricio Che
- Department of Mathematical Sciences, Durham University, Durham, UK
- Luis Guijarro
- Department of Mathematics, Universidad Autónoma de Madrid and ICMAT CSIC-UAM-UC3M, Madrid, Spain
- Ingrid Amaranta Membrillo Solis
- Mathematical Sciences, University of Southampton, Southampton, UK
- Present Address: School of Mathematical Sciences, Queen Mary University of London, London, UK
4
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Persistent homology reveals strong phylogenetic signal in 3D protein structures. PNAS Nexus 2024; 3:pgae158. [PMID: 38689707 PMCID: PMC11058471 DOI: 10.1093/pnasnexus/pgae158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while those based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
Affiliation(s)
- Léa Bou Dagher
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban
- Dominique Madern
- University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France
- Philippe Malbos
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Céline Brochier-Armanet
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
5
Tarín-Pelló A, Suay-García B, Forés-Martos J, Falcó A, Pérez-Gracia MT. Computer-aided drug repurposing to tackle antibiotic resistance based on topological data analysis. Comput Biol Med 2023; 166:107496. [PMID: 37793206 DOI: 10.1016/j.compbiomed.2023.107496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 10/06/2023]
Abstract
The progressive emergence of antimicrobial resistance has become a global health problem in need of rapid solution. Research into new antimicrobial drugs is imperative. Drug repositioning, together with computational mathematical prediction models, could be a fast and efficient method of searching for new antibiotics. The aim of this study was to identify compounds with potential antimicrobial capacity against Escherichia coli from US Food and Drug Administration-approved drugs, and the similarity between known drug targets and E. coli proteins using a topological structure-activity data analysis model. This model has been shown to identify molecules with known antibiotic capacity, such as carbapenems and cephalosporins, as well as new molecules that could act as antimicrobials. Topological similarities were also found between E. coli proteins and proteins from different bacterial species such as Mycobacterium tuberculosis, Pseudomonas aeruginosa and Salmonella Typhimurium, which could imply that the selected molecules have a broader spectrum than expected. These molecules include antitumor drugs, antihistamines, lipid-lowering agents, hypoglycemic agents, antidepressants, nucleotides, and nucleosides, among others. The results presented in this study prove the ability of computational mathematical prediction models to predict molecules with potential antimicrobial capacity and/or possible new pharmacological targets of interest in the design of new antibiotics and in the better understanding of antimicrobial resistance.
Affiliation(s)
- Antonio Tarín-Pelló
- Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud Universidad Cardenal Herrera-CEU, CEU Universities, C/ Santiago Ramón y Cajal, 46115, Alfara del Patriarca, Valencia, Spain
- Beatriz Suay-García
- ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ San Bartolomé 55, 46115, Alfara del Patriarca, Valencia, Spain
- Jaume Forés-Martos
- ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ San Bartolomé 55, 46115, Alfara del Patriarca, Valencia, Spain
- Antonio Falcó
- ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ San Bartolomé 55, 46115, Alfara del Patriarca, Valencia, Spain
- María-Teresa Pérez-Gracia
- Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud Universidad Cardenal Herrera-CEU, CEU Universities, C/ Santiago Ramón y Cajal, 46115, Alfara del Patriarca, Valencia, Spain
6
Wei X, Chen J, Wei GW. Persistent topological Laplacian analysis of SARS-CoV-2 variants. Journal of Computational Biophysics and Chemistry 2023; 22:569-587. [PMID: 37829318 PMCID: PMC10569362 DOI: 10.1142/s2737416523500278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
Topological data analysis (TDA) is an emerging field in mathematics and data science. Its central technique, persistent homology, has had tremendous success in many science and engineering disciplines. However, persistent homology has limitations, including its inability to handle heterogeneous information, such as multiple types of geometric objects; being qualitative rather than quantitative, e.g., counting a 5-member ring the same as a 6-member ring, and a failure to describe non-topological changes, such as homotopic changes in protein-protein binding. Persistent topological Laplacians (PTLs), such as persistent Laplacian and persistent sheaf Laplacian, were proposed to overcome the limitations of persistent homology. In this work, we examine the modeling and analysis power of PTLs in the study of the protein structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike receptor binding domain (RBD). First, we employ PTLs to study how the RBD mutation-induced structural changes of RBD-angiotensin-converting enzyme 2 (ACE2) binding complexes are captured in the changes of spectra of the PTLs among SARS-CoV-2 variants. Additionally, we use PTLs to analyze the binding of RBD and ACE2-induced structural changes of various SARS-CoV-2 variants. Finally, we explore the impacts of computationally generated RBD structures on a topological deep learning paradigm and predictions of deep mutational scanning datasets for the SARS-CoV-2 Omicron BA.2 variant. Our results indicate that PTLs have advantages over persistent homology in analyzing protein structural changes and provide a powerful new TDA tool for data science.
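The remark that homology counts a 5-member ring the same as a 6-member ring, while a Laplacian spectrum can tell them apart, can be checked on a toy case. The sketch below uses the ordinary graph Laplacian of a cycle graph, which is a deliberate simplification and not the persistent Laplacian or persistent sheaf Laplacian studied in the paper; the helper names `cycle_laplacian` and `graph_betti_numbers` are hypothetical.

```python
import numpy as np

def cycle_laplacian(n):
    """Graph Laplacian L = D - A of the cycle graph on n vertices."""
    A = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def graph_betti_numbers(L, n_edges, tol=1e-9):
    """Betti-0 is the multiplicity of eigenvalue 0; for a graph, Betti-1 = E - V + Betti-0."""
    eigvals = np.linalg.eigvalsh(L)
    b0 = int(np.sum(np.abs(eigvals) < tol))
    b1 = n_edges - L.shape[0] + b0
    return b0, b1

results = {}
for n in (5, 6):
    L = cycle_laplacian(n)
    eigvals = np.linalg.eigvalsh(L)
    smallest_nonzero = float(eigvals[eigvals > 1e-9].min())
    results[n] = (graph_betti_numbers(L, n_edges=n), smallest_nonzero)
```

Both rings have Betti numbers (1, 1), so homology alone cannot separate them. The smallest nonzero eigenvalue, 2 - 2cos(2π/n), is about 1.38 for the 5-ring and exactly 1 for the 6-ring, so the spectrum carries the extra geometric information.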
Affiliation(s)
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
7
Yang X, Ren Y, Hong B, He A, Wang J, Wang Z. Epileptic detection in single and multi-lead EEG signals using persistent homology based on bi-directional weighted visibility graphs. Chaos (Woodbury, N.Y.) 2023; 33:2894484. [PMID: 37276567 DOI: 10.1063/5.0140579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 05/03/2023] [Indexed: 06/07/2023]
Abstract
Epilepsy is a widespread neurological disorder, and its recurrence and suddenness are making automatic detection of seizure an urgent necessity. For this purpose, this paper performs topological data analysis (TDA) of electroencephalographic (EEG) signals by the medium of graphs to explore the potential brain activity information they contain. Through our innovative method, we first map the time series of epileptic EEGs into bi-directional weighted visibility graphs (BWVGs), which give more comprehensive reflections of the signals compared to previous existing structures. Traditional graph-theoretic measurements are generally partial and mainly consider differences or correlations in vertices or edges, whereas persistent homology (PH), the essential part of TDA, provides an alternative way of thinking by quantifying the topology structure of the graphs and analyzing the evolution of these topological properties with scale changes. Therefore, we analyze the PH for BWVGs and then obtain the two indicators of persistence and birth-death for homology groups to reflect the topology of the mapping graphs of EEG signals and reveal the discrepancies in brain dynamics. Furthermore, we adopt neural networks (NNs) for the automatic detection of epileptic signals and successfully achieve a classification accuracy of 99.67% when distinguishing among three different sets of EEG signals from seizure, seizure-free, and healthy subjects. In addition, to accommodate multi-leads, we propose a classifier that incorporates graph structure to distinguish seizure and seizure-free EEG signals. The classification accuracies of the two subjects used in the classifier are as high as 99.23% and 94.76%, respectively, indicating that our proposed model is useful for the analysis of EEG signals.
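The first step described here, mapping a time series to a visibility graph, can be sketched with the standard natural visibility criterion: samples i and j are linked when every sample between them lies strictly below the straight line joining them. This is an assumption-laden simplification. The bi-directional weighting that defines the paper's BWVGs is not reproduced, and the function name `natural_visibility_edges` is hypothetical.

```python
def natural_visibility_edges(series):
    """Return the edges (i, j), i < j, of the natural visibility graph.

    Samples i and j see each other when each intermediate sample k satisfies
    series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i).
    """
    n = len(series)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = all(
                series[k] < series[i] + (series[j] - series[i]) * (k - i) / (j - i)
                for k in range(i + 1, j)
            )
            if visible:
                edges.append((i, j))
    return edges

# Toy signal: the peak at index 1 blocks the view from index 0 to indices 2 and 3.
edges = natural_visibility_edges([1.0, 3.0, 2.0, 4.0])
# edges == [(0, 1), (1, 2), (1, 3), (2, 3)]
```

Adjacent samples are always connected because there is nothing between them. This brute-force version is cubic in the series length in the worst case, which is fine for short windows but would need a faster construction for long EEG recordings.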
Affiliation(s)
- Xiaodong Yang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Yanlin Ren
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Binyi Hong
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Aijun He
- School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China
- Jun Wang
- School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
- Zhixiao Wang
- School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
8
Amézquita EJ, Nasrin F, Storey KM, Yoshizawa M. Genomics data analysis via spectral shape and topology. PLoS One 2023; 18:e0284820. [PMID: 37099525 PMCID: PMC10132553 DOI: 10.1371/journal.pone.0284820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 04/09/2023] [Indexed: 04/27/2023] Open
Abstract
Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can help to gain a better understanding of the intrinsic shape of high-dimensional genomic data and to retain information that may be lost using standard dimension-reduction algorithms. We propose a novel workflow to process and analyze RNA-seq data from tumor and healthy subjects integrating Mapper, differential gene expression, and spectral shape analysis. Precisely, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects, and produce two subgroups of tumor subjects. A further analysis using DESeq2, a popular tool for the detection of differentially expressed genes, shows that these two subgroups of tumor cells bear two distinct gene regulations, suggesting two discrete paths for forming lung cancer, which could not be highlighted by other popular clustering methods, including t-distributed stochastic neighbor embedding (t-SNE). Although Mapper shows promise in analyzing high-dimensional data, tools to statistically analyze Mapper graphical structures are limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inferences such as hypothesis testing, sensitivity analysis, and correlation analysis.
Affiliation(s)
- Erik J. Amézquita
- Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI, United States of America
- Farzana Nasrin
- Department of Mathematics, University of Hawaii at Manoa, Honolulu, HI, United States of America
- Kathleen M. Storey
- Department of Mathematics, Lafayette College, Easton, PA, United States of America
- Masato Yoshizawa
- School of Life Sciences, University of Hawaii at Manoa, Honolulu, HI, United States of America
9
Benjamin K, Mukta L, Moryoussef G, Uren C, Harrington HA, Tillmann U, Barbensi A. Homology of homologous knotted proteins. J R Soc Interface 2023; 20:20220727. [PMID: 37122282 PMCID: PMC10130707 DOI: 10.1098/rsif.2022.0727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 04/06/2023] [Indexed: 05/02/2023] Open
Abstract
Quantification and classification of protein structures, such as knotted proteins, often requires noise-free and complete data. Here, we develop a mathematical pipeline that systematically analyses protein structures. We showcase this geometric framework on proteins forming open-ended trefoil knots, and we demonstrate that the mathematical tool, persistent homology, faithfully represents their structural homology. This topological pipeline identifies important geometric features of protein entanglement and clusters the space of trefoil proteins according to their depth. Persistence landscapes quantify the topological difference between a family of knotted and unknotted proteins in the same structural homology class. This difference is localized and interpreted geometrically with recent advancements in systematic computation of homology generators. The topological and geometric quantification we find is robust to noisy input data, which demonstrates the potential of this approach in contexts where standard knot theoretic tools fail.
Affiliation(s)
- Lamisah Mukta
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Christopher Uren
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Heather A. Harrington
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Ulrike Tillmann
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Isaac Newton Institute for Mathematical Sciences, University of Cambridge, Cambridge CB3 0EH, UK
- Agnese Barbensi
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria 3010, Australia
10
Ye X, Sun F, Xiang S. TREPH: A Plug-In Topological Layer for Graph Neural Networks. Entropy (Basel, Switzerland) 2023; 25:331. [PMID: 36832697 PMCID: PMC9954936 DOI: 10.3390/e25020331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/04/2023] [Accepted: 02/08/2023] [Indexed: 06/18/2023]
Abstract
Topological Data Analysis (TDA) is an approach to analyzing the shape of data using techniques from algebraic topology. The staple of TDA is Persistent Homology (PH). Recent years have seen a trend of combining PH and Graph Neural Networks (GNNs) in an end-to-end manner to capture topological features from graph data. Though effective, these methods are limited by the shortcomings of PH: incomplete topological information and irregular output format. Extended Persistent Homology (EPH), as a variant of PH, addresses these problems elegantly. In this paper, we propose a plug-in topological layer for GNNs, termed Topological Representation with Extended Persistent Homology (TREPH). Taking advantage of the uniformity of EPH, a novel aggregation mechanism is designed to collate topological features of different dimensions to the local positions determining their living processes. The proposed layer is provably differentiable and more expressive than PH-based representations, which in turn is strictly stronger than message-passing GNNs in expressive power. Experiments on real-world graph classification tasks demonstrate the competitiveness of TREPH compared with the state-of-the-art approaches.
Affiliation(s)
- Xue Ye
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China
- Fang Sun
- School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
- Shiming Xiang
- National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China
11
Yu D, Zhou X, Pan Y, Niu Z, Yuan X, Sun H. University Academic Performance Development Prediction Based on TDA. Entropy (Basel, Switzerland) 2022; 25:24. [PMID: 36673165 PMCID: PMC9857682 DOI: 10.3390/e25010024] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/14/2022] [Revised: 11/21/2022] [Accepted: 12/16/2022] [Indexed: 06/17/2023]
Abstract
With the rapid development of higher education, the evaluation of the academic growth potential of universities has received extensive attention from scholars and educational administrators. Although the number of papers on university academic evaluation is increasing, few scholars have conducted research on the changing trend of university academic performance. Because traditional statistical methods and deep learning techniques have proven to be incapable of handling short time series data well, this paper proposes to adopt topological data analysis (TDA) to extract specified features from short time series data and then construct the model for the prediction of trend of university academic performance. The performance of the proposed method is evaluated by experiments on a real-world university academic performance dataset. By comparing the prediction results given by the Markov chain as well as SVM on the original data and TDA statistics, respectively, we demonstrate that the data generated by TDA methods can help construct very discriminative models and have a great advantage over the traditional models. In addition, this paper gives the prediction results as a reference, which provides a new perspective for the development evaluation of the academic performance of colleges and universities.
Affiliation(s)
- Daohua Yu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Xin Zhou
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
- Yu Pan
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
- Zhendong Niu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Xu Yuan
- Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing 314019, China
- Huafei Sun
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
- Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing 314019, China
12
Guo Z, Yamaguchi R. Machine learning methods for protein-protein binding affinity prediction in protein design. Frontiers in Bioinformatics 2022; 2:1065703. [PMID: 36591334 PMCID: PMC9800603 DOI: 10.3389/fbinf.2022.1065703] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/10/2022] [Accepted: 12/01/2022] [Indexed: 12/23/2022] Open
Abstract
Protein-protein interactions govern a wide range of biological activity. A proper estimation of the protein-protein binding affinity is vital to design proteins with high specificity and binding affinity toward a target protein, which has a variety of applications including antibody design in immunotherapy, enzyme engineering for reaction optimization, and construction of biosensors. However, experimental and theoretical modelling methods are time-consuming, hinder the exploration of the entire protein space, and deter the identification of optimal proteins that meet the requirements of practical applications. In recent years, the rapid development in machine learning methods for protein-protein binding affinity prediction has revealed the potential of a paradigm shift in protein design. Here, we review the prediction methods and associated datasets and discuss the requirements and construction methods of binding affinity prediction models for protein design.
Affiliation(s)
- Zhongliang Guo
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
- Rui Yamaguchi
- Division of Cancer Systems Biology, Aichi Cancer Center Research Institute, Nagoya, Aichi, Japan
- Division of Cancer Informatics, Nagoya University Graduate School of Medicine, Nagoya, Aichi, Japan
- Correspondence: Rui Yamaguchi
13
Hayashi S, Koseki J, Shimamura T. Bayesian statistical method for detecting structural and topological diversity in polymorphic proteins. Comput Struct Biotechnol J 2022; 20:6519-6525. [DOI: 10.1016/j.csbj.2022.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 11/17/2022] [Accepted: 11/18/2022] [Indexed: 11/22/2022] Open
14
Migdałek G, Żelawski M. Measuring population-level plant gene flow with topological data analysis. Ecol Inform 2022. [DOI: 10.1016/j.ecoinf.2022.101740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
15
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Limei Cheng
- Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
| | - Jaclyn Frishcosy
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuta Huzumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tom Schluckbier
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Xiaoqi Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
16
|
Stolz BJ, Kaeppler J, Markelc B, Braun F, Lipsmeier F, Muschel RJ, Byrne HM, Harrington HA. Multiscale topology characterizes dynamic tumor vascular networks. Science Advances 2022; 8:eabm2456. [PMID: 35687679 PMCID: PMC9187234 DOI: 10.1126/sciadv.abm2456] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/28/2021] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
Advances in imaging techniques enable high-resolution three-dimensional (3D) visualization of vascular networks over time and reveal abnormal structural features such as twists and loops, and their quantification is an active area of research. Here, we showcase how topological data analysis, the mathematical field that studies the "shape" of data, can characterize the geometric, spatial, and temporal organization of vascular networks. We propose two topological lenses to study vasculature, which capture inherent multiscale features and vessel connectivity, and surpass the single-scale analysis of existing methods. We analyze images collected using intravital and ultramicroscopy modalities and quantify spatiotemporal variation of twists, loops, and avascular regions (voids) in 3D vascular networks. This topological approach validates and quantifies known qualitative trends such as dynamic changes in tortuosity and loops in response to antibodies that modulate vessel sprouting; furthermore, it quantifies the effect of radiotherapy on vessel architecture.
Affiliation(s)
| | - Jakob Kaeppler
- Oxford Institute for Radiation Oncology, University of Oxford, Oxford, UK
| | - Bostjan Markelc
- Oxford Institute for Radiation Oncology, University of Oxford, Oxford, UK
- Department of Experimental Oncology, Institute of Oncology Ljubljana, Ljubljana, Slovenia
| | - Franziska Braun
- Data Science, pRED Informatics, Pharma Research & Early Development, Roche Innovation Center Munich, Munich, Germany
| | - Florian Lipsmeier
- Digital Biomarkers, pRED Informatics, Pharma Research & Early Development, Roche Innovation Center Basel, Basel, Switzerland
| | - Ruth J. Muschel
- Oxford Institute for Radiation Oncology, University of Oxford, Oxford, UK
| | - Helen M. Byrne
- Mathematical Institute, University of Oxford, Oxford, UK
| | - Heather A. Harrington
- Mathematical Institute, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| |
|
17
|
Skaf Y, Laubenbacher R. Topological data analysis in biomedicine: A review. J Biomed Inform 2022; 130:104082. [PMID: 35508272 DOI: 10.1016/j.jbi.2022.104082] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/30/2021] [Revised: 03/20/2022] [Accepted: 04/23/2022] [Indexed: 01/22/2023]
Abstract
Significant technological advances made in recent years have shepherded a dramatic increase in utilization of digital technologies for biomedicine: everything from the widespread use of electronic health records to improved medical imaging capabilities and the rising ubiquity of genomic sequencing contributes to a "digitization" of biomedical research and clinical care. With this shift toward computerized tools comes a dramatic increase in the amount of available data, and current tools for data analysis capable of extracting meaningful knowledge from this wealth of information have yet to catch up. This article seeks to provide an overview of emerging mathematical methods with the potential to improve the abilities of clinicians and researchers to analyze biomedical data, but may be hindered from doing so by a lack of conceptual accessibility and awareness in the life sciences research community. In particular, we focus on topological data analysis (TDA), a set of methods grounded in the mathematical field of algebraic topology that seeks to describe and harness features related to the "shape" of data. We aim to make such techniques more approachable to non-mathematicians by providing a conceptual discussion of their theoretical foundations followed by a survey of their published applications to scientific research. Finally, we discuss the limitations of these methods and suggest potential avenues for future work integrating mathematical tools into clinical care and biomedical informatics.
Affiliation(s)
- Yara Skaf
- University of Florida, Department of Mathematics, Gainesville, FL, USA; University of Florida, Department of Medicine, Division of Pulmonary, Critical Care, & Sleep Medicine, Gainesville, FL, USA.
| | - Reinhard Laubenbacher
- University of Florida, Department of Mathematics, Gainesville, FL, USA; University of Florida, Department of Medicine, Division of Pulmonary, Critical Care, & Sleep Medicine, Gainesville, FL, USA.
| |
|
18
|
Noshita K, Murata H, Kirie S. Model-based plant phenomics on morphological traits using morphometric descriptors. Breeding Science 2022; 72:19-30. [PMID: 36045892 PMCID: PMC8987841 DOI: 10.1270/jsbbs.21078] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/07/2021] [Accepted: 12/20/2021] [Indexed: 06/15/2023]
Abstract
The morphological traits of plants contribute to many important functional features such as radiation interception, lodging tolerance, gas exchange efficiency, spatial competition between individuals and/or species, and disease resistance. Although the importance of plant phenotyping techniques is increasing with advances in molecular breeding strategies, there are barriers to its advancement, including the gap between measured data and phenotypic values, low quantitativity, and low throughput caused by the lack of models for representing morphological traits. In this review, we introduce morphological descriptors that can be used for phenotyping plant morphological traits. Geometric morphometric approaches pave the way to a general-purpose method applicable to single units. Hierarchical structures composed of an indefinite number of multiple elements, which is often observed in plants, can be quantified in terms of their multi-scale topological characteristics using topological data analysis. Theoretical morphological models capture specific anatomical structures, if recognized. These morphological descriptors provide us with the advantages of model-based plant phenotyping, including robust quantification of limited datasets. Moreover, we discuss the future possibilities that a system of model-based measurement and model refinement would solve the lack of morphological models and the difficulties in scaling out the phenotyping processes.
Affiliation(s)
- Koji Noshita
- Department of Biology, Kyushu University, Fukuoka, Fukuoka 819-0395, Japan
- Plant Frontier Research Center, Kyushu University, Fukuoka, Fukuoka 819-0395, Japan
| | - Hidekazu Murata
- Department of Biology, Kyushu University, Fukuoka, Fukuoka 819-0395, Japan
| | - Shiryu Kirie
- metaPhorest (Bioaesthetics Platform), Department of Electrical Engineering and Bioscience, Waseda University, TWIns, Tokyo 162-8480, Japan
| |
|
19
|
Stenseke J. Persistent homology and the shape of evolutionary games. J Theor Biol 2021; 531:110903. [PMID: 34534569 DOI: 10.1016/j.jtbi.2021.110903] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/13/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 11/17/2022]
Abstract
For nearly three decades, spatial games have produced a wealth of insights to the study of behavior and its relation to population structure. However, as different rules and factors are added or altered, the dynamics of spatial models often become increasingly complicated to interpret. To tackle this problem, we introduce persistent homology as a rigorous framework that can be used to both define and compute higher-order features of data in a manner which is invariant to parameter choices, robust to noise, and independent of human observation. Our work demonstrates its relevance for spatial games by showing how topological features of simulation data that persist over different spatial scales reflect the stability of strategies in 2D lattice games. To do so, we analyze the persistent homology of scenarios from two games: a Prisoner's Dilemma and a SIRS epidemic model. The experimental results show how the method accurately detects features that correspond to real aspects of the game dynamics. Unlike other tools that study dynamics of spatial systems, persistent homology can tell us something meaningful about population structure while remaining neutral about the underlying structure itself. Regardless of game complexity, since strategies either succeed or fail to conform to shapes of a certain topology there is much potential for the method to provide novel insights for a wide variety of spatially extended systems in biology, social science, and physics.
Affiliation(s)
- Jakob Stenseke
- Department of Philosophy, Lund University, Helgonavagen 3, Lund 221 00, Sweden.
| |
|
20
|
Chen J, Zhao R, Tong Y, Wei GW. Evolutionary de Rham-Hodge method. Discrete and Continuous Dynamical Systems. Series B 2021; 26:3785-3821. [PMID: 34675756 DOI: 10.3934/dcdsb.2020257] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/29/2022]
Abstract
The de Rham-Hodge theory is a landmark of 20th-century mathematics and has had a great impact on mathematics, physics, computer science, and engineering. This work introduces an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. While the present method can be easily applied to closed manifolds, the emphasis is given to more challenging compact manifolds with 2-manifold boundaries, which require appropriate analysis and treatment of boundary conditions on differential forms to maintain proper topological properties. Three sets of unique evolutionary Hodge Laplacians are proposed to generate three sets of topology-preserving singular spectra, for which the multiplicities of zero eigenvalues correspond exactly to the persistent Betti numbers of dimensions 0, 1 and 2. Additionally, three sets of non-zero eigenvalues further reveal both topological persistence and geometric progression during the manifold evolution. Extensive numerical experiments are carried out via the discrete exterior calculus to demonstrate the potential of the proposed paradigm for data representation and shape analysis of both point cloud data and density maps. To demonstrate the utility of the proposed method, it is applied to protein B-factor predictions for a few challenging cases in which existing biophysical models break down.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
| | - Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
| |
|
21
|
Chazal F, Michel B. An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. Front Artif Intell 2021; 4:667963. [PMID: 34661095 PMCID: PMC8511823 DOI: 10.3389/frai.2021.667963] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 02/15/2021] [Accepted: 07/16/2021] [Indexed: 11/30/2022] Open
Abstract
With the recent explosion in the amount, the variety, and the dimensionality of available data, identifying, extracting, and exploiting their underlying structure has become a problem of fundamental importance for data analysis and statistical learning. Topological data analysis (TDA) is a recent and fast-growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. It proposes new well-founded mathematical theories and computational tools that can be used independently or in combination with other data analysis and statistical learning techniques. This article is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of TDA for nonexperts.
Affiliation(s)
- Frédéric Chazal
- Inria Saclay - Île-de-France Research Centre, Palaiseau, France
| | | |
|
22
|
Vipond O, Bull JA, Macklin PS, Tillmann U, Pugh CW, Byrne HM, Harrington HA. Multiparameter persistent homology landscapes identify immune cell spatial patterns in tumors. Proc Natl Acad Sci U S A 2021; 118:e2102166118. [PMID: 34625491 PMCID: PMC8522280 DOI: 10.1073/pnas.2102166118] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Accepted: 08/24/2021] [Indexed: 12/29/2022] Open
Abstract
Highly resolved spatial data of complex systems encode rich and nonlinear information. Quantification of heterogeneous and noisy data (often with outliers, artifacts, and mislabeled points), such as those from tissues, remains a challenge. The mathematical field that extracts information from the shape of data, topological data analysis (TDA), has expanded its capability for analyzing real-world datasets in recent years by extending theory, statistics, and computation. An extension to the standard theory to handle heterogeneous data is multiparameter persistent homology (MPH). Here we provide an application of MPH landscapes, a statistical tool with theoretical underpinnings. MPH landscapes, computed for (noisy) data from agent-based model simulations of immune cells infiltrating into a spheroid, are shown to surpass existing spatial statistics and one-parameter persistent homology. We then apply MPH landscapes to study immune cell location in digital histology images from head and neck cancer. We quantify intratumoral immune cells and find that infiltrating regulatory T cells have more prominent voids in their spatial patterns than macrophages. Finally, we consider how TDA can integrate and interrogate data of different types and scales, e.g., immune cell locations and regions with differing levels of oxygenation. This work highlights the power of MPH landscapes for quantifying, characterizing, and comparing features within the tumor microenvironment in synthetic and real datasets.
Affiliation(s)
- Oliver Vipond
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| | - Joshua A Bull
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| | - Philip S Macklin
- Nuffield Department of Medicine Research Building, University of Oxford, Oxford OX3 7FZ, United Kingdom
| | - Ulrike Tillmann
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| | - Christopher W Pugh
- Nuffield Department of Medicine Research Building, University of Oxford, Oxford OX3 7FZ, United Kingdom
| | - Helen M Byrne
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| | - Heather A Harrington
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
| |
|
23
|
Li L, Thompson C, Henselman-Petrusek G, Giusti C, Ziegelmeier L. Minimal Cycle Representatives in Persistent Homology Using Linear Programming: An Empirical Study With User's Guide. Front Artif Intell 2021; 4:681117. [PMID: 34708196 PMCID: PMC8544243 DOI: 10.3389/frai.2021.681117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Accepted: 05/14/2021] [Indexed: 12/24/2022] Open
Abstract
Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several ℓ1 minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies according to the dimension and distribution of the underlying data, 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis, in most data sets we consider, 3) the choice of linear solvers matters a lot to the computation time of optimizing cycles, 4) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, 5) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {-1, 0, 1} and therefore, are also solutions to a restricted ℓ0 optimization problem, and 6) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes than in real-world and synthetic point cloud data.
Affiliation(s)
- Lu Li
- Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States
| | - Connor Thompson
- Department of Mathematics, Purdue University, West Lafayette, IN, United States
| | | | - Chad Giusti
- Department of Mathematical Sciences, University of Delaware, Newark, DE, United States
| | - Lori Ziegelmeier
- Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States
| |
|
24
|
Turkeš R, Nys J, Verdonck T, Latré S. Noise robustness of persistent homology on greyscale images, across filtrations and signatures. PLoS One 2021; 16:e0257215. [PMID: 34559812 PMCID: PMC8462731 DOI: 10.1371/journal.pone.0257215] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/30/2021] [Accepted: 08/25/2021] [Indexed: 11/18/2022] Open
Abstract
Topological data analysis is a recent and fast growing field that approaches the analysis of datasets using techniques from (algebraic) topology. Its main tool, persistent homology (PH), has seen a notable increase in applications in the last decade. Often cited as the most favourable property of PH and the main reason for practical success are the stability theorems that give theoretical results about noise robustness, since real data is typically contaminated with noise or measurement errors. However, little attention has been paid to what these stability theorems mean in practice. To gain some insight into this question, we evaluate the noise robustness of PH on the MNIST dataset of greyscale images. More precisely, we investigate to what extent PH changes under typical forms of image noise, and quantify the loss of performance in classifying the MNIST handwritten digits when noise is added to the data. The results show that the sensitivity to noise of PH is influenced by the choice of filtrations and persistence signatures (respectively the input and output of PH), and in particular, that PH features are often not robust to noise in a classification task.
Affiliation(s)
- Renata Turkeš
- Department of Computer Science, IDLab, University of Antwerp - imec, Antwerp, Belgium
| | - Jannes Nys
- Department of Computer Science, IDLab, University of Antwerp - imec, Antwerp, Belgium
| | - Tim Verdonck
- Department of Mathematics, Applied Mathematics, University of Antwerp, Antwerp, Belgium
| | - Steven Latré
- Department of Computer Science, IDLab, University of Antwerp - imec, Antwerp, Belgium
| |
|
25
|
Salch A, Regalski A, Abdallah H, Suryadevara R, Catanzaro MJ, Diwadkar VA. From mathematics to medicine: A practical primer on topological data analysis (TDA) and the development of related analytic tools for the functional discovery of latent structure in fMRI data. PLoS One 2021; 16:e0255859. [PMID: 34383838 PMCID: PMC8360597 DOI: 10.1371/journal.pone.0255859] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/30/2021] [Accepted: 07/23/2021] [Indexed: 11/19/2022] Open
Abstract
fMRI is the preeminent method for collecting signals from the human brain in vivo, for using these signals in the service of functional discovery, and relating these discoveries to anatomical structure. Numerous computational and mathematical techniques have been deployed to extract information from the fMRI signal. Yet, the application of Topological Data Analyses (TDA) remain limited to certain sub-areas such as connectomics (that is, with summarized versions of fMRI data). While connectomics is a natural and important area of application of TDA, applications of TDA in the service of extracting structure from the (non-summarized) fMRI data itself are heretofore nonexistent. “Structure” within fMRI data is determined by dynamic fluctuations in spatially distributed signals over time, and TDA is well positioned to help researchers better characterize mass dynamics of the signal by rigorously capturing shape within it. To accurately motivate this idea, we a) survey an established method in TDA (“persistent homology”) to reveal and describe how complex structures can be extracted from data sets generally, and b) describe how persistent homology can be applied specifically to fMRI data. We provide explanations for some of the mathematical underpinnings of TDA (with expository figures), building ideas in the following sequence: a) fMRI researchers can and should use TDA to extract structure from their data; b) this extraction serves an important role in the endeavor of functional discovery, and c) TDA approaches can complement other established approaches toward fMRI analyses (for which we provide examples). We also provide detailed applications of TDA to fMRI data collected using established paradigms, and offer our software pipeline for readers interested in emulating our methods. 
This working overview is both an inter-disciplinary synthesis of ideas (to draw researchers in TDA and fMRI toward each other) and a detailed description of methods that can motivate collaborative research.
Affiliation(s)
- Andrew Salch
- Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
| | - Adam Regalski
- Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
| | - Hassan Abdallah
- Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
| | - Raviteja Suryadevara
- Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
- Department of Psychiatry & Behavioral Neuroscience, Wayne State University, Detroit, Michigan, United States of America
| | - Michael J. Catanzaro
- Department of Mathematics, Iowa State University, Ames, Iowa, United States of America
| | - Vaibhav A. Diwadkar
- Department of Psychiatry & Behavioral Neuroscience, Wayne State University, Detroit, Michigan, United States of America
| |
|
26
|
Loughrey C, Fitzpatrick P, Orr N, Jurek-Loughrey A. The topology of data: Opportunities for cancer research. Bioinformatics 2021; 37:3091-3098. [PMID: 34320632 PMCID: PMC8504620 DOI: 10.1093/bioinformatics/btab553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/27/2021] [Revised: 06/14/2021] [Accepted: 07/28/2021] [Indexed: 01/20/2023] Open
Abstract
Motivation: Topological methods have recently emerged as a reliable and interpretable framework for extracting information from high-dimensional data, leading to the creation of a branch of applied mathematics called Topological Data Analysis (TDA). Since then, TDA has been progressively adopted in biomedical research. Biological data collection can result in enormous datasets, comprising thousands of features and spanning diverse datatypes. This presents a barrier to initial data analysis as the fundamental structure of the dataset becomes hidden, obstructing the discovery of important features and patterns. TDA provides a solution to obtain the underlying shape of datasets over continuous resolutions, corresponding to key topological features independent of noise. TDA has the potential to support future developments in healthcare as biomedical datasets rise in complexity and dimensionality. Previous applications extend across the fields of neuroscience, oncology, immunology and medical image analysis. TDA has been used to reveal hidden subgroups of cancer patients, construct organizational maps of brain activity and classify abnormal patterns in medical images. The utility of TDA is broad and to understand where current achievements lie, we have evaluated the present state of TDA in cancer data analysis. Results: This article aims to provide an overview of TDA in Cancer Research. A brief introduction to the main concepts of TDA is provided to ensure that the article is accessible to readers who are not familiar with this field. Following this, a focussed literature review on the field is presented, discussing how TDA has been applied across heterogeneous datatypes for cancer research.
Affiliation(s)
- Ciara Loughrey
- School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
| | - Padraig Fitzpatrick
- School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
| | - Nick Orr
- Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, BT9 7AE, United Kingdom
| | - Anna Jurek-Loughrey
- School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
| |
|
27
|
Adams H, Moy M. Topology Applied to Machine Learning: From Global to Local. Front Artif Intell 2021; 4:668302. [PMID: 34056580 PMCID: PMC8160457 DOI: 10.3389/frai.2021.668302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/16/2021] [Accepted: 04/15/2021] [Indexed: 11/24/2022] Open
Abstract
Through the use of examples, we explain one way in which applied topology has evolved since the birth of persistent homology in the early 2000s. The first applications of topology to data emphasized the global shape of a dataset, such as the three-circle model for 3 × 3 pixel patches from natural images, or the configuration space of the cyclo-octane molecule, which is a sphere with a Klein bottle attached via two circles of singularity. In these studies of global shape, short persistent homology bars are disregarded as sampling noise. More recently, however, persistent homology has been used to address questions about the local geometry of data. For instance, how can local geometry be vectorized for use in machine learning problems? Persistent homology and its vectorization methods, including persistence landscapes and persistence images, provide popular techniques for incorporating both local geometry and global topology into machine learning. Our meta-hypothesis is that the short bars are as important as the long bars for many machine learning tasks. In defense of this claim, we survey applications of persistent homology to shape recognition, agent-based modeling, materials science, archaeology, and biology. Additionally, we survey work connecting persistent homology to geometric features of spaces, including curvature and fractal dimension, and various methods that have been used to incorporate persistent homology into machine learning.
Affiliation(s)
- Henry Adams
- Department of Mathematics, Colorado State University, Fort Collins, CO, United States
| | - Michael Moy
- Department of Mathematics, Colorado State University, Fort Collins, CO, United States
| |
|
28
|
Li J, Bian C, Luo H, Chen D, Cao L, Liang H. Multi-dimensional persistent feature analysis identifies connectivity patterns of resting-state brain networks in Alzheimer's disease. J Neural Eng 2020; 18. [PMID: 33152713 DOI: 10.1088/1741-2552/abc7ef] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/11/2020] [Accepted: 11/05/2020] [Indexed: 01/21/2023]
Abstract
OBJECTIVE The characterization of functional brain networks is crucial to understanding the neural mechanisms associated with Alzheimer's disease (AD) and mild cognitive impairment (MCI). Some studies have shown that graph theoretical analysis can reveal changes in disease-related brain networks by thresholding edge weights. However, the choice of threshold depends on ambiguous cognitive conditions, which leads to a lack of interpretability. Recently, persistent homology (PH) was proposed to record the persistence of topological features of networks across all possible thresholds, reporting a higher sensitivity than graph theoretical features in detecting network-level biomarkers of AD. However, most research on PH has focused on 0-dimensional features (persistence of connected components) reflecting the intrinsic topology of the brain network, rather than 1-dimensional features (persistence of cycles) with an interesting neurobiological communication pattern. Our aim is to explore the multi-dimensional persistent features of brain networks in AD and MCI patients, and further to capture valuable brain connectivity patterns. APPROACH We characterized the change rate of the connected component numbers across the graph filtration using functional derivative curves, and examined the persistence landscapes that vectorize the persistence of cycle structures. The multi-dimensional persistent features were then validated in disease identification using a K-nearest neighbor algorithm. Furthermore, a connectivity pattern mining framework was designed to capture the disease-specific brain structures. MAIN RESULTS We found that the multi-dimensional persistent features can identify statistical group differences, quantify subject-level distances, and yield disease-specific connectivity patterns. Relatively high classification accuracies were achieved when compared with graph theoretical features.
SIGNIFICANCE This work represents a conceptual bridge linking complex brain network analysis and computational topology. Our results can be beneficial for providing a complementary objective opinion to the clinical diagnosis of neurodegenerative diseases.
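The idea of tracking connected components across a graph filtration, described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: edge weights are treated as distances (an edge enters once the threshold reaches its weight), the change rate is a plain finite difference standing in for the authors' functional derivative curves, and the toy network and all names are hypothetical.

```python
def betti0_curve(n_nodes, edges, thresholds):
    """Count connected components after adding every edge with weight <= threshold.

    edges: list of (u, v, weight); thresholds must be in increasing order.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ordered = sorted(edges, key=lambda e: e[2])
    components = n_nodes
    curve = []
    i = 0
    for thr in thresholds:
        while i < len(ordered) and ordered[i][2] <= thr:
            u, v, _ = ordered[i]
            ru, rv = find(u), find(v)
            if ru != rv:  # the edge merges two components
                parent[ru] = rv
                components -= 1
            i += 1
        curve.append(components)
    return curve


def change_rate(curve, thresholds):
    """Finite-difference rate of change of the component count."""
    return [
        (curve[j + 1] - curve[j]) / (thresholds[j + 1] - thresholds[j])
        for j in range(len(curve) - 1)
    ]


# Hypothetical 4-node network with distance-like edge weights.
edges = [(0, 1, 0.1), (1, 2, 0.2), (2, 3, 0.6), (0, 2, 0.3)]
thresholds = [0.0, 0.25, 0.5, 0.75]
curve = betti0_curve(4, edges, thresholds)
rates = change_rate(curve, thresholds)
```

Because the curve is computed over the whole range of thresholds, no single cut-off has to be chosen, which is the interpretability point the abstract raises.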
Affiliation(s)
- Jin Li
- Harbin Engineering University, Harbin, Heilongjiang, China
| | - Chenyuan Bian
- Harbin Engineering University, Harbin, Heilongjiang, China
| | - Haoran Luo
- Harbin Engineering University, Harbin, Heilongjiang, China
| | - Dandan Chen
- Harbin Engineering University, Harbin, Heilongjiang, China
| | - Luolong Cao
- Harbin Engineering University, Harbin, Heilongjiang, China
| | - Hong Liang
- Harbin Engineering University, Nantong Street 145, Harbin, 150001, China
| |
|
29
|
Ismail MS, Md Noorani MS, Ismail M, Abdul Razak F, Alias MA. Predicting next day direction of stock price movement using machine learning methods with persistent homology: Evidence from Kuala Lumpur Stock Exchange. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106422] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
|
30
|
Amézquita EJ, Quigley MY, Ophelders T, Munch E, Chitwood DH. The shape of things to come: Topological data analysis and biology, from molecules to organisms. Dev Dyn 2020; 249:816-833. [PMID: 32246730 PMCID: PMC7383827 DOI: 10.1002/dvdy.175] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/08/2020] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 11/11/2022] Open
Abstract
Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arises from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features: connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub-disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data-driven era where the meaningful interpretation of large data sets is a limiting factor.
Affiliation(s)
- Erik J Amézquita
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
| | - Michelle Y Quigley
- Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
| | - Tim Ophelders
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
| | - Elizabeth Munch
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA; Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
| | - Daniel H Chitwood
- Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA; Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
| |
|
31
|
Yan Y, Ivanov K, Mumini Omisore O, Igbe T, Liu Q, Nie Z, Wang L. Gait Rhythm Dynamics for Neuro-Degenerative Disease Classification via Persistence Landscape-Based Topological Representation. Sensors (Basel) 2020; 20:E2006. [PMID: 32260065 PMCID: PMC7180793 DOI: 10.3390/s20072006] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/08/2020] [Revised: 03/30/2020] [Accepted: 04/02/2020] [Indexed: 11/16/2022]
Abstract
Neuro-degenerative disease is a common progressive nervous system disorder that leads to serious clinical consequences. Gait rhythm dynamics analysis is essential for evaluating clinical states and improving quality of life for neuro-degenerative patients. The magnitude of stride-to-stride fluctuations and the corresponding changes over time (gait dynamics) reflect the physiology of gait, quantifying the pathologic alterations in the locomotor control system of healthy subjects and patients with neuro-degenerative diseases. Motivated by algebraic topology, a topological data analysis-inspired nonlinear framework was adopted in the study of the gait dynamics. The topological representations (persistence landscapes) were used as classifier inputs in order to distinguish different neuro-degenerative disease types from healthy controls. In this work, stride-to-stride time series from healthy control (HC) subjects are compared with the gait dynamics from patients with amyotrophic lateral sclerosis (ALS), Huntington's disease (HD), and Parkinson's disease (PD). The obtained results show that the proposed methodology discriminates healthy subjects from subjects with neuro-degenerative diseases with relatively high accuracy. In summary, our study is the first attempt to apply a topological representation-based method to disease classification with gait rhythms measured from the stride intervals, in order to visualize gait dynamics and classify neuro-degenerative diseases. The proposed method could potentially be used in earlier interventions and state monitoring.
Affiliation(s)
- Yan Yan
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Kamen Ivanov
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Olatunji Mumini Omisore
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Tobore Igbe
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Qiuhua Liu
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
| | - Zedong Nie
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lei Wang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518055, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
32
|
A topology-based network tree for the prediction of protein-protein binding affinity changes following mutation. Nat Mach Intell 2020; 2:116-123. [PMID: 34170981 PMCID: PMC7223817 DOI: 10.1038/s42256-020-0149-6] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Received: 09/25/2019] [Accepted: 01/10/2020] [Indexed: 12/14/2022]
Abstract
The ability to predict protein-protein interactions is crucial to our understanding of a wide range of biological activities and functions in the human body, and for guiding drug discovery. Despite considerable efforts to develop suitable computational methods, predicting protein-protein interaction binding affinity changes following mutation (ΔΔG) remains a severe challenge. Algebraic topology, a champion in recent worldwide competitions for protein-ligand binding affinity predictions, is a promising approach to simplifying the complexity of biological structures. Here we introduce element- and site-specific persistent homology (a new branch of algebraic topology) to simplify the structural complexity of protein-protein complexes and embed crucial biological information into topological invariants. We also propose a new deep learning algorithm called NetTree to take advantage of convolutional neural networks and gradient-boosting trees. A topology-based network tree is constructed by integrating the topological representation and NetTree for predicting protein-protein interaction ΔΔG. Tests on major benchmark datasets indicate that the proposed topology-based network tree is an important improvement over the current state of the art in predicting ΔΔG.
|
33
|
Bramer D, Wei GW. Atom-specific persistent homology and its application to protein flexibility analysis. Comput Math Biophys 2020; 8:1-35. [PMID: 34278230 PMCID: PMC8281920 DOI: 10.1515/cmb-2020-0001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/19/2023] Open
Abstract
Recently, persistent homology has had tremendous success in biomolecular data analysis. It works by examining the topological relationship or connectivity of a group of atoms in a molecule at a variety of scales, then rendering a family of topological representations of the molecule. However, persistent homology is rarely employed for the analysis of atomic properties, such as biomolecular flexibility analysis or B-factor prediction. This work introduces atom-specific persistent homology to provide a local atomic-level representation of a molecule via a global topological tool. This is achieved through the construction of a pair of conjugated sets of atoms and corresponding conjugated simplicial complexes, as well as conjugated topological spaces. The difference between the topological invariants of the pair of conjugated sets is measured by bottleneck and Wasserstein metrics and leads to an atom-specific topological representation of individual atomic properties in a molecule. Atom-specific topological features are integrated with various machine learning algorithms, including gradient boosting trees and convolutional neural networks, for protein thermal fluctuation analysis and B-factor prediction. Extensive numerical results indicate that the proposed method provides a powerful topological tool for analyzing and predicting localized information in complex macromolecules.
Affiliation(s)
- David Bramer
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Corresponding author: Department of Mathematics, Michigan State University, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
| |
|
34
|
Steinberg L, Russo J, Frey J. A new topological descriptor for water network structure. J Cheminform 2019; 11:48. [PMID: 31292766 PMCID: PMC6617667 DOI: 10.1186/s13321-019-0369-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/01/2018] [Accepted: 07/02/2019] [Indexed: 11/10/2022] Open
Abstract
Bulk water molecular dynamics simulations based on a series of atomistic water potentials (TIP3P, TIP4P/Ew, SPC/E and OPC) are compared using new techniques from the field of topological data analysis. The topological invariants (the different degrees of homology) derived from each simulation frame are used to create a series of persistence diagrams from the atomic positions. These are averaged over the simulation time using the persistence image formalism, before being normalised by their total magnitude (the L1 norm) to ensure a size-independent descriptor (L1NPI). We demonstrate that the L1NPI formalism is suitable for the analysis of systems where the number of molecules varies by at least a factor of 10. Using a standard machine learning technique, a basic linear SVM, it is shown that differences in water models can be isolated to different degrees of homology. In particular, whereas first degree homology is able to distinguish between all atomistic potentials studied, OPC is the only potential that differs in its second degree homology. The L1 normalised persistence images are then used in the comparison of a series of Stillinger-Weber potential simulations to the atomistic potentials, and the effects of changing the strength of three-body interactions on the structures are readily evident in L1NPI space, with a reduction in variance of structures as interaction strength increases being the most obvious result. Furthermore, there is a clear tracking in L1NPI space of the λ parameter. The L1NPI formalism presents a useful new technique for the analysis of water and other materials. It is approximately size-independent, and has been shown to contain information about real structures in the system. We finally present a perspective on the use of L1NPIs and other persistent homology techniques as a descriptor for water solubility.
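The pipeline outlined in the abstract above (persistence diagrams per frame, averaged as persistence images, then divided by the L1 norm) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the Gaussian is evaluated at pixel centres instead of being integrated over each pixel, the weighting by persistence is one common choice, and the diagrams, grid, `sigma`, and function names are all hypothetical.

```python
import math


def persistence_image(diagram, birth_grid, pers_grid, sigma=0.1):
    """Persistence-weighted sum of Gaussians in (birth, persistence) coordinates."""
    image = []
    for p in pers_grid:
        row = []
        for b in birth_grid:
            total = 0.0
            for birth, death in diagram:
                pers = death - birth
                dist2 = (b - birth) ** 2 + (p - pers) ** 2
                total += pers * math.exp(-dist2 / (2 * sigma**2))
            row.append(total)
        image.append(row)
    return image


def average_images(images):
    """Pixel-wise mean over simulation frames."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]


def l1_normalise(image):
    """Divide by the total magnitude so the descriptor sums to 1."""
    total = sum(abs(v) for row in image for v in row)
    if total == 0:
        return image
    return [[v / total for v in row] for row in image]


def l1npi(frames, grid):
    images = [persistence_image(d, grid, grid) for d in frames]
    return l1_normalise(average_images(images))


# Two hypothetical frames, and a "10x larger system" with every bar repeated.
frames = [[(0.1, 0.5), (0.2, 0.4)], [(0.1, 0.45), (0.25, 0.4)]]
frames_large = [frame * 10 for frame in frames]
grid = [i / 10 for i in range(6)]
d_small = l1npi(frames, grid)
d_large = l1npi(frames_large, grid)
```

In this toy setting the two descriptors agree to floating-point precision, which mirrors the size independence that motivates the normalisation. Real systems of different sizes would agree only approximately.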
Affiliation(s)
- Lee Steinberg
- School of Chemistry, University of Southampton, Southampton, SO17 1BJ UK
| | - John Russo
- School of Mathematics, University of Bristol, Bristol, UK
| | - Jeremy Frey
- School of Chemistry, University of Southampton, Southampton, SO17 1BJ UK
| |
|
35
|
|
36
|
Belchi F, Pirashvili M, Conway J, Bennett M, Djukanovic R, Brodzki J. Lung Topology Characteristics in patients with Chronic Obstructive Pulmonary Disease. Sci Rep 2018; 8:5341. [PMID: 29593257 PMCID: PMC5871819 DOI: 10.1038/s41598-018-23424-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 12/13/2017] [Accepted: 03/12/2018] [Indexed: 11/28/2022] Open
Abstract
Quantitative features that can currently be obtained from medical imaging do not provide a complete picture of Chronic Obstructive Pulmonary Disease (COPD). In this paper, we introduce a novel analytical tool based on persistent homology that extracts quantitative features from chest CT scans to describe the geometric structure of the airways inside the lungs. We show that these new radiomic features stratify COPD patients in agreement with the GOLD guidelines for COPD and can distinguish between inspiratory and expiratory scans. These CT measurements are very different to those currently in use and we demonstrate that they convey significant medical information. The results of this study are a proof of concept that topological methods can enhance the standard methodology to create a finer classification of COPD and increase the possibilities of more personalized treatment.
Affiliation(s)
- Francisco Belchi
- Mathematical Sciences, University of Southampton, Southampton, UK
| | | | - Joy Conway
- Faculty of Health Sciences, University of Southampton, Southampton, UK; NIHR Southampton Respiratory and Critical Care Biomedical Research Centre, University of Southampton, Southampton, UK
| | - Michael Bennett
- NIHR Southampton Respiratory and Critical Care Biomedical Research Centre, University of Southampton, Southampton, UK; Clinical and Experimental Science, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Ratko Djukanovic
- NIHR Southampton Respiratory and Critical Care Biomedical Research Centre, University of Southampton, Southampton, UK; Clinical and Experimental Science, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Jacek Brodzki
- Mathematical Sciences, University of Southampton, Southampton, UK
| |
|
37
|
Nikolić D, Kovačev-Nikolić V. Dynamical persistence of active sites identified in maltose-binding protein. J Mol Model 2017; 23:167. [PMID: 28451879 DOI: 10.1007/s00894-017-3344-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/05/2016] [Accepted: 04/03/2017] [Indexed: 10/19/2022]
Abstract
This study identifies dynamical properties of maltose-binding protein (MBP) useful in unveiling active site residues susceptible to ligand binding. The described methodology has been previously used in support of novel topological techniques of persistent homology and statistical inference in complex, multi-scale, high-dimensional data often encountered in computational biophysics. Here we outline a computational protocol that is based on the anisotropic elastic network models of 14 all-atom three-dimensional protein structures. We introduce the notion of dynamical distance matrices as a measure of correlated interactions among 370 amino acid residues that constitute a single protein. The dynamical distance matrices serve as an input for a persistent homology suite of codes to further distinguish a small subset of residues with high affinity for ligand binding and allosteric activity. In addition, we show that ligand-free closed MBP structures require lower deformation energies than open MBP structures, which may be used in categorization of time-evolving molecular dynamics structures. Analysis of the most probable allosteric coupling pathways between active site residues and the protein exterior is also presented.
Affiliation(s)
- Dragan Nikolić
- Department of Mechanical Engineering, University of Alberta and National Institute for Nanotechnology, 11421 Saskatchewan Dr NW, Edmonton, AB, T6G 2M9, Canada.
| | | |
|
38
|
Stolz BJ, Harrington HA, Porter MA. Persistent homology of time-dependent functional networks constructed from coupled time series. Chaos 2017; 27:047410. [PMID: 28456167 DOI: 10.1063/1.4978997] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 05/04/2023]
Abstract
We use topological data analysis to study "functional networks" that we construct from time-series data from both experimental and synthetic sources. We use persistent homology with a weight rank clique filtration to gain insights into these functional networks, and we use persistence landscapes to interpret our results. Our first example uses time-series output from networks of coupled Kuramoto oscillators. Our second example consists of biological data in the form of functional magnetic resonance imaging data that were acquired from human subjects during a simple motor-learning task in which subjects were monitored for three days during a five-day period. With these examples, we demonstrate that (1) using persistent homology to study functional networks provides fascinating insights into their properties and (2) the position of the features in a filtration can sometimes play a more vital role than persistence in the interpretation of topological features, even though conventionally the latter is used to distinguish between signal and noise. We find that persistent homology can detect differences in synchronization patterns in our data sets over time, giving insight both on changes in community structure in the networks and on increased synchronization between brain regions that form loops in a functional network during motor learning. For the motor-learning data, persistence landscapes also reveal that on average the majority of changes in the network loops take place on the second of the three days of the learning process.
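A weight rank clique filtration, as used in the abstract above, can be sketched as follows: edges are ranked from the largest weight to the smallest, and a clique enters the filtration at the step where its last edge appears. The sketch below is a minimal illustration that stops at triangles; the toy network and function name are hypothetical, and a persistent homology package would be needed to turn the output into barcodes.

```python
from itertools import combinations


def weight_rank_clique_filtration(n_nodes, weights):
    """Return (step, simplex) pairs for vertices, edges, and triangles.

    weights: dict mapping an edge (u, v) with u < v to its weight.
    Vertices enter at step 0; an edge enters at the rank of its weight
    (1 = largest weight, ties share a step); a triangle enters once all
    three of its edges are present.
    """
    levels = sorted(set(weights.values()), reverse=True)
    rank = {w: i + 1 for i, w in enumerate(levels)}
    step = {edge: rank[w] for edge, w in weights.items()}

    simplices = [(0, (v,)) for v in range(n_nodes)]
    simplices += [(s, edge) for edge, s in step.items()]
    for a, b, c in combinations(range(n_nodes), 3):
        sides = [(a, b), (a, c), (b, c)]
        if all(e in step for e in sides):
            simplices.append((max(step[e] for e in sides), (a, b, c)))

    # Order by step, then by dimension, so faces precede the cliques they bound.
    return sorted(simplices, key=lambda item: (item[0], len(item[1]), item[1]))


# Hypothetical functional network: weights are pairwise similarities.
weights = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.7, (0, 2): 0.5}
filtration = weight_rank_clique_filtration(4, weights)
```

Here the loop 0-1-2 is filled by its triangle at the same step at which its last edge arrives, so it never appears as a 1-dimensional feature. A loop on four or more nodes would persist until enough chords are added.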
Affiliation(s)
- Bernadette J Stolz
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| | | | - Mason A Porter
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, United Kingdom
| |
|