1
Durán C, Ciucci S, Palladini A, Ijaz UZ, Zippo AG, Sterbini FP, Masucci L, Cammarota G, Ianiro G, Spuul P, Schroeder M, Grill SW, Parsons BN, Pritchard DM, Posteraro B, Sanguinetti M, Gasbarrini G, Gasbarrini A, Cannistraci CV. Nonlinear machine learning pattern recognition and bacteria-metabolite multilayer network analysis of perturbed gastric microbiome. Nat Commun 2021; 12:1926. [PMID: 33771992 PMCID: PMC7997970 DOI: 10.1038/s41467-021-22135-x] [Received: 01/14/2020] [Accepted: 02/24/2021]
Abstract
The stomach is inhabited by diverse microbial communities, co-existing in a dynamic balance. Long-term use of drugs such as proton pump inhibitors (PPIs), or bacterial infection such as Helicobacter pylori, causes significant microbial alterations. Yet, studies revealing how the commensal bacteria re-organize in response to these perturbations of the gastric environment are at an early stage and rely principally on linear techniques for multivariate analysis. Here we disclose the importance of complementing linear dimensionality reduction techniques with nonlinear ones to unveil hidden patterns that remain unseen by linear embedding. Then, we prove the advantages of supplementing multivariate pattern analysis with differential network analysis, to reveal mechanisms of bacterial network re-organization that emerge from perturbations induced by a medical treatment (PPIs) or an infectious state (H. pylori). Finally, we show how to build bacteria-metabolite multilayer networks that can deepen our understanding of the metabolite pathways significantly associated with the perturbed microbial communities.
Affiliation(s)
- Claudio Durán
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
- Sara Ciucci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
- Alessandra Palladini
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
- Paul Langerhans Institute Dresden, Helmholtz Zentrum München, Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany
- Umer Z Ijaz
- Department of Infrastructure and Environment, University of Glasgow, School of Engineering, Glasgow, UK
- Antonio G Zippo
- Institute of Neuroscience, Consiglio Nazionale delle Ricerche, Milan, Italy
- Luca Masucci
- Institute of Microbiology, Università Cattolica del Sacro Cuore, Rome, Italy
- Giovanni Cammarota
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
- Gianluca Ianiro
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
- Pirjo Spuul
- Department of Chemistry and Biotechnology, Division of Gene Technology, Tallinn University of Technology, Tallinn, 12618, Estonia
- Michael Schroeder
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Dresden, Germany
- Stephan W Grill
- Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Dresden, Germany
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Bryony N Parsons
- Department of Cellular and Molecular Physiology, Institute of Translational Medicine, University of Liverpool, Liverpool, UK
- D Mark Pritchard
- Department of Cellular and Molecular Physiology, Institute of Translational Medicine, University of Liverpool, Liverpool, UK
- Department of Gastroenterology, Royal Liverpool and Broadgreen University Hospitals NHS Trust, Liverpool, UK
- Brunella Posteraro
- Institute of Microbiology, Università Cattolica del Sacro Cuore, Rome, Italy
- Giovanni Gasbarrini
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
- Antonio Gasbarrini
- Internal Medicine and Gastroenterology Unit, Università Cattolica del Sacro Cuore, Rome, Italy
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden, Germany
- Center for Complex Network Intelligence (CCNI) at Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Biomedical Engineering, Tsinghua University, Beijing, China
2
Liu LP, Zheng YW. Predicting differentiation potential of human pluripotent stem cells: Possibilities and challenges. World J Stem Cells 2019; 11:375-382. [PMID: 31396366 PMCID: PMC6682503 DOI: 10.4252/wjsc.v11.i7.375] [Received: 05/07/2019] [Revised: 06/12/2019] [Accepted: 06/20/2019]
Abstract
The capability of human pluripotent stem cell (hPSC) lines to propagate indefinitely and differentiate into derivatives of the three embryonic germ layers makes these cells powerful tools for basic scientific research and promising agents for translational medicine. However, variations in differentiation tendency and efficiency, as well as in pluripotency maintenance, necessitate the selection of hPSC lines for the intended applications to save time and cost. To screen for qualified cell lines and exclude problematic ones, their pluripotency must first be confirmed by traditional methods such as teratoma formation or by high-throughput gene expression profiling assays. Additionally, their differentiation potential, particularly the lineage-specific differentiation propensities of hPSC lines, should be predicted at an early stage. As a complement to the teratoma assay, RNA sequencing data provide a quantitative estimate of the differentiation ability of hPSCs in vivo. Moreover, multiple scorecards have been developed based on selected gene sets for predicting the differentiation potential into the three germ layers or the desired cell type many days before terminal differentiation. For clinical application of hPSCs, the malignant potential of the cells must also be evaluated. A combination of histologic examination of teratomas with quantitation of gene expression data derived from teratoma tissue provides safety-related predictive information by detecting immature teratomas, malignancy marker expression, and other parameters. Although various prediction methods are available, distinct limitations remain, such as discordant results between different assays and the long time, high labor, and cost they require, restricting their wide application in routine studies.
Therefore, simpler and more rapid detection assays with high specificity and sensitivity that can monitor the status of hPSCs at any time, as well as fewer targeted markers that are more specific for a given desired cell type, are urgently needed.
Affiliation(s)
- Li-Ping Liu
- Institute of Regenerative Medicine, Affiliated Hospital of Jiangsu University, Jiangsu University, Zhenjiang 212001, Jiangsu Province, China
- Yun-Wen Zheng
- Institute of Regenerative Medicine, Affiliated Hospital of Jiangsu University, Jiangsu University, Zhenjiang 212001, Jiangsu Province, China
3
Härtner F, Andrade-Navarro MA, Alanis-Lobato G. Geometric characterisation of disease modules. Appl Netw Sci 2018; 3:10. [PMID: 30839777 PMCID: PMC6214295 DOI: 10.1007/s41109-018-0066-3] [Received: 03/30/2018] [Accepted: 05/28/2018]
Abstract
There is an increasing accumulation of evidence supporting the existence of a hyperbolic geometry underlying the network representation of complex systems. In particular, it has been shown that the latent geometry of the human protein interaction network (hPIN) captures biologically relevant information, leading to a meaningful visual representation of protein-protein interactions and translating challenging systems biology problems into measuring distances between proteins. Moreover, proteins can efficiently communicate with each other, without global knowledge of the hPIN structure, via a greedy routing (GR) process in which hyperbolic distances guide biological signals from source to target proteins. It is thanks to this effective information routing throughout the hPIN that the cell operates, communicates with other cells and reacts to environmental changes. As a result, the malfunction of one or a few members of this intricate system can disturb its dynamics and result in disease phenotypes. In fact, it is known that the proteins associated with a single disease agglomerate non-randomly in the same region of the hPIN, forming one or several connected components known as the disease module (DM). Here, we present a geometric characterisation of DMs. First, we found that DM positions on the two-dimensional hyperbolic plane reflect their fragmentation and functional heterogeneity, rendering an informative picture of the cellular processes that the disease is affecting. Second, we used a distance-based dissimilarity measure to cluster DMs with shared clinical features. Finally, we took advantage of the GR strategy to study how defective proteins affect the transduction of signals throughout the hPIN.
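To make the greedy routing idea concrete, here is a minimal Python sketch, assuming each protein has native polar coordinates (r, θ) on a hyperbolic plane of curvature -1 and that the network is given as an adjacency dictionary. The function names, toy coordinates and toy graph are hypothetical and this is not the authors' implementation: at each hop the signal is forwarded to the neighbour closest to the target, and routing is declared failed when no neighbour is strictly closer than the current node.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between two points in native polar coordinates (r, theta)
    on the hyperbolic plane of curvature -1 (hyperbolic law of cosines)."""
    (r1, t1), (r2, t2) = p, q
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(arg, 1.0))  # guard against rounding just below 1


def greedy_route(graph, coords, source, target):
    """Forward to the neighbour closest to the target; stop with failure when
    no neighbour is strictly closer than the current node. Returns (path, ok)."""
    path = [source]
    current = source
    while current != target:
        here = hyperbolic_distance(coords[current], coords[target])
        nxt = min(graph[current],
                  key=lambda v: hyperbolic_distance(coords[v], coords[target]))
        if hyperbolic_distance(coords[nxt], coords[target]) >= here:
            return path, False  # local minimum: greedy routing is stuck
        path.append(nxt)
        current = nxt
    return path, True


# Hypothetical toy network: a central hub "A" plus peripheral nodes.
coords = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (2.0, math.pi),
          "D": (2.0, math.pi / 2), "F": (3.0, 0.0), "G": (3.5, 0.0)}
graph = {"A": ["B", "C", "D", "G"], "B": ["A"], "C": ["A"], "D": ["A"],
         "F": ["G"], "G": ["F", "A"]}

print(greedy_route(graph, coords, "B", "C"))  # (['B', 'A', 'C'], True)
print(greedy_route(graph, coords, "F", "C"))  # (['F'], False)
```

The second call fails even though a path F-G-A-C exists, because G lies farther from C than F does; such local minima are what make greedy routing success a meaningful property of an embedding.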
Affiliation(s)
- Franziska Härtner
- Faculty for Physics, Mathematics and Computer Science, Johannes Gutenberg Universität, Institute of Computer Science, Staudingerweg 7, Mainz, 55128 Germany
- Miguel A. Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128 Germany
- Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg Universität, Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128 Germany
4
Machine learning meets complex networks via coalescent embedding in the hyperbolic space. Nat Commun 2017; 8:1615. [PMID: 29151574 PMCID: PMC5694768 DOI: 10.1038/s41467-017-01825-5] [Received: 03/03/2016] [Accepted: 10/19/2017]
Abstract
Physicists recently observed that realistic complex networks emerge as discrete samples from a continuous hyperbolic geometry enclosed in a circle: the radius represents the node centrality and the angular displacement between two nodes resembles their topological proximity. The hyperbolic circle aims to become a universal space of representation and analysis of many real networks. Yet, inferring the angular coordinates to map a real network back to its latent geometry remains a challenging inverse problem. Here, we show that intelligent machines for unsupervised recognition and visualization of similarities in big data can also infer the network angular coordinates of the hyperbolic model according to a geometrical organization that we term "angular coalescence." Based on this phenomenon, we propose a class of algorithms that offers fast and accurate "coalescent embedding" in the hyperbolic circle even for large networks. This computational solution to an inverse problem in physics of complex systems favors the application of network latent geometry techniques in disciplines dealing with big network data analysis including biology, medicine, and social science.
5
Kim D, Ryu J, Son M, Oh J, Chung K, Lee S, Lee J, Ahn J, Min J, Ahn J, Kang HM, Kim J, Jung C, Kim N, Cho H. A liver-specific gene expression panel predicts the differentiation status of in vitro hepatocyte models. Hepatology 2017; 66:1662-1674. [PMID: 28640507 PMCID: PMC5698781 DOI: 10.1002/hep.29324]
Abstract
Alternative cell sources, such as three-dimensional organoids and induced pluripotent stem cell-derived cells, might provide an effective approach for both drug development applications and clinical transplantation. For example, the development of cell sources for liver cell-based therapy is increasingly needed, and liver transplantation is performed for the treatment of patients with severe end-stage liver disease. Differentiated liver cells and three-dimensional organoids are expected to provide new cell sources for tissue models and revolutionary clinical therapies. However, conventional experimental methods confirming the expression levels of liver-specific lineage markers cannot provide complete information regarding the differentiation status or the degree of similarity between liver and differentiated cell sources. Therefore, in this study, to overcome several issues associated with the assessment of differentiated liver cells and organoids, we developed a liver-specific gene expression panel (LiGEP) algorithm that presents the degree of liver similarity as a "percentage." We demonstrated that the percentage calculated using the LiGEP algorithm was correlated with the developmental stages of in vivo liver tissues in mice, suggesting that LiGEP can correctly predict developmental stages. Moreover, three-dimensional cultured HepaRG cells and human pluripotent stem cell-derived hepatocyte-like cells showed liver similarity scores of 59.14% and 32%, respectively, although general liver-specific markers were detected. CONCLUSION: Our study describes a quantitative and predictive model for differentiated samples, particularly liver-specific cells or organoids. This model can be further expanded to various tissue-specific organoids, and LiGEP can provide useful information and insights regarding the differentiation status of in vitro liver models. (Hepatology 2017;66:1662-1674).
Affiliation(s)
- Dae‐Soo Kim
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea
- Jea‐Woon Ryu
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Mi‐Young Son
- Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea; Stem Cell Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Jung‐Hwa Oh
- Korea Institute of Toxicology, Daejeon, Republic of Korea
- Kyung‐Sook Chung
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea
- Sugi Lee
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea
- Jeong‐Ju Lee
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Jun‐Ho Ahn
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Ju‐Sik Min
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Jiwon Ahn
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Hyun Mi Kang
- Stem Cell Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Janghwan Kim
- Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea; Stem Cell Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Cho‐Rok Jung
- Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea; Stem Cell Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Nam‐Soon Kim
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea
- Hyun‐Soo Cho
- Genome Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea; Department of Functional Genomics, Korea University of Science and Technology, Daejeon, Republic of Korea
6
Zeng C, Mulas F, Sui Y, Guan T, Miller N, Tan Y, Liu F, Jin W, Carrano AC, Huising MO, Shirihai OS, Yeo GW, Sander M. Pseudotemporal Ordering of Single Cells Reveals Metabolic Control of Postnatal β Cell Proliferation. Cell Metab 2017; 25:1160-1175.e11. [PMID: 28467932 PMCID: PMC5501713 DOI: 10.1016/j.cmet.2017.04.014] [Received: 09/30/2016] [Revised: 02/28/2017] [Accepted: 04/13/2017]
Abstract
Pancreatic β cell mass for appropriate blood glucose control is established during early postnatal life. β cell proliferative capacity declines postnatally, but the extrinsic cues and intracellular signals that cause this decline remain unknown. To obtain a high-resolution map of β cell transcriptome dynamics after birth, we generated single-cell RNA-seq data of β cells from multiple postnatal time points and ordered cells based on transcriptional similarity using a new analytical tool. This analysis captured signatures of immature, proliferative β cells and established high expression of amino acid metabolic, mitochondrial, and Srf/Jun/Fos transcription factor genes as their hallmark feature. Experimental validation revealed high metabolic activity in immature β cells and a role for reactive oxygen species and Srf/Jun/Fos transcription factors in driving postnatal β cell proliferation and mass expansion. Our work provides the first high-resolution molecular characterization of state changes in postnatal β cells and paves the way for the identification of novel therapeutic targets to stimulate β cell regeneration.
Affiliation(s)
- Chun Zeng
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Francesca Mulas
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Yinghui Sui
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Tiffany Guan
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Nathanael Miller
- Departments of Medicine and Molecular & Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA; Department of Medicine, Boston University, School of Medicine, Boston, MA 02118, USA
- Yuliang Tan
- Howard Hughes Medical Institute, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Fenfen Liu
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Wen Jin
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Andrea C Carrano
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Mark O Huising
- Department of Neurobiology, Physiology & Behavior, College of Biological Sciences, University of California, Davis, CA 95616, USA
- Orian S Shirihai
- Departments of Medicine and Molecular & Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA; Department of Medicine, Boston University, School of Medicine, Boston, MA 02118, USA
- Gene W Yeo
- Department of Cellular & Molecular Medicine and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Maike Sander
- Departments of Pediatrics and Cellular & Molecular Medicine, Pediatric Diabetes Research Center and Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA 92093, USA
7
Ciucci S, Ge Y, Durán C, Palladini A, Jiménez-Jiménez V, Martínez-Sánchez LM, Wang Y, Sales S, Shevchenko A, Poser SW, Herbig M, Otto O, Androutsellis-Theotokis A, Guck J, Gerl MJ, Cannistraci CV. Enlightening discriminative network functional modules behind Principal Component Analysis separation in differential-omic science studies. Sci Rep 2017; 7:43946. [PMID: 28287094 PMCID: PMC5347127 DOI: 10.1038/srep43946] [Received: 10/21/2016] [Accepted: 02/06/2017]
Abstract
Omic science is rapidly growing, and one of the most employed techniques to explore differential patterns in omic datasets is principal component analysis (PCA). However, a method to reveal the network of omic features that contribute most to the sample separation obtained by PCA is missing. An alternative is to build correlation networks between univariately selected significant omic features, but this neglects the multivariate unsupervised feature compression responsible for the PCA sample segregation. Biologists and medical researchers often prefer effective methods that offer an immediate interpretation to complicated algorithms that in principle promise an improvement but in practice are difficult to apply and interpret. Here we present PC-corr: a simple algorithm that associates a discriminative network of features with any PCA segregation. Such a network can be inspected in search of functional modules useful in the definition of combinatorial and multiscale biomarkers from multifaceted omic data in systems and precision biomedicine. We offer proofs of PC-corr efficacy on lipidomic, metagenomic, developmental genomic, population genetic, cancer promoteromic and cancer stem-cell mechanomic data. Finally, PC-corr is a general functional network inference approach that can be easily adopted for big data exploration in computer science and analysis of complex systems in physics.
Affiliation(s)
- Sara Ciucci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany; Lipotype GmbH, Tatzberg 47, 01307 Dresden, Germany
- Yan Ge
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Claudio Durán
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Alessandra Palladini
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany; Lipotype GmbH, Tatzberg 47, 01307 Dresden, Germany; Membrane Biochemistry Group, DZD Paul Langerhans Institute, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Víctor Jiménez-Jiménez
- Integrin Signalling Group, Fundación Centro Nacional de Investigaciones Cardiovasculares Carlos III, Melchor Fernández Almagro 3, 28029 Madrid, Spain
- Luisa María Martínez-Sánchez
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Yuting Wang
- MPI of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany; Center for Regenerative Therapies Dresden (CRTD), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstraße 105, 01307 Dresden, Germany
- Susanne Sales
- MPI of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
- Andrej Shevchenko
- MPI of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
- Steven W Poser
- Department of Internal Medicine III, University Hospital Carl Gustav Carus at the Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
- Maik Herbig
- Cellular Machines Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Oliver Otto
- Cellular Machines Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Andreas Androutsellis-Theotokis
- Center for Regenerative Therapies Dresden (CRTD), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Fetscherstraße 105, 01307 Dresden, Germany; Department of Internal Medicine III, University Hospital Carl Gustav Carus at the Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany; Department of Stem Cell Biology, Centre for Biomolecular Sciences, Division of Cancer and Stem Cells, School of Medicine, University of Nottingham, Nottingham NG7 2RD, UK
- Jochen Guck
- Cellular Machines Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Department of Physics, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
8
Eisinga R, Heskes T, Pelzer B, Te Grotenhuis M. Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers. BMC Bioinformatics 2017; 18:68. [PMID: 28122501 PMCID: PMC5267387 DOI: 10.1186/s12859-017-1486-2] [Received: 07/17/2016] [Accepted: 01/11/2017]
Abstract
Background: The Friedman rank sum test is a widely used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to such tests rely on large-sample approximations, due to the numerical complexity of computing the exact distribution. These approximate methods lead to inaccurate estimates in the tail of the distribution, which is most relevant for p-value calculation. Results: We propose an efficient, combinatorial exact approach for calculating the probability mass distribution of the rank sum difference statistic for pairwise comparison of Friedman rank sums, and compare exact results with recommended asymptotic approximations. Whereas the chi-squared approximation performs worse than exact computation overall, other approximations, particularly the normal, perform well, except in the extreme tail. Hence exact calculation offers an improvement when small p-values occur following multiple testing correction. Exact inference also enhances the identification of significant differences whenever the observed values are close to the approximate critical value. We illustrate the proposed method in the context of biological machine learning, where Friedman rank sum difference tests are commonly used for the comparison of classifiers over multiple datasets. Conclusions: We provide a computationally fast method to determine the exact p-value of the absolute rank sum difference of a pair of Friedman rank sums, making asymptotic tests obsolete. Calculation of exact p-values is easy to implement in statistical software, and the implementation in R is provided in one of the Additional files and is also available at http://www.ru.nl/publish/pages/726696/friedmanrsd.zip.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1486-2) contains supplementary material, which is available to authorized users.
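The exact null distribution targeted by the paper can be illustrated with a brute-force sketch, assuming no ties within blocks. Under the null hypothesis the ranks in each of the n blocks (datasets) form a uniformly random permutation of 1..k, so the per-block difference between the ranks of two given treatments (classifiers) equals d (d ≠ 0) for k - |d| of the k(k - 1) equally likely ordered pairs of distinct ranks. Convolving this distribution n times yields the exact distribution of the rank sum difference D. This plain convolution is written for illustration only; it is not the closed-form combinatorial expression derived by the authors nor their R implementation, and the function names are hypothetical.

```python
from fractions import Fraction


def rank_sum_difference_counts(k, n):
    """Count how many of the (k*(k-1))**n equally likely null outcomes give each
    value of D = R_i - R_j, for k treatments ranked in n blocks without ties."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 treatments and n >= 1 blocks")
    # Per block, the ordered pair (r_i, r_j) of distinct ranks is uniform over
    # k*(k-1) pairs, and r_i - r_j = d occurs for exactly k - |d| of them.
    block = {d: k - abs(d) for d in range(-(k - 1), k) if d != 0}
    counts = {0: 1}
    for _ in range(n):  # blocks are independent: convolve once per block
        nxt = {}
        for s, c in counts.items():
            for d, w in block.items():
                nxt[s + d] = nxt.get(s + d, 0) + c * w
        counts = nxt
    return counts


def exact_p_value(d_obs, k, n):
    """Two-sided exact p-value P(|D| >= |d_obs|) under the null hypothesis."""
    counts = rank_sum_difference_counts(k, n)
    total = (k * (k - 1)) ** n
    tail = sum(c for s, c in counts.items() if abs(s) >= abs(d_obs))
    return Fraction(tail, total)


# Example: 5 classifiers compared over 10 datasets, observed |R_i - R_j| = 12.
p = exact_p_value(12, k=5, n=10)
print(float(p))
```

Using `Fraction` keeps the tail probability exact, which matters in the extreme tail where the abstract notes that asymptotic approximations break down; the cost grows roughly with n²k², which is fine for typical classifier benchmarks.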
Affiliation(s)
- Rob Eisinga
- Department of Social Science Research Methods, Radboud University Nijmegen, PO Box 9104, 6500 HE Nijmegen, The Netherlands
- Tom Heskes
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Ben Pelzer
- Department of Social Science Research Methods, Radboud University Nijmegen, PO Box 9104, 6500 HE Nijmegen, The Netherlands
- Manfred Te Grotenhuis
- Department of Social Science Research Methods, Radboud University Nijmegen, PO Box 9104, 6500 HE Nijmegen, The Netherlands
9
Alessio M, Cannistraci CV. Nonlinear Dimensionality Reduction by Minimum Curvilinearity for Unsupervised Discovery of Patterns in Multidimensional Proteomic Data. Methods Mol Biol 2016; 1384:289-298. [PMID: 26611421 DOI: 10.1007/978-1-4939-3255-9_16]
Abstract
Dimensionality reduction is widely and successfully employed for the visualization and discrimination of patterns hidden in multidimensional proteomic datasets. Principal component analysis (PCA), which is the preferred approach for linear dimensionality reduction, may present serious limitations, in particular when samples are nonlinearly related, as often occurs in two-dimensional electrophoresis (2-DE) datasets. An aggravating factor is that PCA robustness is impaired when the number of samples is small in comparison to the number of proteomic features, which is the case in high-dimensional proteomic datasets, including 2-DE ones. Here, we describe the use of a nonlinear unsupervised learning machine for dimensionality reduction called minimum curvilinear embedding (MCE), which has been successfully applied to datasets of different biological samples. In particular, we provide an example in which we directly compare the performance of MCE with that of PCA in disclosing neuropathic pain patterns hidden in a multidimensional proteomic dataset.
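A minimal sketch of the minimum curvilinearity idea behind MCE, assuming Euclidean base distances between samples: the minimum curvilinear distance between two samples is taken as the length of the path joining them on the minimum spanning tree (MST) of the complete distance graph, and the resulting distance matrix is embedded with classical multidimensional scaling. The function names, the use of Prim's algorithm and the centred MDS step are choices made for this illustration rather than the authors' released code; the non-centred variant (ncMCE) is not reproduced.

```python
import numpy as np


def minimum_curvilinear_distances(X):
    """Pairwise distances measured along the Euclidean minimum spanning tree."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Prim's algorithm on the complete graph, O(n^2).
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()               # cheapest link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node providing that cheapest link
    adjacency = [[] for _ in range(n)]
    for _ in range(n - 1):
        v = int(np.argmin(np.where(in_tree, np.inf, best)))
        u = int(parent[v])
        adjacency[u].append((v, D[u, v]))
        adjacency[v].append((u, D[u, v]))
        in_tree[v] = True
        closer = D[v] < best
        best = np.where(closer, D[v], best)
        parent = np.where(closer, v, parent)

    # Sum edge weights along the unique tree path from every source node.
    mc = np.zeros((n, n))
    for s in range(n):
        stack = [(s, -1, 0.0)]
        while stack:
            node, prev, dist = stack.pop()
            mc[s, node] = dist
            for nb, w in adjacency[node]:
                if nb != prev:
                    stack.append((nb, node, dist + w))
    return mc


def mce(X, dims=2):
    """Classical MDS applied to the minimum curvilinear distance matrix."""
    mc = minimum_curvilinear_distances(X)
    n = mc.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ (mc ** 2) @ J          # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dims]   # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
```

For an n × p data matrix `X`, `mce(X, dims=2)` returns an n × 2 array of coordinates whose axis signs are arbitrary, as with any eigendecomposition. Measuring distances along the MST lets points lying on a curved, nonlinear structure be unrolled, which a linear projection such as PCA cannot do.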
Affiliation(s)
- Massimo Alessio
- Proteome Biochemistry, IRCCS-San Raffaele Scientific Institute, Milan, Italy.
- Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Technische Universität Dresden, Tatzberg 47/49, 01307, Dresden, Germany.
10
Highlighting nonlinear patterns in population genetics datasets. Sci Rep 2015; 5:8140. [PMID: 25633916 PMCID: PMC4311249 DOI: 10.1038/srep08140] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 09/30/2014] [Accepted: 01/08/2015] [Indexed: 01/25/2023]
Abstract
Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data.
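The abstract characterises ncMCE only as a non-centred variant of MCE. A minimal sketch, assuming that "non-centred" means the minimum curvilinear (MC) distance matrix (path lengths along a minimum spanning tree of the samples) is decomposed directly by a truncated singular value decomposition, without the double centring used in classical MDS, and that coordinates are singular vectors scaled by the square root of their singular values. Both points are assumptions of this sketch, not a description of the published code. The MC matrix is taken as input; the example uses a hypothetical one in which five samples sit at arc-length positions `s` along a curve, so their MC distances are |s_i - s_j|.

```python
import math
import random


def noncentred_embedding(dist, k=2, iters=1000, seed=0):
    """Truncated SVD of an uncentred symmetric distance matrix via power iteration on D^T D."""
    n = len(dist)
    # D is symmetric, so D^T D = D @ D; its eigenvalues are the squared singular values.
    m = [[sum(dist[i][t] * dist[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords, sigmas = [[0.0] * k for _ in range(n)], []
    for c in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        sigma_sq = max(sum(v[i] * mv[i] for i in range(n)), 0.0)  # Rayleigh quotient
        sigma = math.sqrt(sigma_sq)
        sigmas.append(sigma)
        for i in range(n):
            coords[i][c] = math.sqrt(sigma) * v[i]
        # Deflate so the next pass converges to the next singular vector.
        m = [[m[i][j] - sigma_sq * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, sigmas


# Hypothetical MC distances for five samples at arc-length positions s along a curve.
s = [0.0, 1.0, 2.0, 4.0, 7.0]
mc = [[abs(a - b) for b in s] for a in s]
coords, sigmas = noncentred_embedding(mc, k=2)
```

Because the uncentred distance matrix is non-negative, its leading singular vector has entries of a single sign, which is one visible difference from a centred decomposition such as PCA or classical MDS.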
11
Cannistraci CV, Alanis-Lobato G, Ravasi T. Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding. Bioinformatics 2013; 29:i199-209. [PMID: 23812985 PMCID: PMC3694668 DOI: 10.1093/bioinformatics/btt208] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Indexed: 12/30/2022]
Abstract
MOTIVATION Most functions within the cell emerge thanks to protein-protein interactions (PPIs), yet experimental determination of PPIs is both expensive and time-consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using only PPI-network topology (topological prediction) is difficult but essential when prior biological knowledge is absent or unreliable. METHODS Network embedding emphasizes the relations between network proteins embedded in a low-dimensional space, in which protein pairs that are closer to each other represent good candidate interactions. To achieve network denoising, which boosts prediction performance, we first applied minimum curvilinear embedding (MCE), and then adopted shortest path (SP) in the reduced space to assign likelihood scores to candidate interactions. Furthermore, we introduce (i) a new valid variation of MCE, named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; and (iii) two new randomized procedures for evaluating predictions. RESULTS We compared our method against several embedding approaches, both unsupervised and with supervised parameter tuning, and against node neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming the current methods in topological link prediction. CONCLUSION Minimum curvilinearity is a valuable non-linear framework that we successfully applied to the embedding of protein networks for the unsupervised prediction of novel PPIs. The rationale for our approach is that biological and evolutionary information is imprinted in the non-linear patterns hidden behind the protein network topology, and can be exploited for predicting new protein links. The predicted PPIs represent good candidates for testing in high-throughput experiments or for exploitation in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules.
AVAILABILITY https://sites.google.com/site/carlovittoriocannistraci/home. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
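The scoring step in the METHODS paragraph (shortest path in the reduced space) can be illustrated with a short sketch. It rests on assumptions, not on the published ncMCE-SP code: each observed interaction is weighted by the Euclidean distance between its two proteins in the embedding, and each unlinked candidate pair is scored by the shortest-path length between its proteins on that weighted network, with shorter paths ranked as more likely interactions. The coordinates, edge list and function names are hypothetical.

```python
import heapq
import math
from itertools import combinations


def dijkstra(adj, src):
    """Shortest-path lengths from src on a non-negatively weighted graph."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rank_candidate_links(coords, edges):
    """Rank unlinked node pairs by embedding-weighted shortest path (ascending = more likely)."""
    adj = {node: [] for node in coords}
    for a, b in edges:
        w = math.dist(coords[a], coords[b])  # edge weight = distance in the reduced space
        adj[a].append((b, w))
        adj[b].append((a, w))
    linked = {frozenset(e) for e in edges}
    sp = {node: dijkstra(adj, node) for node in coords}
    scores = [
        (sp[a].get(b, math.inf), a, b)
        for a, b in combinations(sorted(coords), 2)
        if frozenset((a, b)) not in linked
    ]
    return sorted(scores)


# Hypothetical 2-D embedding of a five-protein network.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (4.0, 0.0), "E": (5.0, 1.0)}
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
for score, a, b in rank_candidate_links(coords, edges):
    print(f"{a}-{b}: {score:.3f}")
```

Here A and C share the neighbour B and lie close in the embedding, so the pair A-C (path length 2.0) is ranked as the strongest candidate interaction.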
Affiliation(s)
- Carlo Vittorio Cannistraci
- Integrative Systems Biology Laboratory, Biological and Environmental Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia.
12
Mulas F, Zagar L, Zupan B, Bellazzi R. Supporting regenerative medicine by integrative dimensionality reduction. Methods Inf Med 2012; 51:341-7. [PMID: 22773076 DOI: 10.3414/me11-02-0045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/08/2011] [Accepted: 05/04/2012] [Indexed: 01/03/2023]
Abstract
OBJECTIVE The assessment of the developmental potential of stem cells is a crucial step towards their clinical application in regenerative medicine. It has been demonstrated that genome-wide expression profiles can predict the cellular differentiation stage by means of dimensionality reduction methods. Here we show that these techniques can be further strengthened to support decision making with i) a novel strategy for gene selection; and ii) methods for combining the evidence from multiple data sets. METHODS We propose to exploit dimensionality reduction methods for the selection of genes specifically activated in different stages of differentiation. To obtain an integrated predictive model, the expression values of the selected genes from multiple data sets are combined. We investigated distinct approaches that either aggregate data sets or use learning ensembles. RESULTS We analyzed the performance of the proposed methods on six publicly available data sets. The selection procedure identified a reduced subset of genes whose expression values gave rise to an accurate stage prediction. The assessment of predictive accuracy demonstrated a high quality of predictions for most of the data integration methods presented. CONCLUSION The experimental results highlighted the main strengths of the proposed approaches. These include the ability to predict the true staging by combining multiple training data sets when this could not be inferred from a single data source, and to focus the analysis on a reduced list of genes with similar predictive performance.
Affiliation(s)
- F Mulas
- Centre for Tissue Engineering, University of Pavia, Pavia, Italy
13
Hashimoto T, Jaakkola T, Sherwood R, Mazzoni EO, Wichterle H, Gifford D. Lineage-based identification of cellular states and expression programs. Bioinformatics 2012; 28:i250-7. [PMID: 22689769 PMCID: PMC3371836 DOI: 10.1093/bioinformatics/bts204] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]
Abstract
We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and whose edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.
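The abstract states that the L1-regularized model is estimated with proximal operators. The proximal operator of an L1 penalty with threshold t is elementwise soft-thresholding, sign(z) max(|z| - t, 0), and it is what produces exact zeros (sparse gene sets). The sketch below shows it inside a generic proximal-gradient (ISTA) loop for L1-penalized least squares; this is a swapped-in illustration of the technique, not the paper's log-linear model, and the function names, step size and toy data are assumptions.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z towards zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ista(X, y, lam, step, iters=500):
    """Minimise 0.5*||y - X b||^2 + lam*||b||_1 by proximal gradient descent.

    `step` should not exceed 1/L, where L is the largest eigenvalue of X^T X.
    """
    n_rows, n_cols = len(X), len(X[0])
    b = [0.0] * n_cols
    for _ in range(iters):
        resid = [sum(X[r][c] * b[c] for c in range(n_cols)) - y[r] for r in range(n_rows)]
        grad = [sum(X[r][c] * resid[r] for r in range(n_rows)) for c in range(n_cols)]
        # Gradient step on the smooth loss, then the L1 proximal (soft-threshold) step.
        b = [soft_threshold(b[c] - step * grad[c], step * lam) for c in range(n_cols)]
    return b


# Toy problem with an identity design: the exact solution is soft_threshold(y_i, lam).
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, -0.5, -2.0]
print(ista(X, y, lam=1.0, step=1.0))  # prints [2.0, 0.0, -1.0]
```

The small middle coefficient is set exactly to zero rather than merely shrunk, which is the sparsity-inducing behaviour the abstract attributes to its L1 regularization.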
Affiliation(s)
- Tatsunori Hashimoto
- Department of Computer Science and Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA