51
Jain Y, Ding S, Qiu J. Sliced inverse regression for integrative multi-omics data analysis. Stat Appl Genet Mol Biol 2019; 18. [PMID: 30685747 DOI: 10.1515/sagmb-2018-0028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled simultaneous measurement of multiple types of genomic data for cancer samples. Analyzed together, these data may reveal biological insights that a single type of genomic data cannot. This study proposes a novel application of a supervised dimension reduction method, sliced inverse regression (SIR), to multi-omics data analysis to improve prediction over single-data-type analysis. The study further proposes an integrative sliced inverse regression method (integrative SIR) for simultaneous analysis of multiple omics data types of cancer samples, including miRNA, mRNA and proteomics, to achieve integrative dimension reduction and further improve prediction performance. Numerical results show that integrative analysis of multi-omics data is beneficial compared with single-data-source analysis and, more importantly, that supervised dimension reduction methods have advantages over unsupervised ones for classification and prediction in integrative data analysis.
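Sliced inverse regression can be sketched in a few lines of NumPy. The sketch below is the classical single-block SIR of Li (1991), not the integrative SIR proposed in this paper. The function name `sir_directions`, its defaults, and the equal-count slicing of a continuous response are illustrative assumptions. The predictors are whitened and averaged within each slice of the response. The leading eigenvectors of the weighted covariance of those slice means then give the reduced directions.

```python
import numpy as np


def sir_directions(X, y, n_slices=5, n_dirs=1, categorical=False):
    """Estimate SIR directions (Li, 1991) for predictors X (n x p) and response y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    # Whiten X: Z = (X - mean) @ Sigma^{-1/2}, via an eigendecomposition of Sigma.
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Slices: one per class label, or equal-count bins of the sorted response.
    if categorical:
        slices = [np.flatnonzero(y == label) for label in np.unique(y)]
    else:
        slices = np.array_split(np.argsort(y, kind="stable"), n_slices)
    # Weighted covariance of the slice means: M = sum_h p_h * m_h m_h^T.
    M = np.zeros((p, p))
    for idx in slices:
        m_h = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m_h, m_h)
    # Leading eigenvectors of M, mapped back to the original X scale.
    w, v = np.linalg.eigh(M)
    order = np.argsort(w)[::-1][:n_dirs]
    return inv_sqrt @ v[:, order], w[order]


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.1 * rng.normal(size=500)  # response driven by the first predictor only
B, eigenvalues = sir_directions(X, y, n_slices=10)
print(np.round(B[:, 0] / np.linalg.norm(B[:, 0]), 2))
```

Classical SIR needs an invertible predictor covariance, which requires more samples than features. Typical omics data have far more features than samples, so a preliminary reduction or regularization would be needed before applying this sketch. For a class label such as tumor subtype, pass `categorical=True` so that each class forms its own slice.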
Affiliation(s)
- Yashita Jain
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
- Shanshan Ding
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA; Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA
- Jing Qiu
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA; Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA
52
Computational Methods for Subtyping of Tumors and Their Applications for Deciphering Tumor Heterogeneity. Methods Mol Biol 2018. [PMID: 30378077 DOI: 10.1007/978-1-4939-8868-6_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
With the rapid development of deep sequencing technologies, many programs are generating multi-platform genomic profiles (e.g., somatic mutation, DNA methylation, and gene expression) for a large number of tumors. This activity has provided unique opportunities and challenges to stratify tumors and decipher tumor heterogeneity. In this chapter, we summarize several computational methods to address the challenge of tumor stratification with different types of genomic data. We further introduce their applications in emerging large-scale genomic data to show their effectiveness in deciphering tumor heterogeneity and clinical relevance.
53
Perakakis N, Yazdani A, Karniadakis GE, Mantzoros C. Omics, big data and machine learning as tools to propel understanding of biological mechanisms and to discover novel diagnostics and therapeutics. Metabolism 2018; 87:A1-A9. [PMID: 30098323 PMCID: PMC6325641 DOI: 10.1016/j.metabol.2018.08.002] [Citation(s) in RCA: 80] [Impact Index Per Article: 11.4] [Received: 08/06/2018] [Accepted: 08/07/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Nikolaos Perakakis
- Department of Endocrinology, VA Boston Healthcare System, Jamaica Plain, Boston, MA 02130, USA; Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Alireza Yazdani
- Division of Applied Mathematics, Brown University, Providence, RI 02906, USA
- Christos Mantzoros
- Department of Endocrinology, VA Boston Healthcare System, Jamaica Plain, Boston, MA 02130, USA; Division of Endocrinology, Diabetes and Metabolism, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
54
Hurgobin B, de Jong E, Bosco A. Insights into respiratory disease through bioinformatics. Respirology 2018; 23:1117-1126. [PMID: 30218470 DOI: 10.1111/resp.13401] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/07/2018] [Revised: 08/18/2018] [Accepted: 08/22/2018] [Indexed: 12/21/2022]
Abstract
Respiratory diseases such as asthma, chronic obstructive pulmonary disease and lung cancer represent a critical area for medical research as millions of people are affected globally. The development of new strategies for treatment and/or prevention, and the identification of biomarkers for patient stratification and early detection of disease inception are essential to reducing the impact of lung diseases. The successful translation of research into clinical practice requires a detailed understanding of the underlying biology. In this regard, the advent of next-generation sequencing and mass spectrometry has led to the generation of an unprecedented amount of data spanning multiple layers of biological regulation (genome, epigenome, transcriptome, proteome, metabolome and microbiome). Dealing with this wealth of data requires sophisticated bioinformatics and statistical tools. Here, we review the basic concepts in bioinformatics and genomic data analysis and illustrate the application of these tools to further our understanding of lung diseases. We also highlight the potential for data integration of multi-omic profiles and computational drug repurposing to define disease subphenotypes and match them to targeted therapies, paving the way for personalized medicine.
Affiliation(s)
- Bhavna Hurgobin
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- Emma de Jong
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
- Anthony Bosco
- Telethon Kids Institute, The University of Western Australia, Perth, WA, Australia
55
Abstract
Sarcoidosis is a complex, polygenic disease of unknown cause with diverse clinical phenotypes, ranging from self-limited, asymptomatic disease to life-altering symptoms and early disease-related mortality. It is unlikely that a single common environmental exposure (e.g., infection, antigen) entirely explains the disease, and numerous genetic mutations are associated with the disease. As such, it is reasonable to assume, as with other phenotypically diverse diseases, that distinct genetic mechanisms and related biological biomarkers will serve to further define sarcoidosis subphenotypes, mechanisms, and possibly etiology, thus guiding personalized care. The fields of "omics" and systems biology research are widely applied to understand polygenic and phenotypically diverse diseases, such as sarcoidosis. "Omics" refers to technologies that allow comprehensive profiling of sets of molecules in an organism. Systems biology applies advanced computational approaches to make sense of the enormous data sets that are typically generated from "omics" platforms. The primary objectives of this article are to review the available "omics" tools, assess the current status of "omics" and systems biology research in the field of sarcoidosis, and consider how this technology could be applied to advance our understanding of the mechanistic underpinnings of disease and to develop novel treatments.
56
Chen J, Peng H, Han G, Cai H, Cai J. HOGMMNC: a higher order graph matching with multiple network constraints model for gene–drug regulatory modules identification. Bioinformatics 2018; 35:602-610. [DOI: 10.1093/bioinformatics/bty662] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/23/2018] [Revised: 05/18/2018] [Accepted: 07/23/2018] [Indexed: 11/14/2022]
Affiliation(s)
- Jiazhou Chen
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou, China
- Hong Peng
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Guoqiang Han
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Guangdong Provincial Key Lab of Computational Intelligence and Cyberspace Information, South China University of Technology, Guangzhou, China
- Jiulun Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
57
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 249] [Impact Index Per Article: 35.6] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches such as genomics, transcriptomics, proteomics, and metabolomics to analyze biological samples, each analysis can generate terabyte- to petabyte-sized data files on a daily basis. These file sizes, together with differences in nomenclature among data types, make it challenging to integrate these multi-dimensional omics data into a biologically meaningful context. Whether called integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or simply 'omics', the field faces challenges that include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is a holistic 'systems biology' understanding of the biological question at hand. Commonly used approaches in these efforts are currently limited by the 3 i's: integration, interpretation, and insights. Once integrated, these very large datasets are expected to yield unprecedented, high-resolution views of cellular systems and transformative insights into processes, events, and diseases through various computational and informatics frameworks. Costs and processing times for sample analyses continue to fall, and new types of omics datasets such as glycomics, lipidomics, microbiomics, and phenomics are being generated. As a result, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets, with a view to developing standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
58
Li C, Lee J, Ding J, Sun S. Integrative analysis of gene expression and methylation data for breast cancer cell lines. BioData Min 2018; 11:13. [PMID: 29983747 PMCID: PMC6019806 DOI: 10.1186/s13040-018-0174-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/20/2018] [Accepted: 06/13/2018] [Indexed: 12/11/2022]
Abstract
Background: The deadly toll of cancer and the need for accurate early detection methods have driven efforts to identify genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore the complex relationship between methylation and gene expression. Results: Through linear modeling and analysis of variance, we identify genes that show a significant correlation between methylation and gene expression. We then examine the functions and relationships of these genes using bioinformatic tools and databases. In particular, using ConsensusPathDB, we analyze the networks of statistically significant genes to identify hub genes, i.e. genes with a large number of links to other genes. We identify eight major hub genes, all strongly associated with cancer susceptibility. Through further analysis of the function, gene expression level, and methylation level of these hub genes, we conclude that they are potential novel biomarkers for breast cancer. Conclusions: Our findings have implications for cancer screening, early detection, and potential novel treatments for cancer. Researchers can also use our results to develop more effective methods for studying cancer. Electronic supplementary material: The online version of this article (10.1186/s13040-018-0174-8) contains supplementary material, which is available to authorized users.
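The hub-gene step described above amounts to ranking nodes by degree in an interaction network. The sketch below assumes the network is given as a plain undirected edge list with hypothetical gene identifiers. It does not query ConsensusPathDB, and its `top_k` cutoff is an illustrative assumption rather than the authors' criterion.

```python
from collections import Counter


def hub_genes(edges, top_k=3):
    """Rank genes by degree (number of distinct neighbours) in an undirected network."""
    degree = Counter()
    seen = set()
    for a, b in edges:
        key = frozenset((a, b))
        if a == b or key in seen:
            continue  # skip self-loops and repeated undirected edges
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical toy network; ("G2", "G1") repeats the G1-G2 edge and is ignored.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5"), ("G2", "G1")]
print(hub_genes(edges))  # [('G1', 3), ('G2', 2), ('G3', 2)]
```

In practice the edge list would come from a pathway database. The degree cutoff for calling a gene a hub is a modelling choice.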
Affiliation(s)
- Juyon Lee
- Korea International School Pangyo Campus, Seongnam, South Korea
- Jessica Ding
- Liberal Arts and Science Academy, Austin, Texas USA
- Shuying Sun
- Department of Mathematics, Texas State University, San Marcos, TX USA
59
Kawaguchi A, Yamashita F. Supervised multiblock sparse multivariable analysis with application to multimodal brain imaging genetics. Biostatistics 2018; 18:651-665. [PMID: 28369170 DOI: 10.1093/biostatistics/kxx011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/18/2016] [Accepted: 02/06/2017] [Indexed: 12/25/2022]
Abstract
This article proposes a procedure for describing the relationship between high-dimensional data sets, such as multimodal brain images and genetic data. We propose a supervised technique that incorporates the clinical outcome to determine a score, which is a linear combination of variables with hierarchical structures across modalities. This approach is expected to yield interpretable and predictive scores. The proposed method was applied to a study of Alzheimer's disease (AD). We propose a diagnostic method for AD that uses whole-brain magnetic resonance imaging (MRI) and positron emission tomography (PET). We then select brain regions that are effective for the diagnostic probability and investigate their genome-wide association using single nucleotide polymorphisms (SNPs). The two-step dimension reduction method that we previously introduced was considered applicable to such a study and allows us to partially incorporate the proposed method. Based on receiver operating characteristic (ROC) analysis, the proposed method offers feasible classification functions with reasonable prediction accuracy and identifies reasonable regions of the brain and genome. Our simulation study based on a synthetic structured data set showed that the proposed method outperformed the original method and provided the characteristic for the supervised feature.
Affiliation(s)
- Atsushi Kawaguchi
- Center for Comprehensive Community Medicine, Faculty of Medicine, Saga University, 5-1-1 Nabeshima, Saga 849-8501, Japan
- Fumio Yamashita
- Division of Ultrahigh Field MRI, Institute for Biomedical Sciences, Iwate Medical University, Yahaba, Iwate 028-3694, Japan
60
Hung LH, Shi K, Wu M, Young WC, Raftery AE, Yeung KY. fastBMA: scalable network inference and transitive reduction. Gigascience 2018; 6:1-10. [PMID: 29020744 PMCID: PMC5632288 DOI: 10.1093/gigascience/gix078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/12/2017] [Accepted: 08/10/2017] [Indexed: 11/15/2022]
Abstract
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and on experimental genome-wide time series datasets from yeast and human. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome-scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
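The abstract says that transitive reduction is mapped to a shortest-path problem. The sketch below shows one way such a mapping could work, under explicitly stated assumptions. Edge confidences in (0, 1] are converted to distances with -log(p), so a path's total length corresponds to the product of its confidences. A direct edge is then dropped when some indirect path is at least as strong. This rule, the function name, and the dict-of-edges input format are assumptions for illustration, not fastBMA's documented criterion.

```python
import heapq
import math


def transitive_reduction(edges):
    """Drop direct edges explained by an indirect path of equal or higher confidence.

    edges: dict mapping (source, target) -> confidence in (0, 1] (assumed format).
    """
    graph = {}
    for (u, v), p in edges.items():
        graph.setdefault(u, []).append((v, -math.log(p)))  # distance = -log(confidence)

    def indirect_distance(src, dst):
        # Dijkstra from src to dst that is not allowed to use the direct edge src->dst.
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, node = heapq.heappop(heap)
            if node == dst:
                return d
            if d > dist.get(node, math.inf):
                continue  # stale heap entry
            for nxt, w in graph.get(node, []):
                if node == src and nxt == dst:
                    continue  # skip the direct edge being tested
                nd = d + w
                if nd < dist.get(nxt, math.inf):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
        return math.inf

    kept = {}
    for (u, v), p in edges.items():
        # Keep the edge only if every indirect path is strictly weaker than it.
        if indirect_distance(u, v) > -math.log(p) + 1e-12:
            kept[(u, v)] = p
    return kept


edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5, ("C", "D"): 0.8}
print(sorted(transitive_reduction(edges)))  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

Because -log(p) is non-negative, Dijkstra's algorithm applies. Each edge costs one single-source search, which is acceptable for a sketch but is not how a genome-scale tool would be optimized.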
Affiliation(s)
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Kaiyuan Shi
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Migao Wu
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- William Chad Young
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
- Adrian E. Raftery
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
- Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Correspondence address. Ka Yee Yeung, Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.; Tel: 253-692-4924; Fax: 253-692-5862; E-mail:
61
Shi Q, Zhang C, Peng M, Yu X, Zeng T, Liu J, Chen L. Pattern fusion analysis by adaptive alignment of multiple heterogeneous omics data. Bioinformatics 2018; 33:2706-2714. [PMID: 28520848 DOI: 10.1093/bioinformatics/btx176] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 10/15/2016] [Accepted: 03/27/2017] [Indexed: 12/20/2022]
Abstract
Motivation: Integrating different omics profiles is a challenging task that offers a comprehensive, multi-view way to understand complex diseases. One key to such integration is to extract intrinsic patterns in concordance with data structures, so as to discover consistent information across data types even in the presence of noise. We therefore propose a novel framework called 'pattern fusion analysis' (PFA), which performs automated information alignment and bias correction to fuse local sample-patterns (e.g. from each data type) into a global sample-pattern corresponding to phenotypes (e.g. across most data types). In particular, PFA can identify significant sample-patterns from different omics profiles by optimally adjusting the effect of each data type on the patterns. This alleviates the problems of processing data from different platforms with different reliability levels. Results: To validate the effectiveness of our method, we first tested PFA on various synthetic datasets. Compared with state-of-the-art methods such as iClusterPlus, SNF and moCluster, PFA not only captured the intrinsic sample clustering structures in multi-omics data but also provided an automatic weighting scheme that measures the contributions of individual data types or even samples. In addition, the computational results show that PFA can reveal shared and complementary sample-patterns across data types with distinct signal-to-noise ratios in Cancer Cell Line Encyclopedia (CCLE) datasets. PFA also outperforms other methods at identifying clinically distinct cancer subtypes in The Cancer Genome Atlas (TCGA) datasets. Availability and implementation: PFA has been implemented as a Matlab package, available at http://www.sysbio.ac.cn/cb/chenlab/images/PFApackage_0.1.rar. Contact: lnchen@sibs.ac.cn, liujuan@whu.edu.cn or zengtao@sibs.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qianqian Shi
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Chuanchao Zhang
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China; State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China
- Minrui Peng
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Xiangtian Yu
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Tao Zeng
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
- Juan Liu
- State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan 430072, China
- Luonan Chen
- Key Laboratory of Systems Biology, CAS Center for Excellence in Molecular Cell Science, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Shanghai 200031, China
62
Chen J, Zhang S. Matrix Integrative Analysis (MIA) of Multiple Genomic Data for Modular Patterns. Front Genet 2018; 9:194. [PMID: 29910825 PMCID: PMC5992392 DOI: 10.3389/fgene.2018.00194] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/12/2018] [Accepted: 05/11/2018] [Indexed: 11/13/2022]
Abstract
The increasing availability of high-throughput biological data, especially multi-dimensional genomic data across the same samples, has created an urgent need for modular and integrative analysis tools that can reveal the relationships among different layers of cellular activity. To this end, we present a MATLAB package, Matrix Integration Analysis (MIA), which implements and extends four published methods built on two classical techniques: non-negative matrix factorization (NMF) and partial least squares (PLS). The package can integrate diverse types of genomic data (e.g., copy number variation, DNA methylation, gene expression, microRNA expression profiles, and/or gene network data) to identify the underlying modular patterns with each method. In particular, we demonstrate the differences between these two classes of methods to help users select a suitable method in the MIA package. MIA is a flexible tool that can handle a wide range of biological problems and data types. We also provide an executable version for users without a MATLAB license.
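NMF is one of the two building blocks named above. The sketch below implements plain NMF with the Lee-Seung multiplicative updates for the Frobenius loss, plus a shared-basis variant for several data blocks measured on the same samples. It is not code from the MIA package. The function names, the iteration count, and the rows-as-samples layout are assumptions.

```python
import numpy as np


def nmf(X, rank, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix X (m x n) as W @ H using Lee-Seung updates."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative and do not increase
        # the Frobenius reconstruction error ||X - W H||.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def joint_nmf(blocks, rank, **kwargs):
    """Shared-W factorization of blocks measured on the same samples (rows)."""
    widths = [b.shape[1] for b in blocks]
    W, H = nmf(np.hstack(blocks), rank, **kwargs)
    # Split the coefficient matrix back into one H per block.
    return W, np.split(H, np.cumsum(widths)[:-1], axis=1)


rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 15))  # exactly rank-2, non-negative toy data
W, H = nmf(X, rank=2)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative reconstruction error
```

Sharing W across blocks is equivalent to factoring the column-wise concatenation of the blocks, which is what `joint_nmf` does. In this simplest form, blocks with more features or larger values dominate the fit, so per-block scaling would usually be applied first.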
Affiliation(s)
- Jinyu Chen
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, CAS, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, CAS, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
63
Dihazi H, Asif AR, Beißbarth T, Bohrer R, Feussner K, Feussner I, Jahn O, Lenz C, Majcherczyk A, Schmidt B, Schmitt K, Urlaub H, Valerius O. Integrative omics - from data to biology. Expert Rev Proteomics 2018; 15:463-466. [DOI: 10.1080/14789450.2018.1476143] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Affiliation(s)
- Hassan Dihazi
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Nephrology and Rheumatology, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Abdul R. Asif
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Institute for Clinical Chemistry/UMG-Laboratories, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Tim Beißbarth
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Department of Medical Statistics, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Rainer Bohrer
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH, Göttingen, Germany
- Kirstin Feussner
- Göttingen Metabolomics and Lipidomics Platform (GMLP), Göttingen, Germany
- Department of Plant Biochemistry, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, Göttingen, Germany
- Ivo Feussner
- Göttingen Metabolomics and Lipidomics Platform (GMLP), Göttingen, Germany
- Department of Plant Biochemistry, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, Göttingen, Germany
- Olaf Jahn
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Proteomics Group, Max Planck Institute of Experimental Medicine, Göttingen, Germany
- Christof Lenz
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Institute for Clinical Chemistry/UMG-Laboratories, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Andrzej Majcherczyk
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Büsgen-Institute, Section Molecular Wood Biotechnology and Technical Mycology, University of Göttingen, Göttingen, Germany
- Bernhard Schmidt
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Department of Cellular Biochemistry, University Medical Center Göttingen, Göttingen, Germany
- Kerstin Schmitt
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany
- Henning Urlaub
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Institute for Clinical Chemistry/UMG-Laboratories, University Medical Center Göttingen, University of Göttingen, Göttingen, Germany
- Bioanalytical Mass Spectrometry, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Oliver Valerius
- Göttingen Proteomics Forum (GPF), Göttingen, Germany
- Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany
64
Zhang J, Zhang S. The Discovery of Mutated Driver Pathways in Cancer: Models and Algorithms. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:988-998. [PMID: 28113329 DOI: 10.1109/tcbb.2016.2640963] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 06/06/2023]
Abstract
The pathogenesis of cancer in humans is still poorly understood. With the rapid development of high-throughput sequencing technologies, huge volumes of cancer genomics data have been generated. Deciphering these data poses great opportunities and challenges for computational biologists. One key challenge is to distinguish driver mutations, genes, and pathways from passenger ones. Mutual exclusivity of gene mutations (each patient has no more than one mutation in the gene set) has been observed in various cancer types and has therefore been used as an important property of a driver gene set or pathway. In this article, we review recent developments in computational models and algorithms for discovering driver pathways or modules in cancer, focusing on those based on mutual exclusivity.
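Mutual exclusivity, as defined above, can be scored directly from which patients carry a mutation in each gene. The sketch below computes three quantities for a gene set. The first is its coverage. The second is its overlap, meaning mutations beyond one per covered patient, which is zero exactly when the set is perfectly exclusive. The third is the coverage-minus-overlap weight used by Dendrix. The input format and function names are hypothetical. The exhaustive search is feasible only for toy inputs; published methods use more scalable optimization strategies.

```python
from itertools import combinations


def gene_set_scores(mutations, gene_set):
    """Return (coverage, overlap, weight) for a gene set.

    mutations maps each gene to the set of patient IDs with a mutation in it.
    """
    covered = set()
    total = 0
    for gene in gene_set:
        patients = mutations.get(gene, set())
        covered |= patients
        total += len(patients)
    coverage = len(covered)     # patients with at least one mutation in the set
    overlap = total - coverage  # extra mutations; 0 means perfectly exclusive
    weight = coverage - overlap  # Dendrix-style weight, equal to 2*coverage - total
    return coverage, overlap, weight


def best_gene_set(mutations, k):
    """Exhaustively find the size-k gene set with the largest weight (toy inputs only)."""
    best = max(combinations(sorted(mutations), k),
               key=lambda genes: gene_set_scores(mutations, genes)[2])
    return best, gene_set_scores(mutations, best)[2]


muts = {"A": {"p1", "p2"}, "B": {"p3"}, "C": {"p3", "p4"}}  # hypothetical data
print(gene_set_scores(muts, ["B", "C"]))  # (2, 1, 1)
print(best_gene_set(muts, 2))             # (('A', 'C'), 4)
```

The weight rewards gene sets that cover many patients and penalizes patients mutated in more than one gene of the set. This is the trade-off that mutual-exclusivity methods optimize.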
65
Gut metabolome meets microbiome: A methodological perspective to understand the relationship between host and microbe. Methods 2018; 149:3-12. [PMID: 29715508 DOI: 10.1016/j.ymeth.2018.04.029] [Citation(s) in RCA: 118] [Impact Index Per Article: 16.9] [Received: 01/05/2018] [Revised: 03/06/2018] [Accepted: 04/22/2018] [Indexed: 02/06/2023]
Abstract
It is well established that gut microbes and their metabolic products regulate host metabolism. The interactions between the host and its gut microbiota are highly dynamic and complex. In this review we present and discuss metabolomic strategies for studying the gut microbial ecosystem. We highlight metabolic profiling approaches for faecal samples aimed at deciphering the metabolic products derived from gut microbiota. We also discuss how metabolomics data can be integrated with metagenomics data derived from gut microbiota and how such approaches may lead to a better understanding of microbial functions. Finally, we highlight emerging genome-scale metabolic modelling approaches for studying microbial co-metabolism and host-microbe interactions.
66
Abstract
BACKGROUND Omics profiling is now a routine component of biomedical studies. In the analysis of omics data, clustering is an essential step and serves multiple purposes, for example revealing the unknown functionalities of omics units and assisting dimension reduction in outcome model building. In the most recent omics studies, a prominent trend is to conduct multilayer profiling, which collects multiple types of genetic, genomic, epigenetic and other measurements on the same subjects. In the literature, clustering methods tailored to multilayer omics data are still limited. Directly applying existing clustering methods to multilayer omics data, and clustering each layer first and then combining across layers, are both "suboptimal" in that they do not accommodate the interconnections within and across layers in an informative way. METHODS In this study, we develop the MuNCut (Multilayer NCut) clustering approach. It is tailored to multilayer omics data and sufficiently accounts for both across- and within-layer connections. It is based on the NCut technique and also takes advantage of regularized sparse estimation. It has an intuitive formulation and is computationally very feasible. To facilitate implementation, we develop the function muncut in the R package NcutYX. RESULTS Under a wide spectrum of simulation settings, MuNCut outperforms competing methods. The analysis of TCGA (The Cancer Genome Atlas) data on breast cancer and cervical cancer shows that MuNCut generates biologically meaningful results which differ from those of the alternatives. CONCLUSIONS We propose a more effective clustering analysis of multiple omics data. It provides a new avenue for jointly analyzing genetic, genomic, epigenetic and other measurements.
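MuNCut builds on the normalized cut (NCut) criterion. As a hedged illustration of that criterion only, and not of MuNCut's multilayer formulation or its sparse regularization (which the abstract does not spell out), the sketch below scores a partition of a single similarity graph as the sum over clusters of cut(A, rest) / vol(A); a good clustering has a small value. The function name and toy graph are hypothetical.

```python
def ncut(weights, labels):
    """Normalized cut of a partition: sum over clusters of cut(A, rest) / vol(A).

    weights: symmetric, non-negative similarity matrix (list of lists).
    labels:  cluster label for each node.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    total = 0.0
    for c in set(labels):
        members = [i for i in range(n) if labels[i] == c]
        volume = sum(degree[i] for i in members)
        # Total edge weight leaving cluster c.
        cut = sum(weights[i][j] for i in members for j in range(n) if labels[j] != c)
        total += cut / volume
    return total


# Toy similarity graph: nodes {0, 1} and {2, 3} are tightly linked pairs.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0, 0.0],
]
print(ncut(W, [0, 0, 1, 1]))  # low: splits along the weak edges
print(ncut(W, [0, 1, 0, 1]))  # high: cuts the strong edges
```

Dividing by cluster volume is what distinguishes NCut from a plain minimum cut: it discourages trivially small clusters that cut few edges.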
67
Xie B, Yuan Z, Yang Y, Sun Z, Zhou S, Fang X. MOBCdb: a comprehensive database integrating multi-omics data on breast cancer for precision medicine. Breast Cancer Res Treat 2018; 169:625-632. [PMID: 29429018 DOI: 10.1007/s10549-018-4708-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/23/2017] [Accepted: 02/03/2018] [Indexed: 12/23/2022]
Abstract
BACKGROUND Breast cancer is one of the most frequently diagnosed cancers among women worldwide and is characterized by marked biological heterogeneity. It is well known that complex, combined gene regulation across multiple omics layers is involved in the occurrence and development of breast cancer. RESULTS In this paper, we present the Multi-Omics Breast Cancer Database (MOBCdb), a simple and easily accessible repository that integrates genomic, transcriptomic, epigenomic, clinical, and drug response data of different breast cancer subtypes. MOBCdb allows users to retrieve simple nucleotide variation (SNV), gene expression, microRNA expression, DNA methylation, and specific drug response data through various search modes. The genome-wide browser/navigation facility in MOBCdb provides an interface for visualizing multi-omics data from multiple samples simultaneously. Furthermore, the survival module provides survival analysis for all or a subset of the samples using three omics data types. Approved public drugs associated with genetic variations in breast cancer are also included in MOBCdb. CONCLUSION In summary, MOBCdb provides users a unique web interface to the integrated multi-omics data of different breast cancer subtypes, which enables users to identify potential novel biomarkers for precision medicine.
Affiliation(s)
- Bingbing Xie
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Zifeng Yuan
- Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
- Yadong Yang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhidan Sun
- Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
- Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
- Xiangdong Fang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
68
An Overview of Metabolomics Data Analysis: Current Tools and Future Perspectives. COMPREHENSIVE ANALYTICAL CHEMISTRY 2018. [DOI: 10.1016/bs.coac.2018.07.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 12/13/2022]
69
Kang M, Park J, Kim DC, Biswas AK, Liu C, Gao J. Multi-Block Bipartite Graph for Integrative Genomic Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1350-1358. [PMID: 27429442 DOI: 10.1109/tcbb.2016.2591521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Human diseases involve a sequence of complex interactions between multiple biological processes. In particular, multiple genomic data types such as Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV), DNA Methylation (DM), and their interactions simultaneously play an important role in human diseases. However, despite the widely known complex multi-layer biological processes and the increased availability of heterogeneous genomic data, most research has considered only a single type of genomic data. Furthermore, recent integrative studies of multiple genomic data types have faced difficulties due to high dimensionality and complexity, especially when considering intra- and inter-block interactions. In this paper, we introduce a novel multi-block bipartite graph and its inference methods, MB2I and sMB2I, for integrative genomic study. The proposed methods not only integrate multiple genomic data types but also incorporate intra/inter-block interactions by using a multi-block bipartite graph. In addition, the methods can be used to predict quantitative traits (e.g., gene expression, survival time) from the multi-block genomic data. Performance was assessed by simulation experiments that mimic practical situations. We also applied the method to human brain data from psychiatric disorders. The experimental results were analyzed by maximum edge biclique and biclustering, and biological findings are discussed.
70
Huang S, Chaudhary K, Garmire LX. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front Genet 2017; 8:84. [PMID: 28670325 PMCID: PMC5472696 DOI: 10.3389/fgene.2017.00084] [Citation(s) in RCA: 407] [Impact Index Per Article: 50.9] [Received: 03/24/2017] [Accepted: 06/01/2017] [Indexed: 01/20/2023]
Abstract
Multi-omics data integration is one of the major challenges in the era of precision medicine. Considerable work has been done since the advent of high-throughput studies, which have enabled data access for downstream analyses. To improve clinical outcome prediction, a gamut of software tools has been developed. This review outlines the progress made in the field of multi-omics integration and the comprehensive tools developed so far. At the end of the review, we discuss integration methods for predicting patient survival.
Affiliation(s)
- Sijia Huang
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, United States
- Kumardeep Chaudhary
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States
- Lana X Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, United States; Department of Obstetrics, Gynecology, and Women's Health, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, United States
71
Arneson D, Shu L, Tsai B, Barrere-Cain R, Sun C, Yang X. Multidimensional Integrative Genomics Approaches to Dissecting Cardiovascular Disease. Front Cardiovasc Med 2017; 4:8. [PMID: 28289683 PMCID: PMC5327355 DOI: 10.3389/fcvm.2017.00008] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/19/2016] [Accepted: 02/09/2017] [Indexed: 12/19/2022]
Abstract
Elucidating the mechanisms of complex diseases such as cardiovascular disease (CVD) remains a significant challenge due to multidimensional alterations at molecular, cellular, tissue, and organ levels. To better understand CVD and offer insights into the underlying mechanisms and potential therapeutic strategies, data from multiple omics types (genomics, epigenomics, transcriptomics, metabolomics, proteomics, microbiomics) from both humans and model organisms have become available. However, individual omics data types capture only a fraction of the molecular mechanisms. To address this challenge, there have been numerous efforts to develop integrative genomics methods that can leverage multidimensional information from diverse data types to derive comprehensive molecular insights. In this review, we summarize recent methodological advances in multidimensional omics integration, exemplify their applications in cardiovascular research, and pinpoint challenges and future directions in this incipient field.
Affiliation(s)
- Douglas Arneson
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Le Shu
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Brandon Tsai
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Rio Barrere-Cain
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Christine Sun
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Xia Yang
- Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA; Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA, USA
72
Qin J, Yan B, Hu Y, Wang P, Wang J. Applications of integrative OMICs approaches to gene regulation studies. QUANTITATIVE BIOLOGY 2016. [DOI: 10.1007/s40484-016-0085-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
73
Cao Z, Zhang S. An integrative and comparative study of pan-cancer transcriptomes reveals distinct cancer common and specific signatures. Sci Rep 2016; 6:33398. [PMID: 27633916 PMCID: PMC5025752 DOI: 10.1038/srep33398] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/27/2016] [Accepted: 08/24/2016] [Indexed: 12/11/2022]
Abstract
To investigate the commonalities and specificities across tumor lineages, we perform a systematic pan-cancer transcriptomic study across 6744 specimens. We find six pan-cancer subnetwork signatures related to cell cycle, immune response, Sp1 regulation, collagen, muscle system and angiogenesis. Moreover, four pan-cancer subnetwork signatures demonstrate strong prognostic potential. We also characterize 16 cancer type-specific subnetwork signatures that show diverse associations with somatic mutations, somatic copy number aberrations, DNA methylation alterations and clinical outcomes. Furthermore, some of them are strongly correlated with histological or molecular subtypes, indicating their relevance to tumor heterogeneity. In summary, we systematically explore pan-cancer common and cancer type-specific gene subnetwork signatures across multiple cancers, and reveal distinct commonalities and specificities among cancers at the transcriptomic level.
Affiliation(s)
- Zhen Cao
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Shihua Zhang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
74
Joint analysis of multiple high-dimensional data types using sparse matrix approximations of rank-1 with applications to ovarian and liver cancer. BioData Min 2016; 9:24. [PMID: 27478503 PMCID: PMC4966782 DOI: 10.1186/s13040-016-0103-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 01/27/2016] [Accepted: 07/05/2016] [Indexed: 12/15/2022]
Abstract
Background Technological advances enable the cost-effective acquisition of Multi-Modal Data Sets (MMDS) composed of measurements for multiple, high-dimensional data types obtained from a common set of bio-samples. The joint analysis of the data matrices associated with the different data types of an MMDS should provide a more focused view of the biology underlying complex diseases such as cancer that would not be apparent from the analysis of a single data type alone. As multi-modal data rapidly accumulate in research laboratories and public databases such as The Cancer Genome Atlas (TCGA), the translation of such data into clinically actionable knowledge has been slowed by the lack of computational tools capable of analyzing MMDSs. Here, we describe the Joint Analysis of Many Matrices by ITeration (JAMMIT) algorithm that jointly analyzes the data matrices of an MMDS using sparse matrix approximations of rank-1. Methods The JAMMIT algorithm jointly approximates an arbitrary number of data matrices by rank-1 outer-products composed of "sparse" left-singular vectors (eigen-arrays) that are unique to each matrix and a right-singular vector (eigen-signal) that is common to all the matrices. The non-zero coefficients of the eigen-arrays identify small subsets of variables for each data type (i.e., signatures) that in aggregate, or individually, best explain a dominant eigen-signal defined on the columns of the data matrices. The approximation is specified by a single "sparsity" parameter that is selected based on the false discovery rate estimated by permutation testing. Multiple signals of interest in a given MMDS are sequentially detected and modeled by iterating JAMMIT on "residual" data matrices that result from a given sparse approximation. Results We show that JAMMIT outperforms other joint analysis algorithms in the detection of multiple signatures embedded in simulated MMDS.
On real multimodal data for ovarian and liver cancer, we show that JAMMIT identified multi-modal signatures that were clinically informative and enriched for cancer-related biology. Conclusions Sparse matrix approximations of rank-1 provide a simple yet effective means of jointly reducing multiple, big data types to a small subset of variables that characterize important clinical and/or biological attributes of the bio-samples from which the data were acquired. Electronic supplementary material: The online version of this article (doi:10.1186/s13040-016-0103-7) contains supplementary material, which is available to authorized users.
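The rank-1 model described above (sparse, matrix-specific left vectors sharing one common right vector) can be sketched as an alternating power iteration with soft-thresholding. This is an assumption-laden illustration, not the JAMMIT implementation: the threshold `lam` is fixed by hand rather than chosen by permutation-based FDR, each matrix is assumed to be laid out as features × samples with the sample columns shared, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink entries toward zero; entries with |x| <= lam become exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def joint_sparse_rank1(mats, lam, n_iter=100):
    """Fit X_k ~ d_k * outer(u_k, v) with sparse unit u_k and a shared unit v."""
    # Start v at the leading right singular vector of the stacked matrices.
    v = np.linalg.svd(np.vstack(mats), full_matrices=False)[2][0]
    us = []
    for _ in range(n_iter):
        us = []
        for X in mats:
            u = soft_threshold(X @ v, lam)  # sparse, matrix-specific loadings
            norm_u = np.linalg.norm(u)
            us.append(u / norm_u if norm_u > 0 else u)
        v = sum(X.T @ u for X, u in zip(mats, us))  # pooled update of shared signal
        norm_v = np.linalg.norm(v)
        if norm_v == 0:
            break  # every loading was thresholded away; lam is too large
        v = v / norm_v
    return us, v


def deflate(mats, us, v):
    """Subtract each fitted rank-1 term so the next signal can be sought."""
    return [X - (u @ X @ v) * np.outer(u, v) for X, u in zip(mats, us)]
```

Calling `joint_sparse_rank1` again on the output of `deflate` mimics, in spirit, the abstract's idea of detecting further signals from residual matrices.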
75
Zhu R, Zhao Q, Zhao H, Ma S. Integrating multidimensional omics data for cancer outcome. Biostatistics 2016; 17:605-18. [PMID: 26980320 PMCID: PMC5031941 DOI: 10.1093/biostatistics/kxw010] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/27/2015] [Accepted: 01/27/2016] [Indexed: 01/06/2023]
Abstract
In multidimensional cancer omics studies, one subject is profiled on multiple layers of omics activities. In this article, the goal is to integrate multiple types of omics measurements, identify markers, and build a model for cancer outcome. The proposed analysis is achieved in two steps. In the first step, we analyze the regulation among different types of omics measurements through the construction of linear regulatory modules (LRMs). The LRMs have a sound biological basis, and their construction differs from existing analyses by modeling the regulation of sets of gene expressions (GEs) by sets of regulators. The construction is realized with the assistance of regularized singular value decomposition. In the second step, the proposed cancer outcome model includes the regulated GEs, "residuals" of GEs, and "residuals" of regulators, and we use regularized estimation to select relevant markers. Simulation shows that the proposed method outperforms the alternatives with more accurate marker identification. We analyze The Cancer Genome Atlas (TCGA) data on cutaneous melanoma and lung adenocarcinoma and obtain meaningful results.
Affiliation(s)
- Ruoqing Zhu
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Qing Zhao
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, CT, USA
76
Chen J, Zhang S. Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data. Bioinformatics 2016; 32:1724-32. [DOI: 10.1093/bioinformatics/btw059] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 09/25/2015] [Accepted: 01/27/2016] [Indexed: 12/13/2022]
77
Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 2016; 17 Suppl 2:15. [PMID: 26821531 PMCID: PMC4959355 DOI: 10.1186/s12859-015-0857-9] [Citation(s) in RCA: 246] [Impact Index Per Article: 27.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND Methods for the integrative analysis of multi-omics data are required to draw a more complete and accurate picture of the dynamics of molecular systems. The complexity of biological systems, the technological limits, the large number of biological variables and the relatively low number of biological samples make the analysis of multi-omics datasets a non-trivial problem. RESULTS AND CONCLUSIONS We review the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects.
Affiliation(s)
- Matteo Bersanelli
- Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy; Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
- Ettore Mosca
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
- Daniel Remondini
- Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Enrico Giampieri
- Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Claudia Sala
- Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Gastone Castellani
- Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Luciano Milanesi
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
78
Liu B, Shen X, Pan W. Integrative and regularized principal component analysis of multiple sources of data. Stat Med 2016; 35:2235-50. [PMID: 26756854 DOI: 10.1002/sim.6866] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 02/08/2015] [Revised: 09/28/2015] [Accepted: 12/14/2015] [Indexed: 12/14/2022]
Abstract
Integration of data of disparate types has become increasingly important for enhancing the power of new discoveries by combining the complementary strengths of multiple types of data. One application is to uncover tumor subtypes in human cancer research, in which multiple types of genomic data are integrated, including gene expression, DNA copy number, and DNA methylation data. In spite of their successes, existing approaches based on joint latent variable models require stringent distributional assumptions and may suffer from unbalanced scales (or units) across data types and from poor scalability of the corresponding algorithms. In this paper, we propose an alternative based on integrative and regularized principal component analysis, which is distribution-free, computationally efficient, and robust against unbalanced scales. The new method performs dimension reduction simultaneously on multiple types of data, seeking data-adaptive sparsity and scaling. As a result, in addition to feature selection for each type of data, integrative clustering is achieved. Numerically, the proposed method compares favorably against its competitors in terms of accuracy (in identifying hidden clusters), computational efficiency, and robustness against unbalanced scales. In particular, compared with a popular method, the new method was competitive in identifying tumor subtypes associated with distinct patient survival patterns when applied to a combined analysis of DNA copy number, mRNA expression, and DNA methylation data in a glioblastoma multiforme study. Copyright © 2016 John Wiley & Sons, Ltd.
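The "unbalanced scales" problem mentioned above can be illustrated with a simple baseline that is not the authors' regularized method: center each block, divide it by its largest singular value (the block weighting used in multiple factor analysis) so that no data type dominates merely because of its units, then run ordinary PCA on the concatenated blocks. This sketch has no sparsity or data-adaptive scaling, and the function name is hypothetical.

```python
import numpy as np


def block_scaled_pca(blocks, n_components=2):
    """PCA on concatenated blocks after per-block centering and scaling.

    blocks: list of (n_samples, n_features_k) arrays sharing the same rows.
    Each centered block is divided by its largest singular value, so every
    block's leading direction carries unit weight regardless of its units.
    """
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        top_sv = np.linalg.svd(Xc, compute_uv=False)[0]
        scaled.append(Xc / top_sv)
    Z = np.hstack(scaled)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    return scores, scaled
```

Without the division by `top_sv`, a block measured on a scale 1000 times larger would almost entirely determine the leading components.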
Affiliation(s)
- Binghui Liu
- School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin Province, China; School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A.; Division of Biostatistics, University of Minnesota, 420 Delaware St. S.E., Minneapolis, 55455, MN, U.S.A
- Xiaotong Shen
- School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A
- Wei Pan
- Division of Biostatistics, University of Minnesota, 420 Delaware St. S.E., Minneapolis, 55455, MN, U.S.A
79
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. TRANSLATIONAL BIOINFORMATICS 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
80
Yang Z, Michailidis G. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics 2016; 32:1-8. [PMID: 26377073 PMCID: PMC5006236 DOI: 10.1093/bioinformatics/btv544] [Citation(s) in RCA: 112] [Impact Index Per Article: 12.4] [Received: 04/16/2015] [Revised: 09/08/2015] [Accepted: 09/09/2015] [Indexed: 12/15/2022]
Abstract
MOTIVATION Recent advances in high-throughput omics technologies have enabled biomedical researchers to collect large-scale genomic data. As a consequence, there has been growing interest in developing methods to integrate such data to obtain deeper insights regarding the underlying biological system. A key challenge for integrative studies is the heterogeneity present in the different omics data sources, which makes it difficult to discern the coordinated signal of interest from source-specific noise or extraneous effects. RESULTS We introduce a novel method of multi-modal data analysis that is designed for heterogeneous data based on non-negative matrix factorization. We provide an algorithm for jointly decomposing the data matrices involved that also includes a sparsity option for high-dimensional settings. The performance of the proposed method is evaluated on synthetic data and on real DNA methylation, gene expression and miRNA expression data from ovarian cancer samples obtained from The Cancer Genome Atlas. The results show the presence of common modules across patient samples linked to cancer-related pathways, as well as previously established ovarian cancer subtypes. AVAILABILITY AND IMPLEMENTATION The source code repository is publicly available at https://github.com/yangzi4/iNMF. CONTACT gmichail@umich.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
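A plain joint NMF, with one shared sample-factor matrix W and a separate loading matrix per data type, conveys the core idea of jointly decomposing several nonnegative data matrices. This sketch uses standard Lee-Seung multiplicative updates for the summed objective Σ_k ||X_k − W H_k||²; it deliberately omits the heterogeneity modeling and sparsity option that distinguish the authors' method (their own code is in the repository cited in the abstract), and all names here are hypothetical.

```python
import numpy as np


def joint_nmf(mats, k, n_iter=200, seed=0, eps=1e-10):
    """Factor each nonnegative X_k (n_samples x p_k) as W @ H_k with W shared."""
    rng = np.random.default_rng(seed)
    n = mats[0].shape[0]
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in mats]
    for _ in range(n_iter):
        # Update each data-type-specific loading matrix with W held fixed.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(mats, Hs)]
        # Update the shared factor using all data types at once.
        numer = sum(X @ H.T for X, H in zip(mats, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W = W * numer / denom
    return W, Hs


def joint_error(mats, W, Hs):
    """Summed squared Frobenius reconstruction error over all data types."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(mats, Hs))
```

Because the updates only multiply by nonnegative ratios, W and every H_k stay nonnegative, and rows of W can be read as module memberships of the shared samples.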
Affiliation(s)
- Zi Yang
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- George Michailidis
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
81
Xu Y, Zhu Y, Müller P, Mitra R, Ji Y. Characterizing Cancer-Specific Networks by Integrating TCGA Data. Cancer Inform 2015; 13:125-31. [PMID: 26628858 PMCID: PMC4657757 DOI: 10.4137/cin.s13776] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/03/2014] [Revised: 05/11/2015] [Accepted: 05/12/2015] [Indexed: 12/26/2022]
Abstract
The Cancer Genome Atlas (TCGA) generates comprehensive genomic data for thousands of patients across more than 20 cancer types. TCGA data are typically whole-genome measurements of multiple genomic features, such as DNA copy number, DNA methylation, and gene expression, providing unique opportunities for investigating cancer mechanisms across multiple molecular and regulatory layers. We propose a Bayesian graphical model to systematically integrate multi-platform TCGA data for inferring interactions between different genomic features, either within a gene or between multiple genes. The presence or absence of an edge in the graph indicates the presence or absence of conditional dependence between genomic features. The inference is restricted to genes within a known biological network, but can be extended to any set of genes. Applying the model to the same genes using patient samples from two different cancer types, we identify network components that are common to both cancer types as well as components that differ between them. The examples and code are available at https://www.ma.utexas.edu/users/yxu/software.html.
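In a Gaussian graphical model, the edge rule stated above has a concrete form: two features are conditionally independent given all the others exactly when the corresponding off-diagonal entry of the precision (inverse covariance) matrix is zero. The frequentist sketch below, which is not the authors' Bayesian model, converts a covariance matrix into partial correlations; the three-feature chain and the function name are hypothetical.

```python
import numpy as np


def partial_correlations(cov):
    """Partial correlation of each pair of variables given all the others."""
    precision = np.linalg.inv(cov)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Chain structure: feature 0 - feature 1 - feature 2, no direct 0-2 edge.
precision_true = np.array([[2.0, -1.0, 0.0],
                           [-1.0, 2.0, -1.0],
                           [0.0, -1.0, 2.0]])
cov = np.linalg.inv(precision_true)
pcor = partial_correlations(cov)
edges = np.abs(pcor) > 1e-8  # True where conditional dependence is present
```

Features 0 and 2 are marginally correlated here (their covariance entry is nonzero), yet they have no edge, because feature 1 fully explains their association.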
Affiliation(s)
- Yanxun Xu
- Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX, USA
- Yitan Zhu
- Northshore University HealthSystem, Evanston, IL, USA
- Peter Müller
- Department of Mathematics, The University of Texas at Austin, Austin, TX, USA
- Riten Mitra
- School of Public Health and Information Sciences, The University of Louisville, Louisville, KY, USA
- Yuan Ji
- Northshore University HealthSystem, Evanston, IL, USA; Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
82
Integrative phenotyping framework (iPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes. BMC Genomics 2015; 16:924. [PMID: 26560100 PMCID: PMC4642618 DOI: 10.1186/s12864-015-2170-4] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 09/22/2015] [Accepted: 10/31/2015] [Indexed: 12/15/2022]
Abstract
Background The increasing availability of multi-omics information on carefully phenotyped patients in studies of complex diseases requires novel methods for data integration. Unlike the continuous intensity measurements of most omics data sets, phenome data contain clinical variables that are binary, ordinal and categorical. Results In this paper we introduce an integrative phenotyping framework (iPF) for disease subtype discovery. A feature topology plot was developed for effective dimension reduction and visualization of multi-omics data. The approach is free of model assumptions and robust to data noise and missingness. We developed a workflow to integrate homogeneous patient clustering from different omics data in an agglomerative manner and then visualized heterogeneous clustering of pairwise omics sources. We applied the framework to two batches of lung samples obtained from patients diagnosed with chronic obstructive lung disease (COPD) or interstitial lung disease (ILD) with well-characterized clinical (phenomic) data, mRNA and microRNA expression profiles. Application of iPF to the first training batch identified clusters of patients consisting of homogeneous disease phenotypes as well as clusters with intermediate disease characteristics. Analysis of the second batch revealed a similar data structure, confirming the presence of intermediate clusters. Genes in the intermediate clusters were enriched with inflammatory and immune functional annotations, suggesting that they represent mechanistically distinct disease subphenotypes that may respond to immunomodulatory therapies. The iPF software package and all source code are publicly available. Conclusions Identification of subclusters with distinct clinical and biomolecular characteristics suggests that integration of phenomic and other omics information could lead to the identification of novel mechanism-based disease sub-phenotypes.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2170-4) contains supplementary material, which is available to authorized users.
83
Kang M, Kim DC, Liu C, Gao J. Multiblock discriminant analysis for integrative genomic study. BIOMED RESEARCH INTERNATIONAL 2015; 2015:783592. [PMID: 26075260 PMCID: PMC4450020 DOI: 10.1155/2015/783592] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 12/09/2014] [Accepted: 04/21/2015] [Indexed: 12/27/2022]
Abstract
Human diseases are abnormal medical conditions in which multiple biological components are intricately involved. Nevertheless, most research to date has relied on a single type of genetic data, such as single nucleotide polymorphisms (SNPs) or copy number variations (CNVs). To fully exploit knowledge of complex human diseases, epigenetic modifications and transcriptional regulation must be considered alongside genomic variants. We call such a collection of multiple heterogeneous data "multiblock data." In this paper, we propose a novel Multiblock Discriminant Analysis (MultiDA) method that provides a new integrative genomic model for multiblock analysis and an efficient algorithm for discriminant analysis. The integrative genomic model is built from representative genomic data, including SNP, CNV, DNA methylation and gene expression data. The discriminant analysis algorithm identifies discriminative factors in the multiblock data; discriminant analysis is essential for biomarker discovery in computational biology. The performance of MultiDA was assessed through extensive simulation experiments, in which it outperformed related methods. As a target application, we applied MultiDA to human brain data from psychiatric disorders. The findings and the gene regulatory network derived from the experiment are discussed.
Affiliation(s)
- Mingon Kang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
- Dong-Chul Kim
- Department of Computer Science, University of Texas-Pan American, Edinburg, TX 78539, USA
- Chunyu Liu
- Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 66012, USA
- Jean Gao
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA
84
Wang P, Qin J, Qin Y, Zhu Y, Wang LY, Li MJ, Zhang MQ, Wang J. ChIP-Array 2: integrating multiple omics data to construct gene regulatory networks. Nucleic Acids Res 2015; 43:W264-9. [PMID: 25916854 PMCID: PMC4489297 DOI: 10.1093/nar/gkv398] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/13/2015] [Accepted: 04/15/2015] [Indexed: 12/17/2022]
Abstract
Transcription factors (TFs) play an important role in gene regulation. The interconnections among TFs, chromatin interactions, epigenetic marks and cis-regulatory elements form a complex gene transcription apparatus. Our previous work, ChIP-Array, combined TF binding and transcriptome data to construct gene regulatory networks (GRNs). Here we present an enhanced version, ChIP-Array 2, which integrates additional types of omics data, including long-range chromatin interaction, open chromatin region and histone modification data, to dissect more comprehensive GRNs involving diverse regulatory components. Moreover, we substantially extended our motif database for human, mouse, rat, fruit fly, worm, yeast and Arabidopsis, and curated a large amount of omics data for users to select as input or backend support. With ChIP-Array 2, we compiled a library containing regulatory networks of 18 TFs/chromatin modifiers in mouse embryonic stem cells (mESCs). The web server and the mESC library are freely and publicly accessible at http://jjwanglab.org/chip-array.
Affiliation(s)
- Panwen Wang
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China
- Jing Qin
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China
- Yiming Qin
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Yun Zhu
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China
- Lily Yan Wang
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China
- Mulin Jun Li
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China
- Michael Q Zhang
- Bioinformatics Division, TNLIST, Tsinghua University, Beijing 100084, China; Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, Dallas, TX 75080, USA
- Junwen Wang
- Centre for Genomic Sciences and Department of Biochemistry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, Guangdong 518057, China
85
Ping Y, Deng Y, Wang L, Zhang H, Zhang Y, Xu C, Zhao H, Fan H, Yu F, Xiao Y, Li X. Identifying core gene modules in glioblastoma based on multilayer factor-mediated dysfunctional regulatory networks through integrating multi-dimensional genomic data. Nucleic Acids Res 2015; 43:1997-2007. [PMID: 25653168 PMCID: PMC4344511 DOI: 10.1093/nar/gkv074] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 01/12/2023]
Abstract
Driver genetic aberrations collectively regulate core cellular processes underlying cancer development. However, identifying modules of driver genetic alterations and characterizing their functional mechanisms remain major challenges in cancer studies. Here, we developed an integrative multi-omics method, CMDD, to identify driver modules and the genes they dysregulate by characterizing networks dysregulated by genetic alterations. Applied to glioblastoma (GBM), CMDD identified a core module of 17 genes, including seven known GBM drivers, together with their dysregulated genes. The module was significantly associated with shorter survival in GBM. When we classified the driver genes in the module into two gene sets according to their genetic alteration patterns, we found that one set participated directly in the glioma pathway, while the other regulated it indirectly, mostly via their dysregulated genes. Both gene sets contributed significantly to survival and helped classify GBM subtypes, suggesting critical roles in GBM pathogenesis. By applying CMDD to six other cancers, we also identified novel core modules associated with patients' overall survival. Together, these results demonstrate that integrating multi-omics data can identify driver modules and uncover their dysregulated genes, which is useful for interpreting cancer genomes.
Affiliation(s)
- Yanyan Ping
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Yulan Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Li Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Hongyi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Yong Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Chaohan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Hongying Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Huihui Fan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Fulong Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Yun Xiao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150086, China
86
Role of microRNAs in cancers of the female reproductive tract: insights from recent clinical and experimental discovery studies. Clin Sci (Lond) 2014; 128:153-80. [PMID: 25294164 DOI: 10.1042/cs20140087] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Abstract
microRNAs (miRNAs) are small RNA molecules that sit at the top of the pyramid of many tumorigenesis cascade pathways, because they can affect multiple, intricate and still undiscovered downstream targets. Understanding how miRNAs serve as master regulators in the networks involved in cancer initiation and progression opens up innovative avenues for therapy and diagnosis, which have been sadly lacking for deadly female reproductive tract cancers. This review highlights recent advances in the study of miRNAs in epithelial ovarian cancer, endometrioid endometrial cancer and squamous-cell cervical carcinoma, focusing on studies associated with actual clinical information in humans. Importantly, recent miRNA profiling studies have included well-characterized clinical specimens of female reproductive tract cancers, allowing studies that correlate miRNA expression with clinical outcomes. This review summarizes current thinking on the role of miRNA processing in the unique miRNA species present in these cancers. In addition, it focuses on current data on miRNAs as unique biomarkers associated with clinically significant outcomes such as overall survival and chemotherapy resistance. We also discuss why specific miRNAs are not recapitulated across multiple studies of the same cancer type. Although the mechanistic contributions of miRNAs to these clinical phenomena have been confirmed in in vitro and pre-clinical mouse model systems, these studies are only the beginning of our understanding of the roles miRNAs play in cancers of the female reproductive tract. Finally, this review highlights useful areas for future research on miRNAs as therapeutic targets in these cancers.
87
Dellinger AE, Nixon AB, Pang H. Integrative Pathway Analysis Using Graph-Based Learning with Applications to TCGA Colon and Ovarian Data. Cancer Inform 2014; 13:1-9. [PMID: 25125969 PMCID: PMC4125381 DOI: 10.4137/cin.s13634] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/12/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 12/15/2022]
Abstract
Recent method development has included multi-dimensional genomic data algorithms because such methods have more accurately predicted clinical phenotypes related to disease. This study is the first to conduct an integrative genomic pathway-based analysis with a graph-based learning algorithm. The methodology of this analysis, graph-based semi-supervised learning, detects pathways that improve prediction of a dichotomous variable, which in this study is cancer stage. This analysis integrates genome-level gene expression, methylation, and single nucleotide polymorphism (SNP) data in serous cystadenocarcinoma (OV) and colon adenocarcinoma (COAD). The top 10 ranked predictive pathways in COAD and OV were biologically relevant to their respective cancer stages and significantly enhanced prediction accuracy and area under the ROC curve (AUC) when compared to single data-type analyses. This method is an effective way to simultaneously predict binary clinical phenotypes and discover their biological mechanisms.
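The abstract names graph-based semi-supervised learning without giving its details. As a rough illustration only, the sketch below uses one well-known variant, label propagation over a symmetrically normalized affinity matrix S = D^(-1/2) W D^(-1/2), iterating F <- alpha * S F + (1 - alpha) * Y. The toy patient graph, the +1/-1/0 label coding and the value of alpha are assumptions for illustration, not the authors' pathway-level setup.

```python
import math


def normalized_affinity(w):
    """Return S = D^(-1/2) W D^(-1/2) for a symmetric, nonnegative weight matrix W."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [
        [w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def propagate_labels(w, labels, alpha=0.9, iters=200):
    """Spread known labels (+1 / -1) to unlabelled nodes (0) over a similarity graph.

    Iterates F <- alpha * S F + (1 - alpha) * Y; the sign of each score is the
    predicted class and its magnitude a rough confidence.
    """
    s = normalized_affinity(w)
    n = len(w)
    y = [float(v) for v in labels]
    f = list(y)
    for _ in range(iters):
        # Each new score mixes neighbours' current scores with the node's own label.
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical toy graph: two tight groups of patients joined by one weak edge (2-3).
weights = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = [1, 0, 0, 0, 0, -1]  # one labelled patient per group, e.g. early vs late stage
scores = propagate_labels(weights, labels)
predicted = [1 if v > 0 else -1 for v in scores]
print(predicted)  # [1, 1, 1, -1, -1, -1]
```

Because alpha < 1 and the spectral radius of S is at most 1, the iteration converges to a unique fixed point; in practice the graph would be built from patient similarities within each candidate pathway.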
Affiliation(s)
- Andrew E Dellinger
- Department of Mathematics and Statistics, Elon University, Elon, NC, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Andrew B Nixon
- Department of Medicine, Division of Medical Oncology, Duke University School of Medicine, Durham, NC, USA
- Herbert Pang
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
88
Le TD, Liu L, Zhang J, Liu B, Li J. From miRNA regulation to miRNA-TF co-regulation: computational approaches and challenges. Brief Bioinform 2014; 16:475-96. [DOI: 10.1093/bib/bbu023] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 03/14/2014] [Accepted: 06/10/2014] [Indexed: 12/14/2022]
89
Ding H, Wang C, Huang K, Machiraju R. iGPSe: a visual analytic system for integrative genomic based cancer patient stratification. BMC Bioinformatics 2014; 15:203. [PMID: 25000928 PMCID: PMC4227100 DOI: 10.1186/1471-2105-15-203] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/24/2014] [Accepted: 06/10/2014] [Indexed: 12/21/2022]
Abstract
Background
Cancers are highly heterogeneous, with different subtypes. These subtypes often carry different genetic variants, present different pathological phenotypes and, most importantly, show different clinical outcomes, such as prognosis, response to treatment and likelihood of recurrence and metastasis. Integrative genomics (or panomics) approaches have recently been adopted to combine multiple types of omics data and identify integrative biomarkers for stratifying patients into groups with different clinical outcomes.
Results
In this paper we present a visual analytic system called Interactive Genomics Patient Stratification explorer (iGPSe), which significantly reduces the computing burden for biomedical researchers exploring complicated integrative genomics data. Our system integrates unsupervised clustering with graph and parallel sets visualization and allows direct comparison of clinical outcomes via survival analysis. Using a breast cancer dataset from The Cancer Genome Atlas (TCGA) project, we were able to quickly explore different combinations of gene expression (mRNA) and microRNA features and identify potential combined markers for survival prediction.
Conclusions
Visualization plays an important role in stratifying a given patient population. Visual tools allowed the selection of candidate features across various datasets for the given patient population. In essence, we make a case for visualization in a very important problem in translational informatics.
Affiliation(s)
- Kun Huang
- Department of Computer Science and Engineering and Biomedical Informatics, The Ohio State University, 43210 Columbus, OH, USA.
90
Liu Z, Zhang S. Toward a systematic understanding of cancers: a survey of the pan-cancer study. Front Genet 2014; 5:194. [PMID: 25071824 PMCID: PMC4080169 DOI: 10.3389/fgene.2014.00194] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 03/20/2014] [Accepted: 06/12/2014] [Indexed: 11/29/2022]
Abstract
Studies on molecular aberrations of cancer patients have increased unprecedentedly in scale and accessibility, allowing large-scale integrative cross-cancer analysis. Pan-cancer study is becoming a valuable paradigm for cancer genomics. Here, we review recent advances in this field and highlight the potential challenges and directions especially from the computational angle.
Affiliation(s)
- Shihua Zhang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
91
Guan D, Shao J, Zhao Z, Wang P, Qin J, Deng Y, Boheler KR, Wang J, Yan B. PTHGRN: unraveling post-translational hierarchical gene regulatory networks using PPI, ChIP-seq and gene expression data. Nucleic Acids Res 2014; 42:W130-6. [PMID: 24875471 PMCID: PMC4086064 DOI: 10.1093/nar/gku471] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
Interactions among transcription factors (TFs), cofactors and other proteins or enzymes can affect the transcriptional regulatory capabilities of eukaryotic organisms. Post-translational modifications (PTMs) cooperate with TFs and epigenetic alterations to create a hierarchical complexity in transcriptional gene regulation. Although these complex regulatory mechanisms are clearly implicated in biological processes, our understanding of them is still limited and incomplete. Various online tools have been proposed for uncovering transcriptional and epigenetic regulatory networks; however, there is a lack of effective web-based software capable of reconstructing the interactions between post-translational and transcriptional regulatory components. Here, we present an open web server, post-translational hierarchical gene regulatory network (PTHGRN), to unravel relationships among PTMs, TFs, epigenetic modifications and gene expression. PTHGRN uses a graphical Gaussian model with a partial least squares regression-based methodology; it can integrate protein–protein interactions, ChIP-seq and gene expression data and capture the essential regulatory features behind high-throughput data. The server provides an integrative platform on which users can analyze ready-to-use public high-throughput omics resources or upload their own data for systems biology studies. Users can choose various parameters of the method, build network topologies of interest and dissect their associations with biological functions. Applications of the software to stem cells and breast cancer demonstrate that it is an effective tool for understanding regulatory mechanisms in complex biological systems. The PTHGRN web server is publicly available at http://www.byanbioinfo.org/pthgrn.
Affiliation(s)
- Daogang Guan
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China
- Jiaofang Shao
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China
- Zhongying Zhao
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China
- Panwen Wang
- Department of Biochemistry and HKU-SIRI, The University of Hong Kong, Hong Kong SAR, China
- Jing Qin
- Department of Biochemistry and HKU-SIRI, The University of Hong Kong, Hong Kong SAR, China
- Youping Deng
- Department of Internal Medicine and Biochemistry, Rush University Medical Center, Chicago, Illinois 60612, USA
- Kenneth R Boheler
- Stem Cell & Regenerative Medicine Consortium, LKS Faculty of Medicine and Department of Physiology, The University of Hong Kong, Hong Kong SAR, China
- Junwen Wang
- Department of Biochemistry and HKU-SIRI, The University of Hong Kong, Hong Kong SAR, China; Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Bin Yan
- Department of Biology, Hong Kong Baptist University, Kowloon, Hong Kong SAR, China; Stem Cell & Regenerative Medicine Consortium, LKS Faculty of Medicine and Department of Physiology, The University of Hong Kong, Hong Kong SAR, China
92
Tragante V, Moore JH, Asselbergs FW. The ENCODE project and perspectives on pathways. Genet Epidemiol 2014; 38:275-80. [PMID: 24723339 DOI: 10.1002/gepi.21802] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 11/21/2013] [Revised: 01/22/2014] [Accepted: 03/04/2014] [Indexed: 12/22/2022]
Abstract
The recently completed ENCODE project is a new source of information on metabolic activity, unveiling knowledge about evolution and similarities among species and refuting the myth that most DNA is "junk" with no actual function. With this expansive resource comes a challenge: integrating these new layers of information into our current knowledge of single-nucleotide polymorphisms and previously described metabolic pathways, with the aim of discovering new genes and pathways related to human diseases and traits. Further, we must determine which computational methods will be most useful in this pursuit. In this paper, we speculate on the methods that may emerge in this new, challenging field.
Affiliation(s)
- Vinicius Tragante
- Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, GA Utrecht, The Netherlands; Department of Medical Genetics, Biomedical Genetics, University Medical Center Utrecht, CX Utrecht, The Netherlands
93
Zhao Q, Shi X, Xie Y, Huang J, Shia B, Ma S. Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform 2014; 16:291-303. [PMID: 24632304 DOI: 10.1093/bib/bbu003] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Indexed: 01/03/2023]
Abstract
With accumulating research on the interconnections among different types of genomic regulation, researchers have found that multidimensional genomic studies outperform one-dimensional studies in multiple respects. Among the many sources of multidimensional genomic data, The Cancer Genome Atlas (TCGA) provides comprehensive public profiling data on >30 cancer types, making it an ideal test bed for conducting and comparing different analyses. In this article, our goal is to apply several existing methods to associate multidimensional genomic measurements with cancer outcomes, in particular prognosis, with a special focus on the predictive power of genomic signatures. We use clinical data and four types of genomic measurements, namely mRNA gene expression, DNA methylation, microRNA and copy number alterations, for breast invasive carcinoma, glioblastoma multiforme, acute myeloid leukemia and lung squamous cell carcinoma collected by TCGA. To accommodate the high dimensionality, we extract important features using Principal Component Analysis, Partial Least Squares and the Least Absolute Shrinkage and Selection Operator (Lasso), which are representative dimension reduction and variable selection techniques and have been extensively adopted, and we fit Cox survival models with the combined important features. We calibrate the predictive power of each type of genomic measurement for the prognosis of the four cancer types and find that the results vary across cancers. Our analysis also suggests that, for most of the cancers in our study and the adopted methods, there is no substantial improvement in prediction when other genomic measurements are added after gene expression and clinical covariates have been included in the model. This is consistent with the finding that molecular features measured at the transcription level affect clinical outcomes more directly than those measured at the DNA/epigenetic level.
94
Abstract
With the rapid development of high-throughput sequencing technologies, many groups are generating multi-platform genomic profiles (e.g., DNA methylation and gene expression) for their biological samples. This activity has generated a huge number of so-called "multidimensional genomic datasets," providing unique opportunities and challenges to study coordination among different regulatory levels and discover underlying combinatorial patterns of cellular systems. We summarize a matrix factorization framework to address the challenge of integrating multiple genomic datasets, as well as a semi-supervised variant of the method that can incorporate prior knowledge. The basic idea is to project the different kinds of genomic data onto a common coordinate system, wherein genetic variables that are strongly correlated in a subset of samples form a multidimensional module. In the context of cancer biology, such modules reveal perturbed pathways and clinically distinct patient subgroups that would have been overlooked with only a single type of data. In summary, the matrix factorization framework can uncover associations between distinct layers of cellular activity and explain their biological implications in multidimensional data.
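The abstract does not spell out the factorization itself. A minimal sketch of the shared-coordinate idea follows, assuming a joint non-negative matrix factorization in which each nonnegative data block X_I (samples by features) is approximated as W H_I, with one sample-by-module matrix W common to all blocks, fitted with the standard Lee-Seung multiplicative updates for the summed squared Frobenius loss. The function names, toy matrices and choice of k are hypothetical, and the semi-supervised variant mentioned above is not shown.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_error(x, w, h):
    """Squared Frobenius norm of X - W H."""
    wh = matmul(w, h)
    return sum((xv - yv) ** 2 for xr, yr in zip(x, wh) for xv, yv in zip(xr, yr))


def joint_nmf(blocks, k, iters=200, seed=0, eps=1e-9):
    """Factor each block X_I (n samples x p_I features) as W @ H_I with W shared.

    W (n x k) gives every sample one set of module loadings across all data
    types; each H_I (k x p_I) holds the block-specific feature weights.
    """
    rng = random.Random(seed)
    n = len(blocks[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    hs = [[[rng.random() + eps for _ in range(len(x[0]))] for _ in range(k)] for x in blocks]
    for _ in range(iters):
        wt = transpose(w)
        wtw = matmul(wt, w)
        # Update each block-specific H_I with W held fixed.
        for idx, x in enumerate(blocks):
            h = hs[idx]
            num, den = matmul(wt, x), matmul(wtw, h)
            hs[idx] = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
                       for i in range(k)]
        # Update the shared W using X_I H_I^T and H_I H_I^T summed over all blocks.
        num = [[0.0] * k for _ in range(n)]
        hht = [[0.0] * k for _ in range(k)]
        for x, h in zip(blocks, hs):
            ht = transpose(h)
            xht, hh = matmul(x, ht), matmul(h, ht)
            for i in range(n):
                for j in range(k):
                    num[i][j] += xht[i][j]
            for i in range(k):
                for j in range(k):
                    hht[i][j] += hh[i][j]
        den = matmul(w, hht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, hs


# Hypothetical toy blocks measured on the same 4 samples.
expr = [[5.0, 4.0, 0.0], [4.0, 5.0, 0.0], [0.0, 1.0, 5.0], [0.0, 0.0, 4.0]]  # 3 genes
meth = [[3.0, 0.0], [3.0, 1.0], [0.0, 4.0], [1.0, 4.0]]  # 2 CpG sites
w, (h_expr, h_meth) = joint_nmf([expr, meth], k=2)
sample_module = [row.index(max(row)) for row in w]  # assumed rule: largest loading wins
```

Because W is shared, this is equivalent to factorizing the column-wise concatenation [X_1, X_2, ...]; a module can then be read off as the samples loading on component k together with the high-weight features of that component in every H_I.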
95
Sohn KA, Kim D, Lim J, Kim JH. Relative impact of multi-layered genomic data on gene expression phenotypes in serous ovarian tumors. BMC Syst Biol 2013; 7 Suppl 6:S9. [PMID: 24521303 PMCID: PMC3906601 DOI: 10.1186/1752-0509-7-s6-s9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
Background
The emerging multiple layers of genomic data provide unprecedented opportunities for cancer research, especially for studying associations between gene expression and other types of genomic features. However, no previous approach provides an adequate statistical framework for a global analysis of the relative impact of different genomic feature layers on gene expression phenotypes.
Methods
We propose an integrative statistical framework based on sparse regression to model the impact of multi-layered genomic features on gene expression traits. The proposed approach can be regarded as an integrative expression quantitative trait loci (eQTL) approach in which not only genetic variations such as SNPs or copy number variations but also other genomic and epigenomic features are used to explain gene expression. To demonstrate the validity of the proposed approach, the TCGA ovarian cancer dataset was analysed as a pilot task.
Results
The analysis shows that our integrative approach consistently has greater power to predict gene expression levels than analyses based on any single data type. Moreover, the proposed method produces substantially fewer spurious associations. We provide an interesting characterization of genes in terms of their genomic association patterns. Important genomic features reported in previous ovarian cancer research are identified as major hubs in the resulting association network between heterogeneous types of genomic features and genes.
Conclusions
In this paper, we model gene expression phenotypes with respect to multiple types of genomic data in an integrative framework. Our analysis provides a global view of the relative contribution of different genomic feature types to gene expression phenotypes in ovarian cancer.
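As a rough illustration of the integrative, eQTL-style sparse regression described above, and not the authors' exact model, the sketch below fits a plain lasso (L1-penalized least squares, minimized by cyclic coordinate descent) of one gene's expression on features stacked from several genomic layers, then counts how many selected features come from each layer. The layer labels, toy data and penalty value lam are hypothetical; features and response are assumed centered, so no intercept is fitted.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the proximal step for gamma * |b|)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(x, y, lam, sweeps=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    x is a list of n rows with p features; x's columns and y are assumed centered.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(row[j] ** 2 for row in x) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(row[j] * (r + row[j] * beta[j]) for row, r in zip(x, resid)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(x, resid)]
                beta[j] = new
    return beta


def selected_per_layer(beta, layer_of):
    """Count nonzero coefficients contributed by each genomic layer."""
    counts = {}
    for b, layer in zip(beta, layer_of):
        if b != 0.0:
            counts[layer] = counts.get(layer, 0) + 1
    return counts


# Hypothetical centered data: 4 tumours x 3 features drawn from three layers.
features = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
layers = ["CNV", "miRNA", "methylation"]
expression = [1, 3, -1, -3]
beta = lasso_cd(features, expression, lam=0.5)
print(beta)                              # [1.5, 0.0, -0.5]
print(selected_per_layer(beta, layers))  # {'CNV': 1, 'methylation': 1}
```

The toy feature columns are orthogonal, so each lasso coefficient is simply the soft-thresholded least-squares coefficient; with real, correlated omics features the penalty would normally be tuned, for example by cross-validation, and repeating the fit over all genes gives a layer-by-layer tally of the kind the abstract summarizes.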