1
Khalilisamani N, Li Z, Pettolino FA, Moncuquet P, Reverter A, MacMillan CP. Leveraging transcriptomics-based approaches to enhance genomic prediction: integrating SNPs and gene networks for cotton fibre quality improvement. Frontiers in Plant Science 2024; 15:1420837. [PMID: 39372856 PMCID: PMC11450228 DOI: 10.3389/fpls.2024.1420837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2024] [Accepted: 08/19/2024] [Indexed: 10/08/2024]
Abstract
Cultivated cotton plants are the world's largest source of natural fibre, where yield and quality are key traits for this renewable and biodegradable commodity. The Gossypium hirsutum cotton genome contains ~80K protein-coding genes, making precision breeding of complex traits a challenge. This study tested approaches to improving the genomic prediction (GP) accuracy of valuable cotton fibre traits to help accelerate precision breeding. With a biology-informed basis, a novel approach was tested for improving GP for key cotton fibre traits with transcriptomics of key time points during fibre development, namely, fibre cells undergoing primary, transition, and secondary wall development. Three test approaches included weighting of SNPs in DE genes overall, in target DE gene lists informed by gene annotation, and in a novel approach of gene co-expression network (GCN) clusters created with partial correlation and information theory (PCIT) as the prior information in GP models. The GCN clusters were nucleated with known genes for fibre biomechanics, i.e., fasciclin-like arabinogalactan proteins, and cluster size effects were evaluated. The most promising improvements in GP accuracy were achieved by using GCN clusters for cotton fibre elongation by 4.6%, and strength by 4.7%, where cluster sizes of two and three neighbours proved most effective. Furthermore, the improvements in GP were due to only a small number of SNPs, in the order of 30 per trait using the GCN cluster approach. Non-trait-specific biological time points, and genes, were found to have neutral effects, or even reduced GP accuracy for certain traits. As the GCN clusters were generated based on known genes for fibre biomechanics, additional candidate genes were identified for fibre elongation and strength. These results demonstrate that GCN clusters make a specific and unique contribution in improving the GP of cotton fibre traits. 
The findings also indicate that there is room for incorporating biology-based GCNs into GP models of genomic selection pipelines for cotton breeding to help improve precision breeding of target traits. The PCIT-GCN cluster approach may also hold potential application in other crops and trees for enhancing breeding of complex traits.
Affiliation(s)
- Nima Khalilisamani
  - Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia
- Zitong Li
  - Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia
- Philippe Moncuquet
  - Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia
- Antonio Reverter
  - Livestock and Aquatic Genomics, Agriculture and Food, CSIRO, St Lucia, QLD, Australia
2
Caloto R, Lorenzo-Martín LF, Quesada V, Carracedo A, Bustelo XR. CiberAMP: An R Package to Identify Differential mRNA Expression Linked to Somatic Copy Number Variations in Cancer Datasets. Biology 2022; 11:biology11101411. [PMID: 36290315 PMCID: PMC9598370 DOI: 10.3390/biology11101411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Revised: 09/05/2022] [Accepted: 09/26/2022] [Indexed: 11/27/2022]
Abstract
Simple Summary: The ability to establish accurate correlations between the number of copies of genes and the expression levels of their encoded transcripts remains a challenge despite the extensive progress made in the understanding of the genome of cancer cells. Here, we describe a new algorithm that does so by integrating both genomics and transcriptomics data from the Cancer Genome Atlas. In addition to explaining the step-by-step basis of this new method, we provide examples of how this new algorithm can help identify functionally meaningful gene copy alterations that are recurrently detected in cancer patients.
Abstract: Somatic copy number variations (SCNVs) are genetic alterations frequently found in cancer cells. These genetic alterations can lead to concomitant perturbations in the expression of the genes included in them and, as a result, promote a selective advantage to cancer cells. However, this is not always the case. Due to this, it is important to develop in silico tools to facilitate the accurate identification and functional cataloging of gene expression changes associated with SCNVs from pan-cancer data. Here, we present a new R-coded tool, designated as CiberAMP, which utilizes genomic and transcriptomic data contained in the Cancer Genome Atlas (TCGA) to identify such events. It also includes information on the genomic context in which such SCNVs take place. By doing so, CiberAMP provides clues about the potential functional relevance of each of the SCNV-associated gene expression changes found in the interrogated tumor samples. The main features and advantages of this new algorithm are illustrated using glioblastoma data from the TCGA database.
Affiliation(s)
- Rubén Caloto
  - Molecular Mechanisms of Cancer Program, Centro de Investigación del Cáncer, CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Instituto de Biología Molecular y Celular del Cáncer de Salamanca, CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), CSIC-University of Salamanca, 37007 Salamanca, Spain
- L. Francisco Lorenzo-Martín
  - Molecular Mechanisms of Cancer Program, Centro de Investigación del Cáncer, CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Instituto de Biología Molecular y Celular del Cáncer de Salamanca, CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), CSIC-University of Salamanca, 37007 Salamanca, Spain
- Víctor Quesada
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Departamento de Bioquímica y Biología Molecular, Universidad de Oviedo, 33006 Oviedo, Spain
- Arkaitz Carracedo
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Center for Cooperative Research in Biosciences (CIC-bioGUNE), Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, 48160 Derio, Spain
  - Ikerbasque, Basque Foundation for Science, 48013 Bilbao, Spain
  - Translational Prostate Cancer Research Lab, CIC-bioGUNE, Biocruces Bizkaia Health Research Institute, 48903 Barakaldo, Spain
- Xosé R. Bustelo
  - Molecular Mechanisms of Cancer Program, Centro de Investigación del Cáncer, CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Instituto de Biología Molecular y Celular del Cáncer de Salamanca, CSIC-University of Salamanca, 37007 Salamanca, Spain
  - Centro de Investigación Biomédica en Red de Cáncer (CIBERONC), CSIC-University of Salamanca, 37007 Salamanca, Spain
3
Wang W, Zhang J, Wang Y, Xu Y, Zhang S. Non-coding ribonucleic acid-mediated CAMSAP1 upregulation leads to poor prognosis with suppressed immune infiltration in liver hepatocellular carcinoma. Front Genet 2022; 13:916847. [PMID: 36212130 PMCID: PMC9532701 DOI: 10.3389/fgene.2022.916847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Accepted: 08/11/2022] [Indexed: 11/13/2022]
Abstract
Liver hepatocellular carcinoma (LIHC) is well-known for its unfavorable prognosis due to the lack of reliable diagnostic and prognostic biomarkers. Calmodulin-regulated spectrin-associated protein 1 (CAMSAP1) is a non-centrosomal microtubule minus-end binding protein that regulates microtubule dynamics. This study aims to investigate the specific role and mechanisms of CAMSAP1 in LIHC. We performed systematic analyses of CAMSAP1 and demonstrated that differential expression of CAMSAP1 is associated with genetic alteration and DNA methylation, and serves as a potential diagnostic and prognostic biomarker in some cancers, especially LIHC. Further evidence suggested that CAMSAP1 overexpression leads to adverse clinical outcomes in advanced LIHC. Moreover, the AC145207.5/LINC01748-miR-101-3p axis is specifically responsible for CAMSAP1 overexpression in LIHC. In addition to the previously reported functions in the cell cycle and regulation of actin cytoskeleton, CAMSAP1-related genes are enriched in cancer- and immune-associated pathways. As expected, CAMSAP1-associated LIHC is infiltrated in the suppressed immune microenvironment. Specifically, besides immune cell infiltration, it is significantly positively correlated with immune checkpoint genes, especially CD274 (PD-L1), and cancer-associated fibroblasts. Prediction of immune checkpoint blockade therapy suggests that these patients may benefit from therapy. Our study is the first to demonstrate that besides genetic alteration and DNA methylation, AC145207.5/LINC01748-miR-101-3p-mediated CAMSAP1 upregulation in advanced LIHC leads to poor prognosis with suppressed immune infiltration, representing a potential diagnostic and prognostic biomarker as well as a promising immunotherapy target for LIHC.
4
Sheng Y, Jiang Y, Yang Y, Li X, Qiu J, Wu J, Cheng L, Han J. CNA2Subpathway: identification of dysregulated subpathway driven by copy number alterations in cancer. Brief Bioinform 2021; 22:6076935. [PMID: 33423051 DOI: 10.1093/bib/bbaa413] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/25/2020] [Revised: 11/25/2020] [Accepted: 12/15/2020] [Indexed: 12/14/2022]
Abstract
Biological pathways reflect the key cellular mechanisms that dictate disease states, drug response and altered cellular function. The local areas of pathways are defined as subpathways (SPs), whose dysfunction has been reported to be associated with the occurrence and development of cancer. With the development of high-throughput sequencing technology, identifying dysfunctional SPs by using multi-omics data has become possible. Moreover, the SPs are not isolated in the biological system but interact with each other. Here, we propose a network-based computational method, CNA2Subpathway, to identify dysfunctional SPs driven by somatic copy number alterations (CNAs) in cancer by integrating pathway topology information, multi-omics data and SP crosstalk. This provides a novel way of SP analysis that uses SP interactions at the systems biology level. Using data sets from breast cancer and head and neck cancer, we validate the effectiveness of CNA2Subpathway in identifying cancer-relevant SPs driven by the somatic CNAs, which are also shown to be associated with cancer immunity and patient prognosis. We further compare our results with five pathway or SP analysis methods based on CNA and gene expression data without considering SP crosstalk. With these analyses, we show that CNA2Subpathway could help to uncover dysfunctional SPs underlying cancer via the use of SP crosstalk. CNA2Subpathway is developed as an R-based tool, which is freely available on GitHub (https://github.com/hanjunwei-lab/CNA2Subpathway).
Affiliation(s)
- Yuqi Sheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Ying Jiang
  - College of Basic Medical Science, Heilongjiang University of Chinese Medicine, China
- Yang Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Xiangmei Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Jiayue Qiu
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Jiashuo Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
- Junwei Han
  - College of Bioinformatics Science and Technology, Harbin Medical University, China
5
Ruan P, Wang Y, Shen R, Wang S. Using association signal annotations to boost similarity network fusion. Bioinformatics 2020; 35:3718-3726. [PMID: 30863842 DOI: 10.1093/bioinformatics/btz124] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 10/20/2018] [Revised: 01/17/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023]
Abstract
MOTIVATION: Recent technology developments have made it possible to generate various kinds of omics data, which provides opportunities to better solve problems such as disease subtyping or disease mapping using more comprehensive omics data jointly. Among many developed data-integration methods, the similarity network fusion (SNF) method has shown a great potential to identify new disease subtypes through separating similar subjects using multi-omics data. SNF effectively fuses similarity networks with pairwise patient similarity measures from different types of omics data into one fused network using both shared and complementary information across multiple types of omics data.
RESULTS: In this article, we proposed an association-signal-annotation boosted similarity network fusion (ab-SNF) method, adding feature-level association signal annotations as weights aiming to up-weight signal features and down-weight noise features when constructing subject similarity networks to boost the performance in disease subtyping. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in the subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that the proposed ab-SNF method consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful.
AVAILABILITY AND IMPLEMENTATION: The R package abSNF is freely available for downloading from https://github.com/pfruan/abSNF.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
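The feature weighting described in this abstract can be illustrated with a minimal Python sketch. It is not the abSNF package's code: the function names `weighted_sq_dist` and `affinity`, the normalisation of the weights to sum to one, and the single global Gaussian bandwidth are all assumptions made for illustration (SNF itself uses a locally scaled kernel and then fuses several such networks, which is omitted here).

```python
import numpy as np

def weighted_sq_dist(X, w):
    """Pairwise weighted squared Euclidean distances between the rows of X.

    X: (subjects x features) array; w: non-negative feature weights,
    e.g. derived from association signals (hypothetical input).
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # assumption: weights normalised to sum to 1
    Xw = np.asarray(X, dtype=float) * np.sqrt(w)   # column scaling gives sum_k w_k (x_ik - x_jk)^2
    sq = (Xw ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (Xw @ Xw.T)
    return np.maximum(D, 0.0)            # clip tiny negative round-off

def affinity(D, sigma=0.5):
    """Gaussian affinity with one global bandwidth (a simplification of SNF's kernel)."""
    scale = sigma * D.mean() + 1e-12     # small constant avoids division by zero
    return np.exp(-D / scale)

# Three subjects, two features; the first feature carries most of the weight.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
D = weighted_sq_dist(X, w=[3.0, 1.0])
W = affinity(D)
```

In this toy input, subjects that differ only in the down-weighted second feature end up closer (and therefore more similar) than subjects that differ in the up-weighted first feature, which is the intended effect of the weights.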
Affiliation(s)
- Peifeng Ruan
  - Department of Statistics, Columbian College of Arts and Sciences, The George Washington University, Washington, DC, USA
- Ya Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ronglai Shen
  - Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Shuang Wang
  - Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
6
Sathyanarayanan A, Gupta R, Thompson EW, Nyholt DR, Bauer DC, Nagaraj SH. A comparative study of multi-omics integration tools for cancer driver gene identification and tumour subtyping. Brief Bioinform 2019; 21:1920-1936. [PMID: 31774481 PMCID: PMC7711266 DOI: 10.1093/bib/bbz121] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 03/06/2019] [Revised: 09/09/2019] [Accepted: 09/13/2019] [Indexed: 12/11/2022]
Abstract
Oncogenesis and cancer can arise as a consequence of a wide range of genomic aberrations including mutations, copy number alterations, expression changes and epigenetic modifications encompassing multiple omics layers. Integrating genomic, transcriptomic, proteomic and epigenomic datasets via multi-omics analysis provides the opportunity to derive a deeper and holistic understanding of the development and progression of cancer. There are two primary approaches to integrating multi-omics data: multi-staged (focused on identifying genes driving cancer) and meta-dimensional (focused on establishing clinically relevant tumour or sample classifications). A number of ready-to-use bioinformatics tools are available to perform both multi-staged and meta-dimensional integration of multi-omics data. In this study, we compared nine different integration tools using real and simulated cancer datasets. The performance of the multi-staged integration tools was assessed at the gene, function and pathway levels, while meta-dimensional integration tools were assessed based on the sample classification performance. Additionally, we discuss the influence of factors such as data representation, sample size, signal and noise on multi-omics data integration. Our results provide current and much needed guidance regarding selection and use of the most appropriate and best performing multi-omics integration tools.
Affiliation(s)
- Anita Sathyanarayanan
  - School of Biomedical Sciences, Faculty of Health, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia
- Rohit Gupta
  - Department of Biotechnology, Indian Institute of Technology Madras, Chennai, India; Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
- Erik W Thompson
  - School of Biomedical Sciences, Faculty of Health, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia; Translational Research Institute, Brisbane, Australia
- Dale R Nyholt
  - School of Biomedical Sciences, Faculty of Health, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia
- Shivashankar H Nagaraj
  - School of Biomedical Sciences, Faculty of Health, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, Australia; Translational Research Institute, Brisbane, Australia
7
Vasudevan P, Murugesan T. Cancer Subtype Discovery Using Prognosis-Enhanced Neural Network Classifier in Multigenomic Data. Technol Cancer Res Treat 2018; 17:1533033818790509. [PMID: 30092720 PMCID: PMC6088521 DOI: 10.1177/1533033818790509] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/03/2023]
Abstract
Objective: The main objective in studying large-scale cancer omics is to identify molecular mechanisms of cancer and discover novel biomedical targets. This work not only discovers the cancer subtypes in genome scale data by using clustering and classification but also measures their accuracy. Methods: Initially, candidate cancer subtypes are recognized by max-flow/min-cut graph clustering. Finally, prognosis-enhanced neural network classifier is proposed for classification. We analyzed the heterogeneity and identified the subtypes of glioblastoma multiforme, an aggressive adult brain tumor, from 215 samples with microRNA expression (12 042 genes). The samples were classified into 4 different classes such as mesenchymal, classical, proneural, and neural subtypes owing to mutations and gene expression. The results are measured using the metrics such as silhouette width, biological stability index, clustering accuracy, precision, recall, and f-measure. Results: Max-flow/min-cut clustering produces higher clustering accuracy of 88.93% for 215 samples. The proposed prognosis-enhanced neural network classifier algorithm produces higher accuracy results of 89.2% for 215 samples efficiently. Conclusion: From the experimental results, the proposed prognosis-enhanced neural network classifier is seen as an alternative, which is full of promise for cancer subtype prediction in genome scale data.
Affiliation(s)
- Thangamani Murugesan
  - Department of Computer Science and Engineering, Kongu Engineering College, Perundurai, Tamilnadu, India
8
Soh J, Cho H, Choi CH, Lee H. Identification and Characterization of MicroRNAs Associated with Somatic Copy Number Alterations in Cancer. Cancers (Basel) 2018; 10:cancers10120475. [PMID: 30501131 PMCID: PMC6315597 DOI: 10.3390/cancers10120475] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 09/29/2018] [Revised: 11/10/2018] [Accepted: 11/24/2018] [Indexed: 12/30/2022]
Abstract
MicroRNAs (miRNAs) are key molecules that regulate biological processes such as cell proliferation, differentiation, and apoptosis in cancer. Somatic copy number alterations (SCNAs) are common genetic mutations that play essential roles in cancer development. Here, we investigated the association between miRNAs and SCNAs in cancer. We collected 2538 tumor samples for seven cancer types from The Cancer Genome Atlas. We found that 32–84% of miRNAs are in SCNA regions, with the rate depending on the cancer type. In these regions, we identified 80 SCNA-miRNAs whose expression was mainly associated with SCNAs in at least one cancer type and showed that these SCNA-miRNAs are related to cancer by survival analysis and literature searching. We also identified 58 SCNA-miRNAs common in the seven cancer types (CC-SCNA-miRNAs) and showed that these CC-SCNA-miRNAs are more likely to be related to protein and gene expression than other miRNAs. Furthermore, we experimentally validated the oncogenic role of miR-589. In conclusion, our results suggest that SCNA-miRNAs significantly alter biological processes related to cancer development, confirming the importance of SCNAs in non-coding regions in cancer.
Affiliation(s)
- Jihee Soh
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
- Hyejin Cho
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
- Chan-Hun Choi
  - College of Korean Medicine, Dongshin University, Naju-si, Jeollanam-do 58245, Korea.
- Hyunju Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
9
Hattinger CM, Patrizio MP, Tavanti E, Luppi S, Magagnoli F, Picci P, Serra M. Genetic testing for high-grade osteosarcoma: a guide for future tailored treatments? Expert Rev Mol Diagn 2018; 18:947-961. [PMID: 30324828 DOI: 10.1080/14737159.2018.1535903] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/22/2022]
Abstract
Introduction: Genetic characterization of osteosarcoma has evolved during the last decade, thanks to the integrated application of conventional and new candidate-driven and genome-wide technologies. Areas covered: This review provides an overview of the state of the art in genetic testing applied to osteosarcoma, with particular regard to novel candidate genetic biomarkers that can be analyzed in tumor tissue and blood samples, which might be used to predict toxicity and prognosis, detect disease relapse, and improve patient selection criteria for tailoring treatment. Expert commentary: Genetic testing based on modern technologies is expected to indicate new osteosarcoma-related prognostic markers and driver genes, which may highlight novel therapeutic targets and patient stratification biomarkers. The definition of tailored or targeted treatment approaches may improve the outcome of patients with localized tumors and, even more, of those with metastatic disease, for whom progress in cure probability is highly warranted.
Affiliation(s)
- Maria Pia Patrizio
  - Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Elisa Tavanti
  - Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Silvia Luppi
  - Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Federica Magagnoli
  - Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Piero Picci
  - Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
- Massimo Serra
  - Laboratory of Experimental Oncology, IRCCS Istituto Ortopedico Rizzoli, Bologna, Italy
10
A part of patients with autism spectrum disorder has haploidy of HPC-1/syntaxin1A gene that possibly causes behavioral disturbance as in experimentally gene ablated mice. Neurosci Lett 2017; 644:5-9. [DOI: 10.1016/j.neulet.2017.02.052] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/24/2016] [Revised: 01/25/2017] [Accepted: 02/20/2017] [Indexed: 01/02/2023]
11
Arneson D, Shu L, Tsai B, Barrere-Cain R, Sun C, Yang X. Multidimensional Integrative Genomics Approaches to Dissecting Cardiovascular Disease. Front Cardiovasc Med 2017; 4:8. [PMID: 28289683 PMCID: PMC5327355 DOI: 10.3389/fcvm.2017.00008] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 12/19/2016] [Accepted: 02/09/2017] [Indexed: 12/19/2022]
Abstract
Elucidating the mechanisms of complex diseases such as cardiovascular disease (CVD) remains a significant challenge due to multidimensional alterations at molecular, cellular, tissue, and organ levels. To better understand CVD and offer insights into the underlying mechanisms and potential therapeutic strategies, data from multiple omics types (genomics, epigenomics, transcriptomics, metabolomics, proteomics, microbiomics) from both humans and model organisms have become available. However, individual omics data types capture only a fraction of the molecular mechanisms. To address this challenge, there have been numerous efforts to develop integrative genomics methods that can leverage multidimensional information from diverse data types to derive comprehensive molecular insights. In this review, we summarize recent methodological advances in multidimensional omics integration, exemplify their applications in cardiovascular research, and pinpoint challenges and future directions in this incipient field.
Affiliation(s)
- Douglas Arneson
  - Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Le Shu
  - Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Brandon Tsai
  - Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Rio Barrere-Cain
  - Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Christine Sun
  - Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA
- Xia Yang
  - Department of Integrative Biology and Physiology, University of California Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Molecular, Cellular, and Integrative Physiology Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA; Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA, USA
12
Chaturvedi N, de Menezes RX, Goeman JJ. A global × global test for testing associations between two large sets of variables. Biom J 2016; 59:145-158. [PMID: 27225065 DOI: 10.1002/bimj.201500106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/05/2015] [Revised: 01/06/2016] [Accepted: 03/07/2016] [Indexed: 12/30/2022]
Abstract
In high-dimensional omics studies where multiple molecular profiles are obtained for each set of patients, there is often interest in identifying complex multivariate associations, for example, copy number regulated expression levels in a certain pathway or in a genomic region. To detect such associations, we present a novel approach to test for association between two sets of variables. Our approach generalizes the global test, which tests for association between a group of covariates and a single univariate response, to allow high-dimensional multivariate response. We apply the method to several simulated datasets as well as two publicly available datasets, where we compare the performance of multivariate global test (G2) with univariate global test. The method is implemented in R and will be available as a part of the globaltest package in R.
Affiliation(s)
- Nimisha Chaturvedi
  - Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands; Netherlands Bioinformatics Center, Nijmegen, The Netherlands
- Renée X de Menezes
  - Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands; Netherlands Bioinformatics Center, Nijmegen, The Netherlands
- Jelle J Goeman
  - Biostatistics, Department for Health Evidence, Radboud University Medical Center, Nijmegen, The Netherlands; Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
13
Meng C, Helm D, Frejno M, Kuster B. moCluster: Identifying Joint Patterns Across Multiple Omics Data Sets. J Proteome Res 2015; 15:755-65. [PMID: 26653205 DOI: 10.1021/acs.jproteome.5b00824] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Indexed: 12/14/2022]
Abstract
Increasingly, multiple omics approaches are being applied to understand the complexity of biological systems. Yet, computational approaches that enable the efficient integration of such data are not well developed. Here, we describe a novel algorithm, termed moCluster, which discovers joint patterns among multiple omics data. The method first employs a multiblock multivariate analysis to define a set of latent variables representing joint patterns across input data sets, which is further passed to an ordinary clustering algorithm in order to discover joint clusters. Using simulated data, we show that moCluster's performance is not compromised by issues present in iCluster/iCluster+ (notably, the nondeterministic solution) and that it operates 100× to 1000× faster than iCluster/iCluster+. We used moCluster to cluster proteomic and transcriptomic data from the NCI-60 cell line panel. The resulting cluster model revealed different phenotypes across cellular subtypes, such as doubling time and drug response. Applying moCluster to methylation, mRNA, and protein data from a large study on colorectal cancer patients identified four molecular subtypes, including one characterized by microsatellite instability and high expression of genes/proteins involved in immunity, such as PDL1, a target of multiple drugs currently in development. The other three subtypes have not been discovered before using single data sets, which clearly illustrates the molecular complexity of oncogenesis and the need for holistic, multidata analysis strategies.
Affiliation(s)
- Martin Frejno
- Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Bernhard Kuster
- Center for Integrated Protein Science Munich (CIPSM), Emil-Erlenmeyer-Forum 5, Freising 85354, Germany
14
Dassi E, Greco V, Sidarovich V, Zuccotti P, Arseni N, Scaruffi P, Tonini GP, Quattrone A. Translational compensation of genomic instability in neuroblastoma. Sci Rep 2015; 5:14364. [PMID: 26399178 PMCID: PMC4585852 DOI: 10.1038/srep14364] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 03/25/2015] [Accepted: 08/25/2015] [Indexed: 11/23/2022]
Abstract
Cancer-associated gene expression imbalances are conventionally studied at the genomic, epigenomic and transcriptomic levels. Given the relevance of translational control in determining cell phenotypes, we evaluated the translatome, i.e., the transcriptome engaged in translation, as a descriptor of the effects of genetic instability in cancer. We performed this evaluation in high-risk neuroblastomas, which are characterized by a low frequency of point mutations or known cancer-driving genes and by the presence of several segmental chromosomal aberrations that produce gene-copy imbalances that guide aggressiveness. We thus integrated genome, transcriptome, translatome and miRome profiles in a representative panel of high-risk neuroblastoma cell lines. We identified a number of genes whose genomic imbalance was corrected by compensatory adaptations in translational efficiency. The transcriptomic level of these genes was predictive of poor prognosis in more than half of cases, and the genomic imbalances found in their loci were shared by 27 other tumor types. This homeostatic process is also not limited to copy number-altered genes, as we showed the translational stoichiometric rebalance of histone genes. We suggest that the translational buffering of fluctuations in these dose-sensitive transcripts is a potential driving process of neuroblastoma evolution.
Affiliation(s)
- Erik Dassi
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Italy
- Valentina Greco
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Italy
- Viktoryia Sidarovich
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Italy
- Paola Zuccotti
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Italy
- Natalia Arseni
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Italy
- Paola Scaruffi
- Center of Physiopathology of Human Reproduction, Unit of Obstetrics and Gynecology, IRCSS A.O.U. San Martino IST, Genova, Italy
- Gian Paolo Tonini
- Neuroblastoma Laboratory, Pediatric Research Institute, Fondazione Città della Speranza, Padova, Italy
- Alessandro Quattrone
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Italy
15
Cava C, Bertoli G, Castiglioni I. Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC SYSTEMS BIOLOGY 2015; 9:62. [PMID: 26391647 PMCID: PMC4578257 DOI: 10.1186/s12918-015-0211-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/14/2015] [Accepted: 09/15/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND: Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics. RESULTS: These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses. CONCLUSION: This review presents the state of the art regarding the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and shares some issues related to the new opportunities and challenges offered by the application of such integrative approaches.
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Gloria Bertoli
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Isabella Castiglioni
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
16
Protopopova MV, Pavlichenko VV, Menzel R, Putschew A, Luckenbach T, Steinberg CEW. Contrasting cellular stress responses of Baikalian and Palearctic amphipods upon exposure to humic substances: environmental implications. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2014; 21:14124-14137. [PMID: 25053285 DOI: 10.1007/s11356-014-3323-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/02/2014] [Accepted: 07/09/2014] [Indexed: 06/03/2023]
Abstract
The species-rich, endemic amphipod fauna of Lake Baikal does not overlap with the common Palearctic fauna; however, the underlying mechanisms for this are poorly understood. Considering that Palearctic lakes have a higher relative input of natural organic compounds with a dominance of humic substances (HSs) than Lake Baikal, we addressed the question whether HSs are candidate factors that affect the different species compositions in these water bodies. We hypothesized that interspecies differences in stress defense might reveal that Baikalian amphipods are inferior to Palearctic amphipods in dealing with HS-mediated stress. In this study, two key mechanisms of general stress response were examined: heat-shock protein 70 (HSP70) and multixenobiotic resistance-associated transporters (ABCB1). The results of quantitative polymerase chain reaction (qPCR) showed that the basal levels (in 3-day acclimated animals) of hsp70 and abcb1 transcripts were lower in Baikalian species (Eulimnogammarus cyaneus, Eulimnogammarus verrucosus, Eulimnogammarus vittatus; the most typical littoral species) than in the Palearctic amphipod (Gammarus lacustris; the only Palearctic species distributed in the Baikalian region). In the amphipods, the stress response was induced using HSs at 10 mg L⁻¹ dissolved organic carbon, which was higher than in sampling sites of the studied species, but well within the range (3-10 mg L⁻¹) in the surrounding water bodies populated by G. lacustris. The results of qPCR and western blotting (n = 5) showed that HS exposure led to increased hsp70/abcb1 transcripts and HSP70 protein levels in G. lacustris, whereas these transcript levels remained constant or decreased in the Baikalian species. The decreased level of stress transcripts is probably not able to confer an effective tolerance to Baikalian species against further environmental stressors in conditions with elevated HS levels.
Thus, our results suggest a greater robustness of Palearctic amphipods and a higher sensitivity of Baikalian amphipods to HS challenge, which might prevent most endemic species from migrating to habitats outside Lake Baikal.
Affiliation(s)
- Marina V Protopopova
- Siberian Institute of Plant Physiology and Biochemistry, Siberian Branch Russian Academy of Sciences, Lermontov str., 132, Irkutsk, Russia, 664033
17
Hong S, Huang Y, Cao Y, Chen X, Han JDJ. Approaches to uncovering cancer diagnostic and prognostic molecular signatures. Mol Cell Oncol 2014; 1:e957981. [PMID: 27308330 PMCID: PMC4905187 DOI: 10.4161/23723548.2014.957981] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/06/2014] [Revised: 07/21/2014] [Accepted: 07/22/2014] [Indexed: 12/14/2022]
Abstract
The recent rapid development of high-throughput technology enables the study of molecular signatures for cancer diagnosis and prognosis at multiple levels, from genomic and epigenomic to transcriptomic. These unbiased large-scale scans provide important insights into the detection of cancer-related signatures. In addition to single-layer signatures, such as gene expression and somatic mutations, integrating data from multiple heterogeneous platforms using a systematic approach has been proven to be particularly effective for the identification of classification markers. This approach not only helps to uncover essential driver genes and pathways in the cancer network that are responsible for the mechanisms of cancer development, but will also lead us closer to the ultimate goal of personalized cancer therapy.
Affiliation(s)
- Shengjun Hong
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
- Yi Huang
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
- Yaqiang Cao
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
- Xingwei Chen
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
- Jing-Dong J Han
- Chinese Academy of Sciences Key Laboratory of Computational Biology; Chinese Academy of Sciences-Max Planck Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences; Shanghai, China
18
Newton R, Wernisch L. A meta-analysis of multiple matched copy number and transcriptomics data sets for inferring gene regulatory relationships. PLoS One 2014; 9:e105522. [PMID: 25148247 PMCID: PMC4141782 DOI: 10.1371/journal.pone.0105522] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/25/2014] [Accepted: 07/21/2014] [Indexed: 12/25/2022]
Abstract
Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments.
Affiliation(s)
- Richard Newton
- Biostatistics Unit, Medical Research Council, Cambridge, United Kingdom
- Lorenz Wernisch
- Biostatistics Unit, Medical Research Council, Cambridge, United Kingdom
19
Chaturvedi N, Goeman JJ, Boer JM, van Wieringen WN, de Menezes RX. A test for comparing two groups of samples when analyzing multiple omics profiles. BMC Bioinformatics 2014; 15:236. [PMID: 25004928 PMCID: PMC4227098 DOI: 10.1186/1471-2105-15-236] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/28/2014] [Accepted: 06/28/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: A number of statistical models have been proposed for studying the association between gene expression and copy number data in integrated analysis. The next step is to compare association patterns between different groups of samples. RESULTS: We propose a method, named dSIM, to find differences in association between copy number and gene expression when comparing two groups of samples. First, we use ridge regression to correct for the baseline associations between copy number and gene expression. Second, the global test is applied to the corrected data in order to find differences in association patterns between two groups of samples. We show in a simulation study that dSIM detects differences even in small genomic regions. We also apply dSIM to two publicly available breast cancer datasets and identify chromosome arms where copy number-led gene expression regulation differs between positive and negative estrogen receptor samples. In spite of differing genomic coverage, some selected arms are identified in both datasets. CONCLUSION: We developed a flexible and robust method for studying association differences between two groups of samples while integrating genomic data from different platforms. dSIM can be used with most types of microarray/sequencing data, including methylation and microRNA expression. The method is implemented in R and will be made part of the BioConductor package SIM.
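The first step named in this abstract, a ridge regression that corrects expression for its baseline association with copy number, can be sketched as follows. This is an illustration only and not the SIM package code: the function name, the fixed penalty values, and the toy data are assumptions, and the second step (the global test comparing the two sample groups) is not shown.

```python
import numpy as np


def ridge_residuals(copy_number, expression, penalty=1.0):
    """Remove the ridge-estimated linear effect of copy number from expression.

    copy_number: (samples x regions) array; expression: (samples x genes) array.
    """
    x = copy_number - copy_number.mean(axis=0)
    y = expression - expression.mean(axis=0)
    # Closed-form ridge solution: (X'X + penalty * I)^-1 X'Y
    gram = x.T @ x + penalty * np.eye(x.shape[1])
    coefficients = np.linalg.solve(gram, x.T @ y)
    return y - x @ coefficients


# Toy data in which expression is an exact linear function of copy number.
rng = np.random.default_rng(1)
copy_number = rng.normal(size=(30, 3))
effects = np.array([[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]])
expression = copy_number @ effects

nearly_unpenalised = ridge_residuals(copy_number, expression, penalty=1e-8)
heavily_penalised = ridge_residuals(copy_number, expression, penalty=1e6)
```

With a negligible penalty the residuals are close to zero because the linear association is removed almost entirely; with a very large penalty the coefficients shrink towards zero and the residuals stay close to the centred expression values. In practice the penalty would be tuned, for example by cross-validation, rather than fixed as it is here.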
Affiliation(s)
- Nimisha Chaturvedi
- Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.
20
Bryk J, Tautz D. Copy number variants and selective sweeps in natural populations of the house mouse (Mus musculus domesticus). Front Genet 2014; 5:153. [PMID: 24917877 PMCID: PMC4042557 DOI: 10.3389/fgene.2014.00153] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 12/02/2013] [Accepted: 05/09/2014] [Indexed: 12/28/2022]
Abstract
Copy-number variants (CNVs) may play an important role in early adaptations, potentially facilitating rapid divergence of populations. We describe an approach to study this question by investigating CNVs present in natural populations of mice in the early stages of divergence and their involvement in selective sweeps. We have analyzed individuals from two recently diverged natural populations of the house mouse (Mus musculus domesticus) from Germany and France using custom, high-density, comparative genome hybridization arrays (CGH) that covered almost 164 Mb and 2444 genes. One thousand eight hundred and sixty one of those genes we previously identified as differentially expressed between these populations, while the expression of the remaining genes was invariant. In total, we identified 1868 CNVs across all 10 samples, 200 bp to 600 kb in size and affecting 424 genic regions. Roughly two thirds of all CNVs found were deletions. We found no enrichment of CNVs among the differentially expressed genes between the populations compared to the invariant ones, nor any meaningful correlation between CNVs and gene expression changes. Among the CNV genes, we found cellular component gene ontology categories of the synapse overrepresented among all the 2444 genes tested. To investigate potential adaptive significance of the CNV regions, we selected six that showed large differences in frequency of CNVs between the two populations and analyzed variation in at least two microsatellites surrounding the loci in a sample of 46 unrelated animals from the same populations collected in field trappings. We identified two loci with large differences in microsatellite heterozygosity (Sfi1 and Glo1/Dnahc8 regions) and one locus with low variation across the populations (Cmah), thus suggesting that these genomic regions might have recently undergone selective sweeps. 
Interestingly, the Glo1 CNV has previously been implicated in anxiety-like behavior in mice, suggesting a differential evolution of a behavioral trait.
Affiliation(s)
- Jarosław Bryk
- Max Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
- Max Planck Institute for Evolutionary Biology, Plön, Germany
21
Integrative genomics with mediation analysis in a survival context. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:413783. [PMID: 24454535 PMCID: PMC3878392 DOI: 10.1155/2013/413783] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/28/2013] [Accepted: 09/23/2013] [Indexed: 12/25/2022]
Abstract
DNA copy number aberrations (DCNA) and subsequent altered gene expression profiles may have a major impact on tumor initiation, on development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose the adoption of a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implication on patient clinical outcome with the prerequisite that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting by using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide implementation in R.
22
Azad AKM, Lee H. Voting-based cancer module identification by combining topological and data-driven properties. PLoS One 2013; 8:e70498. [PMID: 23940583 PMCID: PMC3734239 DOI: 10.1371/journal.pone.0070498] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/09/2012] [Accepted: 06/19/2013] [Indexed: 12/19/2022]
Abstract
Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations, e.g. GE-GE, CNA-GE, and CNA-CNA, over multiple patients. Next, we propose a voting-based cancer module identification algorithm by combining topological and data-driven properties (VToD algorithm) by using the gene-gene relationship network as a source of data-driven information, and the PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Among 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Similarly, we also observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways from both cancer data sets found in our algorithm contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment p-values (0.64 for GBM and 0.49 for OVC).
This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.
Affiliation(s)
- A. K. M. Azad
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
- Hyunju Lee
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
23
Goh XY, Newton R, Wernisch L, Fitzgerald R. Testing the utility of an integrated analysis of copy number and transcriptomics datasets for inferring gene regulatory relationships. PLoS One 2013; 8:e63780. [PMID: 23737949 PMCID: PMC3667814 DOI: 10.1371/journal.pone.0063780] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/29/2012] [Accepted: 04/07/2013] [Indexed: 12/31/2022]
Abstract
Correlation patterns between matched copy number variation and gene expression data in cancer samples enable the inference of causal gene regulatory relationships by exploiting the natural randomization of such systems. The aim of this study was to test and verify experimentally the accuracy of a causal inference approach based on genomic randomization using esophageal cancer samples. Two candidates with strong regulatory effects emerging from our analysis are components of growth factor receptors, and implicated in cancer development, namely ERBB2 and FGFR2. We tested experimentally two ERBB2 and three FGFR2 regulated interactions predicted by the statistical analysis, all of which were confirmed. We also applied the method in a meta-analysis of 10 cancer datasets and tested 15 of the predicted regulatory interactions experimentally. Three additional predicted ERBB2 regulated interactions were confirmed, as well as interactions regulated by ARPC1A and FANCG. Overall, two thirds of experimentally tested predictions were confirmed.
Affiliation(s)
- Xin Yi Goh
- Medical Research Council Cancer Cell Unit, Hutchison-MRC Research Centre, Cambridge, United Kingdom
- Department of Oncology, University of Cambridge, Cambridge, United Kingdom
- Richard Newton
- Medical Research Council Biostatistics Unit, Cambridge, United Kingdom
- Lorenz Wernisch
- Medical Research Council Biostatistics Unit, Cambridge, United Kingdom
- Rebecca Fitzgerald
- Medical Research Council Cancer Cell Unit, Hutchison-MRC Research Centre, Cambridge, United Kingdom
24
Leday GGR, van der Vaart AW, van Wieringen WN, van de Wiel MA. Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines. Ann Appl Stat 2013. [DOI: 10.1214/12-aoas605] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
25
Lambrou GI, Adamaki M, Delakas D, Spandidos DA, Vlahopoulos S, Zaravinos A. Gene expression is highly correlated on the chromosome level in urinary bladder cancer. Cell Cycle 2013; 12:1544-1559. [PMID: 23624844 PMCID: PMC3680534 DOI: 10.4161/cc.24673] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/27/2012] [Revised: 03/19/2013] [Accepted: 04/11/2013] [Indexed: 11/19/2022]
Abstract
OBJECTIVE: Chromosome correlation maps display correlations between gene expression patterns on the same chromosome. Our goal was to map the genes on chromosome regions and to identify correlations through their location on chromosome regions. MATERIALS AND METHODS: Following microarray analysis we used Ingenuity Pathway Analysis (IPA) to construct gene networks of the co-deregulated genes in bladder cancer. Chromosome mapping, mathematical modeling and data simulations were performed using the WebGestalt and MATLAB® software. RESULTS: The top deregulated molecules among 129 bladder cancer samples were implicated in the PI3K/AKT signaling, cell cycle, Myc-mediated apoptosis signaling and ERK5 signaling pathways. Their most prominent molecular and cellular functions were related to cell cycle, cell death, gene expression, molecular transport and cellular growth and proliferation. Chromosome correlation maps allowed us to detect significantly co-expressed genes along the chromosomes. We identified strong correlations among tumors of Tα-grade 1, as well as for those of Tα-grade 2, in chromosomes 1, 2, 3, 7, 12 and 19. Chromosomal domains of gene co-expression were revealed for the normal tissues, as well. The expression data were further simulated, exhibiting an excellent fit (0.7 < R² < 0.9). The simulations revealed that, across the different samples, genes on the same chromosomes are expressed in a similar manner. CONCLUSIONS: Gene expression is highly correlated on the chromosome level. Chromosome correlation maps of gene expression signatures can provide further information on gene regulatory mechanisms. Gene expression data can be simulated using polynomial functions.
Affiliation(s)
- George I. Lambrou
- First Department of Pediatrics; University of Athens; Choremeio Research Laboratory; Athens, Greece
- Maria Adamaki
- First Department of Pediatrics; University of Athens; Choremeio Research Laboratory; Athens, Greece
- Dimitris Delakas
- Department of Urology; Asklipieio General Hospital; Athens, Greece
- Spyros Vlahopoulos
- First Department of Pediatrics; University of Athens; Choremeio Research Laboratory; Athens, Greece
- Apostolos Zaravinos
- Laboratory of Clinical Virology; Medical School; University of Crete; Crete, Greece
26
Kuijjer ML, Hogendoorn PCW, Cleton-Jansen AM. Genome-wide analyses on high-grade osteosarcoma: making sense of a genomically most unstable tumor. Int J Cancer 2013; 133:2512-21. [PMID: 23436697 DOI: 10.1002/ijc.28124] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 11/09/2012] [Accepted: 02/13/2013] [Indexed: 12/16/2022]
Abstract
High-grade osteosarcoma is an extremely genomically unstable tumor. This, together with other challenges, such as the heterogeneity within and between tumor samples, and the rarity of the disease, renders it difficult to study this tumor on a genome-wide level. Now that most laboratories are changing from genome-wide microarray experiments to Next-Generation Sequencing, it is important to discuss the lessons we have learned from microarray studies. In this review, we discuss the challenges of high-grade osteosarcoma data analysis. We give an overview of microarray studies that have been conducted so far on both osteosarcoma tissue samples and cell lines. We discuss recent findings from integration of different data types, which is particularly relevant in a tumor with such a complex genomic profile. Finally, we elaborate on the translation of results obtained with bioinformatics into functional studies, which has led to valuable findings, especially when keeping in mind that no new therapies with a significant impact on survival have been developed in the past decades.
Affiliation(s)
- Marieke L Kuijjer
- Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
27
A predictive framework for integrating disparate genomic data types using sample-specific gene set enrichment analysis and multi-task learning. PLoS One 2012; 7:e44635. [PMID: 23028573 PMCID: PMC3441565 DOI: 10.1371/journal.pone.0044635] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/27/2012] [Accepted: 08/06/2012] [Indexed: 11/19/2022]
Abstract
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.
28
Standfuss C, Pospisil H, Klein A. SNP microarray analyses reveal copy number alterations and progressive genome reorganization during tumor development in SVT/t driven mice breast cancer. BMC Cancer 2012; 12:380. [PMID: 22935085 PMCID: PMC3534550 DOI: 10.1186/1471-2407-12-380] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/28/2012] [Accepted: 08/08/2012] [Indexed: 11/21/2022]
Abstract
Background Tumor development is known to be a stepwise process involving dynamic changes that affect cellular integrity and cellular behavior. This complex interaction between genomic organization and gene as well as protein expression is not yet fully understood. Tumor characterization by gene expression analyses is not sufficient, since expression levels are only available as a snapshot of the cell status. So far, research has mainly focused on gene expression profiling or alterations in oncogenes, even though DNA microarray platforms would allow for high-throughput analyses of copy number alterations (CNAs). Methods We analyzed DNA from mouse mammary gland epithelial cells using the Affymetrix Mouse Diversity Genotyping array (MOUSEDIVm520650) and calculated the CNAs. Segmental copy number alterations were computed based on the probeset CNAs using the circular binary segmentation algorithm. Motif search was performed in breakpoint regions (inter-segment regions) with the MEME suite to identify common motif sequences. Results Here we present a four stage mouse model addressing copy number alterations in tumorigenesis. No considerable changes in CNA were identified for non-transgenic mice, but a stepwise increase in CNA was found during tumor development. The segmental copy number alteration revealed informative chromosomal fragmentation patterns. In inter-segment regions (hypothetical breakpoint sites) unique motifs were found. Conclusions Our analyses suggest genome reorganization as a stepwise process that involves amplifications and deletions of chromosomal regions. We conclude from distinctive fragmentation patterns that conserved as well as individual breakpoints exist which promote tumorigenesis.
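The segmentation step named in the Methods can be illustrated with a much simpler relative. The sketch below is plain recursive binary segmentation on a list of log-ratios, not the circular variant with permutation testing that the abstract refers to; the function names, the size-weighted score, and the fixed `threshold` are all assumptions made for illustration.

```python
from statistics import mean

def best_split(values):
    """Return (index, score) of the split that maximizes the size-weighted
    difference in means between values[:index] and values[index:]."""
    n = len(values)
    best_index, best_score = None, 0.0
    for i in range(1, n):
        left, right = values[:i], values[i:]
        # Weight by segment sizes so very short end segments are not favoured.
        score = abs(mean(left) - mean(right)) * (i * (n - i) / n) ** 0.5
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score

def segment(values, threshold, offset=0):
    """Recursively split while the best split scores at least `threshold`;
    return breakpoint positions in ascending order."""
    if len(values) < 2:
        return []
    index, score = best_split(values)
    if index is None or score < threshold:
        return []
    return (segment(values[:index], threshold, offset)
            + [offset + index]
            + segment(values[index:], threshold, offset + index))

# Hypothetical probe-level log-ratios: neutral, gained, neutral.
log_ratios = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0, 0.0, 0.1]
breakpoints = segment(log_ratios, threshold=0.8)
print(breakpoints)
```

The positions returned mark where one segment ends and the next begins; in the setting described above, the regions around such positions would be the inter-segment regions searched for motifs.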
Affiliation(s)
- Christoph Standfuss
- Bioinformatics, Technical University of Applied Sciences Wildau, 15745 Wildau, Bahnhofstrasse, Germany.
|
29
|
Yuan Y, Curtis C, Caldas C, Markowetz F. A sparse regulatory network of copy-number driven gene expression reveals putative breast cancer oncogenes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:947-954. [PMID: 21788678 DOI: 10.1109/tcbb.2011.105] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Copy number aberrations are recognized to be important in cancer as they may localize to regions harboring oncogenes or tumor suppressors. Such genomic alterations mediate phenotypic changes through their impact on expression. Both cis- and trans-acting alterations are important since they may help to elucidate putative cancer genes. However, amidst numerous passenger genes, trans-effects are less well studied due to the computational difficulty in detecting weak and sparse signals in the data, and yet may influence multiple genes on a global scale. We propose an integrative approach to learn a sparse interaction network of DNA copy-number regions with their downstream transcriptional targets in breast cancer. With respect to goodness of fit on both simulated and real data, the performance of sparse network inference is no worse than other state-of-the-art models but with the advantage of simultaneous feature selection and efficiency. The DNA-RNA interaction network helps to distinguish copy-number driven expression alterations from those that are copy-number independent. Further, our approach yields a quantitative copy-number dependency score, which distinguishes cis- versus trans-effects. When applied to a breast cancer data set, numerous expression profiles were impacted by cis-acting copy-number alterations, including several known oncogenes such as GRB7, ERBB2, and LSM1. Several trans-acting alterations were also identified, impacting genes such as ADAM2 and BAGE, which warrant further investigation. AVAILABILITY An R package named lol is available from www.markowetzlab.org/software/lol.html.
Affiliation(s)
- Yinyin Yuan
- Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, United Kingdom.
|
30
|
Park C, Ahn J, Yoon Y, Park S. Identification of functional CNV region networks using a CNV-gene mapping algorithm in a genome-wide scale. Bioinformatics 2012; 28:2045-51. [PMID: 22652832 DOI: 10.1093/bioinformatics/bts318] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
MOTIVATION Identifying the functional relationships between copy number variation regions (CNVRs) and genes is an essential step in understanding the impact of genotypic variations on phenotype. There have been many related works, but only a few have addressed normal populations. RESULTS To analyze the functions of genome-wide CNVRs, we applied a novel correlation measure called Correlation based on Sample Set (CSS) to paired Whole Genome TilePath array and messenger RNA (mRNA) microarray data from 210 HapMap individuals with normal phenotypes and calculated the confident CNVR-gene relationships. Two CNVR nodes form an edge if they regulate a common set of genes, allowing the construction of a global CNVR network. We performed functional enrichment on the common genes that were trans-regulated from CNVRs clustered together in our CNVR network. As a result, we observed that most of the CNVR clusters in our CNVR network were reported to be involved in some biological processes or cellular functions, while most CNVR clusters from randomly constructed CNVR networks showed no evidence of functional enrichment. These results imply that CSS is capable of finding related CNVR-gene pairs and CNVR networks that have functional significance. AVAILABILITY http://embio.yonsei.ac.kr/~Park/cnv_net.php. CONTACT sanghyun@cs.yonsei.ac.kr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chihyun Park
- Department of Computer Science, Yonsei University, Seoul 120-749, South Korea
|
31
|
Kuijjer ML, Rydbeck H, Kresse SH, Buddingh EP, Lid AB, Roelofs H, Bürger H, Myklebost O, Hogendoorn PCW, Meza-Zepeda LA, Cleton-Jansen AM. Identification of osteosarcoma driver genes by integrative analysis of copy number and gene expression data. Genes Chromosomes Cancer 2012; 51:696-706. [PMID: 22454324 DOI: 10.1002/gcc.21956] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2011] [Accepted: 03/02/2012] [Indexed: 12/11/2022] Open
Abstract
High-grade osteosarcoma is a tumor with a complex genomic profile, occurring primarily in adolescents with a second peak at middle age. The extensive genomic alterations obscure the identification of genes driving tumorigenesis during osteosarcoma development. To identify such driver genes, we integrated DNA copy number profiles (Affymetrix SNP 6.0) of 32 diagnostic biopsies with 84 expression profiles (Illumina Human-6 v2.0) of high-grade osteosarcoma as compared with its putative progenitor cells, i.e., mesenchymal stem cells (n = 12) or osteoblasts (n = 3). In addition, we performed paired analyses between copy number and expression profiles of a subset of 29 patients for which both DNA and mRNA profiles were available. Integrative analyses were performed in Nexus Copy Number software and statistical language R. Paired analyses were performed on all probes detecting significantly differentially expressed genes in corresponding LIMMA analyses. For both nonpaired and paired analyses, copy number aberration frequency was set to >35%. Nonpaired and paired integrative analyses resulted in 45 and 101 genes, respectively, which were present in both analyses using different control sets. Paired analyses detected >90% of all genes found with the corresponding nonpaired analyses. Remarkably, approximately twice as many genes as found in the corresponding nonpaired analyses were detected. Affected genes were intersected with differentially expressed genes in osteosarcoma cell lines, resulting in 31 new osteosarcoma driver genes. Cell division related genes, such as MCM4 and LATS2, were overrepresented and genomic instability was predictive for metastasis-free survival, suggesting that deregulation of the cell cycle is a driver of osteosarcomagenesis.
Affiliation(s)
- Marieke L Kuijjer
- Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
|
32
|
Lahti L, Schäfer M, Klein HU, Bicciato S, Dugas M. Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review. Brief Bioinform 2012; 14:27-35. [PMID: 22441573 PMCID: PMC3548603 DOI: 10.1093/bib/bbs005] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
A variety of genome-wide profiling techniques are available to investigate complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we highlight common approaches to genomic data integration and provide a transparent benchmarking procedure to quantitatively compare method performances in cancer gene prioritization. Algorithms, data sets and benchmarking results are available at http://intcomp.r-forge.r-project.org.
Affiliation(s)
- Leo Lahti
- Wageningen University, Laboratory of Microbiology, 6703HB Wageningen, Netherlands.
|
33
|
Microarray-based copy number analysis of neurofibromatosis type-1 (NF1)-associated malignant peripheral nerve sheath tumors reveals a role for Rho-GTPase pathway genes in NF1 tumorigenesis. Hum Mutat 2012; 33:763-76. [DOI: 10.1002/humu.22044] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2011] [Accepted: 01/18/2012] [Indexed: 01/22/2023]
|
34
|
Ooi WF, Re A, Sidarovich V, Canella V, Arseni N, Adami V, Guarguaglini G, Giubettini M, Scaruffi P, Stigliani S, Lavia P, Tonini GP, Quattrone A. Segmental chromosome aberrations converge on overexpression of mitotic spindle regulatory genes in high-risk neuroblastoma. Genes Chromosomes Cancer 2012; 51:545-56. [PMID: 22337647 DOI: 10.1002/gcc.21940] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2011] [Revised: 01/06/2012] [Accepted: 01/07/2012] [Indexed: 12/21/2022] Open
Abstract
Integration of genome-wide profiles of DNA copy number alterations (CNAs) and gene expression variations (GEVs) could provide combined power for the identification of driver genes and gene networks in tumors. Here we merge matched genome and transcriptome microarray analyses from neuroblastoma samples to derive correlation patterns of CNAs and GEVs, irrespective of their genomic location. Neuroblastoma correlation patterns are strongly asymmetrical, with on average 10 CNAs linked to 1 GEV, and show the widespread prevalence of long range covariance. Functional enrichment and network analysis of the genes covarying with CNAs consistently point to a major cell function, the regulation of mitotic spindle assembly. Moreover, elevated expression of 14 key genes promoting this function is strongly associated with high-risk neuroblastomas with 1p loss and MYCN amplification in a set of 410 tumor samples (P < 0.00001). Independent CNA/GEV profiling on neuroblastoma cell lines shows that increased levels of expression of these genes are linked to 1p loss. By this approach, we reveal a convergence of clustered neuroblastoma CNAs toward increased expression of a group of prognostic and functionally cooperating genes. We therefore propose gain of function of the spindle assembly machinery as a lesion potentially offering new targets for therapy of high-risk neuroblastoma.
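Deriving correlation patterns between CNAs and GEVs irrespective of genomic location amounts to scoring every region-gene pair across matched samples. The abstract does not say which correlation measure or cutoff was used, so the sketch below assumes Pearson correlation and an arbitrary cutoff; the region names, gene names, and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def linked_pairs(copy_number, expression, cutoff):
    """Score every region-gene pair, regardless of where the gene sits,
    and keep pairs whose absolute correlation reaches the cutoff."""
    pairs = []
    for region, cn in copy_number.items():
        for gene, ex in expression.items():
            r = pearson(cn, ex)
            if abs(r) >= cutoff:
                pairs.append((region, gene, r))
    return pairs

# Hypothetical matched measurements over five samples.
copy_number = {"region_A": [2, 2, 3, 4, 1], "region_B": [2, 3, 2, 2, 2]}
expression = {"gene_X": [5.0, 5.1, 6.0, 7.2, 4.1], "gene_Y": [3.0, 2.9, 3.1, 3.0, 3.0]}
pairs = linked_pairs(copy_number, expression, cutoff=0.9)
for region, gene, r in pairs:
    print(region, gene, round(r, 3))
```

Counting how many regions link to each gene in such a pair list is one way the asymmetry reported above (many CNAs per GEV) could be summarized.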
Affiliation(s)
- Wen Fong Ooi
- Laboratory of Translational Genomics, Centre for Integrative Biology and Department of Information Engineering and Computer Science, University of Trento, 38122 Trento, Italy
|
35
|
Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 2012; 9:351-5. [PMID: 22327835 DOI: 10.1038/nmeth.1893] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2011] [Accepted: 01/06/2012] [Indexed: 12/15/2022]
Abstract
Chromosomal instability is a hallmark of cancer, and genes that display abnormal expression in aberrant chromosomal regions are likely to be key players in tumor progression. Identifying such driver genes reliably requires computational methods that can integrate genome-scale data from several sources. We compared the performance of ten algorithms that integrate copy-number and transcriptomics data from 15 head and neck squamous cell carcinoma cell lines, 129 lung squamous cell carcinoma primary tumors and simulated data. Our results revealed clear differences between the methods in terms of sensitivity and specificity as well as in their performance in small and large sample sizes. Results of the comparison are available at http://csbi.ltdk.helsinki.fi/cn2gealgo/.
|
36
|
Sheng J, Deng HW, Calhoun V, Wang YP. Integrated analysis of gene expression and copy number data on gene shaving using independent component analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:1568-79. [PMID: 21519112 PMCID: PMC3146966 DOI: 10.1109/tcbb.2011.71] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we proposed a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.
Affiliation(s)
- Jinhua Sheng
- School of Computing and Engineering, University of Missouri – Kansas City, MO, USA
- Hong-Wen Deng
- School of Medicine, University of Missouri – Kansas City, MO, USA
- Yu-Ping Wang
- School of Computing and Engineering, University of Missouri – Kansas City, MO, USA
- Biomedical Engineering, Tulane University, New Orleans, LA, USA
|
37
|
Park C, Ahn J, Yoon Y, Park S. A multi-sample based method for identifying common CNVs in normal human genomic structure using high-resolution aCGH data. PLoS One 2011; 6:e26975. [PMID: 22073121 PMCID: PMC3205051 DOI: 10.1371/journal.pone.0026975] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2011] [Accepted: 10/07/2011] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed in standard C++ using the STL and is available for Linux and MS Windows. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
Affiliation(s)
- Chihyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Jaegyoon Ahn
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Youngmi Yoon
- Division of Information Engineering, Gachon University of Medicine and Science, Incheon, South Korea
- Sanghyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
|
38
|
Nemes S, Parris TZ, Danielsson A, Kannius-Janson M, Jonasson JM, Steineck G, Helou K. Segmented regression, a versatile tool to analyze mRNA levels in relation to DNA copy number aberrations. Genes Chromosomes Cancer 2011; 51:77-82. [PMID: 22034095 DOI: 10.1002/gcc.20934] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2011] [Accepted: 08/31/2011] [Indexed: 12/11/2022] Open
Abstract
DNA copy number aberrations (CNA) and subsequent altered gene expression profiles (mRNA levels) are characteristic features of cancerous cells. Integrative genomic analysis aims to identify recurrent CNA that may have a potential role in cancer development, assuming that gene amplification is accompanied by overexpression, while deletions give rise to downregulation of gene expression. We propose a segmented regression-based approach to identify CNA-driven alteration of gene expression profiles. Segmented regression fits piecewise linear models in different domains of CNA joined by a change-point, where the mRNA-CNA relationship undergoes structural changes. Here, we illustrate the implementation and applicability of the proposed model using 1,161 chromosome fragments detected as DNA CNA in primary tumors from 97 breast cancer patients. We identified significant CNA-driven changes in gene expression levels for 341 chromosome fragments, of which 72 showed a nonlinear relationship to CNA. For 59 of 72 chromosome fragments (82%), we observed an initial increase in mRNA levels due to changes in CNA. After the change-point was passed, the mRNA levels reached a plateau, and a further increase in DNA copy numbers did not induce further elevation in mRNA levels. In contrast, for 13 chromosome fragments, the change-point marked the point where mRNA production accelerated. We conclude that segmented regression modeling may provide valuable insights into the impact CNA have on gene expression in cancer cells.
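A rough sketch of the change-point idea follows. It assumes the simplest possible variant: every split of the copy-number-sorted data is tried, an independent least-squares line is fitted on each side, and the split with the lowest total squared error wins. Unlike the model described in the abstract, the two lines here are not constrained to join at the change-point, and the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (intercept, slope, sum of squared errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return intercept, slope, sse

def fit_two_segments(x, y, min_points=2):
    """Try every split of the x-sorted data; keep the split with lowest total SSE."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        _, left_slope, left_sse = fit_line(xs[:k], ys[:k])
        _, right_slope, right_sse = fit_line(xs[k:], ys[k:])
        total = left_sse + right_sse
        if best is None or total < best["sse"]:
            best = {"change_point": xs[k], "left_slope": left_slope,
                    "right_slope": right_slope, "sse": total}
    return best

# Hypothetical fragment: mRNA rises with copy number, then plateaus.
copy_number = [1, 2, 3, 4, 5, 6, 7, 8]
mrna = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 5.0, 5.0]
fit = fit_two_segments(copy_number, mrna)
print(fit)
```

A positive slope before the change-point followed by a slope near zero corresponds to the plateau pattern reported for most nonlinear fragments; a steeper slope after the change-point would correspond to the accelerating pattern.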
Affiliation(s)
- Szilárd Nemes
- Division of Clinical Cancer Epidemiology, Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
|
39
|
Hsu FH, Serpedin E, Chen Y, Dougherty ER. Stochastic modeling of the relationship between copy number and gene expression based on transcriptional logic. IEEE Trans Biomed Eng 2011; 59:272-80. [PMID: 22042124 DOI: 10.1109/tbme.2011.2173341] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
DNA copy number alterations (CNAs) can cause genetic diseases, and studies have revealed a relationship between CNAs and gene expression; however, the manner in which CNAs relate to gene expression and what regulatory mechanisms underlie the relationship remain unclear. In many instances, real data have revealed a nonlinear relationship between copy number and gene expression. In this paper, queueing theory is used to model this relationship, with the basic structural parameters involving transcription factor (TF) arrival and departure rates. A key finding is that the ratio of TF arrival rate to TF departure rate is critical: small and large ratios correspond to nonlinear and linear relationships, respectively. Indeed, copy number amplifications do not necessarily lead to expression increases: when one of the regulatory TFs exists in a small amount, copy number gains can cause a down regulation. Using the concept of mutual information, we show that the TF with minimum activation probability can have maximum dependence in regulation: a TF in small amount could result in a nonlinear copy-number-gene-expression relationship and play a major role in regulation. The expectation-maximization algorithm is used to estimate the ratio of TF arrival rate to TF departure rate. The theoretical results are illustrated via simulations.
Affiliation(s)
- Fang-Han Hsu
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA.
|
40
|
Huang N, Shah PK, Li C. Lessons from a decade of integrating cancer copy number alterations with gene expression profiles. Brief Bioinform 2011; 13:305-16. [PMID: 21949216 DOI: 10.1093/bib/bbr056] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Over the last decade, multiple functional genomic datasets studying chromosomal aberrations and their downstream effects on gene expression have accumulated for several cancer types. A vast majority of them are in the form of paired gene expression profiles and somatic copy number alterations (CNA) information on the same patients identified using microarray platforms. In response, many algorithms and software packages are available for integrating these paired data. Surprisingly, there has been no serious attempt to review the currently available methodologies or the novel insights brought using them. In this work, we discuss the quantitative relationships observed between CNA and gene expression in multiple cancer types and biological milestones achieved using the available methodologies. We discuss the conceptual evolution of both the step-wise and the joint data integration methodologies over the last decade. We conclude by providing suggestions for building efficient data integration methodologies and asking further biological questions.
Affiliation(s)
- Norman Huang
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health, CLS-11075, 3 Blackfan Circle, Boston, MA 02115, USA
|
41
|
Jörnsten R, Abenius T, Kling T, Schmidt L, Johansson E, Nordling TEM, Nordlander B, Sander C, Gennemark P, Funa K, Nilsson B, Lindahl L, Nelander S. Network modeling of the transcriptional effects of copy number aberrations in glioblastoma. Mol Syst Biol 2011; 7:486. [PMID: 21525872 PMCID: PMC3101951 DOI: 10.1038/msb.2011.17] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 03/21/2011] [Indexed: 12/25/2022] Open
Abstract
DNA copy number aberrations (CNAs) are a hallmark of cancer genomes. However, little is known about how such changes affect global gene expression. We develop a modeling framework, EPoC (Endogenous Perturbation analysis of Cancer), to (1) detect disease-driving CNAs and their effect on target mRNA expression, and to (2) stratify cancer patients into long- and short-term survivors. Our method constructs causal network models of gene expression by combining genome-wide DNA- and RNA-level data. Prognostic scores are obtained from a singular value decomposition of the networks. By applying EPoC to glioblastoma data from The Cancer Genome Atlas consortium, we demonstrate that the resulting network models contain known disease-relevant hub genes, reveal interesting candidate hubs, and uncover predictors of patient survival. Targeted validations in four glioblastoma cell lines support selected predictions, and implicate the p53-interacting protein Necdin in suppressing glioblastoma cell growth. We conclude that large-scale network modeling of the effects of CNAs on gene expression may provide insights into the biology of human cancer. Free software in MATLAB and R is provided.
Affiliation(s)
- Rebecka Jörnsten
- Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, Gothenburg, Sweden
|
42
|
Liu WW, Gao YX, Zhou LP, Duan A, Tan LL, Li WZ, Yan M, Yang HY, Yan SL, Wang MQ, Ding WJ. Observations on Copy Number Variations in a Kidney-yang Deficiency Syndrome Family. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE 2011; 2011:548358. [PMID: 21811512 PMCID: PMC3136678 DOI: 10.1093/ecam/neq069] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/01/2009] [Accepted: 05/19/2010] [Indexed: 11/17/2022]
Abstract
We have performed an analysis of a family with kidney-yang deficiency syndrome (KDS) in order to determine the structural genomic variations through a novel approach designated as “copy number variants” (CNVs). Twelve KDS subjects and three healthy spouses from this family were included in this study. Genomic DNA samples were genotyped utilizing an Affymetrix 100 K single nucleotide polymorphism array, and CNVs were identified by Copy Number Algorithm (CNAT4.0, Affymetrix). Our results demonstrate that 447 deleted and 476 duplicated CNVs are shared among KDS subjects within the family. The homologous ratio of deleted CNVs was as high as 99.78%. One-copy-duplicated CNVs display mid-range homology. For two copies of duplicated CNVs (CNV4), a markedly heterologous ratio was observed. Therefore, with the important exception of CNV4, our data show that CNVs shared among KDS subjects display typical Mendelian inheritance. A total of 113 genes with established functions were identified from the CNV flanks; significantly enriched genes surrounding CNVs may contribute to certain adaptive benefit. These genes could be classified into categories including: binding and transporter, cell cycle, signal transduction, biogenesis, nerve development, metabolism regulation and immune response. They can also be included into three pathways, that is, signal transduction, metabolic processes and immunological networks. Particularly, the results reported here are consistent with the extensive impairments observed in KDS patients, involving the mass-energy-information-carrying network. In conclusion, this article provides the first set of CNVs from KDS patients that will facilitate our further understanding of the genetic basis of KDS and will allow novel strategies for a rational therapy of this disease.
Affiliation(s)
- Wei Wei Liu
- Department of Fundamental Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu 610075, China
|
43
|
Solvang HK, Lingjærde OC, Frigessi A, Børresen-Dale AL, Kristensen VN. Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer. BMC Bioinformatics 2011; 12:197. [PMID: 21609452 PMCID: PMC3128865 DOI: 10.1186/1471-2105-12-197] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2010] [Accepted: 05/24/2011] [Indexed: 12/13/2022] Open
Abstract
Background Elucidating the exact relationship between gene copy number and expression would enable identification of regulatory mechanisms of abnormal gene expression and biological pathways of regulation. Most current approaches either depend on linear correlation or on nonparametric tests of association that are insensitive to the exact shape of the relationship. Based on knowledge of enzyme kinetics and gene regulation, we would expect the functional shape of the relationship to be gene dependent and to be related to the gene regulatory mechanisms involved. Here, we propose a statistical approach to investigate and distinguish between linear and nonlinear dependences between DNA copy number alteration and mRNA expression. Results We applied the proposed method to DNA copy numbers derived from Illumina 109 K SNP-CGH arrays (using the log R values) and expression data from Agilent 44 K mRNA arrays, focusing on commonly aberrated genomic loci in a collection of 102 breast tumors. Regression analysis was used to identify the type of relationship (linear or nonlinear), and subsequent pathway analysis revealed that genes displaying a linear relationship were overall associated with substantially different biological processes than genes displaying a nonlinear relationship. In the group of genes with a linear relationship, we found significant association to canonical pathways, including purine and pyrimidine metabolism (for both deletions and amplifications) as well as estrogen metabolism (linear amplification) and BRCA-related response to damage (linear deletion). In the group of genes displaying a nonlinear relationship, the top canonical pathways were specific pathways like PTEN and PI3K/AKT (nonlinear amplification) and Wnt(B) and IL-2 signalling (nonlinear deletion). Both amplifications and deletions pointed to the same affected pathways and identified cancer as the top significant disease and cell cycle, cell signaling and cellular development as significant networks. Conclusions This paper presents a novel approach to assessing the validity of the dependence of expression data on copy number data, and this approach may help in identifying the drivers of carcinogenesis.
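The abstract above does not state which regression models were compared, so the following is only a minimal sketch of the general idea of labelling a gene's copy-number/expression relationship as linear or nonlinear. It assumes, purely for illustration, that a degree-1 and a degree-2 polynomial fit are compared by AIC with a margin of 10; the function names, the margin, and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def aic_polyfit(x, y, degree):
    """AIC of a least-squares polynomial fit, assuming Gaussian errors."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n = len(y)
    rss = float(np.sum(resid ** 2))
    k = degree + 2  # polynomial coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

def classify_relationship(copy_number, expression, margin=10.0):
    """Return 'nonlinear' only if the quadratic fit beats the linear fit
    by more than `margin` AIC units (an assumed, conservative cut-off)."""
    aic_lin = aic_polyfit(copy_number, expression, 1)
    aic_quad = aic_polyfit(copy_number, expression, 2)
    return "nonlinear" if aic_lin - aic_quad > margin else "linear"

# Simulated data (not from the paper): 102 samples, as in the cohort size.
rng = np.random.default_rng(0)
log_r = rng.uniform(-1.0, 1.0, size=102)
linear_gene = 0.8 * log_r + rng.normal(0.0, 0.1, size=102)
curved_gene = 1.5 * log_r ** 2 + rng.normal(0.0, 0.1, size=102)

label_linear = classify_relationship(log_r, linear_gene)
label_curved = classify_relationship(log_r, curved_gene)
print(label_linear, label_curved)
```

A quadratic term is only one possible form of nonlinearity; a saturating or threshold-like dose response would need a different alternative model, such as a spline or a sigmoid.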
Affiliation(s)
- Hiroko K Solvang
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, Radiumhospitalet, Montebello, and Department of Biostatistics, Institute of Basic Medical Science, University of Oslo, 0310 Oslo, Norway.
|
44
|
Hur Y, Lee H. Wavelet-based identification of DNA focal genomic aberrations from single nucleotide polymorphism arrays. BMC Bioinformatics 2011; 12:146. [PMID: 21569311 PMCID: PMC3114745 DOI: 10.1186/1471-2105-12-146] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 10/13/2010] [Accepted: 05/11/2011] [Indexed: 11/10/2022]
Abstract
Background Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations span a wide range of chromosomes, making it hard to distinguish cancer related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become important to develop a computational method to detect driver genes related to cancer development that are located in the focal regions of CNAs. Results In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). Because wavelet analysis is a multi-resolution approach, it makes it possible to effectively identify focal genomic aberrations in broadly aberrant regions. The proposed method integrates multiple cancer samples so that it enables the detection of consistent aberrations across multiple samples. We then apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform. Through this process, we confirm the ability to detect previously known cancer related genes from both cancer types with high accuracy. Also, the application of this approach to a lung cancer data set identifies focal amplification regions that contain known oncogenes, SMAD7 (chr18q21.1) and FGF10 (chr5p12), although these regions are not reported by GISTIC, a recent CNA detection algorithm. Conclusions Our results suggest that WIFA can be used to reveal cancer related genes in various cancer data sets.
Affiliation(s)
- Youngmi Hur
- Dept. of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
|
45
|
van Wieringen WN, van de Wiel MA. Exploratory factor analysis of pathway copy number data with an application towards the integration with gene expression data. J Comput Biol 2011; 18:729-41. [PMID: 21554018 DOI: 10.1089/cmb.2009.0209] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Realizing that genes often operate together, studies into the molecular biology of cancer shift focus from individual genes to pathways. In order to understand the regulatory mechanisms of a pathway, one must study its genes at all molecular levels. To facilitate such study at the genomic level, we developed exploratory factor analysis for the characterization of the variability of a pathway's copy number data. A latent variable model that describes the call probability data of a pathway is introduced and fitted with an EM algorithm. In two breast cancer data sets, it is shown that the first two latent variables of GO nodes, which inherit a clear interpretation from the call probabilities, are often related to the proportion of aberrations and a contrast of the probabilities of a loss and of a gain. Linking the latent variables to the node's gene expression data suggests that they capture the "global" effect of genomic aberrations on these transcript levels. In all, the proposed method provides a possibly insightful characterization of pathway copy number data, which may be fruitfully exploited to study the interaction between the pathway's DNA copy number aberrations and data from other molecular levels like gene expression.
Affiliation(s)
- Wessel N van Wieringen
- Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.
|
46
|
Yuan Y, Li CT, Windram O. Directed partial correlation: inferring large-scale gene regulatory network through induced topology disruptions. PLoS One 2011; 6:e16835. [PMID: 21494330 PMCID: PMC3071805 DOI: 10.1371/journal.pone.0016835] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 08/24/2010] [Accepted: 01/11/2011] [Indexed: 11/19/2022]
Abstract
Inferring regulatory relationships among many genes based on their temporal variation in transcript abundance has been a popular research topic. Due to the nature of microarray experiments, classical tools for time series analysis lose power since the number of variables far exceeds the number of samples. In this paper, we describe some of the existing multivariate inference techniques that are applicable to hundreds of variables and show the potential challenges for small-sample, large-scale data. We propose a directed partial correlation (DPC) method as an efficient and effective solution to regulatory network inference using these data. Specifically for genomic data, the proposed method is designed to deal with large-scale datasets. It combines the efficiency of partial correlation for setting up network topology by testing conditional independence, and the concept of Granger causality to assess topology change with induced interruptions. The idea is that when a transcription factor is induced artificially within a gene network, the disruption of the network by the induction signifies a gene's role in transcriptional regulation. The benchmarking results using GeneNetWeaver, the simulator for the DREAM challenges, provide strong evidence of the outstanding performance of the proposed DPC method. When applied to real biological data, the inferred starch metabolism network in Arabidopsis reveals many biologically meaningful network modules worthy of further investigation. These results collectively suggest DPC is a versatile tool for genomics research. The R package DPC is available for download (http://code.google.com/p/dpcnet/).
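To make the partial-correlation step mentioned above concrete, here is a minimal sketch of how partial correlations can be read off the inverse covariance (precision) matrix and used to test conditional independence. It is not the DPC method and does not use the DPC R package: the Granger-causality step is omitted, and the function name and the simulated chain X -> Y -> Z are illustrative assumptions.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix for data with rows = samples, columns = variables.

    Uses rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision matrix.
    The pseudo-inverse keeps the call from failing on a singular covariance,
    but with far more variables than samples a shrinkage estimator would be
    needed instead (not shown here).
    """
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain X -> Y -> Z: X and Z are correlated only through Y.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)
print("marginal corr(X, Z):", round(marginal[0, 2], 2))
print("partial corr(X, Z | Y):", round(partial[0, 2], 2))
```

In this toy chain the marginal correlation between X and Z is large, while the partial correlation given Y is close to zero, which is why partial correlation removes the indirect X-Z edge from the inferred topology.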
Affiliation(s)
- Yinyin Yuan
- Cancer Research UK, Cambridge Research Institute, Cambridge, United Kingdom.
|
47
|
Integrative genomic profiling reveals conserved genetic mechanisms for tumorigenesis in common entities of non-Hodgkin's lymphoma. Genes Chromosomes Cancer 2011; 50:313-26. [DOI: 10.1002/gcc.20856] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 11/04/2010] [Accepted: 01/07/2011] [Indexed: 01/10/2023]
|
48
|
Scofield TC, Delmerico JA, Chaudhary V, Valente G. XtremeData dbX: An FPGA-Based Data Warehouse Appliance. Comput Sci Eng 2010. [DOI: 10.1109/mcse.2010.93] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
|
49
|
Xie T, Zhang C, Zhang B, Molony C, Oudes A, Roberts C, Dai H, Schadt E, Lamb J. A survey of cancer cell lines reveals highly structured and hierarchical relationships within and between DNA and mRNA that may be the result of selection. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2010; 14:91-7. [PMID: 20141331 DOI: 10.1089/omi.2009.0114] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
Copy number variation (CNV) is one of the most profound forms of somatic DNA changes that underlie most human cancers. However, the degree of complexity within and between DNA and mRNA variations in cancer cohorts has yet to be fully characterized. Here we characterized the connectivity of CNV/CNV and its contribution to the transcriptome in human cancer cell lines. Strikingly, we found that there is a significant nonrandom correlation of many unlinked DNA loci and also a significant association between CNV and mRNA expression in cis and in trans (called eCNV). Both distributions of DNA/DNA and DNA/mRNA associations exhibit a scale-free structure: for DNA/DNA, a few loci correlate with many other loci, whereas most loci correlate with only a few loci; for DNA/mRNA, certain chromosomal loci associate with many mRNAs, and many mRNAs are controlled by more than one locus. This suggests that a small number of DNA loci act as hubs in a hierarchical structure that is highly nonrandom in nature, and genes linking to these hot spots tend to be involved in similar biological functions. Derivation of highly connected structures suggests a process of undirected copy number changes followed by selection of those advantageous to tumor cells during tumorigenesis. Given that the cohort includes many tissue types, our observations may identify a common and important underlying structure present in human tumors.
Affiliation(s)
- Tao Xie
- Rosetta Inpharmatics LLC, Seattle, Washington, USA.
|
50
|
Schäfer M, Schwender H, Merk S, Haferlach C, Ickstadt K, Dugas M. Integrated analysis of copy number alterations and gene expression: a bivariate assessment of equally directed abnormalities. Bioinformatics 2009; 25:3228-35. [PMID: 19828576 DOI: 10.1093/bioinformatics/btp592] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 01/08/2023]
Abstract
MOTIVATION The analysis of a number of different genetic features like copy number (CN) variation, gene expression (GE) or loss of heterozygosity has considerably increased in recent years, as well as the number of available datasets. This is particularly due to the success of microarray technology. Thus, to understand mechanisms of disease pathogenesis on a molecular basis, e.g. in cancer research, the challenge of analyzing such different data types in an integrated way has become increasingly important. In order to tackle this problem, we propose a new procedure for an integrated analysis of two different data types that searches for genes and genetic regions which for both inputs display strong equally directed deviations from the reference median. We employ this approach, based on a modified correlation coefficient and an explorative Wilcoxon test, to find DNA regions of such abnormalities in GE and CN (e.g. underexpressed genes accompanied by a loss of DNA material). RESULTS In an application to acute myeloid leukemia, our procedure is able to identify various regions on different chromosomes with characteristic abnormalities in GE and CN data and shows a higher sensitivity to differences in abnormalities than standard approaches. While the results support various findings of previous studies, some new interesting DNA regions can be identified. In a simulation study, our procedure also shows more reliable results than standard approaches. AVAILABILITY Code and data available as R packages edira and ediraAMLdata from http://www.statistik.tu-dortmund.de/~schaefer/ CONTACT martin.schaefer@udo.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
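The abstract does not define the modified correlation coefficient, so the sketch below is only one plausible reading, stated here as an assumption: deviations are measured from the median of a reference group rather than from the sample mean, so that samples which all deviate in the same direction in both data types score close to 1. The function name and every number are made up for illustration; this is not the edira implementation.

```python
import numpy as np

def median_centered_correlation(cn, ge, cn_ref, ge_ref):
    """Correlation-like score of CN and GE deviations from reference medians.

    Assumed form: sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx and dy
    the deviations from the reference-group medians. Bounded in [-1, 1].
    """
    dx = np.asarray(cn, dtype=float) - np.median(cn_ref)
    dy = np.asarray(ge, dtype=float) - np.median(ge_ref)
    denom = np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return float(np.sum(dx * dy) / denom) if denom > 0 else 0.0

# Made-up values: reference samples and tumour samples for a single gene.
cn_ref = [2.0, 2.1, 1.9, 2.0]
ge_ref = [5.0, 5.2, 4.8, 5.1]
cn_loss = [1.0, 1.1, 0.9, 1.2]    # loss of DNA material
ge_under = [3.0, 3.2, 2.9, 3.1]   # underexpression (same direction)
ge_over = [7.0, 7.2, 6.9, 7.1]    # overexpression (opposite direction)

same_direction = median_centered_correlation(cn_loss, ge_under, cn_ref, ge_ref)
opposite_direction = median_centered_correlation(cn_loss, ge_over, cn_ref, ge_ref)
pearson = float(np.corrcoef(cn_loss, ge_under)[0, 1])
print(same_direction, opposite_direction, pearson)
```

In this toy case the ordinary Pearson coefficient only reflects how the tumour samples co-vary among themselves, whereas the median-centred score also rewards the shared shift away from the reference, which is the "equally directed" behaviour the abstract describes.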
Affiliation(s)
- Martin Schäfer
- Collaborative Research Center 475, TU Dortmund University, Dortmund, Germany.
|