1
Li Y, Herold T, Mansmann U, Hornung R. Does combining numerous data types in multi-omics data improve or hinder performance in survival prediction? Insights from a large-scale benchmark study. BMC Med Inform Decis Mak 2024; 24:244. [PMID: 39223659 PMCID: PMC11370316 DOI: 10.1186/s12911-024-02642-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Accepted: 08/21/2024] [Indexed: 09/04/2024] Open
Abstract
BACKGROUND: Predictive modeling based on multi-omics data, which incorporates several types of omics data for the same patients, has shown potential to outperform single-omics predictive modeling. Most research in this domain focuses on incorporating numerous data types, despite the complexity and cost of acquiring them. The prevailing assumption is that increasing the number of data types necessarily improves predictive performance. However, the integration of less informative or redundant data types could potentially hinder this performance. Therefore, identifying the most effective combinations of omics data types that enhance predictive performance is critical for cost-effective and accurate predictions.
METHODS: In this study, we systematically evaluated the predictive performance of all 31 possible combinations including at least one of five genomic data types (mRNA, miRNA, methylation, DNAseq, and copy number variation) using 14 cancer datasets with right-censored survival outcomes, publicly available from the TCGA database. We employed various prediction methods and up-weighted clinical data in every model to leverage their predictive importance. Harrell's C-index and the integrated Brier Score were used as performance measures. To assess the robustness of our findings, we performed a bootstrap analysis at the level of the included datasets. Statistical testing was conducted for key results, limiting the number of tests to ensure a low risk of false positives.
RESULTS: Contrary to expectations, we found that using only mRNA data or a combination of mRNA and miRNA data was sufficient for most cancer types. For some cancer types, the additional inclusion of methylation data led to improved prediction results. Far from enhancing performance, the introduction of more data types most often resulted in a decline in performance, which varied between the two performance measures.
CONCLUSIONS: Our findings challenge the prevailing notion that combining multiple omics data types in multi-omics survival prediction improves predictive performance. Thus, the widespread approach in multi-omics prediction of incorporating as many data types as possible should be reconsidered to avoid suboptimal prediction results and unnecessary expenditure.
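The count of 31 combinations and the C-index named in the abstract above can be illustrated with a small sketch. This is not the authors' benchmark code: the function names, the short labels for the five data types, and the toy survival data are assumptions made for illustration, and the C-index shown is the basic pairwise form that skips pairs with tied times.

```python
from itertools import combinations

# Short labels for the five data types named in the abstract (labels are illustrative).
OMICS_TYPES = ["mRNA", "miRNA", "methylation", "DNAseq", "CNV"]


def nonempty_combinations(items):
    """Return every subset of `items` that contains at least one element."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def harrell_c_index(times, events, risks):
    """Basic Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. The pair is concordant when the
    subject with the shorter time has the higher risk score; tied risk
    scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


combos = nonempty_combinations(OMICS_TYPES)
print(len(combos))  # 2**5 - 1 = 31

# Toy data: risk scores perfectly ordered against survival time.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))  # 1.0
```

In a benchmark of this kind, each of the 31 subsets would define one candidate feature set, and a measure such as the C-index would be computed on held-out data for each of them.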
Affiliation(s)
- Yingxia Li
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Tobias Herold
  - Laboratory for Leukemia Diagnostics, Department of Medicine III, LMU University Hospital, LMU Munich, Munich, Germany
- Ulrich Mansmann
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
- Roman Hornung
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, 81377, Munich, Germany
  - Munich Center for Machine Learning (MCML), Munich, Germany
2
Nagelberg AL, Sihota TS, Chuang YC, Shi R, Chow JLM, English J, MacAulay C, Lam S, Lam WL, Lockwood WW. Integrative genomics identifies SHPRH as a tumor suppressor gene in lung adenocarcinoma that regulates DNA damage response. Br J Cancer 2024; 131:534-550. [PMID: 38890444 PMCID: PMC11300780 DOI: 10.1038/s41416-024-02755-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 06/03/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
BACKGROUND: Identification of driver mutations and development of targeted therapies has considerably improved outcomes for lung cancer patients. However, significant limitations remain with the lack of identified drivers in a large subset of patients. Here, we aimed to assess the genomic landscape of lung adenocarcinomas (LUADs) from individuals without a history of tobacco use to reveal new genetic drivers of lung cancer.
METHODS: Integrative genomic analyses combining whole-exome sequencing, copy number, and mutational information for 83 LUAD tumors were performed and validated using external datasets to identify genetic variants with a predicted functional consequence and assess association with clinical outcomes. LUAD cell lines with alteration of identified candidates were used to functionally characterize tumor suppressive potential using a conditional expression system both in vitro and in vivo.
RESULTS: We identified 21 genes with evidence of positive selection, including 12 novel candidates that have yet to be characterized in LUAD. In particular, SNF2 Histone Linker PHD RING Helicase (SHPRH) was identified due to its frequency of biallelic disruption and location within the familial susceptibility locus on chromosome arm 6q. We found that low SHPRH mRNA expression is associated with poor survival outcomes in LUAD patients. Furthermore, we showed that re-expression of SHPRH in LUAD cell lines with inactivating alterations for SHPRH reduces their in vitro colony formation and tumor burden in vivo. Finally, we explored the biological pathways associated with SHPRH inactivation and found an association with the tolerance of LUAD cells to DNA damage.
CONCLUSIONS: These data suggest that SHPRH is a tumor suppressor gene in LUAD, whereby its expression is associated with more favorable patient outcomes, reduced tumor and mutational burden, and may serve as a predictor of response to DNA damage. Thus, further exploration into the role of SHPRH in LUAD development may make it a valuable biomarker for predicting LUAD risk and prognosis.
Affiliation(s)
- Amy L Nagelberg
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Tianna S Sihota
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Yu-Chi Chuang
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada
- Rocky Shi
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada
- Justine L M Chow
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
- John English
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Calum MacAulay
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Stephen Lam
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
- Wan L Lam
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
  - Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada
- William W Lockwood
  - Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
  - Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada
3
Sibilio P, Conte F, Huang Y, Castaldi PJ, Hersh CP, DeMeo DL, Silverman EK, Paci P. Correlation-based network integration of lung RNA sequencing and DNA methylation data in chronic obstructive pulmonary disease. Heliyon 2024; 10:e31301. [PMID: 38807864 PMCID: PMC11130701 DOI: 10.1016/j.heliyon.2024.e31301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 05/08/2024] [Accepted: 05/14/2024] [Indexed: 05/30/2024] Open
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous, chronic inflammatory process of the lungs and, like other complex diseases, is caused by both genetic and environmental factors. Detailed understanding of the molecular mechanisms of complex diseases requires the study of the interplay among different biomolecular layers, and thus the integration of different omics data types. In this study, we investigated COPD-associated molecular mechanisms through a correlation-based network integration of lung tissue RNA-seq and DNA methylation data of COPD cases (n = 446) and controls (n = 346) derived from the Lung Tissue Research Consortium. First, we performed a SWIM-network based analysis to build separate correlation networks for RNA-seq and DNA methylation data for our case-control study population. Then, we developed a method to integrate the results into a coupled network of differentially expressed and differentially methylated genes to investigate their relationships across both molecular layers. The functional enrichment analysis of the nodes of the coupled network revealed a strikingly significant enrichment in Immune System components, both innate and adaptive, as well as immune-system component communication (interleukin and cytokine-cytokine signaling). Our analysis allowed us to reveal novel putative COPD-associated genes and to analyze their relationships, both at the transcriptomics and epigenomics levels, thus contributing to an improved understanding of COPD pathogenesis.
Affiliation(s)
- Pasquale Sibilio
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
  - Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, Rome, Italy
- Federica Conte
  - Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, Rome, Italy
- Yichen Huang
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Peter J. Castaldi
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Craig P. Hersh
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Dawn L. DeMeo
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Edwin K. Silverman
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Paola Paci
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
  - Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, Rome, Italy
  - Karolinska Institutet, 17177, Stockholm, Sweden
4
Mokou M, Narayanasamy S, Stroggilos R, Balaur IA, Vlahou A, Mischak H, Frantzi M. A Drug Repurposing Pipeline Based on Bladder Cancer Integrated Proteotranscriptomics Signatures. Methods Mol Biol 2023; 2684:59-99. [PMID: 37410228 DOI: 10.1007/978-1-0716-3291-8_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
Delivering better care for patients with bladder cancer (BC) necessitates the development of novel therapeutic strategies that address both the high disease heterogeneity and the limitations of the current therapeutic modalities, such as low drug efficacy and acquired resistance in patients. Drug repurposing is a cost-effective strategy that targets the reuse of existing drugs for new therapeutic purposes. Such a strategy could open new avenues toward more effective BC treatment. BC patients' multi-omics signatures can be used to guide the investigation of existing drugs that show an effective therapeutic potential through drug repurposing. In this book chapter, we present an integrated multilayer approach that includes cross-omics analyses from publicly available transcriptomics and proteomics data derived from BC tissues and cell lines that were investigated for the development of disease-specific signatures. These signatures are subsequently used as input for a signature-based repurposing approach using the Connectivity Map (CMap) tool. We further explain the steps that may be followed to identify and select existing drugs of increased potential for repurposing in BC patients.
Affiliation(s)
- Marika Mokou
  - Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
- Shaman Narayanasamy
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Rafael Stroggilos
  - Systems Biology Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
- Irina-Afrodita Balaur
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Antonia Vlahou
  - Systems Biology Center, Biomedical Research Foundation, Academy of Athens, Athens, Greece
- Harald Mischak
  - Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
  - Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
- Maria Frantzi
  - Department of Biomarker Research, Mosaiques Diagnostics, Hannover, Germany
5
Proteomics for Biomarker Discovery for Diagnosis and Prognosis of Kidney Transplantation Rejection. Proteomes 2022; 10:proteomes10030024. [PMID: 35893765 PMCID: PMC9326686 DOI: 10.3390/proteomes10030024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/06/2022] [Revised: 06/27/2022] [Accepted: 06/28/2022] [Indexed: 02/07/2023] Open
Abstract
Renal transplantation is currently the treatment of choice for end-stage kidney disease, enabling a quality of life superior to dialysis. Despite this, all transplanted patients are at risk of allograft rejection processes. The gold-standard diagnosis of graft rejection, based on histological analysis of kidney biopsy, is prone to sampling errors and carries high costs and risks associated with such invasive procedures. Furthermore, the routine clinical monitoring, based on urine volume, proteinuria, and serum creatinine, usually only detects alterations after graft histologic damage and does not differentiate between the diverse etiologies. Therefore, there is an urgent need for new biomarkers, obtained from minimally invasive procedures, that can predict the rejection processes and the underlying mechanisms with high sensitivity and specificity and that can be implemented in routine clinical surveillance. These new biomarkers should also detect the rejection processes as early as possible, ideally before the clinical outputs, while enabling balanced immunotherapy that minimizes rejections and reduces the high toxicities associated with these drugs. Proteomics of biofluids collected through non-invasive or minimally invasive procedures, e.g., blood or urine, presents inherent characteristics that may provide biomarker candidates. The current manuscript reviews biofluid proteomics toward the discovery of biomarkers that specifically identify subclinical, acute, and chronic immune rejection processes while allowing for the discrimination between cell-mediated or antibody-mediated processes. In time, these biomarkers will lead to patient risk stratification, monitoring, and personalized and more efficient immunotherapies toward higher graft survival and patient quality of life.
6
Dlamini Z, Skepu A, Kim N, Mkhabele M, Khanyile R, Molefi T, Mbatha S, Setlai B, Mulaudzi T, Mabongo M, Bida M, Kgoebane-Maseko M, Mathabe K, Lockhat Z, Kgokolo M, Chauke-Malinga N, Ramagaga S, Hull R. AI and precision oncology in clinical cancer genomics: From prevention to targeted cancer therapies-an outcomes based patient care. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.100965] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022] Open
7
Bjaanæs MM, Nilsen G, Halvorsen AR, Russnes HG, Solberg S, Jørgensen L, Brustugun OT, Lingjærde OC, Helland Å. Whole genome copy number analyses reveal a highly aberrant genome in TP53 mutant lung adenocarcinoma tumors. BMC Cancer 2021; 21:1089. [PMID: 34625038 PMCID: PMC8501630 DOI: 10.1186/s12885-021-08811-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2021] [Accepted: 09/23/2021] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Genetic alterations are common in non-small cell lung cancer (NSCLC), and DNA mutations and translocations are targets for therapy. Copy number aberrations occur frequently in NSCLC tumors and may influence gene expression and further alter signaling pathways. In this study we aimed to characterize the genomic architecture of NSCLC tumors and to identify genomic differences between tumors stratified by histology and mutation status. Furthermore, we sought to integrate DNA copy number data with mRNA expression to find genes with expression putatively regulated by copy number aberrations and the oncogenic pathways associated with these affected genes.
METHODS: Copy number data were obtained from 190 resected early-stage NSCLC tumors and gene expression data were available from 113 of the adenocarcinomas. Clinical and histopathological data were known, and EGFR, KRAS and TP53 mutation status was determined. Allele-specific copy number profiles were calculated using ASCAT, and regional copy number aberrations were subsequently obtained and analyzed jointly with the gene expression data.
RESULTS: The NSCLC tumor tissue displayed overall complex DNA copy number profiles with numerous recurrent aberrations. Despite histological differences, tissue samples from squamous cell carcinomas and adenocarcinomas had remarkably similar copy number patterns. The TP53-mutated lung adenocarcinomas displayed a highly aberrant genome, with significantly altered copy number profiles including gains, losses and focal complex events. The EGFR-mutant lung adenocarcinomas had specific arm-wise aberrations, particularly at chromosome 7p and 9q. A large number of genes displayed correlation between copy number and expression level, and the PI(3)K-mTOR pathway was highly enriched for such genes.
CONCLUSIONS: The genomic architecture in NSCLC tumors is complex, and particularly TP53-mutated lung adenocarcinomas displayed highly aberrant copy number profiles. We suggest always including TP53 mutation status when studying copy number aberrations in NSCLC tumors. Copy number may further impact gene expression and alter cellular signaling pathways.
MESH Headings
- Adenocarcinoma of Lung/genetics
- Adenocarcinoma of Lung/pathology
- Alleles
- Carcinoma, Non-Small-Cell Lung/genetics
- Carcinoma, Non-Small-Cell Lung/pathology
- Chromosomes, Human, Pair 7
- Chromosomes, Human, Pair 9
- Class I Phosphatidylinositol 3-Kinases/genetics
- DNA Copy Number Variations
- Ex-Smokers
- Female
- Gene Dosage
- Gene Expression
- Genes, erbB-1/genetics
- Genes, p53
- Genes, ras/genetics
- Humans
- Lung Neoplasms/genetics
- Lung Neoplasms/pathology
- Male
- Non-Smokers
- Polymorphism, Single Nucleotide
- Signal Transduction/genetics
- Smokers
- TOR Serine-Threonine Kinases/genetics
Affiliation(s)
- Maria Moksnes Bjaanæs
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital-The Norwegian Radium Hospital, Oslo, Norway
  - Department of Oncology, Oslo University Hospital, 4950 Nydalen Oslo, Norway
- Gro Nilsen
  - Department of Computer Science, University of Oslo, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Ann Rita Halvorsen
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital-The Norwegian Radium Hospital, Oslo, Norway
- Hege G. Russnes
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital-The Norwegian Radium Hospital, Oslo, Norway
  - Department of Pathology, Oslo University Hospital, Oslo, Norway
- Steinar Solberg
  - Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
- Lars Jørgensen
  - Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
- Odd Terje Brustugun
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital-The Norwegian Radium Hospital, Oslo, Norway
  - Section of Oncology, Vestre Viken Hospital, Drammen, Norway
- Ole Christian Lingjærde
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital-The Norwegian Radium Hospital, Oslo, Norway
  - Department of Computer Science, University of Oslo, Oslo, Norway
  - Centre for Cancer Biomedicine, Faculty of Medicine, University of Oslo, Oslo, Norway
- Åslaug Helland
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital-The Norwegian Radium Hospital, Oslo, Norway
  - Department of Oncology, Oslo University Hospital, 4950 Nydalen Oslo, Norway
8
Qin G, Liu Z, Xie L. Multiple Omics Data Integration. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11508-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022] Open
9
Oh M, Park S, Kim S, Chae H. Machine learning-based analysis of multi-omics data on the cloud for investigating gene regulations. Brief Bioinform 2020; 22:66-76. [PMID: 32227074 DOI: 10.1093/bib/bbaa032] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/21/2019] [Revised: 02/05/2020] [Accepted: 02/25/2020] [Indexed: 02/06/2023] Open
Abstract
Gene expression is subtly regulated by quantifiable measures of genetic molecules such as interactions with other genes, methylation, mutations, transcription factors and histone modifications. Integrative analysis of multi-omics data can help scientists understand condition- or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging since it requires not only the analysis of multiple omics data sets but also mining complex relations among different genetic molecules by using state-of-the-art machine learning methods. In addition, analysis of multi-omics data requires a fairly large computing infrastructure. Moreover, interpretation of the analysis results requires collaboration among many scientists, often requiring the analysis to be repeated from different perspectives. Many of the aforementioned technical issues can be handled well when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation studies, and we categorize them according to five different goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. Then, we explain why the cloud is potentially a good solution for the analysis of multi-omics data, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues when the cloud is used for the analysis of multi-omics data for gene regulation studies.
Affiliation(s)
- Minsik Oh
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
- Sungjoon Park
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
- Sun Kim
  - Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
  - Bioinformatics Institute, Seoul National University, Seoul, 08826, Korea
- Heejoon Chae
  - Division of Computer Science, Sookmyung Women's University, Seoul, 04310, Korea
10
Mallik S, Zhao Z. Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data. Brief Bioinform 2020; 21:368-394. [PMID: 30649169 PMCID: PMC7373185 DOI: 10.1093/bib/bby120] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/27/2018] [Revised: 10/26/2018] [Accepted: 11/21/2018] [Indexed: 12/20/2022] Open
Abstract
Cancer is well recognized as a complex disease with dysregulated molecular networks or modules. Graph- and rule-based analytics have been applied extensively for cancer classification as well as prognosis using large genomic and other data over the past decade. This article provides a comprehensive review of various graph- and rule-based machine learning algorithms that have been applied to numerous genomics data to determine the cancer-specific gene modules, identify gene signature-based classifiers and carry out other related objectives of potential therapeutic value. This review focuses mainly on the methodological design and features of these algorithms to facilitate the application of these graph- and rule-based analytical approaches for cancer classification and prognosis. Based on the type of data integration, we divided all the algorithms into three categories: model-based integration, pre-processing integration and post-processing integration. Each category is further divided into four sub-categories (supervised, unsupervised, semi-supervised and survival-driven learning analyses) based on learning style. Therefore, a total of 11 categories of methods are summarized with their inputs, objectives and description, advantages and potential limitations. Next, we briefly demonstrate well-known and most recently developed algorithms for each sub-category along with salient information, such as data profiles, statistical or feature selection methods and outputs. Finally, we summarize the appropriate use and efficiency of all categories of graph- and rule mining-based learning methods when input data and specific objective are given. This review aims to help readers to select and use the appropriate algorithms for cancer classification and prognosis study.
Affiliation(s)
- Saurav Mallik
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
- Zhongming Zhao
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston
11
Prospects and challenges of multi-omics data integration in toxicology. Arch Toxicol 2020; 94:371-388. [PMID: 32034435 DOI: 10.1007/s00204-020-02656-y] [Citation(s) in RCA: 134] [Impact Index Per Article: 26.8] [Received: 11/08/2019] [Accepted: 01/29/2020] [Indexed: 12/13/2022]
Abstract
Exposure of cells or organisms to chemicals can trigger a series of effects at the regulatory pathway level, which involve changes of levels, interactions, and feedback loops of biomolecules of different types. A single-omics technique, e.g., transcriptomics, will detect biomolecules of one type and thus can only capture changes in a small subset of the biological cascade. Therefore, although applying single-omics analyses can lead to the identification of biomarkers for certain exposures, they cannot provide a systemic understanding of toxicity pathways or adverse outcome pathways. Integration of multiple omics data sets promises a substantial improvement in detecting this pathway response to a toxicant, by an increase of information as such and especially by a systemic understanding. Here, we report the findings of a thorough evaluation of the prospects and challenges of multi-omics data integration in toxicological research. We review the availability of such data, discuss options for experimental design, evaluate methods for integration and analysis of multi-omics data, discuss best practices, and identify knowledge gaps. Re-analyzing published data, we demonstrate that multi-omics data integration can considerably improve the confidence in detecting a pathway response. Finally, we argue that more data need to be generated from studies with a multi-omics-focused design, to define which omics layers contribute most to the identification of a pathway response to a toxicant.
12
Xu J, Yang P, Xue S, Sharma B, Sanchez-Martin M, Wang F, Beaty KA, Dehan E, Parikh B. Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives. Hum Genet 2019; 138:109-124. [PMID: 30671672 PMCID: PMC6373233 DOI: 10.1007/s00439-019-01970-5] [Citation(s) in RCA: 109] [Impact Index Per Article: 18.2] [Received: 09/27/2018] [Accepted: 01/02/2019] [Indexed: 02/07/2023]
Abstract
In the field of cancer genomics, the broad availability of genetic information offered by next-generation sequencing technologies and rapid growth in biomedical publication have led to the advent of the big-data era. Integration of artificial intelligence (AI) approaches such as machine learning, deep learning, and natural language processing (NLP) to tackle the challenges of scalability and high dimensionality of data and to transform big data into clinically actionable knowledge is expanding and becoming the foundation of precision medicine. In this paper, we review the current status and future directions of AI application in cancer genomics within the context of workflows to integrate genomic analysis for precision cancer care. The existing solutions of AI and their limitations in cancer genetic testing and diagnostics such as variant calling and interpretation are critically analyzed. Publicly available tools or algorithms for key NLP technologies in the literature mining for evidence-based clinical recommendations are reviewed and compared. In addition, the present paper highlights the challenges to AI adoption in digital healthcare with regard to data requirements, algorithmic transparency, reproducibility, and real-world assessment, and discusses the importance of preparing patients and physicians for modern digitized healthcare. We believe that AI will remain the main driver of healthcare transformation toward precision medicine, yet the unprecedented challenges posed should be addressed to ensure safety and beneficial impact to healthcare.
Affiliation(s)
- Jia Xu
- IBM Watson Health, Cambridge, MA, USA
- Shang Xue
- IBM Watson Health, Cambridge, MA, USA
- Fang Wang
- IBM Watson Health, Cambridge, MA, USA
13
Liu J, Liang G, Siegmund KD, Lewinger JP. Data integration by multi-tuning parameter elastic net regression. BMC Bioinformatics 2018; 19:369. [PMID: 30305021 PMCID: PMC6180486 DOI: 10.1186/s12859-018-2401-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 03/01/2018] [Accepted: 09/26/2018] [Indexed: 12/15/2022] Open
Abstract
Background: To integrate molecular features from multiple high-throughput platforms in prediction, a regression model that penalizes features from all platforms equally is commonly used. However, data from different platforms are likely to differ in effect sizes, the proportion of predictive features, and correlation structures. Subtle but important features may be missed by shrinking all features equally. Results: We propose an elastic net (EN) model with separate tuning parameter penalties for each platform that is fit using standard software. In a comprehensive simulation study, we evaluated the performance of EN logistic regression with multiple tuning penalties. We found that when the number of informative features differs among the platforms, and when there is no notable correlation between the features from different platforms, the multi-tuning parameter EN yields more predictive models. Moreover, the multi-tuning parameter EN is robust, in the sense that there is no loss of predictivity relative to a single tuning parameter EN when features across all platforms have similar effects. We also investigated the performance of multi-tuning parameter EN using real cancer datasets. Conclusion: The proposed multi-tuning parameter EN model, fit using standard penalized regression software, can achieve better prediction in sample classification when integrating multiple genomic platforms, compared to the traditional method where a single penalty parameter is used for all features in different platforms.
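The idea of giving each platform its own penalty can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear (Gaussian) outcome instead of the logistic model evaluated in the paper, assumes centred features with no intercept, and uses a plain coordinate-descent solver. The function name, the `groups` and `lambdas` arguments, and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def multi_penalty_elastic_net(X, y, groups, lambdas, alpha=0.5, n_sweeps=200):
    """Coordinate descent for a linear elastic net in which feature j is
    penalised with lambdas[groups[j]], i.e. one tuning parameter per platform.

    Objective (assumed for this sketch):
        (1 / 2n) * ||y - X b||^2
        + sum_j lambda_g(j) * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2)
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            lam = lambdas[groups[j]]
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                partial_fit = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial_fit)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return beta


# Toy data: feature 0 belongs to platform "A", feature 1 to platform "B".
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [4, 0, 0, -4]  # generated as 2 * feature0 + 2 * feature1
beta = multi_penalty_elastic_net(
    X, y, groups=["A", "B"], lambdas={"A": 0.1, "B": 5.0}
)
```

With these toy settings the heavily penalised platform "B" feature is shrunk to zero while the lightly penalised platform "A" feature is retained, even though both features carry the same signal. In practice the per-platform values would be chosen by cross-validation over a grid.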
Affiliation(s)
- Jie Liu
- Department of Preventive Medicine, USC Keck School of Medicine, 2001 N Soto Street, Los Angeles, CA, 90089, USA
- Gangning Liang
- USC Institute of Urology and the Catherine & Joseph Aresty Department of Urology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90089, USA
- Kimberly D Siegmund
- Department of Preventive Medicine, USC Keck School of Medicine, 2001 N Soto Street, Los Angeles, CA, 90089, USA
- Juan Pablo Lewinger
- Department of Preventive Medicine, USC Keck School of Medicine, 2001 N Soto Street, Los Angeles, CA, 90089, USA
14
Athreya A, Iyer R, Neavin D, Wang L, Weinshilboum R, Kaddurah-Daouk R, Rush J, Frye M, Bobo W. Augmentation of Physician Assessments with Multi-Omics Enhances Predictability of Drug Response: A Case Study of Major Depressive Disorder. IEEE COMPUT INTELL M 2018; 13:20-31. [PMID: 30467458 DOI: 10.1109/mci.2018.2840660] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 01/05/2023]
Abstract
This work proposes a "learning-augmented clinical assessment" workflow to sequentially augment physician assessments of patients' symptoms and their socio-demographic measures with heterogeneous biological measures to accurately predict treatment outcomes using machine learning. Across many psychiatric illnesses, ranging from major depressive disorder to schizophrenia, symptom severity assessments are subjective and do not include biological measures, making predictability in eventual treatment outcomes a challenge. Using data from the Mayo Clinic PGRN-AMPS SSRI trial as a case study, this work demonstrates a significant improvement in the prediction accuracy for antidepressant treatment outcomes in patients with major depressive disorder from 35% to 80% individualized by patient, compared to using only a physician's assessment as the predictors. This improvement is achieved through an iterative overlay of biological measures, starting with metabolites (blood measures modulated by drug action) associated with symptom severity, and then adding in genes associated with metabolomic concentrations. Hence, therapeutic efficacy for a new patient can be assessed prior to treatment, using prediction models that take as inputs, selected biological measures and physician's assessments of depression severity. Of broader significance extending beyond psychiatry, the approach presented in this work can potentially be applied to predicting treatment outcomes for other medical conditions, such as migraine headaches or rheumatoid arthritis, for which patients are treated according to subject-reported assessments of symptom severity.
Affiliation(s)
- Arjun Athreya
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, IL, USA
- Ravishankar Iyer
- Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, IL, USA
- Drew Neavin
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA
- Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA
- Richard Weinshilboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, MN, USA
- John Rush
- Department of Psychiatry and Behavioral Sciences, Duke University, NC, USA
- Mark Frye
- Department of Psychiatry and Psychology, Mayo Clinic, MN, USA
- William Bobo
- Department of Psychiatry and Psychology, Mayo Clinic, FL, USA
15
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 249] [Impact Index Per Article: 35.6] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches to analyze biological samples such as genomics, transcriptomics, proteomics, and metabolomics, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into biologically meaningful context challenging. Variously named as integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is towards the holistic realization of a 'systems biology' understanding of the biological question in hand. Commonly used approaches in these efforts are currently limited by the 3 i's - integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and increasing types of omics datasets generated such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
16
Li C, Lee J, Ding J, Sun S. Integrative analysis of gene expression and methylation data for breast cancer cell lines. BioData Min 2018; 11:13. [PMID: 29983747 PMCID: PMC6019806 DOI: 10.1186/s13040-018-0174-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/20/2018] [Accepted: 06/13/2018] [Indexed: 12/11/2022] Open
Abstract
Background: The deadly costs of cancer and the necessity for an accurate method of early cancer detection have demanded the identification of genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore and understand the complex relationship between methylation and gene expression. Results: Through linear modeling and analysis of variance, we obtain genes that show a significant correlation between methylation and gene expression. We then examine the functions and relationships of these genes using bioinformatic tools and databases. In particular, using ConsensusPathDB, we analyze the networks of statistically significant genes to identify hub genes, genes with a large number of links to other genes. We identify eight major hub genes, all in strong association with cancer susceptibility. Through further analysis of the function, gene expression level, and methylation level of these hub genes, we conclude that they are novel potential biomarkers for breast cancer. Conclusions: Our findings have various implications for cancer screening, early detection methods, and potential novel treatments for cancer. Researchers can also use our results to develop more effective methods for cancer study.
Affiliation(s)
- Juyon Lee
- Korea International School Pangyo Campus, Seongnam, South Korea
- Jessica Ding
- Liberal Arts and Science Academy, Austin, Texas, USA
- Shuying Sun
- Department of Mathematics, Texas State University, San Marcos, TX, USA
17
Noell G, Faner R, Agustí A. From systems biology to P4 medicine: applications in respiratory medicine. Eur Respir Rev 2018; 27:27/147/170110. [PMID: 29436404 PMCID: PMC9489012 DOI: 10.1183/16000617.0110-2017] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 09/28/2017] [Accepted: 11/30/2017] [Indexed: 12/22/2022] Open
Abstract
Human health and disease are emergent properties of a complex, nonlinear, dynamic multilevel biological system: the human body. Systems biology is a comprehensive research strategy that has the potential to understand these emergent properties holistically. It stems from advancements in medical diagnostics, “omics” data and bioinformatic computing power. It paves the way forward towards “P4 medicine” (predictive, preventive, personalised and participatory), which seeks to better intervene preventively to preserve health or therapeutically to cure diseases. In this review, we: 1) discuss the principles of systems biology; 2) elaborate on how P4 medicine has the potential to shift healthcare from reactive medicine (treatment of illness) to predict and prevent illness, in a revolution that will be personalised in nature, probabilistic in essence and participatory driven; 3) review the current state of the art of network (systems) medicine in three prevalent respiratory diseases (chronic obstructive pulmonary disease, asthma and lung cancer); and 4) outline current challenges and future goals in the field. Systems biology and network medicine have the potential to transform medical research and practice. http://ow.ly/r3jR30hf35x
Affiliation(s)
- Guillaume Noell
- Institut d'Investigacions Biomediques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; CIBER Enfermedades Respiratorias (CIBERES), Barcelona, Spain
- Rosa Faner
- Institut d'Investigacions Biomediques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; CIBER Enfermedades Respiratorias (CIBERES), Barcelona, Spain
- Alvar Agustí
- Institut d'Investigacions Biomediques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; CIBER Enfermedades Respiratorias (CIBERES), Barcelona, Spain; Respiratory Institute, Hospital Clinic, Universitat de Barcelona, Barcelona, Spain
18
Huang S, Chaudhary K, Garmire LX. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front Genet 2017; 8:84. [PMID: 28670325 PMCID: PMC5472696 DOI: 10.3389/fgene.2017.00084] [Citation(s) in RCA: 407] [Impact Index Per Article: 50.9] [Received: 03/24/2017] [Accepted: 06/01/2017] [Indexed: 01/20/2023] Open
Abstract
Multi-omics data integration is one of the major challenges in the era of precision medicine. Considerable work has been done with the advent of high-throughput studies, which have enabled data access for downstream analyses. To improve clinical outcome prediction, a gamut of software tools has been developed. This review outlines the progress made in the field of multi-omics integration and the comprehensive tools developed so far in this field. Further, we discuss the integration methods to predict patient survival at the end of the review.
Affiliation(s)
- Sijia Huang
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, United States
- Kumardeep Chaudhary
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States
- Lana X Garmire
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States; Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, United States; Department of Obstetrics, Gynecology, and Women's Health, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, United States
19
Ren W, Li W, Wang D, Hu S, Suo J, Ying X. Combining multi-dimensional data to identify key genes and pathways in gastric cancer. PeerJ 2017; 5:e3385. [PMID: 28603669 PMCID: PMC5463969 DOI: 10.7717/peerj.3385] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/19/2017] [Accepted: 05/06/2017] [Indexed: 12/22/2022] Open
Abstract
Gastric cancer is an aggressive cancer that is often diagnosed late. Early detection and treatment require a better understanding of the molecular pathology of the disease. The present study combined data on gene expression and regulatory levels (microRNA, methylation, copy number) with the aim of identifying key genes and pathways for gastric cancer. Data used in this study were retrieved from The Cancer Genome Atlas. Differential analyses between gastric cancer and normal tissues were carried out using Limma. Copy number alterations were identified for tumor samples. Bimodal filtering of differentially expressed genes (DEGs) based on regulatory changes was performed to identify candidate genes. Protein–protein interaction networks for candidate genes were generated by Cytoscape software. Gene ontology and pathway analyses were performed, and a disease-associated network was constructed using the Agilent literature search plugin on Cytoscape. In total, we identified 3602 DEGs, 251 differentially expressed microRNAs, 604 differential methylation sites, and 52 copy number altered regions. Three groups of candidate genes controlled by different regulatory mechanisms were screened out. Interaction networks for candidate genes were constructed consisting of 415, 228, and 233 genes, respectively, all of which were enriched in cell cycle, P53 signaling, DNA replication, viral carcinogenesis, HTLV-1 infection, and progesterone-mediated oocyte maturation pathways. Nine hub genes (SRC, KAT2B, NR3C1, CDK6, MCM2, PRKDC, BLM, CCNE1, PARK2) were identified that were presumed to be key regulators of the networks; seven of these were shown to be implicated in gastric cancer through disease-associated network construction. The genes and pathways identified in our study may play pivotal roles in gastric carcinogenesis and have clinical significance.
Affiliation(s)
- Wu Ren
- Department of Gastrointestinal Surgery, The First Hospital of Jilin University, Changchun, China; Beijing Institute of Basic Medical Sciences, Beijing, China
- Wei Li
- Department of Gastrointestinal Surgery, The First Hospital of Jilin University, Changchun, China
- Daguang Wang
- Department of Gastrointestinal Surgery, The First Hospital of Jilin University, Changchun, China
- Shuofeng Hu
- Beijing Institute of Basic Medical Sciences, Beijing, China
- Jian Suo
- Department of Gastrointestinal Surgery, The First Hospital of Jilin University, Changchun, China
- Xiaomin Ying
- Beijing Institute of Basic Medical Sciences, Beijing, China
20
Bontha SV, Maluf DG, Mueller TF, Mas VR. Systems Biology in Kidney Transplantation: The Application of Multi-Omics to a Complex Model. Am J Transplant 2017; 17:11-21. [PMID: 27214826 DOI: 10.1111/ajt.13881] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/04/2016] [Revised: 04/15/2016] [Accepted: 05/12/2016] [Indexed: 01/25/2023]
Abstract
In spite of reduction of rejection rates and improvement in short-term survival post-kidney transplantation, modest progress has occurred in long-term graft attrition over the years. Timely identification of molecular events that precede clinical and histopathological changes might help in early intervention and thereby increase the graft half-life. Evolution of "omics" tools has enabled systemic investigation of the influence of the whole genome, epigenome, transcriptome, proteome and microbiome on transplant function and survival. In this omics era, systemic approaches, in-depth clinical phenotyping and use of strict validation methods are the key for further understanding the complex mechanisms associated with graft function. Systems biology is an interdisciplinary holistic approach that focuses on complex and dynamic interactions within biological systems. The complexity of the human kidney transplant is unlikely to be captured by a reductionist approach. It appears essential to integrate multi-omics data that can elucidate the multidimensional and multilayered regulation of the underlying heterogeneous and complex kidney transplant model. Herein, we discuss studies that focus on genetic biomarkers, emerging technologies and systems biology approaches, which should increase the ability to discover biomarkers, understand mechanisms and stratify patients and responses post-kidney transplantation.
Affiliation(s)
- S V Bontha
- Translational Genomics Transplant Laboratory, Division of Transplant, Department of Surgery, University of Virginia, Charlottesville, VA
- D G Maluf
- Translational Genomics Transplant Laboratory, Division of Transplant, Department of Surgery, University of Virginia, Charlottesville, VA
- T F Mueller
- Division of Nephrology, University Hospital, Zürich, Switzerland
- V R Mas
- Translational Genomics Transplant Laboratory, Division of Transplant, Department of Surgery, University of Virginia, Charlottesville, VA
21
An X, Hu J, Do KA. SIFORM: shared informative factor models for integration of multi-platform bioinformatic data. Bioinformatics 2016; 32:3279-3290. [PMID: 27381342 DOI: 10.1093/bioinformatics/btw295] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/09/2015] [Accepted: 04/28/2016] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION High-dimensional omic data derived from different technological platforms have been extensively used to facilitate comprehensive understanding of disease mechanisms and to determine personalized health treatments. Numerous studies have integrated multi-platform omic data; however, few have efficiently and simultaneously addressed the problems that arise from high dimensionality and complex correlations. RESULTS We propose a statistical framework of shared informative factor models that can jointly analyze multi-platform omic data and explore their associations with a disease phenotype. The common disease-associated sample characteristics across different data types can be captured through the shared structure space, while the corresponding weights of genetic variables directly index the strengths of their association with the phenotype. Extensive simulation studies demonstrate the performance of the proposed method in terms of biomarker detection accuracy via comparisons with three popular regularized regression methods. We also apply the proposed method to The Cancer Genome Atlas lung adenocarcinoma dataset to jointly explore associations of mRNA expression and protein expression with smoking status. Many of the identified biomarkers belong to key pathways for lung tumorigenesis, some of which are known to show differential expression across smoking levels. We discover potential biomarkers that reveal different mechanisms of lung tumorigenesis between light smokers and heavy smokers. AVAILABILITY AND IMPLEMENTATION R code to implement the new method can be downloaded from http://odin.mdacc.tmc.edu/jhhu/ CONTACT: jhu@mdanderson.org.
Affiliation(s)
- Xuebei An
- Department of Biostatistics, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Jianhua Hu
- Department of Biostatistics, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Kim-Anh Do
- Department of Biostatistics, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
22
Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 2016; 17 Suppl 2:15. [PMID: 26821531 PMCID: PMC4959355 DOI: 10.1186/s12859-015-0857-9] [Citation(s) in RCA: 246] [Impact Index Per Article: 27.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Methods for the integrative analysis of multi-omics data are required to draw a more complete and accurate picture of the dynamics of molecular systems. The complexity of biological systems, the technological limits, the large number of biological variables and the relatively low number of biological samples make the analysis of multi-omics datasets a non-trivial problem. RESULTS AND CONCLUSIONS We review the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects.
Affiliation(s)
- Matteo Bersanelli
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy; Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
- Ettore Mosca
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
- Daniel Remondini
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Enrico Giampieri
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Claudia Sala
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Gastone Castellani
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Luciano Milanesi
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
23
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. TRANSLATIONAL BIOINFORMATICS 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
24
Liu Z, Li W, Lv J, Xie R, Huang H, Li Y, He Y, Jiang J, Chen B, Guo S, Chen L. Identification of potential COPD genes based on multi-omics data at the functional level. MOLECULAR BIOSYSTEMS 2016; 12:191-204. [PMID: 26575263 DOI: 10.1039/c5mb00577a] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 04/17/2025]
Abstract
Chronic obstructive pulmonary disease (COPD) is a complex disease, which involves dysfunctions in multi-omics. The changes in biological processes, such as adhesion junction, signaling transduction, transcriptional regulation, and cell proliferation, will lead to the occurrence of COPD. A novel systematic approach MMMG (Methylation-MicroRNA-MRNA-GO) was proposed to identify potential COPD genes by integrating function information with a methylation profile, a microRNA expression profile and an mRNA expression profile. 8 co-functional classes and 102 potential COPD genes were identified. These genes displayed a high performance in classifying COPD patients and normal samples, revealed COPD-related pathways, and have been confirmed to be associated with COPD by Matthews correlation coefficient (MCC)-values, literature, an independent data set, and pathways. The MMMG method that analyzed multi-omics data at the functional level could effectively identify potential COPD genes. These potential COPD genes would provide in-depth insights into understanding the complexity of COPD genome landscapes, improve the early diagnostics, and guide new efforts to develop therapeutics in the future.
Affiliation(s)
- Zhe Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
25
An N, Yang X, Cheng S, Wang G, Zhang K. Developmental genes significantly afflicted by aberrant promoter methylation and somatic mutation predict overall survival of late-stage colorectal cancer. Sci Rep 2015; 5:18616. [PMID: 26691761 PMCID: PMC4686889 DOI: 10.1038/srep18616] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/06/2015] [Accepted: 11/19/2015] [Indexed: 02/07/2023] Open
Abstract
Carcinogenesis is an exceedingly complicated process, which involves multi-level dysregulations, including genomics (mainly caused by somatic mutation and copy number variation), DNA methylomics, and transcriptomics. Therefore, only looking into one molecular level of cancer is not sufficient to uncover the intricate underlying mechanisms. With the abundant resources of publicly available data in the Cancer Genome Atlas (TCGA) database, an integrative strategy was conducted to systematically analyze the aberrant patterns of colorectal cancer on the basis of DNA copy number, promoter methylation, somatic mutation and gene expression. In this study, paired samples in each genomic level were retrieved to identify differentially expressed genes with corresponding genetic or epigenetic dysregulations. Notably, the result of gene ontology enrichment analysis indicated that the differentially expressed genes with corresponding aberrant promoter methylation or somatic mutation were both functionally concentrated upon developmental process, suggesting the intimate association between development and carcinogenesis. Thus, by means of random walk with restart, 37 significant development-related genes were retrieved from an a priori knowledge-based biological network. In five independent microarray datasets, Kaplan-Meier survival and Cox regression analyses both confirmed that the expression of these genes was significantly associated with overall survival of Stage III/IV colorectal cancer patients.
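Random walk with restart, the network-propagation step named in this abstract, can be sketched generically as follows. This is an illustration under stated assumptions, not the authors' pipeline: the graph is an unweighted, undirected adjacency dictionary, the restart probability of 0.3 is an arbitrary choice, and the node labels `a` to `d` are hypothetical placeholders for genes.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency: dict mapping every node to a list of its neighbours.
    seeds: set of nodes where the walker restarts (uniform restart vector p0).
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adjacency)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for v in nodes:
                    nxt[v] += (1.0 - restart) * p[u] * p0[v]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-node path network a - b - c - d with a single seed "a".
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(graph, seeds={"a"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes are then ranked by their steady-state score; on this toy path the score decreases with distance from the seed, which is the proximity notion such methods use to prioritise candidate genes.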
Affiliation(s)
- Ning An
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
- Xue Yang
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
- Shujun Cheng
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
- Guiqi Wang
- Department of Endoscopy, Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, 100021, China
- Kaitai Zhang
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, Peking Union Medical College & Cancer Institute (Hospital), Chinese Academy of Medical Sciences, Beijing, 100021, China
26
Lu TP, Hsiao CK, Lai LC, Tsai MH, Hsu CP, Lee JM, Chuang EY. Identification of regulatory SNPs associated with genetic modifications in lung adenocarcinoma. BMC Res Notes 2015; 8:92. [PMID: 25889623 PMCID: PMC4384239 DOI: 10.1186/s13104-015-1053-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 12/02/2014] [Accepted: 03/11/2015] [Indexed: 11/28/2022] Open
Abstract
Background: Although much research effort has been devoted to elucidating lung cancer, the molecular mechanism of tumorigenesis still remains unclear. A major challenge to improve the understanding of lung cancer is the difficulty of identifying reproducible differentially expressed genes across independent studies, due to their low consistency. To enhance the reproducibility of the findings, an integrated analysis was performed to identify regulatory SNPs. Thirty-two pairs of tumor and adjacent normal lung tissue specimens were analyzed using Affymetrix U133plus2.0, Affymetrix SNP 6.0, and Illumina Infinium Methylation microarrays. Copy number variations (CNVs) and methylation alterations were analyzed and paired t-tests were used to identify differentially expressed genes. Results: A total of 505 differentially expressed genes were identified, and their dysregulated patterns moderately correlated with CNVs and methylation alterations based on the hierarchical clustering analysis. Subsequently, three statistical approaches were performed to explore regulatory SNPs, which revealed that the genotypes of 551 and 66 SNPs were associated with CNV and changes in methylation, respectively. Among them, downstream transcriptional dysregulation was observed in 9 SNPs for CNVs and 4 SNPs for methylation alterations. Conclusions: In summary, these identified SNPs concurrently showed the same direction of gene expression changes with genetic modifications, suggesting their pivotal roles in the genome for non-smoking women with lung adenocarcinoma.
Affiliation(s)
- Tzu-Pin Lu
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan.
- Chuhsing K Hsiao
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan.
  - Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, Taiwan.
- Liang-Chuan Lai
  - Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, Taiwan.
  - Graduate Institute of Physiology, National Taiwan University, Taipei, Taiwan.
- Mong-Hsun Tsai
  - Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, Taiwan.
  - Institute of Biotechnology, National Taiwan University, Taipei, Taiwan.
- Chung-Ping Hsu
  - Division of Thoracic Surgery, Taichung Veterans General Hospital, Taichung, Taiwan.
- Jang-Ming Lee
  - Department of Surgery, National Taiwan University Hospital, Taipei, Taiwan.
- Eric Y Chuang
  - Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, Taiwan.
  - Graduate Institute of Biomedical Electronics and Bioinformatics and Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan.
|
27
|
Jiang N, Wang L, Chen J, Wang L, Leach L, Luo Z. Conserved and divergent patterns of DNA methylation in higher vertebrates. Genome Biol Evol 2014; 6:2998-3014. [PMID: 25355807 PMCID: PMC4255770 DOI: 10.1093/gbe/evu238] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Accepted: 10/20/2014] [Indexed: 02/07/2023] Open
Abstract
DNA methylation plays a fundamental role in the regulation of gene expression and is widespread in the genomes of eukaryotic species. For example, in higher vertebrates, there is a "global" methylation pattern involving complete methylation of CpG sites genome-wide, except in promoter regions that are typically enriched for CpG dinucleotides, the so-called "CpG islands." Here, we comprehensively examined and compared the distribution of CpG sites within ten model eukaryotic species and linked the observed patterns to the role of DNA methylation in controlling gene transcription. The analysis revealed two distinct but conserved methylation patterns for gene promoters in human and mouse genomes, involving genes with distinct distributions of promoter CpGs and gene expression patterns. Comparative analysis with four other higher vertebrates revealed that the primary regulatory role of the DNA methylation system is highly conserved in higher vertebrates.
Affiliation(s)
- Ning Jiang
  - Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT United Kingdom
- Lin Wang
  - Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
- Jing Chen
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT United Kingdom
- Luwen Wang
  - Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
- Lindsey Leach
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT United Kingdom
- Zewei Luo
  - Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT United Kingdom
|
28
|
Wang L, Xiao Y, Ping Y, Li J, Zhao H, Li F, Hu J, Zhang H, Deng Y, Tian J, Li X. Integrating multi-omics for uncovering the architecture of cross-talking pathways in breast cancer. PLoS One 2014; 9:e104282. [PMID: 25137136 PMCID: PMC4138095 DOI: 10.1371/journal.pone.0104282] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 02/17/2014] [Accepted: 07/07/2014] [Indexed: 12/21/2022] Open
Abstract
Cross-talk among abnormal pathways widely occurs in human cancer and generally leads to insensitivity to cancer treatment. Moreover, alterations in the abnormal pathways are not limited to a single molecular level. Therefore, we proposed a strategy that integrates a large number of biological sources at multiple levels for systematic identification of cross-talk among risk pathways in cancer, using a random walk on a protein interaction network. We applied the method to multi-omics breast cancer data from The Cancer Genome Atlas (TCGA), including somatic mutation, DNA copy number, DNA methylation and gene expression profiles. We identified close cross-talk among many known cancer-related pathways with complex change patterns. Furthermore, we identified key genes (linkers) bridging these cross-talks and showed that these genes carried out biological functions consistent with those of the linked cross-talking pathways. Through identification of leader genes in each pathway, the architecture of cross-talking pathways was built. Notably, we observed that linkers cooperated with leaders to form the foundation of cross-talk among pathways that play core roles in the deterioration of breast cancer. As an example, we observed that KRAS showed a direct connection to numerous cancer-related pathways, such as the MAPK signaling pathway, suggesting that it may be a central communication hub. In summary, we offer an effective way to characterize complex cross-talk among disease pathways, which can be applied to other diseases and provide useful information for the treatment of cancer.
Affiliation(s)
- Li Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yun Xiao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yanyan Ping
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jing Li
  - Department of Ultrasonic medicine, The 1st Affiliated Hospital of Heilongjiang University of Chinese Medicine, Harbin, China
- Hongying Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Feng Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jing Hu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongyi Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yulan Deng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiawei Tian
  - Department of Ultrasonic medicine, The 2nd Affiliated Hospital of Harbin Medical University, Harbin, China
  - * E-mail: (JT); (XL)
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
  - * E-mail: (JT); (XL)
|
29
|
Li Y, Vongsangnak W, Chen L, Shen B. Integrative analysis reveals disease-associated genes and biomarkers for prostate cancer progression. BMC Med Genomics 2014; 7 Suppl 1:S3. [PMID: 25080090 PMCID: PMC4110715 DOI: 10.1186/1755-8794-7-s1-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022] Open
|
30
|
Reconstructing targetable pathways in lung cancer by integrating diverse omics data. Nat Commun 2014; 4:2617. [PMID: 24135919 DOI: 10.1038/ncomms3617] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 02/19/2013] [Accepted: 09/16/2013] [Indexed: 01/04/2023] Open
Abstract
Global 'multi-omics' profiling of cancer cells harbours the potential for characterizing the signalling networks associated with specific oncogenes. Here we profile the transcriptome, proteome and phosphoproteome in a panel of non-small cell lung cancer (NSCLC) cell lines in order to reconstruct targetable networks associated with KRAS dependency. We develop a two-step bioinformatics strategy addressing the challenge of integrating these disparate data sets. We first define an 'abundance-score' combining transcript, protein and phospho-protein abundances to nominate differentially abundant proteins and then use the Prize Collecting Steiner Tree algorithm to identify functional sub-networks. We identify three modules centred on KRAS and MET, LCK and PAK1 and β-Catenin. We validate activation of these proteins in KRAS-dependent (KRAS-Dep) cells and perform functional studies defining LCK as a critical gene for cell proliferation in KRAS-Dep but not KRAS-independent NSCLCs. These results suggest that LCK is a potential druggable target protein in KRAS-Dep lung cancers.
|
31
|
Kristensen VN, Lingjærde OC, Russnes HG, Vollan HKM, Frigessi A, Børresen-Dale AL. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer 2014; 14:299-313. [PMID: 24759209 DOI: 10.1038/nrc3721] [Citation(s) in RCA: 249] [Impact Index Per Article: 22.6] [Indexed: 12/15/2022]
Abstract
Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.
Affiliation(s)
- Vessela N Kristensen
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway.
  - K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway.
  - Department of Clinical Molecular Oncology, Division of Medicine, Akershus University Hospital, 1478 Ahus, Norway
- Ole Christian Lingjærde
  - K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway.
  - Division for Biomedical Informatics, Department of Computer Science, University of Oslo, 0316 Oslo, Norway
- Hege G Russnes
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway.
  - K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway.
  - Department of Pathology, Oslo University Hospital, 0450 Oslo, Norway
- Hans Kristian M Vollan
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway.
  - K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway.
  - Department of Oncology, Division of Cancer, Surgery and Transplantation, Oslo University Hospital, 0450 Oslo, Norway
- Arnoldo Frigessi
  - Statistics for Innovation, Norwegian Computing Center, 0314 Oslo, Norway.
  - Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, PO Box 1122 Blindern, 0317 Oslo, Norway
- Anne-Lise Børresen-Dale
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway.
  - K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway
|
32
|
Identification of genes with consistent methylation levels across different human tissues. Sci Rep 2014; 4:4351. [PMID: 24619003 PMCID: PMC3950633 DOI: 10.1038/srep04351] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/19/2013] [Accepted: 02/17/2014] [Indexed: 02/02/2023] Open
Abstract
DNA methylation plays an important role in regulating cell growth and disease development. Methylation profiles are examined by bisulfite conversion; however, the lack of markers for bisulfite conversion efficiency and appropriate internal control genes remains a major challenge. To address these issues, we utilized two bioinformatics approaches, coefficients of variances and resampling tests, to identify probes showing stable methylation levels from several independent microarray datasets. Mass spectrometry validated the consistently high methylation levels of the five probes (N4BP2, EGFL8, CTRB1, TSPAN3, and ZNF690) in 13 human tissue types from 24 cell lines. Linear associations between detected methylation levels and methyl concentrations of DNA samples were further demonstrated in three genes (N4BP2, EGFL8, and CTRB1). To summarize, we identified five genes which may serve as internal controls for methylation studies by analyzing large-scale microarray data, and three of them can be used as markers for evaluating the efficiency of bisulfite conversion.
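A coefficient-of-variation screen of the kind named in this abstract might be sketched as follows. This is an illustration only: the probe names, beta values, and the 0.05 cutoff are invented, and the resampling test that the abstract also mentions is not reproduced.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


def stable_probes(beta_by_probe, max_cv=0.05):
    """Return probes whose CV across samples is at most max_cv, most stable first."""
    cvs = {probe: coefficient_of_variation(vals) for probe, vals in beta_by_probe.items()}
    return sorted((p for p, cv in cvs.items() if cv <= max_cv), key=cvs.get)


# Hypothetical methylation beta values: one list per probe, one value per sample.
example = {
    "probe_A": [0.91, 0.93, 0.92, 0.90],
    "probe_B": [0.20, 0.65, 0.40, 0.85],
    "probe_C": [0.88, 0.87, 0.89, 0.88],
}
print(stable_probes(example))
```

Dividing by the mean makes the spread comparable between probes with different average methylation, which is the usual reason to prefer the CV over the raw standard deviation in this kind of screen.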
|
33
|
Sohn KA, Kim D, Lim J, Kim JH. Relative impact of multi-layered genomic data on gene expression phenotypes in serous ovarian tumors. BMC Syst Biol 2013; 7 Suppl 6:S9. [PMID: 24521303 PMCID: PMC3906601 DOI: 10.1186/1752-0509-7-s6-s9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
Background: The emerging multiple layers of genomic data have provided unprecedented opportunities for cancer research, especially for studying the association between gene expression and other types of genomic features. No previous approach, however, provides an adequate statistical framework for, or a global analysis of, the relative impact of different genomic feature layers on gene expression phenotypes. Methods: We propose an integrative statistical framework based on a sparse regression to model the impact of multi-layered genomic features on gene expression traits. The proposed approach can be regarded as an integrative expression Quantitative Trait Loci approach in which not only the genetic variations of SNPs or copy number variations but also other features at both the genomic and epigenomic levels are used to explain the expression of genes. To highlight the validity of the proposed approach, the TCGA ovarian cancer dataset was analysed as a pilot task. Results: The analysis shows that our integrative approach has consistently superior power in predicting gene expression levels compared with each single data type-based analysis. Moreover, the proposed method has the advantage of producing a substantially reduced number of spurious associations. We provide an interesting characterization of genes in terms of their genomic association patterns. Important genomic features reported in previous ovarian cancer research are successfully identified as major hubs in the resulting association network between heterogeneous types of genomic features and genes. Conclusions: In this paper, we model the gene expression phenotypes with respect to multiple different types of genomic data in an integrative framework. Our analysis reveals the global view on the relative contribution of different genomic feature types to gene expression phenotypes in ovarian cancer.
|
34
|
Day TK, Bianco-Miotto T. Common gene pathways and families altered by DNA methylation in breast and prostate cancers. Endocr Relat Cancer 2013; 20:R215-32. [PMID: 23818572 DOI: 10.1530/erc-13-0204] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
Abstract
Epigenetic modifications, such as DNA methylation, are widely studied in cancer as they are stable and easy to measure genome wide. DNA methylation changes have been used to differentiate benign from malignant tissue and to predict tumor recurrence or patient outcome. Multiple genome wide DNA methylation studies in breast and prostate cancers have identified genes that are differentially methylated in malignant tissue compared with non-malignant tissue or in association with hormone receptor status or tumor recurrence. Although this has identified potential biomarkers for diagnosis and prognosis, what is highlighted by reviewing these studies is the similarities between breast and prostate cancers. In particular, the gene families/pathways targeted by DNA methylation in breast and prostate cancers have significant overlap and include homeobox genes, zinc finger transcription factors, S100 calcium binding proteins, and potassium voltage-gated family members. Many of the gene pathways targeted by aberrant methylation in breast and prostate cancers are not targeted in other cancers, suggesting that some of these targets may be specific to hormonal cancers. Genome wide DNA methylation profiles in breast and prostate cancers will not only define more specific and sensitive biomarkers for cancer diagnosis and prognosis but also identify novel therapeutic targets, which may be direct targets of agents that reverse DNA methylation or which may target novel gene families that are themselves DNA methylation targets.
Affiliation(s)
- Tanya K Day
  - Dame Roma Mitchell Cancer Research Laboratories, Discipline of Medicine, Hanson Institute, Adelaide Prostate Cancer Research Centre, The University of Adelaide, South Australia, Australia
|
35
|
Aure MR, Steinfeld I, Baumbusch LO, Liestøl K, Lipson D, Nyberg S, Naume B, Sahlberg KK, Kristensen VN, Børresen-Dale AL, Lingjærde OC, Yakhini Z. Identifying in-trans process associated genes in breast cancer by integrated analysis of copy number and expression data. PLoS One 2013; 8:e53014. [PMID: 23382830 PMCID: PMC3559658 DOI: 10.1371/journal.pone.0053014] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 06/11/2012] [Accepted: 11/22/2012] [Indexed: 12/12/2022] Open
Abstract
Genomic copy number alterations are common in cancer. Finding the genes causally implicated in oncogenesis is challenging because the gain or loss of a chromosomal region may affect a few key driver genes and many passengers. Integrative analyses have opened new vistas for addressing this issue. One approach is to identify genes with frequent copy number alterations and corresponding changes in expression. Several methods also analyse effects of transcriptional changes on known pathways. Here, we propose a method that analyses in-cis correlated genes for evidence of in-trans association to biological processes, with no bias towards processes of a particular type or function. The method aims to identify cis-regulated genes for which the expression correlation to other genes provides further evidence of a network-perturbing role in cancer. The proposed unsupervised approach involves a sequence of statistical tests to systematically narrow down the list of relevant genes, based on integrative analysis of copy number and gene expression data. A novel adjustment method handles confounding effects of co-occurring copy number aberrations, potentially a large source of false positives in such studies. Applying the method to whole-genome copy number and expression data from 100 primary breast carcinomas, 6373 genes were identified as commonly aberrant, 578 were highly in-cis correlated, and 56 were in addition associated in-trans to biological processes. Among these in-trans process associated and cis-correlated (iPAC) genes, 28% have previously been reported as breast cancer associated, and 64% as cancer associated. By combining statistical evidence from three separate subanalyses that focus respectively on copy number, gene expression and the combination of the two, the proposed method identifies several known and novel cancer driver candidates. Validation in an independent data set supports the conclusion that the method identifies genes implicated in cancer.
Affiliation(s)
- Miriam Ragle Aure
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
  - K. G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
- Israel Steinfeld
  - Laboratory of Computational Biology, Computer Science Department, Israel Institute of Technology, Haifa, Israel
- Lars Oliver Baumbusch
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
- Knut Liestøl
  - Biomedical Informatics Lab, Department of Computer Science, University of Oslo, Oslo, Norway
  - Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
- Doron Lipson
  - Laboratory of Computational Biology, Computer Science Department, Israel Institute of Technology, Haifa, Israel
- Sandra Nyberg
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
  - K. G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
- Bjørn Naume
  - Division of Cancer Medicine and Radiotherapy, Department of Oncology, Oslo University Hospital Radiumhospitalet, Oslo, Norway
- Kristine Kleivi Sahlberg
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
  - K. G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
- Vessela N. Kristensen
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
  - K. G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
  - Institute for Clinical Epidemiology and Molecular Biology (EpiGen) Akershus University Hospital, Akershus, Norway
- Anne-Lise Børresen-Dale
  - Department of Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, Oslo, Norway
  - K. G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
- Ole Christian Lingjærde
  - K. G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, University of Oslo, Oslo, Norway
  - Biomedical Informatics Lab, Department of Computer Science, University of Oslo, Oslo, Norway
  - Centre for Cancer Biomedicine, University of Oslo, Oslo, Norway
  - * E-mail: (OCL); (ZY)
- Zohar Yakhini
  - Laboratory of Computational Biology, Computer Science Department, Israel Institute of Technology, Haifa, Israel
  - Agilent Laboratories, Tel Aviv, Israel
  - * E-mail: (OCL); (ZY)
|
36
|
Kresse SH, Rydbeck H, Skårn M, Namløs HM, Barragan-Polania AH, Cleton-Jansen AM, Serra M, Liestøl K, Hogendoorn PCW, Hovig E, Myklebost O, Meza-Zepeda LA. Integrative analysis reveals relationships of genetic and epigenetic alterations in osteosarcoma. PLoS One 2012; 7:e48262. [PMID: 23144859 PMCID: PMC3492335 DOI: 10.1371/journal.pone.0048262] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.1] [Received: 04/04/2012] [Accepted: 09/21/2012] [Indexed: 12/19/2022] Open
Abstract
Background: Osteosarcomas are the most common non-haematological primary malignant tumours of bone, and all conventional osteosarcomas are high-grade tumours showing complex genomic aberrations. We have integrated genome-wide genetic and epigenetic profiles from the EuroBoNeT panel of 19 human osteosarcoma cell lines based on microarray technologies. Principal Findings: The cell lines showed complex patterns of DNA copy number changes, where genomic copy number gains were significantly associated with gene-rich regions and losses with gene-poor regions. By integrating the datasets, 350 genes were identified as having two types of aberrations (gain/over-expression, hypo-methylation/over-expression, loss/under-expression or hyper-methylation/under-expression) using a recurrence threshold of 6/19 (>30%) cell lines. In general, the genes showed alterations in either DNA copy number or DNA methylation, both within individual samples and across the sample panel. These 350 genes are involved in embryonic skeletal system development and morphogenesis, as well as remodelling of extracellular matrix. The aberrations of three selected genes, CXCL5, DLX5 and RUNX2, were validated in five cell lines and five tumour samples using PCR techniques. Several genes were hyper-methylated and under-expressed compared to normal osteoblasts, and expression could be reactivated by demethylation using 5-Aza-2′-deoxycytidine treatment for the four genes tested: AKAP12, CXCL5, EFEMP1 and IL11RA. Globally, there was, as expected, a significant positive association between gain and over-expression, loss and under-expression as well as hyper-methylation and under-expression, but gain was also associated with hyper-methylation and under-expression, suggesting that hyper-methylation may oppose the effects of increased copy number for detrimental genes. Conclusions: Integrative analysis of genome-wide genetic and epigenetic alterations identified dependencies and relationships between DNA copy number, DNA methylation and mRNA expression in osteosarcomas, contributing to better understanding of osteosarcoma biology.
Affiliation(s)
- Stine H. Kresse
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Halfdan Rydbeck
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
- Magne Skårn
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Heidi M. Namløs
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Ana H. Barragan-Polania
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Norwegian Microarray Consortium, Department of Molecular Biosciences, University of Oslo, Oslo, Norway
- Massimo Serra
  - Laboratory of Experimental Oncology, Istituto Ortopedico Rizzoli, Bologna, Italy
- Knut Liestøl
  - Department of Informatics, University of Oslo, Oslo, Norway
- Eivind Hovig
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Department of Informatics, University of Oslo, Oslo, Norway
- Ola Myklebost
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Norwegian Microarray Consortium, Department of Molecular Biosciences, University of Oslo, Oslo, Norway
- Leonardo A. Meza-Zepeda
  - Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
  - Norwegian Microarray Consortium, Department of Molecular Biosciences, University of Oslo, Oslo, Norway
  - * E-mail:
|
37
|
Huang T, Jiang M, Kong X, Cai YD. Dysfunctions associated with methylation, microRNA expression and gene expression in lung cancer. PLoS One 2012; 7:e43441. [PMID: 22912875 PMCID: PMC3422260 DOI: 10.1371/journal.pone.0043441] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/08/2011] [Accepted: 07/23/2012] [Indexed: 12/02/2022] Open
Abstract
Integrating high-throughput data obtained from different molecular levels is essential for understanding the mechanisms of complex diseases such as cancer. In this study, we integrated the methylation, microRNA and mRNA data from lung cancer tissues and normal lung tissues using functional gene sets. For each Gene Ontology (GO) term, three sets were defined: the methylation set, the microRNA set and the mRNA set. The discriminating ability of each gene set was represented by the Matthews correlation coefficient (MCC), as evaluated by leave-one-out cross-validation (LOOCV). Next, the MCCs in the methylation sets, the microRNA sets and the mRNA sets were ranked. By comparing the MCC ranks of methylation, microRNA and mRNA for each GO term, we classified the GO sets into six groups and identified the dysfunctional methylation, microRNA and mRNA gene sets in lung cancer. Our results provide a systematic view of the functional alterations during tumorigenesis that may help to elucidate the mechanisms of lung cancer and lead to improved treatments for patients.
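The scoring scheme described in this abstract, a Matthews correlation coefficient evaluated by leave-one-out cross-validation, can be sketched as below. The abstract does not name the classifier, so a nearest-centroid rule is used purely as a stand-in; the function names and the toy feature vectors are hypothetical.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def nearest_centroid(train_x, train_y, x):
    """Return the label (0 or 1) whose training centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        if not rows:  # class absent from this training fold
            continue
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_mcc(xs, ys):
    """Hold out each sample once, predict it from the rest, then score with MCC."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for i, (x, y) in enumerate(zip(xs, ys)):
        pred = nearest_centroid(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], x)
        if pred == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return mcc(**counts)


# Hypothetical two-feature gene set: label 1 = tumor, label 0 = normal.
xs = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]]
ys = [0, 0, 0, 1, 1, 1]
print(loocv_mcc(xs, ys))
```

Computing one such score per gene set and per data type would give the values that the abstract then ranks and compares across methylation, microRNA and mRNA.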
Affiliation(s)
- Tao Huang
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
  - Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
- Min Jiang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
  - State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, People's Republic of China
- Xiangyin Kong
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
  - State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, People's Republic of China
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China
|
38
|
Single nucleotide polymorphism microarray analysis in cortisol-secreting adrenocortical adenomas identifies new candidate genes and pathways. Neoplasia 2012; 14:206-18. [PMID: 22496620 DOI: 10.1593/neo.111758] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Received: 12/16/2011] [Revised: 02/13/2012] [Accepted: 02/13/2012] [Indexed: 02/04/2023] Open
Abstract
The genetic mechanisms underlying adrenocortical tumor development are still largely unknown. We used high-resolution single nucleotide polymorphism microarrays (Affymetrix SNP 6.0) to detect copy number alterations (CNAs) and copy-neutral losses of heterozygosity (cnLOH) in 15 cortisol-secreting adrenocortical adenomas with matched blood samples. We focused on microalterations, aiming to discover new candidate genes involved in early tumorigenesis and/or autonomous cortisol secretion. We identified 962 CNAs with a median of 18 CNAs per sample. Half of them involved noncoding regions, 89% were less than 100 kb, and 28% were found in at least two samples. The most frequently gained regions were 5p15.33, 6q16.1, 7p22.3-22.2, 8q24.3, 9q34.2-34.3, 11p15.5, 11q11, 12q12, 16q24.3, 20p11.1-20q21.11, and Xq28 (≥20% of cases), most of them being identified in the same three adenomas. These regions contained, among others, genes such as NOTCH1, CYP11B2, HRAS, and IGF2. Recurrent losses were less common and smaller than gains, being mostly localized at 1p, 6q, and 11q. Pathway analysis revealed that Notch signaling was the most frequently altered. We identified 46 recurrent CNAs that each affected a single gene (31 gains and 15 losses), including genes involved in steroidogenesis (CYP11B1) or tumorigenesis (CTNNB1, EPHA7, SGK1, STIL, FHIT). Finally, 20 small cnLOH in four cases affecting 15 known genes were found. Our findings provide the first high-resolution genome-wide view of chromosomal changes in cortisol-secreting adenomas and identify novel candidate genes, such as HRAS, EPHA7, and SGK1. Furthermore, they suggest that the Notch1 signaling pathway might be involved in the molecular pathogenesis of adrenocortical tumors.
|
39
|
Wang D, Zhang Y, Huang Y, Li P, Wang M, Wu R, Cheng L, Zhang W, Zhang Y, Li B, Wang C, Guo Z. Comparison of different normalization assumptions for analyses of DNA methylation data from the cancer genome. Gene 2012; 506:36-42. [PMID: 22771920 DOI: 10.1016/j.gene.2012.06.075] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 10/21/2011] [Revised: 06/21/2012] [Accepted: 06/22/2012] [Indexed: 01/02/2023]
Abstract
Some researchers normalize DNA methylation array data to remove the technical artifacts introduced by experimental differences in sample preparation, array processing, and other factors. Others analyze DNA methylation arrays without normalization, arguing that current normalization methods may distort real differences between normal and cancer samples because cancer genomes may be extensively subject to hypomethylation and the total amount of CpG methylation might differ substantially among samples. In this study, using eight datasets from the Infinium HumanMethylation27 assay, we systematically analyzed the global distribution of DNA methylation changes in cancer compared to normal controls and its effect on data normalization for selecting differentially methylated (DM) genes. We showed that more DM genes could be found in the Quantile/Lowess-normalized data than in the non-normalized data. The DM genes additionally selected in the Quantile/Lowess-normalized data showed significantly consistent methylation states in another independent dataset for the same cancer, indicating that these extra DM genes were effective biological signals related to the disease. These results suggest that normalization can increase the power of detecting DM genes in the context of diagnostic markers, which are usually characterized by relatively large effect sizes. In addition, we evaluated the reproducibility of DM discoveries for a particular cancer type and found that most of the DM genes additionally detected in one dataset showed the same methylation directions in the other dataset for the same cancer type, indicating that these DM genes were effective biological signals in the other dataset. Furthermore, we showed that some DM genes detected from different studies for a particular cancer type were significantly reproducible at the functional level.
Affiliation(s)
- Dong Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China.
|
40
|
Vucic EA, Thu KL, Robison K, Rybaczyk LA, Chari R, Alvarez CE, Lam WL. Translating cancer 'omics' to improved outcomes. Genome Res 2012; 22:188-95. [PMID: 22301133 DOI: 10.1101/gr.124354.111] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.0] [Indexed: 12/20/2022]
Abstract
The genomics era has yielded great advances in the understanding of cancer biology. At the same time, the immense complexity of the cancer genome has been revealed, as well as a striking heterogeneity at the whole-genome (or omics) level that exists between even histologically similar tumors. The vast accrual and public availability of multi-omics databases with associated clinical annotation including tumor histology, patient response, and outcome are a rich resource that has the potential to lead to rapid translation of high-throughput omics to improved overall survival. We focus on the unique advantages of a multidimensional approach to genomic analysis in this new high-throughput omics age and discuss the implications of the changing cancer demographic to translational omics research.
Affiliation(s)
- Emily A Vucic
- British Columbia Cancer Research Centre, Vancouver V5Z 1L3, Canada.
|
41
|
da Costa Prando E, Cavalli LR, Rainho CA. Evidence of epigenetic regulation of the tumor suppressor gene cluster flanking RASSF1 in breast cancer cell lines. Epigenetics 2012; 6:1413-24. [PMID: 22139571 DOI: 10.4161/epi.6.12.18271] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/28/2022] Open
Abstract
Epigenetic mechanisms are frequently deregulated in cancer cells and can lead to the silencing of genes with tumor suppressor activities. The isoform A of the Ras-association domain family member 1 (RASSF1A) gene is one of the most frequently silenced transcripts in human tumors; however, few studies have simultaneously investigated epigenetic abnormalities associated with the 3p21.3 tumor suppressor gene cluster flanking RASSF1 (i.e., SEMA3B, HYAL3, HYAL2, HYAL1, TUSC2, RASSF1, ZMYND10, NPRL2, TMEM115, and CACNA2D2). This study aimed to investigate the role of epigenetic changes to these genes in seventeen breast cancer cell lines and in three non-tumorigenic epithelial breast cell lines (184A1, 184B5, and MCF 10A) and to evaluate the effect on gene expression of treatment with the demethylating agent 5-Aza-2'-deoxycytidine and/or Trichostatin A (TSA), a histone deacetylase inhibitor. We report that, although the RASSF1A isoform was determined to be epigenetically silenced in 15 of the 17 breast cancer cell lines, all the cell lines expressed the RASSF1C isoform. Five breast cancer cell lines overexpressed RASSF1C when compared to the normal epithelial cell line 184A1. Furthermore, the genes HYAL1 and CACNA2D2 were significantly overexpressed after the treatments. After the combined treatment, RASSF1A re-expression was accompanied by an increase in expression levels of the flanking genes. The Spearman's correlation coefficient indicated a positive co-regulation of the following gene pairs: RASSF1 and TUSC2 (r=0.64, p=0.002), RASSF1 and ZMYND10 (r=0.58, p=0.07), RASSF1 and NPRL2 (r=0.48, p=0.03), ZMYND10 and NPRL2 (r=0.71, p=0.0004), and NPRL2 and TMEM115 (r=0.66, p=0.001). Interestingly, the genes TUSC2, NPRL2 and TMEM115 were found to be unmethylated in each of the untreated cell lines.
Chromatin immunoprecipitation using antibodies against the acetylated and trimethylated lysine 9 of histone H3 demonstrated low levels of histone methylation in these genes, which are located closest to RASSF1. These results provide evidence that epigenetic repression is involved in the down-regulation of multiple genes at 3p21.3 in breast cancer cells.
Affiliation(s)
- Erika da Costa Prando
- Department of Genetics, Biosciences Institute, Sao Paulo State University, Sao Paulo, Brazil
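The abstract above reports Spearman correlation coefficients between the expression levels of gene pairs. As an illustration only, the following minimal Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks. The function names and the expression values are hypothetical and are not taken from the study; p-values are not computed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical relative expression levels of two genes across six cell lines.
gene_a = [1.2, 2.5, 0.8, 3.1, 2.0, 4.4]
gene_b = [0.9, 2.1, 1.0, 2.0, 2.8, 3.9]
rho = spearman(gene_a, gene_b)
print(f"Spearman r = {rho:.2f}")
```

Because the coefficient depends only on ranks, it is unchanged by any monotonic rescaling of either gene's expression values.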
|
42
|
Sun Z, Chai HS, Wu Y, White WM, Donkena KV, Klein CJ, Garovic VD, Therneau TM, Kocher JPA. Batch effect correction for genome-wide methylation data with Illumina Infinium platform. BMC Med Genomics 2011; 4:84. [PMID: 22171553 PMCID: PMC3265417 DOI: 10.1186/1755-8794-4-84] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Received: 07/29/2011] [Accepted: 12/16/2011] [Indexed: 01/12/2023] Open
Abstract
Background: Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data. Methods: We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from the HumanMethylation27 platform: quantile normalization at average β value (QNβ); two-step quantile normalization at probe signals implemented in the "lumi" package of R (lumi); and quantile normalization of A and B signal separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated. Results: Each normalization could remove a portion of batch effects and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Datasets 2 and 3, respectively. After QNβ, lumi or ABnorm, the proportion of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2; and 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed such remaining non-biological effects. More importantly, the two-step procedure almost tripled the numbers of CpGs associated with the outcome of interest for the two datasets.
Conclusion: Genome-wide methylation data from Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can remove some but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
Affiliation(s)
- Zhifu Sun
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic College of Medicine, 200 First Street, Rochester, MN 55905, USA
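Quantile normalization, the first step named in the abstract above, forces every sample to share the same empirical distribution. The following is a minimal Python sketch of the generic technique applied to β values, under stated assumptions: it is not the authors' code, it does not reproduce the "lumi" or ABnorm variants or the Empirical Bayes adjustment, and the function name and toy values are hypothetical.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists of probe values."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n)]
    normalized = []
    for col in columns:
        # Rank each probe within its sample (ties broken by probe position).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace the value with the reference value of the same rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical beta values for four probes in three samples.
beta = [
    [0.10, 0.80, 0.45, 0.30],  # sample 1
    [0.20, 0.95, 0.60, 0.40],  # sample 2, shifted upward as a batch effect might do
    [0.05, 0.70, 0.35, 0.25],  # sample 3
]
normalized = quantile_normalize(beta)
for sample in normalized:
    print([round(v, 3) for v in sample])
```

After normalization, each sample contains the same set of values and keeps its own probe ordering. This removes distribution-wide shifts between samples but not probe-specific batch effects, which is consistent with the abstract's finding that further correction was needed for datasets with strong batch effects.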
|
43
|
Vollan HKM, Caldas C. The breast cancer genome--a key for better oncology. BMC Cancer 2011; 11:501. [PMID: 22128823 PMCID: PMC3268769 DOI: 10.1186/1471-2407-11-501] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/04/2011] [Accepted: 11/30/2011] [Indexed: 12/18/2022] Open
Abstract
Molecular classification has added important knowledge to breast cancer biology, but has yet to be implemented as a clinical standard. Full sequencing of breast cancer genomes could potentially refine classification and give a more complete picture of the mutational profile of cancer and thus aid therapy decisions. Future treatment guidelines must be based on the knowledge derived from histopathological sub-classification of tumors, but with added information from genomic signatures when properly clinically validated. The objective of this article is to give some background on molecular classification, the potential of next generation sequencing, and to outline how this information could be implemented in the clinic.
|
44
|
Gene expression studies in autism: moving from the genome to the transcriptome and beyond. Neurobiol Dis 2011; 45:69-75. [PMID: 21839838 DOI: 10.1016/j.nbd.2011.07.017] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Received: 06/19/2011] [Accepted: 07/20/2011] [Indexed: 12/22/2022] Open
Abstract
Autism is a clinically and genetically heterogeneous neurodevelopmental disorder. Although multiple genes, risk alleles and copy number variants (CNVs) have been implicated in ASD, none of the currently established genetic causes of ASD accounts for more than 2% of the cases, and a genetic diagnosis is not yet possible for most autism patients. Thus, advancing our understanding of autism genetics requires the integration of genetic information with information on genome function, as provided by transcriptomic data. We review recent autism transcriptome studies, in the context of current knowledge of autism genetics, and discuss the utility of gene expression data in evaluating the functional relevance of genetic variants and identifying common molecular pathways dysregulated in autism.
|
45
|
Solvang HK, Lingjærde OC, Frigessi A, Børresen-Dale AL, Kristensen VN. Linear and non-linear dependencies between copy number aberrations and mRNA expression reveal distinct molecular pathways in breast cancer. BMC Bioinformatics 2011; 12:197. [PMID: 21609452 PMCID: PMC3128865 DOI: 10.1186/1471-2105-12-197] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 07/13/2010] [Accepted: 05/24/2011] [Indexed: 12/13/2022] Open
Abstract
Background: Elucidating the exact relationship between gene copy number and expression would enable identification of regulatory mechanisms of abnormal gene expression and biological pathways of regulation. Most current approaches either depend on linear correlation or on nonparametric tests of association that are insensitive to the exact shape of the relationship. Based on knowledge of enzyme kinetics and gene regulation, we would expect the functional shape of the relationship to be gene dependent and to be related to the gene regulatory mechanisms involved. Here, we propose a statistical approach to investigate and distinguish between linear and nonlinear dependences between DNA copy number alteration and mRNA expression. Results: We applied the proposed method to DNA copy numbers derived from Illumina 109 K SNP-CGH arrays (using the log R values) and expression data from Agilent 44 K mRNA arrays, focusing on commonly aberrated genomic loci in a collection of 102 breast tumors. Regression analysis was used to identify the type of relationship (linear or nonlinear), and subsequent pathway analysis revealed that genes displaying a linear relationship were overall associated with substantially different biological processes than genes displaying a nonlinear relationship. In the group of genes with a linear relationship, we found significant association with canonical pathways, including purine and pyrimidine metabolism (for both deletions and amplifications) as well as estrogen metabolism (linear amplification) and BRCA-related response to damage (linear deletion). In the group of genes displaying a nonlinear relationship, the top canonical pathways were specific pathways like PTEN and PI3K/AKT (nonlinear amplification) and Wnt(B) and IL-2 signalling (nonlinear deletion).
Both amplifications and deletions pointed to the same affected pathways and identified cancer as the top significant disease and cell cycle, cell signaling and cellular development as significant networks. Conclusions: This paper presents a novel approach to assessing the validity of the dependence of expression data on copy number data, and this approach may help in identifying the drivers of carcinogenesis.
Affiliation(s)
- Hiroko K Solvang
- Department of Genetics, Institute for Cancer Research, Oslo University Hospital, Radiumhospitalet, Montebello, and Department of Biostatistics, Institute of Basic Medical Science, University of Oslo, 0310 Oslo, Norway.
|
46
|
Wierinckx A, Roche M, Raverot G, Legras-Lachuer C, Croze S, Nazaret N, Rey C, Auger C, Jouanneau E, Chanson P, Trouillas J, Lachuer J. Integrated genomic profiling identifies loss of chromosome 11p impacting transcriptomic activity in aggressive pituitary PRL tumors. Brain Pathol 2011; 21:533-43. [PMID: 21251114 DOI: 10.1111/j.1750-3639.2011.00476.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022] Open
Abstract
Integrative genomics approaches associating DNA structure and transcriptomic analysis should allow the identification of cascades of events relating to tumor aggressiveness. While different genome alterations have been identified in pituitary tumors, none have ever been correlated with the aggressiveness. This study focused on one subtype of pituitary tumor, the prolactin (PRL) pituitary tumors, to identify molecular events associated with the aggressive and malignant phenotypes. We combined a comparative genomic hybridization and transcriptomic analysis of 13 PRL tumors classified as nonaggressive or aggressive. Allelic loss within the p arm region of chromosome 11 was detected in five of the aggressive tumors. Allelic loss in the 11q arm was observed in three of these five tumors, all three of which were considered as malignant based on the occurrence of metastases. Comparison of genomic and transcriptomic data showed that allelic loss impacted upon the expression of genes located in the imbalanced region. Data filtering allowed us to highlight five deregulated genes (DGKZ, CD44, TSG101, GTF2H1, HTATIP2), within the missing 11p region, potentially responsible for triggering the aggressive and malignant phenotypes of PRL tumors. Our combined genomic and transcriptomic analysis underlines the importance of chromosome allelic loss in determining the aggressiveness and malignancy of tumors.
|
47
|
Carrell DT. Understanding the genetics of male infertility: progress at the bench and in the clinic. Preface. Syst Biol Reprod Med 2011; 57:1-2. [PMID: 21208141 DOI: 10.3109/19396368.2010.543310] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Affiliation(s)
- Douglas T Carrell
- Department of Surgery Urology, OB-GYN, Physiology, and IVF and Andrology Laboratories, University of Utah School of Medicine, Salt Lake City, UT, USA
|
48
|
Deciphering squamous cell carcinoma using multidimensional genomic approaches. J Skin Cancer 2010; 2011:541405. [PMID: 21234096 PMCID: PMC3017908 DOI: 10.1155/2011/541405] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/02/2010] [Accepted: 10/26/2010] [Indexed: 12/04/2022] Open
Abstract
Squamous cell carcinomas (SqCCs) arise in a wide range of tissues including skin, lung, and oral mucosa. Although all SqCCs are epithelial in origin and share common nomenclature, these cancers differ greatly with respect to incidence, prognosis, and treatment. Current knowledge of genetic similarities and differences between SqCCs is insufficient to describe the biology of these cancers, which arise from diverse tissue origins. In this paper we provide a general overview of whole genome approaches for gene and pathway discovery and highlight the advancement of integrative genomics as a state-of-the-art technology in the study of SqCC genetics.
|