1
Francis D, Sun F. A comparative analysis of mutual information methods for pairwise relationship detection in metagenomic data. BMC Bioinformatics 2024; 25:266. [PMID: 39143554 PMCID: PMC11323399 DOI: 10.1186/s12859-024-05883-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 07/29/2024] [Indexed: 08/16/2024] Open
Abstract
BACKGROUND Construction of co-occurrence networks in metagenomic data often employs correlation to infer pairwise relationships between microbes. However, biological systems are complex and often display qualities non-linear in nature. Therefore, the reliance on correlation alone may overlook important relationships and fail to capture the full breadth of intricacies presented in underlying interaction networks. It is of interest to incorporate metrics that are robust in detecting not only linear relationships but non-linear ones as well. RESULTS In this paper, we explore the use of various mutual information (MI) estimation approaches for quantifying pairwise relationships in biological data and compare their performances against two traditional measures: Pearson's correlation coefficient, r, and Spearman's rank correlation coefficient, ρ. Metrics are tested on both simulated data designed to mimic pairwise relationships that may be found in ecological systems and real data from a previous study on C. diff infection. The results demonstrate that, in the case of asymmetric relationships, mutual information estimators can provide better detection ability than Pearson's or Spearman's correlation coefficients. Specifically, we find that these estimators have elevated performances in the detection of exploitative relationships, demonstrating the potential benefit of including them in future metagenomic studies. CONCLUSIONS Mutual information (MI) can uncover complex pairwise relationships in biological data that may be missed by traditional measures of association. The inclusion of such relationships when constructing co-occurrence networks can result in a more comprehensive analysis than the use of correlation alone.
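The contrast drawn in this abstract can be illustrated with a toy example. The sketch below is not the estimators or the simulation design used in the paper; it assumes a simple equal-width histogram (plug-in) MI estimate and a hypothetical non-monotonic relationship `y = x² + noise`, for which both correlation coefficients are close to zero while the MI estimate is clearly above its value for independent data. All function names are illustrative.

```python
import math
import random

def pearson(x, y):
    # Pearson's r from centered sums.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    # Ranks 0..n-1; ties are not averaged (acceptable for continuous data).
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman's rho is Pearson's r computed on ranks.
    return pearson(ranks(x), ranks(y))

def binned_mi(x, y, bins=8):
    # Plug-in MI estimate (in nats) from an equal-width 2-D histogram.
    def to_bin(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in v]
    bx, by = to_bin(x), to_bin(y)
    n = len(x)
    joint, px, py = {}, {}, {}
    for i, j in zip(bx, by):
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# Hypothetical toy data: a non-monotonic dependence and an independent pair.
random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2000)]
y_dep = [a * a + random.gauss(0, 0.05) for a in x]
y_ind = [random.gauss(0, 1) for _ in x]

print("dependent:  ", pearson(x, y_dep), spearman(x, y_dep), binned_mi(x, y_dep))
print("independent:", pearson(x, y_ind), spearman(x, y_ind), binned_mi(x, y_ind))
```

The histogram estimator is biased upward on finite samples, so in practice its value is usually compared against a permutation baseline rather than against zero.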
Affiliation(s)
- Dallace Francis
  - Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, 90089, USA.
- Fengzhu Sun
  - Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, 90089, USA
2
Timilsina M, Fey D, Buosi S, Janik A, Costabello L, Carcereny E, Abreu DR, Cobo M, Castro RL, Bernabé R, Minervini P, Torrente M, Provencio M, Nováček V. Synergy between imputed genetic pathway and clinical information for predicting recurrence in early stage non-small cell lung cancer. J Biomed Inform 2023; 144:104424. [PMID: 37352900 DOI: 10.1016/j.jbi.2023.104424] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/15/2022] [Revised: 06/06/2023] [Accepted: 06/11/2023] [Indexed: 06/25/2023]
Abstract
OBJECTIVE Lung cancer exhibits unpredictable recurrence in low-stage tumors and variable responses to different therapeutic interventions. Predicting relapse in early-stage lung cancer can facilitate precision medicine and improve patient survivability. While existing machine learning models rely on clinical data, incorporating genomic information could enhance their efficiency. This study aims to impute and integrate specific types of genomic data with clinical data to improve the accuracy of machine learning models for predicting relapse in early-stage, non-small cell lung cancer patients. METHODS The study utilized a publicly available TCGA lung cancer cohort and imputed genetic pathway scores into the Spanish Lung Cancer Group (SLCG) data, specifically in 1348 early-stage patients. Initially, tumor recurrence was predicted without imputed pathway scores. Subsequently, the SLCG data were augmented with pathway scores imputed from TCGA. The integrative approach aimed to enhance relapse risk prediction performance. RESULTS The integrative approach achieved improved relapse risk prediction with the following evaluation metrics: an area under the precision-recall curve (PR-AUC) score of 0.75, an area under the ROC (ROC-AUC) score of 0.80, an F1 score of 0.61, and a Precision of 0.80. The prediction explanation model SHAP (SHapley Additive exPlanations) was employed to explain the machine learning model's predictions. CONCLUSION We conclude that our explainable predictive model is a promising tool for oncologists that addresses an unmet clinical need of post-treatment patient stratification based on the relapse risk while also improving the predictive power by incorporating proxy genomic data not available for specific patients.
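The reported F1 score and precision together pin down a recall value, which the abstract does not state. The sketch below works through that arithmetic under the assumption that both figures were computed for the same class at the same decision threshold (not averaged across classes or folds); if that assumption does not hold, the implied recall is only indicative.

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def implied_recall(precision, f1):
    # Solve F1 = 2PR / (P + R) for R.
    return f1 * precision / (2 * precision - f1)

# Values reported in the abstract: precision 0.80, F1 0.61.
recall = implied_recall(0.80, 0.61)
print(round(recall, 3))  # about 0.493
```

Under that assumption, the model would be flagging roughly half of the true relapses while keeping four out of five positive predictions correct.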
Affiliation(s)
- Mohan Timilsina
  - Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland.
- Dirk Fey
  - Systems Biology Ireland, University College Dublin, Ireland.
- Samuele Buosi
  - Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland.
- Enric Carcereny
  - Catalan Institute of Oncology, Hospital Universitari Germans Trias i Pujol, B-ARGO, IGTP, Badalona, Spain.
- Manuel Cobo
  - Medical Oncology Intercenter Unit. Regional and Virgen de la Victoria University Hospitals. IBIMA. Málaga, Spain.
- Reyes Bernabé
  - Hospital Universitario Virgen del Rocio, Sevilla, Spain.
- Maria Torrente
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
- Mariano Provencio
  - Medical Oncology Department, Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain.
- Vít Nováček
  - Data Science Institute, Insight Centre for Data Analytics, University of Galway, Ireland; Faculty of Informatics, Masaryk University Brno, Czech Republic; Masaryk Memorial Cancer Institute, Brno, Czech Republic.
3
Li B, Yang L, Jiang C, Yao Y, Li H, Cheng S, Zou B, Fan B, Wang L. Integrated multi-dimensional deep neural network model improves prognosis prediction of advanced NSCLC patients receiving bevacizumab. Front Oncol 2023; 13:1052147. [PMID: 36865790 PMCID: PMC9972089 DOI: 10.3389/fonc.2023.1052147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 01/31/2023] [Indexed: 02/16/2023] Open
Abstract
Background The addition of bevacizumab has been associated with prolonged survival in advanced non-small cell lung cancer (NSCLC) patients, whether combined with chemotherapy, tyrosine kinase inhibitors or immune checkpoint inhibitors. However, biomarkers for the efficacy of bevacizumab remain largely unknown. This study aimed to develop a deep learning model to provide individual assessment of survival in advanced NSCLC patients receiving bevacizumab. Methods All data were retrospectively collected from a cohort of 272 patients with radiologically and pathologically proven advanced non-squamous NSCLC. Novel multi-dimensional deep neural network (DNN) models were trained on clinicopathological, inflammatory and radiomics features using the DeepSurv and N-MTLR algorithms. The concordance index (C-index) and Brier score were used to assess the discriminatory and predictive capacity of the models. Results Integrating clinicopathologic, inflammatory and radiomics feature representations with DeepSurv and N-MTLR gave C-indexes of 0.712 and 0.701, respectively, in the testing cohort. Cox proportional hazard (CPH) and random survival forest (RSF) models, developed after data pre-processing and feature selection, gave C-indexes of 0.665 and 0.679, respectively. The DeepSurv prognostic model, which performed best, was used for individual prognosis prediction. Patients assigned to the high-risk group had significantly inferior PFS (median PFS: 5.4 vs 13.1 months, P<0.0001) and OS (median OS: 16.4 vs 21.3 months, P<0.0001). Conclusions Integrating clinicopathologic, inflammatory and radiomics feature representations in the DeepSurv model gave superior predictive accuracy, offering a non-invasive method to assist in patient counseling and guidance of optimal treatment strategies.
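Since the models above are compared by C-index, a small worked example may help in reading those numbers. The sketch below is a minimal pairwise implementation of Harrell's concordance index, not the evaluation code used in the study; the function name and the toy data are hypothetical, and tied survival times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter follow-up time than subject j. Higher risk should mean shorter
    survival; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical toy cohort: months of follow-up, event flag (1 = event observed).
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
good_risks = [0.9, 0.7, 0.4, 0.1]   # risk decreases as survival lengthens
bad_risks = [0.1, 0.4, 0.7, 0.9]    # exactly reversed ordering

print(concordance_index(times, events, good_risks))  # 1.0
print(concordance_index(times, events, bad_risks))   # 0.0
```

A value of 0.5 corresponds to random ordering, which is the baseline against which figures such as 0.712 or 0.665 are usually read.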
Affiliation(s)
- Butuo Li
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Linlin Yang
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Chao Jiang
  - Department of Otorhinolaryngology Head and Neck Surgery, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, China; Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China
- Yueyuan Yao
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Haoqian Li
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Shuping Cheng
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Bing Zou
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Bingjie Fan
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
- Linlin Wang
  - Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China, *Correspondence: Linlin Wang,
4
Tran KA, Kondrashova O, Bradley A, Williams ED, Pearson JV, Waddell N. Deep learning in cancer diagnosis, prognosis and treatment selection. Genome Med 2021; 13:152. [PMID: 34579788 PMCID: PMC8477474 DOI: 10.1186/s13073-021-00968-x] [Citation(s) in RCA: 363] [Impact Index Per Article: 90.8] [Received: 12/13/2020] [Accepted: 09/12/2021] [Indexed: 12/13/2022] Open
Abstract
Deep learning is a subdiscipline of artificial intelligence that uses a machine learning technique called artificial neural networks to extract patterns and make predictions from large data sets. The increasing adoption of deep learning across healthcare domains together with the availability of highly characterised cancer datasets has accelerated research into the utility of deep learning in the analysis of the complex biology of cancer. While early results are promising, this is a rapidly evolving field with new knowledge emerging in both cancer biology and deep learning. In this review, we provide an overview of emerging deep learning techniques and how they are being applied to oncology. We focus on the deep learning applications for omics data types, including genomic, methylation and transcriptomic data, as well as histopathology-based genomic inference, and provide perspectives on how the different data types can be integrated to develop decision support tools. We provide specific examples of how deep learning may be applied in cancer diagnosis, prognosis and treatment management. We also assess the current limitations and challenges for the application of deep learning in precision oncology, including the lack of phenotypically rich data and the need for more explainable deep learning models. Finally, we conclude with a discussion of how current obstacles can be overcome to enable future clinical utilisation of deep learning.
Affiliation(s)
- Khoa A. Tran
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
  - School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
- Olga Kondrashova
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Andrew Bradley
  - Faculty of Engineering, Queensland University of Technology (QUT), Brisbane, 4000 Australia
- Elizabeth D. Williams
  - School of Biomedical Sciences, Faculty of Health, Queensland University of Technology (QUT), Brisbane, 4059 Australia
  - Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, 4102 Australia
- John V. Pearson
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
- Nicola Waddell
  - Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, 4006 Australia
5
Liu M, Katsevich E, Janson L, Ramdas A. Fast and powerful conditional randomization testing via distillation. Biometrika 2021. [DOI: 10.1093/biomet/asab039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022] Open
Abstract
Summary
We consider the problem of conditional independence testing: given a response $Y$ and covariates $(X,Z)$, we test the null hypothesis that $Y {\perp\!\!\!\perp} X \mid Z$. The conditional randomization test was recently proposed as a way to use distributional information about $X\mid Z$ to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about $Y\mid (X,Z)$. This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test’s statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
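The resampling scheme described in this summary can be sketched in a few lines. The code below is an illustration under strong simplifying assumptions, not the authors' implementation: `Z` is a single covariate, the conditional law of `X` given `Z` is assumed known (supplied through the hypothetical callables `sample_x_given_z` and `cond_mean_x`), and the "distillation" step is a plain least-squares fit of `Y` on `Z` that is done once, so each resample only recomputes a cheap inner product.

```python
import random

def crt_pvalue(x, y, z, sample_x_given_z, cond_mean_x, n_resamples=500, seed=0):
    rng = random.Random(seed)
    n = len(y)
    # Expensive step done once: summarize what Z explains about Y
    # (here a simple least-squares line; any fitted model could stand in).
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    resid_y = [b - my - slope * (a - mz) for a, b in zip(z, y)]
    mean_x = [cond_mean_x(a) for a in z]

    def stat(xs):
        # Cheap statistic: |inner product of X residuals and Y residuals|.
        return abs(sum((xi - mi) * ri for xi, mi, ri in zip(xs, mean_x, resid_y)))

    t_obs = stat(x)
    # Resample X from its known conditional law given Z; Y and Z stay fixed.
    exceed = sum(
        stat([sample_x_given_z(a, rng) for a in z]) >= t_obs
        for _ in range(n_resamples)
    )
    return (1 + exceed) / (1 + n_resamples)

# Hypothetical toy model: X | Z ~ N(0.5 * Z, 1).
def sample_x(zi, rng):
    return 0.5 * zi + rng.gauss(0, 1)

def cond_mean(zi):
    return 0.5 * zi

random.seed(1)
n = 300
z = [random.gauss(0, 1) for _ in range(n)]
x = [sample_x(zi, random) for zi in z]
y_null = [zi + random.gauss(0, 1) for zi in z]                  # Y depends on Z only
y_alt = [zi + xi + random.gauss(0, 1) for zi, xi in zip(z, x)]  # Y also depends on X

p_null = crt_pvalue(x, y_null, z, sample_x, cond_mean)
p_alt = crt_pvalue(x, y_alt, z, sample_x, cond_mean)
print(p_null, p_alt)
```

The `(1 + exceed) / (1 + n_resamples)` form is what keeps the p-value valid for a finite number of resamples; the validity rests on the resamples being drawn from the true conditional law of `X` given `Z`.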
Affiliation(s)
- Molei Liu
  - Department of Biostatistics, Harvard Chan School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, U.S.A
- Eugene Katsevich
  - Department of Statistics and Data Science, Wharton School of the University of Pennsylvania, 265 South 37th Street, Philadelphia, Pennsylvania 19104, U.S.A
- Lucas Janson
  - Department of Statistics, Harvard University, One Oxford Street, Cambridge, Massachusetts 02138, U.S.A
- Aaditya Ramdas
  - Department of Statistics & Data Science, Carnegie Mellon University, 132H Baker Hall, Pittsburgh, Pennsylvania 15213, U.S.A
6
Seal DB, Das V, Goswami S, De RK. Estimating gene expression from DNA methylation and copy number variation: A deep learning regression model for multi-omics integration. Genomics 2020; 112:2833-2841. [PMID: 32234433 DOI: 10.1016/j.ygeno.2020.03.021] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 11/28/2019] [Revised: 03/17/2020] [Accepted: 03/22/2020] [Indexed: 12/21/2022]
Abstract
Gene expression analysis plays a significant role in providing molecular insights into cancer. Various genetic and epigenetic factors (studied together under multi-omics) affect gene expression, giving rise to cancer phenotypes. Recent growth in the understanding of multi-omics provides a resource for integration in interdisciplinary biology, since together these data can draw a comprehensive picture of an organism's developmental and disease biology in cancers. Such large-scale multi-omics data can be obtained from public consortia like The Cancer Genome Atlas (TCGA) and several other platforms. Integrating these multi-omics data from varied platforms is still challenging due to high noise and sensitivity of the platforms used. Currently, a robust integrative predictive model to estimate gene expression from these genetic and epigenetic data is lacking. In this study, we have developed a deep learning-based predictive model using Deep Denoising Auto-encoder (DDAE) and Multi-layer Perceptron (MLP) that can quantitatively capture how genetic and epigenetic alterations correlate with directionality of gene expression for liver hepatocellular carcinoma (LIHC). The DDAE used in the study has been trained to extract significant features from the input omics data to estimate the gene expression. These features have then been used for back-propagation learning by the multilayer perceptron for the task of regression and classification. We have benchmarked the proposed model against state-of-the-art regression models. Finally, the deep learning-based integration model has been evaluated for its disease classification capability, where an accuracy of 95.1% has been obtained.
Affiliation(s)
- Dibyendu Bikash Seal
  - A. K. Choudhury School of Information Technology, University of Calcutta, JD-2, Sector III, Salt Lake City, Kolkata 700106, India
- Vivek Das
  - Novo Nordisk Research Center Seattle, Inc., 530 Fairview Ave N # 5000, Seattle, WA 98109, United States
- Saptarsi Goswami
  - Bangabasi Morning College, 35 Rajkumar Chakraborty Sarani, Scott Ln, Kolkata 700009, India
- Rajat K De
  - Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India.
7
Li F, Wu T, Xu Y, Dong Q, Xiao J, Xu Y, Li Q, Zhang C, Gao J, Liu L, Hu X, Huang J, Li X, Zhang Y. A comprehensive overview of oncogenic pathways in human cancer. Brief Bioinform 2019; 21:957-969. [DOI: 10.1093/bib/bbz046] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/10/2018] [Revised: 03/02/2019] [Accepted: 03/31/2019] [Indexed: 12/22/2022] Open
Abstract
Alterations of biological pathways can lead to oncogenesis. An overview of these oncogenic pathways would be highly valuable for researchers to reveal the pathogenic mechanism and develop novel therapeutic approaches for cancers. Here, we reviewed approximately 8500 publications and documented experimentally validated cancer-pathway associations as a benchmarking data set. This data resource includes 4709 manually curated relationships between 1557 paths and 49 cancers with 2427 upstream regulators in 7 species. Based on this resource, we first summarized the cancer-pathway associations and revealed some commonly deregulated pathways across tumor types. Then, we systematically analyzed these oncogenic pathways by integrating TCGA pan-cancer data sets. Multi-omics analysis showed oncogenic pathways may play different roles across tumor types under different omics contexts. We also charted the survival relevance landscape of oncogenic pathways in 26 tumor types, identified dominant omics features and found that survival relevance for oncogenic pathways varied across tumor types and omics levels. Moreover, we predicted upstream regulators and constructed a hierarchical network model to understand the pathogenic mechanism of human cancers underlying oncogenic pathway context. Finally, we developed 'CPAD' (freely available at http://bio-bigdata.hrbmu.edu.cn/CPAD/), an online resource for exploring oncogenic pathways in human cancers, that integrated manually curated cancer-pathway associations, TCGA pan-cancer multi-omics data sets, drug–target data, drug sensitivity and multi-omics data for cancer cell lines. In summary, our study provides a comprehensive characterization of oncogenic pathways and also presents a valuable resource for investigating the pathogenesis of human cancer.
Affiliation(s)
- Feng Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Tan Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yanjun Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qun Dong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jing Xiao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yingqi Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qian Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Chunlong Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jianxia Gao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Liqiu Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiaoxu Hu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jian Huang
  - Center for Informational Biology, University of Electronic Science and Technology of China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yunpeng Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
8
Wei Y, Yan Z, Wu C, Zhang Q, Zhu Y, Li K, Xu Y. Integrated analysis of dosage effect lncRNAs in lung adenocarcinoma based on comprehensive network. Oncotarget 2017; 8:71430-71446. [PMID: 29069717 PMCID: PMC5641060 DOI: 10.18632/oncotarget.19864] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/14/2017] [Accepted: 07/25/2017] [Indexed: 02/07/2023] Open
Abstract
Accumulating evidence indicates that cancer-related lncRNAs frequently undergo somatic copy number alteration (SCNA). Although individual SCNA lncRNAs have been implicated in tumor biology, their regulatory mechanism has not been assessed in a systematic way. In order to explore the expression characteristics and biological functions of SCNA lncRNAs in cancer, we built a computational framework based on lncRNA expression profiles, lncRNA copy numbers and dosage sensitivity score (DSS). First, we found that the lncRNAs with different DSS were involved in distinct biological processes, while those with the same DSS had similar functions. Second, some of the lncRNAs participated in the progression and metastasis of lung adenocarcinoma (LUAD) through cis-acting regulation. In the lncRNA-TF-mRNA network, lncRNAs interacted with 4 TFs and affected the immune system, and further influenced LUAD progression. Third, competing endogenous RNA network analysis inferred that lncRNA ENSG00000240990 competed with HOXA10 to absorb hsa-let-7a/b/f/g-5p and affected patient prognosis in LUAD. Finally, by integrating target information of miRNA, we also provided a new perspective for the discovery of potential small molecule drugs. In summary, we systematically analyzed the regulatory role of SCNA lncRNAs. This work may facilitate cancer research and serve as the basis for future efforts to understand the role of SCNA lncRNAs, develop novel biomarkers and improve knowledge of tumor biology.
Affiliation(s)
- Yunzhen Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zichuang Yan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Cheng Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qiang Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yinling Zhu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Kun Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
9
Oncosis and apoptosis induction by activation of an overexpressed ion channel in breast cancer cells. Oncogene 2017; 36:6490-6500. [PMID: 28759041 DOI: 10.1038/onc.2017.234] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 09/21/2016] [Revised: 06/01/2017] [Accepted: 06/02/2017] [Indexed: 12/30/2022]
Abstract
The critical role of calcium signalling in processes related to cancer cell proliferation and invasion has seen a focus on pharmacological inhibition of overexpressed ion channels in specific cancer subtypes as a potential therapeutic approach. However, despite the critical role of calcium in cell death pathways, pharmacological activation of overexpressed ion channels has not been extensively evaluated in breast cancer. Here we define the overexpression of transient receptor potential vanilloid 4 (TRPV4) in a subgroup of breast cancers of the basal molecular subtype. We also report that pharmacological activation of TRPV4 with GSK1016790A reduced viability of two basal breast cancer cell lines with pronounced endogenous overexpression of TRPV4, MDA-MB-468 and HCC1569. Pharmacological activation of TRPV4 produced pronounced cell death through two mechanisms: apoptosis and oncosis in MDA-MB-468 cells. Apoptosis was associated with PARP-1 cleavage and oncosis was associated with a rapid decline in intracellular ATP levels, which was a consequence of, rather than the cause of, the intracellular ion increase. TRPV4 activation also resulted in reduced tumour growth in vivo. These studies define a novel therapeutic strategy for breast cancers that overexpress specific calcium permeable plasmalemmal ion channels with available selective pharmacological activators.
10
Yan Z, Liu Y, Wei Y, Zhao N, Zhang Q, Wu C, Chang Z, Xu Y. The functional consequences and prognostic value of dosage sensitivity in ovarian cancer. Molecular BioSystems 2017; 13:380-391. [PMID: 28067383 DOI: 10.1039/c6mb00625f] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Copy number alteration (CNA) represents an important class of genetic variations that may contribute to tumorigenesis, tumor growth and metastatic spread. CNA can directly affect the expression of genes within the CNA regions; however, genes within the CNA regions exhibit heterogeneity in gene dosage sensitivity. In this study, a computational framework was built to identify 1170 dosage-sensitive genes (DSGs) and 1215 dosage-resistant genes (DRGs) that were related to ovarian serous cystadenocarcinoma (OV) through the association between CNA and gene expression. Functional annotation of the two groups indicated that DRGs were involved in cancer-related processes like immune response, cell death and apoptosis, while DSGs were enriched in essential processes like the cell cycle and the DNA metabolic process. Meanwhile, two three-dimensional regulatory networks for differentially expressed miRNAs, differentially expressed transcription factors (TFs) and DSGs or DRGs were constructed based on feed-forward loops. We identified key regulators (such as miR-16-5p, miR-98-5p, MYB and HOXA5) and cancer prognosis-related network motifs (such as miR-98-5p-HOXA5-TP53 and miR-16-5p-MYB-IGF1R) after the analysis of network topological features. Our results lead us to speculate that these genes and associated regulators may be potential mechanistic biomarkers for tumorigenesis and progression of cancer. Research on the network characteristics and the role of feed-forward loops in OV tumorigenesis and development could lead to feasible suggestions for the prevention and early diagnosis of OV, which will shed light on understanding the functional mechanism of CNA in cancer.
Affiliation(s)
- Zichuang Yan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Yongjing Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Yunzhen Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Ning Zhao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Qiang Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Cheng Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Zhiqiang Chang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Yan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
11
Cava C, Bertoli G, Castiglioni I. Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC Systems Biology 2015; 9:62. [PMID: 26391647 PMCID: PMC4578257 DOI: 10.1186/s12918-015-0211-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/14/2015] [Accepted: 09/15/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics. RESULTS These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses. CONCLUSION This review presents the state of the art on the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and shares some issues related to the new opportunities and challenges offered by the application of such integrative approaches.
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Gloria Bertoli
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Isabella Castiglioni
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
|
12
|
K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data. BioMed Research International 2015; 2015:918954. [PMID: 26339652 PMCID: PMC4538770 DOI: 10.1155/2015/918954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/05/2014] [Accepted: 12/18/2014] [Indexed: 01/23/2023]
Abstract
With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A very first step towards elucidating hidden patterns and understanding the massive data is the application of clustering techniques. Nonlinear relations, which were mostly unutilized in contrast to linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure, Distance Based on Conditional Ordered List (DCOL) that we introduced before, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed traditional linear K-means algorithm, but also presented significantly better performance over our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profile clustering generated biologically meaningful results.
|
13
|
Identification of Novel Breast Cancer Subtype-Specific Biomarkers by Integrating Genomics Analysis of DNA Copy Number Aberrations and miRNA-mRNA Dual Expression Profiling. BioMed Research International 2015; 2015:746970. [PMID: 25961039 PMCID: PMC4413257 DOI: 10.1155/2015/746970] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/16/2014] [Revised: 09/15/2014] [Accepted: 09/22/2014] [Indexed: 12/11/2022]
Abstract
Breast cancer is a heterogeneous disease with well-defined molecular subtypes. Currently, comparative genomic hybridization arrays (aCGH) techniques have been developed rapidly, and recent evidences in studies of breast cancer suggest that tumors within gene expression subtypes share similar DNA copy number aberrations (CNA) which can be used to further subdivide subtypes. Moreover, subtype-specific miRNA expression profiles are also proposed as novel signatures for breast cancer classification. The identification of mRNA or miRNA expression-based breast cancer subtypes is considered an instructive means of prognosis. Here, we conducted an integrated analysis based on copy number aberrations data and miRNA-mRNA dual expression profiling data to identify breast cancer subtype-specific biomarkers. Interestingly, we found a group of genes residing in subtype-specific CNA regions that also display the corresponding changes in mRNAs levels and their target miRNAs' expression. Among them, the predicted direct correlation of BRCA1-miR-143-miR-145 pairs was selected for experimental validation. The study results indicated that BRCA1 positively regulates miR-143-miR-145 expression and miR-143-miR-145 can serve as promising novel biomarkers for breast cancer subtyping. In our integrated genomics analysis and experimental validation, a new frame to predict candidate biomarkers of breast cancer subtype is provided and offers assistance in order to understand the potential disease etiology of the breast cancer subtypes.
|
14
|
|
15
|
Kristensen VN, Lingjærde OC, Russnes HG, Vollan HKM, Frigessi A, Børresen-Dale AL. Principles and methods of integrative genomic analyses in cancer. Nat Rev Cancer 2014; 14:299-313. [PMID: 24759209 DOI: 10.1038/nrc3721] [Citation(s) in RCA: 249] [Impact Index Per Article: 22.6] [Indexed: 12/15/2022]
Abstract
Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.
Affiliation(s)
- Vessela N Kristensen
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Clinical Molecular Oncology, Division of Medicine, Akershus University Hospital, 1478 Ahus, Norway
- Ole Christian Lingjærde
- [1] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [2] Division for Biomedical Informatics, Department of Computer Science, University of Oslo, 0316 Oslo, Norway
- Hege G Russnes
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Pathology, Oslo University Hospital, 0450 Oslo, Norway
- Hans Kristian M Vollan
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway. [3] Department of Oncology, Division of Cancer, Surgery and Transplantation, Oslo University Hospital, 0450 Oslo, Norway
- Arnoldo Frigessi
- [1] Statistics for Innovation, Norwegian Computing Center, 0314 Oslo, Norway. [2] Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, PO Box 1122 Blindern, 0317 Oslo, Norway
- Anne-Lise Børresen-Dale
- [1] Department of Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway. [2] K.G. Jebsen Centre for Breast Cancer Research, Institute for Clinical Medicine, Faculty of Medicine, University of Oslo, 0313 Oslo, Norway
16
|
Integrative genomics with mediation analysis in a survival context. Computational and Mathematical Methods in Medicine 2013; 2013:413783. [PMID: 24454535 PMCID: PMC3878392 DOI: 10.1155/2013/413783] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/28/2013] [Accepted: 09/23/2013] [Indexed: 12/25/2022]
Abstract
DNA copy number aberrations (DCNA) and subsequent altered gene expression profiles may have a major impact on tumor initiation, on development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose the adoption of a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implication on patient clinical outcome with the prerequisite that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting by using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide implementation in R.
|
17
|
Johansson I, Ringnér M, Hedenfalk I. The landscape of candidate driver genes differs between male and female breast cancer. PLoS One 2013; 8:e78299. [PMID: 24194916 PMCID: PMC3806766 DOI: 10.1371/journal.pone.0078299] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 06/25/2013] [Accepted: 09/17/2013] [Indexed: 01/05/2023] Open
Abstract
The rapidly growing collection of diverse genome-scale data from multiple tumor types sheds light on various aspects of the underlying tumor biology. With the objective to identify genes of importance for breast tumorigenesis in men and to enable comparisons with genes important for breast cancer development in women, we applied the computational framework COpy Number and EXpression In Cancer (CONEXIC) to detect candidate driver genes among all altered passenger genes. Unique to this approach is that each driver gene is associated with several gene modules that are believed to be altered by the driver. Thirty candidate drivers were found in the male breast cancers and 67 in the female breast cancers. We identified many known drivers of breast cancer and other types of cancer, in the female dataset (e.g. GATA3, CCNE1, GRB7, CDK4). In contrast, only three known cancer genes were found among male breast cancers; MAP2K4, LHP, and ZNF217. Many of the candidate drivers identified are known to be involved in processes associated with tumorigenesis, including proliferation, invasion and differentiation. One of the modules identified in male breast cancer was regulated by THY1, a gene involved in invasion and related to epithelial-mesenchymal transition. Furthermore, men with THY1 positive breast cancers had significantly inferior survival. THY1 may thus be a promising novel prognostic marker for male breast cancer. Another module identified among male breast cancers, regulated by SPAG5, was closely associated with proliferation. Our data indicate that male and female breast cancers display highly different landscapes of candidate driver genes, as only a few genes were found in common between the two. Consequently, the pathobiology of male breast cancer may differ from that of female breast cancer and can be associated with differences in prognosis; men diagnosed with breast cancer may consequently require different management and treatment strategies than women.
Affiliation(s)
- Ida Johansson
- Division of Oncology, Department of Clinical Sciences, Lund and CREATE Health Strategic Center for Translational Cancer Research, Lund University, Lund, Sweden
- Markus Ringnér
- Division of Oncology, Department of Clinical Sciences, Lund and CREATE Health Strategic Center for Translational Cancer Research, Lund University, Lund, Sweden
- Ingrid Hedenfalk
- Division of Oncology, Department of Clinical Sciences, Lund and CREATE Health Strategic Center for Translational Cancer Research, Lund University, Lund, Sweden
|
18
|
Azad AKM, Lee H. Voting-based cancer module identification by combining topological and data-driven properties. PLoS One 2013; 8:e70498. [PMID: 23940583 PMCID: PMC3734239 DOI: 10.1371/journal.pone.0070498] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/09/2012] [Accepted: 06/19/2013] [Indexed: 12/19/2022] Open
Abstract
Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations, e.g. GE-GE, CNA-GE, and CNA-CNA, over multiple patients. Next, we propose a voting-based cancer module identification algorithm by combining topological and data-driven properties (VToD algorithm) by using the gene-gene relationship network as a source of data-driven information, and the PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Among 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Similarly, we also observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways from both cancer data sets found in our algorithm contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment p-values (0.64 for GBM and 0.49 for OVC). This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.
Affiliation(s)
- A. K. M. Azad
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
- Hyunju Lee
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
|
19
|
Leday GGR, van der Vaart AW, van Wieringen WN, van de Wiel MA. Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines. Ann Appl Stat 2013. [DOI: 10.1214/12-aoas605] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
20
|
Leday GG, van de Wiel MA. PLRS: a flexible tool for the joint analysis of DNA copy number and mRNA expression data. Bioinformatics 2013; 29:1081-2. [DOI: 10.1093/bioinformatics/btt082] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
|
21
|
Mesquita B, Lopes P, Rodrigues A, Pereira D, Afonso M, Leal C, Henrique R, Lind GE, Jerónimo C, Lothe RA, Teixeira MR. Frequent copy number gains at 1q21 and 1q32 are associated with overexpression of the ETS transcription factors ETV3 and ELF3 in breast cancer irrespective of molecular subtypes. Breast Cancer Res Treat 2013; 138:37-45. [PMID: 23329352 DOI: 10.1007/s10549-013-2408-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 09/17/2012] [Accepted: 01/07/2013] [Indexed: 01/03/2023]
Abstract
Several ETS transcription factors are involved in the pathogenesis of human cancers by different mechanisms. As gene copy number gain/amplification is an alternative mechanism of oncogenic activation and 1q gain is the most common copy number change in breast carcinoma, we investigated how that genomic change affects the expression of the three 1q ETS family members ETV3, ELK4, and ELF3. We first evaluated 141 breast carcinomas for genome-wide copy number changes by chromosomal CGH and showed that 1q21 and 1q32 were the two chromosome bands with the most frequent genomic copy number gains. Second, we confirmed by FISH with locus-specific BAC clones that cases showing 1q gain/amplification by CGH showed copy number increase of the ETS genes ETV3 (located in 1q21~23), ELF3, and ELK4 (both in 1q32). Third, gene expression levels of the three 1q ETS genes, as well as their potential targets MYC and CRISP3, were evaluated by quantitative real-time PCR. We here show for the first time that the most common genomic copy number gains in breast cancer, 1q21 and 1q32, are associated with overexpression of the ETS transcription factors ETV3 and ELF3 (but not ELK4) at these loci irrespective of molecular subtypes. Among the three 1q ETS genes, ELF3 has a relevant role in breast carcinogenesis and is also the most likely target of the 1q copy number increase. The basal-like molecular subtype presented the worst prognosis regarding disease-specific survival, but no additional prognostic value was found for 1q copy number status or ELF3 expression. In addition, we show that there is a correlation between the expression of the oncogene MYC, irrespective of copy number gain at its locus in 8q24, and the expression of both the transcriptional repressor ETV3 and the androgen respondent ELK4.
Affiliation(s)
- Bárbara Mesquita
- Department of Genetics, Portuguese Oncology Institute, Rua Dr. António Bernardino de Almeida, 4200-072 Porto, Portugal
|
22
|
Peters AA, Simpson PT, Bassett JJ, Lee JM, Da Silva L, Reid LE, Song S, Parat MO, Lakhani SR, Kenny PA, Roberts-Thomson SJ, Monteith GR. Calcium Channel TRPV6 as a Potential Therapeutic Target in Estrogen Receptor–Negative Breast Cancer. Mol Cancer Ther 2012; 11:2158-68. [DOI: 10.1158/1535-7163.mct-11-0965] [Citation(s) in RCA: 96] [Impact Index Per Article: 7.4] [Indexed: 11/16/2022]
|
23
|
van Wieringen WN, Unger K, Leday GGR, Krijgsman O, de Menezes RX, Ylstra B, van de Wiel MA. Matching of array CGH and gene expression microarray features for the purpose of integrative genomic analyses. BMC Bioinformatics 2012; 13:80. [PMID: 22559006 PMCID: PMC3475006 DOI: 10.1186/1471-2105-13-80] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 09/12/2011] [Accepted: 03/22/2012] [Indexed: 11/12/2022] Open
Abstract
Background: An increasing number of genomic studies interrogating more than one molecular level are being published. Bioinformatics follows biological practice, and recent years have seen a surge in methodology for the integrative analysis of genomic data. Often such analyses require knowledge of which elements of one platform link to those of another. Although this matching is important, many integrative analyses do not detail it, or detail it insufficiently. Results: We describe, illustrate and discuss six matching procedures. They are implemented in the R-package sigaR (available from Bioconductor). The principles underlying the presented matching procedures are generic, and can be combined to form new matching approaches or be applied to the matching of other platforms. Illustration of the matching procedures on a variety of data sets reveals how the procedures differ in the use of the available data, and may even lead to different results for individual genes. Conclusions: Matching of data from multiple genomics platforms is an important preprocessing step for many integrative bioinformatic analyses, for which we present six generic procedures, both old and new. They have been implemented in the R-package sigaR, available from Bioconductor.
Affiliation(s)
- Wessel N van Wieringen
- Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.
|
24
|
Cheng Q, Chang JT, Geradts J, Neckers LM, Haystead T, Spector NL, Lyerly HK. Amplification and high-level expression of heat shock protein 90 marks aggressive phenotypes of human epidermal growth factor receptor 2 negative breast cancer. Breast Cancer Res 2012; 14:R62. [PMID: 22510516 PMCID: PMC3446397 DOI: 10.1186/bcr3168] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Received: 11/21/2011] [Revised: 02/28/2012] [Accepted: 04/17/2012] [Indexed: 12/31/2022] Open
Abstract
Introduction: Although human epidermal growth factor receptor 2 (HER2) positive or estrogen receptor (ER) positive breast cancers are treated with clinically validated anti-HER2 or anti-estrogen therapies, intrinsic and acquired resistance to these therapies appears in a substantial proportion of breast cancer patients and new therapies are needed. Identification of additional molecular factors, especially those characterized by aggressive behavior and poor prognosis, could prioritize interventional opportunities to improve the diagnosis and treatment of breast cancer. Methods: We compiled a collection of 4,010 breast tumor gene expression data derived from 23 datasets that have been posted on the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database. We performed a genome-scale survival analysis using Cox-regression survival analyses, and validated using Kaplan-Meier Estimates survival and Cox Proportional-Hazards Regression survival analyses. We conducted a genome-scale analysis of chromosome alteration using 481 breast cancer samples obtained from The Cancer Genome Atlas (TCGA), from which combined expression and copy number data were available. We assessed the correlation between somatic copy number alterations and gene expression using analysis of variance (ANOVA). Results: Increased expression of each of the heat shock protein (HSP) 90 isoforms, as well as HSP transcriptional factor 1 (HSF1), was correlated with poor prognosis in different subtypes of breast cancer. High-level expression of HSP90AA1 and HSP90AB1, two cytoplasmic HSP90 isoforms, was driven by chromosome coding region amplifications and were independent factors that led to death from breast cancer among patients with triple-negative (TNBC) and HER2-/ER+ subtypes, respectively. Furthermore, amplification of HSF1 was correlated with higher HSP90AA1 and HSP90AB1 mRNA expression among the breast cancer cells without amplifications of these two genes. A collection of HSP90AA1, HSP90AB1 and HSF1 amplifications defined a subpopulation of breast cancer with up-regulated HSP90 gene expression, and up-regulated HSP90 expression independently elevated the risk of recurrence of TNBC and poor prognosis of HER2-/ER+ breast cancer. Conclusions: Up-regulated HSP90 mRNA expression represents a confluence of genomic vulnerability that renders HER2 negative breast cancers more aggressive, resulting in poor prognosis. Targeting breast cancer with up-regulated HSP90 may potentially improve the effectiveness of clinical intervention in this disease.
Affiliation(s)
- Qing Cheng
- Department of Surgery, Duke University Medical Center, Box 2606, 203 Research Drive, Durham, NC 27710, USA.
|
25
|
Lahti L, Schäfer M, Klein HU, Bicciato S, Dugas M. Cancer gene prioritization by integrative analysis of mRNA expression and DNA copy number data: a comparative review. Brief Bioinform 2012; 14:27-35. [PMID: 22441573 PMCID: PMC3548603 DOI: 10.1093/bib/bbs005] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022] Open
Abstract
A variety of genome-wide profiling techniques are available to investigate complementary aspects of genome structure and function. Integrative analysis of heterogeneous data sources can reveal higher level interactions that cannot be detected based on individual observations. A standard integration task in cancer studies is to identify altered genomic regions that induce changes in the expression of the associated genes based on joint analysis of genome-wide gene expression and copy number profiling measurements. In this review, we highlight common approaches to genomic data integration and provide a transparent benchmarking procedure to quantitatively compare method performances in cancer gene prioritization. Algorithms, data sets and benchmarking results are available at http://intcomp.r-forge.r-project.org.
Affiliation(s)
- Leo Lahti
- Wageningen University, Laboratory of Microbiology, 6703HB Wageningen, Netherlands.
|
26
|
Arneson N, Moreno J, Iakovlev V, Ghazani A, Warren K, McCready D, Jurisica I, Done SJ. Comparison of whole genome amplification methods for analysis of DNA extracted from microdissected early breast lesions in formalin-fixed paraffin-embedded tissue. ISRN Oncology 2012; 2012:710692. [PMID: 22530150 PMCID: PMC3317021 DOI: 10.5402/2012/710692] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/30/2011] [Accepted: 11/09/2011] [Indexed: 12/03/2022]
Abstract
To understand cancer progression, it is desirable to study the earliest stages of its development, which are often microscopic lesions. Array comparative genomic hybridization (aCGH) is a valuable high-throughput molecular approach for discovering DNA copy number changes; however, it requires a relatively large amount of DNA, which is difficult to obtain from microdissected lesions. Whole genome amplification (WGA) methods were developed to increase DNA quantity; however their reproducibility, fidelity, and suitability for formalin-fixed paraffin-embedded (FFPE) samples are questioned. Using aCGH analysis, we compared two widely used approaches for WGA: single cell comparative genomic hybridization protocol (SCOMP) and degenerate oligonucleotide primed PCR (DOP-PCR). Cancer cell line and microdissected FFPE breast cancer DNA samples were amplified by the two WGA methods and subjected to aCGH. The genomic profiles of amplified DNA were compared with those of non-amplified controls by four analytic methods and validated by quantitative PCR (Q-PCR). We found that SCOMP-amplified samples had close similarity to non-amplified controls with concordance rates close to those of reference tests, while DOP-amplified samples had a statistically significant amount of changes. SCOMP is able to amplify small amounts of DNA extracted from FFPE samples and provides quality of aCGH data similar to non-amplified samples.
Affiliation(s)
- Nona Arneson
- Division of Applied Molecular Oncology, Ontario Cancer Institute, Princess Margaret Hospital, Toronto, ON, Canada M5G 2M9
|
27
|
Ried T, Hu Y, Difilippantonio MJ, Ghadimi BM, Grade M, Camps J. The consequences of chromosomal aneuploidy on the transcriptome of cancer cells. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2012; 1819:784-93. [PMID: 22426433 DOI: 10.1016/j.bbagrm.2012.02.020] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 01/04/2012] [Revised: 02/28/2012] [Accepted: 02/29/2012] [Indexed: 01/09/2023]
Abstract
Chromosomal aneuploidies are a defining feature of carcinomas, i.e., tumors of epithelial origin. Such aneuploidies result in tumor specific genomic copy number alterations. The patterns of genomic imbalances are tumor specific, and to a certain extent specific for defined stages of tumor development. Genomic imbalances occur already in premalignant precursor lesions, i.e., before the transition to invasive disease, and their distribution is maintained in metastases, and in cell lines derived from primary tumors. These observations are consistent with the interpretation that tumor specific genomic imbalances are drivers of malignant transformation. Naturally, this precipitates the question of how such imbalances influence the expression of resident genes. A number of laboratories have systematically integrated copy number alterations with gene expression changes in primary tumors and metastases, cell lines, and experimental models of aneuploidy to address the question as to whether genomic imbalances deregulate the expression of one or few key genes, or rather affect the cancer transcriptome more globally. The majority of these studies showed that gene expression levels follow genomic copy number. Therefore, gross genomic copy number changes, including aneuploidies of entire chromosome arms and chromosomes, result in a massive deregulation of the transcriptome of cancer cells. This article is part of a Special Issue entitled: Chromatin in time and space.
Affiliation(s)
- Thomas Ried
- Genetics Branch, Center for Cancer Research, National Cancer Institute/NIH, USA.
|
28
|
Yakhini Z, Jurisica I. Cancer computational biology. BMC Bioinformatics 2011; 12:120. [PMID: 21521513 PMCID: PMC3111371 DOI: 10.1186/1471-2105-12-120] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/08/2011] [Accepted: 04/26/2011] [Indexed: 01/18/2023] Open
|