51
Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun 2021; 12:124. [PMID: 33402734 PMCID: PMC7785750 DOI: 10.1038/s41467-020-20430-7] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 03/25/2020] [Accepted: 12/02/2020] [Indexed: 01/08/2023]
Abstract
High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. To achieve proper integration, joint Dimensionality Reduction (jDR) methods are among the most efficient approaches. However, many jDR methods are available, creating the need for a comprehensive benchmark with practical guidelines. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performance in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strengths in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA performs effectively across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility and to support users and future developers.
Advances in omics technology have resulted in the generation of multi-view data for cancer samples. Here, the authors compare dimensionality reduction techniques using simulated and TCGA data and identify the features of the methods with superior performance.
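The first benchmark asks how well each method recovers known sample clusters. The abstract does not name the scoring metric, so the following is only an illustrative sketch using the plain Rand index (the fraction of sample pairs on which two clusterings agree); `rand_index` and the label vectors are hypothetical and are not part of momix.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which two clusterings agree.

    A pair agrees when both clusterings put the two samples together,
    or both put them apart; the label values themselves are irrelevant.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    if len(labels_true) < 2:
        raise ValueError("need at least two samples to form a pair")
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical ground truth versus two hypothetical jDR-derived clusterings.
truth = [0, 0, 1, 1]
print(rand_index(truth, [1, 1, 0, 0]))  # prints 1.0: same partition, relabelled
print(rand_index(truth, [0, 1, 0, 1]))  # about 0.333: ignores the true groups
```

In practice a chance-corrected variant such as the adjusted Rand index is often preferred, because the raw index can be high even for unrelated partitions.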
52
Castelletti F, La Rocca L, Peluso S, Stingo FC, Consonni G. Bayesian learning of multiple directed networks from observational data. Stat Med 2020; 39:4745-4766. [PMID: 32969059 DOI: 10.1002/sim.8751] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/02/2019] [Revised: 06/29/2020] [Accepted: 08/25/2020] [Indexed: 11/08/2022]
Abstract
Graphical modeling represents an established methodology for identifying complex dependencies in biological networks, as exemplified in the study of co-expression, gene regulatory, and protein interaction networks. The available observations often exhibit an intrinsic heterogeneity, which affects the network structure through the modification of specific pathways for distinct groups, such as disease subtypes. We propose to infer the resulting multiple graphs jointly in order to benefit from potential similarities across groups; at the same time, our modeling framework is able to accommodate group idiosyncrasies. We consider directed acyclic graphs (DAGs) as network structures, and develop a Bayesian method for structural learning of multiple DAGs. We explicitly account for Markov equivalence of DAGs, and propose a suitable prior on the collection of graph spaces that induces selective borrowing of strength across groups. The resulting inference allows, in particular, computation of the posterior probability of edge inclusion, a useful summary for representing flow directions within the network. Finally, we detail a simulation study addressing the comparative performance of our method, and present an analysis of two protein networks together with a substantive interpretation of our findings.
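Given a posterior sample of graphs, the posterior probability of edge inclusion is commonly estimated as the share of sampled DAGs that contain a given directed edge. A minimal sketch, assuming the posterior DAG draws are already available from some sampler (they are invented here); it does not reproduce the paper's multi-group prior or its handling of Markov equivalence, and `edge_inclusion_probabilities` is a hypothetical helper.

```python
from collections import Counter


def edge_inclusion_probabilities(sampled_dags):
    """Estimate P(edge present | data) as the share of posterior DAG draws containing it.

    Each draw is an iterable of directed edges given as (parent, child) tuples.
    """
    if not sampled_dags:
        raise ValueError("need at least one posterior draw")
    counts = Counter()
    for dag in sampled_dags:
        counts.update(set(dag))  # an edge counts at most once per draw
    n_draws = len(sampled_dags)
    return {edge: count / n_draws for edge, count in counts.items()}


# Four hypothetical posterior draws over proteins A, B, C.
draws = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "B")},
    {("B", "C")},
]
probs = edge_inclusion_probabilities(draws)
print(probs[("A", "B")])  # 0.75
print(probs[("B", "C")])  # 0.5
```

Because B→C and C→B are counted separately, the summary keeps directional information; edges never sampled are simply absent from the result, i.e. their estimated probability is 0.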
Affiliation(s)
- Federico Castelletti
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Luca La Rocca
- Department of Physics, Informatics and Mathematics, Università degli Studi di Modena e Reggio Emilia, Modena, Italy
- Stefano Peluso
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
- Francesco C Stingo
- Department of Statistics, Computer Science, Applications "G. Parenti", Università degli Studi di Firenze, Florence, Italy
- Guido Consonni
- Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
53
Park M, Kim D, Moon K, Park T. Integrative Analysis of Multi-Omics Data Based on Blockwise Sparse Principal Components. Int J Mol Sci 2020; 21:E8202. [PMID: 33147797 PMCID: PMC7663540 DOI: 10.3390/ijms21218202] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/22/2020] [Revised: 10/27/2020] [Accepted: 10/31/2020] [Indexed: 01/14/2023]
Abstract
The recent development of high-throughput technology has allowed us to accumulate vast amounts of multi-omics data. Because even single omics data have a large number of variables, integrated analysis of multi-omics data suffers from problems such as computational instability and variable redundancy. Most multi-omics data analyses apply single supervised analysis, repeatedly, for dimensional reduction and variable selection. However, these approaches cannot avoid the problems of redundancy and collinearity of variables. In this study, we propose a novel approach using blockwise component analysis. This would solve the limitations of current methods by applying variable clustering and sparse principal component (sPC) analysis. Our approach consists of two stages. The first stage identifies homogeneous variable blocks, and then extracts sPCs, for each omics dataset. The second stage merges sPCs from each omics dataset, and then constructs a prediction model. We also propose a graphical method showing the results of sparse PCA and model fitting, simultaneously. We applied the proposed methodology to glioblastoma multiforme data from The Cancer Genome Atlas. The comparison with other existing approaches showed that our proposed methodology is more easily interpretable than other approaches, and has comparable predictive power, with a much smaller number of variables.
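Within each homogeneous variable block, the first stage extracts sparse principal components. The abstract does not specify the sPCA algorithm, so this is only a sketch of one simple heuristic, soft-thresholded power iteration on a block's covariance matrix, assuming that covariance is already computed. The variable-clustering step and the prediction model are omitted, and `sparse_leading_pc`, the covariance values, and the threshold are hypothetical.

```python
import math


def sparse_leading_pc(cov, threshold, n_iter=100):
    """Leading sparse loading vector via soft-thresholded power iteration.

    Each step multiplies by the covariance matrix, shrinks every entry toward
    zero by `threshold` (entries smaller than it become exactly 0), and
    renormalizes the vector to unit length.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        u = [math.copysign(max(abs(x) - threshold, 0.0), x) for x in u]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("threshold too large: every loading was zeroed")
        v = [x / norm for x in u]
    return v


# Hypothetical block: variables 0 and 1 co-vary strongly; variable 2 is weak noise.
block_cov = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 0.1],
]
loadings = sparse_leading_pc(block_cov, threshold=0.2)
print([round(x, 4) for x in loadings])  # [0.7071, 0.7071, 0.0]
```

Larger thresholds zero out more loadings; in a real analysis the threshold would be tuned (for example by cross-validation) rather than fixed.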
Affiliation(s)
- Mira Park
- Department of Preventive Medicine, Eulji University, Daejeon 34824, Korea
- Doyoen Kim
- Department of Statistics, Korea University, Seoul 02841, Korea
- Kwanyoung Moon
- Department of Statistics, Korea University, Seoul 02841, Korea
- Taesung Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
54
A system-level approach identifies HIF-2α as a critical regulator of chondrosarcoma progression. Nat Commun 2020; 11:5023. [PMID: 33024108 PMCID: PMC7538956 DOI: 10.1038/s41467-020-18817-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/24/2017] [Accepted: 09/11/2020] [Indexed: 12/18/2022]
Abstract
Chondrosarcomas, malignant cartilaginous neoplasms, are capable of transitioning to highly aggressive, metastatic, and treatment-refractory states, resulting in significant patient mortality. Here, we aim to uncover the transcriptional program directing such tumor progression in chondrosarcomas. We conduct weighted correlation network analysis to extract a characteristic gene module underlying chondrosarcoma malignancy. Hypoxia-inducible factor-2α (HIF-2α, encoded by EPAS1) is identified as an upstream regulator that governs the malignancy gene module. HIF-2α is upregulated in high-grade chondrosarcoma biopsies, and EPAS1 gene amplification is associated with poor prognosis in chondrosarcoma patients. Using tumor xenograft mouse models, we demonstrate that HIF-2α confers on chondrosarcomas the capacities required for tumor growth, local invasion, and metastasis. Meanwhile, pharmacological inhibition of HIF-2α, in conjunction with chemotherapy agents, synergistically enhances chondrosarcoma cell apoptosis and abolishes malignant signatures of chondrosarcoma in mice. We expect that our insights into the pathogenesis of chondrosarcoma will provide guidelines for the development of molecular targeted therapeutics for chondrosarcoma.
Chondrosarcomas are frequently aggressive; understanding the transcriptional changes associated with progression may help in developing new treatments. Here, the authors show that HIF-2α expression increases with progression and that pharmacological inhibition of the protein together with chemotherapy is a useful strategy for controlling tumour growth in mice.
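Weighted correlation network analysis (WGCNA) builds a gene network whose edge weights are soft-thresholded correlations rather than hard 0/1 links. A minimal sketch of that adjacency step, assuming the common unsigned form |cor|^β with β = 6 chosen purely for illustration; module detection (topological overlap and hierarchical clustering) is omitted, and the gene profiles and helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0.0:
        raise ValueError("constant profile: correlation undefined")
    return sum(a * b for a, b in zip(dx, dy)) / den


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta."""
    g = len(profiles)
    return [[abs(pearson(profiles[i], profiles[j])) ** beta for j in range(g)]
            for i in range(g)]


genes = [
    [1.0, 2.0, 3.0, 4.0],    # gene 1
    [2.0, 4.0, 6.0, 8.0],    # gene 2: perfectly correlated with gene 1
    [1.0, -1.0, 1.0, -1.0],  # gene 3: weakly (negatively) correlated with gene 1
]
adj = soft_threshold_adjacency(genes)
print(round(adj[0][1], 6), round(adj[0][2], 6))  # 1.0 0.008
```

Raising correlations to a power keeps the perfectly correlated pair at weight 1 while shrinking the weaker |r| ≈ 0.45 link to about 0.008, which is how soft thresholding emphasizes strong co-expression without a hard cutoff.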
55
Targeted sequencing of crucial cancer causing genes of breast cancer in Saudi patients. Saudi J Biol Sci 2020; 27:2651-2659. [PMID: 32994724 PMCID: PMC7499116 DOI: 10.1016/j.sjbs.2020.05.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/10/2019] [Revised: 05/26/2020] [Accepted: 05/29/2020] [Indexed: 11/20/2022]
Abstract
Breast cancer is the most common cancer among women worldwide, causing 15% of cancer-related deaths among women. Breast cancer incidence is increasing in most countries. In Saudi Arabia, breast cancer constitutes nearly 22% of newly diagnosed cancer cases in women. Breast cancer incidence in the female population of Saudi Arabia is 25.9%, with 18.2% mortality. In this study, targeted sequencing of 164 selected genes was performed on germline and somatic DNA derived from the blood and tissue samples of 50 breast cancer patients using a customized panel on the Ion Torrent platform. This study focused on the identification of genetic variations in different cancer-causing genes, raising hope for personalized prognosis. After final filtration and validation, we found protein-truncating, non-synonymous missense, and splice-site mutations in known breast cancer susceptibility genes. We identified a total of 14 point mutations and one deletion in the BRCA1, BRCA2, and RAD50 genes from the BRCA panel analysis of breast cancer samples. In the customized panel analysis, we identified 37 potential mutations in 25 breast cancer risk-associated genes; TP53 harbored the largest number of these. After filtration, we observed 7 mutations in TP53 (n = 7: one stop gain (p.R81X), four non-synonymous (p.R81X, p.Y88C, p.R141H, and p.V25D), and two deletions (c.59delC and c.327delC)). Among the mutations detected in our study, the TP53 (p.R81X), VHL (p.E52X), and BRCA2 (p.K3326X) mutations, which lead to an aberrant transcript with a premature stop codon, were reported for the first time in breast cancer patients from Saudi Arabia. Our study will help in identifying damaging mutations and predisposing genes in Saudi breast cancer patients.
56
Addiction to protein kinase Cι due to PRKCI gene amplification can be exploited for an aptamer-based targeted therapy in ovarian cancer. Signal Transduct Target Ther 2020; 5:140. [PMID: 32820156 PMCID: PMC7441162 DOI: 10.1038/s41392-020-0197-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/12/2019] [Revised: 05/01/2020] [Accepted: 05/22/2020] [Indexed: 12/18/2022]
Abstract
PRKCI, the gene for protein kinase Cι (PKCι), is frequently amplified in ovarian cancer, and recent studies have shown that PKCι participates in ovarian tumorigenesis. However, it is unknown whether PKCι is differentially involved in the growth/survival of PRKCI-amplified and non-amplified ovarian cancer cells. In this study, we analyzed an ovarian cancer patient dataset and revealed that PRKCI is the only PKC family member significantly amplified in ovarian cancer and that PRKCI amplification is associated with higher PKCι expression. Using a panel of ovarian cancer cell lines, we found that PKCι abundance is generally associated with PRKCI amplification. Interestingly, silencing PKCι led to apoptosis in PRKCI-amplified ovarian cancer cells but not in those without PRKCI amplification, indicating an oncogenic addiction to PKCι in PRKCI-amplified cells. Because small-molecule inhibitors characterized as selective blockers of atypical PKCs offered neither selectivity nor sensitivity in PRKCI-amplified ovarian cancer cells, and were even cytotoxic to non-cancerous ovarian surface or fallopian tube epithelial cells, we designed an EpCAM aptamer-PKCι siRNA chimera (EpCAM-siPKCι aptamer). The EpCAM-siPKCι aptamer not only effectively induced apoptosis of PRKCI-amplified ovarian cancer cells but also greatly deterred intraperitoneal tumor development in a xenograft mouse model. This study demonstrates a precision medicine-based strategy to target a subset of ovarian cancer that contains PRKCI amplification and shows that EpCAM aptamer-delivered PKCι siRNA may be used to suppress such tumors.
57
Kaushik AC, Wang YJ, Wang X, Wei DQ. Irinotecan and vandetanib create synergies for treatment of pancreatic cancer patients with concomitant TP53 and KRAS mutations. Brief Bioinform 2020; 22:5879228. [PMID: 32743640 DOI: 10.1093/bib/bbaa149] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 04/28/2020] [Revised: 06/01/2020] [Accepted: 06/14/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND The most frequently mutated gene pair in pancreatic adenocarcinoma (PAAD) is KRAS and TP53, and our goal is to illustrate the multiomics and molecular dynamics landscapes of KRAS/TP53 mutation and to identify prospective novel drugs for KRAS- and TP53-mutated PAAD patients. We also attempted to discover the probable link between KRAS and TP53 on the basis of these multiomics data. METHOD We used TCGA and Cancer Cell Line Encyclopedia data to analyze KRAS/TP53 mutation in a multiomics manner. In addition, we performed molecular dynamics analysis of KRAS and TP53 to produce mechanistic descriptions of particular mutations and carcinogenesis. RESULT We find significant differences in the genomic, transcriptomic, methylomic, and molecular dynamics patterns of KRAS and TP53 mutants compared with the matching wild type in PAAD, and the prognosis of pancreatic cancer is directly linked with a particular KRAS mutation and protein stability. The screened drugs are potentially effective in PAAD patients. CONCLUSIONS The prognosis of KRAS- and TP53-mutated PAAD is directly associated with a specific KRAS mutation. Irinotecan and vandetanib are prospective drugs for PAAD patients with the KRAS G12D mutation and a TP53 mutation.
Affiliation(s)
- Yan-Jing Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Xiangeng Wang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
- Dong-Qing Wei
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University
58
Swanson DM, Lien T, Bergholtz H, Sørlie T, Frigessi A. A Bayesian two-way latent structure model for genomic data integration reveals few pan-genomic cluster subtypes in a breast cancer cohort. Bioinformatics 2020; 35:4886-4897. [PMID: 31077301 DOI: 10.1093/bioinformatics/btz381] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/13/2018] [Revised: 04/05/2019] [Accepted: 05/01/2019] [Indexed: 01/09/2023]
Abstract
MOTIVATION Unsupervised clustering is important in disease subtyping, among other genomic applications. As genomic data have become more multifaceted, how to cluster across data sources for more precise subtyping is an increasingly important area of research. Many of the methods proposed so far, including iCluster and Cluster of Cluster Assignments (COCAs), make the unreasonable assumption of a common clustering across all data sources, and those that do not are fewer and tend to be computationally intensive. RESULTS We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, distinguishing it from methods like COCAs and iCluster, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources. A common scaling across data sources is not required, and inference is obtained by a Gibbs sampler, which we improve with a warm-start strategy and modified density functions to robustify and speed up convergence. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. An interesting statistical formulation of the model results in sampling from closed-form posteriors despite the incorporation of a complex latent structure. We fit the model with Gaussian and more general densities, which influences the degree of across-dataset cluster label sharing. Uniquely among integrative clustering models, our formulation makes no nestedness assumptions of samples across data sources, so that a sample missing data from one genomic source can be clustered according to its existing data sources. We apply our model to a Norwegian breast cancer cohort of ductal carcinoma in situ and invasive tumors, comprising somatic copy-number alteration, methylation and expression datasets.
We find enrichment in the Her2 subtype and ductal carcinoma among those observations exhibiting greater cluster correspondence across expression and CNA data. In general, there are few pan-genomic clusterings, suggesting that models assuming a common clustering across genomic data sources might yield misleading results. AVAILABILITY AND IMPLEMENTATION The model is implemented in an R package called twl ('two-way latent'), available on CRAN. Data for analysis are available within the R package. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David M Swanson
- Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- Tonje Lien
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Helga Bergholtz
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Therese Sørlie
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Arnoldo Frigessi
- Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway; Oslo Centre for Biostatistics and Epidemiology, University of Oslo, Oslo, Norway
59
A survey on single and multi omics data mining methods in cancer data classification. J Biomed Inform 2020; 107:103466. [DOI: 10.1016/j.jbi.2020.103466] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/11/2019] [Revised: 05/01/2020] [Accepted: 05/31/2020] [Indexed: 01/09/2023]
60
Ruan P, Wang Y, Shen R, Wang S. Using association signal annotations to boost similarity network fusion. Bioinformatics 2020; 35:3718-3726. [PMID: 30863842 DOI: 10.1093/bioinformatics/btz124] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/20/2018] [Revised: 01/17/2019] [Accepted: 02/15/2019] [Indexed: 01/13/2023]
Abstract
MOTIVATION Recent technology developments have made it possible to generate various kinds of omics data, which provides opportunities to better solve problems such as disease subtyping or disease mapping using more comprehensive omics data jointly. Among many developed data-integration methods, the similarity network fusion (SNF) method has shown great potential to identify new disease subtypes by separating similar subjects using multi-omics data. SNF effectively fuses similarity networks with pairwise patient similarity measures from different types of omics data into one fused network using both shared and complementary information across multiple types of omics data. RESULTS In this article, we propose an association-signal-annotation boosted similarity network fusion (ab-SNF) method, which adds feature-level association signal annotations as weights, aiming to up-weight signal features and down-weight noise features when constructing subject similarity networks, in order to boost performance in disease subtyping. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that the proposed ab-SNF method consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful. AVAILABILITY AND IMPLEMENTATION The R package abSNF is freely available for downloading from https://github.com/pfruan/abSNF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
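The effect of feature weights on a subject similarity network can be illustrated on a single omics layer. A minimal sketch, assuming the feature weights are already given (the abSNF package's actual weighting scheme is not reproduced) and using a plain Gaussian kernel with a fixed bandwidth instead of SNF's locally scaled kernel; the cross-omics fusion iterations are omitted, and all names and data are hypothetical.

```python
import math


def weighted_distance(x, y, weights):
    """Euclidean distance with each squared feature difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def similarity_matrix(samples, weights, sigma=1.0):
    """Gaussian-kernel subject similarity; sigma is a hypothetical fixed bandwidth."""
    n = len(samples)
    return [[math.exp(-weighted_distance(samples[i], samples[j], weights) ** 2
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


# Feature 0 is assumed to carry the association signal; feature 1 is noise.
samples = [[0.0, 0.0], [0.0, 5.0], [3.0, 0.0]]  # subjects A, B, C
unweighted = similarity_matrix(samples, [1.0, 1.0])
weighted = similarity_matrix(samples, [1.0, 0.01])  # down-weight the noise feature

names = ["A", "B", "C"]
for label, sim in [("unweighted", unweighted), ("weighted", weighted)]:
    nearest = max((j for j in range(3) if j != 0), key=lambda j: sim[0][j])
    print(label, "nearest neighbor of A:", names[nearest])
# unweighted nearest neighbor of A: C
# weighted nearest neighbor of A: B
```

Without weights, the large noise difference between A and B dominates; once the noise feature is down-weighted, A's nearest neighbor becomes the subject that matches it on the signal feature.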
Affiliation(s)
- Peifeng Ruan
- Department of Statistics, Columbian College of Arts and Sciences, The George Washington University, Washington, DC, USA
- Ya Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Ronglai Shen
- Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
61
Beauvais M, Knoppers BM. When information is the treatment? Precision medicine in healthcare. Healthc Manage Forum 2020; 33:120-125. [PMID: 31505971 DOI: 10.1177/0840470419859017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Precision medicine is profoundly more data-intensive than conventional medicine, and its distinctive informational needs present new challenges for healthcare management. Data protection and privacy law are key determinants of precision medicine's future. This article examines legal and regulatory barriers to the incorporation of precision medicine into healthcare. Specific attention is paid to analyzing recent health privacy laws, court cases, and medical device regulations. Considering the challenges identified, recommendations and guidance are crafted for health leaders with reference to domestic and international initiatives.
Affiliation(s)
- Michael Beauvais
- Centre of Genomics and Policy, McGill University, Montreal, Quebec, Canada
- Bartha Maria Knoppers
- Centre of Genomics and Policy, McGill University, Montreal, Quebec, Canada
- Faculty of Medicine, McGill University, Montreal, Quebec, Canada
62
Lemsara A, Ouadfel S, Fröhlich H. PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data. BMC Bioinformatics 2020; 21:146. [PMID: 32299344 PMCID: PMC7161108 DOI: 10.1186/s12859-020-3465-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/13/2019] [Accepted: 03/23/2020] [Indexed: 02/08/2023]
Abstract
Background Recent years have witnessed an increasing interest in multi-omics data, because these data allow for a better understanding of complex diseases such as cancer at the molecular system level. In addition, multi-omics data increase the chance of robustly identifying molecular patient sub-groups and hence open the door towards better personalized treatment of diseases. Several methods have been proposed for unsupervised clustering of multi-omics data. However, a number of challenges remain, such as the large number of features and the large difference in dimensionality across different omics data sources. Results We propose a multi-modal sparse denoising autoencoder framework coupled with sparse non-negative matrix factorization to robustly cluster patients based on multi-omics data. The proposed model specifically leverages pathway information to effectively reduce the dimensionality of omics data into a pathway- and patient-specific score profile. As a consequence, our method allows us to understand which pathway is a feature of which particular patient cluster. Moreover, recently proposed machine learning techniques allow us to disentangle the specific impact of each individual omics feature on a pathway score. We applied our method to cluster patients in several cancer datasets using gene expression, miRNA expression, DNA methylation and CNVs, demonstrating the possibility of obtaining biologically plausible disease subtypes characterized by specific molecular features. Comparison against several competing methods showed competitive clustering performance. In addition, post-hoc analysis of somatic mutations and clinical data provided supporting evidence and interpretation of the identified clusters. Conclusions Our suggested multi-modal sparse denoising autoencoder approach allows for an effective and interpretable integration of multi-omics data at the pathway level while addressing the high-dimensional character of omics data.
Patient specific pathway score profiles derived from our model allow for a robust identification of disease subgroups.
Affiliation(s)
- Amina Lemsara
- Computer Science Department, University of Constantine 2, 25016, Constantine, Algeria
- Salima Ouadfel
- Computer Science Department, University of Constantine 2, 25016, Constantine, Algeria
- Holger Fröhlich
- University of Bonn, Bonn-Aachen International Center for IT, 53115 Bonn, Germany; Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), 53754 Sankt Augustin, Germany
63
Precision medicine and management of rheumatoid arthritis. J Autoimmun 2020; 110:102405. [PMID: 32276742 DOI: 10.1016/j.jaut.2020.102405] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 01/03/2020] [Accepted: 01/05/2020] [Indexed: 12/20/2022]
Abstract
Precision medicine (PM) is a very commonly used term that implies a highly individualized and tailored approach to patient management. There are, however, many layers of precision; for example, taking an appropriate patient history or performing additional laboratory or imaging tests already helps to better tailor treatments to the right patient. All this adds to the narrower definition of PM, which implies using the unique molecular characteristics of a patient for management decisions. Big data has become an essential part of PM, including as much information as possible to improve the precision of disease management, although integration of multi-source data continues to be a challenge in practical application. In research, big data can identify new (sub-)phenotypes in unsupervised analyses, which ultimately advances precision by enabling new targeted therapeutic approaches. We will discuss the current status of PM in rheumatoid arthritis (RA) in the management areas of diagnosis, prognosis, selection of therapy, and decision to reduce therapy. PM markers for diagnosis of RA are usually markers of RA classification rather than diagnosis, and subtypes of RA are potentially underrecognized. Prognostic precision is well established for RA, including markers of disease activity or structure, as well as autoantibodies and genetics. The choice of the right compound for a patient identified as having a poor prognosis, however, remains largely arbitrary. Finally and most recently, the most reliable markers for a safe withdrawal of therapy continue to be lower levels of disease activity and a longer duration of remission.
64
Liu S, Yang Z, Li G, Li C, Luo Y, Gong Q, Wu X, Li T, Zhang Z, Xing B, Xu X, Lu X. Multi-omics Analysis of Primary Cell Culture Models Reveals Genetic and Epigenetic Basis of Intratumoral Phenotypic Diversity. Genomics Proteomics Bioinformatics 2020; 17:576-589. [PMID: 32205176 PMCID: PMC7212478 DOI: 10.1016/j.gpb.2018.07.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/21/2017] [Revised: 05/29/2018] [Accepted: 07/24/2018] [Indexed: 12/27/2022]
Abstract
Uncovering the functionally essential variations related to tumorigenesis and tumor progression from cancer genomics data is still challenging due to the genetic diversity among patients and extensive inter- and intra-tumoral heterogeneity at different levels of gene expression regulation, including but not limited to the genomic, epigenomic, and transcriptional levels. To minimize the impact of germline genetic heterogeneity, in this study we establish multiple primary cultures from the primary and recurrent tumors of a single patient with hepatocellular carcinoma (HCC). Multi-omics sequencing was performed on these cultures, which encompass the diversity of tumor cells from the same patient. Variations in genome sequence, epigenetic modification, and gene expression are used to infer the phylogenetic relationships of these cell cultures. We find discrepancies between the relationships revealed by single nucleotide variations (SNVs) and those revealed by the transcriptional/epigenomic profiles of the cell cultures. We find no overlap between sample-specific mutated genes and differentially expressed genes (DEGs), suggesting that most of the heterogeneous SNVs among tumor stages or lineages of the patient are functionally insignificant. Moreover, copy number alterations (CNAs) and DNA methylation variation within gene bodies, rather than promoters, are significantly correlated with gene expression variability among these cell cultures. Pathway analysis of CNA/DNA methylation-related genes indicates that a single cell clone from the recurrent tumor exhibits distinct cellular characteristics and tumorigenicity, an observation further confirmed by cellular experiments both in vitro and in vivo. Our systematic analysis reveals that CNAs and epigenomic changes, rather than SNVs, are more likely to contribute to the phenotypic diversity among subpopulations in the tumor.
These findings suggest that new therapeutic strategies targeting gene dosage and epigenetic modification should be considered in personalized cancer medicine. This culture model may be applied to the further identification of plausible determinants of cancer metastasis and relapse.
Affiliation(s)
- Sixue Liu
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (2)University of Chinese Academy of Sciences, Beijing 100049, China
- Zuyu Yang
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (3)Invasive Pathogens Laboratory, Institute of Environmental Science and Research, Porirua 5022, Wellington, New Zealand
| | - Guanghao Li
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (2)University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chunyan Li
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (2)University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yanting Luo
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (2)University of Chinese Academy of Sciences, Beijing 100049, China
| | - Qiang Gong
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Xin Wu
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Tao Li
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (2)University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhiqian Zhang
- (4)Department of Cell Biology, Key Laboratory of Carcinogenesis and Translational Research, Center for Molecular and Translational Medicine, Peking University Cancer Hospital and Institute, Beijing 100142, China
| | - Baocai Xing
- (5)Department of Hepatobiliary Surgery I, Peking University Cancer Hospital and Institute, Beijing 100142, China
| | - Xiaolan Xu
- (6)National Key Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.
| | - Xuemei Lu
- (1)CAS Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; (2)University of Chinese Academy of Sciences, Beijing 100049, China; (7)CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China.
| |
Collapse
|
65
Park S, Xu H, Zhao H. Integrating Multidimensional Data for Clustering Analysis With Applications to Cancer Patient Data. J Am Stat Assoc 2020; 116:14-26. [DOI: 10.1080/01621459.2020.1730853] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/28/2022]
Affiliation(s)
- Seyoung Park: Department of Statistics, Sungkyunkwan University, Seoul, Korea
- Hao Xu: Department of Biostatistics, Yale School of Public Health, New Haven, CT
- Hongyu Zhao: Department of Biostatistics, Yale School of Public Health, New Haven, CT
66
Di Nanni N, Bersanelli M, Milanesi L, Mosca E. Network Diffusion Promotes the Integrative Analysis of Multiple Omics. Front Genet 2020; 11:106. [PMID: 32180795 PMCID: PMC7057719 DOI: 10.3389/fgene.2020.00106] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/31/2019] [Accepted: 01/29/2020] [Indexed: 02/01/2023]
Abstract
The development of integrative methods is one of the main challenges in bioinformatics. Network-based methods for the analysis of multiple gene-centered datasets take into account known and/or inferred relations between genes. Over recent decades, the mathematical machinery of network diffusion, also referred to as network propagation, has been exploited in several network-based pipelines thanks to its ability to amplify associations between genes that lie in network proximity. Indeed, network diffusion provides a quantitative estimate of network proximity between genes associated with one or more data types, ranging from simple binary vectors to real-valued vectors. Consequently, this powerful data transformation has also been increasingly used in integrative analyses of multiple collections of biological scores and/or one or more interaction networks. We present an overview of the state of the art of bioinformatics pipelines that use network diffusion processes for the integrative analysis of omics data. We discuss the fundamental ways in which network diffusion is exploited, open issues, and potential developments in the field. Current trends suggest that network diffusion is a tool of broad utility in omics data analysis. It is reasonable to expect that it will continue to be used and further refined as new data types arise (e.g., single-cell datasets) and as the identification of system-level patterns becomes increasingly important in omics data analysis.
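To make the idea of network diffusion concrete, here is a minimal sketch of one widely used formulation: an iterative, random-walk-with-restart style update F ← αWF + (1 − α)X₀, where W is the symmetrically normalized adjacency matrix D^(-1/2) A D^(-1/2) and X₀ holds the initial gene scores. The toy graph, the gene names, α = 0.7, and the convergence tolerance are illustrative assumptions; the pipelines surveyed in the review use a variety of operators and normalizations, so this is not the method of any specific one.

```python
from math import sqrt


def normalize_adjacency(adj):
    """Return W = D^-1/2 A D^-1/2 as nested dicts for an undirected graph.

    `adj` maps each node to the set of its neighbours (must be symmetric).
    """
    deg = {node: len(nbrs) for node, nbrs in adj.items()}
    return {
        node: {nbr: 1.0 / sqrt(deg[node] * deg[nbr]) for nbr in nbrs}
        for node, nbrs in adj.items()
    }


def diffuse(adj, seeds, alpha=0.7, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * X0 until the update is below tol.

    `seeds` maps nodes to initial scores (X0); unlisted nodes start at 0.
    """
    w = normalize_adjacency(adj)
    x0 = {node: float(seeds.get(node, 0.0)) for node in adj}
    f = dict(x0)
    for _ in range(max_iter):
        new_f = {
            node: alpha * sum(w_ij * f[nbr] for nbr, w_ij in w[node].items())
            + (1 - alpha) * x0[node]
            for node in adj
        }
        delta = max(abs(new_f[node] - f[node]) for node in adj)
        f = new_f
        if delta < tol:
            break
    return f


# Hypothetical toy network: a chain A-B-C-D plus an isolated gene E.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
scores = diffuse(adj, {"A": 1.0})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 3))
```

In this toy setting, genes closer to the seed receive higher scores and the disconnected gene stays at zero, illustrating how diffusion amplifies associations between genes in network proximity.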
Affiliation(s)
- Noemi Di Nanni: Institute of Biomedical Technologies, National Research Council, Milan, Italy; Department of Industrial and Information Engineering, University of Pavia, Pavia, Italy
- Matteo Bersanelli: Department of Physics and Astronomy, University of Bologna, Bologna, Italy; National Institute of Nuclear Physics (INFN), Bologna, Italy
- Luciano Milanesi: Institute of Biomedical Technologies, National Research Council, Milan, Italy
- Ettore Mosca: Institute of Biomedical Technologies, National Research Council, Milan, Italy
67
Serafini MS, Lopez-Perez L, Fico G, Licitra L, De Cecco L, Resteghini C. Transcriptomics and Epigenomics in head and neck cancer: available repositories and molecular signatures. Cancers of the Head & Neck 2020; 5:2. [PMID: 31988797 PMCID: PMC6971871 DOI: 10.1186/s41199-020-0047-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/03/2019] [Indexed: 02/06/2023]
Abstract
For many years, head and neck squamous cell carcinoma (HNSCC) was considered a single entity. However, over recent decades the complexity and heterogeneity of HNSCC have been recognized. In parallel, high-throughput omics techniques have made it possible to capture a broader spectrum of the behavior and characteristics of molecules in cancer, and a large set of web-based omics tools and informative repository databases has been developed. The objective of the present review is to provide an overview of biological, prognostic, and predictive molecular signatures in HNSCC. To contextualize the selected data, our literature survey includes a short summary of the main characteristics of omics data repositories and web tools for data analysis. The timeframe of our analysis was fixed, encompassing papers published between January 2015 and January 2019. Of more than 1000 papers evaluated, 61 omics studies were selected: 33 investigating mRNA signatures, 11 and 13 related to miRNA and other non-coding RNA signatures, respectively, and 4 analyzing DNA methylation signatures. More than half of the identified signatures (36) had prognostic value, but only 10 studies selected a specific anatomical sub-site (8 oral cavity, 1 oropharynx, and 1 both oral cavity and oropharynx). Notably, although the sample size in many studies was limited, about one-half of the retrieved studies reported external validation on independent dataset(s), strengthening the relevance of the obtained data. Finally, we highlighted the development and exploitation of three gene-expression signatures whose clinical impact on prognosis and prediction of treatment response could be high. Based on this overview of the omics literature in HNSCC, we identified some limitations and strengths.
The major limitations are the low number of signatures associated with DNA methylation and non-coding RNA (miRNA, lncRNA, and piRNA) and the availability of only a single dataset with multiple omics on more than 500 HNSCC cases (i.e., TCGA). The major strengths lie in the integration of multiple datasets through meta-analysis approaches and in the growing integration of omics data obtained from the same cohort of patients. Moreover, new approaches based on artificial intelligence and informatics analyses are expected to become available in the near future.
Affiliation(s)
- Mara S Serafini: (1) Integrated Biology Platform, Department of Applied Research and Technology Development, Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milan, Italy
- Laura Lopez-Perez: (2) Life Supporting Technologies, Universidad Politécnica de Madrid, Madrid, Spain
- Giuseppe Fico: (2) Life Supporting Technologies, Universidad Politécnica de Madrid, Madrid, Spain
- Lisa Licitra: (3) Head and Neck Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milan, Italy; (4) University of Milan, Milan, Italy
- Loris De Cecco: (1) Integrated Biology Platform, Department of Applied Research and Technology Development, Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milan, Italy
- Carlo Resteghini: (3) Head and Neck Medical Oncology Department, Fondazione IRCCS Istituto Nazionale dei Tumori di Milano, Milan, Italy
68
Kang M, Gao J. Integration of Multi-omics Data for Expression Quantitative Trait Loci (eQTL) Analysis and eQTL Epistasis. Methods Mol Biol 2020; 2082:157-171. [PMID: 31849014 DOI: 10.1007/978-1-0716-0026-9_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Expression quantitative trait loci (eQTL) mapping studies identify genetic loci that regulate gene expression. eQTL mapping studies can capture gene regulatory interactions and provide insight into the genetic mechanisms of biological systems. Recently, the integration of multi-omics data, such as single-nucleotide polymorphisms (SNPs), copy number variations (CNVs), DNA methylation, and gene expression, has come to play an important role in elucidating complex biological systems, since these systems involve a sequence of complex interactions between various biological processes. This chapter introduces multi-omics data that have been used in many eQTL studies and integrative methodologies that incorporate multi-omics data into eQTL studies. Furthermore, we describe a statistical approach that can detect nonlinear causal relationships between eQTLs, called eQTL epistasis, and discuss its importance.
Affiliation(s)
- Mingon Kang: Department of Computer Science, University of Nevada, Las Vegas, Las Vegas, NV, USA
- Jean Gao: Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA
69
Dalgleish JL, Wang Y, Zhu J, Meltzer PS. CNVScope: Visually Exploring Copy Number Aberrations in Cancer Genomes. Cancer Inform 2019; 18:1176935119890290. [PMID: 31832011 PMCID: PMC6887803 DOI: 10.1177/1176935119890290] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/14/2019] [Accepted: 10/30/2019] [Indexed: 12/12/2022]
Abstract
Motivation: DNA copy number (CN) data are a fast-growing source of information used in basic and translational cancer research. Most CN segmentation data are presented without regard to the relationships between chromosomal regions. We offer both a toolkit that helps scientists without programming experience visually explore the CN interactome and a package that constructs CN interactomes from publicly available data sets. Results: The CNVScope visualization, based on a publicly available neuroblastoma CN data set, clearly displays a distinct CN interaction in the region of MYCN, a canonical, frequently amplified target in this cancer. Exploration of the data rapidly identified cis and trans events, including a strong anticorrelation between 11q loss and 17q gain, with the region of 11q loss bounded by the cell cycle regulator CCND1. Availability: The Shiny application is readily available at http://cnvscope.nci.nih.gov/, and the package can be downloaded from CRAN (https://cran.r-project.org/package=CNVScope), where help pages and vignettes are located. A newer version is available on the GitHub site (https://github.com/jamesdalg/CNVScope/), which features an animated tutorial. The CNVScope package can be installed locally on Windows and Macintosh systems following the instructions on the GitHub site. This CN analysis package also runs on a Linux high-performance computing cluster, with options for multinode and multiprocessor analysis of CN variant data. The Shiny application can be started with a single command (which automatically installs the public data package).
Affiliation(s)
- James Lt Dalgleish: Genetics Branch, National Cancer Institute, Center for Cancer Research, National Institutes of Health, Bethesda, MD, USA
- Yonghong Wang: Genetics Branch, National Cancer Institute, Center for Cancer Research, National Institutes of Health, Bethesda, MD, USA
- Jack Zhu: Genetics Branch, National Cancer Institute, Center for Cancer Research, National Institutes of Health, Bethesda, MD, USA
- Paul S Meltzer: Genetics Branch, National Cancer Institute, Center for Cancer Research, National Institutes of Health, Bethesda, MD, USA
70
Simidjievski N, Bodnar C, Tariq I, Scherer P, Andres Terre H, Shams Z, Jamnik M, Liò P. Variational Autoencoders for Cancer Data Integration: Design Principles and Computational Practice. Front Genet 2019; 10:1205. [PMID: 31921281 PMCID: PMC6917668 DOI: 10.3389/fgene.2019.01205] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 07/29/2019] [Accepted: 10/31/2019] [Indexed: 12/27/2022]
Abstract
International initiatives such as the Molecular Taxonomy of Breast Cancer International Consortium are collecting multiple data sets at different genome scales with the aim of identifying novel cancer biomarkers and predicting patient survival. To analyze such data, several machine learning, bioinformatics, and statistical methods have been applied, among them neural networks such as autoencoders. Although these models provide a good statistical learning framework for analyzing multi-omic and/or clinical data, there is a distinct lack of work on how to integrate diverse patient data and identify the design best suited to the available data. In this paper, we investigate several autoencoder architectures that integrate a variety of cancer patient data types (e.g., multi-omics and clinical data). We perform extensive analyses of these approaches and provide a clear methodological and computational framework for designing systems that enable clinicians to investigate cancer traits and translate the results into clinical applications. We demonstrate how these networks can be designed, built, and, in particular, applied to tasks of integrative analysis of heterogeneous breast cancer data. The results show that these approaches yield relevant data representations that, in turn, lead to accurate and stable diagnosis.
Affiliation(s)
- Nikola Simidjievski: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Cristian Bodnar: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Ifrah Tariq: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
- Paul Scherer: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Helena Andres Terre: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Zohreh Shams: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Mateja Jamnik: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
- Pietro Liò: Department of Computer Science and Technology, University of Cambridge, Cambridge, United Kingdom
71
Madsen T, Świtnicki M, Juul M, Pedersen JS. EBADIMEX: an empirical Bayes approach to detect joint differential expression and methylation and to classify samples. Stat Appl Genet Mol Biol 2019; 18:/j/sagmb.ahead-of-print/sagmb-2018-0050/sagmb-2018-0050.xml. [PMID: 31734658 DOI: 10.1515/sagmb-2018-0050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
DNA methylation and gene expression are interdependent and both implicated in cancer development and progression, with many individual biomarkers discovered. A joint analysis of the two data types can potentially lead to biological insights that are not discoverable with separate analyses. To optimally leverage the joint data for identifying perturbed genes and classifying clinical cancer samples, it is important to accurately model the interactions between the two data types. Here, we present EBADIMEX for jointly identifying differential expression and methylation and classifying samples. The moderated t-test widely used with empirical Bayes priors in current differential expression methods is generalised to a multivariate setting by developing: (1) a moderated Welch t-test for equality of means with unequal variances; (2) a moderated F-test for equality of variances; and (3) a multivariate test for equality of means with equal variances. This leads to parametric models with prior distributions for the parameters, which allow fast evaluation and robust analysis of small data sets. EBADIMEX is demonstrated on simulated data as well as a large breast cancer (BRCA) cohort from TCGA. We show that the use of empirical Bayes priors and moderated tests works particularly well on small data sets.
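The moderated t-test mentioned above shrinks each gene's sample variance toward a prior value before forming the t-statistic. Below is a minimal sketch of the classic equal-variance, two-sample version, assuming the prior degrees of freedom `d0` and prior variance `s0_sq` are already known (in practice they are estimated across all genes). It illustrates the baseline statistic that EBADIMEX generalises, not EBADIMEX itself; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def moderated_t(group1, group2, d0, s0_sq):
    """Equal-variance two-sample t using an empirical-Bayes shrunken variance.

    d0 and s0_sq are the prior degrees of freedom and prior variance, assumed
    known here (hypothetical inputs; normally estimated across all genes).
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    # Pooled sample variance of the two groups.
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    # Posterior variance: df-weighted average of the prior and sample variances.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t = (mean(group1) - mean(group2)) / sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d  # statistic and its augmented degrees of freedom


# Hypothetical expression values for one gene in two small groups.
print(moderated_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0], d0=4, s0_sq=9.0))
```

With `d0 = 0` the statistic reduces to the ordinary pooled two-sample t; a larger `d0` pulls the variance toward `s0_sq`, which stabilises inference when each group has only a few samples.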
Affiliation(s)
- Tobias Madsen: Department of Molecular Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, DK-8200 Aarhus N, Denmark; Bioinformatics Research Centre, Aarhus University, C.F. Møllers Alle 8 DK-8000 Aarhus C, Denmark
- Michał Świtnicki: Department of Molecular Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, DK-8200 Aarhus N, Denmark
- Malene Juul: Department of Molecular Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, DK-8200 Aarhus N, Denmark; Bioinformatics Research Centre, Aarhus University, C.F. Møllers Alle 8 DK-8000 Aarhus C, Denmark
- Jakob Skou Pedersen: Department of Molecular Medicine, Aarhus University, Palle Juul-Jensens Boulevard 99, DK-8200 Aarhus N, Denmark; Bioinformatics Research Centre, Aarhus University, C.F. Møllers Alle 8 DK-8000 Aarhus C, Denmark
72
Hernández-Lemus E, Reyes-Gopar H, Espinal-Enríquez J, Ochoa S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes (Basel) 2019; 10:E865. [PMID: 31671657 PMCID: PMC6896122 DOI: 10.3390/genes10110865] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/24/2019] [Indexed: 12/16/2022]
Abstract
Cancer is a complex disease at many different levels, and its molecular phenomenology is correspondingly rich. The mutational and genomic origins of cancer, and their downstream effects on processes such as the reprogramming of gene regulatory control and the molecular pathways that depend on such control, have been recognized as central to the characterization of the disease. More important, though, is understanding their causes, prognosis, and therapeutics. A multitude of factors is associated with anomalous control of gene expression in cancer. Many of these factors can now be studied comprehensively through experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven insufficient to present a clear picture of expression regulation as a whole. In this review article, we discuss some of the more relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches that can be used, as well as the computational genomic analyses needed to track down these factors. We then present theoretical and computational frameworks developed to integrate the diverse information provided by such single-omic analyses. We contextualize this within a systems biology-based multi-omic regulation setting, aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Affiliation(s)
- Enrique Hernández-Lemus: Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Helena Reyes-Gopar: Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Jesús Espinal-Enríquez: Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico; Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Soledad Ochoa: Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
73
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. An International Journal on Information Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 210] [Impact Index Per Article: 42.0] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik: Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen: Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada; Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang: Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec: Department of Computer Science, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg: Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman: Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada; Princess Margaret Cancer Centre, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; Vector Institute, Toronto, ON, Canada
74
Wang Y, Liu Q, Huang S, Yuan B. Learning a Structural and Functional Representation for Gene Expressions: To Systematically Dissect Complex Cancer Phenotypes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1729-1742. [PMID: 28489545 DOI: 10.1109/tcbb.2017.2702161] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/07/2023]
Abstract
Cancer is a heterogeneous disease, and one of the central problems is therefore how to dissect the resulting complex phenotypes in terms of their biological building blocks. Computationally, this amounts to representing and interpreting high-dimensional observations through a structural and conceptual abstraction into the most influential determinants underlying the problem. The working hypothesis of this report is that gene interactions are largely responsible for the manifestation of complex cancer phenotypes and are therefore where the representation should be conceptualized. Here, we report a representation learning strategy combined with regularization, in which gene expression is described as a regularized product of meta-genes and their expression levels. The meta-genes are constrained by gene interactions, thus reflecting their original topological contexts. The expression levels are supervised by their conditional dependencies among the observations, thus providing a cluster-specific constraint. We obtain both of these structural constraints using a node-based graphical model. Our representation allows the selection of more influential modules, thereby implicating their possible roles in neoplastic transformation. We validate our representation strategy through its robust recognition of various cancer phenotypes compared with various classical methods. The modules discovered are either shared across or specific to different types or stages of human cancer, all of which is consistent with the literature and biology.
75
Gabaldón T. Recent trends in molecular diagnostics of yeast infections: from PCR to NGS. FEMS Microbiol Rev 2019; 43:517-547. [PMID: 31158289 PMCID: PMC8038933 DOI: 10.1093/femsre/fuz015] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 03/13/2019] [Accepted: 05/31/2019] [Indexed: 12/29/2022]
Abstract
The incidence of opportunistic yeast infections in humans has been increasing over recent years. These infections are difficult to treat and diagnose, in part because of the large number and broad diversity of species that can underlie the infection. In addition, resistance to one or several antifungal drugs in infecting strains is increasingly being reported, severely limiting therapeutic options and showcasing the need for rapid detection of the infecting agent and its drug susceptibility profile. Current methods for species and resistance identification lack satisfactory sensitivity and specificity, and often require prior culturing of the infecting agent, which delays diagnosis. Recently developed high-throughput technologies such as next-generation sequencing or proteomics are opening completely new avenues for more sensitive, accurate, and fast diagnosis of yeast pathogens. These approaches are the focus of intensive research, but translation into the clinic requires overcoming important challenges. In this review, we provide an overview of existing and recently emerged approaches that can be used in the identification of yeast pathogens and their drug resistance profiles. Throughout the text we highlight the advantages and disadvantages of each methodology and discuss the most promising developments in their path from bench to bedside.
Affiliation(s)
- Toni Gabaldón: Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; ICREA, Pg Lluís Companys 23, 08010 Barcelona, Spain
76
Di Nanni N, Gnocchi M, Moscatelli M, Milanesi L, Mosca E. Gene relevance based on multiple evidences in complex networks. Bioinformatics 2019; 36:865-871. [PMID: 31504182 PMCID: PMC9883679 DOI: 10.1093/bioinformatics/btz652] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/30/2019] [Revised: 05/17/2019] [Accepted: 08/19/2019] [Indexed: 02/02/2023]
Abstract
MOTIVATION Multi-omics approaches offer the opportunity to reconstruct a more complete picture of the molecular events associated with human diseases, but they pose challenges in data analysis. Network-based methods for multi-omics analysis leverage the complex web of macromolecular interactions occurring within cells to extract significant patterns of molecular alterations. Existing network-based approaches typically address specific combinations of omics and are limited in the number of layers that can be jointly analysed. In this study, we investigate the application of network diffusion to quantify gene relevance on the basis of multiple lines of evidence (layers). RESULTS We introduce a gene score (mND) that quantifies the relevance of a gene in a biological process, taking into account the network proximity of the gene and its first neighbours to other altered genes. We show that mND performs better than existing methods in finding altered genes in network proximity in one or more layers. We also report good performance in recovering known cancer genes. The pipeline described in this article is broadly applicable because it can handle different types of input: in addition to multi-omics datasets, it handles datasets stratified into many classes (e.g., cell clusters emerging from single-cell analyses) or a combination of the two scenarios. AVAILABILITY AND IMPLEMENTATION The R package 'mND' is available at URL: https://www.itb.cnr.it/mnd. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Noemi Di Nanni: Department of Biomedical Sciences, Institute of Biomedical Technologies, National Research Council, 20090 Segrate (MI), Italy; Department of Industrial and Information Engineering, University of Pavia, Italy
- Matteo Gnocchi: Department of Biomedical Sciences, Institute of Biomedical Technologies, National Research Council, 20090 Segrate (MI), Italy
- Marco Moscatelli: Department of Biomedical Sciences, Institute of Biomedical Technologies, National Research Council, 20090 Segrate (MI), Italy
- Luciano Milanesi: Department of Biomedical Sciences, Institute of Biomedical Technologies, National Research Council, 20090 Segrate (MI), Italy
77
Abbas-Aghababazadeh F, Mo Q, Fridley BL. Statistical genomics in rare cancer. Semin Cancer Biol 2019; 61:1-10. [PMID: 31437624 DOI: 10.1016/j.semcancer.2019.08.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/28/2019] [Revised: 08/14/2019] [Accepted: 08/17/2019] [Indexed: 12/26/2022]
Abstract
Rare cancers make up more than 20% of cancer cases. Because of their rarity, less research has been conducted on rare cancers, resulting in worse outcomes for patients with rare cancers compared with common cancers. The study of rare cancers is hampered by the difficulty of collecting a large enough set of patients to complete an adequately powered genomic study. In this manuscript we outline analytical approaches and public genomic datasets that have been used in genomic studies of rare cancers. These statistical analysis approaches and study designs include gene set/pathway analyses, pedigree and consortium studies, meta-analysis or horizontal integration, and integration of multiple types of genomic information, or vertical integration. We also discuss some of the publicly available resources that can be leveraged in rare cancer genomic studies.
Affiliation(s)
- Qianxing Mo
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, 33612, USA.
- Brooke L Fridley
- Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, 33612, USA.
78
Sun W, Bunn P, Jin C, Little P, Zhabotynsky V, Perou CM, Hayes DN, Chen M, Lin DY. The association between copy number aberration, DNA methylation and gene expression in tumor samples. Nucleic Acids Res 2019. [PMID: 29529299 PMCID: PMC5887505 DOI: 10.1093/nar/gky131] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 12/28/2022]
Abstract
We systematically studied the association between somatic copy number aberration (SCNA), DNA methylation and gene expression using -omic data from The Cancer Genome Atlas (TCGA) on six cancer types: breast cancer, colon cancer, glioblastoma, leukemia, lower-grade glioma and prostate cancer. A major challenge for such an integrated study is that the association between DNA methylation and gene expression is severely confounded by tumor purity and cell type composition, which are often unobserved and difficult to estimate. To overcome this challenge, we developed a method to remove confounding effects by calculating the principal components that span the space of the latent factors. Another intriguing finding of our study is that there can be both positive and negative associations between SCNA and DNA methylation: CpGs negatively associated with SCNA are often located around CpG islands, whereas those positively associated with SCNA are often located in CpG oceans. A joint study of SCNA, DNA methylation and gene expression suggests that SCNA often affects DNA methylation and gene expression independently.
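The abstract describes the deconfounding step only as computing principal components that span the space of the latent factors. One common way to use such components is to regress the top k components out of each feature before testing associations. The sketch below shows that idea and is not the authors' implementation. The function names, the choice of k, and the use of plain power iteration (instead of a library SVD) are all illustrative assumptions. The data matrix is hypothetical, with rows as samples and columns as features.

```python
import math
import random

def center_columns(Y):
    """Subtract each column (feature) mean; rows of Y are samples."""
    n = len(Y)
    means = [sum(col) / n for col in zip(*Y)]
    return [[v - m for v, m in zip(row, means)] for row in Y]

def orthogonalize(v, basis):
    """Remove the component of v along each unit vector in basis (Gram-Schmidt step)."""
    for u in basis:
        c = sum(a * b for a, b in zip(u, v))
        v = [a - c * b for a, b in zip(v, u)]
    return v

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_pc_scores(Y, k, iters=200, seed=0):
    """Top-k orthonormal principal-component score vectors (one value per sample),
    found by power iteration on the sample-by-sample Gram matrix of the centred data."""
    Yc = center_columns(Y)
    n = len(Yc)
    G = [[sum(a * b for a, b in zip(Yc[i], Yc[j])) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = unit(orthogonalize([rng.gauss(0.0, 1.0) for _ in range(n)], basis))
        for _ in range(iters):
            w = orthogonalize([sum(G[i][j] * v[j] for j in range(n)) for i in range(n)], basis)
            if math.sqrt(sum(x * x for x in w)) < 1e-12:
                break  # no variance left outside the current basis
            v = unit(w)
        basis.append(v)
    return basis

def residualize(x, basis):
    """Centre a per-sample feature and remove its projection onto the PC scores."""
    mean = sum(x) / len(x)
    return orthogonalize([v - mean for v in x], basis)

# Hypothetical data: 6 samples x 3 features driven by one latent factor (e.g. purity).
latent = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [[f * 1.0 + 3.0, f * -2.0 + 1.0, f * 0.5] for f in latent]
pcs = top_pc_scores(Y, k=1)
print(max(abs(r) for r in residualize(latent, pcs)))  # close to 0: the latent factor is removed
```

After residualization, a feature's remaining variation is orthogonal to the top components. Correlating residualized methylation with residualized expression is therefore less driven by whatever those components capture. In practice, k would need to be chosen so that the components cover the confounders without absorbing the signal of interest.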
Affiliation(s)
- Wei Sun
- Public Health Science Division, Fred Hutchinson Cancer Research Center, USA
- Paul Bunn
- Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Chong Jin
- Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Paul Little
- Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Vasyl Zhabotynsky
- Department of Biostatistics, University of North Carolina, Chapel Hill, USA
- Charles M Perou
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA; Department of Genetics, University of North Carolina, Chapel Hill, USA
- David Neil Hayes
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA; Department of Medicine, Division of Hematology/Oncology, University of North Carolina, Chapel Hill, USA
- Mengjie Chen
- Department of Medicine, University of Chicago, USA; Department of Human Genetics, University of Chicago, USA
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA
79
Shannon NB, Tan JWS, Tan HL, Wang W, Chen Y, Lim HJ, Tan QX, Hendrikson J, Ng WH, Loo LY, Skanthakumar T, Wasudevan SD, Kon OL, Lim TKH, Tan GHC, Chia CS, Soo KC, Ong CAJ, Teo MCC. A set of molecular markers predicts chemosensitivity to Mitomycin-C following cytoreductive surgery and hyperthermic intraperitoneal chemotherapy for colorectal peritoneal metastasis. Sci Rep 2019; 9:10572. [PMID: 31332257 PMCID: PMC6646658 DOI: 10.1038/s41598-019-46819-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/08/2018] [Accepted: 05/28/2019] [Indexed: 12/21/2022]
Abstract
Cytoreductive surgery (CRS) and hyperthermic intraperitoneal chemotherapy (HIPEC) is associated with significant perioperative morbidity and mortality. We aimed to generate and validate a biomarker set predicting sensitivity to Mitomycin-C to refine the selection of patients with colorectal peritoneal metastasis (CPM) for this treatment. A signature predicting Mitomycin-C sensitivity was generated using data from Genomics of Drug Sensitivity in Cancer and The Cancer Genome Atlas. Validation was performed on CPM patients who underwent CRS-HIPEC (n = 62) using immunohistochemistry (IHC). We determined the predictive significance of our set using overall survival as a surrogate endpoint via a logistic regression model. Three potential biomarkers were identified and optimized for IHC. Patients exhibiting lower expression of PAXIP1 and SSBP2 had poorer survival than those with higher expression (p = 0.045 and 0.140, respectively). No difference was observed in patients with differing DTYMK expression (p = 0.715). When PAXIP1 and SSBP2 were combined in a set, patients with two dysregulated protein markers had significantly poorer survival than those with one or no dysregulated marker (p = 0.016). This set independently predicted survival in a Cox regression model (HR 5.097; 95% CI 1.731–15.007; p = 0.003). We generated and validated an IHC prognostic set that could potentially identify patients who are likely to benefit from HIPEC using Mitomycin-C.
Affiliation(s)
- Joey Wee-Shan Tan
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Hwee Leong Tan
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Weining Wang
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Yudong Chen
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Hui Jun Lim
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Qiu Xuan Tan
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Josephine Hendrikson
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Wai Har Ng
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Li Yang Loo
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Seettha D Wasudevan
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Oi Lian Kon
- Division of Medical Sciences, National Cancer Centre Singapore, Singapore, Singapore
- Tony Kiat Hon Lim
- Department of Anatomical Pathology, Singapore General Hospital, Singapore, Singapore
- Grace Hwei Ching Tan
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Claramae Shulyn Chia
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Khee Chee Soo
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Chin-Ann Johnny Ong
- Division of Surgical Oncology, National Cancer Centre Singapore, Singapore, Singapore
80
Mercatelli D, Ray F, Giorgi FM. Pan-Cancer and Single-Cell Modeling of Genomic Alterations Through Gene Expression. Front Genet 2019; 10:671. [PMID: 31379928 PMCID: PMC6657420 DOI: 10.3389/fgene.2019.00671] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/01/2019] [Accepted: 06/27/2019] [Indexed: 12/27/2022]
Abstract
Cancer is a disease often characterized by the presence of multiple genomic alterations, which trigger altered transcriptional patterns and gene expression, which in turn sustain the processes of tumorigenesis, tumor progression, and tumor maintenance. The links between genomic alterations and gene expression profiles can be used as the basis for building specific molecular tumorigenic relationships. In this study, we perform pan-cancer predictions of the presence of single somatic mutations and copy number variations using machine learning approaches on gene expression profiles. We show that gene expression can be used to predict genomic alterations in every tumor type, although some alterations are more predictable than others. We propose gene aggregation as a tool to improve the accuracy of alteration prediction models from gene expression profiles. Ultimately, we show how this principle can be beneficial in intrinsically noisy datasets, such as those based on single-cell sequencing.
Affiliation(s)
- Daniele Mercatelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Forest Ray
- Department of Systems Biology, Columbia University Medical Center, New York, NY, United States
- Federico M. Giorgi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
81
Wilk G, Braun R. Integrative analysis reveals disrupted pathways regulated by microRNAs in cancer. Nucleic Acids Res 2019; 46:1089-1101. [PMID: 29294105 PMCID: PMC5814839 DOI: 10.1093/nar/gkx1250] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 08/31/2017] [Accepted: 12/01/2017] [Indexed: 02/06/2023]
Abstract
MicroRNAs (miRNAs) are small endogenous regulatory molecules that modulate gene expression post-transcriptionally. Although differential expression of miRNAs has been implicated in many diseases (including cancers), the underlying mechanisms of action remain unclear. Because each miRNA can target multiple genes, miRNAs may have functional implications for the overall behavior of entire pathways. Here, we investigate the functional consequences of miRNA dysregulation through an integrative analysis of miRNA and mRNA expression data using a novel approach that incorporates pathway information a priori. By searching for miRNA-pathway associations that differ between healthy and tumor tissue, we identify specific relationships at the systems level that are disrupted in cancer. Our approach is motivated by the hypothesis that if a miRNA and a pathway are associated, then the expression of the miRNA and the collective behavior of the genes in the pathway will be correlated. As such, we first obtain an expression-based summary of pathway activity using Isomap, a dimension reduction method that can articulate non-linear structure in high-dimensional data. We then search for miRNAs that exhibit differential correlations with the pathway summary between phenotypes as a means of finding aberrant miRNA-pathway coregulation in tumors. We apply our method to cancer data using gene and miRNA expression datasets from The Cancer Genome Atlas and compare ∼10⁵ miRNA-pathway relationships between healthy and tumor samples from four tissues (breast, prostate, lung and liver). Many of the flagged pairs we identify have a biological basis for disruption in cancer.
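The comparison step in this abstract looks for miRNAs whose correlation with a pathway summary differs between healthy and tumour samples. That step can be sketched as follows. The sketch assumes the pathway summary (for example, a first Isomap coordinate) has already been computed for each sample and is passed in as a plain list. It compares the two independent Pearson correlations with Fisher's z-transformation. The abstract does not say which test the authors used, so the test choice and every function name here are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_test(r1, n1, r2, n2, eps=1e-12):
    """Two-sided test of H0: rho1 == rho2 for correlations from independent samples (n > 3)."""
    # Clip so that atanh stays finite when |r| is exactly 1.
    z1 = math.atanh(max(-1 + eps, min(1 - eps, r1)))
    z2 = math.atanh(max(-1 + eps, min(1 - eps, r2)))
    z = (z1 - z2) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

def differential_correlation(mirna_healthy, summary_healthy, mirna_tumor, summary_tumor):
    """Correlate one miRNA with a precomputed pathway summary in each phenotype and compare."""
    r_h = pearson(mirna_healthy, summary_healthy)
    r_t = pearson(mirna_tumor, summary_tumor)
    z, p = fisher_z_test(r_h, len(mirna_healthy), r_t, len(mirna_tumor))
    return r_h, r_t, z, p
```

A small p-value flags a miRNA-pathway pair whose co-regulation appears to change in tumours. When many miRNA-pathway pairs are screened this way, a multiple-testing correction (for example, Benjamini-Hochberg) would also be needed.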
Affiliation(s)
- Gary Wilk
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
- Rosemary Braun
- Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA
82
Park JE, Park SY, Kim HJ, Kim HS. Reproducibility and Generalizability in Radiomics Modeling: Possible Strategies in Radiologic and Statistical Perspectives. Korean J Radiol 2019; 20:1124-1137. [PMID: 31270976 PMCID: PMC6609433 DOI: 10.3348/kjr.2018.0070] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Received: 01/25/2019] [Accepted: 04/07/2019] [Indexed: 02/06/2023]
Abstract
Radiomics, which involves the use of high-dimensional quantitative imaging features for predictive purposes, is a powerful tool for developing and testing medical hypotheses. Radiologic and statistical challenges in radiomics include those related to the reproducibility of imaging data, control of overfitting due to high dimensionality, and the generalizability of modeling. The aims of this review article are to clarify the distinctions between radiomics features and other omics and imaging data, to describe the challenges and potential strategies in reproducibility and feature selection, and to reveal the epidemiological background of modeling, thereby facilitating and promoting more reproducible and generalizable radiomics research.
Affiliation(s)
- Ji Eun Park
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Seo Young Park
- Department of Clinical Epidemiology and Biostatistics, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Hwa Jung Kim
- Department of Clinical Epidemiology and Biostatistics, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
- Ho Sung Kim
- Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Korea
83
Dong L, Zhang Z, Xu J, Wang F, Ma Y, Li F, Shen C, Liu Z, Zhang J, Liu C, Yi P, Yu J. Consistency analysis of microRNA-arm expression reveals microRNA-369-5p/3p as tumor suppressors in gastric cancer. Mol Oncol 2019; 13:1605-1620. [PMID: 31162777 PMCID: PMC6599845 DOI: 10.1002/1878-0261.12527] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/23/2018] [Revised: 04/20/2019] [Accepted: 06/03/2019] [Indexed: 01/11/2023]
Abstract
The 5p and 3p arms of microRNA (miRNA) are typically generated from the same precursor; one arm influences protein output, while the other has a short half‐life. However, a few miR‐5p/3p pairs have been reported to co‐exist in cancer cells. Here, we performed a genome‐wide analysis of miRNA expression in gastric cancer (GC) cells to systematically investigate the co‐expression profile of miR‐5p/3p in gastric tumorigenesis. We discovered that only 41 miR‐5p/3p pairs out of 1749 analyzed miRNAs were co‐expressed. Specifically, abnormal expression of miR‐369‐5p and miR‐369‐3p was correlated with GC progression. Importantly, both in vitro and in vivo assays revealed that miR‐369‐5p and miR‐369‐3p exhibited tumor‐suppressive roles by regulating the function of jun proto‐oncogene and v‐akt murine thymoma viral oncogene homolog 1 in GC cells, respectively. Moreover, we observed that miR‐369 was inactivated in GC tissues due to DNA methylation. We also showed that inhibition of miR‐369‐5p/3p attenuated the effect of azacitidine (AZA) treatment on suppressing cell growth and invasion. These results suggest that the therapeutic efficacy of AZA in GC is at least partly attributable to miR‐369 activation. Overall, our findings provide convincing evidence that both the 5p and 3p arms of miRNAs can be co‐expressed in GC and that DNA methylation‐induced miR‐369 signaling contributes to GC progression.
Affiliation(s)
- Lei Dong
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Zhengyi Zhang
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Jiayue Xu
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Fang Wang
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Yanni Ma
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Feng Li
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Chao Shen
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Ziwen Liu
- Department of General Surgery, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Junwu Zhang
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
- Changzheng Liu
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China; NHC Key Laboratory of Systems Biology of Pathogens and Christophe Mérieux Laboratory, IPB, CAMS-Fondation Mérieux, Institute of Pathogen Biology (IPB), Chinese Academy of Medical Sciences (CAMS) & Peking Union Medical College, Beijing, China
- Ping Yi
- Department of Obstetrics and Gynecology, The Third Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Jia Yu
- Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing, China
84
McCurdy S, Molinaro A, Pachter L. Factor analysis for survival time prediction with informative censoring and diverse covariates. Stat Med 2019; 38:3719-3732. [PMID: 31162708 DOI: 10.1002/sim.8151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/23/2018] [Revised: 01/15/2019] [Accepted: 03/03/2019] [Indexed: 11/05/2022]
Abstract
Fulfilling the promise of precision medicine requires accurately and precisely classifying disease states. For cancer, this includes predicting survival time from a surfeit of covariates. Such data present an opportunity for improved prediction, but also a challenge due to high dimensionality. Furthermore, disease populations can be heterogeneous. Integrative modeling is sensible, as the underlying hypothesis is that joint analysis of multiple covariates provides greater explanatory power than separate analyses. We propose an integrative latent variable model that combines factor analysis for various data types and an exponential proportional hazards (EPH) model for continuous survival time with informative censoring. The factor and EPH models are connected through low-dimensional latent variables that can be interpreted and visualized to identify subpopulations. We use this model to predict survival time. We demonstrate this model's utility in simulation and on four Cancer Genome Atlas datasets: diffuse lower-grade glioma, glioblastoma multiforme, lung adenocarcinoma, and lung squamous cell carcinoma. These datasets have small sample sizes, high-dimensional diverse covariates, and high censorship rates. We compare the predictions from our model to those of three alternative models. Our model outperforms the alternatives in simulation and is competitive on real datasets. Furthermore, the low-dimensional visualization for diffuse lower-grade glioma displays known subpopulations.
Affiliation(s)
- Shannon McCurdy
- California Institute for Quantitative Biosciences, University of California, Berkeley, California
- Annette Molinaro
- Department of Neurological Surgery, University of California, San Francisco, California; Division of Epidemiology and Biostatistics, University of California, San Francisco, California
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California; Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
85
Xiang D, Zhao SD, Tony Cai T. Signal classification for the integrative analysis of multiple sequences of large-scale multiple tests. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
Affiliation(s)
- Dongdong Xiang
- East China Normal University, Shanghai, People's Republic of China
- T. Tony Cai
- University of Pennsylvania, Philadelphia, USA
86
CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer. Nat Methods 2019; 16:505-507. [DOI: 10.1038/s41592-019-0422-y] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 07/30/2018] [Accepted: 04/10/2019] [Indexed: 01/13/2023]
87
Xia X, Lo YC, Gholkar AA, Senese S, Ong JY, Velasquez EF, Damoiseaux R, Torres JZ. Leukemia Cell Cycle Chemical Profiling Identifies the G2-Phase Leukemia Specific Inhibitor Leusin-1. ACS Chem Biol 2019; 14:994-1001. [PMID: 31046221 DOI: 10.1021/acschembio.9b00173] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023]
Abstract
Targeting the leukemia proliferation cycle has been a successful approach to developing antileukemic therapies. However, drug screening efforts to identify novel antileukemic agents have been hampered by the lack of a suitable high-throughput screening platform for suspension cells that does not rely on flow-cytometry analyses. We report the development of a novel leukemia cell-based high-throughput chemical screening platform for the discovery of cell cycle phase-specific inhibitors that uses chemical cell cycle profiling. We have used this approach to analyze the cell cycle response of acute lymphoblastic leukemia CCRF-CEM cells to each of 181,420 druglike compounds. This approach yielded cell cycle phase-specific inhibitors of leukemia cell proliferation. Further analyses of the top G2-phase and M-phase inhibitors identified the leukemia specific inhibitor 1 (Leusin-1). Leusin-1 arrests cells in G2 phase and triggers apoptotic cell death. Most importantly, Leusin-1 was more active in acute lymphoblastic leukemia cells than in other types of leukemias, non-blood cancers, or normal cells, and it represents a lead molecule for developing antileukemic drugs.
88
Downregulation of CYB5D2 is associated with breast cancer progression. Sci Rep 2019; 9:6624. [PMID: 31036830 PMCID: PMC6488675 DOI: 10.1038/s41598-019-43006-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/17/2018] [Accepted: 04/10/2019] [Indexed: 12/14/2022]
Abstract
We report here that CYB5D2 is associated with a tumor-suppressive function in breast cancer (BC). CYB5D2 expression was significantly reduced in tamoxifen (TAM)-resistant MCF7 cells and in MCF7 cell-derived xenografts treated with TAM. CYB5D2 overexpression induced apoptosis in MCF7 cells, whereas CYB5D2 knockdown enhanced MCF7 cell proliferation. In the TCGA and Curtis datasets within the Oncomine database, CYB5D2 mRNA expression was downregulated in primary BCs vs breast tissues and in HER2-positive or triple-negative BCs vs estrogen receptor (ER)-positive BCs. Using the TCGA and Metabric datasets (n = 817 and n = 2509) within cBioPortal, 660 and 4891 differentially expressed genes (DEGs) in relation to CYB5D2 were identified. These DEGs were enriched in pathways governing cell cycle progression, progesterone-derived oocyte maturation, oocyte meiosis, estrogen-mediated S-phase entry, and DNA metabolism. CYB5D2 downregulation decreased overall survival (OS, p = 0.0408). A CYB5D2-derived 21-gene signature was constructed; it robustly correlated with OS shortening (p = 5.72e-12) and independently predicted BC deaths (HR = 1.28; 95% CI 1.08–1.52; p = 0.004) after adjusting for known clinical factors. CYB5D2 reductions were associated with mutations in PIK3CA, GATA3, MAP3K1, CDH1, TP53 and RB1. Impressively, 85% (560/659) of TP53 mutations occurred in 21-gene signature-positive BCs. Collectively, we provide the first evidence that CYB5D2 is a candidate tumor suppressor of BC.
89
Geng P, Tong X, Lu Q. An integrative U method for joint analysis of multi-level omic data. BMC Genet 2019; 20:40. [PMID: 30967125 PMCID: PMC6457037 DOI: 10.1186/s12863-019-0742-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2018] [Accepted: 03/20/2019] [Indexed: 11/30/2022]
Abstract
Background The advance of high-throughput technologies has made it cost-effective to collect diverse types of omic data in large-scale clinical and biological studies. While the vast amounts of multi-level omic data collected in these studies provide a great opportunity for genetic research, the high dimensionality of omic data and the complex relationships among multi-level omic data bring tremendous analytic challenges. Results To address these challenges, we develop an integrative U (IU) method for the design and analysis of multi-level omic data. While non-parametric methods make fewer model assumptions and are flexible for analyzing different types of phenotypes and omic data, they have been less developed for association analysis of omic data. The IU method is a nonparametric method that can accommodate various types of omic and phenotype data and consider interactive relationships among different levels of omic data. Through simulations and a real data application, we compare the IU test with commonly used variance component tests. Conclusions Results show that the proposed test attains more robust type I error performance and higher empirical power than variance component tests under various types of phenotypes and different underlying interaction effects.
Affiliation(s)
- Pei Geng
- Department of Mathematics, Illinois State University, Normal, IL, 61761, USA
- Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
90
Xu A, Chen J, Peng H, Han G, Cai H. Simultaneous Interrogation of Cancer Omics to Identify Subtypes With Significant Clinical Differences. Front Genet 2019; 10:236. [PMID: 30984238 PMCID: PMC6448130 DOI: 10.3389/fgene.2019.00236] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 11/13/2018] [Accepted: 03/04/2019] [Indexed: 11/21/2022]
Abstract
Recent advances in high-throughput sequencing have accelerated the accumulation of omics data on the same tumor tissue from multiple sources. Intensive study of multi-omics integration on tumor samples can stimulate progress in precision medicine and is promising for detecting potential biomarkers. However, current methods are restricted by the highly unbalanced dimensions of omics data or by the difficulty of assigning weights to different data sources. Therefore, the appropriate approximation and constraints of integrated targets remain a major challenge. In this paper, we propose an omics data integration method named high-order path elucidated similarity (HOPES). HOPES fuses the similarities derived from various omics data sources to resolve the dimensional discrepancy, and it progressively elucidates the similarities from each type of omics data into an integrated similarity with various high-order connected paths. Through a series of incremental constraints for commonality, HOPES takes into consideration both the specificity of single data types and the consistency between different data types. The fused similarity matrix gives global insight into patients' correlation and efficiently distinguishes subgroups. We tested the performance of HOPES on both a simulated dataset and several empirical tumor datasets. The test datasets contain three omics types, including gene expression, DNA methylation, and microRNA data, for five different TCGA cancer projects. Our method achieved superior accuracy and high robustness compared with several benchmark methods on simulated data. Further experiments on five cancer datasets demonstrated that HOPES achieved superior performance in cancer classification. The stratified subgroups were shown to have statistically significant differences in survival. We further located and identified the key genes, methylation sites, and microRNAs within each subgroup. They were shown to have high potential prognostic value and were enriched in many cancer-related biological processes or pathways.
Affiliation(s)
- Aodan Xu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Jiazhou Chen
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Hong Peng
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- GuoQiang Han
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
91
López de Maturana E, Alonso L, Alarcón P, Martín-Antoniano IA, Pineda S, Piorno L, Calle ML, Malats N. Challenges in the Integration of Omics and Non-Omics Data. Genes (Basel) 2019; 10:genes10030238. [PMID: 30897838 PMCID: PMC6471713 DOI: 10.3390/genes10030238] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 01/02/2019] [Revised: 03/05/2019] [Accepted: 03/14/2019] [Indexed: 11/16/2022]
Abstract
Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented in clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation in health-related traits, and their joint modeling with omics data is crucial to increase the predictive ability of algorithms. Only a small number of published studies performed a “real” integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.
Affiliation(s)
- Evangelina López de Maturana
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Lola Alonso
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Pablo Alarcón
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Isabel Adoración Martín-Antoniano
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Silvia Pineda
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- Lucas Piorno
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
- M Luz Calle
- Biosciences Department, University of Vic-Central University of Catalonia, Carrer de la Laura 13, 08570 Vic, Spain.
- Núria Malats
- Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), and CIBERONC, Melchor Fernández Almagro 3, 28029 Madrid, Spain.
92
Shafi A, Nguyen T, Peyvandipour A, Nguyen H, Draghici S. A Multi-Cohort and Multi-Omics Meta-Analysis Framework to Identify Network-Based Gene Signatures. Front Genet 2019; 10:159. [PMID: 30941158 PMCID: PMC6434849 DOI: 10.3389/fgene.2019.00159] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/02/2018] [Accepted: 02/14/2019] [Indexed: 12/20/2022]
Abstract
Although massive amounts of condition-specific molecular profiles are being accumulated in public repositories every day, meaningful interpretation of these data remains a major challenge. In an effort to identify the biomarkers that describe the key biological phenomena for a given condition, several approaches have been developed over the past few years. However, the majority of these approaches either (i) do not consider the known intermolecular interactions, or (ii) do not integrate molecular data of multiple types (e.g., genomics, transcriptomics, proteomics, epigenomics, etc.), and thus potentially fail to capture the true biological changes responsible for complex diseases (e.g., cancer). In addition, these approaches often ignore the heterogeneity and study bias present in independent molecular cohorts. In this manuscript, we propose a novel multi-cohort and multi-omics meta-analysis framework that overcomes all three limitations mentioned above in order to identify robust molecular subnetworks that capture the key dynamic nature of a given biological condition. Our framework integrates multiple independent gene expression studies, unmatched DNA methylation studies, and protein-protein interactions to identify methylation-driven subnetworks. We demonstrate the proposed framework by constructing subnetworks related to two complex diseases: glioblastoma and low-grade gliomas. We validate the identified subnetworks by showing their ability to predict patients' clinical outcome on multiple independent validation cohorts.
Affiliation(s)
- Adib Shafi
- Department of Computer Science, Wayne State University, Detroit, MI, United States
- Tin Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
- Azam Peyvandipour
- Department of Computer Science, Wayne State University, Detroit, MI, United States
- Hung Nguyen
- Department of Computer Science and Engineering, University of Nevada, Reno, NV, United States
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, United States
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI, United States
93
Li Y, Wu FX, Ngom A. A review on machine learning principles for multi-view biological data integration. Brief Bioinform 2019; 19:325-340. [PMID: 28011753 DOI: 10.1093/bib/bbw113] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 07/26/2016] [Indexed: 01/08/2023]
Abstract
Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in strong need of integrative machine learning models to make better use of vast volumes of heterogeneous information for a deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated into a learning system is a key step for successful analysis. In this article, we provide a comprehensive review of omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views into a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have the potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning to capture the complex mechanisms of biological systems.
Affiliation(s)
- Yifeng Li
- Information and Communications Technologies, National Research Council Canada, Ottawa, Ontario, Canada
- Fang-Xiang Wu
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Alioune Ngom
- School of Computer Science, University of Windsor, Windsor, Ontario, Canada
94
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, Ferrari R. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform 2019; 19:286-302. [PMID: 27881428 PMCID: PMC6018996 DOI: 10.1093/bib/bbw114] [Citation(s) in RCA: 352] [Impact Index Per Article: 70.4] [Received: 07/28/2016] [Indexed: 02/07/2023]
Abstract
Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches, generating monothematic rather than integrated knowledge. As other areas of biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, move towards the omics scale, we are witnessing the rise of inter-disciplinary data integration strategies to support a better understanding of biological systems and, eventually, the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. This work targets students and researchers seeking knowledge outside their field of expertise and aims to foster a leap from the reductionist to the global-integrative analytical approach in research.
Affiliation(s)
- Claudia Manzoni
- School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Demis A Kia
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Jana Vandrovcova
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- John Hardy
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Nicholas W Wood
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Patrick A Lewis
- School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Raffaele Ferrari
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
95
Kartha VK, Sebastiani P, Kern JG, Zhang L, Varelas X, Monti S. CaDrA: A Computational Framework for Performing Candidate Driver Analyses Using Genomic Features. Front Genet 2019; 10:121. [PMID: 30838036 PMCID: PMC6390206 DOI: 10.3389/fgene.2019.00121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/07/2018] [Accepted: 02/04/2019] [Indexed: 12/12/2022]
Abstract
The identification of combinations of genetic alterations as drivers of a given phenotypic outcome, such as drug sensitivity, gene or protein expression, and pathway activity, is a challenging task that is essential to gaining new biological insights and to discovering therapeutic targets. Existing methods designed to predict complementary drivers of such outcomes lack analytical flexibility, including support for joint analyses of multiple genomic alteration types (such as somatic mutations and copy number alterations), multiple scoring functions, and rigorous significance and reproducibility testing procedures. To address these limitations, we developed Candidate Driver Analysis (CaDrA), an integrative framework that implements a step-wise heuristic search approach to identify functionally relevant subsets of genomic features that, together, are maximally associated with a specific outcome of interest. Using simulated data, we show CaDrA's overall high sensitivity and specificity for typically sized multi-omic datasets, and we demonstrate CaDrA's ability to identify known mutations linked with the sensitivity of cancer cells to drug treatment using data from the Cancer Cell Line Encyclopedia (CCLE). We further apply CaDrA to identify novel regulators of oncogenic activity mediated by the Hippo signaling pathway effectors YAP and TAZ in primary breast cancer tumors using data from The Cancer Genome Atlas (TCGA), and we functionally validate these regulators in vitro. Finally, we use pan-cancer TCGA protein expression data to show the high reproducibility of CaDrA's search procedure. Collectively, this work demonstrates the utility of our framework for the fast querying of large, publicly available multi-omics datasets, including but not limited to TCGA and CCLE, for potential drivers of a given target profile of interest.
Affiliation(s)
- Vinay K. Kartha
- Bioinformatics Program, Boston University, Boston, MA, United States
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States
- Paola Sebastiani
- Bioinformatics Program, Boston University, Boston, MA, United States
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
- Joseph G. Kern
- Department of Biochemistry, Boston University School of Medicine, Boston, MA, United States
- Liye Zhang
- School of Life Sciences and Technology, ShanghaiTech University, Shanghai, China
- Xaralabos Varelas
- Department of Biochemistry, Boston University School of Medicine, Boston, MA, United States
- Stefano Monti
- Bioinformatics Program, Boston University, Boston, MA, United States
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, United States
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States
96
Lu H, Arshad M, Thornton A, Avesani G, Cunnea P, Curry E, Kanavati F, Liang J, Nixon K, Williams ST, Hassan MA, Bowtell DDL, Gabra H, Fotopoulou C, Rockall A, Aboagye EO. A mathematical-descriptor of tumor-mesoscopic-structure from computed-tomography images annotates prognostic- and molecular-phenotypes of epithelial ovarian cancer. Nat Commun 2019; 10:764. [PMID: 30770825 PMCID: PMC6377605 DOI: 10.1038/s41467-019-08718-9] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 03/28/2018] [Accepted: 01/24/2019] [Indexed: 12/11/2022]
Abstract
The five-year survival rate of epithelial ovarian cancer (EOC) is approximately 35-40% despite maximal treatment efforts, highlighting a need for stratification biomarkers for personalized treatment. Here we extract 657 quantitative mathematical descriptors from the preoperative CT images of 364 EOC patients at their initial presentation. Using machine learning, we derive a non-invasive summary statistic of the primary ovarian tumor based on 4 descriptors, which we name the "Radiomic Prognostic Vector" (RPV). RPV reliably identifies the 5% of patients with a median overall survival of less than 2 years, significantly improves on established prognostic methods, and is validated in two independent, multi-center cohorts. Furthermore, genetic, transcriptomic and proteomic analyses of two independent datasets reveal that stromal phenotype and DNA damage response pathways are activated in RPV-stratified tumors. RPV and its associated analysis platform could be exploited to guide personalized therapy of EOC and are potentially transferable to other cancer types.
Affiliation(s)
- Haonan Lu
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Mubarik Arshad
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Andrew Thornton
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Giacomo Avesani
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Paula Cunnea
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Ed Curry
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Fahdi Kanavati
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Jack Liang
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Katherine Nixon
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Sophie T Williams
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Mona Ali Hassan
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- David D L Bowtell
- Peter MacCallum Cancer Centre, Melbourne, 3010, VIC, Australia
- Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, 3010, VIC, Australia
- Hani Gabra
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Early Clinical Development, iMED Biotech Unit, AstraZeneca, Cambridge, SG8 6HB, UK
- Christina Fotopoulou
- Ovarian Cancer Action Research Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Andrea Rockall
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK
- Department of Radiology, Imperial College Healthcare NHS Trust, London, W12 0HS, UK
- Department of Radiology, The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK
- Eric O Aboagye
- Cancer Imaging Centre, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, W12 0HS, UK.
97
Visvanathan A, Patil V, Abdulla S, Hoheisel JD, Somasundaram K. N⁶-Methyladenosine Landscape of Glioma Stem-Like Cells: METTL3 Is Essential for the Expression of Actively Transcribed Genes and Sustenance of the Oncogenic Signaling. Genes (Basel) 2019; 10:E141. [PMID: 30781903 PMCID: PMC6410051 DOI: 10.3390/genes10020141] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 09/28/2018] [Revised: 11/22/2018] [Accepted: 11/28/2018] [Indexed: 01/26/2023]
Abstract
Despite recent advances in N⁶-methyladenosine (m⁶A) biology, the regulation of crucial RNA processing steps by the RNA methyltransferase METTL3 (methyltransferase-like 3) in glioma stem-like cells (GSCs) remains obscure. An integrated analysis of m⁶A-RIP (RNA immunoprecipitation) and total RNA-Seq of METTL3-silenced GSCs showed that m⁶A modification in GSCs is principally carried out by METTL3. The m⁶A-modified transcripts showed higher abundance than non-modified transcripts. Further, we showed that METTL3 is essential for the expression of GSC-specific actively transcribed genes. Silencing METTL3 resulted in an increase in several aberrant alternative splicing events. We also found that putative m⁶A reader proteins play a key role in the RNA stabilization function of METTL3. METTL3 altered A-to-I and C-to-U RNA editing events by differentially regulating the RNA editing enzymes ADAR and APOBEC3A. Similar to protein-coding genes, lincRNAs (long intergenic non-coding RNAs) with m⁶A marks showed METTL3-dependent high expression. m⁶A modification of 3'UTRs appeared to hinder, in a conformation-dependent manner, the binding of miRNAs to their targets. The integrated analysis of the m⁶A regulome in METTL3-silenced GSCs showed global disruption of tumorigenic pathways that are indispensable for GSC maintenance and glioma progression. We conclude that METTL3 plays a vital role in many steps of RNA processing and orchestrates the successful execution of oncogenic pathways in GSCs.
Affiliation(s)
- Abhirami Visvanathan
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore 560012, India.
- Vikas Patil
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore 560012, India.
- Shibla Abdulla
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore 560012, India.
- Jörg D Hoheisel
- Functional Genome Analysis, Deutsches Krebsforschungszentrum (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany.
- Kumaravel Somasundaram
- Department of Microbiology and Cell Biology, Indian Institute of Science, Bangalore 560012, India.
98
Li X, Yang M, Zhang Q, Fan Y, Zhu T, Chen F, Wang K. Whole Exome Sequencing in the Accurate Diagnosis of Bilateral Breast Cancer: a Case Study. J Breast Cancer 2019; 22:131-140. [PMID: 30941240 PMCID: PMC6438836 DOI: 10.4048/jbc.2019.22.e10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/13/2018] [Accepted: 12/25/2018] [Indexed: 01/02/2023]
Abstract
When faced with a case of bilateral breast cancer (BBC), differentiating bilateral primary breast cancer from contralateral metastatic breast cancer is essential for treatment, but clear identification criteria have not yet been established. Different events play different roles in the therapy and prognosis of BBC; hence, it is of great significance to develop a more comprehensive and convincing technique for an accurate differential diagnosis. We report a rare case of synchronous BBC in a 61-year-old Chinese woman. Based on her clinical and pathological features and on whole exome sequencing and cancer genome analysis, we concluded that the patient had developed contralateral metastatic breast cancer that metastasized from the left breast to the right. Therefore, by combining clinical, pathological and cancer genomics information, we could precisely define the origin and evolution of BBC.
Affiliation(s)
- Xiaoling Li
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Department of Breast Cancer, Cancer Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Mei Yang
- Department of Breast Cancer, Cancer Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Teng Zhu
- Department of Breast Cancer, Cancer Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Fulong Chen
- Department of Breast Cancer, Cancer Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Kun Wang
- The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Department of Breast Cancer, Cancer Center, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
99
Xu J, Yang P, Xue S, Sharma B, Sanchez-Martin M, Wang F, Beaty KA, Dehan E, Parikh B. Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives. Hum Genet 2019; 138:109-124. [PMID: 30671672 PMCID: PMC6373233 DOI: 10.1007/s00439-019-01970-5] [Citation(s) in RCA: 94] [Impact Index Per Article: 18.8] [Received: 09/27/2018] [Accepted: 01/02/2019] [Indexed: 02/07/2023]
Abstract
In the field of cancer genomics, the broad availability of genetic information offered by next-generation sequencing technologies and the rapid growth in biomedical publications have led to the advent of the big-data era. The integration of artificial intelligence (AI) approaches such as machine learning, deep learning, and natural language processing (NLP) to tackle the challenges of scalability and high dimensionality of data and to transform big data into clinically actionable knowledge is expanding and becoming the foundation of precision medicine. In this paper, we review the current status and future directions of AI applications in cancer genomics within the context of workflows that integrate genomic analysis for precision cancer care. Existing AI solutions and their limitations in cancer genetic testing and diagnostics, such as variant calling and interpretation, are critically analyzed. Publicly available tools or algorithms for key NLP technologies used in literature mining for evidence-based clinical recommendations are reviewed and compared. In addition, the present paper highlights the challenges to AI adoption in digital healthcare with regard to data requirements, algorithmic transparency, reproducibility, and real-world assessment, and discusses the importance of preparing patients and physicians for modern digitized healthcare. We believe that AI will remain the main driver of healthcare transformation toward precision medicine, yet the unprecedented challenges it poses should be addressed to ensure safety and a beneficial impact on healthcare.
Affiliation(s)
- Jia Xu
- IBM Watson Health, Cambridge, MA, USA
- Shang Xue
- IBM Watson Health, Cambridge, MA, USA
- Fang Wang
- IBM Watson Health, Cambridge, MA, USA
100
Jain Y, Ding S, Qiu J. Sliced inverse regression for integrative multi-omics data analysis. Stat Appl Genet Mol Biol 2019; 18:/j/sagmb.ahead-of-print/sagmb-2018-0028/sagmb-2018-0028.xml. [PMID: 30685747 DOI: 10.1515/sagmb-2018-0028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled the simultaneous measurement of multiple types of genomic data for cancer samples. Analyzed together, these data may reveal new biological insights compared with the analysis of a single data type. This study proposes a novel application of a supervised dimension reduction method, sliced inverse regression (SIR), to multi-omics data analysis to improve prediction over single-data-type analysis. The study further proposes an integrative sliced inverse regression method (integrative SIR) for the simultaneous analysis of multiple omics data types of cancer samples, including miRNA, mRNA and proteomics data, to achieve integrative dimension reduction and further improve prediction performance. Numerical results show that integrative analysis of multi-omics data is beneficial compared with single-data-source analysis and, more importantly, that supervised dimension reduction methods have advantages over unsupervised dimension reduction methods in integrative data analysis in terms of classification and prediction.
Affiliation(s)
- Yashita Jain
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
- Shanshan Ding
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
- Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA
- Jing Qiu
- Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
- Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA