1
Mallick S, Qamar Q, Mishra B, Nayak A. The mechanism underlying the oncogenic potential of AAA+ ATPase PSMC4 in cancer is revealed by mutations and copy number amplifications. Mutat Res 2025; 830:111901. [PMID: 39985882 DOI: 10.1016/j.mrfmmm.2025.111901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 01/20/2025] [Accepted: 02/12/2025] [Indexed: 02/24/2025]
Abstract
Recent research has discovered a connection between the AAA+ ATPase PSMC4 (Proteasome 26S Subunit, ATPase 4) and several forms of cancer. However, a detailed analysis of the oncogenic potential of PSMC4 has been lacking. In this study, we explore PSMC4's potential as a cancer biomarker. We aimed to comprehensively assess its expression profiles, prognostic significance, and associated cellular pathways. Across the various types of cancer examined, PSMC4 was found to be overexpressed. Interestingly, our results show a positive correlation between PSMC4 overexpression and unfavourable overall survival in cancer. Further, we looked into the mutations and copy number amplifications of PSMC4 across various cancers. Our study reveals that missense mutations play a major role in the oncogenic potential of PSMC4. Several possible mutation sites are predicted. Interestingly, we found fifteen hotspot mutations in the ATPase domain of PSMC4. Additionally, PSMC4 showed a high amplification percentage in various cancers. We also examined the functional characteristics of the PSMC4 protein across various types of cancer. In the protein-protein interaction analyses, multiple oncoproteins were found to interact directly with PSMC4. The top signaling pathways of PSMC4 also indicate that it plays a crucial role in cancer development. Overall, this study reveals that PSMC4 could be a potential diagnostic and prognostic marker for cancer, making it a promising biomarker and target.
Affiliation(s)
- Sanjida Mallick
- Department of Life Science, Guru Nanak Institute of Pharmaceutical Science and Technology, 157/F, Nilgunj Rd, Sahid Colony, Panihati, Kolkata, West Bengal 700114, India
- Qurratulain Qamar
- Department of Life Science, Guru Nanak Institute of Pharmaceutical Science and Technology, 157/F, Nilgunj Rd, Sahid Colony, Panihati, Kolkata, West Bengal 700114, India
- Bibhudutta Mishra
- Department of Biotechnology, NIST University, Bramhapur, Odisha 761008, India
- Aditi Nayak
- Department of Life Science, Guru Nanak Institute of Pharmaceutical Science and Technology, 157/F, Nilgunj Rd, Sahid Colony, Panihati, Kolkata, West Bengal 700114, India.
2
Ono K, Eguchi T. Large-Scale Databases and Portals on Cancer Genome to Analyze Chaperone Genes Correlated to Patient Prognosis. Methods Mol Biol 2023; 2693:293-306. [PMID: 37540443 DOI: 10.1007/978-1-0716-3342-7_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/05/2023]
Abstract
Molecular chaperones, such as heat shock proteins (HSPs), have attracted attention as molecules involved in malignant events in cancers and are potential therapeutic targets and biomarkers for tumor therapy. Furthermore, mutations in chaperones can significantly impact cancer risk and prognosis. Bioinformatics is a particularly useful method for developing biomarkers as a practical consideration for the immediate clinical application of data. Many large-scale databases and portals on cancer genome are nowadays publicly available, including the International Cancer Genome Consortium (ICGC); The Cancer Genome Atlas (TCGA), renamed as Genomic Data Commons (GDC); Catalogue of Somatic Mutations in Cancer (COSMIC); and Cancer Cell Line Encyclopedia (CCLE). Referring to these databases, advanced web portals are publicized, including cBioPortal, Human Protein Atlas (HPA), Kaplan-Meier (KM) plotter, Gene Expression Profiling Interactive Analysis 2 (GEPIA2), Genomics of Drug Sensitivity in Cancer (GDSC), and Dependency Map (DepMap). Here, we assemble these databases and portals to clarify what is available and useful for current cancer research and provide protocols to utilize the HPA, KM plotter, and GEPIA2 for studies on chaperone genes in cancer patients. Utilizing these portals will reveal the correlation between tumor subtype-specific high expression of chaperone genes and patient prognosis. Our protocols are useful to increase systematic awareness of chaperones and find new biomarkers for diagnosis and prognosis and new targets for anticancer drugs.
Affiliation(s)
- Kisho Ono
- Department of Oral and Maxillofacial Surgery, Okayama University Hospital, Okayama, Japan
- Takanori Eguchi
- Department of Dental Pharmacology, Faculty of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, Okayama, Japan.
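As a rough illustration of the expression-versus-prognosis comparison that the abstract above attributes to portals such as the KM plotter, the following Python sketch computes a Kaplan-Meier (product-limit) survival estimate for two patient groups. The function name, the toy follow-up times, and the high/low expression split are assumptions made for illustration only; nothing here is taken from the chapter's protocols or from the KM plotter itself.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up duration for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Product-limit step: multiply by the fraction surviving time t.
        survival *= 1 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohorts split by expression of a chaperone gene.
high_expression = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
low_expression = kaplan_meier([4, 6, 9, 10, 12], [1, 0, 1, 0, 0])

for label, curve in (("high", high_expression), ("low", low_expression)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

A real analysis would also test whether the two curves differ (for example with a log-rank test) and would use patient data from the portals rather than invented numbers.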
3
Gendoo DMA. Overview of Bioinformatics Software and Databases for Metabolic Engineering. Methods Mol Biol 2023; 2553:265-274. [PMID: 36227548 DOI: 10.1007/978-1-0716-2617-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The explosion of the "omics" era has introduced a growing number of sets and tools that facilitate molecular interrogation of the metabolome. These include various bioinformatics and pharmacogenomics resources that can be utilized independently or collectively to facilitate metabolic engineering across disease, clinical oncology, and understanding of molecular changes across larger systems. This review provides starting points for accessing publicly available data and computational tools that support assessment of metabolic profiles and metabolic regulation, providing both a depth-and-breadth approach toward understanding the metabolome. We focus in particular on pathway databases and tools, which provide in-depth analysis of metabolic pathways, which is at the heart of metabolic engineering.
Affiliation(s)
- Deena M A Gendoo
- Centre for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom.
4
Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, Urdangarin A, Arozarena I, Jagodic M, Tsamardinos I, Tarazona S, Conesa A, Tegner J, Gomez-Cabrero D. STATegra: Multi-Omics Data Integration - A Conceptual Scheme With a Bioinformatics Pipeline. Front Genet 2021; 12:620453. [PMID: 33747045 PMCID: PMC7970106 DOI: 10.3389/fgene.2021.620453] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/22/2020] [Accepted: 01/20/2021] [Indexed: 12/13/2022] Open
Abstract
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), intended to be as generic as possible for multi-omics analysis, combining available multi-omic analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While in several studies, we have previously combined those integrative tools, here, we provide a systematic description of the STATegra framework and its validation using two The Cancer Genome Atlas (TCGA) case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (and beyond the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package.
Affiliation(s)
- Nuria Planell
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Vincenzo Lagani
- Institute of Chemical Biology, Ilia State University, Tbilisi, Georgia
- Gnosis Data Analysis P.C., Heraklion, Greece
- Patricia Sebastian-Leon
- Department of Genomic and Systems Reproductive Medicine, IVI-RMA (Instituto Valenciano de Infertilidad – Reproductive Medicine Associates) IVI Foundation, Valencia, Spain
- Frans van der Kloet
- Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
- Ewoud Ewing
- Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Nestoras Karathanasis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, Greece
- Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA, United States
- Arantxa Urdangarin
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Imanol Arozarena
- Cancer Signalling Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), Health Research Institute of Navarre (IdiSNA), Pamplona, Spain
- Maja Jagodic
- Department of Clinical Neuroscience, Karolinska Institutet, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, Sweden
- Ioannis Tsamardinos
- Gnosis Data Analysis P.C., Heraklion, Greece
- Computer Science Department, University of Crete, Heraklion, Greece
- Sonia Tarazona
- Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, València, Spain
- Ana Conesa
- Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL, United States
- Genetics Institute, University of Florida, Gainesville, FL, United States
- Jesper Tegner
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Science for Life Laboratory, Solna, Sweden
- David Gomez-Cabrero
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra (CHN), Universidad Pública de Navarra (UPNA), IdiSNA, Pamplona, Spain
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Unit of Computational Medicine, Department of Medicine, Center for Molecular Medicine, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
- Mucosal & Salivary Biology Division, King’s College London Dental Institute, London, United Kingdom
5
Almeida JR, Pratas D, Oliveira JL. A semi-automatic methodology for analysing distributed and private biobanks. Comput Biol Med 2020; 130:104180. [PMID: 33360272 DOI: 10.1016/j.compbiomed.2020.104180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2020] [Revised: 12/14/2020] [Accepted: 12/14/2020] [Indexed: 10/22/2022]
Abstract
Privacy issues limit the analysis and cross-exploration of most distributed and private biobanks, often raised by the multiple dimensionality and sensitivity of the data associated with access restrictions and policies. These characteristics prevent collaboration between entities, constituting a barrier to emergent personalized and public health challenges, namely the discovery of new druggable targets, identification of disease-causing genetic variants, or the study of rare diseases. In this paper, we propose a semi-automatic methodology for the analysis of distributed and private biobanks. The strategies involved in the proposed methodology efficiently enable the creation and execution of unified genomic studies using distributed repositories, without compromising the information present in the datasets. We apply the methodology to a case study in the current Covid-19, ensuring the combination of the diagnostics from multiple entities while maintaining privacy through a completely identical procedure. Moreover, we show that the methodology follows a simple, intuitive, and practical scheme.
Affiliation(s)
- João Rafael Almeida
- DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Diogo Pratas
- DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Virology, University of Helsinki, Helsinki, Finland.
6
Nikmanesh F, Sarhadi S, Dadashpour M, Asgari Y, Zarghami N. Omics Integration Analysis Unravel the Landscape of Driving Mechanisms of Colorectal Cancer. Asian Pac J Cancer Prev 2020; 21:3539-3549. [PMID: 33369450 PMCID: PMC8046321 DOI: 10.31557/apjcp.2020.21.12.3539] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/22/2020] [Accepted: 11/30/2020] [Indexed: 02/06/2023] Open
Abstract
Colorectal cancer (CRC) is one of the most malignant cancers and results in a substantial rate of morbidity and mortality. Diagnosis of this malignancy in early stages increases the chance of effective treatment. High-throughput data analyses reveal omics signatures and also provide the possibility of developing computational models for early detection of this disease. Such models could be used as complementary tools for early detection of different types of cancers, including CRC. In this study, flux balance analysis (FBA) was applied to gene expression data to decode metabolic fluxes in cancer and normal cells. Moreover, transcriptome and genome analyses revealed driver agents of CRC in a biological network scheme. Using comprehensive publicly available data from TCGA, different aspects of the CRC regulome, including the regulatory effects of gene expression, methylation, microRNA, copy number aberration, and point mutation profiles on protein levels, were investigated, and the results provide a regulatory picture underlying CRC. Compiling the omics profiles provided snapshots of changes at different omics levels and in the flux rates of CRC. In conclusion, considering the obtained CRC signatures and their role in the biological operating systems of cells, the results suggest reliable driver regulatory modules that could potentially serve as biomarkers and therapeutic targets and furthermore expand our understanding of the driving mechanisms of this disease.
Affiliation(s)
- Fatemeh Nikmanesh
- Stem Cell Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
- Department of Medical Biotechnology, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Iranian Blood Transfusion Organization-Research Center, Iranian Blood Transfusion Organization, IBTO blg., Hemmat Exp. Way, Teheran, Iran.
- Shamim Sarhadi
- Stem Cell Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran.
- Mehdi Dadashpour
- Stem Cell Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yazdan Asgari
- Iranian Blood Transfusion Organization-Research Center, Iranian Blood Transfusion Organization, IBTO blg., Hemmat Exp. Way, Teheran, Iran.
- Nosratollah Zarghami
- Stem Cell Research Center, Tabriz University of Medical Sciences, Tabriz, Iran.
- Department of Clinical Biochemistry and Laboratory Medicine, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran.
7
Ramos M, Geistlinger L, Oh S, Schiffer L, Azhar R, Kodali H, de Bruijn I, Gao J, Carey VJ, Morgan M, Waldron L. Multiomic Integration of Public Oncology Databases in Bioconductor. JCO Clin Cancer Inform 2020; 4:958-971. [PMID: 33119407 PMCID: PMC7608653 DOI: 10.1200/cci.19.00119] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Accepted: 09/21/2020] [Indexed: 01/04/2023] Open
Abstract
PURPOSE Investigations of the molecular basis for the development, progression, and treatment of cancer increasingly use complementary genomic assays to gather multiomic data, but management and analysis of such data remain complex. The cBioPortal for cancer genomics currently provides multiomic data from > 260 public studies, including The Cancer Genome Atlas (TCGA) data sets, but integration of different data types remains challenging and error prone for computational methods and tools using these resources. Recent advances in data infrastructure within the Bioconductor project enable a novel and powerful approach to creating fully integrated representations of these multiomic, pan-cancer databases. METHODS We provide a set of R/Bioconductor packages for working with TCGA legacy data and cBioPortal data, with special considerations for loading time; efficient representations in and out of memory; analysis platform; and an integrative framework, such as MultiAssayExperiment. Large methylation data sets are provided through out-of-memory data representation to provide responsive loading times and analysis capabilities on machines with limited memory. RESULTS We developed the curatedTCGAData and cBioPortalData R/Bioconductor packages to provide integrated multiomic data sets from the TCGA legacy database and the cBioPortal web application programming interface using the MultiAssayExperiment data structure. This suite of tools provides coordination of diverse experimental assays with clinicopathological data with minimal data management burden, as demonstrated through several greatly simplified multiomic and pan-cancer analyses. CONCLUSION These integrated representations enable analysts and tool developers to apply general statistical and plotting methods to extensive multiomic data through user-friendly commands and documented examples.
Affiliation(s)
- Marcel Ramos
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Ludwig Geistlinger
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Sehyun Oh
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Lucas Schiffer
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA
- Rimsha Azhar
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY
- Hanish Kodali
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
- Ino de Bruijn
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY
- Jianjiong Gao
- Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Vincent J. Carey
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA
- Martin Morgan
- Roswell Park Comprehensive Cancer Center, Buffalo, NY
- Levi Waldron
- Graduate School of Public Health and Health Policy, City University of New York, New York, NY
- Institute for Implementation Science and Population Health, City University of New York, New York, NY
8
Paired like homeodomain 1 and SAM and SH3 domain-containing 1 in the progression and prognosis of head and neck squamous cell carcinoma. Int J Biochem Cell Biol 2020; 127:105846. [PMID: 32905855 DOI: 10.1016/j.biocel.2020.105846] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/15/2020] [Revised: 08/28/2020] [Accepted: 09/01/2020] [Indexed: 12/14/2022]
Abstract
Head and neck squamous cell carcinoma (HNSCC) is an aggressive malignancy with high morbidity and mortality rates. Although numerous advancements have been made in therapeutic methods, the prognosis of HNSCC patients remains poor. Therefore, investigation of crucial genes during HNSCC tumorigenesis which could be exploited as biomarkers and therapeutic targets is greatly needed. In this study, original data from four independent datasets were downloaded from the Gene Expression Omnibus database and analyzed in R to screen for differentially expressed genes. Paired like homeodomain 1 and SAM and SH3 domain-containing 1 were selected to be further explored through multiple online databases. Quantitative real-time polymerase chain reaction analysis and immunohistochemistry assay were adopted to validate the downregulation of paired like homeodomain 1 and SAM and SH3 domain-containing 1 in HNSCC, and statistical analysis indicated their close associations with patient prognosis. In vitro experiments demonstrated the inhibitory effect of paired like homeodomain 1 and SAM and SH3 domain-containing 1 on HNSCC progression. Overall, we identified the aberrant downregulation of paired like homeodomain 1 and SAM and SH3 domain-containing 1 in HNSCC and suggested the potential of utilizing them as therapeutic targets or efficient biomarkers for diagnosis and prognosis evaluation. Our findings may provide novel evidence for the development of new strategies for HNSCC treatment.
9
Paul S. RFCM 3: Computational Method for Identification of miRNA-mRNA Regulatory Modules in Cervical Cancer. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1729-1740. [PMID: 30990434 DOI: 10.1109/tcbb.2019.2910851] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Cervical cancer is a leading severe malignancy throughout the world. Molecular processes and biomarkers leading to tumor progression in cervical cancer are either unknown or only partially understood. An increasing number of studies have shown that microRNAs play an important role in tumorigenesis, so understanding the regulatory mechanism of miRNAs in the gene-regulatory network will help elucidate the complex biological processes that occur during malignancy. Functional genomics data provide opportunities to study the aberrant microRNA-messenger RNA (miRNA-mRNA) interaction. Identification of miRNA-mRNA regulatory modules will aid in deciphering the aberrant transcriptional regulatory network in cervical cancer but is computationally challenging. In this regard, an algorithm, termed relevant and functionally consistent miRNA-mRNA modules (RFCM3), is proposed. It integrates miRNA and mRNA expression data of cervical cancer for identification of potential miRNA-mRNA modules. It selects a set of miRNA-mRNA modules by maximizing the relation of mRNAs with miRNA and the functional similarity between selected mRNAs. Later, using knowledge of the miRNA-miRNA synergistic network, different modules are fused, and finally a set of modules is generated containing several miRNAs as well as mRNAs. This type of module explains the underlying biological pathways containing multiple miRNAs and mRNAs. The effectiveness of the proposed approach over other existing methods has been demonstrated on miRNA and mRNA expression data of cervical cancer with respect to enrichment analyses and other standard metrics. The prognostic value of the genes in a module with respect to cervical cancer is also demonstrated. The approach was found to generate more robust, integrated, and functionally enriched miRNA-mRNA modules in cervical cancer.
10
Randhawa V, Pathania S. Advancing from protein interactomes and gene co-expression networks towards multi-omics-based composite networks: approaches for predicting and extracting biological knowledge. Brief Funct Genomics 2020; 19:364-376. [PMID: 32678894 DOI: 10.1093/bfgp/elaa015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/16/2020] [Revised: 05/31/2020] [Accepted: 06/15/2020] [Indexed: 01/17/2023] Open
Abstract
Prediction of biological interaction networks from single-omics data has been extensively implemented to understand various aspects of biological systems. However, more recently, there is a growing interest in integrating multi-omics datasets for the prediction of interactomes that provide a global view of biological systems with higher descriptive capability, as compared to single omics. In this review, we have discussed various computational approaches implemented to infer and analyze two of the most important and well studied interactomes: protein-protein interaction networks and gene co-expression networks. We have explicitly focused on recent methods and pipelines implemented to infer and extract biologically important information from these interactomes, starting from utilizing single-omics data and then progressing towards multi-omics data. Accordingly, recent examples and case studies are also briefly discussed. Overall, this review will provide a proper understanding of the latest developments in protein and gene network modelling and will also help in extracting practical knowledge from them.
Affiliation(s)
- Vinay Randhawa
- Department of Biochemistry, Panjab University, Chandigarh, 160014, India
- Shivalika Pathania
- Department of Biotechnology, Panjab University, Chandigarh, 160014, India
11
Ulfenborg B. Vertical and horizontal integration of multi-omics data with miodin. BMC Bioinformatics 2019; 20:649. [PMID: 31823712 PMCID: PMC6902525 DOI: 10.1186/s12859-019-3224-4] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 03/14/2019] [Accepted: 11/14/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Studies on multiple modalities of omics data such as transcriptomics, genomics and proteomics are growing in popularity, since they allow us to investigate complex mechanisms across molecular layers. It is widely recognized that integrative omics analysis holds the promise to unlock novel and actionable biological insights into health and disease. Integration of multi-omics data remains challenging, however, and requires combination of several software tools and extensive technical expertise to account for the properties of heterogeneous data. RESULTS This paper presents the miodin R package, which provides a streamlined workflow-based syntax for multi-omics data analysis. The package allows users to perform analysis of omics data either across experiments on the same samples (vertical integration), or across studies on the same variables (horizontal integration). Workflows have been designed to promote transparent data analysis and reduce the technical expertise required to perform low-level data import and processing. CONCLUSIONS The miodin package is implemented in R and is freely available for use and extension under the GPL-3 license. Package source, reference documentation and user manual are available at https://gitlab.com/algoromics/miodin.
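The distinction drawn in the abstract above between vertical integration (several experiments on the same samples) and horizontal integration (several studies on the same variables) can be sketched with plain dictionaries. This is a minimal Python illustration and not the miodin API, which is an R package; the function names, the nested-dictionary layout, and the toy values are all assumptions made for this example.

```python
def vertical_integrate(assays):
    """Combine several assays measured on the same samples.

    assays maps an assay name to {sample_id: {feature: value}}.
    Only samples present in every assay are kept, and feature names are
    prefixed with the assay name so they cannot collide.
    """
    shared_samples = set.intersection(*(set(a) for a in assays.values()))
    return {
        sample: {
            f"{name}:{feature}": value
            for name, assay in assays.items()
            for feature, value in assay[sample].items()
        }
        for sample in sorted(shared_samples)
    }


def horizontal_integrate(studies):
    """Pool several studies that measured the same variables.

    studies maps a study name to {sample_id: {variable: value}}.
    Only variables present in every sample of every study are kept, and
    sample IDs are prefixed with the study name.
    """
    shared_vars = set.intersection(
        *(set(row) for study in studies.values() for row in study.values())
    )
    return {
        f"{study_name}:{sample}": {v: row[v] for v in sorted(shared_vars)}
        for study_name, study in studies.items()
        for sample, row in study.items()
    }


# Hypothetical toy data: two assays on overlapping samples.
rna = {"s1": {"TP53": 5.1, "MYC": 7.2}, "s2": {"TP53": 4.8, "MYC": 6.9},
       "s3": {"TP53": 5.5, "MYC": 7.0}}
protein = {"s1": {"TP53": 0.9}, "s2": {"TP53": 1.1}}
print(vertical_integrate({"rna": rna, "protein": protein}))

# Hypothetical toy data: two studies sharing one variable.
study_a = {"a1": {"TP53": 5.1, "MYC": 7.2}}
study_b = {"b1": {"TP53": 4.9, "EGFR": 3.3}}
print(horizontal_integrate({"A": study_a, "B": study_b}))
```

In practice, horizontal integration usually also needs batch-effect handling across studies, which this sketch leaves out.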
12
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2019; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
- Huan Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Zhi Xie
- State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| |
Collapse
|
13
|
Zanfardino M, Franzese M, Pane K, Cavaliere C, Monti S, Esposito G, Salvatore M, Aiello M. Bringing radiomics into a multi-omics framework for a comprehensive genotype-phenotype characterization of oncological diseases. J Transl Med 2019; 17:337. [PMID: 31590671 PMCID: PMC6778975 DOI: 10.1186/s12967-019-2073-2] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 07/22/2019] [Accepted: 09/18/2019] [Indexed: 02/07/2023]
Abstract
Genomic and radiomic data integration, namely radiogenomics, can provide meaningful knowledge in cancer diagnosis, prognosis and treatment. Although several data structures based on multi-layer architecture have been proposed to combine multi-omic biological information, none of these has been designed and assessed to include radiomic data as well. To meet this need, we propose to use the MultiAssayExperiment (MAE), an R package that provides data structures and methods for manipulating and integrating multi-assay experiments, as a suitable tool to manage radiogenomic experiment data. To this aim, we first examine the role of radiogenomics in cancer phenotype definition, then the current state of radiogenomics data integration in public repositories and, finally, the challenges and limitations of including radiomics in MAE, designing an extended framework and showing its application on a case study from the TCGA-TCIA archives. Radiomic and genomic data from 91 patients have been successfully integrated in a single MAE object, demonstrating the suitability of the MAE data structure as a container of radiogenomic data.
|
14
|
Stanstrup J, Broeckling CD, Helmus R, Hoffmann N, Mathé E, Naake T, Nicolotti L, Peters K, Rainer J, Salek RM, Schulze T, Schymanski EL, Stravs MA, Thévenot EA, Treutler H, Weber RJM, Willighagen E, Witting M, Neumann S. The metaRbolomics Toolbox in Bioconductor and beyond. Metabolites 2019; 9:E200. [PMID: 31548506 PMCID: PMC6835268 DOI: 10.3390/metabo9100200] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 08/11/2019] [Revised: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 11/17/2022]
Abstract
Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments to process and analyse Metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing packages in R for different steps in a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics specific packages primarily available on CRAN, Bioconductor and GitHub.
Affiliation(s)
- Jan Stanstrup
  - Preventive and Clinical Nutrition, University of Copenhagen, Rolighedsvej 30, 1958 Frederiksberg C, Denmark.
- Corey D Broeckling
  - Proteomics and Metabolomics Facility, Colorado State University, Fort Collins, CO 80523, USA.
- Rick Helmus
  - Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1098 XH Amsterdam, The Netherlands.
- Nils Hoffmann
  - Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Straße 6b, 44227 Dortmund, Germany.
- Ewy Mathé
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
- Thomas Naake
  - Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany.
- Luca Nicolotti
  - The Australian Wine Research Institute, Metabolomics Australia, PO Box 197, Adelaide SA 5064, Australia.
- Kristian Peters
  - Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- Johannes Rainer
  - Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, 39100 Bolzano, Italy.
- Reza M Salek
  - The International Agency for Research on Cancer, 150 cours Albert Thomas, CEDEX 08, 69372 Lyon, France.
- Tobias Schulze
  - Department of Effect-Directed Analysis, Helmholtz Centre for Environmental Research-UFZ, Permoserstraße 15, 04318 Leipzig, Germany.
- Emma L Schymanski
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 avenue du Swing, L-4367 Belvaux, Luxembourg.
- Michael A Stravs
  - Eawag, Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, 8600 Dubendorf, Switzerland.
- Etienne A Thévenot
  - CEA, LIST, Laboratory for Data Sciences and Decision, MetaboHUB, Gif-Sur-Yvette F-91191, France.
- Hendrik Treutler
  - Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
- Ralf J M Weber
  - Phenome Centre Birmingham and School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
- Egon Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands.
- Michael Witting
  - Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, 85764 Neuherberg, Germany.
  - Chair of Analytical Food Chemistry, Technische Universität München, 85354 Weihenstephan, Germany.
- Steffen Neumann
  - Leibniz Institute of Plant Biochemistry (IPB Halle), Bioinformatics and Scientific Data, 06120 Halle, Germany.
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103 Leipzig, Germany.
|
15
|
In silico drug repositioning: from large-scale transcriptome data to therapeutics. Arch Pharm Res 2019; 42:879-889. [PMID: 31482491 DOI: 10.1007/s12272-019-01176-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/30/2019] [Accepted: 07/26/2019] [Indexed: 02/06/2023]
Abstract
Drug repositioning is an attractive alternative to conventional drug development when new beneficial effects of old drugs are clinically validated, because pharmacokinetic and safety profiles are generally already available. Since ~30% of drugs newly approved by the US Food and Drug Administration (FDA) are developed through drug repositioning, identifying novel uses for existing drugs is an emerging strategy for developing disease treatments. With advances in next-generation sequencing technologies, available transcriptome data related to diseases have expanded rapidly. Harnessing these resources enables a better understanding of disease mechanisms and drug mode of action (MOA), and moves toward personalized pharmacotherapy. In this review, we briefly outline publicly available large-scale transcriptome databases and tools for drug repositioning. We also highlight recent approaches leading to the discovery of novel drug targets, drug response biomarkers, drug indications, and drug MOA.
|
16
|
Bioinformatics-based discovery of PYGM and TNNC2 as potential biomarkers of head and neck squamous cell carcinoma. Biosci Rep 2019; 39:BSR20191612. [PMID: 31324732 PMCID: PMC6663994 DOI: 10.1042/bsr20191612] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/21/2019] [Revised: 07/11/2019] [Accepted: 07/18/2019] [Indexed: 12/12/2022]
Abstract
Head and neck squamous cell carcinoma (HNSCC) is an aggressive malignancy with high morbidity and mortality rates and ranks as the sixth most common cancer worldwide. Despite numerous advancements in therapeutic methods, the prognosis of HNSCC patients remains poor. Therefore, there is an urgent need to better understand the molecular mechanisms underlying HNSCC progression and to identify essential genes that could serve as effective biomarkers and potential treatment targets. In the present study, original data from three independent datasets were downloaded from the Gene Expression Omnibus database (GEO), and the R language was applied to screen out the differentially expressed genes (DEGs). PYGM and TNNC2 were finally selected from the overlapping DEGs of the three datasets for further analyses. Transcriptional and survival data related to PYGM and TNNC2 were examined through multiple online databases such as Oncomine, Gene Expression Profiling Interactive Analysis (GEPIA), cBioportal, and UALCAN. Quantitative real-time polymerase chain reaction (qPCR) analysis was adopted to validate PYGM and TNNC2 mRNA levels in HNSCC tissues and cell lines. Survival curves were plotted to evaluate the association of these two genes with HNSCC prognosis. It was demonstrated that PYGM and TNNC2 were significantly down-regulated in HNSCC and that the aberrant expression of PYGM and TNNC2 was correlated with HNSCC prognosis, implying the potential of exploiting them as therapeutic targets for HNSCC treatment or as potential biomarkers for diagnosis and prognosis.
|
17
|
Gendoo DMA, Zon M, Sandhu V, Manem VSK, Ratanasirigulchai N, Chen GM, Waldron L, Haibe-Kains B. MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature. Sci Rep 2019; 9:8770. [PMID: 31217513 PMCID: PMC6584731 DOI: 10.1038/s41598-019-45165-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 11/19/2018] [Accepted: 05/31/2019] [Indexed: 12/13/2022]
Abstract
A wealth of transcriptomic and clinical data on solid tumours are under-utilized due to unharmonized data storage and format. We have developed the MetaGxData package compendium, which includes manually-curated and standardized clinical, pathological, survival, and treatment metadata across breast, ovarian, and pancreatic cancer data. MetaGxData is the largest compendium of curated transcriptomic data for these cancer types to date, spanning 86 datasets and encompassing 15,249 samples. Open access to standardized metadata across cancer types promotes use of their transcriptomic and clinical data in a variety of cross-tumour analyses, including identification of common biomarkers, and assessing the validity of prognostic signatures. Here, we demonstrate that MetaGxData is a flexible framework that facilitates meta-analyses by using it to identify common prognostic genes in ovarian and breast cancer. Furthermore, we use the data compendium to create the first gene signature that is prognostic in a meta-analysis across 3 cancer types. These findings demonstrate the potential of MetaGxData to serve as an important resource in oncology research, and provide a foundation for future development of cancer-specific compendia.
Affiliation(s)
- Deena M A Gendoo
  - Centre for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom.
- Michael Zon
  - Princess Margaret Cancer Center, University Health Network, Toronto, M5G 2C1, Canada.
  - Department of Biomedical Engineering, McMaster University, Toronto, L8S 4L8, Canada.
- Vandana Sandhu
  - Princess Margaret Cancer Center, University Health Network, Toronto, M5G 2C1, Canada.
- Venkata S K Manem
  - Princess Margaret Cancer Center, University Health Network, Toronto, M5G 2C1, Canada.
  - Department of Medical Biophysics, University of Toronto, Toronto, M5S 3H7, Canada.
  - Institut Universitaire de Cardiologie et de Pneumologie de Québec, Université Laval, Québec City, G1V 4G5, Canada.
- Gregory M Chen
  - Princess Margaret Cancer Center, University Health Network, Toronto, M5G 2C1, Canada.
- Levi Waldron
  - Graduate School of Public Health and Health Policy, Institute of Implementation Science in Population Health, City University of New York School, New York, 11101, USA.
- Benjamin Haibe-Kains
  - Princess Margaret Cancer Center, University Health Network, Toronto, M5G 2C1, Canada.
  - Department of Medical Biophysics, University of Toronto, Toronto, M5S 3H7, Canada.
  - Department of Computer Science, University of Toronto, Toronto, M5T 3A1, Canada.
  - Ontario Institute of Cancer Research, Toronto, M5G 0A3, Canada.
  - Vector Institute, Toronto, M5G 1M1, Canada.
|
18
|
Sun R, Bao M, Long X, Yuan Y, Wu M, Li X, Bao J. Metabolic gene NR4A1 as a potential therapeutic target for non-smoking female non-small cell lung cancer patients. Thorac Cancer 2019; 10:715-727. [PMID: 30806032 PMCID: PMC6449245 DOI: 10.1111/1759-7714.12989] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/13/2018] [Revised: 01/04/2019] [Accepted: 01/05/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND Although cigarette smoking is considered one of the key risk factors for lung cancer, 15% of male patients and 53% of female patients with lung cancer are non-smokers. Metabolic changes are critical features of cancer. Therapeutic target identification from a metabolic perspective in non-small cell lung cancer (NSCLC) tissue of female non-smokers has long been ignored. RESULTS Based on microarray data retrieved from Affymetrix expression arrays E-GEOD-19804, we found that the downregulated genes in non-smoking female NSCLC patients tended to participate in protein/amino acid and lipid metabolism, while upregulated genes were more involved in protein/amino acid and carbohydrate metabolism. Combining nutrient metabolic co-expression, protein-protein interaction network construction and overall survival assessment, we identified NR4A1 and TIE1 as potential therapeutic targets for NSCLC in female non-smokers. To accelerate drug development for non-smoking female NSCLC patients, we identified nilotinib as a potential agonist targeting the NR4A1-encoded protein by molecular docking and molecular dynamics simulation. We also show that nilotinib inhibited proliferation and induced senescence of cells in non-smoking female NSCLC patients in vitro. CONCLUSIONS These results not only uncover nutrient metabolic characteristics in non-smoking female NSCLC patients, but also provide a new paradigm for identifying new targets and drugs for novel therapy for such patients.
MESH Headings
- Biomarkers, Tumor/metabolism
- Carcinoma, Non-Small-Cell Lung/drug therapy
- Carcinoma, Non-Small-Cell Lung/genetics
- Carcinoma, Non-Small-Cell Lung/metabolism
- Cell Line, Tumor
- Cell Proliferation/drug effects
- Cell Survival/drug effects
- Down-Regulation
- Drug Screening Assays, Antitumor
- Female
- Gene Expression Regulation, Neoplastic/drug effects
- Humans
- Lung Neoplasms/drug therapy
- Lung Neoplasms/genetics
- Lung Neoplasms/metabolism
- Molecular Docking Simulation
- Molecular Dynamics Simulation
- Non-Smokers/statistics & numerical data
- Nuclear Receptor Subfamily 4, Group A, Member 1/antagonists & inhibitors
- Nuclear Receptor Subfamily 4, Group A, Member 1/chemistry
- Nuclear Receptor Subfamily 4, Group A, Member 1/metabolism
- Protein Interaction Maps
- Pyrimidines/pharmacology
- Pyrimidines/therapeutic use
- Receptor, TIE-1/genetics
- Receptor, TIE-1/metabolism
- Survival Analysis
Affiliation(s)
- Rong Sun
  - Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Min‐Yue Bao
  - State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Xin Long
  - Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Yuan Yuan
  - Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Miao‐Miao Wu
  - Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
- Xin Li
  - State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Jin‐Ku Bao
  - Key Laboratory of Bio‐Resource and Eco‐Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, China
  - State Key Laboratory of Oral Diseases, National Clinical Research Center for Oral Diseases, West China Hospital of Stomatology, Sichuan University, Chengdu, China
|
19
|
Kagohara LT, Stein-O’Brien GL, Kelley D, Flam E, Wick HC, Danilova LV, Easwaran H, Favorov AV, Qian J, Gaykalova DA, Fertig EJ. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis. Brief Funct Genomics 2019; 17:49-63. [PMID: 28968850 PMCID: PMC5860551 DOI: 10.1093/bfgp/elx018] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Indexed: 12/23/2022]
Abstract
Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs and posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for maintenance of the malignant phenotype. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems, and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies.
Affiliation(s)
- Daria A Gaykalova
  - Corresponding authors: Daria A. Gaykalova, Otolaryngology - Head and Neck Surgery, The Johns Hopkins University School of Medicine, 1550 Orleans Street, Rm 574, CRBII Baltimore, MD 21231, USA. Tel.: +1 410 614 2745; Fax: +1 410 614 1411. Elana J. Fertig, Assistant Professor of Oncology, Division of Biostatistics and Bioinformatics, Johns Hopkins University, 550 N Broadway, 1101 E Baltimore, MD 21205, USA. Tel.: +1 410 955 4268; Fax: +1 410 955 0859.
|
20
|
Manzoni C, Kia DA, Vandrovcova J, Hardy J, Wood NW, Lewis PA, Ferrari R. Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences. Brief Bioinform 2019; 19:286-302. [PMID: 27881428 PMCID: PMC6018996 DOI: 10.1093/bib/bbw114] [Citation(s) in RCA: 428] [Impact Index Per Article: 71.3] [Received: 07/28/2016] [Indexed: 02/07/2023]
Abstract
Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of biomedical sciences. While genomics, transcriptomics and proteomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches generating monothematic rather than integrated knowledge. As other areas of biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, are moving towards the omics scale, we are witnessing the rise of inter-disciplinary data integration strategies to support a better understanding of biological systems and eventually the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. This work intends to target students and researchers seeking knowledge outside of their field of expertise and fosters a leap from the reductionist to the global-integrative analytical approach in research.
Affiliation(s)
- Claudia Manzoni
  - School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom.
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Demis A Kia
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Jana Vandrovcova
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- John Hardy
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Nicholas W Wood
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Patrick A Lewis
  - School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom.
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Raffaele Ferrari
  - Department Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
|
21
|
Zheng Y, Liu Y, Zhao S, Zheng Z, Shen C, An L, Yuan Y. Large-scale analysis reveals a novel risk score to predict overall survival in hepatocellular carcinoma. Cancer Manag Res 2018; 10:6079-6096. [PMID: 30538557 PMCID: PMC6252784 DOI: 10.2147/cmar.s181396] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Indexed: 12/12/2022]
Abstract
Background: Hepatocellular carcinoma (HCC) is a major cause of cancer mortality, with an increasing incidence worldwide; however, there are very few effective diagnostic approaches and prognostic biomarkers. Materials and methods: One hundred forty-nine pairs of HCC samples from Gene Expression Omnibus (GEO) were obtained to screen differentially expressed genes (DEGs) between HCC and normal samples. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, Gene ontology enrichment analyses, and protein–protein interaction network were used. Cox proportional hazards regression analysis was used to identify significant prognostic DEGs, with which a gene expression signature prognostic prediction model was identified in The Cancer Genome Atlas (TCGA) project discovery cohort. The robustness of this panel was assessed in the GSE14520 cohort. We verified details of the gene expression level of the key molecules through TCGA, GEO, and qPCR and used immunohistochemistry for substantiation in HCC tissues. The methylation states of these genes were also explored. Results: Ninety-eight genes, consisting of 13 upregulated and 85 downregulated genes, were screened out in three datasets. KEGG and Gene ontology analysis for the DEGs revealed important biological features of each subtype. A protein–protein interaction network was constructed, consisting of 64 nodes and 115 edges. A subset of four genes (SPINK1, TXNRD1, LCAT, and PZP) that formed a prognostic gene expression signature was established from TCGA and validated in GSE14520. Next, the expression details of the four genes were validated with TCGA, GEO, and clinical samples. The expression panels of the four genes were closely related to methylation states. Conclusion: This study identified a novel four-gene signature biomarker for predicting the prognosis of HCC. The biomarkers may also reveal molecular mechanisms underlying development of the disease and provide new insights into interventional strategies.
Affiliation(s)
- Yujia Zheng
  - Biotherapy Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Yulin Liu
  - Biotherapy Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Songfeng Zhao
  - Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Zhetian Zheng
  - School of Computer Science, Yangtze University, Jingzhou, Hubei, China
- Chunyi Shen
  - Biotherapy Center, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Li An
  - Institute of Quality Standard and Testing Technology for Agro-products, Henan Academy of Agricultural Sciences, Zhengzhou, China
- Yongliang Yuan
  - Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
|
22
|
Berlin R, Gruen R, Best J. Systems Medicine Disease: Disease Classification and Scalability Beyond Networks and Boundary Conditions. Front Bioeng Biotechnol 2018; 6:112. [PMID: 30131956 PMCID: PMC6090066 DOI: 10.3389/fbioe.2018.00112] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/24/2018] [Accepted: 07/18/2018] [Indexed: 12/26/2022]
Abstract
In order to accommodate the forthcoming wealth of health and disease related information, from genome to body sensors to population and the environment, the approach to disease description and definition demands re-examination. Traditional classification methods remain trapped by history; to provide the descriptive features that are required for a comprehensive description of disease, systems science, which realizes dynamic processes, adaptive response, and asynchronous communication channels, must be applied (Wolkenhauer et al., 2013). When Disease is viewed beyond the thresholds of lines and threshold boundaries, disease definition is not only the result of reductionist, mechanistic categories which reluctantly face re-composition. Disease is process and synergy as the characteristics of Systems Biology and Systems Medicine are included. To capture the wealth of information and contribute meaningfully to medical practice and biology research, Disease classification goes beyond a single spatial biologic level or static time assignment to include the interface of Disease process and organism response (Bechtel, 2017a; Green et al., 2017).
Affiliation(s)
- Richard Berlin
  - Department of Computer Science, University of Illinois, Urbana, IL, United States
- Russell Gruen
  - Department of Surgery, Nanyang Institute of Technology in Health and Medicine, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- James Best
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - Imperial College, London, United Kingdom
|
23
|
Liu X, Wei L, Zhao B, Cai X, Dong C, Yin F. Low expression of KCNN3 may affect drug resistance in ovarian cancer. Mol Med Rep 2018; 18:1377-1386. [PMID: 29901154 PMCID: PMC6072180 DOI: 10.3892/mmr.2018.9107] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/06/2017] [Accepted: 04/26/2018] [Indexed: 12/23/2022]
Abstract
Drug resistance is a principal contributor to the poor prognosis of ovarian cancer (OC). Therefore, identifying factors that affect drug resistance in OC is critical. In the present study, 51 OC specimens from lab collections were immunohistochemically tested, public data for 489 samples from The Cancer Genome Atlas cohort and 1,656 samples from the Kaplan‑Meier Plotter were downloaded, and data were retrieved from Oncomine. It was identified that the mRNA and protein expression of the potassium calcium‑activated channel subfamily N member 3 (KCNN3) was markedly lower in OC tissues compared with normal tissues, and in drug‑resistant OC tissues compared with sensitive OC tissues. Low KCNN3 expression consistently predicted shorter disease‑free and overall survival (OS). Specifically, low KCNN3 expression predicted shorter OS in 395 patients with low expression levels of mucin‑16. There was additional evidence that KCNN3 expression is mediated by microRNA‑892b. Furthermore, text mining and analyses of protein and gene interactions indicated that KCNN3 affects drug resistance. To the best of the authors' knowledge, this is the first report to associate KCNN3 with poor prognosis and drug resistance in OC. The present findings indicated that KCNN3 is a potential prognostic marker and therapeutic target for OC.
Affiliation(s)
- Xia Liu
- Key Laboratory of Longevity and Ageing‑Related Disease of Chinese Ministry of Education, Centre for Translational Medicine and School of Preclinical Medicine, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China
| | - Luwei Wei
- Department of Gynecologic Oncology, The Affiliated Tumor Hospital, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China
| | - Bingbing Zhao
- Department of Gynecologic Oncology, The Affiliated Tumor Hospital, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China
| | - Xiangxue Cai
- Key Laboratory of Longevity and Ageing‑Related Disease of Chinese Ministry of Education, Centre for Translational Medicine and School of Preclinical Medicine, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China
| | - Caihua Dong
- Key Laboratory of Longevity and Ageing‑Related Disease of Chinese Ministry of Education, Centre for Translational Medicine and School of Preclinical Medicine, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China
| | - Fuqiang Yin
- Life Sciences Institute, Guangxi Medical University, Nanning, Guangxi 530021, P.R. China
| |
|
24
|
Musa A, Ghoraie LS, Zhang SD, Glazko G, Yli-Harja O, Dehmer M, Haibe-Kains B, Emmert-Streib F. A review of connectivity map and computational approaches in pharmacogenomics. Brief Bioinform 2018; 19:506-523. [PMID: 28069634 PMCID: PMC5952941 DOI: 10.1093/bib/bbw112] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Indexed: 12/21/2022]
Abstract
Large-scale perturbation databases, such as Connectivity Map (CMap) or Library of Integrated Network-based Cellular Signatures (LINCS), provide enormous opportunities for computational pharmacogenomics and drug design. A reason for this is that, in contrast to classical pharmacology, which focuses on one target at a time, the transcriptomics profiles provided by CMap and LINCS open the door for systems biology approaches on the pathway and network level. In this article, we provide a review of recent developments in computational pharmacogenomics with respect to CMap and LINCS and related applications.
Affiliation(s)
- Aliyu Musa
- Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| | - Laleh Soltan Ghoraie
- Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Center, University Health Network, Toronto, ON, Canada
| | - Shu-Dong Zhang
- Northern Ireland Centre for Stratified Medicine, Biomedical Sciences Research Institute, University of Ulster, C-TRIC Building, Altnagelvin Area Hospital, Glenshane Road, Derry/Londonderry, Northern Ireland, UK
| | - Galina Glazko
- University of Rochester Department of Biostatistics and Computational Biology, Rochester, New York, USA
| | - Olli Yli-Harja
- Computational Systems Biology, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| | - Matthias Dehmer
- Institute for Bioinformatics and Translational Research, UMIT- The Health and Life Sciences University, Eduard Wallnoefer Zentrum 1, Hall in Tyrol, Austria
| | - Benjamin Haibe-Kains
- Bioinformatics and Computational Genomics Laboratory, Princess Margaret Cancer Center, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Ontario Institute of Cancer Research, Toronto, ON, Canada
| | - Frank Emmert-Streib
- Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
| |
|
26
|
Abstract
This article considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.
Affiliation(s)
- Prasad Patil
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115
| | - Giovanni Parmigiani
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115
| |
|
27
|
Zeng ISL, Lumley T. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science). Bioinform Biol Insights 2018; 12:1177932218759292. [PMID: 29497285 PMCID: PMC5824897 DOI: 10.1177/1177932218759292] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 10/29/2017] [Accepted: 01/24/2018] [Indexed: 12/14/2022]
Abstract
Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and the integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than the number of features and using a Bayesian approach when there is prior knowledge to be integrated are also included in the commentary. For the completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.
Affiliation(s)
- Irene Sui Lan Zeng
- Department of Statistics, Faculty of Science, The University of Auckland, Auckland, New Zealand
| | - Thomas Lumley
- Department of Statistics, Faculty of Science, The University of Auckland, Auckland, New Zealand
| |
|
28
|
Kagohara LT, Stein-O'Brien GL, Kelley D, Flam E, Wick HC, Danilova LV, Easwaran H, Favorov AV, Qian J, Gaykalova DA, Fertig EJ. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis. Brief Funct Genomics 2018. [PMID: 28968850 DOI: 10.1101/114025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/01/2023]
Abstract
Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs and posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for the malignant phenotype maintenance. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies.
|
29
|
Ramos M, Schiffer L, Re A, Azhar R, Basunia A, Rodriguez C, Chan T, Chapman P, Davis SR, Gomez-Cabrero D, Culhane AC, Haibe-Kains B, Hansen KD, Kodali H, Louis MS, Mer AS, Riester M, Morgan M, Carey V, Waldron L. Software for the Integration of Multiomics Experiments in Bioconductor. Cancer Res 2017; 77:e39-e42. [PMID: 29092936 DOI: 10.1158/0008-5472.can-17-0344] [Citation(s) in RCA: 61] [Impact Index Per Article: 7.6] [Received: 02/09/2017] [Revised: 05/30/2017] [Accepted: 07/27/2017] [Indexed: 11/16/2022]
Abstract
Multiomics experiments are increasingly commonplace in biomedical research and add layers of complexity to experimental design, data integration, and analysis. R and Bioconductor provide a generic framework for statistical analysis and visualization, as well as specialized data classes for a variety of high-throughput data types, but methods are lacking for integrative analysis of multiomics experiments. The MultiAssayExperiment software package, implemented in R and leveraging Bioconductor software and design principles, provides for the coordinated representation of, storage of, and operation on multiple diverse genomics data. We provide the unrestricted multiple 'omics data for each cancer tissue in The Cancer Genome Atlas as ready-to-analyze MultiAssayExperiment objects and demonstrate in these and other datasets how the software simplifies data representation, statistical analysis, and visualization. The MultiAssayExperiment Bioconductor package reduces major obstacles to efficient, scalable, and reproducible statistical analysis of multiomics data and enhances data science applications of multiple omics datasets. Cancer Res; 77(21); e39-42. ©2017 AACR.
Affiliation(s)
- Marcel Ramos
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York; Roswell Park Cancer Institute, University of Buffalo, Buffalo, New York
| | - Lucas Schiffer
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| | - Angela Re
- Centre for Sustainable Future Technologies, Istituto Italiano di Tecnologia, Corso Trento, Torino, Italy
| | - Rimsha Azhar
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| | - Azfar Basunia
- Harvard TH Chan School of Public Health, Boston, Massachusetts
| | - Carmen Rodriguez
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| | - Tiffany Chan
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| | - Phil Chapman
- Computational Biology Support Team, Cancer Research UK Manchester Institute, The University of Manchester, Manchester, United Kingdom
| | - Sean R Davis
- Center for Cancer Research, NCI, NIH, Bethesda, Maryland
| | - David Gomez-Cabrero
- Mucosal and Salivary Biology Division, King's College London Dental Institute, London, United Kingdom
| | - Aedin C Culhane
- Harvard TH Chan School of Public Health, Boston, Massachusetts; Dana-Farber Cancer Institute, Boston, Massachusetts
| | - Benjamin Haibe-Kains
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Ontario Institute of Cancer Research, Toronto, Ontario, Canada
| | - Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland
| | - Hanish Kodali
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| | - Marie S Louis
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| | - Arvind S Mer
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
| | - Markus Riester
- Novartis Institutes for BioMedical Research, Cambridge, Massachusetts
| | - Martin Morgan
- Roswell Park Cancer Institute, University of Buffalo, Buffalo, New York
| | - Vince Carey
- Harvard TH Chan School of Public Health, Boston, Massachusetts; Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
| | - Levi Waldron
- Graduate School of Public Health & Health Policy, City University of New York, New York, New York; Institute for Implementation Science in Population Health, City University of New York, New York, New York
| |
|
30
|
Gomez-Cabrero D, Marabita F, Tarazona S, Cano I, Roca J, Conesa A, Sabatier P, Tegnér J. Guidelines for Developing Successful Short Advanced Courses in Systems Medicine and Systems Biology. Cell Syst 2017; 5:168-175. [PMID: 28843483 DOI: 10.1016/j.cels.2017.05.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/05/2016] [Revised: 02/21/2017] [Accepted: 05/31/2017] [Indexed: 12/15/2022]
Abstract
Systems medicine and systems biology have inherent educational challenges. These have largely been addressed either by providing new masters programs or by redesigning undergraduate programs. In contrast, short courses can respond to a different need: they can provide condensed updates for professionals across academia, the clinic, and industry. These courses have received less attention. Here, we share our experiences in developing and providing such courses to current and future leaders in systems biology and systems medicine. We present guidelines for how to reproduce our courses, and we offer suggestions for how to select students who will nurture an interdisciplinary learning environment and thrive there.
Affiliation(s)
- David Gomez-Cabrero
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden; Science for Life Laboratory, 17121 Solna, Sweden; Mucosal and Salivary Biology Division, King's College London Dental Institute, London SE1 9RT, UK.
| | - Francesco Marabita
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden; Science for Life Laboratory, 17121 Solna, Sweden
| | - Sonia Tarazona
- Centro de Investigacion Principe Felipe, 46012 Valencia, Spain; Department of Applied Statistics, Operations Research and Quality, Universitat Politècnica de València, Camí de Vera, 46022 Valencia, Spain
| | - Isaac Cano
- Hospital Clinic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona, 08007 Barcelona, Spain; Center for Biomedical Network Research in Respiratory Diseases (CIBERES), 28029 Madrid, Spain
| | - Josep Roca
- Hospital Clinic de Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Universitat de Barcelona, 08007 Barcelona, Spain; Center for Biomedical Network Research in Respiratory Diseases (CIBERES), 28029 Madrid, Spain
| | - Ana Conesa
- Centro de Investigacion Principe Felipe, 46012 Valencia, Spain; Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL 32603, USA
| | - Philippe Sabatier
- TIMC-IMAG Laboratory, UMR 5525, Centre National de la Recherche Scientifique, Vetagro Sup, Université Grenoble-Alpes, 38400 Saint-Martin-d'Hères, France
| | - Jesper Tegnér
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden; Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden; Science for Life Laboratory, 17121 Solna, Sweden; Biological and Environmental Sciences and Engineering Division (BESE), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia.
| |
|
31
|
Ferrero G, Miano V, Beccuti M, Balbo G, De Bortoli M, Cordero F. Dissecting the genomic activity of a transcriptional regulator by the integrative analysis of omics data. Sci Rep 2017; 7:8564. [PMID: 28819152 PMCID: PMC5561104 DOI: 10.1038/s41598-017-08754-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/22/2016] [Accepted: 07/13/2017] [Indexed: 12/19/2022]
Abstract
In the study of genomic regulation, strategies to integrate the data produced by Next Generation Sequencing (NGS)-based technologies in a meaningful ensemble are eagerly awaited and must continuously evolve. Here, we describe an integrative strategy for the analysis of data generated by chromatin immunoprecipitation followed by NGS which combines algorithms for data overlap, normalization and epigenetic state analysis. The performance of our strategy is illustrated by presenting the analysis of data relative to the transcriptional regulator Estrogen Receptor alpha (ERα) in MCF-7 breast cancer cells and of Glucocorticoid Receptor (GR) in A549 lung cancer cells. We went through the definition of reference cistromes for different experimental contexts, the integration of data relative to co-regulators and the overlay of chromatin states as defined by epigenetic marks in MCF-7 cells. With our strategy, we identified novel features of estrogen-independent ERα activity, including FoxM1 interaction, eRNAs transcription and a peculiar ontology of connected genes.
Affiliation(s)
- Giulio Ferrero
- Center for Molecular Systems Biology, University of Turin, 10043, Orbassano, Turin, Italy; Dept. of Computer Science, University of Turin, 10149, Turin, Italy; Dept. of Biological and Clinical Sciences, University of Turin, 10043, Orbassano, Turin, Italy
| | - Valentina Miano
- Center for Molecular Systems Biology, University of Turin, 10043, Orbassano, Turin, Italy; Dept. of Biological and Clinical Sciences, University of Turin, 10043, Orbassano, Turin, Italy
| | - Marco Beccuti
- Dept. of Computer Science, University of Turin, 10149, Turin, Italy
| | - Gianfranco Balbo
- Center for Molecular Systems Biology, University of Turin, 10043, Orbassano, Turin, Italy; Dept. of Computer Science, University of Turin, 10149, Turin, Italy
| | - Michele De Bortoli
- Center for Molecular Systems Biology, University of Turin, 10043, Orbassano, Turin, Italy; Dept. of Biological and Clinical Sciences, University of Turin, 10043, Orbassano, Turin, Italy
| | - Francesca Cordero
- Center for Molecular Systems Biology, University of Turin, 10043, Orbassano, Turin, Italy; Dept. of Computer Science, University of Turin, 10149, Turin, Italy
| |
|
32
|
Gandy LM, Gumm J, Fertig B, Thessen A, Kennish MJ, Chavan S, Marchionni L, Xia X, Shankrit S, Fertig EJ. Synthesizer: Expediting synthesis studies from context-free data with information retrieval techniques. PLoS One 2017; 12:e0175860. [PMID: 28437440 PMCID: PMC5402950 DOI: 10.1371/journal.pone.0175860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2016] [Accepted: 03/31/2017] [Indexed: 11/18/2022]
Abstract
Scientists have unprecedented access to a wide variety of high-quality datasets. These datasets, which are often independently curated, commonly use unstructured spreadsheets to store their data. Standardized annotations are essential to perform synthesis studies across investigators, but are often not used in practice. Therefore, accurately combining records in spreadsheets from differing studies requires tedious and error-prone human curation. These efforts result in a significant time and cost barrier to synthesis research. We propose an information retrieval inspired algorithm, Synthesize, that merges unstructured data automatically based on both column labels and values. Application of the Synthesize algorithm to cancer and ecological datasets had high accuracy (on the order of 85-100%). We further implement Synthesize in an open source web application, Synthesizer (https://github.com/lisagandy/synthesizer). The software accepts input as spreadsheets in comma separated value (CSV) format, visualizes the merged data, and outputs the results as a new spreadsheet. Synthesizer includes an easy to use graphical user interface, which enables the user to finish combining data and obtain perfect accuracy. Future work will allow detection of units to automatically merge continuous data and application of the algorithm to other data formats, including databases.
Affiliation(s)
- Lisa M. Gandy
- Department of Computer Science, Central Michigan University, Mt Pleasant, MI, United States of America
| | - Jordan Gumm
- Department of Computer Science, Central Michigan University, Mt Pleasant, MI, United States of America
| | - Benjamin Fertig
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
| | - Anne Thessen
- Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
| | - Michael J. Kennish
- Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, United States of America
| | - Sameer Chavan
- Colorado Center for Personalized Medicine, University of Colorado Denver, Denver, CO, United States of America
| | - Luigi Marchionni
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
| | - Xiaoxin Xia
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
| | - Shambhavi Shankrit
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
| | - Elana J. Fertig
- Department of Oncology, Johns Hopkins University, Baltimore, MD, United States of America
| |
|
33
|
Hernandez-Ferrer C, Ruiz-Arenas C, Beltran-Gomila A, González JR. MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration. BMC Bioinformatics 2017; 18:36. [PMID: 28095799 PMCID: PMC5240259 DOI: 10.1186/s12859-016-1455-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 06/11/2016] [Accepted: 12/24/2016] [Indexed: 12/01/2022]
Abstract
BACKGROUND Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor's methods and classes implemented in different packages manage individual experiments, there is not a standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data often use basic data structures with no clear general methods, such as subsetting or selecting samples. RESULTS To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. CONCLUSIONS MultiDataSet is a suitable class for data integration under R and Bioconductor framework.
Affiliation(s)
- Carles Hernandez-Ferrer
- Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
| | - Carlos Ruiz-Arenas
- Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
| | - Alba Beltran-Gomila
- Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
| | - Juan R. González
- Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
| |
|
34
|
Walsh CJ, Hu P, Batt J, Dos Santos CC. Discovering MicroRNA-Regulatory Modules in Multi-Dimensional Cancer Genomic Data: A Survey of Computational Methods. Cancer Inform 2016; 15:25-42. [PMID: 27721651 PMCID: PMC5051584 DOI: 10.4137/cin.s39369] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/18/2016] [Revised: 08/14/2016] [Accepted: 08/16/2016] [Indexed: 12/20/2022]
Abstract
MicroRNAs (miRs) are small single-stranded noncoding RNA that function in RNA silencing and post-transcriptional regulation of gene expression. An increasing number of studies have shown that miRs play an important role in tumorigenesis, and understanding the regulatory mechanism of miRs in this gene regulatory network will help elucidate the complex biological processes at play during malignancy. Despite advances, determination of miR–target interactions (MTIs) and identification of functional modules composed of miRs and their specific targets remain a challenge. A large amount of data generated by high-throughput methods from various sources are available to investigate MTIs. The development of data-driven tools to harness these multi-dimensional data has resulted in significant progress over the past decade. In parallel, large-scale cancer genomic projects are allowing new insights into the commonalities and disparities of miR–target regulation across cancers. In the first half of this review, we explore methods for identification of pairwise MTIs, and in the second half, we explore computational tools for discovery of miR-regulatory modules in a cancer-specific and pan-cancer context. We highlight strengths and limitations of each of these tools as a practical guide for the computational biologists.
Affiliation(s)
- Christopher J Walsh
- Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Jane Batt
- Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
- Claudia C Dos Santos
- Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
|
35
|
Microarray-based identification of genes associated with cancer progression and prognosis in hepatocellular carcinoma. J Exp Clin Cancer Res 2016; 35:127. [PMID: 27567667 PMCID: PMC5002170 DOI: 10.1186/s13046-016-0403-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/19/2016] [Accepted: 08/09/2016] [Indexed: 12/13/2022]
Abstract
BACKGROUND Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths. The average survival and 5-year survival rates of HCC patients remain poor. Thus, there is an urgent need to better understand the mechanisms of cancer progression in HCC and to identify useful biomarkers to predict prognosis. METHODS Public data portals including Oncomine, The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) profiles were used to retrieve the HCC-related microarrays and to identify genes potentially contributing to cancer progression. Bioinformatics analyses including pathway enrichment, protein/gene interaction and text mining were used to explain the potential roles of the identified genes in HCC. Quantitative real-time polymerase chain reaction analysis and Western blotting were used to measure the expression of the targets. The data were analysed by SPSS 20.0 software. RESULTS We identified 80 genes that were significantly dysregulated in HCC according to four independent microarrays covering 386 cases of HCC and 327 normal liver tissues. Twenty genes were consistently and stably dysregulated in the four microarrays by at least 2-fold, and detection of gene expression by RT-qPCR and Western blotting showed consistent expression profiles in 11 HCC tissues compared with corresponding paracancerous tissues. Eleven of these 20 genes were associated with disease-free survival (DFS) or overall survival (OS) in a cohort of 157 HCC patients, and eight genes were associated with tumour pathologic PT, tumour stage or vital status. Potential roles of those 20 genes in regulation of HCC progression were predicted, primarily in association with metastasis. INTS8 was specifically correlated with most clinical characteristics including DFS, OS, stage, metastasis, invasiveness, diagnosis, and age.
CONCLUSION The significantly dysregulated genes identified in this study were associated with cancer progression and prognosis in HCC, and might be potential therapeutic targets for HCC treatment or potential biomarkers for diagnosis and prognosis.
|
36
|
Silva TC, Colaprico A, Olsen C, D'Angelo F, Bontempi G, Ceccarelli M, Noushmehr H. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages. F1000Res 2016; 5:1542. [PMID: 28232861 PMCID: PMC5302158 DOI: 10.12688/f1000research.8923.2] [Citation(s) in RCA: 104] [Impact Index Per Article: 11.6] [Accepted: 11/24/2016] [Indexed: 01/09/2023]
Abstract
Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiform or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks.
Affiliation(s)
- Tiago C Silva
- Department of Genetics, Ribeirao Preto Medical School, University of Sao Paulo, Ribeirao Preto, Brazil; Department of Biomedical Sciences, Cedars-Sinai, Los Angeles, CA, USA
- Antonio Colaprico
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium; Machine Learning Group, ULB, Brussels, Belgium
- Catharina Olsen
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium; Machine Learning Group, ULB, Brussels, Belgium
- Fulvio D'Angelo
- Department of Science and Technology, University of Sannio, Benevento, Italy; Biogem, Istituto di Ricerche Genetiche Gaetano Salvatore, Avellino, Italy
- Gianluca Bontempi
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium; Machine Learning Group, ULB, Brussels, Belgium; Department of Science and Technology, University of Sannio, Benevento, Italy
- Houtan Noushmehr
- Department of Genetics, Ribeirao Preto Medical School, University of Sao Paulo, Ribeirao Preto, Brazil; Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, USA
|
37
|
Silva TC, Colaprico A, Olsen C, D'Angelo F, Bontempi G, Ceccarelli M, Noushmehr H. TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages. F1000Res 2016. [PMID: 28232861 DOI: 10.12688/f1000research.8923.1] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Indexed: 01/22/2023]
Abstract
Biotechnological advances in sequencing have led to an explosion of publicly available data via large international consortia such as The Cancer Genome Atlas (TCGA), The Encyclopedia of DNA Elements (ENCODE), and The NIH Roadmap Epigenomics Mapping Consortium (Roadmap). These projects have provided unprecedented opportunities to interrogate the epigenome of cultured cancer cell lines as well as normal and tumor tissues with high genomic resolution. The Bioconductor project offers more than 1,000 open-source software and statistical packages to analyze high-throughput genomic data. However, most packages are designed for specific data types (e.g. expression, epigenetics, genomics) and there is no one comprehensive tool that provides a complete integrative analysis of the resources and data provided by all three public projects. A need to create an integration of these different analyses was recently proposed. In this workflow, we provide a series of biologically focused integrative analyses of different molecular data. We describe how to download, process and prepare TCGA data and by harnessing several key Bioconductor packages, we describe how to extract biologically meaningful genomic and epigenomic data. Using Roadmap and ENCODE data, we provide a work plan to identify biologically relevant functional epigenomic elements associated with cancer. To illustrate our workflow, we analyzed two types of brain tumors: low-grade glioma (LGG) versus high-grade glioma (glioblastoma multiform or GBM). This workflow introduces the following Bioconductor packages: AnnotationHub, ChIPSeeker, ComplexHeatmap, pathview, ELMER, GAIA, MINET, RTCGAToolbox, TCGAbiolinks.
Affiliation(s)
- Tiago C Silva
- Department of Genetics, Ribeirao Preto Medical School, University of Sao Paulo, Ribeirao Preto, Brazil; Department of Biomedical Sciences, Cedars-Sinai, Los Angeles, CA, USA
- Antonio Colaprico
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium; Machine Learning Group, ULB, Brussels, Belgium
- Catharina Olsen
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium; Machine Learning Group, ULB, Brussels, Belgium
- Fulvio D'Angelo
- Department of Science and Technology, University of Sannio, Benevento, Italy; Biogem, Istituto di Ricerche Genetiche Gaetano Salvatore, Avellino, Italy
- Gianluca Bontempi
- Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium; Machine Learning Group, ULB, Brussels, Belgium; Department of Science and Technology, University of Sannio, Benevento, Italy
- Houtan Noushmehr
- Department of Genetics, Ribeirao Preto Medical School, University of Sao Paulo, Ribeirao Preto, Brazil; Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, USA
|
38
|
Chen B, Butte AJ. Leveraging big data to transform target selection and drug discovery. Clin Pharmacol Ther 2016; 99:285-97. [PMID: 26659699 PMCID: PMC4785018 DOI: 10.1002/cpt.318] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Received: 11/12/2015] [Accepted: 12/02/2015] [Indexed: 02/06/2023]
Abstract
The advances of genomics, sequencing, and high throughput technologies have led to the creation of large volumes of diverse datasets for drug discovery. Analyzing these datasets to better understand disease and discover new drugs is becoming more common. Recent open data initiatives in basic and clinical research have dramatically increased the types of data available to the public. The past few years have witnessed successful use of big data in many sectors across the whole drug discovery pipeline. In this review, we will highlight the state of the art in leveraging big data to identify new targets, drug indications, and drug response biomarkers in this era of precision medicine.
Affiliation(s)
- B Chen
- Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, California, USA
- A J Butte
- Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, California, USA
|