1. Cassan O, Lecellier CH, Martin A, Bréhélin L, Lèbre S. Optimizing data integration improves gene regulatory network inference in Arabidopsis thaliana. Bioinformatics 2024; 40:btae415. [PMID: 38913855 PMCID: PMC11227367 DOI: 10.1093/bioinformatics/btae415] [Received: 02/20/2024] [Revised: 06/12/2024] [Accepted: 06/21/2024] [Indexed: 06/26/2024] [Open access]
Abstract
MOTIVATIONS Gene regulatory networks (GRNs) are traditionally inferred from gene expression profiles monitoring a specific condition or treatment. In the last decade, integrative strategies have successfully emerged to guide GRN inference from gene expression with complementary prior data. However, datasets used as prior information and validation gold standards are often related and limited to a subset of genes. This lack of complete and independent evaluation calls for new criteria to robustly estimate the optimal intensity of prior data integration in the inference process. RESULTS We address this issue for two regression-based GRN inference models, a weighted random forest (weightedRF) and a generalized linear model estimated under a weighted LASSO penalty with stability selection (weightedLASSO). These approaches are applied to data from the root response to nitrate induction in Arabidopsis thaliana. For each gene, we measure how the integration of transcription factor binding motifs influences model prediction. We propose a new approach, DIOgene, that uses model prediction error and a simulated null hypothesis to optimize data integration strength in a hypothesis-driven, gene-specific manner. This integration scheme reveals a strong diversity of optimal integration intensities between genes, and offers good performance in minimizing prediction error as well as retrieving experimental interactions. Experimental results show that DIOgene compares favorably against state-of-the-art approaches and enables the recovery of master regulators of nitrate induction. AVAILABILITY AND IMPLEMENTATION The R code and notebooks demonstrating the use of the proposed approaches are available in the repository https://github.com/OceaneCsn/integrative_GRN_N_induction.
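To make the idea of a weighted LASSO penalty concrete, the following is a minimal sketch in Python (the authors' code is in R and is not reproduced here). It minimises (1/2n)·||y − Xb||² + λ·Σ w_j·|b_j| by coordinate descent. The function names, the synthetic data, and in particular the rule `weights = 1 - alpha` for regulators supported by prior data are illustrative assumptions, not the DIOgene procedure or the paper's weighting scheme.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink `value` towards zero by `threshold` (the LASSO proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def weighted_lasso(X, y, penalty_weights, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n  # (1/n) * x_j^T x_j for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Covariance of column j with the partial residual
            # (the residual with column j's own contribution added back).
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_beta = soft_threshold(rho, lam * penalty_weights[j]) / col_scale[j]
            residual += X[:, j] * (beta[j] - new_beta)
            beta[j] = new_beta
    return beta


# Hypothetical toy data: 4 candidate regulators, one target gene.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + 0.25 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Assumed weighting rule: regulators with prior support are penalised less.
alpha = 0.9                                   # integration strength (assumption)
has_prior = np.array([False, True, False, False])
weights = np.where(has_prior, 1.0 - alpha, 1.0)

beta_uniform = weighted_lasso(X, y, np.ones(4), lam=0.5)  # no prior integration
beta_prior = weighted_lasso(X, y, weights, lam=0.5)       # prior-guided penalty
```

In this toy setting the weak regulator in column 1 is dropped under a uniform penalty but retained once its penalty is lowered, which is the kind of effect an integration strength controls.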
Affiliation(s)
- Océane Cassan
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
- Charles-Henri Lecellier
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IGMM, Univ Montpellier, CNRS, Montpellier, 34090, France
- Antoine Martin
  - IPSIM, CNRS, INRAE, Institut Agro, Univ Montpellier, 34060, Montpellier, France
- Sophie Lèbre
  - LIRMM, Univ Montpellier, CNRS, Montpellier, 34095, France
  - IMAG, Univ Montpellier, CNRS, Montpellier, 34090, France
  - Université Paul-Valéry-Montpellier 3, Montpellier, 34090, France

2. Hecker D, Lauber M, Behjati Ardakani F, Ashrafiyan S, Manz Q, Kersting J, Hoffmann M, Schulz MH, List M. Computational tools for inferring transcription factor activity. Proteomics 2023; 23:e2200462. [PMID: 37706624 DOI: 10.1002/pmic.202200462] [Received: 05/17/2023] [Revised: 08/11/2023] [Accepted: 08/22/2023] [Indexed: 09/15/2023]
Abstract
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Thus, there is an ongoing development of computational tools that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. Here, we give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.
Affiliation(s)
- Dennis Hecker
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Michael Lauber
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Fatemeh Behjati Ardakani
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Shamim Ashrafiyan
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Quirin Manz
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johannes Kersting
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - GeneSurge GmbH, München, Germany
- Markus Hoffmann
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Institute for Advanced Study, Technical University of Munich, Garching, Germany
  - National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Marcel H Schulz
  - Goethe University Frankfurt, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
  - Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Markus List
  - Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany

3. Somers J, Fenner M, Kong G, Thirumalaisamy D, Yashar WM, Thapa K, Kinali M, Nikolova O, Babur Ö, Demir E. A framework for considering prior information in network-based approaches to omics data analysis. Proteomics 2023; 23:e2200402. [PMID: 37986684 DOI: 10.1002/pmic.202200402] [Received: 07/19/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 11/22/2023]
Abstract
For decades, molecular biologists have been uncovering the mechanics of biological systems. Efforts to bring their findings together have led to the development of multiple databases and information systems that capture and present pathway information in a computable network format. Concurrently, the advent of modern omics technologies has empowered researchers to systematically profile cellular processes across different modalities. Numerous algorithms, methodologies, and tools have been developed to use prior knowledge networks (PKNs) in the analysis of omics datasets. Interestingly, it has been repeatedly demonstrated that the source of prior knowledge can greatly impact the results of a given analysis. For these methods to be successful it is paramount that their selection of PKNs is amenable to the data type and the computational task they aim to accomplish. Here we present a five-level framework that broadly describes network models in terms of their scope, level of detail, and ability to inform causal predictions. To contextualize this framework, we review a handful of network-based omics analysis methods at each level, while also describing the computational tasks they aim to accomplish.
Affiliation(s)
- Julia Somers
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Madeleine Fenner
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- Garth Kong
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
  - Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
- Dharani Thirumalaisamy
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
- William M Yashar
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
  - Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
- Kisan Thapa
  - Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
- Meric Kinali
  - Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
- Olga Nikolova
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA
  - Division of Oncological Sciences, Oregon Health and Science University, Portland, Oregon, USA
- Özgün Babur
  - Computer Science Department, University of Massachusetts Boston, College of Science and Mathematics, Boston, Massachusetts, USA
- Emek Demir
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, Oregon, USA

4. Li X, Lappalainen T, Bussemaker HJ. Identifying genetic regulatory variants that affect transcription factor activity. Cell Genomics 2023; 3:100382. [PMID: 37719147 PMCID: PMC10504674 DOI: 10.1016/j.xgen.2023.100382] [Received: 10/20/2022] [Revised: 05/19/2023] [Accepted: 07/21/2023] [Indexed: 09/19/2023]
Abstract
Genetic variants affecting gene expression levels in humans have been mapped in the Genotype-Tissue Expression (GTEx) project. Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.
Affiliation(s)
- Xiaoting Li
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Tuuli Lappalainen
  - New York Genome Center, New York, NY 10013, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Harmen J. Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA

5. Huang X, Yu J, Lai S, Li Z, Qu F, Fu X, Li Q, Zhong X, Zhang D, Li H. Long Non-Coding RNA LINC00052 Targets miR-548p/Notch2/Pyk2 to Modulate Tumor Budding and Metastasis of Human Breast Cancer. Biochem Genet 2023; 61:336-353. [PMID: 35918619 DOI: 10.1007/s10528-022-10255-y] [Received: 01/09/2022] [Accepted: 06/22/2022] [Indexed: 01/24/2023]
Abstract
Abnormal expression of long non-coding RNAs (lncRNAs) is involved in many pathological processes of cancers. However, the role of lncRNA LINC00052 in breast cancer progression is still unclear. Here, LINC00052 expression was detected by in situ hybridization and quantitative real-time PCR assays. Cell Counting Kit-8, wound healing, and transwell assays were used to investigate changes in the proliferation, migration, and invasion of breast cancer cells. MiR-548p was found associated with LINC00052 or Notch2 by RNA pull-down, dual-luciferase reporter, and qRT-PCR assays. The effect of LINC00052 on lung metastasis was explored through in vivo experiments. High LINC00052 expression was observed in breast cancer tissues and cells. LINC00052 silencing inhibited the proliferation, migration, and invasion of MCF7 cells, and LINC00052 overexpression produced the opposite results. MiR-548p, a target gene of LINC00052, partially rescued the effects of LINC00052 on proliferation, migration, and invasion of MCF7. Notch2 was the target of miR-548p and LINC00052 could promote Notch2 expression. Moreover, the phosphorylation of proline-rich tyrosine kinase 2 (Pyk2), a downstream factor of Notch2, was increased by LINC00052, and a Pyk2 mutant could inhibit the cell migration and invasion induced by LINC00052 overexpression in MDA-MB-468 cells, which was similar to the function of the miR-548p mimic. We further demonstrated that LINC00052 exacerbated the metastases of breast cancer cells in vivo. Our research demonstrated that LINC00052 is highly expressed in breast cancer and promotes breast cancer proliferation, migration, and invasion via the miR-548p/Notch2/Pyk2 axis. LINC00052 could serve as a potential therapeutic target for breast cancer.
Affiliation(s)
- Xiaojia Huang
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Junli Yu
  - Department of Medical Ultrasound, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510655, Guangdong, China
- Shengqing Lai
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Zongyan Li
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Fanli Qu
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Xiaoyan Fu
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Qian Li
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Xiaofang Zhong
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China
- Dawei Zhang
  - Department of Pancreatic Hepatobiliary Surgery, The Sixth Affiliated Hospital of Sun Yat-Sen University, Guangzhou, 510655, Guangdong, China
- Haiyan Li
  - Department of Breast Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital of Sun Yat-Sen University, No. 26 Erheng Road, Yuancun, Tianhe District, Guangzhou, 510655, Guangdong, China

6. Pačínková A, Popovici V. Using empirical biological knowledge to infer regulatory networks from multi-omics data. BMC Bioinformatics 2022; 23:351. [PMID: 35996085 PMCID: PMC9396869 DOI: 10.1186/s12859-022-04891-9] [Received: 01/24/2022] [Accepted: 08/08/2022] [Indexed: 12/13/2022] [Open access]
Abstract
Background Integration of multi-omics data can provide a more complex view of the biological system consisting of different interconnected molecular components, the crucial aspect for developing novel personalised therapeutic strategies for complex diseases. Various tools have been developed to integrate multi-omics data. However, an efficient multi-omics framework for regulatory network inference at the genome level that incorporates prior knowledge is still to emerge. Results We present IntOMICS, an efficient integrative framework based on Bayesian networks. IntOMICS systematically analyses gene expression, DNA methylation, copy number variation and biological prior knowledge to infer regulatory networks. IntOMICS complements the missing biological prior knowledge by so-called empirical biological knowledge, estimated from the available experimental data. Regulatory networks derived from IntOMICS provide deeper insights into the complex flow of genetic information on top of the increasing accuracy trend compared to a published algorithm designed exclusively for gene expression data. The ability to capture relevant crosstalks between multi-omics modalities is verified using known associations in microsatellite stable/instable colon cancer samples. Additionally, IntOMICS performance is compared with two algorithms for multi-omics regulatory network inference that can also incorporate prior knowledge in the inference framework. IntOMICS is also applied to detect potential predictive biomarkers in microsatellite stable stage III colon cancer samples. Conclusions We provide IntOMICS, a framework for multi-omics data integration using a novel approach to biological knowledge discovery. IntOMICS is a powerful resource for exploratory systems biology and can provide valuable insights into the complex mechanisms of biological processes that have a vital role in personalised medicine. 
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04891-9.
Affiliation(s)
- Anna Pačínková
  - RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
  - Faculty of Informatics, Masaryk University, Botanicka 68a, Brno, Czech Republic
- Vlad Popovici
  - RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic

7. Inference of Molecular Regulatory Systems Using Statistical Path-Consistency Algorithm. Entropy 2022; 24:e24050693. [PMID: 35626576 PMCID: PMC9142129 DOI: 10.3390/e24050693] [Received: 03/31/2022] [Revised: 05/12/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022]
Abstract
One of the key challenges in systems biology and molecular sciences is how to infer regulatory relationships between genes and proteins using high-throughput omics datasets. Although a wide range of methods have been designed to reverse engineer the regulatory networks, recent studies show that the inferred network may depend on the variable order in the dataset. In this work, we develop a new algorithm, called the statistical path-consistency algorithm (SPCA), to solve the problem of the dependence on variable order. This method generates a number of different variable orders using random samples, and then infers a network by using the path-consistency algorithm based on each variable order. We propose measures to determine the edge weights using the corresponding edge weights in the inferred networks, and choose the edges with the largest weights as the putative regulations between genes or proteins. The developed method is rigorously assessed using the six benchmark networks in DREAM challenges, the mitogen-activated protein (MAP) kinase pathway, and a cancer-specific gene regulatory network. The inferred networks are compared with those obtained by using two up-to-date inference methods. The accuracy of the inferred networks shows that the developed method is effective for discovering molecular regulatory systems.

8. Zhang JZ, Xu W, Hu P. Tightly Integrated Multiomics-based Deep Tensor Survival Model for Time-to-Event Prediction. Bioinformatics 2022; 38:3259-3266. [PMID: 35445698 DOI: 10.1093/bioinformatics/btac286] [Received: 08/04/2021] [Revised: 03/12/2022] [Accepted: 04/18/2022] [Indexed: 11/12/2022] [Open access]
Abstract
MOTIVATION Multiomics cancer profiles provide essential signals for predicting cancer survival. It is challenging to reveal the complex patterns from multiple types of data and link them to survival outcomes. We aim to develop a new deep learning-based algorithm to integrate three types of high-dimensional omics data measured on the same individuals to improve cancer survival outcome prediction. RESULTS We built a three-dimensional tensor to integrate multi-omics cancer data and factorized it into two-dimensional matrices of latent factors, which were fed into neural network-based survival networks. The new algorithm and other multi-omics-based algorithms, as well as individual genomic-based survival analysis algorithms, were applied to the breast cancer data and the colon and rectal cancer data from The Cancer Genome Atlas (TCGA) program. We evaluated the goodness-of-fit using the concordance index (C-index) and Integrated Brier Score (IBS). We demonstrated that the proposed tight integration framework has better survival prediction performance than the models using individual genomic data and other conventional data integration methods. AVAILABILITY https://github.com/jasperzyzhang/DeepTensorSurvival. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
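As an illustration of the concordance index named in the abstract, here is a minimal Python sketch of Harrell's C-index for right-censored data. It is a generic textbook formulation, not code from the authors' repository; the function name and the toy values are hypothetical. A pair of subjects is treated as comparable when the one with the shorter time had an observed event, and ties in predicted risk count as half-concordant.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher predicted risk
    belongs to the subject whose event occurred first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable pair: subject i has an observed event strictly
            # before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: C-index is undefined")
    return concordant / comparable


# Hypothetical toy cohort: follow-up times, event indicators (1 = event,
# 0 = censored) and model-predicted risk scores.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of subjects by risk; the quadratic pair loop is kept for readability rather than speed.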
Affiliation(s)
- Jasper Zhongyuan Zhang
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
- Wei Xu
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
  - Biostatistics Department, Princess Margaret Cancer Centre, Toronto, Ontario M5G 2M9, Canada
- Pingzhao Hu
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario M5T 3M7, Canada
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, R3E 0J9, Canada
  - CancerCare Manitoba Research Institute, CancerCare Manitoba, Winnipeg, Manitoba, R3E 0V9, Canada
  - Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, R3T 2N2, Canada

9. Computing microRNA-gene interaction networks in pan-cancer using miRDriver. Sci Rep 2022; 12:3717. [PMID: 35260634 PMCID: PMC8904490 DOI: 10.1038/s41598-022-07628-z] [Received: 11/22/2021] [Accepted: 02/18/2022] [Indexed: 11/13/2022] [Open access]
Abstract
DNA copy number aberrated regions in cancer are known to harbor cancer driver genes and short non-coding RNA molecules, i.e., microRNAs. In this study, we integrated multi-omics datasets such as copy number aberration, DNA methylation, gene and microRNA expression to identify the signature microRNA-gene associations from frequently aberrated DNA regions across pan-cancer utilizing a LASSO-based regression approach. We studied 7294 patient samples associated with eighteen different cancer types from The Cancer Genome Atlas (TCGA) database and identified several cancer-specific and common microRNA-gene interactions enriched in experimentally validated microRNA-target interactions. We highlighted several oncogenic and tumor suppressor microRNAs that were cancer-specific and common in several cancer types. Our method substantially outperformed five state-of-the-art methods in selecting significantly known microRNA-gene interactions in multiple cancer types. Several microRNAs and genes were found to be associated with tumor survival and progression. Selected target genes were found to be significantly enriched in cancer-related pathways, cancer hallmark and Gene Ontology (GO) terms. Furthermore, subtype-specific potential gene signatures were discovered in multiple cancer types.

10. Cai M, Chen N. The Roles of IRF-8 in Regulating IL-9-Mediated Immunologic Mechanisms in the Development of DLBCL: A State-of-the-Art Literature Review. Front Oncol 2022; 12:817069. [PMID: 35211408 PMCID: PMC8860898 DOI: 10.3389/fonc.2022.817069] [Received: 11/17/2021] [Accepted: 01/18/2022] [Indexed: 01/05/2023] [Open access]
Abstract
Interferon regulatory factor 8 (IRF-8) is a transcription suppressor that functions through associations with other transcription factors, contributing to the growth and differentiation of bone marrow cells and the activation of macrophages. IRF-8 expression profoundly affects pathogenic processes ranging from infections to blood diseases. Interleukin-9 (IL-9) is a multipotent cytokine that acts on a variety of immune cells by binding to the IL-9 receptor (IL-9R) and is involved in a variety of diseases such as cancer, autoimmune diseases, and other pathogen-mediated immune regulatory diseases. Studies have shown that IL-9 levels are significantly increased in the serum of patients with diffuse large B-cell lymphoma (DLBCL), and IL-9 levels are correlated with the DLBCL prognostic index. The activator protein-1 (AP-1) complex is a dimeric transcription factor that plays a critical role in cellular proliferation, apoptosis, angiogenesis, oncogene-induced transformation, and invasion by controlling basic and induced transcription of several genes containing the AP-1 locus. The AP-1 complex is involved in many cancers, including hematological tumors. In this report, we systematically review the precise roles of IL-9, IRF-8, and AP-1 in tumor development, particularly with regard to DLBCL. Finally, the recent progress in IRF-8 and IL-9 research is presented; the possible relationship among IRF-8, IL-9, and AP-1 family members is analyzed; and future research prospects are discussed.
Affiliation(s)
- Mingyue Cai
  - Provincial Hospital Affiliated to Shandong First Medical University, Department of Hematology, Jinan, China
- Na Chen
  - Provincial Hospital Affiliated to Shandong First Medical University, Department of Hematology, Jinan, China
  - School of Medicine, Shandong University, Jinan, China

11. Ma CZ, Brent MR. Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data. Bioinformatics 2021; 37:1234-1245. [PMID: 33135076 PMCID: PMC8189679 DOI: 10.1093/bioinformatics/btaa947] [Received: 07/03/2020] [Revised: 09/26/2020] [Accepted: 10/27/2020] [Indexed: 12/20/2022] [Open access]
Abstract
Motivation The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now. Results We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2. Availability and implementation Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573. Supplementary information Supplementary data are available at Bioinformatics online.
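The factorization described in the abstract, expression ≈ control strengths × TF activities, can be sketched with a small numerical example. The snippet below is not the authors' evaluation code. It assumes, for illustration only, that the condition-independent control-strength matrix is already known and estimates the condition-dependent activity matrix by ordinary least squares; all names, matrix sizes, and the synthetic data are hypothetical.

```python
import numpy as np


def estimate_tf_activity(control_strength, expression):
    """Solve expression ~= control_strength @ activity for activity.

    control_strength: genes x TFs (condition-independent)
    expression:       genes x conditions
    returns:          TFs x conditions (condition-dependent activities)
    """
    activity, *_ = np.linalg.lstsq(control_strength, expression, rcond=None)
    return activity


# Synthetic example: 50 genes, 3 TFs, 8 conditions.
rng = np.random.default_rng(1)
n_genes, n_tfs, n_conditions = 50, 3, 8

# Sparse control strengths: each TF regulates roughly 40% of the genes.
mask = rng.random((n_genes, n_tfs)) < 0.4
control_strength = rng.normal(size=(n_genes, n_tfs)) * mask

true_activity = rng.normal(size=(n_tfs, n_conditions))
noise = rng.normal(scale=0.05, size=(n_genes, n_conditions))
expression = control_strength @ true_activity + noise

estimated_activity = estimate_tf_activity(control_strength, expression)
```

In practice both factors are unknown and must be inferred jointly under constraints, which is the harder problem the abstract addresses; fixing one factor here only shows how the two matrices relate to the expression data.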
Affiliation(s)
- Cynthia Z Ma
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
  - Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA

12. Mahmoodi SH, Aghdam R, Eslahchi C. An order independent algorithm for inferring gene regulatory network using quantile value for conditional independence tests. Sci Rep 2021; 11:7605. [PMID: 33828122 PMCID: PMC8027014 DOI: 10.1038/s41598-021-87074-5] [Received: 06/24/2020] [Accepted: 03/24/2021] [Indexed: 10/31/2022] [Open access]
Abstract
In recent years, due to the difficulty and inefficiency of experimental methods, numerous computational methods have been introduced for inferring the structure of Gene Regulatory Networks (GRNs). The Path Consistency (PC) algorithm is one of the popular methods to infer the structure of GRNs. However, this group of methods still has limitations and there is potential for improvement in this field. First, the PC-based algorithms are still sensitive to the ordering of nodes, i.e. different node orders result in different network structures. Second, the networks inferred by these methods are highly dependent on the threshold used for independence testing. Also, it is still a challenge to select the set of conditional genes in an optimal way, which affects the performance and computational complexity of the PC-based algorithm. We introduce a novel algorithm, namely Order Independent PC-based algorithm using Quantile value (OIPCQ), which improves the accuracy of the learning process of GRNs and solves the order dependency issue. The quantile-based thresholds are considered for different orders of CMI tests. For conditional gene selection, we consider the paths between genes with length equal to or greater than 2, while other well-known PC-based methods only consider the paths of length 2. We applied OIPCQ to the various networks of the DREAM3 and DREAM4 in silico challenges. As a real-world case study, we used OIPCQ to reconstruct the SOS DNA network obtained from Escherichia coli and a GRN for acute myeloid leukemia based on the RNA sequencing data from The Cancer Genome Atlas. The results show that OIPCQ produces the same network structure for all the permutations of the genes and improves the resulting GRN through accurately quantifying the causal regulation strength in comparison with other well-known PC-based methods.
According to the GRN constructed by OIPCQ for acute myeloid leukemia, two previously reported regulators, BCLAF1 and NRSF, are significantly important. However, the highest degree nodes in this GRN are ZBTB7A and PU1, which play a significant role in cancer, especially in leukemia. OIPCQ is freely accessible at https://github.com/haammim/OIPCQ-and-OIPCQ2.
Affiliation(s)
- Sayyed Hadi Mahmoodi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
- Rosa Aghdam
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Changiz Eslahchi
  - Department of Computer and Data Sciences, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
|
13
|
Scherer M, Schmidt F, Lazareva O, Walter J, Baumbach J, Schulz MH, List M. Machine learning for deciphering cell heterogeneity and gene regulation. Nat Comput Sci 2021; 1:183-191. [PMID: 38183187 DOI: 10.1038/s43588-021-00038-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/15/2020] [Accepted: 02/08/2021] [Indexed: 12/14/2022]
Abstract
Epigenetics studies inheritable and reversible modifications of DNA that allow cells to control gene expression throughout their development and in response to environmental conditions. In computational epigenomics, machine learning is applied to study various epigenetic mechanisms genome wide. Its aim is to expand our understanding of cell differentiation, that is their specialization, in health and disease. Thus far, most efforts focus on understanding the functional encoding of the genome and on unraveling cell-type heterogeneity. Here, we provide an overview of state-of-the-art computational methods and their underlying statistical concepts, which range from matrix factorization and regularized linear regression to deep learning methods. We further show how the rise of single-cell technology leads to new computational challenges and creates opportunities to further our understanding of epigenetic regulation.
Affiliation(s)
- Michael Scherer
  - Department of Genetics/Epigenetics, Saarland University, Saarbrücken, Germany
  - Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
  - Graduate School of Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
- Olga Lazareva
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Jörn Walter
  - Computational Biology Group, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
- Jan Baumbach
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
  - Computational BioMedicine Lab, Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
  - Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Marcel H Schulz
  - Institute of Cardiovascular Regeneration, University Hospital and Goethe University Frankfurt, Frankfurt, Germany
- Markus List
  - Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
|
14
|
Zeng W, Wang Y, Jiang R. Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network. Bioinformatics 2020; 36:496-503. [PMID: 31318408 DOI: 10.1093/bioinformatics/btz562] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/12/2018] [Revised: 05/19/2019] [Accepted: 07/16/2019] [Indexed: 01/05/2023] Open
Abstract
MOTIVATION Interactions among cis-regulatory elements such as enhancers and promoters are the main driving forces shaping context-specific chromatin structure and gene expression. Although computational methods exist for predicting gene expression from genomic and epigenomic information, most of them neglect long-range enhancer-promoter interactions because of the difficulty of precisely linking regulatory enhancers to target genes. Recently, HiChIP, a novel high-throughput experimental approach, has generated comprehensive data on high-resolution interactions between promoters and distal enhancers. Moreover, many studies suggest that deep learning achieves state-of-the-art performance in epigenomic signal prediction, thereby promoting the understanding of regulatory elements. In consideration of these two factors, we integrate proximal promoter sequences and HiChIP distal enhancer-promoter interactions to accurately predict gene expression. RESULTS We propose DeepExpression, a densely connected convolutional neural network, to predict gene expression using both promoter sequences and enhancer-promoter interactions. We demonstrate that our model consistently outperforms baseline methods, not only in the classification of binary gene expression status but also in the regression of continuous gene expression levels, in both cross-validation experiments and cross-cell line predictions. We show that the promoter sequence information is more informative than the experimental enhancer information; meanwhile, the enhancer-promoter interactions within ±100 kbp around the TSS of a gene are the most beneficial. Finally, we visualize motifs in both promoter and enhancer regions and show that the identified sequence signatures match known motifs. We expect to see a wide spectrum of applications using HiChIP data in deciphering the mechanism of gene regulation.
AVAILABILITY AND IMPLEMENTATION DeepExpression is freely available at https://github.com/wanwenzeng/DeepExpression. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wanwen Zeng
  - MOE Key Laboratory of Bioinformatics, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China
- Yong Wang
  - CEMS, NCMIS, MDIS, Academy of Mathematics and Systems Science, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100080, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Rui Jiang
  - MOE Key Laboratory of Bioinformatics, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China
|
15
|
Liu Y, Shi N, Regev A, He S, Hemann MT. Integrated regulatory models for inference of subtype-specific susceptibilities in glioblastoma. Mol Syst Biol 2020; 16:e9506. [PMID: 32974985 PMCID: PMC7516378 DOI: 10.15252/msb.20209506] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/21/2020] [Revised: 08/25/2020] [Accepted: 08/27/2020] [Indexed: 12/15/2022] Open
Abstract
Glioblastoma multiforme (GBM) is a highly malignant form of cancer that lacks effective treatment options or well-defined strategies for personalized cancer therapy. The disease has been stratified into distinct molecular subtypes; however, the underlying regulatory circuitry that gives rise to such heterogeneity and its implications for therapy remain unclear. We developed a modular computational pipeline, Integrative Modeling of Transcription Regulatory Interactions for Systematic Inference of Susceptibility in Cancer (inTRINSiC), to dissect subtype-specific regulatory programs and predict genetic dependencies in individual patient tumors. Using a multilayer network consisting of 518 transcription factors (TFs), 10,733 target genes, and a signaling layer of 3,132 proteins, we were able to accurately identify differential regulatory activity of TFs that shape subtype-specific expression landscapes. Our models also allowed inference of mechanisms for altered TF behavior in different GBM subtypes. Most importantly, we were able to use the multilayer models to perform an in silico perturbation analysis to infer differential genetic vulnerabilities across GBM subtypes and pinpoint the MYB family member MYBL2 as a drug target specific for the Proneural subtype.
Affiliation(s)
- Yunpeng Liu
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - MIT Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ning Shi
  - School of Computer Science, University of Birmingham, Birmingham, UK
- Aviv Regev
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - MIT Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Shan He
  - School of Computer Science, University of Birmingham, Birmingham, UK
- Michael T Hemann
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - MIT Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
16
|
Cui J, Shu J. Circulating microRNA trafficking and regulation: computational principles and practice. Brief Bioinform 2020; 21:1313-1326. [PMID: 31504144 PMCID: PMC7412956 DOI: 10.1093/bib/bbz079] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/18/2019] [Revised: 06/07/2019] [Accepted: 06/07/2019] [Indexed: 01/18/2023] Open
Abstract
Rapid advances in genomics discovery tools and a growing realization of microRNA's implication in intercellular communication have led to a proliferation of studies of circulating microRNA sorting and regulation across cells and different species. Although they have sometimes reached controversial conclusions, these studies have yielded new insights into the functional roles of circulating microRNA and a plethora of analytical methods and tools. Here, we consider this body of work in light of key computational principles underpinning the discovery of circulating microRNAs in terms of their sorting and targeting, with the goal of providing practical guidance for applications focused on the design and analysis of circulating microRNAs and their context-dependent regulation. We survey a broad range of informatics methods and tools that are available to the researcher, discuss their key features, applications and various unsolved problems, and close this review with the prospects and broader implications of this field.
Affiliation(s)
- Juan Cui
  - Systems Biology and Biomedical Informatics Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
- Jiang Shu
  - Systems Biology and Biomedical Informatics Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
|
17
|
Wang H, Lin SY, Hu FF, Guo AY, Hu H. The expression and regulation of HOX genes and membrane proteins among different cytogenetic groups of acute myeloid leukemia. Mol Genet Genomic Med 2020; 8:e1365. [PMID: 32614525 PMCID: PMC7507697 DOI: 10.1002/mgg3.1365] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/14/2019] [Revised: 05/21/2020] [Accepted: 05/22/2020] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Cytogenetic aberrations are considered markers for diagnosis and prognosis in acute myeloid leukemia (AML), while gene expression and regulation in the different cytogenetic groups remain to be fully elucidated. METHODS In this paper, for favorable, poor, and cytogenetically normal groups of AML patients, we performed comprehensive bioinformatics analyses, including the identification of differentially expressed genes (DEGs) and microRNAs (miRNAs) among the groups, functional enrichment, and regulatory network analysis. RESULTS We found that DEGs were enriched in membrane-related processes. Eleven genes and two miRNAs were significantly differentially expressed among the three AML groups. In survival analysis, membrane-related genes and several miRNAs were significantly associated with prognostic outcome. Notably, six HOXA and three HOXB genes showed significantly low expression and high methylation in AML with favorable cytogenetics. Meanwhile, the miRNA-HOX gene co-regulatory networks revealed that HOXA5 was a hub node and regulated the AML oncogene SPARC. CONCLUSION Our work may provide novel insights into the molecular characteristics and classification of AML with different cytogenetics.
Affiliation(s)
- Huili Wang
  - Department of Environmental Engineering, Wenhua College, Wuhan, China
- Sheng-Yan Lin
  - Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Fei-Fei Hu
  - Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- An-Yuan Guo
  - Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
- Hui Hu
  - Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
|
18
|
Poos AM, Kordaß T, Kolte A, Ast V, Oswald M, Rippe K, König R. Modelling TERT regulation across 19 different cancer types based on the MIPRIP 2.0 gene regulatory network approach. BMC Bioinformatics 2019; 20:737. [PMID: 31888467 PMCID: PMC6937852 DOI: 10.1186/s12859-019-3323-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/12/2019] [Accepted: 12/16/2019] [Indexed: 01/15/2023] Open
Abstract
Background Reactivation of the telomerase reverse transcriptase gene TERT is a central feature for unlimited proliferation of the majority of cancers. However, the underlying regulatory processes are only partly understood. Results We assembled regulator binding information from several sources to construct a generic human and mouse gene regulatory network. Advancing our “Mixed Integer linear Programming based Regulatory Interaction Predictor” (MIPRIP) approach, we identified the most common and cancer-type specific regulators of TERT across 19 different human cancers. The results were validated by using the well-known TERT regulation by the ETS1 transcription factor in a subset of melanomas with mutations in the TERT promoter. Our improved MIPRIP2 R-package and the associated generic regulatory networks are freely available at https://github.com/KoenigLabNM/MIPRIP. Conclusion MIPRIP 2.0 identified common as well as tumor type specific regulators of TERT. The software can be easily applied to transcriptome datasets to predict gene regulation for any gene and disease/condition under investigation.
Affiliation(s)
- Alexandra M Poos
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
  - Division of Chromatin Networks, German Cancer Research Center (DKFZ) and Bioquant, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Theresa Kordaß
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
  - Research Group GMP & T Cell Therapy, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120, Heidelberg, Germany
- Amol Kolte
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
- Volker Ast
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
- Marcus Oswald
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
- Karsten Rippe
  - Division of Chromatin Networks, German Cancer Research Center (DKFZ) and Bioquant, Im Neuenheimer Feld 267, 69120, Heidelberg, Germany
- Rainer König
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747, Jena, Germany
|
19
|
Karakülah G, Arslan N, Yandım C, Suner A. TEffectR: an R package for studying the potential effects of transposable elements on gene expression with linear regression model. PeerJ 2019; 7:e8192. [PMID: 31824778 PMCID: PMC6899341 DOI: 10.7717/peerj.8192] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/15/2019] [Accepted: 11/11/2019] [Indexed: 01/24/2023] Open
Abstract
Introduction Recent studies highlight the crucial regulatory roles of transposable elements (TEs) on proximal gene expression in distinct biological contexts such as disease and development. However, computational tools for extracting potential associations between TEs and proximal gene expression from RNA-sequencing data are still missing. Implementation Herein, we developed a novel R package, using a linear regression model, for studying the potential influence of TE species on proximal gene expression from a given RNA-sequencing data set. Our R package, TEffectR, makes use of publicly available RepeatMasker TE and Ensembl gene annotations as well as several functions of other R packages. It calculates total read counts of TEs from sorted and indexed genome-aligned BAM files provided by the user, and determines statistically significant relations between TE expression and the transcription of nearby genes under diverse biological conditions. Availability TEffectR is freely available at https://github.com/karakulahg/TEffectR along with a handy tutorial, exemplified by the analysis of RNA-sequencing data including normal and tumour tissue specimens obtained from breast cancer patients.
Affiliation(s)
- Gökhan Karakülah
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
- Cihangir Yandım
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Department of Genetics and Bioengineering, Faculty of Engineering, Izmir University of Economics, Izmir, Turkey
- Aslı Suner
  - Department of Biostatistics and Medical Informatics, Faculty of Medicine, Ege University, Izmir, Turkey
|
20
|
Wilk G, Braun R. Integrative analysis reveals disrupted pathways regulated by microRNAs in cancer. Nucleic Acids Res 2019; 46:1089-1101. [PMID: 29294105 PMCID: PMC5814839 DOI: 10.1093/nar/gkx1250] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2017] [Accepted: 12/01/2017] [Indexed: 02/06/2023] Open
Abstract
MicroRNAs (miRNAs) are small endogenous regulatory molecules that modulate gene expression post-transcriptionally. Although differential expression of miRNAs has been implicated in many diseases (including cancers), the underlying mechanisms of action remain unclear. Because each miRNA can target multiple genes, miRNAs may potentially have functional implications for the overall behavior of entire pathways. Here, we investigate the functional consequences of miRNA dysregulation through an integrative analysis of miRNA and mRNA expression data using a novel approach that incorporates pathway information a priori. By searching for miRNA-pathway associations that differ between healthy and tumor tissue, we identify specific relationships at the systems level which are disrupted in cancer. Our approach is motivated by the hypothesis that if an miRNA and pathway are associated, then the expression of the miRNA and the collective behavior of the genes in a pathway will be correlated. As such, we first obtain an expression-based summary of pathway activity using Isomap, a dimension reduction method which can articulate non-linear structure in high-dimensional data. We then search for miRNAs that exhibit differential correlations with the pathway summary between phenotypes as a means of finding aberrant miRNA-pathway coregulation in tumors. We apply our method to cancer data using gene and miRNA expression datasets from The Cancer Genome Atlas and compare ∼10^5 miRNA-pathway relationships between healthy and tumor samples from four tissues (breast, prostate, lung and liver). Many of the flagged pairs we identify have a biological basis for disruption in cancer.
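The differential-correlation step described in this abstract can be illustrated with a minimal sketch. It assumes that a one-dimensional pathway summary has already been computed for each sample (for instance by Isomap) and compares the miRNA-summary correlation between two phenotypes with a Fisher z-test. The test statistic, the function name `differential_correlation`, and the simulated data are illustrative assumptions, not the authors' implementation.

```python
import math

import numpy as np


def differential_correlation(mirna_a, summary_a, mirna_b, summary_b):
    """Compare miRNA-pathway correlation between two phenotypes.

    Assumption: a Fisher z-test on Pearson correlations; the abstract does
    not state which test the authors used.
    Returns (r_a, r_b, z, two_sided_p).
    """
    r_a = np.corrcoef(mirna_a, summary_a)[0, 1]
    r_b = np.corrcoef(mirna_b, summary_b)[0, 1]
    # Clip so that arctanh stays finite for near-perfect correlations.
    r_a, r_b = np.clip([r_a, r_b], -0.999999, 0.999999)
    n_a, n_b = len(mirna_a), len(mirna_b)
    se = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (np.arctanh(r_a) - np.arctanh(r_b)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return float(r_a), float(r_b), float(z), float(p)


# Simulated data (hypothetical): the miRNA tracks the pathway summary in
# healthy samples but is unrelated to it in tumor samples.
rng = np.random.default_rng(0)
n = 50
mirna_healthy = rng.normal(size=n)
summary_healthy = mirna_healthy + 0.1 * rng.normal(size=n)
mirna_tumor = rng.normal(size=n)
summary_tumor = rng.normal(size=n)

result = differential_correlation(
    mirna_healthy, summary_healthy, mirna_tumor, summary_tumor
)
print(result)
```

Because the sign of a low-dimensional embedding coordinate is arbitrary, a real analysis would need to handle sign consistency between phenotypes; this sketch sidesteps that by taking the summaries as given.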
Affiliation(s)
- Gary Wilk
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
- Rosemary Braun
  - Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
  - Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, IL 60208, USA
|
21
|
Estimation of Transcription Factor Activity in Knockdown Studies. Sci Rep 2019; 9:9593. [PMID: 31270369 PMCID: PMC6610105 DOI: 10.1038/s41598-019-46053-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/26/2018] [Accepted: 06/20/2019] [Indexed: 11/24/2022] Open
Abstract
Numerous methods have been developed to infer actual regulatory events in a sample. A prominent class of methods models genome-wide gene expression as linear equations derived from a transcription factor (TF)-gene network and optimizes parameters to fit the measured expression intensities. We apply four such methods to experiments with a TF knockdown (KD) in human and E. coli. The transcriptome data provide clear expression signals and thus represent an extremely favorable test setting. The methods estimate activity changes of all TFs, which we expect to be highest for the KD TF. However, the KD TF ranked in the top 5% in only 15 out of 54 cases. We show that this poor overall performance cannot be attributed to a low effectiveness of the knockdown or to the specific regulatory network provided as background knowledge. Further, the ranks of regulators related to the KD TF by the network or pathway are not significantly different from a random selection. In general, the result overlaps between different methods are small, indicating that they draw very different conclusions when presented with the same, presumably simple, inference problem. These results show that the investigated methods cannot yield robust TF activity estimates in knockdown schemes.
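The class of methods evaluated in this abstract can be sketched in a few lines. The sketch assumes the simplest possible variant: expression changes are modeled as a TF-gene network matrix times a vector of TF activity changes, the activities are fitted by ordinary least squares, and TFs are ranked by absolute activity change. The toy network, the function names, and the use of unregularized least squares are illustrative assumptions and do not correspond to any of the four methods tested in the paper.

```python
import numpy as np


def estimate_tf_activity(network, expression_change):
    """Fit expression_change ~ network @ activity by ordinary least squares.

    network: (genes x TFs) matrix, 1 where the TF regulates the gene.
    expression_change: per-gene expression change (KD versus control).
    """
    activity, *_ = np.linalg.lstsq(network, expression_change, rcond=None)
    return activity


def rank_of_tf(activity, tf_index):
    """1-based rank of a TF when sorting by absolute activity change."""
    order = np.argsort(-np.abs(activity))
    return int(np.where(order == tf_index)[0][0]) + 1


# Toy network (hypothetical): 6 genes, 3 TFs; TF 0 is the knocked-down TF.
network = np.array(
    [
        [1, 0, 0],
        [1, 1, 0],
        [0, 1, 0],
        [0, 1, 1],
        [0, 0, 1],
        [1, 0, 1],
    ],
    dtype=float,
)
true_activity = np.array([-2.0, 0.1, 0.3])
expression_change = network @ true_activity  # noise-free for illustration

estimated = estimate_tf_activity(network, expression_change)
kd_rank = rank_of_tf(estimated, tf_index=0)
print(estimated, kd_rank)
```

In this noise-free toy setting the knocked-down TF is recovered at rank 1; the abstract's point is that on real data this expectation often fails.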
|
22
|
Huang S, Xu W, Hu P, Lakowski TM. Integrative Analysis Reveals Subtype-Specific Regulatory Determinants in Triple Negative Breast Cancer. Cancers (Basel) 2019; 11:cancers11040507. [PMID: 30974831 PMCID: PMC6521146 DOI: 10.3390/cancers11040507] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/07/2019] [Revised: 04/02/2019] [Accepted: 04/02/2019] [Indexed: 12/20/2022] Open
Abstract
Different breast cancer (BC) subtypes have unique gene expression patterns, but their regulatory mechanisms have yet to be fully elucidated. We hypothesized that the top upregulated (Yin) and downregulated (Yang) genes determine the fate of cancer cells. To reveal the regulatory determinants of these Yin and Yang genes in different BC subtypes, we developed a lasso regression model integrating DNA methylation (DM), copy number variation (CNV) and microRNA (miRNA) expression of 391 BC patients, coupled with miRNA–target interactions and transcription factor (TF) binding sites. A total of 25, 20, 15 and 24 key regulators were identified for the luminal A, luminal B, Her2-enriched, and triple negative (TN) subtypes, respectively. Many of the 24 TN regulators were found to regulate the PPARA and FOXM1 pathways. The Yin Yang gene expression mean ratio (YMR) and combined risk score (CRS) signatures built with either the TN regulators or their targets were associated with the BC patients’ survival. Previously, we identified FOXM1 and PPARA as the top Yin and Yang pathways in TN, respectively. These two pathways and their regulators could be further explored experimentally, which might help to identify potential therapeutic targets for TN.
Affiliation(s)
- Shujun Huang
  - College of Pharmacy, University of Manitoba, Winnipeg, MB R3E 0T5, Canada; huangs12@myumanitoba (S.H.); (W.X.)
- Wayne Xu
  - College of Pharmacy, University of Manitoba, Winnipeg, MB R3E 0T5, Canada; huangs12@myumanitoba (S.H.); (W.X.)
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 0J9, Canada
  - Research Institute in Oncology and Hematology, University of Manitoba, Winnipeg, MB R3E 0V9, Canada
- Pingzhao Hu
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 0J9, Canada
  - Research Institute in Oncology and Hematology, University of Manitoba, Winnipeg, MB R3E 0V9, Canada
  - Correspondence: (P.H.); (T.M.L.); Tel.: +1-204-789-3229 (P.H.); +1-204-272-3173 (T.M.L.)
- Ted M. Lakowski
  - College of Pharmacy, University of Manitoba, Winnipeg, MB R3E 0T5, Canada; huangs12@myumanitoba (S.H.); (W.X.)
  - Correspondence: (P.H.); (T.M.L.); Tel.: +1-204-789-3229 (P.H.); +1-204-272-3173 (T.M.L.)
|
23
|
Dependency of the Cancer-Specific Transcriptional Regulation Circuitry on the Promoter DNA Methylome. Cell Rep 2019; 26:3461-3474.e5. [DOI: 10.1016/j.celrep.2019.02.084] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/19/2018] [Revised: 10/01/2018] [Accepted: 02/21/2019] [Indexed: 01/05/2023] Open
|
24
|
Nagy ZB, Barták BK, Kalmár A, Galamb O, Wichmann B, Dank M, Igaz P, Tulassay Z, Molnár B. Comparison of Circulating miRNAs Expression Alterations in Matched Tissue and Plasma Samples During Colorectal Cancer Progression. Pathol Oncol Res 2019; 25:97-105. [PMID: 28980150 DOI: 10.1007/s12253-017-0308-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 08/24/2017] [Accepted: 09/12/2017] [Indexed: 12/16/2022]
Abstract
MicroRNAs (miRNAs) have been found to play a critical role in the colorectal adenoma-carcinoma sequence. MiRNA-specific high-throughput arrays have become available to detect promising miRNA expression alterations even in biological fluids, such as plasma samples, where miRNAs are stable. The purpose of this study was to identify circulating miRNAs showing altered expression between normal colonic (N), tubular adenoma (ADT), tubulovillous adenoma (ADTV) and colorectal cancer (CRC) matched plasma and tissue samples. Sixteen peripheral plasma and matched tissue biopsy samples (N n = 4; ADT n = 4; ADTV n = 4; CRC n = 4) were selected, and total RNA including the miRNA fraction was isolated. MiRNAs from plasma samples were extracted using the QIAamp Circulating Nucleic Acid Kit (Qiagen). Matched tissue-plasma miRNA microarray experiments were conducted with the GeneChip® miRNA 3.0 Array (Affymetrix). RT-qPCR (microRNA Ready-to-use PCR Human Panel I + II; Exiqon) was used for validation. Characteristic miRNA expression alterations (miR-149*, miR-3196, miR-4687) were observed in plasma samples in the comparison of the AD and CRC groups. In the N vs. CRC comparison, significant overexpression of miR-612, miR-1296, miR-933, miR-937 and miR-1207 was detected by RT-PCR (p < 0.05). Similar expression patterns of these miRNAs were also observed using microarrays in tissue pairs. Although miRNAs were found in the circulatory system at a lower concentration than in tissues, expression patterns slightly overlapped between tissue and plasma samples. The detected circulating miRNA alterations may originate not only from the primary tumor but also from other cell types, including immune cells.
Affiliation(s)
- Zsófia Brigitta Nagy
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
- Barbara Kinga Barták
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
- Alexandra Kalmár
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
- Orsolya Galamb
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
  - Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary
- Barnabás Wichmann
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
  - Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary
- Magdolna Dank
  - Department of Clinical Oncology, Semmelweis University, Budapest, Hungary
- Péter Igaz
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
  - Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary
- Zsolt Tulassay
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
  - Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary
- Béla Molnár
  - Molecular Gastroenterology Laboratory, 2nd Department of Internal Medicine, Semmelweis University, Szentkirályi street 46, Budapest, 1088, Hungary
  - Molecular Medicine Research Group, Hungarian Academy of Sciences, Budapest, Hungary
|
25
|
Grabowski P, Rappsilber J. A Primer on Data Analytics in Functional Genomics: How to Move from Data to Insight? Trends Biochem Sci 2019; 44:21-32. [PMID: 30522862 PMCID: PMC6318833 DOI: 10.1016/j.tibs.2018.10.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/02/2018] [Revised: 10/19/2018] [Accepted: 10/25/2018] [Indexed: 02/06/2023]
Abstract
High-throughput methodologies and machine learning have been central in developing systems-level perspectives in molecular biology. Unfortunately, performing such integrative analyses has traditionally been reserved for bioinformaticians. This is now changing with the appearance of resources to help bench-side biologists become skilled at computational data analysis and handling large omics data sets. Here, we show an entry route into the field of omics data analytics. We provide information about easily accessible data sources and suggest some first steps for aspiring computational data analysts. Moreover, we highlight how machine learning is transforming the field and how it can help make sense of biological data. Finally, we suggest good starting points for self-learning and hope to convince readers that computational data analysis and programming are not intimidating.
Affiliation(s)
- Piotr Grabowski
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
| | - Juri Rappsilber
- Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany; Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK.
| |
|
26
|
Wilk G, Braun R. regQTLs: Single nucleotide polymorphisms that modulate microRNA regulation of gene expression in tumors. PLoS Genet 2018; 14:e1007837. [PMID: 30557297 PMCID: PMC6343932 DOI: 10.1371/journal.pgen.1007837] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 05/11/2018] [Revised: 01/23/2019] [Accepted: 11/17/2018] [Indexed: 02/07/2023] Open
Abstract
Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms (SNPs) associated with trait diversity and disease susceptibility, yet their functional properties often remain unclear. It has been hypothesized that SNPs in microRNA binding sites may disrupt gene regulation by microRNAs (miRNAs), short non-coding RNAs that bind to mRNA and downregulate the target gene. While several studies have predicted the location of SNPs in miRNA binding sites, to date there has been no comprehensive analysis of their impact on miRNA regulation. Here we investigate the functional properties of genetic variants and their effects on miRNA regulation of gene expression in cancer. Our analysis is motivated by the hypothesis that distinct alleles may cause differential binding (from miRNAs to mRNAs or from transcription factors to DNA) and change the expression of genes. We previously identified pathways (systems of genes conferring specific cell functions) that are dysregulated by miRNAs in cancer, by comparing miRNA–pathway associations between healthy and tumor tissue. We draw on these results as a starting point to assess whether SNPs on dysregulated pathways are responsible for miRNA dysregulation of individual genes in tumors. Using an integrative regression analysis that incorporates miRNA expression, mRNA expression, and SNP genotype data, we identify functional SNPs that we term “regulatory QTLs (regQTLs)”: loci whose alleles impact the regulation of genes by miRNAs. We apply the method to breast, liver, lung, and prostate cancer data from The Cancer Genome Atlas, and provide a tool to explore the findings.
Author summary Genomics studies have identified single nucleotide polymorphisms (SNPs) associated with trait diversity and disease susceptibility, yet the mechanism of action of many genetic variants remains unclear. MicroRNAs (miRNAs) are a class of small non-coding RNA molecules that base-pair with coding mRNAs to regulate gene transcription. We hypothesize that SNP variants may affect the ability of miRNAs to bind their target genes, thus influencing gene regulation. To identify these “regulatory QTLs” (regQTLs), we integrate miRNA expression, mRNA expression, and SNP data to identify miRNAs that are associated with pathway dysregulation in tumors, and assess whether SNPs on these pathways are responsible for disrupted miRNA-gene regulation. This data-driven approach enables the discovery of SNPs whose alleles impact gene regulation by miRNAs, with functional consequences for tumor biology. We detail the method, apply it to data from The Cancer Genome Atlas, and provide a tool to explore the findings.
Affiliation(s)
- Gary Wilk
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, United States of America
- Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
| | - Rosemary Braun
- Biostatistics Division, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Department of Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, Illinois, United States of America
| |
|
27
|
Frost HR, Amos CI. A multi-omics approach for identifying important pathways and genes in human cancer. BMC Bioinformatics 2018; 19:479. [PMID: 30541428 PMCID: PMC6292115 DOI: 10.1186/s12859-018-2476-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/01/2017] [Accepted: 11/09/2018] [Indexed: 12/15/2022] Open
Abstract
Background Cancer develops when pathways controlling cell survival, cell fate or genome maintenance are disrupted by the somatic alteration of key driver genes. Understanding how pathway disruption is driven by somatic alterations is thus essential for an accurate characterization of cancer biology and identification of therapeutic targets. Unfortunately, current cancer pathway analysis methods fail to fully model the relationship between somatic alterations and pathway activity. Results To address these limitations, we developed a multi-omics method for identifying biologically important pathways and genes in human cancer. Our approach combines single-sample pathway analysis with multi-stage, lasso-penalized regression to find pathways whose gene expression can be explained largely in terms of gene-level somatic alterations in the tumor. Importantly, this method can analyze case-only data sets, does not require information regarding pathway topology and supports personalized pathway analysis using just somatic alteration data for a limited number of cancer-associated genes. The practical effectiveness of this technique is illustrated through an analysis of data from The Cancer Genome Atlas using gene sets from the Molecular Signatures Database. Conclusions Novel insights into the pathophysiology of human cancer can be obtained from statistical models that predict expression-based pathway activity in terms of non-silent somatic mutations and copy number variation. These models enable the identification of biologically important pathways and genes and support personalized pathway analysis in cases where gene expression data is unavailable.
Affiliation(s)
- H Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, 03755, NH, USA.
| | - Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, 03755, NH, USA
| |
|
28
|
Palowitch J, Shabalin A, Zhou YH, Nobel AB, Wright FA. Estimation of cis-eQTL effect sizes using a log of linear model. Biometrics 2018; 74:616-625. [PMID: 29073327 PMCID: PMC5920774 DOI: 10.1111/biom.12810] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/01/2016] [Revised: 09/01/2017] [Accepted: 09/01/2017] [Indexed: 11/29/2022]
Abstract
The study of expression Quantitative Trait Loci (eQTL) is an important problem in genomics and biomedicine. While detection (testing) of eQTL associations has been widely studied, less work has been devoted to the estimation of eQTL effect size. To reduce false positives, detection methods frequently rely on linear modeling of rank-based normalized or log-transformed gene expression data. Unfortunately, these approaches do not correspond to the simplest model of eQTL action, and thus yield estimates of eQTL association that can be uninterpretable and inaccurate. In this article, we propose a new, log-of-linear model for eQTL action, termed ACME, that captures allelic contributions to cis-acting eQTLs in an additive fashion, yielding effect size estimates that correspond to a biologically coherent model of cis-eQTLs. We describe a non-linear least-squares algorithm to fit the model by maximum likelihood, and obtain corresponding p-values. We perform careful investigation of the model using a combination of simulated data and data from the Genotype Tissue Expression (GTEx) project. Our results reveal little evidence for dominance effects, a parsimonious result that accords with a simple biological model for allele-specific expression and supports use of the ACME model. We show that Type-I error is well-controlled under our approach in a realistic setting, so that rank-based normalizations are unnecessary. Furthermore, we show that such normalizations can be detrimental to power and estimation accuracy under the proposed model. We then show, through effect size analyses of whole-genome cis-eQTLs in the GTEx data, that using standard normalizations instead of ACME noticeably affects the ranking and sign of estimates.
Affiliation(s)
- John Palowitch
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Andrey Shabalin
- Department of Psychiatry, University of Utah, Salt Lake City, Utah 84108, U.S.A
| | - Yi-Hui Zhou
- Bioinformatics Research Center and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, U.S.A
| | - Andrew B Nobel
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - Fred A Wright
- Bioinformatics Research Center and Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, U.S.A
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, U.S.A
| |
|
29
|
Min JW, Koh Y, Kim DY, Kim HL, Han JA, Jung YJ, Yoon SS, Choi SS. Identification of Novel Functional Variants of SIN3A and SRSF1 among Somatic Variants in Acute Myeloid Leukemia Patients. Mol Cells 2018; 41:465-475. [PMID: 29764005 PMCID: PMC5974623 DOI: 10.14348/molcells.2018.0051] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2018] [Revised: 02/25/2018] [Accepted: 03/08/2018] [Indexed: 12/18/2022] Open
Abstract
The advent of massively parallel sequencing, also called next-generation sequencing (NGS), has dramatically influenced cancer genomics by accelerating the identification of novel molecular alterations. Using a whole genome sequencing (WGS) approach, we identified somatic coding and noncoding variants that may contribute to leukemogenesis in 11 adult Korean acute myeloid leukemia (AML) patients, with serial tumor samples (primary and relapse) available for 5 of them; somatic variants were identified in 187 AML-related genes, including both novel (SIN3A, C10orf53, PTPRR, and RERGL) and well-known (NPM1, RUNX1, and CEBPA) AML-related genes. Notably, SIN3A expression shows prognostic value in AML. A newly designed method, referred to as "hot-zone" analysis, detected two putative functional noncoding variants that can alter transcription factor binding affinity near PPP1R10 and SRSF1. Moreover, the functional importance of the SRSF1 noncoding variant was further investigated by luciferase assays, which showed that the variant is critical for the regulation of gene expression leading to leukemogenesis. We expect that further functional investigation of these coding and noncoding variants will contribute to a more in-depth understanding of the underlying molecular mechanisms of AML and the development of targeted anti-cancer drugs.
Affiliation(s)
- Jae-Woong Min
- Division of Biomedical Convergence, College of Biomedical Science, Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon 24341, Korea
| | - Youngil Koh
- Department of Internal Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, 03080, Korea
| | - Dae-Yoon Kim
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, 03080, Korea
| | - Hyung-Lae Kim
- Department of Biochemistry, School of Medicine, Ewha Woman’s University, Seoul 03760, Korea
| | - Jeong A Han
- Department of Biochemistry and Molecular Biology, School of Medicine, Kangwon National University, Chuncheon 24341, Korea
| | - Yu-Jin Jung
- Department of Biological Sciences, Kangwon National University, Chuncheon 24341, Korea
| | - Sung-Soo Yoon
- Department of Internal Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Cancer Research Institute, Seoul National University College of Medicine, Seoul, 03080, Korea
| | - Sun Shim Choi
- Division of Biomedical Convergence, College of Biomedical Science, Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon 24341, Korea
| |
|
30
|
Griffin PJ, Zhang Y, Johnson WE, Kolaczyk ED. Detection of multiple perturbations in multi-omics biological networks. Biometrics 2018; 74:1351-1361. [PMID: 29772079 DOI: 10.1111/biom.12893] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/01/2016] [Revised: 04/01/2018] [Accepted: 04/01/2018] [Indexed: 01/24/2023]
Abstract
Cellular mechanism-of-action is of fundamental concern in many biological studies. It is of particular interest for identifying the cause of disease and learning the way in which treatments act against disease. However, pinpointing such mechanisms is difficult, due to the fact that small perturbations to the cell can have wide-ranging downstream effects. Given a snapshot of cellular activity, it can be challenging to tell where a disturbance originated. The presence of an ever-greater variety of high-throughput biological data offers an opportunity to examine cellular behavior from multiple angles, but also presents the statistical challenge of how to effectively analyze data from multiple sources. In this setting, we propose a method for mechanism-of-action inference by extending network filtering to multi-attribute data. We first estimate a joint Gaussian graphical model across multiple data types using penalized regression and filter for network effects. We then apply a set of likelihood ratio tests to identify the most likely site of the original perturbation. In addition, we propose a conditional testing procedure to allow for detection of multiple perturbations. We demonstrate this methodology on paired gene expression and methylation data from The Cancer Genome Atlas (TCGA).
Affiliation(s)
- Paula J Griffin
- Department of Biostatistics, Boston University School of Public Health, Boston, U.S.A
| | - Yuqing Zhang
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, U.S.A
- Graduate Program in Bioinformatics, Boston University, Boston, U.S.A
| | - William Evan Johnson
- Department of Biostatistics, Boston University School of Public Health, Boston, U.S.A
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, U.S.A
- Graduate Program in Bioinformatics, Boston University, Boston, U.S.A
| | - Eric D Kolaczyk
- Graduate Program in Bioinformatics, Boston University, Boston, U.S.A
- Department of Mathematics and Statistics, Boston University, Boston, U.S.A
| |
|
31
|
Bessière C, Taha M, Petitprez F, Vandel J, Marin JM, Bréhélin L, Lèbre S, Lecellier CH. Probing instructions for expression regulation in gene nucleotide compositions. PLoS Comput Biol 2018; 14:e1005921. [PMID: 29293496 PMCID: PMC5766238 DOI: 10.1371/journal.pcbi.1005921] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/11/2017] [Revised: 01/12/2018] [Accepted: 12/10/2017] [Indexed: 01/22/2023] Open
Abstract
Gene expression is orchestrated by distinct regulatory regions to ensure a wide variety of cell types and functions. A challenge is to identify which regulatory regions are active, what their associated features are and how they work together in each cell type. Several approaches have tackled this problem by modeling gene expression based on epigenetic marks, with the ultimate goal of identifying driving regions and associated genomic variations that are clinically relevant, in particular in precision medicine. However, these models rely on experimental data, which are limited to specific samples (often even to cell lines) and cannot be generated for all regulators and all patients. In addition, we show here that, although these approaches are accurate in predicting gene expression, inference of TF combinations from this type of model is not straightforward. Furthermore, these methods are not designed to capture regulation instructions present at the sequence level, before the binding of regulators or the opening of the chromatin. Here, we probe sequence-level instructions for gene expression and develop a method to explain mRNA levels based solely on nucleotide features. Our method positions nucleotide composition as a critical component of gene expression. Moreover, our approach, able to rank regulatory regions according to their contribution, unveils a strong influence of the gene body sequence, in particular introns. We further provide evidence that the contribution of nucleotide content can be linked to co-regulations associated with genome 3D architecture and to associations of genes within topologically associated domains.
Affiliation(s)
- Chloé Bessière
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - May Taha
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
| | - Florent Petitprez
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| | - Jimmy Vandel
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
| | - Jean-Michel Marin
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
| | - Laurent Bréhélin
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
| | - Sophie Lèbre
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
- Univ. Paul-Valéry-Montpellier 3, Montpellier, France
| | - Charles-Henri Lecellier
- IBC, Univ. Montpellier, CNRS, Montpellier, France
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
| |
|
32
|
Chen Y, Widschwendter M, Teschendorff AE. Systems-epigenomics inference of transcription factor activity implicates aryl-hydrocarbon-receptor inactivation as a key event in lung cancer development. Genome Biol 2017; 18:236. [PMID: 29262847 PMCID: PMC5738803 DOI: 10.1186/s13059-017-1366-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/05/2017] [Accepted: 11/27/2017] [Indexed: 12/25/2022] Open
Abstract
Background Diverse molecular alterations associated with smoking in normal and precursor lung cancer cells have been reported, yet their role in lung cancer etiology remains unclear. A prominent example is hypomethylation of the aryl hydrocarbon-receptor repressor (AHRR) locus, which is observed in blood and squamous epithelial cells of smokers, but not in lung cancer. Results Using a novel systems-epigenomics algorithm, called SEPIRA, which leverages the power of a large RNA-sequencing expression compendium to infer regulatory activity from messenger RNA expression or DNA methylation (DNAm) profiles, we infer the landscape of binding activity of lung-specific transcription factors (TFs) in lung carcinogenesis. We show that lung-specific TFs become preferentially inactivated in lung cancer and precursor lung cancer lesions and further demonstrate that these results can be derived using only DNAm data. We identify subsets of TFs which become inactivated in precursor cells. Among these regulatory factors, we identify AHR, the aryl hydrocarbon-receptor which controls a healthy immune response in the lung epithelium and whose repressor, AHRR, has recently been implicated in smoking-mediated lung cancer. In addition, we identify FOXJ1, a TF which promotes growth of airway cilia and effective clearance of the lung airway epithelium from carcinogens. Conclusions We identify TFs, such as AHR, which become inactivated in the earliest stages of lung cancer and which, unlike AHRR hypomethylation, are also inactivated in lung cancer itself. The novel systems-epigenomics algorithm SEPIRA will be useful to the wider epigenome-wide association study community as a means of inferring regulatory activity.
Affiliation(s)
- Yuting Chen
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, 320 Yue Yang Road, Shanghai, 200031, China
| | - Martin Widschwendter
- Department of Women's Cancer, University College London, 74 Huntley Street, London, WC1E 6AU, UK
| | - Andrew E Teschendorff
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, 320 Yue Yang Road, Shanghai, 200031, China
- Department of Women's Cancer, University College London, 74 Huntley Street, London, WC1E 6AU, UK
- UCL Cancer Institute, University College London, Paul O'Gorman Building, 72 Huntley Street, London, WC1E 6BT, UK
| |
|
33
|
Shu J, Silva BVRE, Gao T, Xu Z, Cui J. Dynamic and Modularized MicroRNA Regulation and Its Implication in Human Cancers. Sci Rep 2017; 7:13356. [PMID: 29042600 PMCID: PMC5645395 DOI: 10.1038/s41598-017-13470-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 02/08/2017] [Accepted: 09/26/2017] [Indexed: 12/19/2022] Open
Abstract
MicroRNA is responsible for the fine-tuning of fundamental cellular activities and human disease development. The altered availability of microRNAs, target mRNAs, and other types of endogenous RNAs competing for microRNA interactions reflects the dynamic and conditional property of microRNA-mediated gene regulation that remains under-investigated. Here we propose a new integrative method to study this dynamic process by considering both competing and cooperative mechanisms and identifying functional modules where different microRNAs co-regulate the same functional process. Specifically, a new pipeline was built based on a meta-Lasso regression model and the proof-of-concept study was performed using a large-scale genomic dataset from ~4,200 patients with 9 cancer types. In the analysis, 10,726 microRNA-mRNA interactions were identified to be associated with a specific stage and/or type of cancer, which demonstrated the dynamic and conditional miRNA regulation during cancer progression. On the other hand, we detected 4,134 regulatory modules that exhibit high fidelity of microRNA function through selective microRNA-mRNA binding and modulation. For example, miR-18a-3p, -320a, -193b-3p, and -92b-3p co-regulate the glycolysis/gluconeogenesis and focal adhesion in cancers of kidney, liver, lung, and uterus. Furthermore, several new insights into dynamic microRNA regulation in cancers have been discovered in this study.
Affiliation(s)
- Jiang Shu
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, Lincoln, NE, 68588, USA
| | - Bruno Vieira Resende E Silva
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, Lincoln, NE, 68588, USA
| | - Tian Gao
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, Lincoln, NE, 68588, USA
| | - Zheng Xu
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
| | - Juan Cui
- Systems Biology and Biomedical Informatics (SBBI) Laboratory, Department of Computer Science and Engineering, Lincoln, NE, 68588, USA.
| |
|
34
|
Turki T. Learning approaches to improve prediction of drug sensitivity in breast cancer patients. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2017; 2016:3314-3320. [PMID: 28269014 DOI: 10.1109/embc.2016.7591437] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
Abstract
Predicting drug response in cancer is an important problem in modern clinical oncology that has attracted increasing attention from various domains such as computational biology, machine learning, and data mining. Cancer patients respond differently to each cancer therapy owing to disease diversity, genetic factors, and environmental causes. Thus, oncologists aim to identify effective therapies for cancer patients and to avoid adverse drug reactions. By predicting drug response in cancer, oncologists gain a full understanding of the effective treatments for each patient, which leads to better personalized treatment. In this paper, we present three learning approaches to improve the prediction of breast cancer patients' response to a chemotherapy drug: the instance selection approach, the oversampling approach, and the hybrid approach. We evaluate the performance of our approaches and compare them against the baseline approach using the Area Under the ROC Curve (AUC) on clinical trial data, in addition to testing the stability of the approaches. Our experimental results show that our approaches are stable and achieve the highest AUC with statistical significance.
|
35
|
Yan Z, Liu Y, Wei Y, Zhao N, Zhang Q, Wu C, Chang Z, Xu Y. The functional consequences and prognostic value of dosage sensitivity in ovarian cancer. Molecular BioSystems 2017; 13:380-391. [PMID: 28067383 DOI: 10.1039/c6mb00625f] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
Copy number alteration (CNA) represents an important class of genetic variations that may contribute to tumorigenesis, tumor growth and metastatic spread. CNA can directly affect the expression of genes within the CNA regions; however, genes within the CNA regions exhibit heterogeneity in gene dosage sensitivity. In this study, a computational framework was built to identify 1170 dosage-sensitive genes (DSGs) and 1215 dosage-resistant genes (DRGs) that were related to ovarian serous cystadenocarcinoma (OV) through the association between CNA and gene expression. Functional annotation of the two groups indicated that DRGs were involved in cancer-related processes like immune response, cell death and apoptosis, while DSGs were enriched in essential processes like the cell cycle and the DNA metabolic process. Meanwhile, two three-dimensional regulatory networks for differentially expressed miRNAs, differentially expressed transcription factors (TFs) and DSGs or DRGs were constructed based on feed-forward loops. We identified key regulators (such as miR-16-5p, miR-98-5p, MYB and HOXA5) and cancer prognosis-related network motifs (such as miR-98-5p-HOXA5-TP53 and miR-16-5p-MYB-IGF1R) after the analysis of network topological features. Our results lead us to speculate that these genes and associated regulators may be potential mechanistic biomarkers for tumorigenesis and progression of cancer. Research on the network characteristics and the role of feed-forward loops in OV tumorigenesis and development could lead to feasible suggestions for the prevention and early diagnosis of OV, which will shed light on understanding the functional mechanism of CNA in cancer.
Affiliation(s)
- Zichuang Yan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Yongjing Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Yunzhen Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Ning Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Qiang Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Cheng Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Zhiqiang Chang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Yan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| |
|
36
|
Trescher S, Münchmeyer J, Leser U. Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization. BMC Systems Biology 2017; 11:41. [PMID: 28347313 PMCID: PMC5369021 DOI: 10.1186/s12918-017-0419-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/16/2016] [Accepted: 03/08/2017] [Indexed: 12/28/2022]
Abstract
Background Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integration of the resulting heterogeneous, large, and noisy data sets into comprehensive and tissue or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed which model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds those parameters that minimize the divergence of predicted and measured expression intensities. In various settings, these methods produced promising results in terms of estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they vastly differ in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem and the data sets used for evaluation. Results Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets. Conclusions The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual result overlaps are very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions enabling focused research on the critical points. 
Electronic supplementary material The online version of this article (doi:10.1186/s12918-017-0419-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Saskia Trescher
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany.
- Jannes Münchmeyer
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
- Ulf Leser
- Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099, Berlin, Germany
|
37
|
Trop-Steinberg S, Azar Y. AP-1 Expression and its Clinical Relevance in Immune Disorders and Cancer. Am J Med Sci 2017; 353:474-483. [PMID: 28502334 DOI: 10.1016/j.amjms.2017.01.019] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 08/11/2016] [Revised: 01/26/2017] [Accepted: 01/30/2017] [Indexed: 02/07/2023]
Abstract
The inflammatory response is known to have a significant role in certain autoimmune diseases and malignancies. We review current knowledge regarding the functions of activator protein 1 (AP-1) as an important modulator in several immune disorders and carcinomas. AP-1 is overexpressed in rheumatoid arthritis and in long-term allogeneic hematopoietic stem cell transplantation survivors; however, decreased expression of AP-1 has been observed in psoriasis, systemic lupus erythematosus and in patients who do not survive after hematopoietic stem cell transplantation. AP-1 is also implicated in the control of various cancer cells. Higher levels of AP-1 components are present in breast and endometrial carcinomas, colorectal cancer and in acute myeloid leukemia, Hodgkin's lymphoma and anaplastic large cell lymphoma, with downregulation in ovarian and gastric carcinomas and in patients with chronic myelogenous leukemia. AP-1 may enable the development of helpful markers to identify early-stage disease or to predict severity.
Affiliation(s)
- Yehudit Azar
- Bone Marrow Transplantation Unit, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
|
38
|
Abstract
microRNAs (miRNAs) and DNA methylation are the two epigenetic modifications that have emerged in recent years as the most critical players in the regulation of gene expression. Compelling evidence has indicated the roles of miRNAs and DNA methylation in modulating cellular transformation and tumorigenesis. miRNAs act as negative regulators of gene expression and are involved in regulation both under physiologic conditions and during diseases such as cancer, inflammatory diseases, and psychiatric disorders, among others. Meanwhile, aberrant DNA methylation manifests in both global genome changes and localized gene promoter changes, which influence the transcription of cancer genes. In this review, we describe the mutual regulation of miRNAs and DNA methylation in human cancers. miRNAs regulate DNA methylation by targeting DNA methyltransferases or methylation-related proteins. On the other hand, both hyper- and hypo-methylation of miRNAs occur frequently in human cancers and represent a new level of complexity in gene regulation. Therefore, understanding the mechanisms underlying the mutual regulation of miRNAs and DNA methylation may provide helpful insights for the development of efficient therapeutic approaches.
Affiliation(s)
- Sumei Wang
- Department of Oncology, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong, P. R. China; Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wanyin Wu
- Department of Oncology, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, Guangdong, P. R. China
- Francois X Claret
- Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Experimental Therapeutics Academic Program and Cancer Biology Program, The University of Texas Graduate School of Biomedical Sciences at Houston, Houston, TX, USA
|
39
|
Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction. Computational and Mathematical Methods in Medicine 2016; 2016:3456153. [PMID: 27843486 PMCID: PMC5098106 DOI: 10.1155/2016/3456153] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/06/2016] [Revised: 08/29/2016] [Accepted: 09/20/2016] [Indexed: 12/22/2022]
Abstract
Variable selection for regression with high-dimensional big data has found many applications in bioinformatics and computational biology. One appealing approach is L0 regularized regression, which directly penalizes the number of nonzero features in the model. However, it is well known that L0 optimization is NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample size, DL0EM is efficient with high-dimensional (n ≪ m) data. They also provide a natural solution to all Lp problems with p ∈ [0,2], including lasso with p = 1 and elastic net with p ∈ [1,2]. The regularization parameter λ can be determined through cross validation or AIC and BIC. We demonstrate our methods through simulation and high-dimensional genomic data. The results indicate that L0 has better performance than lasso, SCAD, and MC+, and that L0 with AIC or BIC has performance similar to that of computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and constructing biologically important networks with high-dimensional big data.
|
40
|
Modeling Gene Regulation in Liver Hepatocellular Carcinoma with Random Forests. BioMed Research International 2016; 2016:1035945. [PMID: 27818995 PMCID: PMC5080476 DOI: 10.1155/2016/1035945] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/18/2016] [Accepted: 09/21/2016] [Indexed: 11/29/2022]
Abstract
Liver hepatocellular carcinoma (HCC) remains a leading cause of cancer-related death. Poor understanding of the mechanisms underlying HCC prevents early detection and leads to high mortality. We developed a random forest model that incorporates copy-number variation, DNA methylation, transcription factor, and microRNA binding information as features to predict gene expression in HCC. Our model achieved a highly significant correlation between predicted and measured expression of held-out genes. Furthermore, we identified potential regulators of gene expression in HCC. Many of these regulators have been previously found to be associated with cancer and are differentially expressed in HCC. We also evaluated our predicted target sets for these regulators by comparing them with experimental results. Lastly, we found that the transcription factor E2F6, one of the candidate regulators inferred by our model, is predictive of survival rate in HCC. The results of this study will provide directions for future prospective studies in HCC.
|
41
|
Liu C, Rohart F, Simpson PT, Khanna KK, Ragan MA, Lê Cao KA. Integrating Multi-omics Data to Dissect Mechanisms of DNA repair Dysregulation in Breast Cancer. Sci Rep 2016; 6:34000. [PMID: 27666291 PMCID: PMC5036051 DOI: 10.1038/srep34000] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/02/2016] [Accepted: 09/01/2016] [Indexed: 12/20/2022] Open
Abstract
DNA repair genes and pathways that are transcriptionally dysregulated in cancer provide the first line of evidence for the altered DNA repair status in tumours, and hence have been explored intensively as a source for biomarker discovery. The molecular mechanisms underlying DNA repair dysregulation, however, have not been systematically investigated in any cancer type. In this study, we performed a statistical analysis to dissect the roles of DNA copy number alteration (CNA), DNA methylation (DM) at gene promoter regions and the expression changes of transcription factors (TFs) in the differential expression of individual DNA repair genes in normal versus tumour breast samples. These gene-level results were summarised at pathway level to assess whether different DNA repair pathways are affected in distinct manners. Our results suggest that CNA and expression changes of TFs are major causes of DNA repair dysregulation in breast cancer, and that a subset of the identified TFs may exert global impacts on the dysregulation of multiple repair pathways. Our work hence provides novel insights into DNA repair dysregulation in breast cancer. These insights improve our understanding of the molecular basis of the DNA repair biomarkers identified thus far, and have potential to inform future biomarker discovery.
Affiliation(s)
- Chao Liu
- Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4067, Australia
- Florian Rohart
- The University of Queensland Diamantina Institute, The University of Queensland, Woolloongabba, QLD 4102, Australia
- Peter T Simpson
- UQ Centre for Clinical Research and School of Medicine, The University of Queensland, Herston, QLD 4101, Australia
- Kum Kum Khanna
- QIMR-Berghofer Medical Research Institute, Herston, Brisbane, QLD 4006, Australia
- Mark A Ragan
- Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4067, Australia
- Kim-Anh Lê Cao
- The University of Queensland Diamantina Institute, The University of Queensland, Woolloongabba, QLD 4102, Australia
|
42
|
Hu XM, Yan XH, Hu YW, Huang JL, Cao SW, Ren TY, Tang YT, Lin L, Zheng L, Wang Q. miRNA-548p suppresses hepatitis B virus X protein associated hepatocellular carcinoma by downregulating oncoprotein hepatitis B x-interacting protein. Hepatol Res 2016; 46:804-15. [PMID: 26583881 DOI: 10.1111/hepr.12618] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/17/2015] [Revised: 10/04/2015] [Accepted: 11/06/2015] [Indexed: 12/12/2022]
Abstract
AIM miR-548p is a recently identified miRNA whose role in tumorigenesis and progression remains poorly understood. Here, we aimed to investigate the biofunction of miR-548p in hepatocellular carcinogenesis. METHODS The expression levels of miR-548p were detected by quantitative reverse transcription polymerase chain reaction (qRT-PCR). The role of miR-548p in hepatocellular carcinoma (HCC) was determined by colony formation, flow cytometry assay and nude mice xenograft experiments. miR-548p target genes were analyzed by miRNA target prediction programs and verified by qRT-PCR, western blotting assay and dual-luciferase reporter assay. RESULTS miR-548p was repressed by hepatitis B virus X protein (HBx) in HCC tumor tissues and hepatoma cells, and it inhibited cell growth by inhibiting cell proliferation and promoting cell apoptosis. miR-548p directly downregulated the expression of hepatitis B x-interacting protein (HBXIP) by binding to the 3'-untranslated region of HBXIP mRNA. Further study showed that hepatocyte nuclear factor-4a (HNF4A) promoted the expression of miR-548p and inhibited the transcription of HBXIP. HNF4A is a dominant transcriptional regulator of hepatocyte differentiation and hepatocellular carcinogenesis, and is shown to be repressed by HBx. CONCLUSION We propose a model of the HBx/HNF4A/miR-548p/HBXIP pathway that controls hepatoma cell growth and tumorigenesis of HCC. miR-548p was identified as a tumor suppressor in HBx-associated hepatocellular carcinogenesis.
Affiliation(s)
- Xiu-Mei Hu
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Xiao-Hui Yan
- Research Center of Clinical Medicine, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yan-Wei Hu
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Jin-Lan Huang
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Shun-Wang Cao
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Ting-Yu Ren
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yue-Ting Tang
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Li Lin
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Lei Zheng
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Qian Wang
- Laboratory Medicine Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
|
43
|
Lafzi A, Kazan H. Inferring RBP-Mediated Regulation in Lung Squamous Cell Carcinoma. PLoS One 2016; 11:e0155354. [PMID: 27186987 PMCID: PMC4871487 DOI: 10.1371/journal.pone.0155354] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/13/2016] [Accepted: 04/27/2016] [Indexed: 12/11/2022] Open
Abstract
RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation of mRNAs. Dysregulations in RBP-mediated mechanisms have been found to be associated with many steps of cancer initiation and progression. Despite this, previous studies of gene expression in cancer have ignored the effect of RBPs. To address this gap, we developed a lasso regression model that predicts gene expression in cancer by incorporating RBP-mediated regulation as well as the effects of other well-studied factors such as copy-number variation, DNA methylation, TFs and miRNAs. As a case study, we applied our model to lung squamous cell carcinoma (LUSC) data, as we found that several RBPs are differentially expressed in LUSC. Including RBP-mediated regulatory effects in addition to the other features significantly increased the Spearman rank correlation between predicted and measured expression of held-out genes. Using a feature selection procedure that accounts for the adaptive search employed by lasso regularization, we identified the candidate regulators in LUSC. Remarkably, several of these candidate regulators are RBPs. Furthermore, the majority of the candidate regulators have been previously found to be associated with lung cancer. To investigate the mechanisms that are controlled by these regulators, we predicted their target gene sets based on our model. We validated the target gene sets by comparing against experimentally verified targets. Our results suggest that future studies of gene expression in cancer must consider the effect of RBP-mediated regulation.
Affiliation(s)
- Atefeh Lafzi
- Department of Health Informatics, Middle East Technical University, Ankara, Turkey
- Hilal Kazan
- Department of Computer Engineering, Antalya International University, Antalya, Turkey
|
44
|
Xiao F, Gao L, Ye Y, Hu Y, He R. Inferring Gene Regulatory Networks Using Conditional Regulation Pattern to Guide Candidate Genes. PLoS One 2016; 11:e0154953. [PMID: 27171286 PMCID: PMC4865039 DOI: 10.1371/journal.pone.0154953] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/09/2015] [Accepted: 04/21/2016] [Indexed: 12/13/2022] Open
Abstract
Path consistency (PC) algorithms combined with conditional mutual information (CMI) are widely used in the reconstruction of gene regulatory networks. CMI has many advantages over the Pearson correlation coefficient in measuring non-linear dependence to infer gene regulatory networks. It can also discriminate direct regulations from indirect ones. However, it is still a challenge to select the conditional genes in an optimal way, which affects the performance and computational complexity of the PC algorithm. In this study, we develop a novel conditional mutual information-based algorithm, namely RPNI (Regulation Pattern based Network Inference), to infer gene regulatory networks. For conditional gene selection, we define the co-regulation pattern, indirect-regulation pattern and mixture-regulation pattern as three candidate patterns to guide the selection of candidate genes. To demonstrate the potential of our algorithm, we apply it to gene expression data from the DREAM challenge. Experimental results show that RPNI outperforms existing conditional mutual information-based methods in both accuracy and time complexity for different sizes of gene samples. Furthermore, the robustness of our algorithm is demonstrated by noisy interference analysis using different types of noise.
Affiliation(s)
- Fei Xiao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Yuxuan Hu
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
- Ruijie He
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, China
|
45
|
Grechkin M, Logsdon BA, Gentles AJ, Lee SI. Identifying Network Perturbation in Cancer. PLoS Comput Biol 2016; 12:e1004888. [PMID: 27145341 PMCID: PMC4856318 DOI: 10.1371/journal.pcbi.1004888] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 02/09/2015] [Accepted: 03/25/2016] [Indexed: 01/08/2023] Open
Abstract
We present a computational framework, called DISCERN (DIfferential SparsE Regulatory Network), to identify informative topological changes in gene-regulator dependence networks inferred on the basis of mRNA expression datasets within distinct biological states. DISCERN takes two expression datasets as input: an expression dataset of diseased tissues from patients with a disease of interest and another expression dataset from matching normal tissues. DISCERN estimates the extent to which each gene is perturbed, that is, has distinct regulator connectivity in the inferred gene-regulator dependencies between the disease and normal conditions. This approach has distinct advantages over existing methods. First, DISCERN infers conditional dependencies between candidate regulators and genes, where conditional dependence relationships discriminate the evidence for direct interactions from indirect interactions more precisely than pairwise correlation. Second, DISCERN uses a new likelihood-based scoring function to alleviate concerns about accuracy of the specific edges inferred in a particular network. In synthetic data, DISCERN identifies genes perturbed between distinct states more accurately than existing methods. In expression datasets from patients with acute myeloid leukemia (AML), breast cancer and lung cancer, genes with high DISCERN scores in each cancer are enriched for known tumor drivers, genes associated with the biological processes known to be important in the disease, and genes associated with patient prognosis, in the respective cancer. Finally, we show that DISCERN can uncover potential mechanisms underlying network perturbation by explaining observed epigenomic activity patterns in cancer and normal tissue types more accurately than alternative methods, based on the available epigenomic data from the ENCODE project.
Affiliation(s)
- Maxim Grechkin
- Department of Computer Science & Engineering, University of Washington, Seattle, Washington, United States of America
- Andrew J. Gentles
- Center for Cancer Systems Biology, Department of Radiology, Stanford University, Stanford, California, United States of America
- Su-In Lee
- Department of Computer Science & Engineering, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
|
46
|
Orlova NN, Lebedev TD, Spirin PV, Prassolov VS. Key molecular mechanisms associated with cell malignant transformation in acute myeloid leukemia. Mol Biol 2016. [DOI: 10.1134/s0026893316020187] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
|
47
|
Abstract
Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
Affiliation(s)
- Ulf Schmitz
- Dept of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
- Olaf Wolkenhauer
- Dept of Systems Biology & Bioinformatics, University of Rostock, Rostock, Germany
|
48
|
Xu Y, Zhu Y, Müller P, Mitra R, Ji Y. Characterizing Cancer-Specific Networks by Integrating TCGA Data. Cancer Inform 2015; 13:125-31. [PMID: 26628858 PMCID: PMC4657757 DOI: 10.4137/cin.s13776] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/03/2014] [Revised: 05/11/2015] [Accepted: 05/12/2015] [Indexed: 12/26/2022] Open
Abstract
The Cancer Genome Atlas (TCGA) generates comprehensive genomic data for thousands of patients over more than 20 cancer types. TCGA data are typically whole-genome measurements of multiple genomic features, such as DNA copy numbers, DNA methylation, and gene expression, providing unique opportunities for investigating cancer mechanisms from multiple molecular and regulatory layers. We propose a Bayesian graphical model to systematically integrate multi-platform TCGA data for inference of the interactions between different genomic features either within a gene or between multiple genes. The presence or absence of edges in the graph indicates the presence or absence of conditional dependence between genomic features. The inference is restricted to genes within a known biological network, but can be extended to any set of genes. Applying the model to the same genes using patient samples in two different cancer types, we identify network components that are common as well as different between cancer types. The examples and code are available at https://www.ma.utexas.edu/users/yxu/software.html.
Affiliation(s)
- Yanxun Xu
- Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, TX, USA
- Yitan Zhu
- Northshore University HealthSystem, Evanston, IL, USA
- Peter Müller
- Department of Mathematics, The University of Texas at Austin, Austin, TX, USA
- Riten Mitra
- School of Public Health and Information Sciences, The University of Louisville, Louisville, KY, USA
- Yuan Ji
- Northshore University HealthSystem, Evanston, IL, USA; Department of Public Health Sciences, The University of Chicago, Chicago, IL, USA
|
49
|
Abstract
Despite the rapid accumulation of tumor-profiling data and transcription factor (TF) ChIP-seq profiles, efforts integrating TF binding with the tumor-profiling data to understand how TFs regulate tumor gene expression are still limited. To systematically search for cancer-associated TFs, we comprehensively integrated 686 ENCODE ChIP-seq profiles representing 150 TFs with 7484 TCGA tumor profiles in 18 cancer types. For efficient and accurate inference on gene regulatory rules across a large number and variety of datasets, we developed an algorithm, RABIT (regression analysis with background integration). In each tumor sample, RABIT tests whether the TF target genes from ChIP-seq show strong differential regulation after controlling for background effect from copy number alteration and DNA methylation. When multiple ChIP-seq profiles are available for a TF, RABIT prioritizes the most relevant ChIP-seq profile in each tumor. In each cancer type, RABIT further tests whether the TF expression and somatic mutation variations are correlated with differential expression patterns of its target genes across tumors. Our predicted TF impact on tumor gene expression is highly consistent with the knowledge from cancer-related gene databases and reveals many previously unidentified aspects of transcriptional regulation in tumor progression. We also applied RABIT to RNA-binding protein motifs and found that some alternative splicing factors could affect tumor-specific gene expression by binding to target gene 3'UTR regions. Thus, RABIT (rabit.dfci.harvard.edu) is a general platform for predicting the oncogenic role of gene expression regulators.
|
50
|
Li Y, Zhang Z. Computational Biology in microRNA. Wiley Interdisciplinary Reviews: RNA 2015; 6:435-52. [DOI: 10.1002/wrna.1286] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/16/2015] [Revised: 03/24/2015] [Accepted: 03/25/2015] [Indexed: 01/24/2023]
Affiliation(s)
- Yue Li
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Zhaolei Zhang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
|