1
Dibaeinia P, Ojha A, Sinha S. Interpretable AI for inference of causal molecular relationships from omics data. Sci Adv 2025; 11:eadk0837. [PMID: 39951525 PMCID: PMC11827637 DOI: 10.1126/sciadv.adk0837] [Received: 08/08/2023] [Accepted: 01/14/2025] [Indexed: 02/16/2025]
Abstract
The discovery of molecular relationships from high-dimensional data is a major open problem in bioinformatics. Machine learning and feature attribution models have shown great promise in this context but lack causal interpretation. Here, we show that a popular feature attribution model, under certain assumptions, estimates an average of a causal quantity reflecting the direct influence of one variable on another. We leverage this insight to propose a precise definition of a gene regulatory relationship and implement a new tool, CIMLA (Counterfactual Inference by Machine Learning and Attribution Models), to identify differences in gene regulatory networks between biological conditions, a problem that has received great attention in recent years. Using extensive benchmarking on simulated data, we show that CIMLA is more robust to confounding variables and is more accurate than leading methods. Last, we use CIMLA to analyze a previously published single-cell RNA sequencing dataset from subjects with and without Alzheimer's disease (AD), discovering several potential regulators of AD.
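The general idea of fitting a predictive model per condition and contrasting feature attributions can be illustrated with a small sketch. This is not CIMLA's implementation, and several pieces are assumptions:

- Linear least-squares regressors stand in for the paper's machine learning models.
- The closed-form interventional SHAP value of a linear model with independent features, phi_j(x) = w_j (x_j - E[x_j]), stands in for a general attribution model.
- The score (mean absolute difference in attributions on pooled samples), the function names and the toy data are all hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ b + X @ w; returns the weight vector w."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[1:]

def linear_shap(w, X, background):
    """Interventional SHAP values for a linear model with independent features:
    phi_j(x) = w_j * (x_j - E[x_j]), expectation taken over `background`."""
    return w * (X - background.mean(axis=0))

def differential_score(X_a, y_a, X_b, y_b):
    """Hypothetical per-regulator score: mean |phi_B - phi_A| on pooled samples."""
    w_a = fit_linear(X_a, y_a)
    w_b = fit_linear(X_b, y_b)
    X_all = np.vstack([X_a, X_b])
    diff = linear_shap(w_b, X_all, X_all) - linear_shap(w_a, X_all, X_all)
    return np.abs(diff).mean(axis=0)

rng = np.random.default_rng(0)
n = 500
X_a = rng.normal(size=(n, 2))  # columns: two candidate regulators of one target
X_b = rng.normal(size=(n, 2))
# Toy assumption: regulator 0 drives the target only in condition B.
y_a = 0.1 * rng.normal(size=n)
y_b = 2.0 * X_b[:, 0] + 0.1 * rng.normal(size=n)

scores = differential_score(X_a, y_a, X_b, y_b)
print(scores)  # regulator 0 scores high, regulator 1 stays near zero
```

With non-linear models the attributions would come from a general SHAP estimator rather than this closed form.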
Affiliation(s)
- Payam Dibaeinia
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Abhishek Ojha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Saurabh Sinha
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
2
Li S, Liu Y, Shen LC, Yan H, Song J, Yu DJ. GMFGRN: a matrix factorization and graph neural network approach for gene regulatory network inference. Brief Bioinform 2024; 25:bbad529. [PMID: 38261340 PMCID: PMC10805180 DOI: 10.1093/bib/bbad529] [Received: 09/29/2023] [Revised: 12/08/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024]
Abstract
The recent advances of single-cell RNA sequencing (scRNA-seq) have enabled reliable profiling of gene expression at the single-cell level, providing opportunities for accurate inference of gene regulatory networks (GRNs) on scRNA-seq data. Most methods for inferring GRNs suffer from the inability to eliminate transitive interactions or necessitate expensive computational resources. To address these, we present a novel method, termed GMFGRN, for accurate graph neural network (GNN)-based GRN inference from scRNA-seq data. GMFGRN employs GNN for matrix factorization and learns representative embeddings for genes. For transcription factor-gene pairs, it utilizes the learned embeddings to determine whether they interact with each other. The extensive suite of benchmarking experiments encompassing eight static scRNA-seq datasets alongside several state-of-the-art methods demonstrated mean improvements of 1.9 and 2.5% over the runner-up in area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). In addition, across four time-series datasets, maximum enhancements of 2.4 and 1.3% in AUROC and AUPRC were observed in comparison to the runner-up. Moreover, GMFGRN requires significantly less training time and memory consumption, with time and memory consumed <10% compared to the second-best method. These findings underscore the substantial potential of GMFGRN in the inference of GRNs. It is publicly available at https://github.com/Lishuoyy/GMFGRN.
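GMFGRN learns gene embeddings through GNN-based matrix factorization. A much simpler stand-in, explicitly not the authors' model, is a truncated SVD of a centered genes × cells expression matrix, which also yields low-dimensional gene embeddings.

- A TF-gene pair is then scored by the cosine similarity of the two embeddings.
- The abstract says GMFGRN uses its learned embeddings to decide whether a pair interacts. The cosine rule below is only a hypothetical placeholder for that step.
- The toy matrix and the rank `k` are also assumptions.

```python
import numpy as np

def gene_embeddings(expr, k):
    """Rank-k truncated SVD of a (genes x cells) matrix.

    Row i of U_k * S_k is gene i's projection onto the top-k cell-space
    directions, used here as a k-dimensional gene embedding.
    """
    centered = expr - expr.mean(axis=1, keepdims=True)  # center each gene
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * S[:k]

def pair_score(emb, tf, gene):
    """Cosine similarity of two gene embeddings (placeholder scoring rule)."""
    a, b = emb[tf], emb[gene]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
cells = 200
tf_signal = rng.normal(size=cells)
expr = np.vstack([
    tf_signal,                                       # gene 0: hypothetical TF
    0.9 * tf_signal + 0.1 * rng.normal(size=cells),  # gene 1: follows the TF
    rng.normal(size=cells),                          # gene 2: unrelated
])
emb = gene_embeddings(expr, k=2)
print(pair_score(emb, 0, 1), pair_score(emb, 0, 2))
```

Cosine similarity is symmetric, so unlike a trained pair classifier it cannot express regulatory direction.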
Affiliation(s)
- Shuo Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Yan Liu
  - School of Information Engineering, Yangzhou University, 196 West Huayang, Yangzhou, 225000, China
- Long-Chen Shen
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- He Yan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
3
Celis K, Moreno MDMM, Rajabli F, Whitehead P, Hamilton-Nelson K, Dykxhoorn DM, Nuytemans K, Wang L, Flanagan M, Weintraub S, Geula C, Gearing M, Dalgard CL, Jin F, Bennett DA, Schuck T, Pericak-Vance MA, Griswold AJ, Young JI, Vance JM. Ancestry-related differences in chromatin accessibility and gene expression of APOE ε4 are associated with Alzheimer's disease risk. Alzheimers Dement 2023; 19:3902-3915. [PMID: 37037656 PMCID: PMC10529851 DOI: 10.1002/alz.13075] [Received: 11/22/2022] [Revised: 03/03/2023] [Accepted: 03/08/2023] [Indexed: 04/12/2023]
Abstract
INTRODUCTION European local ancestry (ELA) surrounding apolipoprotein E (APOE) ε4 confers higher risk for Alzheimer's disease (AD) compared to African local ancestry (ALA). We demonstrated significantly higher APOE ε4 expression in ELA versus ALA in AD brains from APOE ε4/ε4 carriers. Chromatin accessibility differences could contribute to these expression changes. METHODS We performed single nuclei assays for transposase accessible chromatin sequencing from the frontal cortex of six ALA and six ELA AD brains, homozygous for local ancestry and APOE ε4. RESULTS Our results showed an increased chromatin accessibility at the APOE ε4 promoter area in ELA versus ALA astrocytes. This increased accessibility in ELA astrocytes extended genome wide. Genes with increased accessibility in ELA in astrocytes were enriched for synapsis, cholesterol processing, and astrocyte reactivity. DISCUSSION Our results suggest that increased chromatin accessibility of APOE ε4 in ELA astrocytes contributes to the observed elevated APOE ε4 expression, corresponding to the increased AD risk in ELA versus ALA APOE ε4/ε4 carriers.
Affiliation(s)
- Katrina Celis
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
- Maria DM. Muniz Moreno
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
- Farid Rajabli
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
- Patrice Whitehead
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
- Kara Hamilton-Nelson
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
- Derek M. Dykxhoorn
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
- Karen Nuytemans
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
- Liyong Wang
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
- Margaret Flanagan
  - Northwestern ADC Neuropathology Core, Northwestern University Feinberg School of Medicine, Chicago, IL, USA, 60611
- Sandra Weintraub
  - Northwestern ADC Neuropathology Core, Northwestern University Feinberg School of Medicine, Chicago, IL, USA, 60611
- Changiz Geula
  - Northwestern ADC Neuropathology Core, Northwestern University Feinberg School of Medicine, Chicago, IL, USA, 60611
- Marla Gearing
  - Goizueta Alzheimer’s Disease Research Center, Emory University, Atlanta, GA, USA, 15213
- Clifton L. Dalgard
  - The American Genome Center, Uniformed Services University, Bethesda, MD, USA, 20814
  - Collaborative Health Initiative Research Program, Henry Jackson Foundation, Bethesda, MD, USA, 20817
  - Department of Anatomy Physiology & Genetics, Uniformed Services University, Bethesda, MD, USA, 20814
- Fulai Jin
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA, 44106
- David A. Bennett
  - Department of Neurological Sciences, Rush University, Chicago, IL, USA, 60612
- Theresa Schuck
  - The Department of Pathology and Laboratory Medicine, Institute on Aging and Center for Neurodegenerative Disease Research, The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA, 19104
- Margaret A. Pericak-Vance
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
- Anthony J. Griswold
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
- Juan I. Young
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
- Jeffery M. Vance
  - John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL, USA, 33136
  - Dr. John T Macdonald Foundation Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA, 33136
4
Yu C, Wang J. Data mining and mathematical models in cancer prognosis and prediction. Medical Review (Berlin, Germany) 2022; 2:285-307. [PMID: 37724193 PMCID: PMC10388766 DOI: 10.1515/mr-2021-0026] [Received: 09/27/2021] [Accepted: 12/29/2021] [Indexed: 09/20/2023]
Abstract
Cancer is a fatal and complex disease. Individual differences within the same cancer type, or within the same patient at different stages of cancer development, may require distinct treatments. Pathological differences are reflected at the tissue, cellular and gene levels. The interactions between cancer cells and their nearby microenvironments can also influence cancer progression and metastasis. It is a huge challenge to understand all of these mechanistically and quantitatively. Researchers have applied pattern recognition algorithms such as machine learning or data mining to predict cancer types or classifications. With rapidly growing and increasingly available computing power, researchers have begun to integrate huge data sets, multi-dimensional data types and information. Cells are controlled by gene expression, which is determined by promoter sequences and transcription regulators. For example, changes in gene expression through these underlying mechanisms can modify cell progression through the cell cycle. Such molecular activities can be governed by gene regulation through the underlying gene regulatory networks, which are essential for cancer study when information on gene regulation is clear and available. In this review, we briefly introduce several machine learning methods for cancer prediction and classification, including Artificial Neural Networks (ANNs), Decision Trees (DTs), Support Vector Machines (SVMs) and naive Bayes. We then describe a few typical models for building gene regulatory networks from available data, such as correlation, regression and Bayesian methods. These methods can help in cancer diagnosis, for example in assessing susceptibility, recurrence and survival. Finally, we summarize and compare the modeling methods for analyzing the development and progression of cancer through gene regulatory networks. These models can provide possible physical strategies to analyze cancer progression in a systematic and quantitative way.
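Correlation is the simplest of the network-building approaches listed above. The sketch below builds an undirected co-expression network from pairwise Pearson correlations, assuming a genes × samples expression matrix. The gene names, the toy values and the 0.8 absolute-correlation cutoff are hypothetical choices.

```python
import numpy as np

def correlation_network(expr, genes, threshold=0.8):
    """Return undirected edges (g1, g2, r) with |Pearson r| >= threshold.

    expr: array of shape (n_genes, n_samples); rows are genes.
    """
    r = np.corrcoef(expr)  # (n_genes, n_genes) Pearson correlation matrix
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= threshold:
                edges.append((genes[i], genes[j], round(float(r[i, j]), 3)))
    return edges

expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # geneA
    [2.1, 3.9, 6.2, 8.0, 9.9],   # geneB rises with geneA
    [5.0, 4.1, 2.9, 2.2, 1.0],   # geneC falls as geneA rises
    [3.0, 1.0, 4.0, 1.0, 3.0],   # geneD shows no linear trend
])
edges = correlation_network(expr, ["geneA", "geneB", "geneC", "geneD"])
print(edges)
```

Such edges are associations. They carry no direction, and an indirect path (A regulates B, B regulates C) also produces an A-C edge. Regression and Bayesian approaches are typically used to address these limitations.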
Affiliation(s)
- Chong Yu
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin, China
  - Department of Statistics, JiLin University of Finance and Economics, Changchun, Jilin Province, China
- Jin Wang
  - Department of Chemistry and of Physics and Astronomy, State University of New York, Stony Brook, NY, USA
5
He W, Tang J, Zou Q, Guo F. MMFGRN: a multi-source multi-model fusion method for gene regulatory network reconstruction. Brief Bioinform 2021; 22:6261916. [PMID: 33939795 DOI: 10.1093/bib/bbab166] [Received: 01/26/2021] [Revised: 03/08/2021] [Accepted: 04/08/2021] [Indexed: 01/05/2023]
Abstract
Many biological processes, such as the growth and differentiation of cells and the occurrence and development of diseases, are controlled by gene regulatory networks (GRNs). It is therefore important to keep concentrating on GRN research. Determining gene-gene relationships from gene expression data is a complex problem. Because it is difficult to efficiently uncover the regularity behind gene-gene relationships by relying only on biochemical experimental methods, various computational methods have been used to construct GRNs, with some success. In this paper, we propose a novel method, MMFGRN (for "Multi-source Multi-model Fusion for Gene Regulatory Network reconstruction"), to reconstruct the GRN. To make full use of the limited datasets and explore the potential regulatory relationships contained in different data types, we construct the MMFGRN model from three perspectives: a single time-series data model, a single steady-state data model, and a joint time-series and steady-state data model. We then use a weighted fusion strategy to obtain the final global regulatory link ranking. MMFGRN yields the best performance on the DREAM4 InSilico_Size10 data, outperforming other popular inference algorithms, with an overall area under the receiver operating characteristic curve (AUROC) score of 0.909 and an area under the precision-recall curve (AUPR) score of 0.770 on the 10-gene network. As the network scale increases, our method also retains certain advantages, with an overall AUPR score of 0.335 on the DREAM4 InSilico_Size100 data. These results demonstrate the good robustness of MMFGRN across networks of different scales. The integration strategy proposed in this paper also provides a new idea for reconstructing biological network models without prior knowledge, which can help researchers decipher the elusive mechanisms of life.
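The weighted fusion step described above can be sketched generically. This is one common way to fuse edge rankings, not MMFGRN's exact scheme:

1. Min-max normalise each sub-model's edge scores to [0, 1].
2. Combine the normalised scores as a weighted sum.
3. Sort edges by the fused score to produce a global ranking.

The edge names, per-model scores and weights are hypothetical placeholders. In practice the weights would be tuned, for example on held-out gold-standard edges.

```python
def normalise(scores):
    """Min-max scale a dict of edge -> score into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all scores are equal
    return {edge: (s - lo) / span for edge, s in scores.items()}

def fuse_rankings(model_scores, weights):
    """Weighted sum of normalised scores; returns edges sorted best-first."""
    fused = {}
    for scores, w in zip(model_scores, weights):
        for edge, s in normalise(scores).items():
            fused[edge] = fused.get(edge, 0.0) + w * s
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical edge scores from time-series, steady-state and joint sub-models.
time_series = {("G1", "G2"): 0.9, ("G1", "G3"): 0.2, ("G2", "G3"): 0.5}
steady      = {("G1", "G2"): 0.4, ("G1", "G3"): 0.1, ("G2", "G3"): 0.8}
joint       = {("G1", "G2"): 0.7, ("G1", "G3"): 0.3, ("G2", "G3"): 0.6}

ranking = fuse_rankings([time_series, steady, joint], weights=[0.4, 0.2, 0.4])
for edge, score in ranking:
    print(edge, round(score, 3))
```

Normalising first keeps a sub-model with a wider score range from dominating the fused ranking.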
Affiliation(s)
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
6
Zhao M, He W, Tang J, Zou Q, Guo F. A comprehensive overview and critical evaluation of gene regulatory network inference technologies. Brief Bioinform 2021; 22:6128842. [PMID: 33539514 DOI: 10.1093/bib/bbab009] [Received: 09/28/2020] [Revised: 12/11/2020] [Accepted: 01/06/2021] [Indexed: 12/12/2022]
Abstract
A gene regulatory network (GRN) is an important mechanism for maintaining life processes, controlling biochemical reactions and regulating compound levels, and it plays an important role in various organisms and systems. Reconstructing GRNs can help us understand the molecular mechanisms of organisms and reveal the essential rules of a large number of biological processes and reactions. Various outstanding network reconstruction algorithms rely on specific assumptions to deal with the uncertainty of processing, and these assumptions affect prediction accuracy. To study why a certain method is more suitable for a specific research problem or type of experimental data, we organize our analysis around model-based, information-based and machine learning-based method classes. Clearly different types of computational tools can be used to identify GRNs. We discuss several classical, representative and recent methods in each category, analyzing their core ideas, general steps and characteristics. We compare the performance of state-of-the-art GRN reconstruction technologies on simulated and real networks under different scaling conditions. Using standardized performance metrics and common benchmarks, we quantitatively evaluate the stability of various methods and the sensitivity of the same algorithm when applied to networks of different scales. The aim of this study is to identify the most appropriate method for a specific GRN, helping biologists and medical scientists discover potential drug targets and identify cancer biomarkers.
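Standardized metrics for GRN inference are commonly the area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPR). Both are computed by comparing a ranked list of predicted edges against a gold-standard network. The sketch below is dependency-free:

- AUROC uses the Mann-Whitney rank-sum identity, with tied scores given average ranks.
- AUPR is approximated by average precision, one common estimator. Other estimators, such as interpolated PR curves, give slightly different values.

The scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney rank-sum identity (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(scores, labels):
    """Mean of precision@k over the positions k of true edges (best-first order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / sum(labels)

# Hypothetical predicted edge scores and gold-standard labels (1 = true edge).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
print(auroc(scores, labels), average_precision(scores, labels))
```

AUROC here is the fraction of (true, false) edge pairs ranked correctly, 5 of 9. Average precision rewards placing true edges near the top of the list, which matters more for sparse networks.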
Affiliation(s)
- Mengyuan Zhao
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Wenying He
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jijun Tang
  - University of South Carolina, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
7
Floc'hlay S, Molina MD, Hernandez C, Haillot E, Thomas-Chollier M, Lepage T, Thieffry D. Deciphering and modelling the TGF-β signalling interplays specifying the dorsal-ventral axis of the sea urchin embryo. Development 2021; 148:dev.189944. [PMID: 33298464 DOI: 10.1242/dev.189944] [Received: 02/26/2020] [Accepted: 11/16/2020] [Indexed: 11/20/2022]
Abstract
During sea urchin development, secretion of Nodal and BMP2/4 ligands and their antagonists Lefty and Chordin from a ventral organiser region specifies the ventral and dorsal territories. This process relies on a complex interplay between the Nodal and BMP pathways through numerous regulatory circuits. To decipher the interplay between these pathways, we used a combination of treatments with recombinant Nodal and BMP2/4 proteins and a computational modelling approach. We assembled a logical model focusing on cell responses to signalling inputs along the dorsal-ventral axis, which was extended to cover ligand diffusion and enable multicellular simulations. Our model simulations accurately recapitulate gene expression in wild-type embryos, accounting for the specification of ventral ectoderm, ciliary band and dorsal ectoderm. Our model simulations further recapitulate various morphant phenotypes, reveal a dominance of the BMP pathway over the Nodal pathway and stress the crucial impact of the rate of Smad activation in dorsal-ventral patterning. These results emphasise the key role of the mutual antagonism between the Nodal and BMP2/4 pathways in driving early dorsal-ventral patterning of the sea urchin embryo.
Affiliation(s)
- Swann Floc'hlay
  - Department of Biology, Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Céline Hernandez
  - Department of Biology, Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Emmanuel Haillot
  - Institut Biologie Valrose, Université Côte d'Azur, 06108 Nice, France
- Morgane Thomas-Chollier
  - Department of Biology, Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
  - Institut Universitaire de France (IUF), 75005 Paris, France
- Thierry Lepage
  - Institut Biologie Valrose, Université Côte d'Azur, 06108 Nice, France
- Denis Thieffry
  - Department of Biology, Institut de Biologie de l'ENS (IBENS), École Normale Supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
8
Shen Y, Lu H, Chen R, Zhu L, Song G. MicroRNA-29c affects zebrafish cardiac development via targeting Wnt4. Mol Med Rep 2020; 22:4675-4684. [PMID: 33173954 PMCID: PMC7646856 DOI: 10.3892/mmr.2020.11584] [Received: 03/02/2020] [Accepted: 09/18/2020] [Indexed: 01/07/2023]
Abstract
As a single cardiac malformation, ventricular septal defect (VSD) is the most common form of congenital heart disease. However, the precise molecular mechanisms underlying VSD are not completely understood. Numerous microRNAs (miRs/miRNAs) are associated with ventricular septal defects. miR-29c inhibits the proliferation and promotes the apoptosis and differentiation of P19 embryonal carcinoma cells, possibly by suppressing Wnt4 signaling. However, to the best of our knowledge, no in vivo studies have been published to determine whether overexpression of miR-29c leads to developmental abnormalities. The present study was designed to observe the effect of miR-29c on cardiac development and its possible mechanism in vivo. Zebrafish embryos were microinjected with different doses (1, 1.6 and 2 µmol) of miR-29c mimics or with negative controls, and hatchability, mortality and cardiac malformation were subsequently observed. The results showed that in zebrafish embryos, miR-29c overexpression attenuated heart development in a dose-dependent manner, manifested by heart rate slowdown, pericardial edema and heart looping disorder. Further experiments showed that overexpression of miR-29c was associated with the Wnt4/β-catenin signaling pathway in regulating zebrafish embryonic heart development. In conclusion, the present results demonstrated that miR-29c regulated the lateral development and cardiac circulation of the zebrafish embryo by targeting Wnt4.
Affiliation(s)
- Yahui Shen
  - Department of Respiratory and Critical Care Medicine, Taizhou People's Hospital, Taizhou, Jiangsu 225300, P.R. China
- Huiyu Lu
  - Department of Respiratory and Critical Care Medicine, Taizhou People's Hospital, Taizhou, Jiangsu 225300, P.R. China
- Rong Chen
  - Department of Respiratory and Critical Care Medicine, Taizhou People's Hospital, Taizhou, Jiangsu 225300, P.R. China
- Li Zhu
  - Department of Cardiology, Taizhou People's Hospital, Taizhou, Jiangsu 225300, P.R. China
- Guixian Song
  - Department of Cardiology, Taizhou People's Hospital, Taizhou, Jiangsu 225300, P.R. China
9
Overton IM, Sims AH, Owen JA, Heale BSE, Ford MJ, Lubbock ALR, Pairo-Castineira E, Essafi A. Functional Transcription Factor Target Networks Illuminate Control of Epithelial Remodelling. Cancers (Basel) 2020; 12:cancers12102823. [PMID: 33007944 PMCID: PMC7652213 DOI: 10.3390/cancers12102823] [Received: 08/09/2020] [Revised: 09/16/2020] [Accepted: 09/24/2020] [Indexed: 12/15/2022]
Abstract
Cell identity is governed by gene expression, regulated by transcription factor (TF) binding at cis-regulatory modules. Decoding the relationship between TF binding patterns and gene regulation is nontrivial, remaining a fundamental limitation in understanding cell decision-making. We developed the NetNC software to predict functionally active regulation of TF targets; demonstrated on nine datasets for the TFs Snail, Twist, and modENCODE Highly Occupied Target (HOT) regions. Snail and Twist are canonical drivers of epithelial to mesenchymal transition (EMT), a cell programme important in development, tumour progression and fibrosis. Predicted "neutral" (non-functional) TF binding always accounted for the majority (50% to 95%) of candidate target genes from statistically significant peaks and HOT regions had higher functional binding than most of the Snail and Twist datasets examined. Our results illuminated conserved gene networks that control epithelial plasticity in development and disease. We identified new gene functions and network modules including crosstalk with notch signalling and regulation of chromatin organisation, evidencing networks that reshape Waddington's epigenetic landscape during epithelial remodelling. Expression of orthologous functional TF targets discriminated breast cancer molecular subtypes and predicted novel tumour biology, with implications for precision medicine. Predicted invasion roles were validated using a tractable cell model, supporting our approach.
Affiliation(s)
- Ian M. Overton
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
  - Department of Systems Biology, Harvard University, Boston, MA 02115, USA
  - Centre for Synthetic and Systems Biology (SynthSys), University of Edinburgh, Edinburgh EH9 3BF, UK
  - Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast BT9 7AE, UK
- Andrew H. Sims
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Jeremy A. Owen
  - Department of Systems Biology, Harvard University, Boston, MA 02115, USA
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Bret S. E. Heale
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Matthew J. Ford
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Alexander L. R. Lubbock
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Erola Pairo-Castineira
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Abdelkader Essafi
  - MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
10
Réda C, Wilczyński B. Automated inference of gene regulatory networks using explicit regulatory modules. J Theor Biol 2020; 486:110091. [PMID: 31790679 DOI: 10.1016/j.jtbi.2019.110091] [Received: 07/05/2018] [Revised: 08/09/2019] [Accepted: 11/21/2019] [Indexed: 11/25/2022]
Abstract
Gene regulatory networks are a popular tool for modelling important biological phenomena, such as cell differentiation or oncogenesis. Efficient identification of the causal connections between genes, their products and regulating transcription factors is key to understanding how defects in their function may trigger diseases. Modelling approaches should keep up with the ever more detailed descriptions of the biological phenomena at play, as provided by new experimental findings and technical improvements. In recent years, we have seen great improvements in mapping the specific binding sites of many transcription factors to distinct regulatory regions. Recent gene regulatory network models use binding measurements, but usually only to define gene-to-gene interactions, ignoring regulatory module structure. Moreover, building models manually is tedious and time-consuming, given the current huge amount of transcriptomic data and the need to explore every possible cis-regulatory arrangement that can lead to the same transcriptomic response. In our paper, we propose a method to specify possible regulatory connections in a given Boolean network, based on transcription factor binding evidence. This is implemented by an algorithm that expands a regular Boolean network model into a "cis-regulatory" Boolean network model. This expanded model explicitly defines regulatory regions as additional nodes in the network and adds new, valuable biological insights into the system dynamics. The expanded model can automatically be compared with expression data, and for each node a regulatory function consistent with the experimental data can be found. The resulting models are usually more constrained (by biologically motivated metadata) and can then be inspected in in silico simulations. The fully automated method for model identification has been implemented in Python, and the expansion algorithm in R.
The method relies on the Z3 Satisfiability Modulo Theories (SMT) solver and is similar to the RE:IN application (Yordanov et al., 2016). It is available at https://github.com/regulomics/expansion-network.
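A toy example can make the expansion idea concrete. In the sketch below, a gene is driven by explicit regulatory-module nodes rather than directly by its transcription factors, and all nodes are updated synchronously until a fixed point. This is not the authors' Python/R implementation and does not use the Z3 solver. The node names, the AND logic inside module M1 and the OR combination of modules at gene G are hypothetical assumptions.

```python
from typing import Callable, Dict

State = Dict[str, bool]

# Hypothetical expanded network: TFs X and Y bind module M1, TF Z binds module M2;
# gene G is active if either module is active.
rules: Dict[str, Callable[[State], bool]] = {
    "X": lambda s: s["X"],              # input TFs are held constant
    "Y": lambda s: s["Y"],
    "Z": lambda s: s["Z"],
    "M1": lambda s: s["X"] and s["Y"],  # module active only if both TFs bind
    "M2": lambda s: s["Z"],
    "G": lambda s: s["M1"] or s["M2"],  # gene integrates its modules
}

def step(state: State) -> State:
    """Synchronous update: every node reads the previous state."""
    return {node: rule(state) for node, rule in rules.items()}

def run(state: State, max_steps: int = 10) -> State:
    """Iterate until a fixed point is reached (or max_steps elapse)."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            break
        state = nxt
    return state

start = {"X": True, "Y": True, "Z": False, "M1": False, "M2": False, "G": False}
final = run(start)
print(final)
```

Making modules explicit nodes lets binding evidence constrain which TFs may feed each module. In the method described above, checking the expanded model against expression data is delegated to an SMT solver rather than simulated exhaustively.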
Affiliation(s)
- Clémence Réda
  - École Normale Supérieure Paris-Saclay, 61 avenue du Président Wilson, 94230 Cachan, France.
- Bartek Wilczyński
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ulica Stefana Banacha 2, Warsaw 02-097, Poland
11
Peng PC, Khoueiry P, Girardot C, Reddington JP, Garfield DA, Furlong EEM, Sinha S. The Role of Chromatin Accessibility in cis-Regulatory Evolution. Genome Biol Evol 2020; 11:1813-1828. [PMID: 31114856 PMCID: PMC6601868 DOI: 10.1093/gbe/evz103] [Accepted: 05/13/2019] [Indexed: 02/07/2023]
Abstract
Transcription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well recorded, its role in evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined its relationship to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that accessibility changes can in some cases, for example, for the master regulator Twist and for earlier developmental stages, more accurately predict binding change than is possible using TF-binding motif changes between orthologous enhancers. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.
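The observation that accessibility and motif changes are complementary can be illustrated with a generic least-squares comparison. The sketch fits binding change on each predictor alone and on both together, then compares R². It runs on synthetic data and is not the authors' multivariable models. The variable names and generating coefficients are assumptions. In-sample R² of a nested least-squares model can never fall below that of its sub-model, so a real comparison should use held-out data.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

rng = np.random.default_rng(2)
n = 300
accessibility_change = rng.normal(size=n)  # hypothetical per-enhancer predictor
motif_change = rng.normal(size=n)          # hypothetical per-enhancer predictor
# Toy assumption: binding change depends on both predictors, plus noise.
binding_change = 0.8 * accessibility_change + 0.6 * motif_change + 0.5 * rng.normal(size=n)

r2_acc = r_squared(accessibility_change[:, None], binding_change)
r2_motif = r_squared(motif_change[:, None], binding_change)
r2_both = r_squared(np.column_stack([accessibility_change, motif_change]), binding_change)
print(round(r2_acc, 2), round(r2_motif, 2), round(r2_both, 2))
```

Here the two predictors are independent by construction, so each explains a separate share of the variance. Correlated predictors would shrink the gain from combining them.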
Affiliation(s)
- Pei-Chen Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign; Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA
- Pierre Khoueiry
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; American University of Beirut (AUB), Department of Biochemistry and Molecular Genetics, Beirut, Lebanon
- Charles Girardot
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- James P Reddington
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- David A Garfield
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany; IRI-Life Sciences, Humboldt Universität zu Berlin, Berlin, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign
12
Zaborowski R, Wilczyński B. BPscore: An Effective Metric for Meaningful Comparisons of Structural Chromosome Segmentations. J Comput Biol 2019; 26:305-314. [PMID: 30810370 DOI: 10.1089/cmb.2018.0162] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
Studying the three-dimensional structure of chromosomes is an emerging field that has flourished in recent years because of the rapid development of experimental approaches for studying chromosomal contacts. This has led to numerous studies that segment the chromosome sequences of different species into so-called topologically associating domains (TADs). As the number of such studies grows steadily and many of them make claims about perceived differences between TAD structures observed in different conditions, there is a growing need for good measures of similarity (or dissimilarity) between such segmentations. We provide here a bipartite (BP) score, a relatively simple distance metric based on the bipartite matching between two segmentations. In this article, we explain the rationale for choosing this specific function and show its results on several different data sets, both simulated and experimental. We show not only that the BP score is a proper metric satisfying the triangle inequality, but also that it provides good granularity of scores for typical situations occurring between different TAD segmentations. We also introduce a local variant of the BP metric and show that in actual comparisons between experimental data sets, the local BP score correlates with the observed changes in gene expression and genome methylation. In summary, we consider the BP score a good foundation for analyzing the dynamics of chromosome structures. The methodology we present could be used by many researchers in their ongoing analyses and could become a popular and useful tool.
Affiliation(s)
- Rafał Zaborowski
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Bartek Wilczyński
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
13
Korona D, Koestler SA, Russell S. Engineering the Drosophila Genome for Developmental Biology. J Dev Biol 2017; 5:jdb5040016. [PMID: 29615571 PMCID: PMC5831791 DOI: 10.3390/jdb5040016] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/30/2017] [Revised: 12/07/2017] [Accepted: 12/08/2017] [Indexed: 02/07/2023]
Abstract
The recent development of transposon and CRISPR-Cas9-based tools for manipulating the fly genome in vivo promises tremendous progress in our ability to study developmental processes. Tools for introducing tags into genes at their endogenous genomic loci facilitate imaging or biochemistry approaches at the cellular or subcellular levels. Similarly, the ability to make specific alterations to the genome sequence allows much more precise genetic control to address questions of gene function.
Affiliation(s)
- Dagmara Korona
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Stefan A Koestler
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Steven Russell
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
14
Herman-Izycka J, Wlasnowolski M, Wilczynski B. Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers. BMC Med Genomics 2017; 10:34. [PMID: 28589862 PMCID: PMC5461523 DOI: 10.1186/s12920-017-0264-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Abstract
BACKGROUND Many genetic diseases are caused by mutations in non-coding regions of the genome. These mutations are frequently found in enhancer sequences, where they disrupt the regulatory program of the cell. Enhancers are short regulatory sequences in the non-coding part of the genome that are essential for the proper regulation of transcription. While experimental methods for identifying such sequences improve every year, our understanding of the rules behind enhancer activity has not progressed much in the last decade. This is especially true for tissue-specific enhancers, where there are clear problems in predicting the specificity of enhancer activity. RESULTS We present a random-forest-based machine learning approach capable of matching the performance of current state-of-the-art methods for enhancer prediction. We then show that, like other published methods, it frequently cross-predicts enhancers as active in different tissues, making it less useful for predicting tissue-specific activity. Next, we show that this problem is related to a bias of enhancer-prediction models towards predicting gene promoters as active enhancers. Finally, we show that using a two-step classifier can lead to lower cross-prediction between tissues. CONCLUSIONS We provide whole-genome predictions of human heart and brain enhancers obtained with the two-step classifier.
Affiliation(s)
- Julia Herman-Izycka
- Institute of Informatics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland
- Michal Wlasnowolski
- Institute of Informatics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland
- Bartek Wilczynski
- Institute of Informatics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland
15
Ghodsi Z, Huang X, Hassani H. Causality analysis detects the regulatory role of maternal effect genes in the early Drosophila embryo. GENOMICS DATA 2017; 11:20-38. [PMID: 27924281 PMCID: PMC5129166 DOI: 10.1016/j.gdata.2016.11.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/01/2016] [Revised: 10/28/2016] [Accepted: 11/10/2016] [Indexed: 11/28/2022]
Abstract
In developmental studies, inferring the regulatory interactions of the segmentation gene network plays a vital role in unveiling the mechanism of pattern formation. As such, there is a clear demand for theoretical developments and new mathematical models that can give a more accurate picture of this genetic network. Accordingly, this paper seeks to extract the meaningful regulatory role of the maternal effect genes using a variety of causality detection techniques and to explore whether these methods can offer a new analytical view of gene regulatory networks. We evaluate three powerful and widely used techniques, time-domain Granger causality, frequency-domain Granger causality, and convergent cross mapping, and thoroughly assess the results for statistical significance. Our findings show that the regulatory role of maternal effect genes is detectable in different time classes, and thus that the method is applicable to inferring the possible regulatory interactions among the other genes of this network.
Affiliation(s)
- Zara Ghodsi
- Statistical Research Centre, Bournemouth University, 89 Holdenhurst Road, Bournemouth BH8 8EB, UK; Translational Genetics Group, Bournemouth University, Fern Barrow, Poole BH125BB, UK
- Xu Huang
- Statistical Research Centre, Bournemouth University, 89 Holdenhurst Road, Bournemouth BH8 8EB, UK
- Hossein Hassani
- Institute for International Energy Studies (IIES), Tehran 1967743 711, Iran
16
Fuxman Bass JI, Pons C, Kozlowski L, Reece-Hoyes JS, Shrestha S, Holdorf AD, Mori A, Myers CL, Walhout AJ. A gene-centered C. elegans protein-DNA interaction network provides a framework for functional predictions. Mol Syst Biol 2016; 12:884. [PMID: 27777270 PMCID: PMC5081483 DOI: 10.15252/msb.20167131] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/09/2022]
Abstract
Transcription factors (TFs) play a central role in controlling spatiotemporal gene expression and the response to environmental cues. A comprehensive understanding of gene regulation requires integrating physical protein–DNA interactions (PDIs) with TF regulatory activity, expression patterns, and phenotypic data. Although great progress has been made in mapping PDIs using chromatin immunoprecipitation, these studies have only characterized ~10% of TFs in any metazoan species. The nematode C. elegans has been widely used to study gene regulation due to its compact genome with short regulatory sequences. Here, we delineated the largest gene‐centered metazoan PDI network to date by examining interactions between 90% of C. elegans TFs and 15% of gene promoters. We used this network as a backbone to predict TF binding sites for 77 TFs, two‐thirds of which are novel, as well as integrate gene expression, protein–protein interaction, and phenotypic data to predict regulatory and biological functions for multiple genes and TFs.
Affiliation(s)
- Juan I Fuxman Bass
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Carles Pons
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, MN, USA
- Lucie Kozlowski
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- John S Reece-Hoyes
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Shaleen Shrestha
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Amy D Holdorf
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Akihiro Mori
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, Minneapolis, MN, USA
- Albertha Jm Walhout
- Program in Systems Biology and Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
17
Maksimov DA, Laktionov PP, Belyakin SN. Data analysis algorithm for DamID-seq profiling of chromatin proteins in Drosophila melanogaster. Chromosome Res 2016; 24:481-494. [DOI: 10.1007/s10577-016-9538-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/10/2016] [Revised: 09/25/2016] [Accepted: 09/26/2016] [Indexed: 11/29/2022]
18
Bednarz P, Wilczyński B. Supervised learning method for predicting chromatin boundary associated insulator elements. J Bioinform Comput Biol 2015; 12:1442006. [PMID: 25385081 DOI: 10.1142/s0219720014420062] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
In eukaryotic cells, DNA is densely packed inside the nucleus in the form of a DNA-protein complex called chromatin. Since the actual conformation of the chromatin fiber defines the possible regulatory interactions between genes and their regulatory elements, it is very important to understand the mechanisms governing chromatin folding. In this paper, we show that supervised methods for predicting chromatin boundary elements are much more effective than the currently popular unsupervised methods. Using boundary locations from published Hi-C experiments and modENCODE tracks as features, we can distinguish insulator elements from randomly selected background sequences with great accuracy. In addition to accurately predicting the training boundary elements, our classifiers make new predictions, many of which correspond to the locations of known insulator elements. The key features used for predicting boundary elements do not depend on the prediction method. Because of its minuscule size, chromatin state cannot be measured directly; we need to rely on indirect measurements, such as ChIP-Seq, and fill in the gaps with computational models. Our results show that, at least in model organisms where many measurements including ChIP-Seq and Hi-C are available, we can currently make accurate predictions of insulator positions.
Affiliation(s)
- Paweł Bednarz
- Institute of Informatics, Warsaw University, Banacha 2, Warsaw 02-089, Poland
19
de Taffin M, Carrier Y, Dubois L, Bataillé L, Painset A, Le Gras S, Jost B, Crozatier M, Vincent A. Genome-Wide Mapping of Collier In Vivo Binding Sites Highlights Its Hierarchical Position in Different Transcription Regulatory Networks. PLoS One 2015. [PMID: 26204530 PMCID: PMC4512700 DOI: 10.1371/journal.pone.0133387] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Collier, the single Drosophila COE (Collier/EBF/Olf-1) transcription factor, is required in several developmental processes, including head patterning and specification of muscle and neuron identity during embryogenesis. To identify direct Collier (Col) targets in different cell types, we used ChIP-seq to map Col binding sites throughout the genome at mid-embryogenesis. In vivo Col binding peaks were associated with 415 potential direct target genes. Gene Ontology analysis revealed a strong enrichment in proteins with DNA binding and/or transcription-regulatory properties. Characterization of a selection of candidates, using transgenic CRM-reporter assays, identified direct Col targets in dorso-lateral somatic muscles and specific neuron types in the central nervous system. These data provided new evidence that Col's direct control of the expression of the transcription regulators apterous and eyes-absent (eya) is critical to specifying neuronal identities. They also showed that cross-regulation between col and eya in muscle progenitor cells is required for specification of muscle identity, revealing a new parallel between the myogenic regulatory networks operating in Drosophila and vertebrates. Col regulation of eya, both in specific muscle and neuronal lineages, may illustrate one mechanism behind the evolutionary diversification of Col biological roles.
Affiliation(s)
- Mathilde de Taffin
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
- Yannick Carrier
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
- Laurence Dubois
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
- Laetitia Bataillé
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
- Anaïs Painset
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
- Plate-forme bio-informatique Genotoul/MIA-T, INRA, Borde Rouge, 31326, Castanet-Tolosan, France
- Stéphanie Le Gras
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/Université de Strasbourg, 67404, Illkirch, France
- Bernard Jost
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/Université de Strasbourg, 67404, Illkirch, France
- Michèle Crozatier
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
- Alain Vincent
- Centre de Biologie du Développement, UMR 5547 CNRS Université de Toulouse 3, 118 route de Narbonne, F-31062, Toulouse cedex 09, France
20
Chekouo T, Stingo FC, Doecke JD, Do KA. miRNA-target gene regulatory networks: A Bayesian integrative approach to biomarker selection with application to kidney cancer. Biometrics 2015; 71:428-38. [PMID: 25639276 DOI: 10.1111/biom.12266] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/01/2013] [Revised: 09/01/2014] [Accepted: 10/01/2014] [Indexed: 11/30/2022]
Abstract
The availability of cross-platform, large-scale genomic data has enabled the investigation of complex biological relationships for many cancers. Identification of reliable cancer-related biomarkers requires the characterization of multiple interactions across complex genetic networks. MicroRNAs are small non-coding RNAs that regulate gene expression; however, the direct relationship between a microRNA and its target gene is difficult to measure. We propose a novel Bayesian model to identify microRNAs and their target genes that are associated with survival time by incorporating the microRNA regulatory network through prior distributions. We assume that biomarkers involved in regulatory networks are likely associated with survival time. We employ non-local prior distributions and a stochastic search method for the selection of biomarkers associated with the survival outcome. We use KEGG pathway information to incorporate correlated gene effects within regulatory networks. Using simulation studies, we assess the performance of our method, and apply it to experimental data of kidney renal cell carcinoma (KIRC) obtained from The Cancer Genome Atlas. Our novel method validates previously identified cancer biomarkers and identifies biomarkers specific to KIRC progression that were not previously discovered. Using the KIRC data, we confirm that biomarkers involved in regulatory networks are more likely to be associated with survival time, showing connections in one regulatory network for five out of six such genes we identified.
Affiliation(s)
- Thierry Chekouo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 1400 Pressler Street, Unit 1411, Texas, 77030-3722, U.S.A
- Francesco C Stingo
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 1400 Pressler Street, Unit 1411, Texas, 77030-3722, U.S.A
- James D Doecke
- CSIRO Computational Informatics/Australian e-Health Research Centre Level 5, UQ Health Sciences Building, 901/16 Royal Brisbane, Queensland, 4029, Australia
- Kim-Anh Do
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, 1400 Pressler Street, Unit 1411, Texas, 77030-3722, U.S.A
21
Yang TH, Wang CC, Hung PC, Wu WS. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 4:S8. [PMID: 25521507 PMCID: PMC4290730 DOI: 10.1186/1752-0509-8-s4-s8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Abstract
BACKGROUND Cis-regulatory modules (CRMs), the DNA sequences required for regulating gene expression, play a central role in research on transcriptional regulation in metazoan species. Today, systematic understanding of CRMs still relies mainly on computational methods because experimental methods are time-consuming and small in scale. However, the accuracy and reliability of different CRM prediction tools are still unclear. Without comparative cross-analysis of the results and combined consideration of additional experimental information, there is no easy way to assess the confidence of predicted CRMs. This limits the genome-wide understanding of CRMs. DESCRIPTION Transcription factor binding and epigenetic profiles are known to help determine the functions of CRMs in gene transcriptional regulation. Thus, integrating genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate the prediction confidence and decipher the possible transcriptional regulatory functions of these potential CRMs. However, these data are still fragmented across the literature. Here we performed a computational genome-wide screen for potential CRMs using different prediction tools and constructed a pioneering database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects literature-curated TFBS location data and nine types of epigenetic data for assessing the confidence of these potential CRMs and deciphering their possible functions. CONCLUSIONS cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally identified epigenetic profiles. The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable hypotheses for the molecular mechanisms of metazoan gene regulation. We believe that the information deposited in cisMEP will greatly facilitate the comparative use of different CRM prediction tools and will help biologists study the modular regulatory mechanisms between different TFs and their target genes.
22
Zhu JG, Shen YH, Liu HL, Liu M, Shen YQ, Kong XQ, Song GX, Qian LM. Long noncoding RNAs expression profile of the developing mouse heart. J Cell Biochem 2014; 115:910-8. [PMID: 24375461 DOI: 10.1002/jcb.24733] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 09/26/2013] [Accepted: 12/04/2013] [Indexed: 12/11/2022]
Abstract
Long noncoding RNAs (lncRNAs) are a subgroup of noncoding RNAs longer than 200 nucleotides. The characterization of lncRNAs and their acceptance as crucial regulators of numerous developmental and biological pathways have made lncRNA research one of the hot topics in RNA biology. Many lncRNAs show spatially and temporally restricted expression patterns during embryogenesis and organogenesis. This study aimed to characterize the lncRNA profile of the fetal mouse heart at three key time points in its development (embryonic days E11.5, E14.5, and E18.5) by performing a microarray lncRNA screen. Gene Ontology analysis and Ingenuity Pathway Analysis showed that several gene functions and pathways were significantly altered during heart development. We compared lncRNA profiles between the three time points (E14.5 vs. E11.5 [early development]; E18.5 vs. E14.5 [later development]). A total of 1,237 lncRNAs were found to have consistent fold changes (>2.0) between the three time points. Among them, 20 dysregulated lncRNAs were randomly selected and confirmed by real-time qRT-PCR. Additionally, bioinformatics analysis of AK011347 suggested that it may be involved in heart development through the target gene Map3k7. In summary, this study identified differentially expressed lncRNAs at the three time points studied, and these lncRNAs may provide new clues to the mechanism of normal heart development.
Affiliation(s)
- Jin Gai Zhu
- Department of Pediatrics, Nanjing Maternity and Child Health Care Hospital Affiliated to Nanjing Medical University, Nanjing, 210029, People's Republic of China
23
Benso A, Di Carlo S, Politano G, Savino A, Vasciaveo A. An extended gene protein/products Boolean network model including post-transcriptional regulation. Theor Biol Med Model 2014; 11 Suppl 1:S5. [PMID: 25080304 PMCID: PMC4108923 DOI: 10.1186/1742-4682-11-s1-s5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
Abstract
Background Network biology allows the study of complex interactions between biological systems using formal, well-structured, and computationally friendly models. Several different network models can be created, depending on the type of interactions to be investigated. Gene Regulatory Networks (GRNs) are an effective model commonly used to study the complex regulatory mechanisms of a cell. Unfortunately, given their intrinsic complexity and non-discrete nature, the computational study of realistic-sized GRNs requires some abstractions. Boolean Networks (BNs), for example, are a reliable model for representing networks in which the state of each node is a Boolean value (0 or 1). Despite this strong simplification, BNs have been used to study both structural and dynamic properties of real as well as randomly generated GRNs. Results In this paper we show how the post-transcriptional regulation mechanism (a key process mediated by small non-coding RNA molecules such as miRNAs) can be included in the BN model of a GRN. The enhanced BN model is implemented in a software toolkit (EBNT) that allows users to analyze Boolean GRNs from both a structural and a dynamic point of view. The open-source toolkit is compatible with available visualization tools such as Cytoscape and supports detailed analysis of the network topology as well as of its attractors, trajectories, and state space. In the paper, a small GRN built around the mTOR gene is used to demonstrate the main capabilities of the toolkit. Conclusions The extended model proposed in this paper opens new opportunities in the study of gene regulation. Several successful studies that used BNs to understand high-level characteristics of regulatory networks can now be extended to better understand the role of post-transcriptional regulation, for example as a network-wide noise-reduction or stabilization mechanism.
24
Laktionov PP, White-Cooper H, Maksimov DA, Belyakin SN. Transcription factor Comr acts as a direct activator in the genetic program controlling spermatogenesis in D. melanogaster. Mol Biol 2014. [DOI: 10.1134/s0026893314010087] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
25
Chai LE, Loh SK, Low ST, Mohamad MS, Deris S, Zakaria Z. A review on the computational approaches for gene regulatory network construction. Comput Biol Med 2014; 48:55-65. [PMID: 24637147 DOI: 10.1016/j.compbiomed.2014.02.011] [Citation(s) in RCA: 129] [Impact Index Per Article: 11.7] [Received: 09/16/2013] [Revised: 02/14/2014] [Accepted: 02/17/2014] [Indexed: 01/08/2023]
Abstract
Many biological research areas such as drug design require gene regulatory networks to provide clear insight and understanding of the cellular process in living cells. This is because interactions among the genes and their products play an important role in many molecular processes. A gene regulatory network can act as a blueprint for the researchers to observe the relationships among genes. Due to its importance, several computational approaches have been proposed to infer gene regulatory networks from gene expression data. In this review, six inference approaches are discussed: Boolean network, probabilistic Boolean network, ordinary differential equation, neural network, Bayesian network, and dynamic Bayesian network. These approaches are discussed in terms of introduction, methodology and recent applications of these approaches in gene regulatory network construction. These approaches are also compared in the discussion section. Furthermore, the strengths and weaknesses of these computational approaches are described.
Affiliation(s)
- Lian En Chai
- Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310 Johor, Malaysia
- Swee Kuan Loh
- Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310 Johor, Malaysia
- Swee Thing Low
- Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310 Johor, Malaysia
- Mohd Saberi Mohamad
- Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310 Johor, Malaysia
- Safaai Deris
- Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310 Johor, Malaysia
- Zalmiyah Zakaria
- Artificial Intelligence and Bioinformatics Research Group, Faculty of Computing, Universiti Teknologi Malaysia, Skudai, 81310 Johor, Malaysia
26
Podsiadło A, Wrzesień M, Paja W, Rudnicki W, Wilczyński B. Active enhancer positions can be accurately predicted from chromatin marks and collective sequence motif data. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 6:S16. [PMID: 24565409 PMCID: PMC4029456 DOI: 10.1186/1752-0509-7-s6-s16] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 01/09/2023]
Abstract
BACKGROUND Transcriptional regulation in multi-cellular organisms is a complex process involving multiple modular regulatory elements for each gene. Building whole-genome models of transcriptional networks requires mapping all relevant enhancers and then linking them to target genes. Previous methods of enhancer identification based either on sequence information or on epigenetic marks have different limitations stemming from incompleteness of each of these datasets taken separately. RESULTS In this work we present a new approach for discovery of regulatory elements based on the combination of sequence motifs and epigenetic marks measured with ChIP-Seq. Our method uses supervised learning approaches to train a model describing the dependence of enhancer activity on sequence features and histone marks. Our results indicate that using combination of features provides superior results to previous approaches based on either one of the datasets. While histone modifications remain the dominant feature for accurate predictions, the models based on sequence motifs have advantages in their general applicability to different tissues. Additionally, we assess the relevance of different sequence motifs in prediction accuracy showing that even tissue-specific enhancer activity depends on multiple motifs. CONCLUSIONS Based on our results, we conclude that it is worthwhile to include sequence motif data into computational approaches to active enhancer prediction and also that classifiers trained on a specific set of enhancers can generalize with significant accuracy beyond the training set.
Affiliation(s)
- Agnieszka Podsiadło
- Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland
- Mariusz Wrzesień
- University of Information Technology and Management in Rzeszów, Sucharskiego 2, 35-225 Rzeszów, Poland
- Wiesław Paja
- University of Information Technology and Management in Rzeszów, Sucharskiego 2, 35-225 Rzeszów, Poland
- Witold Rudnicki
- Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Pawińskiego 5A, 02-106 Warsaw, Poland
- Bartek Wilczyński
- Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland

27
Meireles-Filho ACA, Bardet AF, Yáñez-Cuna JO, Stampfel G, Stark A. cis-regulatory requirements for tissue-specific programs of the circadian clock. Curr Biol 2013; 24:1-10. [PMID: 24332542 DOI: 10.1016/j.cub.2013.11.017] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 07/19/2013] [Revised: 09/24/2013] [Accepted: 11/06/2013] [Indexed: 01/04/2023]
Abstract
BACKGROUND Broadly expressed transcription factors (TFs) control tissue-specific programs of gene expression through interactions with local TF networks. A prime example is the circadian clock: although the conserved TFs CLOCK (CLK) and CYCLE (CYC) control a transcriptional circuit throughout animal bodies, rhythms in behavior and physiology are generated tissue specifically. Yet, how CLK and CYC determine tissue-specific clock programs has remained unclear. RESULTS Here, we use a functional genomics approach to determine the cis-regulatory requirements for clock specificity. We first determine CLK and CYC genome-wide binding targets in heads and bodies by ChIP-seq and show that they have distinct DNA targets in the two tissue contexts. Computational dissection of CLK/CYC context-specific binding sites reveals sequence motifs for putative partner factors, which are predictive for individual binding sites. Among them, we show that the opa and GATA motifs, differentially enriched in head and body binding sites respectively, can be bound by OPA and SERPENT (SRP). They act synergistically with CLK/CYC in the Drosophila feedback loop, suggesting that they help to determine their direct targets and therefore orchestrate tissue-specific clock outputs. In addition, using in vivo transgenic assays, we validate that GATA motifs are required for proper tissue-specific gene expression in the adult fat body, midgut, and Malpighian tubules, revealing a cis-regulatory signature for enhancers of the peripheral circadian clock. CONCLUSIONS Our results reveal how universal clock circuits can regulate tissue-specific rhythms and, more generally, provide insights into the mechanism by which universal TFs can be modulated to drive tissue-specific programs of gene expression.
Affiliation(s)
- Anaïs F Bardet
- Research Institute of Molecular Pathology (IMP), 1030 Vienna, Austria
- J Omar Yáñez-Cuna
- Research Institute of Molecular Pathology (IMP), 1030 Vienna, Austria
- Gerald Stampfel
- Research Institute of Molecular Pathology (IMP), 1030 Vienna, Austria
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), 1030 Vienna, Austria

28
Misra A, Sriram G. Network component analysis provides quantitative insights on an Arabidopsis transcription factor-gene regulatory network. BMC SYSTEMS BIOLOGY 2013; 7:126. [PMID: 24228871 PMCID: PMC3843564 DOI: 10.1186/1752-0509-7-126] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/24/2013] [Accepted: 11/05/2013] [Indexed: 01/01/2023]
Abstract
Background Gene regulatory networks (GRNs) are models of molecule-gene interactions instrumental in the coordination of gene expression. Transcription factor (TF)-GRNs are an important subset of GRNs that characterize gene expression as the effect of TFs acting on their target genes. Although such networks can qualitatively summarize TF-gene interactions, it is highly desirable to quantitatively determine the strengths of the interactions in a TF-GRN as well as the magnitudes of TF activities. To our knowledge, such analysis is rare in plant biology. A computational methodology developed for this purpose is network component analysis (NCA), which has been used for studying large-scale microbial TF-GRNs to obtain nontrivial, mechanistic insights. In this work, we employed NCA to quantitatively analyze a plant TF-GRN important in floral development using available regulatory information from AGRIS, by processing previously reported gene expression data from four shoot apical meristem cell types. Results The NCA model satisfactorily accounted for gene expression measurements in a TF-GRN of seven TFs (LFY, AG, SEPALLATA3 [SEP3], AP2, AGL15, HY5 and AP3/PI) and 55 genes. NCA found strong interactions between certain TF-gene pairs including LFY → MYB17, AG → CRC, AP2 → RD20, AGL15 → RAV2 and HY5 → HLH1, and the direction of the interaction (activation or repression) for some AGL15 targets for which this information was not previously available. The activity trends of four TFs - LFY, AG, HY5 and AP3/PI as deduced by NCA correlated well with the changes in expression levels of the genes encoding these TFs across all four cell types; such a correlation was not observed for SEP3, AP2 and AGL15. Conclusions For the first time, we have reported the use of NCA to quantitatively analyze a plant TF-GRN important in floral development for obtaining nontrivial information about connectivity strengths between TFs and their target genes as well as TF activity. 
However, since NCA relies on documented connectivity information about the underlying TF-GRN, it is currently limited in its application to larger plant networks because of the lack of documented connectivities. In the future, the identification of interactions between plant TFs and their target genes on a genome scale would allow the use of NCA to provide quantitative regulatory information about plant TF-GRNs, leading to improved insights on cellular regulatory programs.
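The NCA model summarized in this abstract can be illustrated with a small sketch. Assumptions (not taken from the paper): expression data form a genes × samples matrix E modeled as E ≈ A·P, where A holds TF-gene connection strengths whose zero pattern is fixed by documented connectivity (the role AGRIS plays above) and P holds TF activities per sample; the two factors are fitted here by plain alternating least squares, which is not necessarily the solver Misra and Sriram used. All function names and the toy numbers are hypothetical.

```python
# Hypothetical sketch of an NCA-style fit: E ≈ A @ P, where the zero pattern of A
# is fixed by prior TF-gene connectivity. Fitted by alternating least squares
# (an assumption; not necessarily the solver used in the paper). Pure Python.


def solve(M, b):
    """Solve the square linear system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def lstsq(X, y, ridge=1e-12):
    """Minimize ||X w - y||^2 via the normal equations (tiny ridge for numerical safety)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def residual(E, A, P):
    """Squared Frobenius norm of E - A @ P."""
    return sum((E[i][j] - sum(A[i][l] * P[l][j] for l in range(len(P)))) ** 2
               for i in range(len(E)) for j in range(len(E[0])))


def nca_fit(E, mask, n_iter=100):
    """E: genes x samples (e.g. log-ratio expression); mask[i][l] is True if TF l is
    documented to regulate gene i. Returns (A, P, history of squared residuals)."""
    n_genes, n_samples, n_tfs = len(E), len(E[0]), len(mask[0])
    # Start with unit strengths on documented edges; undocumented edges stay 0 throughout.
    A = [[1.0 if mask[i][l] else 0.0 for l in range(n_tfs)] for i in range(n_genes)]
    P = [[0.0] * n_samples for _ in range(n_tfs)]
    history = []
    for _ in range(n_iter):
        # P-step: with A fixed, each sample's TF activities are a least-squares fit.
        for j in range(n_samples):
            p = lstsq(A, [E[i][j] for i in range(n_genes)])
            for l in range(n_tfs):
                P[l][j] = p[l]
        # A-step: with P fixed, refit each gene's row using only its documented TFs.
        for i in range(n_genes):
            active = [l for l in range(n_tfs) if mask[i][l]]
            if not active:
                continue
            X = [[P[l][j] for l in active] for j in range(n_samples)]
            for l, v in zip(active, lstsq(X, E[i])):
                A[i][l] = v
        history.append(residual(E, A, P))
    return A, P, history


# Toy example: 3 genes, 2 TFs, 4 samples (made-up numbers for illustration only).
E = [[1.0, 2.0, 3.0, 4.0], [2.0, 0.0, 2.0, 0.0], [2.0, 2.0, 4.0, 4.0]]
mask = [[True, False], [False, True], [True, True]]
A, P, history = nca_fit(E, mask, n_iter=200)
print("final squared residual:", history[-1])
```

Because A and P are only recoverable up to a per-TF rescaling (multiplying a column of A by any nonzero c and the matching row of P by 1/c, including a sign flip, leaves A·P unchanged), a sketch like this needs a normalization convention before raw strengths or signs are interpreted; the reconstruction error is the scale-free quantity.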
Affiliation(s)
- Ganesh Sriram
- Department of Chemical and Biomolecular Engineering, University of Maryland, College Park, MD 20742, USA

29
Song G, Shen Y, Zhu J, Liu H, Liu M, Shen YQ, Zhu S, Kong X, Yu Z, Qian L. Integrated analysis of dysregulated lncRNA expression in fetal cardiac tissues with ventricular septal defect. PLoS One 2013; 8:e77492. [PMID: 24147006 PMCID: PMC3797806 DOI: 10.1371/journal.pone.0077492] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 08/15/2013] [Accepted: 08/28/2013] [Indexed: 01/17/2023]
Abstract
Ventricular septal defects (VSD) are the most common form of congenital heart disease, which is the leading non-infectious cause of death in children; nevertheless, the exact cause of VSD is not yet fully understood. Long non-coding RNAs (lncRNAs) have been shown to play key roles in various biological processes, such as imprinting control, circuitry controlling pluripotency and differentiation, immune responses and chromosome dynamics. Notably, a growing number of lncRNAs have been implicated in disease etiology, although an association with VSD has not been reported. In the present study, we conducted an integrated analysis of dysregulated lncRNAs, focusing specifically on the identification and characterization of lncRNAs potentially involved in the initiation of VSD. Comparison of the transcriptome profiles of cardiac tissues from VSD-affected and normal hearts was performed using a second-generation lncRNA microarray, which covers the vast majority of expressed RefSeq transcripts (29,241 lncRNAs and 30,215 coding transcripts). In total, 880 lncRNAs were upregulated and 628 were downregulated in VSD. Furthermore, our established filtering pipeline indicated an association of two lncRNAs, ENST00000513542 and RP11-473L15.2, with VSD. This dysregulation of the lncRNA profile provides a novel insight into the etiology of VSD and, furthermore, illustrates the intricate relationship between coding and ncRNA transcripts in cardiac development. These data may offer a background/reference resource for future functional studies of lncRNAs related to VSD.
Affiliation(s)
- Guixian Song
- Department of Cardiology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, People's Republic of China

30
Ilsley GR, Fisher J, Apweiler R, DePace AH, Luscombe NM. Cellular resolution models for even skipped regulation in the entire Drosophila embryo. eLife 2013; 2:e00522. [PMID: 23930223 PMCID: PMC3736529 DOI: 10.7554/elife.00522] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 01/07/2013] [Accepted: 06/17/2013] [Indexed: 12/14/2022]
Abstract
Transcriptional control ensures genes are expressed in the right amounts at the correct times and locations. Understanding quantitatively how regulatory systems convert input signals to appropriate outputs remains a challenge. For the first time, we successfully model even skipped (eve) stripes 2 and 3+7 across the entire fly embryo at cellular resolution. A straightforward statistical relationship explains how transcription factor (TF) concentrations define eve's complex spatial expression, without the need for pairwise interactions or cross-regulatory dynamics. Simulating thousands of TF combinations, we recover known regulators and suggest new candidates. Finally, we accurately predict the intricate effects of perturbations including TF mutations and misexpression. Our approach imposes minimal assumptions about regulatory function; instead we infer underlying mechanisms from models that best fit the data, like the lack of TF-specific thresholds and the positional value of homotypic interactions. Our study provides a general and quantitative method for elucidating the regulation of diverse biological systems. DOI:http://dx.doi.org/10.7554/eLife.00522.001.
Affiliation(s)
- Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Jasmin Fisher
- Microsoft Research Cambridge, Cambridge, United Kingdom
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Rolf Apweiler
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Angela H DePace
- Department of Systems Biology, Harvard Medical School, Boston, United States
- Nicholas M Luscombe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- UCL Genetics Institute, Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
- London Research Institute, Cancer Research UK, London, United Kingdom

31

32
Dynamic interpretation of maternal inputs by the Drosophila segmentation gene network. Proc Natl Acad Sci U S A 2013; 110:6724-9. [PMID: 23580621 DOI: 10.1073/pnas.1220912110] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Indexed: 01/23/2023]
Abstract
Patterning of body parts in multicellular organisms relies on the interpretation of transcription factor (TF) concentrations by genetic networks. To determine the extent to which absolute TF concentration dictates gene expression and morphogenesis programs that ultimately lead to patterns in Drosophila embryos, we manipulate maternally supplied patterning determinants and measure readout concentration at the position of various developmental markers. When we increase the overall amount of the maternal TF Bicoid (Bcd) fivefold, Bcd concentrations in cells at positions of the cephalic furrow, an early morphological marker, differ by a factor of 2. This finding apparently contradicts the traditional threshold-dependent readout model, which predicts that the Bcd concentrations at these positions should be identical. In contrast, Bcd concentration at target gene expression boundaries is nearly unchanged early in development but adjusts dynamically toward the same twofold change as development progresses. Thus, the Drosophila segmentation gene network responds faithfully to Bcd concentration during early development, in agreement with the threshold model, but subsequently partially adapts in response to altered Bcd dosage, driving segmentation patterns toward their WT positions. This dynamic response requires other maternal regulators, such as Torso and Nanos, suggesting that integration of maternal input information is not achieved through molecular interactions at the time of readout but through the subsequent collective interplay of the network.

33
Rockel S, Geertz M, Hens K, Deplancke B, Maerkl SJ. iSLIM: a comprehensive approach to mapping and characterizing gene regulatory networks. Nucleic Acids Res 2012; 41:e52. [PMID: 23258699 PMCID: PMC3575842 DOI: 10.1093/nar/gks1323] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Mapping gene regulatory networks is a significant challenge in systems biology, yet only a few methods are currently capable of systems-level identification of transcription factors (TFs) that bind a specific regulatory element. We developed a microfluidic method for integrated systems-level interaction mapping of TF-DNA interactions, generating and interrogating an array of 423 full-length Drosophila TFs. With integrated systems-level interaction mapping, it is now possible to rapidly and quantitatively map gene regulatory networks of higher eukaryotes.
Affiliation(s)
- Sylvie Rockel
- Laboratory of Biological Network Characterization, Institute of Bioengineering, School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

34
Baitaluk M, Kozhenkov S, Ponomarenko J. An integrative approach to inferring gene regulatory module networks. PLoS One 2012; 7:e52836. [PMID: 23285197 PMCID: PMC3527610 DOI: 10.1371/journal.pone.0052836] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/06/2012] [Accepted: 11/22/2012] [Indexed: 12/31/2022]
Abstract
Background Gene regulatory networks (GRNs) provide insight into the mechanisms of differential gene expression at a system level. However, the methods for inference, functional analysis and visualization of gene regulatory modules and GRNs require the user to collect heterogeneous data from many sources using numerous bioinformatics tools. This makes the analysis expensive and time-consuming. Results In this work, the BiologicalNetworks application, a data integration and network-based research environment, was extended with tools for inference and analysis of gene regulatory modules and networks. The backend database of the application integrates public data on gene expression, pathways, transcription factor binding sites, gene and protein sequences, and functional annotations. Thus, all data essential for gene regulation analysis can be mined from public sources. In addition, the user’s data can either be integrated into the database and become public, or kept private within the application. Tools to analyze multiple gene expression experiments are also provided. Conclusion The generated modular networks, regulatory modules and binding sites can be visualized and further analyzed within the same application. The developed tools were applied to the mouse model of asthma and the OCT4 regulatory network in embryonic stem cells. The developed methods and data are available through the Java application of the BiologicalNetworks program at http://www.biologicalnetworks.org.
Affiliation(s)
- Michael Baitaluk
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Sergey Kozhenkov
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Julia Ponomarenko
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America

35
Streit A, Tambalo M, Chen J, Grocott T, Anwar M, Sosinsky A, Stern CD. Experimental approaches for gene regulatory network construction: the chick as a model system. Genesis 2012; 51:296-310. [PMID: 23174848 DOI: 10.1002/dvg.22359] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 08/31/2012] [Revised: 11/09/2012] [Accepted: 11/11/2012] [Indexed: 01/23/2023]
Abstract
Setting up the body plan during embryonic development requires the coordinated action of many signals and transcriptional regulators in a precise temporal sequence and spatial pattern. The last decades have seen an explosion of information describing the molecular control of many developmental processes. The next challenge is to integrate this information into logic "wiring diagrams" that visualize gene actions and outputs, have predictive power and point to key control nodes. Here, we provide an experimental workflow on how to construct gene regulatory networks using the chick as model system.
Affiliation(s)
- Andrea Streit
- Department of Craniofacial Development and Stem Cell Biology, King's College London, London, United Kingdom

36
Abstract
Heart function requires sophisticated regulatory networks to orchestrate organ development, physiological responses, and environmental adaptation. Until recently, it was thought that these regulatory networks are composed solely of protein-mediated transcriptional control and signaling systems; consequently, it was thought that cardiac disease involves perturbation of these systems. However, it is becoming evident that RNA, long considered to function primarily as the platform for protein production, may in fact play a major role in most, if not all, aspects of gene regulation, especially the epigenetic processes that underpin organogenesis. These include not only well-validated classes of regulatory RNAs, such as microRNAs, but also tens of thousands of long noncoding RNAs that are differentially expressed across the entire genome of humans and other animals. Here, we review this emerging landscape, summarizing what is known about their functions and their role in cardiac biology, and provide a toolkit to assist in exploring this previously hidden layer of gene regulation that may underpin heart adaptation and complex heart diseases.
Affiliation(s)
- Nicole Schonrock
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- St. Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia
- Richard P. Harvey
- Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- St. Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia
- John S. Mattick
- St. Vincent’s Clinical School, Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia
- Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia

37
Linksvayer TA, Fewell JH, Gadau J, Laubichler MD. Developmental evolution in social insects: regulatory networks from genes to societies. JOURNAL OF EXPERIMENTAL ZOOLOGY PART B-MOLECULAR AND DEVELOPMENTAL EVOLUTION 2012; 318:159-69. [PMID: 22544713 DOI: 10.1002/jez.b.22001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022]
Abstract
The evolution and development of complex phenotypes in social insect colonies, such as queen-worker dimorphism or division of labor, can, in our opinion, only be fully understood within an expanded mechanistic framework of Developmental Evolution. Conversely, social insects offer a fertile research area in which fundamental questions of Developmental Evolution can be addressed empirically. We review the concept of gene regulatory networks (GRNs) that aims to fully describe the battery of interacting genomic modules that are differentially expressed during the development of individual organisms. We discuss how distinct types of network models have been used to study different levels of biological organization in social insects, from GRNs to social networks. We propose that these hierarchical networks spanning different organizational levels from genes to societies should be integrated and incorporated into full GRN models to elucidate the evolutionary and developmental mechanisms underlying social insect phenotypes. Finally, we discuss prospects and approaches to achieve such an integration.
Affiliation(s)
- Timothy A Linksvayer
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

38

39
Analysis of Cryptococcus neoformans sexual development reveals rewiring of the pheromone-response network by a change in transcription factor identity. Genetics 2012; 191:435-49. [PMID: 22466042 DOI: 10.1534/genetics.112.138958] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
The fundamental mechanisms that control eukaryotic development include extensive regulation at the level of transcription. Gene regulatory networks, composed of transcription factors, their binding sites in DNA, and their target genes, are responsible for executing transcriptional programs. While divergence of these control networks drives species-specific gene expression that contributes to biological diversity, little is known about the mechanisms by which these networks evolve. To investigate how network evolution has occurred in fungi, we used a combination of microarray expression profiling, cis-element identification, and transcription-factor characterization during sexual development of the human fungal pathogen Cryptococcus neoformans. We first defined the major gene expression changes that occur over time throughout sexual development. Through subsequent bioinformatic and molecular genetic analyses, we identified and functionally characterized the C. neoformans pheromone-response element (PRE). We then discovered that transcriptional activation via the PRE requires direct binding of the high-mobility transcription factor Mat2, which we conclude functions as the elusive C. neoformans pheromone-response factor. This function of Mat2 distinguishes the mechanism of regulation through the PRE of C. neoformans from all other fungal systems studied to date and reveals species-specific adaptations of a fungal transcription factor that defies predictions on the basis of sequence alone. Overall, our findings reveal that pheromone-response network rewiring has occurred at the level of transcription factor identity, despite the strong conservation of upstream and downstream components, and serve as a model for how selection pressures act differently on signaling vs. gene regulatory components during eukaryotic evolution.

40
Marbach D, Roy S, Ay F, Meyer PE, Candeias R, Kahveci T, Bristow CA, Kellis M. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks. Genome Res 2012; 22:1334-49. [PMID: 22456606 PMCID: PMC3396374 DOI: 10.1101/gr.127191.111] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Indexed: 01/21/2023]
Abstract
Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level.
Affiliation(s)
- Daniel Marbach
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

41
Haye A, Albert J, Rooman M. Robust non-linear differential equation models of gene expression evolution across Drosophila development. BMC Res Notes 2012; 5:46. [PMID: 22260205 PMCID: PMC3398324 DOI: 10.1186/1756-0500-5-46] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/02/2011] [Accepted: 01/19/2012] [Indexed: 01/20/2023]
Abstract
Background This paper lies in the context of modeling the evolution of gene expression away from stationary states, for example in systems subject to external perturbations or during the development of an organism. We base our analysis on experimental data and proceed in a top-down approach, where we start from data on a system's transcriptome, and deduce rules and models from it without a priori knowledge. We focus here on a publicly available DNA microarray time series, representing the transcriptome of Drosophila across evolution from the embryonic to the adult stage. Results In the first step, genes were clustered on the basis of similarity of their expression profiles, measured by a translation-invariant and scale-invariant distance that proved appropriate for detecting transitions between development stages. Average profiles representing each cluster were computed and their time evolution was analyzed using coupled differential equations. A linear and several non-linear model structures involving a transcription and a degradation term were tested. The parameters were identified in three steps: determination of the strongest connections between genes, optimization of the parameters defining these connections, and elimination of the unnecessary parameters using various reduction schemes. Different solutions were compared on the basis of their abilities to reproduce the data, to keep realistic gene expression levels when extrapolated in time, to show the biologically expected robustness with respect to parameter variations, and to contain as few parameters as possible. Conclusions We showed that the linear model did very well in reproducing the data with few parameters, but was not sufficiently robust and yielded unrealistic values upon extrapolation in time. In contrast, the non-linear models all reached the latter two objectives, but some were unable to reproduce the data. 
A family of non-linear models, constructed from the exponential of linear combinations of expression levels, reached all the objectives. It defined networks with a mean number of connections equal to two, when restricted to the embryonic time series, and equal to five for the full time series. These networks were compared with experimental data about gene-transcription factor and protein-protein interactions. The non-uniqueness of the solutions was discussed in the context of plasticity and cluster versus single-gene networks.
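As an illustration of the "exponential of linear combinations" model family described here, the sketch below integrates a system of the assumed form dx_i/dt = exp(b_i + sum_j W_ij x_j) - k_i x_i, that is, a transcription term equal to the exponential of a linear combination of expression levels plus a first-order degradation term. The exact parameterization, the forward-Euler integrator, and the names `simulate`, `W`, `b`, and `k` are assumptions for illustration, not the authors' code.

```python
import math


def simulate(x0, W, b, k, dt=0.01, steps=1000):
    """Forward-Euler integration of the (assumed) model
        dx_i/dt = exp(b_i + sum_j W[i][j] * x_j) - k_i * x_i
    Returns the trajectory as a list of states, including the initial one."""
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        # Transcription: exponential of a linear combination of current expression levels.
        production = [math.exp(b[i] + sum(W[i][j] * x[j] for j in range(n)))
                      for i in range(n)]
        # Degradation: first-order decay of each gene's own expression level.
        x = [x[i] + dt * (production[i] - k[i] * x[i]) for i in range(n)]
        trajectory.append(list(x))
    return trajectory


# Example: one gene with no regulatory inputs, so dx/dt = 2 - x.
traj = simulate([0.0], [[0.0]], [math.log(2.0)], [1.0], dt=0.01, steps=2000)
print(round(traj[-1][0], 6))  # → 2.0
```

With a single gene and no regulatory input (W = 0, b = ln 2, k = 1), the equation reduces to dx/dt = 2 - x, so the trajectory relaxes to the steady state x = 2; a smaller `dt` trades speed for integration accuracy.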
Affiliation(s)
- Alexandre Haye
- BioSystems, BioModeling & BioProcesses Department, Université Libre de Bruxelles, CP 165/61, Avenue Roosevelt 50, 1050 Bruxelles, Belgium

42
Wunderlich Z, DePace AH. Modeling transcriptional networks in Drosophila development at multiple scales. Curr Opin Genet Dev 2011; 21:711-8. [PMID: 21889888 DOI: 10.1016/j.gde.2011.07.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 07/02/2011] [Accepted: 07/20/2011] [Indexed: 11/29/2022]
Abstract
Quantitative models of developmental processes can provide insights at multiple scales. Ultimately, models may be particularly informative for key questions about network-level behavior during development, such as how the system responds to environmental perturbation or operates reliably in different genetic backgrounds. The transcriptional networks that pattern the Drosophila embryo have been the subject of numerous quantitative experimental studies coupled to modeling frameworks in recent years. In this review, we describe three studies that consider these networks at different levels of molecular detail and therefore result in different types of insights. We also discuss other developmental transcriptional networks operating in Drosophila, with the goal of highlighting what additional insights they may provide.
Affiliation(s)
- Zeba Wunderlich
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
43
Bronstein R, Segal D. Modularity of CHIP/LDB transcription complexes regulates cell differentiation. Fly (Austin) 2011; 5:200-5. [PMID: 21406967 DOI: 10.4161/fly.5.3.14854] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023]
Abstract
Transcription is the first step through which the cell, via its repertoire of transcription complexes, directs cellular functions and establishes cellular identity by generating the cell-specific transcriptome. The modular composition of these complexes allows the cell to finely regulate its transcriptome. In a recent study, we examined the effects of reducing the levels of specific transcription co-factors on the function of two competing transcription complexes, CHIP-AP and CHIP-PNR, which regulate the development of cells in the Drosophila thorax. We found that changing the availability of these co-factors can shift the balance between these complexes, leading to a transition from utilization of CHIP-AP to CHIP-PNR. This is reflected in a change in the expression profile of target genes, which alters developmental cell fates. We propose that such a mechanism may operate in normal fly development. Transcription complexes analogous to CHIP-AP and CHIP-PNR exist in mammals, and we discuss how such a shift in the balance between them may operate in normal mammalian development.
Affiliation(s)
- Revital Bronstein
- Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Tel Aviv, Israel
44
Waxman JS, Yelon D. Zebrafish retinoic acid receptors function as context-dependent transcriptional activators. Dev Biol 2011; 352:128-40. [PMID: 21276787 DOI: 10.1016/j.ydbio.2011.01.022] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 03/28/2009] [Revised: 01/18/2011] [Accepted: 01/19/2011] [Indexed: 11/17/2022]
Abstract
Retinoic acid (RA) receptors (RARs) have been thought to function through a binary repressor-activator mechanism: in the absence of ligand, they function as transcriptional repressors, and in the presence of ligand, they function as transcriptional activators. This prevailing model of RAR mechanism has been derived mostly from in vitro studies and has not been widely tested in developmental contexts. Here, we investigate whether zebrafish RARs function as transcriptional activators or repressors during early embryonic anterior-posterior patterning. Ectopic expression of wild-type zebrafish RARs does not disrupt embryonic patterning and does not sensitize embryos to RA treatment, indicating that RAR availability is not limiting in the embryo. In contrast, ectopic expression of hyperactive zebrafish RARs induces expression of an RA-responsive reporter transgene as well as ectopic expression of endogenous RA-responsive target genes. However, ectopic expression of dominant-negative zebrafish RARs fails to induce embryonic phenotypes consistent with loss of RA signaling, despite their ability to function as transcriptional repressors in heterologous cell culture assays. Together, our studies suggest that zebrafish RAR function is context-dependent and that, during early patterning, zebrafish RARs function primarily as transcriptional activators and may have only minimal ability to act as transcriptional repressors. Thus, the binary model of RAR function does not appear to apply to all in vivo scenarios. Taking into account studies of RA signaling in tunicates and tetrapods, we propose a parsimonious model of the evolution of RAR function during chordate anterior-posterior patterning.
Affiliation(s)
- Joshua S Waxman
- Developmental Genetics Program and Department of Cell Biology, Kimmel Center for Biology and Medicine at the Skirball Institute of Biomolecular Medicine, New York University School of Medicine, New York, NY, 10016, USA