1
Hawe JS, Saha A, Waldenberger M, Kunze S, Wahl S, Müller-Nurasyid M, Prokisch H, Grallert H, Herder C, Peters A, Strauch K, Theis FJ, Gieger C, Chambers J, Battle A, Heinig M. Network reconstruction for trans acting genetic loci using multi-omics data and prior information. Genome Med 2022; 14:125. [PMID: 36344995 PMCID: PMC9641770 DOI: 10.1186/s13073-022-01124-9] [Received: 06/15/2022] [Accepted: 10/11/2022]
Abstract
BACKGROUND: Molecular measurements of the genome, the transcriptome, and the epigenome, often termed multi-omics data, provide an in-depth view of biological systems, and their integration is crucial for gaining insights into complex regulatory processes. These data can be used to explain disease-related genetic variants by linking them to intermediate molecular traits (quantitative trait loci, QTL). Molecular networks regulating cellular processes leave footprints in QTL results as so-called trans-QTL hotspots. Reconstructing these networks is a complex endeavor and use of biological prior information can improve network inference. However, previous efforts were limited in the types of priors used or have only been applied to model systems. In this study, we reconstruct the regulatory networks underlying trans-QTL hotspots using human cohort data and data-driven prior information.
METHODS: We devised a new strategy to integrate QTL with human population scale multi-omics data. State-of-the-art network inference methods including BDgraph and glasso were applied to these data. Comprehensive prior information to guide network inference was manually curated from large-scale biological databases. The inference approach was extensively benchmarked using simulated data and cross-cohort replication analyses. Best performing methods were subsequently applied to real-world human cohort data.
RESULTS: Our benchmarks showed that prior-based strategies outperform methods without prior information in simulated data and show better replication across datasets. Application of our approach to human cohort data highlighted two novel regulatory networks related to schizophrenia and lean body mass for which we generated novel functional hypotheses.
CONCLUSIONS: We demonstrate that existing biological knowledge can improve the integrative analysis of networks underlying trans associations and generate novel hypotheses about regulatory mechanisms.
Affiliation(s)
- Johann S Hawe
  - Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Heart Centre Munich, Department of Cardiology, Technical University Munich, Munich, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Ashis Saha
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Melanie Waldenberger
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Sonja Kunze
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Simone Wahl
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Martina Müller-Nurasyid
  - Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - IBE, Faculty of Medicine, LMU Munich, 81377, Munich, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, Munich, Germany
- Holger Prokisch
  - Institute of Human Genetics, School of Medicine, Technische Universität München, Munich, Germany
- Harald Grallert
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Christian Herder
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
  - Institute for Clinical Diabetology, German Diabetes Center, Leibniz Center for Diabetes Research at Heinrich Heine University, Düsseldorf, Germany
  - Division of Endocrinology and Diabetology, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Annette Peters
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
- Konstantin Strauch
  - Institute of Genetic Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, Mainz, Germany
  - Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Germany
- Fabian J Theis
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Department of Mathematics, Technical University of Munich, Garching, Germany
- Christian Gieger
  - Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Institute of Epidemiology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- John Chambers
  - Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
  - Lee Kong Chian School of Medicine, Nanyang Technological University, 308232, Singapore, Singapore
- Alexis Battle
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Matthias Heinig
  - Institute of Computational Biology, German Research Center for Environmental Health, HelmholtzZentrum München, Neuherberg, Germany
  - Department of Informatics, Technical University of Munich, Garching, Germany
  - Munich Heart Association, Partner Site Munich, DZHK (German Centre for Cardiovascular Research), 10785, Berlin, Germany
2
Emad A, Sinha S. Inference of phenotype-relevant transcriptional regulatory networks elucidates cancer type-specific regulatory mechanisms in a pan-cancer study. NPJ Syst Biol Appl 2021; 7:9. [PMID: 33558504 PMCID: PMC7870953 DOI: 10.1038/s41540-021-00169-7] [Received: 09/15/2019] [Accepted: 01/05/2021]
Abstract
Reconstruction of transcriptional regulatory networks (TRNs) is a powerful approach to unravel the gene expression programs involved in healthy and disease states of a cell. However, these networks are usually reconstructed independent of the phenotypic (or clinical) properties of the samples. Therefore, they may confound regulatory mechanisms that are specifically related to a phenotypic property with more general mechanisms underlying the full complement of the analyzed samples. In this study, we develop a method called InPheRNo to identify "phenotype-relevant" TRNs. This method is based on a probabilistic graphical model that models the simultaneous effects of multiple transcription factors (TFs) on their target genes and the statistical relationship between the target genes' expression and the phenotype. Extensive comparison of InPheRNo with related approaches using primary tumor samples of 18 cancer types from The Cancer Genome Atlas reveals that InPheRNo can accurately reconstruct cancer type-relevant TRNs and identify cancer driver TFs. In addition, survival analysis reveals that the activity level of TFs with many target genes could distinguish patients with poor prognosis from those with better prognosis.
Affiliation(s)
- Amin Emad
  - Department of Electrical and Computer Engineering, McGill University, Montreal, QC, Canada
- Saurabh Sinha
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA
3
Sadygov VR, Zhang W, Sadygov RG. Timepoint Selection Strategy for In Vivo Proteome Dynamics from Heavy Water Metabolic Labeling and LC-MS. J Proteome Res 2020; 19:2105-2112. [PMID: 32183509 DOI: 10.1021/acs.jproteome.0c00023]
Abstract
Protein homeostasis, proteostasis, is essential for healthy cell functioning and is dysregulated in many diseases. Metabolic labeling with heavy water followed by liquid chromatography coupled online to mass spectrometry (LC-MS) is a powerful high-throughput technique to study proteome dynamics in vivo. Longer labeling duration and dense timepoint sampling (TPS) of tissues provide accurate proteome dynamics estimations. However, the experiments are expensive, and they require animal housing and care, as well as labeling with stable isotopes. Often, the animals are sacrificed at selected timepoints to collect tissues. Therefore, it is necessary to optimize TPS for a given number of sampling points and labeling duration and target a specific tissue of study. Currently, such techniques are missing in proteomics. Here, we report on a formula-based stochastic simulation strategy for TPS for in vivo studies with heavy water metabolic labeling and LC-MS. We model the rate constant (lognormal), measurement error (Laplace), peptide length (gamma), relative abundance of the monoisotopic peak (beta regression), and the number of exchangeable hydrogens (gamma regression). The parameters of the distributions are determined using the corresponding empirical probability density functions from a large-scale dataset of murine heart proteome. The models are used in the simulations of the rate constant to minimize the root-mean-square error (rmse). The rmse for different TPSs shows structured patterns. They are analyzed to elucidate common features in the patterns.
Affiliation(s)
- Vugar R Sadygov
  - Clear Creek High School, 2305 E. Main Street, League City, Texas 77573, United States
- William Zhang
  - Department of Computer Science, The University of Texas, 2317 Speedway, Stop D9500, Austin, Texas 78712, United States
- Rovshan G Sadygov
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University of Blvd, Galveston, Texas 77555, United States
4
Liang X, Young WC, Hung LH, Raftery AE, Yeung KY. Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data. J Comput Biol 2019; 26:1113-1129. [PMID: 31009236 PMCID: PMC6786343 DOI: 10.1089/cmb.2019.0036]
Abstract
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.
Affiliation(s)
- Xiao Liang
  - Department of Computer Science, Virginia Tech, Blacksburg, Virginia
- William Chad Young
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Ling-Hong Hung
  - School of Engineering and Technology, University of Washington, Tacoma, Washington
- Adrian E. Raftery
  - Department of Statistics, University of Washington, Seattle, Washington
- Ka Yee Yeung
  - School of Engineering and Technology, University of Washington, Tacoma, Washington
5
Global transcriptional regulatory network for Escherichia coli robustly connects gene expression to transcription factor activities. Proc Natl Acad Sci U S A 2017; 114:10286-10291. [PMID: 28874552 DOI: 10.1073/pnas.1702581114]
Abstract
Transcriptional regulatory networks (TRNs) have been studied intensely for >25 y. Yet, even for the Escherichia coli TRN (probably the best characterized TRN), several questions remain. Here, we address three questions: (i) How complete is our knowledge of the E. coli TRN; (ii) how well can we predict gene expression using this TRN; and (iii) how robust is our understanding of the TRN? First, we reconstructed a high-confidence TRN (hiTRN) consisting of 147 transcription factors (TFs) regulating 1,538 transcription units (TUs) encoding 1,764 genes. The 3,797 high-confidence regulatory interactions were collected from published, validated chromatin immunoprecipitation (ChIP) data and RegulonDB. For 21 different TF knockouts, up to 63% of the differentially expressed genes in the hiTRN were traced to the knocked-out TF through regulatory cascades. Second, we trained supervised machine learning algorithms to predict the expression of 1,364 TUs given TF activities using 441 samples. The algorithms accurately predicted condition-specific expression for 86% (1,174 of 1,364) of the TUs, while 193 TUs (14%) were predicted better than random TRNs. Third, we identified 10 regulatory modules whose definitions were robust against changes to the TRN or expression compendium. Using surrogate variable analysis, we also identified three unmodeled factors that systematically influenced gene expression. Our computational workflow comprehensively characterizes the predictive capabilities and systems-level functions of an organism's TRN from disparate data types.
6
Schulze S, Schleicher J, Guthke R, Linde J. How to Predict Molecular Interactions between Species? Front Microbiol 2016; 7:442. [PMID: 27065992 PMCID: PMC4814556 DOI: 10.3389/fmicb.2016.00442] [Received: 12/09/2015] [Accepted: 03/18/2016]
Abstract
Organisms constantly interact with other species through physical contact, which leads to changes on the molecular level, for example in the transcriptome. These changes can be monitored for all genes with the help of high-throughput experiments such as RNA-seq or microarrays. The adaptation of gene expression to environmental changes within cells is mediated through complex gene regulatory networks. Often, our knowledge of these networks is incomplete. Network inference predicts gene regulatory interactions based on transcriptome data. Dual transcriptomics experiments are an emerging application of high-throughput transcriptome studies. Here, the transcriptomes of two or more interacting species are measured simultaneously. Based on a dual RNA-seq data set of murine dendritic cells infected with the fungal pathogen Candida albicans, the software tool NetGenerator was applied to predict an inter-species gene regulatory network. To promote further investigations of molecular inter-species interactions, we recently discussed dual RNA-seq experiments for host-pathogen interactions and extended the applied tool NetGenerator (Schulze et al., 2015). The updated version of NetGenerator makes use of measurement variances in the algorithmic procedure and accepts gene expression time series data with missing values. Additionally, we tested multiple modeling scenarios regarding the stimuli functions of the gene regulatory network. Here, we summarize the work by Schulze et al. (2015) and put it into a broader context. We review various studies making use of the dual transcriptomics approach to investigate the molecular basis of interacting species. Besides the application to host-pathogen interactions, dual transcriptomics data are also utilized to study mutualistic and commensalistic interactions. Furthermore, we give a short introduction to additional approaches for the prediction of gene regulatory networks and discuss their application to dual transcriptomics data. We conclude that the application of network inference on dual-transcriptomics data is a promising approach to predict molecular inter-species interactions.
Affiliation(s)
- Sylvie Schulze
  - Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
- Jana Schleicher
  - Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
- Reinhard Guthke
  - Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
- Jörg Linde
  - Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
7
Shirai K, Saika S. Ocular surface mucins and local inflammation--studies in genetically modified mouse lines. BMC Ophthalmol 2015; 15 Suppl 1:154. [PMID: 26818460 PMCID: PMC4895702 DOI: 10.1186/s12886-015-0137-5]
Abstract
Mucins localize to the apical surfaces of all wet-surfaced epithelia, including the ocular surface. The functions of the mucins include anti-adhesion, lubrication, water retention, and barrier function against allergens and pathogens. Ocular surface pathologies, e.g., dry eye syndrome or allergic conjunctivitis, are reportedly associated with altered expression patterns of mucin components. Recent investigations indicated anti-bacterial adhesion or anti-inflammatory effects of mucin family members in non-ocular tissues, e.g., the gastrointestinal tract or airway tissues, using genetically modified mouse lines that lack expression of a mucin member. However, the ocular phenotypes of the individual mucin gene-ablated mouse lines have not yet been fully examined. The Muc16-deficient mouse is associated with spontaneous subclinical inflammation in the conjunctiva. The article reviews the roles of mucin members in modulating local inflammation in mucous membrane tissues and the phenotypes of mouse lines lacking a mucin gene. Further analysis of the ocular surface of mucin gene-related mutant mouse lines remains to be performed.
Affiliation(s)
- Kumi Shirai
  - Department of Ophthalmology, Wakayama Medical University School of Medicine, 811-1 Kimiidera, Wakayama, 641-0012, Japan
- Shizuya Saika
  - Department of Ophthalmology, Wakayama Medical University School of Medicine, 811-1 Kimiidera, Wakayama, 641-0012, Japan
8
Cokelaer T, Bansal M, Bare C, Bilal E, Bot BM, Chaibub Neto E, Eduati F, Gönen M, Hill SM, Hoff B, Karr JR, Küffner R, Menden MP, Meyer P, Norel R, Pratap A, Prill RJ, Weirauch MT, Costello JC, Stolovitzky G, Saez-Rodriguez J. DREAMTools: a Python package for scoring collaborative challenges. F1000Res 2015; 4:1030. [PMID: 27134723 DOI: 10.12688/f1000research.7118.1] [Accepted: 10/06/2015]
Abstract
DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of September 2015, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform https://www.synapse.org. Availability: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools.
Affiliation(s)
- Thomas Cokelaer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Mukesh Bansal
  - Department of Systems Biology, Columbia University, New York, USA
- Erhan Bilal
  - IBM, TJ Watson, Computational Biology Center, New York, USA
- Federica Eduati
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Mehmet Gönen
  - Oregon Health & Science University, Portland, OR, USA
- Steven M Hill
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
- Jonathan R Karr
  - Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Robert Küffner
  - Institute of Bioinformatics and Systems Biology, German Research Center for Environmental Health, Munich, Germany
- Michael P Menden
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Pablo Meyer
  - IBM, TJ Watson, Computational Biology Center, New York, USA
- Raquel Norel
  - IBM, TJ Watson, Computational Biology Center, New York, USA
- Matthew T Weirauch
  - Center for Autoimmune Genomics and Etiology and Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital, Cincinnati, OH, USA
- James C Costello
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Gustavo Stolovitzky
  - IBM, TJ Watson, Computational Biology Center, New York, USA
  - Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Julio Saez-Rodriguez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
  - RWTH Aachen University Medical Hospital, Joint Research Centre for Computational Biomedicine (JRCCOMBINE), Aachen, Germany
9
Cokelaer T, Bansal M, Bare C, Bilal E, Bot BM, Chaibub Neto E, Eduati F, de la Fuente A, Gönen M, Hill SM, Hoff B, Karr JR, Küffner R, Menden MP, Meyer P, Norel R, Pratap A, Prill RJ, Weirauch MT, Costello JC, Stolovitzky G, Saez-Rodriguez J. DREAMTools: a Python package for scoring collaborative challenges. F1000Res 2015; 4:1030. [PMID: 27134723 PMCID: PMC4837986 DOI: 10.12688/f1000research.7118.2] [Accepted: 04/01/2016]
Abstract
DREAM challenges are community competitions designed to advance computational methods and address fundamental questions in systems biology and translational medicine. Each challenge asks participants to develop and apply computational methods to either predict unobserved outcomes or to identify unknown model parameters given a set of training data. Computational methods are evaluated using an automated scoring metric, scores are posted to a public leaderboard, and methods are published to facilitate community discussions on how to build improved methods. By engaging participants from a wide range of science and engineering backgrounds, DREAM challenges can comparatively evaluate a wide range of statistical, machine learning, and biophysical methods. Here, we describe DREAMTools, a Python package for evaluating DREAM challenge scoring metrics. DREAMTools provides a command line interface that enables researchers to test new methods on past challenges, as well as a framework for scoring new challenges. As of March 2016, DREAMTools includes more than 80% of completed DREAM challenges. DREAMTools complements the data, metadata, and software tools available at the DREAM website http://dreamchallenges.org and on the Synapse platform at https://www.synapse.org. Availability: DREAMTools is a Python package. Releases and documentation are available at http://pypi.python.org/pypi/dreamtools. The source code is available at http://github.com/dreamtools/dreamtools.
Affiliation(s)
- Thomas Cokelaer
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
  - Bioinformatics and Biostatistics Hub, C3BI, Institut Pasteur, Paris, France
- Mukesh Bansal
  - Department of Systems Biology, Columbia University, New York, USA
- Erhan Bilal
  - IBM, TJ Watson, Computational Biology Center, New York, USA
- Federica Eduati
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Alberto de la Fuente
  - Leibniz Institute for Farm Animal Biology, Institute of Genetics and Biometry, Dummerstorf, Germany
- Mehmet Gönen
  - Oregon Health & Science University, Portland, OR, USA
- Steven M. Hill
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, UK
- Jonathan R. Karr
  - Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Robert Küffner
  - Institute of Bioinformatics and Systems Biology, German Research Center for Environmental Health, Munich, Germany
- Michael P. Menden
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
- Pablo Meyer
  - IBM, TJ Watson, Computational Biology Center, New York, USA
- Raquel Norel
  - IBM, TJ Watson, Computational Biology Center, New York, USA
- Matthew T. Weirauch
  - Center for Autoimmune Genomics and Etiology and Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children’s Hospital, Cincinnati, OH, USA
- James C. Costello
  - Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Gustavo Stolovitzky
  - IBM, TJ Watson, Computational Biology Center, New York, USA
  - Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, USA
- Julio Saez-Rodriguez
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, UK
  - RWTH Aachen University Medical Hospital, Joint Research Centre for Computational Biomedicine (JRCCOMBINE), Aachen, Germany
10
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI JOURNAL 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Received: 01/29/2015] [Accepted: 02/10/2015]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. After the advance of Next-Generation-Sequencing of cDNAs derived from RNA samples (RNA-Seq) we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Sylvie Schulze
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Reinhard Guthke
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
11
Studham ME, Tjärnberg A, Nordling TEM, Nelander S, Sonnhammer ELL. Functional association networks as priors for gene regulatory network inference. Bioinformatics 2014; 30:i130-8. [PMID: 24931976 PMCID: PMC4058914 DOI: 10.1093/bioinformatics/btu285]
Abstract
Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data. Contact: matthew.studham@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew E Studham
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
- Andreas Tjärnberg
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
- Torbjörn E M Nordling
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
- Sven Nelander
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
- Erik L L Sonnhammer
- Stockholm Bioinformatics Centre, Science for Life Laboratory, SE-171 65 Solna, Sweden, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Department of Immunology, Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 05 Uppsala, Sweden and Swedish eScience Research Center, SE-100 44 Stockholm, Sweden
|
12
|
Kupfer P, Huber R, Weber M, Vlaic S, Häupl T, Koczan D, Guthke R, Kinne RW. Novel application of multi-stimuli network inference to synovial fibroblasts of rheumatoid arthritis patients. BMC Med Genomics 2014; 7:40. [PMID: 24989895 PMCID: PMC4099018 DOI: 10.1186/1755-8794-7-40] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/09/2013] [Accepted: 06/25/2014] [Indexed: 11/19/2022] Open
Abstract
Background Network inference of gene expression data is an important challenge in systems biology. Novel algorithms may provide more detailed gene regulatory networks (GRN) for complex, chronic inflammatory diseases such as rheumatoid arthritis (RA), in which activated synovial fibroblasts (SFBs) play a major role. Since the detailed mechanisms underlying this activation are still unclear, simultaneous investigation of multi-stimuli activation of SFBs offers the possibility to elucidate the regulatory effects of multiple mediators and to gain new insights into disease pathogenesis. Methods A GRN was therefore inferred from RA-SFBs treated with 4 different stimuli (IL-1β, TNF-α, TGF-β, and PDGF-D). Data from time series microarray experiments (0, 1, 2, 4, 12 h; Affymetrix HG-U133 Plus 2.0) were batch-corrected applying ‘ComBat’, analyzed for differentially expressed genes over time with ‘Limma’, and used for the inference of a robust GRN with NetGenerator V2.0, a heuristic ordinary differential equation-based method with soft integration of prior knowledge. Results Using all genes differentially expressed over time in RA-SFBs for any stimulus, and selecting the genes belonging to the most significant gene ontology (GO) term, i.e., ‘cartilage development’, a dynamic, robust, moderately complex multi-stimuli GRN was generated with 24 genes and 57 edges in total, 31 of which were gene-to-gene edges. Prior literature-based knowledge derived from Pathway Studio or manual searches was reflected in the final network by 25/57 confirmed edges (44%). The model contained known network motifs crucial for dynamic cellular behavior, e.g., cross-talk among pathways, positive feed-back loops, and positive feed-forward motifs (including suppression of the transcriptional repressor OSR2 by all 4 stimuli). Conclusion A multi-stimuli GRN highly concordant with literature data was successfully generated by network inference from the gene expression of stimulated RA-SFBs.
The GRN showed high reliability, since 10 predicted edges were independently validated by literature findings post network inference. The selected GO term ‘cartilage development’ contained a number of differentiation markers, growth factors, and transcription factors with potential relevance for RA. Finally, the model provided new insight into the response of RA-SFBs to multiple stimuli implicated in the pathogenesis of RA, in particular to the ‘novel’ potent growth factor PDGF-D.
Affiliation(s)
- Peter Kupfer
- Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany.
|
13
|
Santra T. A Bayesian framework that integrates heterogeneous data for inferring gene regulatory networks. Front Bioeng Biotechnol 2014; 2:13. [PMID: 25152886 PMCID: PMC4126456 DOI: 10.3389/fbioe.2014.00013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/14/2014] [Accepted: 04/28/2014] [Indexed: 11/29/2022] Open
Abstract
Reconstruction of gene regulatory networks (GRNs) from experimental data is a fundamental challenge in systems biology. A number of computational approaches have been developed to infer GRNs from mRNA expression profiles. However, expression profiles alone are proving to be insufficient for inferring GRN topologies with reasonable accuracy. Recently, it has been shown that integration of external data sources (such as gene and protein sequence information, gene ontology data, protein-protein interactions) with mRNA expression profiles may increase the reliability of the inference process. Here, I propose a new approach that incorporates transcription factor binding sites (TFBS) and physical protein interactions (PPI) among transcription factors (TFs) in a Bayesian variable selection (BVS) algorithm which can infer GRNs from mRNA expression profiles subjected to genetic perturbations. Using real experimental data, I show that the integration of TFBS and PPI data with mRNA expression profiles leads to significantly more accurate networks than those inferred from expression profiles alone. Additionally, the performance of the proposed algorithm is compared with a series of least absolute shrinkage and selection operator (LASSO) regression-based network inference methods that can also incorporate prior knowledge in the inference framework. The results of this comparison suggest that BVS can outperform LASSO regression-based method in some circumstances.
Affiliation(s)
- Tapesh Santra
- Systems Biology Ireland, University College Dublin, Dublin, Ireland
|
14
|
Gelfond JA, Ibrahim JG, Gupta M, Chen MH, Cody JD. Differential expression analysis with global network adjustment. BMC Bioinformatics 2013; 14:258. [PMID: 23968143 PMCID: PMC3766173 DOI: 10.1186/1471-2105-14-258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2013] [Accepted: 07/25/2013] [Indexed: 11/17/2022] Open
Abstract
Background Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments. Results We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods. 
Conclusions By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
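The selection step described in this abstract, choosing a ridge penalty by its prediction error on an independent out-sample, can be illustrated with a minimal sketch. It assumes a single predictor and no intercept for brevity; the function names and toy numbers are invented for illustration and are not taken from the paper, and the second round of regression mentioned in the abstract is not shown.

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge estimate for one predictor without intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def out_sample_error(beta, x, y):
    """Mean squared prediction error on an independent data set."""
    return sum((b - beta * a) ** 2 for a, b in zip(x, y)) / len(x)


def select_lambda(train, out_sample, grid):
    """Return the penalty in `grid` with the lowest out-sample error."""
    x_tr, y_tr = train
    x_out, y_out = out_sample
    return min(
        grid,
        key=lambda lam: out_sample_error(ridge_slope(x_tr, y_tr, lam), x_out, y_out),
    )


# Toy numbers (hypothetical): a larger "database" set and a small experiment.
database = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
experiment = ([1.0, 2.0], [1.5, 3.0])
best_lambda = select_lambda(database, experiment, [0.0, 1.0, 10.0, 100.0])
print(best_lambda)
```

In this toy case the out-sample favors a heavier penalty than an unpenalized fit on the database set would suggest, which is the kind of stronger shrinkage the abstract refers to.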
|
15
|
Greenfield A, Hafemeister C, Bonneau R. Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks. Bioinformatics 2013; 29:1060-7. [PMID: 23525069 PMCID: PMC3624811 DOI: 10.1093/bioinformatics/btt099] [Citation(s) in RCA: 118] [Impact Index Per Article: 10.7] [Indexed: 01/02/2023]
Abstract
MOTIVATION Inferring global regulatory networks (GRNs) from genome-wide data is a computational challenge central to the field of systems biology. Although the primary data currently used to infer GRNs consist of gene expression and proteomics measurements, there is a growing abundance of alternate data types that can reveal regulatory interactions, e.g. ChIP-Chip, literature-derived interactions, protein-protein interactions. GRN inference requires the development of integrative methods capable of using these alternate data as priors on the GRN structure. Each source of structure priors has its unique biases and inherent potential errors; thus, GRN methods using these data must be robust to noisy inputs. RESULTS We developed two methods for incorporating structure priors into GRN inference. Both methods [Modified Elastic Net (MEN) and Bayesian Best Subset Regression (BBSR)] extend the previously described Inferelator framework, enabling the use of prior information. We test our methods on one synthetic and two bacterial datasets, and show that both MEN and BBSR infer accurate GRNs even when the structure prior used has significant amounts of error (>90% erroneous interactions). We find that BBSR outperforms MEN at inferring GRNs from expression data and noisy structure priors. AVAILABILITY AND IMPLEMENTATION Code, datasets and networks presented in this article are available at http://bonneaulab.bio.nyu.edu/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alex Greenfield
- Computational Biology Program, New York University Sackler School of Medicine, New York, NY 10065, USA
|
16
|
Rosa BA, Zhang J, Major IT, Qin W, Chen J. Optimal timepoint sampling in high-throughput gene expression experiments. Bioinformatics 2012; 28:2773-81. [PMID: 22923305 DOI: 10.1093/bioinformatics/bts511] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Affiliation(s)
- Bruce A Rosa
- Biorefining Research Institute and Department of Biology, Lakehead University, 955 Oliver Road, Thunder Bay, Canada ON P7B 5E1
|
17
|
Lo K, Raftery AE, Dombek KM, Zhu J, Schadt EE, Bumgarner RE, Yeung KY. Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst Biol 2012; 6:101. [PMID: 22898396 PMCID: PMC3465231 DOI: 10.1186/1752-0509-6-101] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 01/25/2012] [Accepted: 07/24/2012] [Indexed: 01/27/2023]
Abstract
BACKGROUND Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge. RESULTS We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. We summarize the external biological knowledge by an informative prior probability distribution over the candidate regression models. CONCLUSIONS We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.
Affiliation(s)
- Kenneth Lo
- Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
- Adrian E Raftery
- Department of Statistics, University of Washington, Box 354320, Seattle, WA, 98195, USA
- Kenneth M Dombek
- Department of Biochemistry, University of Washington, Box 357350, Seattle, WA, 98195, USA
- Jun Zhu
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
- Eric E Schadt
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
- Roger E Bumgarner
- Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
|
18
|
Ashworth J, Wurtmann EJ, Baliga NS. Reverse engineering systems models of regulation: discovery, prediction and mechanisms. Curr Opin Biotechnol 2011; 23:598-603. [PMID: 22209016 DOI: 10.1016/j.copbio.2011.12.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/29/2011] [Accepted: 12/08/2011] [Indexed: 10/14/2022]
Abstract
Biological systems can now be understood in comprehensive and quantitative detail using systems biology approaches. Putative genome-scale models can be built rapidly based upon biological inventories and strategic system-wide molecular measurements. Current models combine statistical associations, causative abstractions, and known molecular mechanisms to explain and predict quantitative and complex phenotypes. This top-down 'reverse engineering' approach generates useful organism-scale models despite noise and incompleteness in data and knowledge. Here we review and discuss the reverse engineering of biological systems using top-down data-driven approaches, in order to improve discovery, hypothesis generation, and the inference of biological properties.
|
19
|
Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Muertter RN, Holko M, Ayanbule O, Yefanov A, Soboleva A. NCBI GEO: archive for functional genomics data sets--10 years on. Nucleic Acids Res 2010; 39:D1005-10. [PMID: 21097893 PMCID: PMC3013736 DOI: 10.1093/nar/gkq1184] [Citation(s) in RCA: 798] [Impact Index Per Article: 57.0] [Indexed: 12/19/2022] Open
Abstract
A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
Affiliation(s)
- Tanya Barrett
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892, USA.
|
20
|
Linde J, Wilson D, Hube B, Guthke R. Regulatory network modelling of iron acquisition by a fungal pathogen in contact with epithelial cells. BMC Syst Biol 2010; 4:148. [PMID: 21050438 PMCID: PMC3225834 DOI: 10.1186/1752-0509-4-148] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 06/28/2010] [Accepted: 11/04/2010] [Indexed: 01/03/2023]
Abstract
BACKGROUND Reverse engineering of gene regulatory networks can be used to predict regulatory interactions of an organism faced with environmental changes, but can prove problematic, especially when focusing on complicated multi-factorial processes. Candida albicans is a major human fungal pathogen. During the infection process, this fungus is able to adapt to conditions of very low iron availability. Such adaptation is an important virulence attribute of virtually all pathogenic microbes. Understanding the regulation of iron acquisition genes will extend our knowledge of the complex regulatory changes during the infection process and might identify new potential drug targets. Thus, there is a need for efficient modelling approaches predicting key regulatory events of iron acquisition genes during the infection process. RESULTS This study deals with the regulation of C. albicans iron uptake genes during adhesion to and invasion into human oral epithelial cells. A reverse engineering strategy is presented, which is able to infer regulatory networks on the basis of gene expression data, making use of relevant selection criteria such as sparseness and robustness. An exhaustive use of available knowledge from different data sources improved the network prediction. The predicted regulatory network proposes a number of new target genes for the transcriptional regulators Rim101, Hap3, Sef1 and Tup1. Furthermore, the molecular mode of action for Tup1 is clarified. Finally, regulatory interactions between the transcription factors themselves are proposed. This study presents a model describing how C. albicans may regulate iron acquisition during contact with and invasion of human oral epithelial cells. There is evidence that some of the proposed regulatory interactions might also occur during oral infection. 
CONCLUSIONS This study focuses on a typical problem in Systems Biology where an interesting biological phenomenon is studied using a small number of available experimental data points. To overcome this limitation, a special modelling strategy was used which identifies sparse and robust networks. The data is augmented by an exhaustive search for additional data sources, helping to make proposals on regulatory interactions and to guide the modelling approach. The proposed modelling strategy is capable of finding known regulatory interactions and predicts a number of yet unknown biologically relevant regulatory interactions.
Affiliation(s)
- Jörg Linde
- Research Group Systems Biology/Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology-Hans-Knoell-Institute, Beutenbergstraße 11a, 07745 Jena, Germany
- Duncan Wilson
- Department Microbial Pathogenicity Mechanisms, Leibniz-Institute for Natural Product Research and Infection Biology-Hans-Knoell-Institute, Beutenbergstraße 11a, 07745 Jena, Germany
- Bernhard Hube
- Department Microbial Pathogenicity Mechanisms, Leibniz-Institute for Natural Product Research and Infection Biology-Hans-Knoell-Institute, Beutenbergstraße 11a, 07745 Jena, Germany
- Reinhard Guthke
- Research Group Systems Biology/Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology-Hans-Knoell-Institute, Beutenbergstraße 11a, 07745 Jena, Germany
|
21
|
Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One 2010; 5:e9202. [PMID: 20186320 PMCID: PMC2826397 DOI: 10.1371/journal.pone.0009202] [Citation(s) in RCA: 298] [Impact Index Per Article: 21.3] [Received: 11/19/2009] [Accepted: 01/19/2010] [Indexed: 11/29/2022] Open
Abstract
Background Systems biology has embraced computational modeling in response to the quantitative nature and increasing scale of contemporary data sets. The onslaught of data is accelerating as molecular profiling technology evolves. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) is a community effort to catalyze discussion about the design, application, and assessment of systems biology models through annual reverse-engineering challenges. Methodology and Principal Findings We describe our assessments of the four challenges associated with the third DREAM conference which came to be known as the DREAM3 challenges: signaling cascade identification, signaling response prediction, gene expression prediction, and the DREAM3 in silico network challenge. The challenges, based on anonymized data sets, tested participants in network inference and prediction of measurements. Forty teams submitted 413 predicted networks and measurement test sets. Overall, a handful of best-performer teams were identified, while a majority of teams made predictions that were equivalent to random. Counterintuitively, combining the predictions of multiple teams (including the weaker teams) can in some cases improve predictive power beyond that of any single method. Conclusions DREAM provides valuable feedback to practitioners of systems biology modeling. Lessons learned from the predictions of the community provide much-needed context for interpreting claims of efficacy of algorithms described in the scientific literature.
|
22
|
Ruan J. A top-performing algorithm for the DREAM3 gene expression prediction challenge. PLoS One 2010; 5:e8944. [PMID: 20140212 PMCID: PMC2816205 DOI: 10.1371/journal.pone.0008944] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/22/2009] [Accepted: 01/04/2010] [Indexed: 11/23/2022] Open
Abstract
A wealth of computational methods has been developed to address problems in systems biology, such as modeling gene expression. However, to objectively evaluate and compare such methods is notoriously difficult. The DREAM (Dialogue on Reverse Engineering Assessments and Methods) project is a community-wide effort to assess the relative strengths and weaknesses of different computational methods for a set of core problems in systems biology. This article presents a top-performing algorithm for one of the challenge problems in the third annual DREAM (DREAM3), namely the gene expression prediction challenge. In this challenge, participants are asked to predict the expression levels of a small set of genes in a yeast deletion strain, given the expression levels of all other genes in the same strain and complete gene expression data for several other yeast strains. I propose a simple k-nearest-neighbor (KNN) method to solve this problem. Despite its simplicity, this method works well for this challenge, sharing the “top performer” honor with a much more sophisticated method. I also describe several alternative, simple strategies, including a modified KNN algorithm that further improves the performance of the standard KNN method. The success of these methods suggests that complex methods attempting to integrate multiple data sets do not necessarily lead to better performance than simple yet robust methods. Furthermore, none of these top-performing methods, including the one by a different team, are based on gene regulatory networks, which seems to suggest that accurately modeling gene expression using gene regulatory networks is unfortunately still a difficult task.
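As a rough illustration of the nearest-neighbor idea named in this abstract, the sketch below predicts a value for a query profile as the mean over its k closest reference profiles. The abstract does not say how neighbors or distances are defined, so the Euclidean distance, the choice of genes as reference items, the function name `knn_predict`, and the toy numbers are all assumptions made here for illustration.

```python
import math


def knn_predict(known_profiles, target_values, query_profile, k=3):
    """Predict a value for `query_profile` as the mean target of its k nearest items.

    known_profiles: equal-length numeric lists, one per reference item
    target_values:  one number per reference item (the quantity to predict)
    """
    if not 1 <= k <= len(known_profiles):
        raise ValueError("k must be between 1 and the number of reference items")
    # Euclidean distance from the query to every reference profile
    distances = [
        (math.dist(profile, query_profile), idx)
        for idx, profile in enumerate(known_profiles)
    ]
    nearest = sorted(distances)[:k]
    return sum(target_values[idx] for _, idx in nearest) / k


# Toy data (hypothetical): four reference genes measured in three strains,
# plus their expression in the held-out strain.
reference = [[1.0, 1.1, 0.9], [1.2, 1.0, 1.0], [5.0, 5.2, 4.9], [5.1, 4.8, 5.0]]
held_out = [1.0, 1.2, 5.0, 5.4]
prediction = knn_predict(reference, held_out, [1.1, 1.0, 1.0], k=2)
print(round(prediction, 2))
```

The only tunable choices in such a predictor are k and the distance measure, which is consistent with the abstract's emphasis on simplicity and robustness.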
Affiliation(s)
- Jianhua Ruan
- Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas, United States of America.
|