1
Bessonov K, Van Steen K. Practical aspects of gene regulatory inference via conditional inference forests from expression data. Genet Epidemiol 2016; 40:767-778. [PMID: 27870152 DOI: 10.1002/gepi.22017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2015] [Revised: 09/15/2016] [Accepted: 09/21/2016] [Indexed: 11/09/2022]
Abstract
Gene regulatory network (GRN) inference is an active area of research that facilitates understanding the complex interplays between biological molecules. We propose a novel framework to create such GRNs, based on Conditional Inference Forests (CIFs) as proposed by Strobl et al. Our framework consists of using ensembles of Conditional Inference Trees (CITs) and selecting an appropriate aggregation scheme for variant selection prior to network construction. We show on synthetic microarray data that taking the original implementation of CIFs with conditional permutation scheme (CIFcond) may lead to improved performance compared to Breiman's implementation of Random Forests (RF). Among all newly introduced CIF-based methods and five network scenarios obtained from the DREAM4 challenge, CIFcond performed best. Networks derived from well-tuned CIFs, obtained by simply averaging P-values over tree ensembles (CIFmean), are particularly attractive, because they combine adequate performance with computational efficiency. Moreover, thresholds for variable selection are based on significance levels for P-values and, hence, do not need to be tuned. From a practical point of view, our extensive simulations show the potential advantages of CIFmean-based methods. Although more work is needed to improve on speed, especially when fully exploiting the advantages of CITs in the context of heterogeneous and correlated data, we have shown that CIF methodology can be flexibly inserted in a framework to infer biological interactions. Notably, we confirmed a biologically relevant interaction between IL2RA and FOXP1, linked to the IL-2 signaling pathway and to type 1 diabetes.
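The CIFmean aggregation described in the abstract (averaging P-values over the tree ensemble, then selecting edges at a significance level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mean_pvalue_edges`, the per-tree dictionary layout, the gene labels `G1` to `G3`, the P-values, and the default `alpha=0.05` are all assumptions made for illustration.

```python
from statistics import mean


def mean_pvalue_edges(pvalues_per_tree, alpha=0.05):
    """Average per-edge P-values over trees and keep edges below alpha.

    pvalues_per_tree: list with one dict per tree, mapping a hypothetical
    (regulator, target) pair to the P-value that tree reported for it.
    Assumption: an edge missing from a tree is simply skipped for that tree.
    """
    collected = {}
    for tree_pvalues in pvalues_per_tree:
        for edge, p in tree_pvalues.items():
            collected.setdefault(edge, []).append(p)

    # Aggregate: one mean P-value per candidate edge.
    averaged = {edge: mean(ps) for edge, ps in collected.items()}

    # Select: the threshold is a significance level, not a tuned cutoff.
    return {edge: p for edge, p in averaged.items() if p < alpha}


# Made-up P-values from two trees for two candidate edges.
trees = [
    {("G1", "G2"): 0.01, ("G3", "G2"): 0.40},
    {("G1", "G2"): 0.03, ("G3", "G2"): 0.20},
]
edges = mean_pvalue_edges(trees)
print(sorted(edges))
```

With these illustrative inputs only the `("G1", "G2")` edge has a mean P-value below the assumed 0.05 level, so it is the only edge kept for network construction.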
Affiliation(s)
- Kyrylo Bessonov
- Medical Genomics, GIGA-R, Université de Liège, Sart-Tilman, Belgium
2
Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJM, Montgomery SB, Griffith OL. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res 2015; 44:D126-32. [PMID: 26578589 PMCID: PMC4702855 DOI: 10.1093/nar/gkv1203] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Received: 09/16/2015] [Accepted: 10/26/2015] [Indexed: 12/26/2022]
Abstract
The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/.
Affiliation(s)
- Robert Lesurf
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Kelsy C Cotto
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Grace Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Malachi Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Katayoon Kasaian
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 4S6, Canada; Department of Molecular Biology & Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Obi L Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
3
Abstract
Background: Cell survival and development are orchestrated by complex interlocking programs of gene activation and repression. Understanding how this gene regulatory network (GRN) functions in normal states, and is altered in cancer subtypes, offers fundamental insight into oncogenesis and disease progression, and holds great promise for guiding clinical decisions. Inferring a GRN from empirical microarray gene expression data is a challenging task in cancer systems biology. In recent years, module-based approaches for GRN inference have been proposed to address this challenge. Despite the demonstrated success of module-based approaches in uncovering biologically meaningful regulatory interactions, their application remains limited to a single condition, without supporting the comparison of multiple disease subtypes/conditions. Also, their use remains unnecessarily restricted to computational biologists, as accurate inference of modules and their regulators requires integration of diverse tools and heterogeneous data sources, which in turn requires scripting skills, data infrastructure and powerful computational facilities. New analytical frameworks are required to make module-based GRN inference approaches more generally useful to the research community. Results: We present the RMaNI (Regulatory Module Network Inference) framework, which supports cancer subtype-specific or condition-specific GRN inference and differential network analysis. It combines transcriptomic and genomic data sources, and integrates heterogeneous knowledge resources and a set of complementary bioinformatic methods for automated inference of modules and their condition-specific regulators, and it facilitates downstream network analyses and data visualization. To demonstrate its utility, we applied RMaNI to a hepatocellular microarray dataset containing normal and three disease conditions. We demonstrate how RMaNI can be employed to understand the genetic architecture underlying three disease conditions. RMaNI is freely available at http://inspect.braembl.org.au/bi/inspect/rmani Conclusion: RMaNI makes available a workflow with a comprehensive set of tools that would otherwise be challenging for non-expert users to install and apply. The framework presented in this paper is flexible and can be easily extended to analyse any dataset with multiple disease conditions.