101
Papatheodorou I, Oellrich A, Smedley D. Linking gene expression to phenotypes via pathway information. J Biomed Semantics 2015; 6:17. [PMID: 25901272 PMCID: PMC4404592 DOI: 10.1186/s13326-015-0013-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 10/29/2014] [Accepted: 03/19/2015] [Indexed: 11/10/2022] Open
Abstract
Establishing robust links among gene expression, pathways and phenotypes is critical for understanding diseases and developing treatments. In recent years there have been many efforts to develop the computational means to traverse from genes to gene expression, model pathways and classify phenotypes. Numerous ontologies and other controlled vocabularies have been developed, as well as computational methods to combine and mine these data sets and establish connections. Here we discuss these efforts and identify areas of future work that could lead to a better integration of genes, pathways and phenotypes to provide insights into the mechanisms by which gene mutations affect expression and pathways and how these effects are manifested in the phenotype.
Affiliation(s)
- Irene Papatheodorou
  - Mouse Developmental Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK
- Anika Oellrich
  - Mouse Developmental Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK
- Damian Smedley
  - Mouse Developmental Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA, Hinxton, UK
102
Unraveling the association between mRNA expressions and mutant phenotypes in a genome-wide assessment of mice. Proc Natl Acad Sci U S A 2015; 112:4707-12. [PMID: 25825715 DOI: 10.1073/pnas.1415046112] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 02/07/2023] Open
Abstract
High-throughput gene expression profiling has revealed substantial leaky and extraneous transcription of eukaryotic genes, challenging the perceptions that transcription is strictly regulated and that changes in transcription have phenotypic consequences. To assess the functional implications of mRNA transcription directly, we analyzed mRNA expression data derived from microarrays, RNA-sequencing, and in situ hybridization, together with phenotype data of mouse mutants as a proxy of gene function at the tissue level. The results indicated that despite the presence of widespread ectopic transcription, mRNA expression and mutant phenotypes of mammalian genes or tissues remain associated. The expression-phenotype association at the gene level was particularly strong for tissue-specific genes, and the association could be underestimated due to data insufficiency and incomprehensive phenotyping of mouse mutants; the strength of expression-phenotype association at the tissue level depended on tissue functions. Mutations on genes expressed at higher levels or expressed at earlier embryonic stages more often result in abnormal phenotypes in the tissues where they are expressed. The mRNA expression profiles that have stronger associations with their phenotype profiles tend to be more evolutionarily conserved, indicating that the evolution of transcriptome and the evolution of phenome are coupled. Therefore, mutations resulting in phenotypic aberrations in expressed tissues are more likely to occur in highly transcribed genes, tissue-specific genes, genes expressed during early embryonic stages, or genes with evolutionarily conserved mRNA expression profiles.
103
Smith CL, Eppig JT. Expanding the mammalian phenotype ontology to support automated exchange of high throughput mouse phenotyping data generated by large-scale mouse knockout screens. J Biomed Semantics 2015; 6:11. [PMID: 25825651 PMCID: PMC4378007 DOI: 10.1186/s13326-015-0009-1] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 11/03/2014] [Accepted: 03/03/2015] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND: A vast array of data is about to emerge from the large scale high-throughput mouse knockout phenotyping projects worldwide. It is critical that this information is captured in a standardized manner, made accessible, and fully integrated with other phenotype data sets for comprehensive querying and analysis across all phenotype data types. The volume of data generated by the high-throughput phenotyping screens is expected to grow exponentially; thus, automated methods and standards to exchange phenotype data are required. RESULTS: The IMPC (International Mouse Phenotyping Consortium) is using the Mammalian Phenotype (MP) ontology in the automated annotation of phenodeviant data from high throughput phenotyping screens. 287 new term additions with additional hierarchy revisions were made in multiple branches of the MP ontology to accurately describe the results generated by these high throughput screens. CONCLUSIONS: Because these large scale phenotyping data sets will be reported using the MP as the common data standard for annotation and data exchange, automated importation of these data to MGI (Mouse Genome Informatics) and other resources is possible without curatorial effort. Maximum biomedical value of these mutant mice will come from integrating primary high-throughput phenotyping data with secondary, comprehensive phenotypic analyses combined with published phenotype details on these and related mutants at MGI and other resources.
Affiliation(s)
- Cynthia L Smith
  - Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609 USA
- Janan T Eppig
  - Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609 USA
104
Hoehndorf R, Gruenberger M, Gkoutos GV, Schofield PN. Similarity-based search of model organism, disease and drug effect phenotypes. J Biomed Semantics 2015; 6:6. [PMID: 25763178 PMCID: PMC4355138 DOI: 10.1186/s13326-015-0001-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/12/2014] [Accepted: 01/24/2015] [Indexed: 12/17/2022] Open
Abstract
Background: Semantic similarity measures over phenotype ontologies have been demonstrated to provide a powerful approach for the analysis of model organism phenotypes, the discovery of animal models of human disease, novel pathways, gene functions, druggable therapeutic targets, and determination of pathogenicity. Results: We have developed PhenomeNET 2, a system that enables similarity-based searches over a large repository of phenotypes in real-time. It can be used to identify strains of model organisms that are phenotypically similar to human patients, diseases that are phenotypically similar to model organism phenotypes, or drug effect profiles that are similar to the phenotypes observed in a patient or model organism. PhenomeNET 2 is available at http://aber-owl.net/phenomenet. Conclusions: Phenotype-similarity searches can provide a powerful tool for the discovery and investigation of molecular mechanisms underlying an observed phenotypic manifestation. PhenomeNET 2 facilitates user-defined similarity searches and allows researchers to analyze their data within a large repository of human, mouse and rat phenotypes.
Affiliation(s)
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia; Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900 Saudi Arabia
- Michael Gruenberger
  - Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB UK
- Georgios V Gkoutos
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG UK
- Paul N Schofield
  - Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB UK
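To make the idea of a similarity-based phenotype search in the abstract above more concrete, the following is a minimal sketch, not PhenomeNET 2's actual method (the abstract does not state which similarity measure it uses). It assumes a simple Jaccard index over ancestor-closed sets of phenotype terms. The toy hierarchy, the term names, and the functions `closure`, `jaccard` and `rank` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Toy is-a hierarchy (child -> parents). Term names are hypothetical and
# only stand in for real phenotype ontology classes.
PARENTS: Dict[str, Set[str]] = {
    "enlarged heart": {"abnormal heart"},
    "abnormal heart": {"abnormal cardiovascular system"},
    "abnormal blood vessel": {"abnormal cardiovascular system"},
    "abnormal cardiovascular system": {"phenotype"},
    "abnormal liver": {"phenotype"},
    "phenotype": set(),
}


def closure(terms: Set[str]) -> Set[str]:
    """Return the given terms together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of the ancestor-closed term sets (assumed measure)."""
    ca, cb = closure(a), closure(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def rank(query: Set[str], repository: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Order repository entries by decreasing similarity to the query."""
    scored = [(name, jaccard(query, terms)) for name, terms in repository.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    repo = {"model A": {"abnormal blood vessel"}, "model B": {"abnormal liver"}}
    for name, score in rank({"enlarged heart"}, repo):
        print(name, round(score, 2))
```

In this toy example a query of "enlarged heart" ranks "model A" above "model B", because the two cardiovascular terms share more ancestors than a cardiac and a hepatic term do.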
105
Ascensao JA, Dolan ME, Hill DP, Blake JA. Methodology for the inference of gene function from phenotype data. BMC Bioinformatics 2014; 15:405. [PMID: 25495798 PMCID: PMC4302099 DOI: 10.1186/s12859-014-0405-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/10/2014] [Accepted: 12/02/2014] [Indexed: 12/14/2022] Open
Abstract
Background: Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. Results: We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. Conclusions: We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.
Affiliation(s)
- Joao A Ascensao
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA; Rice University, 6100 Main Street, Houston, TX, USA
- Mary E Dolan
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA
- David P Hill
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA
- Judith A Blake
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA
106
Shimoyama M, De Pons J, Hayman GT, Laulederkind SJF, Liu W, Nigam R, Petri V, Smith JR, Tutaj M, Wang SJ, Worthey E, Dwinell M, Jacob H. The Rat Genome Database 2015: genomic, phenotypic and environmental variations and disease. Nucleic Acids Res 2014; 43:D743-50. [PMID: 25355511 PMCID: PMC4383884 DOI: 10.1093/nar/gku1026] [Citation(s) in RCA: 167] [Impact Index Per Article: 15.2] [Indexed: 01/12/2023] Open
Abstract
The Rat Genome Database (RGD, http://rgd.mcw.edu) provides the most comprehensive data repository and informatics platform related to the laboratory rat, one of the most important model organisms for disease studies. RGD maintains and updates datasets for genomic elements such as genes, transcripts and increasingly in recent years, sequence variations, as well as map positions for multiple assemblies and sequence information. Functional annotations for genomic elements are curated from published literature, submitted by researchers and integrated from other public resources. Complementing the genomic data catalogs are those associated with phenotypes and disease, including strains, QTL and experimental phenotype measurements across hundreds of strains. Data are submitted by researchers, acquired through bulk data pipelines or curated from published literature. Innovative software tools provide users with an integrated platform to query, mine, display and analyze valuable genomic and phenomic datasets for discovery and enhancement of their own research. This update highlights recent developments that reflect an increasing focus on: (i) genomic variation, (ii) phenotypes and diseases, (iii) data related to the environment and experimental conditions and (iv) datasets and software tools that allow the user to explore and analyze the interactions among these and their impact on disease.
Affiliation(s)
- Mary Shimoyama
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Department of Surgery, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jeff De Pons
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- G Thomas Hayman
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Weisong Liu
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Rajni Nigam
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Victoria Petri
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Jennifer R Smith
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Marek Tutaj
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Shur-Jen Wang
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Elizabeth Worthey
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Department of Pediatrics, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Melinda Dwinell
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
- Howard Jacob
  - Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI 53226, USA; Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
107
Eppig JT, Blake JA, Bult CJ, Kadin JA, Richardson JE. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease. Nucleic Acids Res 2014; 43:D726-36. [PMID: 25348401 PMCID: PMC4384027 DOI: 10.1093/nar/gku967] [Citation(s) in RCA: 293] [Impact Index Per Article: 26.6] [Indexed: 02/01/2023] Open
Abstract
The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse–human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human–Mouse: Disease Connection, allows users to explore gene–phenotype–disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community.
Affiliation(s)
- Janan T Eppig
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Judith A Blake
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Carol J Bult
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- James A Kadin
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
108
Mannil D, Vogt I, Prinz J, Campillos M. Organ system heterogeneity DB: a database for the visualization of phenotypes at the organ system level. Nucleic Acids Res 2014; 43:D900-6. [PMID: 25313158 PMCID: PMC4384019 DOI: 10.1093/nar/gku948] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022] Open
Abstract
Perturbations of mammalian organisms including diseases, drug treatments and gene perturbations in mice affect organ systems differently. Some perturbations impair relatively few organ systems while others lead to highly heterogeneous or systemic effects. Organ System Heterogeneity DB (http://mips.helmholtz-muenchen.de/Organ_System_Heterogeneity/) provides information on the phenotypic effects of 4865 human diseases, 1667 drugs and 5361 genetically modified mouse models on 26 different organ systems. Disease symptoms, drug side effects and mouse phenotypes are mapped to the System Organ Class (SOC) level of the Medical Dictionary of Regulatory Activities (MedDRA). Then, the organ system heterogeneity value, a measurement of the systemic impact of a perturbation, is calculated from the relative frequency of phenotypic features across all SOCs. For perturbations of interest, the database displays the distribution of phenotypic effects across organ systems along with the heterogeneity value and the distance between organ system distributions. In this way, it allows, in an easy and comprehensible fashion, the comparison of the phenotypic organ system distributions of diseases, drugs and their corresponding genetically modified mouse models of associated disease genes and drug targets. The Organ System Heterogeneity DB is thus a platform for the visualization and comparison of organ system level phenotypic effects of drugs, diseases and genes.
Affiliation(s)
- Deepthi Mannil
  - German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Ingo Vogt
  - German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Jeanette Prinz
  - German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Monica Campillos
  - German Center for Diabetes Research, Neuherberg 85764, Germany; Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany
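The abstract above describes a heterogeneity value derived from the relative frequency of phenotypic features across System Organ Classes (SOCs), and a distance between organ system distributions, but it does not give either formula. The sketch below is therefore only an illustration under stated assumptions: it uses normalized Shannon entropy as a stand-in for the heterogeneity value and total variation distance as a stand-in for the distribution distance. The function names and the example SOC labels are hypothetical; the default of 26 organ systems is taken from the abstract.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def soc_distribution(soc_labels: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of phenotypic features per organ system (SOC)."""
    counts = Counter(soc_labels)
    total = sum(counts.values())
    return {soc: n / total for soc, n in counts.items()}


def heterogeneity(dist: Dict[str, float], n_systems: int = 26) -> float:
    """Assumed measure: Shannon entropy scaled to [0, 1] by log(n_systems)."""
    entropy = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return entropy / math.log(n_systems)


def distance(d1: Dict[str, float], d2: Dict[str, float]) -> float:
    """Assumed measure: total variation distance between two distributions."""
    socs = set(d1) | set(d2)
    return 0.5 * sum(abs(d1.get(s, 0.0) - d2.get(s, 0.0)) for s in socs)


if __name__ == "__main__":
    # Hypothetical SOC labels for the features of a disease and a mouse model.
    disease = soc_distribution(["cardiac", "cardiac", "hepatic", "renal"])
    mouse = soc_distribution(["cardiac", "cardiac", "cardiac", "renal"])
    print(heterogeneity(disease), heterogeneity(mouse), distance(disease, mouse))
```

Under these assumptions a perturbation whose features all fall in one organ system scores 0, and one spread evenly over all 26 systems scores 1, which matches the abstract's contrast between localized and systemic effects.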
109
Text mining and network analysis of molecular interaction in non-small cell lung cancer by using natural language processing. Mol Biol Rep 2014; 41:8071-9. [PMID: 25205120 DOI: 10.1007/s11033-014-3705-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/18/2014] [Accepted: 08/23/2014] [Indexed: 01/21/2023]
Abstract
Lung cancer, including non-small cell lung cancer (NSCLC) and small cell lung cancer, is one of the most aggressive tumors, with high incidence and a low survival rate. NSCLC patients account for 80-85% of all lung cancer patients. To systematically explore the molecular mechanisms of NSCLC, we performed a molecular network analysis between human and mouse to identify key genes (pathways) involved in the occurrence of NSCLC. We automatically extracted the human-to-mouse orthologous interactions using the GeneWays system by natural language processing and further constructed molecular (gene and its products) networks by mapping the human-to-mouse interactions to NSCLC-related mammalian phenotypes, followed by module analysis using ClusterONE of Cytoscape and pathway enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID). A total of 70 genes were proven to be related to the mammalian phenotypes of NSCLC, and seven genes (ATAD5, BECN1, CDKN2A, FNTB, E2F1, KRAS and PTEN) were found to have a bearing on more than one mammalian phenotype (MP) each. Four network clusters were generated, centered on four genes: thyroglobulin (TG), neurofibromatosis type 1 (NF1), neurofibromatosis type 2 (NF2) and E2F transcription factor 1 (E2F1). Genes in the four network modules were enriched in eight KEGG pathways (p value < 0.05), including pathways in cancer, small cell lung cancer, cell cycle and p53 signaling pathway. Genes p53 and E2F1 may play important roles in NSCLC occurrence, and thus can be considered as therapeutic targets for NSCLC.
110
Smith CM, Finger JH, Kadin JA, Richardson JE, Ringwald M. The gene expression database for mouse development (GXD): putting developmental expression information at your fingertips. Dev Dyn 2014; 243:1176-86. [PMID: 24958384 DOI: 10.1002/dvdy.24155] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/12/2014] [Revised: 05/16/2014] [Accepted: 06/17/2014] [Indexed: 12/15/2022] Open
Abstract
Because molecular mechanisms of development are extraordinarily complex, the understanding of these processes requires the integration of pertinent research data. Using the Gene Expression Database for Mouse Development (GXD) as an example, we illustrate the progress made toward this goal, and discuss relevant issues that apply to developmental databases and developmental research in general. Since its first release in 1998, GXD has served the scientific community by integrating multiple types of expression data from publications and electronic submissions and by making these data freely and widely available. Focusing on endogenous gene expression in wild-type and mutant mice and covering data from RNA in situ hybridization, in situ reporter (knock-in), immunohistochemistry, reverse transcriptase-polymerase chain reaction, Northern blot, and Western blot experiments, the database has grown tremendously over the years in terms of data content and search utilities. Currently, GXD includes over 1.4 million annotated expression results and over 260,000 images. All these data and images are readily accessible to many types of database searches. Here we describe the data and search tools of GXD; explain how to use the database most effectively; discuss how we acquire, curate, and integrate developmental expression information; and describe how the research community can help in this process.
111
Hancock JM. Commentary on Shimoyama et al. (2012): three ontologies to define phenotype measurement data. Front Genet 2014; 5:93. [PMID: 24795755 PMCID: PMC4006037 DOI: 10.3389/fgene.2014.00093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/28/2014] [Accepted: 04/03/2014] [Indexed: 01/17/2023] Open
Affiliation(s)
- John M Hancock
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
112
Abstract
The use of model organisms as tools for the investigation of human genetic variation has significantly and rapidly advanced our understanding of the aetiologies underlying hereditary traits. However, while equivalences in the DNA sequence of two species may be readily inferred through evolutionary models, the identification of equivalence in the phenotypic consequences resulting from comparable genetic variation is far from straightforward, limiting the value of the modelling paradigm. In this review, we provide an overview of the emerging statistical and computational approaches to objectively identify phenotypic equivalence between human and model organisms with examples from the vertebrate models, mouse and zebrafish. Firstly, we discuss enrichment approaches, which deem the most frequent phenotype among the orthologues of a set of genes associated with a common human phenotype as the orthologous phenotype, or phenolog, in the model species. Secondly, we introduce and discuss computational reasoning approaches to identify phenotypic equivalences made possible through the development of intra- and interspecies ontologies. Finally, we consider the particular challenges involved in modelling neuropsychiatric disorders, which illustrate many of the remaining difficulties in developing comprehensive and unequivocal interspecies phenotype mappings.
Affiliation(s)
- Peter N. Robinson
  - Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Caleb Webber
  - MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
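The enrichment approach described in the abstract above can be sketched roughly as follows. This is an illustration only: the abstract does not name a specific statistical test, so the hypergeometric upper-tail probability used here is an assumption (one common choice for over-representation analysis). The gene identifiers, phenotype names, and the functions `hypergeom_sf` and `rank_phenologs` are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def rank_phenologs(
    orthologs: Set[str],
    model_annotations: Dict[str, Set[str]],
    n_genes: int,
) -> List[Tuple[str, int, float]]:
    """Rank model-organism phenotypes by enrichment among the orthologs.

    orthologs: model-organism orthologs of the genes linked to one human
    phenotype. model_annotations: phenotype -> annotated model genes.
    n_genes: size of the background gene set (assumed known).
    """
    results = []
    for phenotype, genes in model_annotations.items():
        overlap = len(orthologs & genes)
        p = hypergeom_sf(overlap, n_genes, len(genes), len(orthologs))
        results.append((phenotype, overlap, p))
    # Smallest p-value first: the top entry is the candidate phenolog.
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical toy data: three orthologs, two candidate mouse phenotypes.
    orthologs = {"g1", "g2", "g3"}
    annotations = {"small eye": {"g1", "g2", "g3", "g4"}, "short tail": {"g3", "g9"}}
    for row in rank_phenologs(orthologs, annotations, n_genes=100):
        print(row)
```

A real analysis would also correct for testing many phenotypes at once; that step is omitted here to keep the sketch short.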
113
Singleton M, Guthery S, Voelkerding K, Chen K, Kennedy B, Margraf R, Durtschi J, Eilbeck K, Reese M, Jorde L, Huff C, Yandell M. Phevor combines multiple biomedical ontologies for accurate identification of disease-causing alleles in single individuals and small nuclear families. Am J Hum Genet 2014; 94:599-610. [PMID: 24702956 DOI: 10.1016/j.ajhg.2014.03.010] [Citation(s) in RCA: 137] [Impact Index Per Article: 12.5] [Received: 01/20/2014] [Accepted: 03/13/2014] [Indexed: 10/25/2022] Open
Abstract
Phevor integrates phenotype, gene function, and disease information with personal genomic data for improved power to identify disease-causing alleles. Phevor works by combining knowledge resident in multiple biomedical ontologies with the outputs of variant-prioritization tools. It does so by using an algorithm that propagates information across and between ontologies. This process enables Phevor to accurately reprioritize potentially damaging alleles identified by variant-prioritization tools in light of gene function, disease, and phenotype knowledge. Phevor is especially useful for single-exome and family-trio-based diagnostic analyses, the most commonly occurring clinical scenarios and ones for which existing personal genome diagnostic tools are most inaccurate and underpowered. Here, we present a series of benchmark analyses illustrating Phevor's performance characteristics. Also presented are three recent Utah Genome Project case studies in which Phevor was used to identify disease-causing alleles. Collectively, these results show that Phevor improves diagnostic accuracy not only for individuals presenting with established disease phenotypes but also for those with previously undescribed and atypical disease presentations. Importantly, Phevor is not limited to known diseases or known disease-causing alleles. As we demonstrate, Phevor can also use latent information in ontologies to discover genes and disease-causing alleles not previously associated with disease.
114
Hoehndorf R, Haendel M, Stevens R, Rebholz-Schuhmann D. Thematic series on biomedical ontologies in JBMS: challenges and new directions. J Biomed Semantics 2014; 5:15. [PMID: 24602198 PMCID: PMC4006457 DOI: 10.1186/2041-1480-5-15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/08/2014] [Accepted: 02/09/2014] [Indexed: 01/08/2023] Open
Abstract
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Affiliation(s)
- Robert Hoehndorf
  - Department of Computer Science, Aberystwyth University, Llandinam Building, SY23 3DB Aberystwyth, UK
- Melissa Haendel
  - OHSU Library and Department of Medical Informatics, Portland, Oregon, USA
  - Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Robert Stevens
  - School of Computer Science, The University of Manchester, Oxford Road, M13 9PL Manchester, UK
- Dietrich Rebholz-Schuhmann
  - Department of Computational Linguistics, University of Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
115
InterMOD: integrated data and tools for the unification of model organism research. Sci Rep 2013; 3:1802. [PMID: 23652793 PMCID: PMC3647165 DOI: 10.1038/srep01802] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 11/14/2012] [Accepted: 04/05/2013] [Indexed: 11/26/2022] Open
Abstract
Model organisms are widely used for understanding basic biology, and have significantly contributed to the study of human disease. In recent years, genomic analysis has provided extensive evidence of widespread conservation of gene sequence and function amongst eukaryotes, allowing insights from model organisms to help decipher gene function in a wider range of species. The InterMOD consortium is developing an infrastructure based around the InterMine data warehouse system to integrate genomic and functional data from a number of key model organisms, leading the way to improved cross-species research. So far including budding yeast, nematode worm, fruit fly, zebrafish, rat and mouse, the project has set up data warehouses, synchronized data models, and created analysis tools and links between data from different species. The project unites a number of major model organism databases, improving both the consistency and accessibility of comparative research, to the benefit of the wider scientific community.
|
116
|
Ahmed MM, Dhanasekaran AR, Block A, Tong S, Costa ACS, Gardiner KJ. Protein profiles associated with context fear conditioning and their modulation by memantine. Mol Cell Proteomics 2014; 13:919-37. [PMID: 24469516 DOI: 10.1074/mcp.m113.035568] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/19/2022] Open
Abstract
Analysis of the molecular basis of learning and memory has revealed details of the roles played by many genes and the proteins they encode. Because most individual studies focus on a small number of proteins, many complexities of the relationships among proteins and their dynamic responses to stimulation are not known. We have used the technique of reverse phase protein arrays (RPPA) to assess the levels of more than 80 proteins/protein modifications in subcellular fractions from hippocampus and cortex of mice trained in Context Fear Conditioning (CFC). Proteins include components of signaling pathways, several encoded by immediate early genes or involved in apoptosis and inflammation, and subunits of glutamate receptors. At one hour after training, levels of more than half the proteins had changed in one or more fractions, among them multiple components of the Mitogen-activated protein kinase, MAPK, and Mechanistic Target of Rapamycin, MTOR, pathways, subunits of glutamate receptors, and the NOTCH pathway modulator, NUMB homolog (Drosophila). Levels of 37 proteins changed in the nuclear fraction of hippocampus alone. Abnormalities in levels of thirteen proteins analyzed have been reported in brains of patients with Alzheimer's Disease. We therefore further investigated the protein profiles of mice treated with memantine, a drug approved for treatment of AD. In hippocampus, memantine alone induced many changes similar to those seen after CFC and altered the levels of seven proteins associated with Alzheimer's Disease abnormalities. Lastly, to further explore the relevance of these datasets, we superimposed responses to CFC and memantine onto components of the long term potentiation pathway, a process subserving learning and memory formation. Fourteen components of the long term potentiation pathway and 26 proteins interacting with components responded to CFC and/or memantine. Together, these datasets provide a novel view of the diversity and complexity in protein responses and interactions following normal learning.
|
117
|
Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic Acids Res 2013; 42:D810-7. [PMID: 24285300 PMCID: PMC3964950 DOI: 10.1093/nar/gkt1225] [Citation(s) in RCA: 171] [Impact Index Per Article: 14.3] [Indexed: 01/08/2023] Open
Abstract
The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is the community model organism database resource for the laboratory mouse, a premier animal model for the study of genetic and genomic systems relevant to human biology and disease. MGD maintains a comprehensive catalog of genes, functional RNAs and other genome features as well as heritable phenotypes and quantitative trait loci. The genome feature catalog is generated by the integration of computational and manual genome annotations generated by NCBI, Ensembl and Vega/HAVANA. MGD curates and maintains the comprehensive listing of functional annotations for mouse genes using the Gene Ontology, and MGD curates and integrates comprehensive phenotype annotations including associations of mouse models with human diseases. Recent improvements include integration of the latest mouse genome build (GRCm38), improved access to comparative and functional annotations for mouse genes with expanded representation of comparative vertebrate genomes and new loads of phenotype data from high-throughput phenotyping projects. All MGD resources are freely available to the research community.
Affiliation(s)
- Judith A Blake
- Bioinformatics and Computational Biology, The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
|
118
|
Abstract
The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements.
Affiliation(s)
- Stephen C Grubb
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 USA
|
119
|
Smith CM, Finger JH, Hayamizu TF, McCright IJ, Xu J, Berghout J, Campbell J, Corbani LE, Forthofer KL, Frost PJ, Miers D, Shaw DR, Stone KR, Eppig JT, Kadin JA, Richardson JE, Ringwald M. The mouse Gene Expression Database (GXD): 2014 update. Nucleic Acids Res 2013; 42:D818-24. [PMID: 24163257 PMCID: PMC3965015 DOI: 10.1093/nar/gkt954] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 11/13/2022] Open
Abstract
The Gene Expression Database (GXD; http://www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental expression information. GXD collects different types of expression data from studies of wild-type and mutant mice, covering all developmental stages and including data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments. The data are acquired from the scientific literature and from researchers, including groups doing large-scale expression studies. Integration with the other data in Mouse Genome Informatics (MGI) and interconnections with other databases places GXD's gene expression information in the larger biological and biomedical context. Since the last report, the utility of GXD has been greatly enhanced by the addition of new data and by the implementation of more powerful and versatile search and display features. Web interface enhancements include the capability to search for expression data for genes associated with specific phenotypes and/or human diseases; new, more interactive data summaries; easy downloading of data; direct searches of expression images via associated metadata; and new displays that combine image data and their associated annotations. At present, GXD includes >1.4 million expression results and 250,000 images that are accessible to our search tools.
|
120
|
Osumi-Sutherland D, Marygold SJ, Millburn GH, McQuilton PA, Ponting L, Stefancsik R, Falls K, Brown NH, Gkoutos GV. The Drosophila phenotype ontology. J Biomed Semantics 2013; 4:30. [PMID: 24138933 PMCID: PMC3816596 DOI: 10.1186/2041-1480-4-30] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 07/01/2013] [Accepted: 10/11/2013] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND: Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS: We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS: The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
Affiliation(s)
- Steven J Marygold
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Gillian H Millburn
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Peter A McQuilton
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Laura Ponting
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Raymund Stefancsik
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Kathleen Falls
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA, USA
- Nicholas H Brown
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Gurdon Institute & Department of Physiology, Development and Neuroscience, University of Cambridge, Tennis Court Road, Cambridge, UK
- Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
|
121
|
Roncaglia P, Martone ME, Hill DP, Berardini TZ, Foulger RE, Imam FT, Drabkin H, Mungall CJ, Lomax J. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments. J Biomed Semantics 2013; 4:20. [PMID: 24093723 PMCID: PMC3852282 DOI: 10.1186/2041-1480-4-20] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 06/21/2013] [Accepted: 09/24/2013] [Indexed: 12/31/2022] Open
Abstract
Background: The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description: Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions: In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.
Affiliation(s)
- Paola Roncaglia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
|
122
|
Brinkley JF, Borromeo C, Clarkson M, Cox TC, Cunningham MJ, Detwiler LT, Heike CL, Hochheiser H, Mejino JLV, Travillian RS, Shapiro LG. The ontology of craniofacial development and malformation for translational craniofacial research. Am J Med Genet C Semin Med Genet 2013; 163C:232-45. [PMID: 24124010 DOI: 10.1002/ajmg.c.31377] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 01/04/2023]
Abstract
We introduce the Ontology of Craniofacial Development and Malformation (OCDM) as a mechanism for representing knowledge about craniofacial development and malformation, and for using that knowledge to facilitate integrating craniofacial data obtained via multiple techniques from multiple labs and at multiple levels of granularity. The OCDM is a project of the NIDCR-sponsored FaceBase Consortium, whose goal is to promote and enable research into the genetic and epigenetic causes of specific craniofacial abnormalities through the provision of publicly accessible, integrated craniofacial data. However, the OCDM should be usable for integrating any web-accessible craniofacial data, not just those data available through FaceBase. The OCDM is based on the Foundational Model of Anatomy (FMA), our comprehensive ontology of canonical human adult anatomy, and includes modules to represent adult and developmental craniofacial anatomy in both human and mouse, mappings between homologous structures in human and mouse, and associated malformations. We describe these modules, as well as prototype uses of the OCDM for integrating craniofacial data. By using the terms from the OCDM to annotate data, and by combining queries over the ontology with those over annotated data, it becomes possible to create "intelligent" queries that can, for example, find gene expression data obtained from mouse structures that are precursors to homologous human structures involved in malformations such as cleft lip. We suggest that the OCDM can be useful not only for integrating craniofacial data, but also for expressing new knowledge gained from analyzing the integrated data.
|
123
|
Tassy O, Pourquié O. Manteia, a predictive data mining system for vertebrate genes and its applications to human genetic diseases. Nucleic Acids Res 2013; 42:D882-91. [PMID: 24038354 PMCID: PMC3964984 DOI: 10.1093/nar/gkt807] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022] Open
Abstract
The function of genes is often evolutionarily conserved, and comparing the annotation of ortholog genes in different model organisms has proved to be a powerful predictive tool to identify the function of human genes. Here, we describe Manteia, a resource available online at http://manteia.igbmc.fr. Manteia allows the comparison of embryological, expression, molecular and etiological data from human, mouse, chicken and zebrafish simultaneously to identify new functional and structural correlations and gene-disease associations. Manteia is particularly useful for the analysis of gene lists produced by high-throughput techniques such as microarrays or proteomics. Data can be easily analyzed statistically to characterize the function of groups of genes and to correlate the different aspects of their annotation. Sophisticated querying tools provide unlimited ways to merge the information contained in Manteia along with the possibility of introducing custom user-designed biological questions into the system. This allows for example to connect all the animal experimental results and annotations to the human genome, and take advantage of data not available for human to look for candidate genes responsible for genetic disorders. Here, we demonstrate the predictive and analytical power of the system to predict candidate genes responsible for human genetic diseases.
Affiliation(s)
- Olivier Tassy
- Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), CNRS (UMR 7104), Inserm U964, Université de Strasbourg, Illkirch. F-67400, France, Stowers Institute for Medical Research, Kansas City, MO 64110, USA and Howard Hughes Medical Institute, Kansas City, MO 64110, USA
|
124
|
Makita Y, Kobayashi N, Yoshida Y, Doi K, Mochizuki Y, Nishikata K, Matsushima A, Takahashi S, Ishii M, Takatsuki T, Bhatia R, Khadbaatar Z, Watabe H, Masuya H, Toyoda T. PosMed: Ranking genes and bioresources based on Semantic Web Association Study. Nucleic Acids Res 2013; 41:W109-14. [PMID: 23761449 PMCID: PMC3692089 DOI: 10.1093/nar/gkt474] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/11/2022] Open
Abstract
Positional MEDLINE (PosMed; http://biolod.org/PosMed) is a powerful Semantic Web Association Study engine that ranks biomedical resources such as genes, metabolites, diseases and drugs, based on the statistical significance of associations between user-specified phenotypic keywords and resources connected directly or inferentially through a Semantic Web of biological databases such as MEDLINE, OMIM, pathways, co-expressions, molecular interactions and ontology terms. Since 2005, PosMed has long been used for in silico positional cloning studies to infer candidate disease-responsible genes existing within chromosomal intervals. PosMed is redesigned as a workbench to discover possible functional interpretations for numerous genetic variants found from exome sequencing of human disease samples. We also show that the association search engine enhances the value of mouse bioresources because most knockout mouse resources have no phenotypic annotation, but can be associated inferentially to phenotypes via genes and biomedical documents. For this purpose, we established text-mining rules to the biomedical documents by careful human curation work, and created a huge amount of correct linking between genes and documents. PosMed associates any phenotypic keyword to mouse resources with 20 public databases and four original data sets as of May 2013.
Affiliation(s)
- Yuko Makita
- Bioinformatics and Systems Engineering Division, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
125
|
Smedley D, Oellrich A, Köhler S, Ruef B, Westerfield M, Robinson P, Lewis S, Mungall C. PhenoDigm: analyzing curated annotations to associate animal models with human diseases. Database (Oxford) 2013; 2013:bat025. [PMID: 23660285 PMCID: PMC3649640 DOI: 10.1093/database/bat025] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.0] [Indexed: 11/12/2022]
Abstract
The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data become available, only systematic analysis will facilitate valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene-disease associations. Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene-disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm's web interface and illustrate its usefulness in supporting research. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm
Affiliation(s)
- Damian Smedley
- Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
|
126
|
Köhler S, Doelken SC, Ruef BJ, Bauer S, Washington N, Westerfield M, Gkoutos G, Schofield P, Smedley D, Lewis SE, Robinson PN, Mungall CJ. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Res 2013; 2:30. [PMID: 24358873 DOI: 10.12688/f1000research.2-30.v1] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Accepted: 01/22/2013] [Indexed: 12/30/2022] Open
Abstract
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Barbara J Ruef
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- Sebastian Bauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Monte Westerfield
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- George Gkoutos
- Department of Computer Science, University of Aberystwyth, Aberystwyth, SY23 2AX, UK
- Paul Schofield
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Damian Smedley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Berkeley CA, 94720, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
|
127
|
Köhler S, Doelken SC, Ruef BJ, Bauer S, Washington N, Westerfield M, Gkoutos G, Schofield P, Smedley D, Lewis SE, Robinson PN, Mungall CJ. Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Res 2013; 2:30. [PMID: 24358873 PMCID: PMC3799545 DOI: 10.12688/f1000research.2-30.v2] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Accepted: 01/20/2014] [Indexed: 12/11/2022] Open
Abstract
Phenotype analyses, e.g. investigating metabolic processes, tissue formation, or organism behavior, are an important element of most biological and medical research activities. Biomedical researchers are making increased use of ontological standards and methods to capture the results of such analyses, with one focus being the comparison and analysis of phenotype information between species. We have generated a cross-species phenotype ontology for human, mouse and zebrafish that contains classes from the Human Phenotype Ontology, Mammalian Phenotype Ontology, and generated classes for zebrafish phenotypes. We also provide up-to-date annotation data connecting human genes to phenotype classes from the generated ontology. We have included the data generation pipeline into our continuous integration system ensuring stable and up-to-date releases. This article describes the data generation process and is intended to help interested researchers access both the phenotype annotation data and the associated cross-species phenotype ontology. The resource described here can be used in sophisticated semantic similarity and gene set enrichment analyses for phenotype data across species. The stable releases of this resource can be obtained from http://purl.obolibrary.org/obo/hp/uberpheno/.
Affiliation(s)
- Sebastian Köhler
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Sandra C Doelken
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Barbara J Ruef
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- Sebastian Bauer
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany
- Monte Westerfield
- ZFIN, Institute of Neuroscience, University of Oregon, Eugene OR, 97403-5291, USA
- George Gkoutos
- Department of Computer Science, University of Aberystwyth, Aberystwyth, SY23 2AX, UK
- Paul Schofield
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Damian Smedley
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, Berkeley CA, 94720, USA
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Berlin-Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Berlin, 13353, Germany; Max Planck Institute for Molecular Genetics, Berlin, 14195, Germany
|