Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018;19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]

Cited by Other Article(s)

Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024;227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024]

Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Frontiers in Plant Science 2023;14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]

Clarke JL, Cooper LD, Poelchau MF, Berardini TZ, Elser J, Farmer AD, Ficklin S, Kumari S, Laporte MA, Nelson RT, Sadohara R, Selby P, Thessen AE, Whitehead B, Sen TZ. Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium. Database (Oxford) 2023;2023:baad076. [PMID: 37971715 PMCID: PMC10653126 DOI: 10.1093/database/baad076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]

Abstract

Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.

Affiliation(s)

Jennifer L Clarke Department of Statistics and Department of Food Science and Technology, University of Nebraska–Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
Laurel D Cooper Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
Monica F Poelchau USDA, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Ave, Beltsville 20705, USA
Tanya Z Berardini The Arabidopsis Information Resource and Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA, USA
Justin Elser Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
Andrew D Farmer National Center for Genome Resources, 2935 Rodeo Park Dr. E., Santa Fe, NM 87505, USA
Stephen Ficklin Department of Horticulture, Washington State University, 249 Clark Hall, PO Box 646414, Pullman, WA 99164, USA
Sunita Kumari Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
Marie-Angélique Laporte Digital Inclusion, Bioversity International, Parc Scientifique Agropolis II, 1990 Bd de la Lironde, Montpellier 34397, France
Rex T Nelson USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Iowa State University, 716 Farmhouse Lane, Ames, IA 50011, USA
Rie Sadohara Department of Plant, Soil, and Microbial Sciences, Michigan State University, 1066 Bogue St, East Lansing, MI 48824, USA
Peter Selby School of Integrative Plant Science, College of Agriculture and Life Sciences, Cornell University, 215 Garden Avenue, Ithaca, NY 14850, USA
Anne E Thessen Department of Biomedical Informatics, University of Colorado Anschutz, 1890 N. Revere Court, Mailstop F600, Aurora CO 80045, USA
Brandon Whitehead Data Science and Informatics, Manaaki Whenua—Landcare Research, Ltd., Riddet Road, Massey University, Palmerston North 4472, New Zealand
Taner Z Sen USDA, Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany 94710, USA; Department of Bioengineering, University of California, 306 Stanley Hall, Berkeley, CA 94720, USA

Huttenhower C, Finn RD, McHardy AC. Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol 2023;8:1960-1970. [PMID: 37783751 DOI: 10.1038/s41564-023-01484-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/16/2021] [Accepted: 08/28/2023] [Indexed: 10/04/2023]

Ambalavanan R, Snead RS, Marczika J, Kozinsky K, Aman E. Advancing the Management of Long COVID by Integrating into Health Informatics Domain: Current and Future Perspectives. International Journal of Environmental Research and Public Health 2023;20:6836. [PMID: 37835106 PMCID: PMC10572294 DOI: 10.3390/ijerph20196836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/20/2023] [Accepted: 09/22/2023] [Indexed: 10/15/2023]

Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023;34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]

Affiliation(s)

Ray Stefancsik European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
James P Balhoff Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
Meghan A Balk Natural History Museum, University of Oslo, Oslo, Norway
Robyn L Ball The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Susan M Bello The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Anita R Caron European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Elissa J Chesler The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Vinicius de Souza European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sarah Gehrke Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Melissa Haendel Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Laura W Harris European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Nomi L Harris Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Arwa Ibrahim European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sebastian Koehler Ada Health GmbH, Berlin, Germany
Nicolas Matentzoglu Semanticly, Athens, Greece
Julie A McMurry Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Christopher J Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Monica C Munoz-Torres Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Tim Putman Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Peter Robinson The Jackson Laboratory, Bar Harbor, ME, 04609, USA
Damian Smedley William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
Elliot Sollis European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Anne E Thessen Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
Nicole Vasilevsky Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
David O Walton The Jackson Laboratory, Bar Harbor, ME, 04609, USA
David Osumi-Sutherland European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK

Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023;14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023]

Ruberte J, Schofield PN, Sundberg JP, Rodriguez-Baeza A, Carretero A, McKerlie C. Bridging mouse and human anatomies; a knowledge-based approach to comparative anatomy for disease model phenotyping. Mamm Genome 2023:10.1007/s00335-023-10005-4. [PMID: 37421464 PMCID: PMC10382392 DOI: 10.1007/s00335-023-10005-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 06/13/2023] [Indexed: 07/10/2023]

Thessen AE, Cooper L, Swetnam TL, Hegde H, Reese J, Elser J, Jaiswal P. Using knowledge graphs to infer gene expression in plants. Front Artif Intell 2023;6:1201002. [PMID: 37384147 PMCID: PMC10298150 DOI: 10.3389/frai.2023.1201002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 05/23/2023] [Indexed: 06/30/2023]

Abstract

Introduction

Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically-supported knowledge bases that exploit homologous structures and homologous genes. These types of structures that can apply knowledge about one species to another have the potential to enable the massive scaling up that is needed through in silico experimentation.

Methods

We developed one such structure, a knowledge graph (KG) using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions.

Results

A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways.

Discussion

This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis and trans-regulatory components in the curated and inferred knowledge graph.
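
The graph query summarized in the Results of this abstract can be illustrated with a minimal Python sketch. Everything in it is assumed for illustration: the gene identifiers, the homolog edges, and the "up"/"down" labels are hypothetical and are not taken from Planteome or the Expression Atlas. It only shows the general shape of such a query, which is to walk homology edges and compare the direction of the drought response on each side.

```python
# Hypothetical toy knowledge graph; all identifiers and values are made up.
homolog_edges = [
    ("ath:geneA", "ptr:geneA"),
    ("ath:geneB", "ptr:geneB"),
    ("ath:geneC", "ptr:geneC"),
]

# Direction of expression change under drought (assumed labels: "up" or "down").
drought_response = {
    "ath:geneA": "up", "ptr:geneA": "up",
    "ath:geneB": "up", "ptr:geneB": "down",
    "ath:geneC": "down", "ptr:geneC": "down",
}


def split_homolog_pairs(edges, response):
    """Split homolog pairs by whether their drought responses agree."""
    same, opposite = [], []
    for a, b in edges:
        if a not in response or b not in response:
            continue  # skip pairs lacking expression data for either gene
        if response[a] == response[b]:
            same.append((a, b))
        else:
            opposite.append((a, b))
    return same, opposite


same, opposite = split_homolog_pairs(homolog_edges, drought_response)
print("same direction:", same)
print("opposite direction:", opposite)
```

In a real knowledge graph the same question would more likely be posed in a graph query language; the plain lists and dictionaries here are only a stand-in.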

Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, Feuermann M, Gaudet P, Harris NL, Hill DP, Lee R, Mi H, Moxon S, Mungall CJ, Muruganugan A, Mushayahama T, Sternberg PW, Thomas PD, Van Auken K, Ramsey J, Siegele DA, Chisholm RL, Fey P, Aspromonte MC, Nugnes MV, Quaglia F, Tosatto S, Giglio M, Nadendla S, Antonazzo G, Attrill H, Dos Santos G, Marygold S, Strelets V, Tabone CJ, Thurmond J, Zhou P, Ahmed SH, Asanitthong P, Luna Buitrago D, Erdol MN, Gage MC, Ali Kadhum M, Li KYC, Long M, Michalak A, Pesala A, Pritazahra A, Saverimuttu SCC, Su R, Thurlow KE, Lovering RC, Logie C, Oliferenko S, Blake J, Christie K, Corbani L, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov D, Smith C, Cuzick A, Seager J, Cooper L, Elser J, Jaiswal P, Gupta P, Jaiswal P, Naithani S, Lera-Ramirez M, Rutherford K, Wood V, De Pons JL, Dwinell MR, Hayman GT, Kaldunski ML, Kwitek AE, Laulederkind SJF, Tutaj MA, Vedi M, Wang SJ, D'Eustachio P, Aimo L, Axelsen K, Bridge A, Hyka-Nouspikel N, Morgat A, Aleksander SA, Cherry JM, Engel SR, Karra K, Miyasato SR, Nash RS, Skrzypek MS, Weng S, Wong ED, Bakker E, Berardini TZ, Reiser L, Auchincloss A, Axelsen K, Argoud-Puy G, Blatter MC, Boutet E, Breuza L, Bridge A, Casals-Casas C, Coudert E, Estreicher A, Livia Famiglietti M, Feuermann M, Gos A, Gruaz-Gumowski N, Hulo C, Hyka-Nouspikel N, Jungo F, Le Mercier P, Lieberherr D, Masson P, Morgat A, Pedruzzi I, Pourcel L, Poux S, Rivoire C, Sundaram S, Bateman A, Bowler-Barnett E, Bye-A-Jee H, Denny P, Ignatchenko A, Ishtiaq R, Lock A, Lussi Y, Magrane M, Martin MJ, Orchard S, Raposo P, Speretta E, Tyagi N, Warner K, Zaru R, Diehl AD, Lee R, Chan J, Diamantakis S, Raciti D, Zarowiecki M, Fisher M, James-Zorn C, Ponferrada V, Zorn A, Ramachandran S, Ruzicka L, Westerfield M. The Gene Ontology knowledgebase in 2023. Genetics 2023;224:iyad031. [PMID: 36866529 PMCID: PMC10158837 DOI: 10.1093/genetics/iyad031] [Citation(s) in RCA: 264] [Impact Index Per Article: 264.0] [Received: 12/13/2022] [Revised: 02/10/2023] [Accepted: 02/11/2023] [Indexed: 03/04/2023]

Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. bioRxiv 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]

Affiliation(s)

Ray Stefancsik European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
James P. Balhoff Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
Meghan A. Balk National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
Robyn Ball The Jackson Laboratory, Bar Harbor, ME 04609, USA
Susan M. Bello The Jackson Laboratory, Bar Harbor, ME 04609, USA
Anita R. Caron European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Elissa Chessler The Jackson Laboratory, Bar Harbor, ME 04609, USA
Vinicius de Souza European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sarah Gehrke Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Melissa Haendel Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Laura W. Harris European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Nomi L. Harris Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Arwa Ibrahim European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Sebastian Koehler Ada Health GmbH, Berlin, Germany
Nicolas Matentzoglu Semanticly Ltd., Athens, Greece
Julie A. McMurry Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Christopher J. Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Monica C. Munoz-Torres Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Tim Putman Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Peter Robinson The Jackson Laboratory, Bar Harbor, ME 04609, USA
Damian Smedley William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
Elliot Sollis European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
Anne E Thessen Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
Nicole Vasilevsky Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
David O. Walton The Jackson Laboratory, Bar Harbor, ME 04609, USA
David Osumi-Sutherland European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK

Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022;15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022]

Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022;23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]

Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021;12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022]

Abstract

BACKGROUND

The size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing amount of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts.

RESULTS

Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data. The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex.

CONCLUSIONS

We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
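
The difference between the two representations compared in this abstract can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Semantic Phenotype or Phenotype Knowledge Graph models themselves: the `ex:` identifiers, the tuple encoding of class expressions, and both helper functions are assumptions made for the example. The point it tries to show is that in the instance-based form every part has its own identifier and can be looked up directly, whereas in the class-based form the parts exist only inside a nested expression that has to be traversed.

```python
# Instance-based (ABox-style) sketch: each part and quality is a node
# with its own identifier. All identifiers are hypothetical.
abox_triples = [
    ("ex:specimen1", "rdf:type", "ex:Organism"),
    ("ex:head1", "rdf:type", "ex:Head"),
    ("ex:head1", "ex:part_of", "ex:specimen1"),
    ("ex:eye1", "rdf:type", "ex:Eye"),
    ("ex:eye1", "ex:part_of", "ex:head1"),
    ("ex:red1", "rdf:type", "ex:Red"),
    ("ex:eye1", "ex:has_quality", "ex:red1"),
]

# Class-based (TBox-style) sketch: one named class defined by a nested
# expression. The head and the eye appear only as anonymous restrictions.
tbox_definitions = {
    "ex:Specimen1Phenotype": (
        "some", "ex:has_part", ("and", "ex:Head", (
            "some", "ex:has_part", ("and", "ex:Eye", (
                "some", "ex:has_quality", "ex:Red")))))
}


def instances_of(triples, cls):
    """Return the identifiers of individuals typed with cls."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


def classes_in(expr):
    """Walk a nested class expression and collect the class names in it."""
    if isinstance(expr, str):
        return {expr}
    op = expr[0]
    if op == "some":  # ("some", property, filler)
        return classes_in(expr[2])
    if op == "and":   # ("and", operand, operand, ...)
        found = set()
        for operand in expr[1:]:
            found |= classes_in(operand)
        return found
    raise ValueError(f"unknown operator: {op}")


# The eye is directly addressable in the instance-based graph.
print(instances_of(abox_triples, "ex:Eye"))
# In the class-based form we can only report that an Eye class is mentioned.
print(sorted(classes_in(tbox_definitions["ex:Specimen1Phenotype"])))
```

A production system would use RDF and OWL tooling rather than tuples; the plain data structures are used here only to keep the contrast visible.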

Filice RW, Kahn CE. Biomedical Ontologies to Guide AI Development in Radiology. J Digit Imaging 2021;34:1331-1341. [PMID: 34724143 PMCID: PMC8669056 DOI: 10.1007/s10278-021-00527-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/29/2021] [Revised: 04/27/2021] [Accepted: 10/13/2021] [Indexed: 10/25/2022]

Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, Curcin V. Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021;10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022]

Kafkas Ş, Althubaiti S, Gkoutos GV, Hoehndorf R, Schofield PN. Linking common human diseases to their phenotypes; development of a resource for human phenomics. J Biomed Semantics 2021;12:17. [PMID: 34425897 PMCID: PMC8383460 DOI: 10.1186/s13326-021-00249-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 07/30/2021] [Indexed: 11/11/2022]

Abstract

Background

In recent years a large volume of clinical genomics data has become available due to rapid advances in sequencing technologies. Efficient exploitation of this genomics data requires linkage to patient phenotype profiles. Current resources providing disease-phenotype associations are not comprehensive, and they often do not have broad coverage of the disease terminologies, particularly ICD-10, which is still the primary terminology used in clinical settings.

Methods

We developed two approaches to gather disease-phenotype associations. First, we used a text mining method that utilizes semantic relations in phenotype ontologies, and applies statistical methods to extract associations between diseases in ICD-10 and phenotype ontology classes from the literature. Second, we developed a semi-automatic way to collect ICD-10–phenotype associations from existing resources containing known relationships.

Results

We generated four datasets. Two of them are independent datasets linking diseases to their phenotypes based on text mining and semi-automatic strategies. The remaining two datasets are generated from these datasets and cover a subset of ICD-10 classes of common diseases contained in UK Biobank. We extensively validated our text mined and semi-automatically curated datasets by: comparing them against an expert-curated validation dataset containing disease–phenotype associations, measuring their similarity to disease–phenotype associations found in public databases, and assessing how well they could be used to recover gene–disease associations using phenotype similarity.

Conclusion

We find that our text mining method can produce phenotype annotations of diseases that are correct but often too general to have significant information content, or too specific to accurately reflect the typical manifestations of the sporadic disease. On the other hand, the datasets generated from integrating multiple knowledgebases are more complete (i.e., cover more of the required phenotype annotations for a given disease). We make all data freely available at 10.5281/zenodo.4726713.
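
The last validation step in this abstract, recovering gene–disease associations using phenotype similarity, can be sketched as follows. The abstract does not say which similarity measure was used, so this example substitutes a simple Jaccard index over ancestor-closed term sets as an assumed stand-in. The toy ontology, the `P:` identifiers, and the gene names are all hypothetical.

```python
# Hypothetical toy phenotype ontology (child -> parents); not real HPO/MP terms.
parents = {
    "P:root": [],
    "P:eye": ["P:root"],
    "P:heart": ["P:root"],
    "P:cataract": ["P:eye"],
    "P:uveitis": ["P:eye"],
    "P:arrhythmia": ["P:heart"],
}


def with_ancestors(terms):
    """Close a set of terms under the is-a hierarchy."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents[term])
    return closed


def jaccard(a, b):
    """Jaccard index of two annotation sets after adding ancestors."""
    a, b = with_ancestors(a), with_ancestors(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Assumed annotations: one disease profile and two candidate genes.
disease_phenotypes = {"P:uveitis"}
gene_phenotypes = {
    "geneX": {"P:cataract"},
    "geneY": {"P:arrhythmia"},
}

# Rank candidate genes by similarity to the disease profile.
ranked = sorted(
    gene_phenotypes,
    key=lambda g: jaccard(disease_phenotypes, gene_phenotypes[g]),
    reverse=True,
)
print(ranked)
```

Adding ancestors before comparing is what lets two different eye phenotypes count as partially similar. It also hints at why very general annotations carry little information: a term near the root overlaps with almost everything.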

Supplementary Information

The online version contains supplementary material available at (10.1186/s13326-021-00249-x).

Pendleton SC, Slater K, Karwath A, Gilbert RM, Davis N, Pesudovs K, Liu X, Denniston AK, Gkoutos GV, Braithwaite T. Development and application of the ocular immune-mediated inflammatory diseases ontology enhanced with synonyms from online patient support forum conversation. Comput Biol Med 2021;135:104542. [PMID: 34139439 PMCID: PMC8404035 DOI: 10.1016/j.compbiomed.2021.104542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 05/27/2021] [Accepted: 05/30/2021] [Indexed: 11/28/2022]

Abstract

BACKGROUND

Unstructured text created by patients represents a rich, but relatively inaccessible resource for advancing patient-centred care. This study aimed to develop an ontology for ocular immune-mediated inflammatory diseases (OcIMIDo), as a tool to facilitate data extraction and analysis, illustrating its application to online patient support forum data.

METHODS

We developed OcIMIDo using clinical guidelines, domain expertise, and cross-references to classes from other biomedical ontologies. We developed an approach to add patient-preferred synonyms text-mined from oliviasvision.org online forum, using statistical ranking. We validated the approach with split-sampling and comparison to manual extraction. Using OcIMIDo, we then explored the frequency of OcIMIDo classes and synonyms, and their potential association with natural language sentiment expressed in each online forum post.

FINDINGS

OcIMIDo (version 1.2) includes 661 classes, describing anatomy, clinical phenotype, disease activity status, complications, investigations, interventions and functional impacts. It contains 1661 relationships and axioms, 2851 annotations, including 1131 database cross-references, and 187 patient-preferred synonyms. To illustrate OcIMIDo's potential applications, we explored 9031 forum posts, revealing frequent mention of different clinical phenotypes, treatments, and complications. Language sentiment analysis of each post was generally positive (median 0.12, IQR 0.01-0.24). In multivariable logistic regression, the odds of a post expressing negative sentiment were significantly associated with first posts as compared to replies (OR 3.3, 95% CI 2.8 to 3.9, p < 0.001).

CONCLUSION

We report the development and validation of a new ontology for inflammatory eye diseases, which includes patient-preferred synonyms, and can be used to explore unstructured patient or physician-reported text data, with many potential applications.

Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021;22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022]

Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021;37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022]

Liu-Wei W, Kafkas Ş, Chen J, Dimonaco NJ, Tegnér J, Hoehndorf R. DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021;37:2722-2729. [PMID: 33682875 PMCID: PMC8428617 DOI: 10.1093/bioinformatics/btab147] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/13/2020] [Revised: 01/18/2021] [Accepted: 03/01/2021] [Indexed: 11/12/2022]

Hammad R, Barhoush M, Abed-alguni BH. A Semantic-Based Approach for Managing Healthcare Big Data: A Survey. JOURNAL OF HEALTHCARE ENGINEERING 2020;2020:8865808. [PMID: 33489061 PMCID: PMC7787845 DOI: 10.1155/2020/8865808] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/29/2020] [Revised: 11/02/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022]

Teletchea S, Teletchea F. STOREFISH 2.0: a database on the reproductive strategies of teleost fishes. Database (Oxford) 2020;2020:baaa095. [PMID: 33216894 PMCID: PMC7678788 DOI: 10.1093/database/baaa095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/04/2020] [Revised: 09/04/2020] [Accepted: 10/14/2020] [Indexed: 01/08/2023]

Deng L, Lin W, Wang J, Zhang J. DeepciRGO: functional prediction of circular RNAs through hierarchical deep neural networks using heterogeneous network features. BMC Bioinformatics 2020;21:519. [PMID: 33183227 PMCID: PMC7659092 DOI: 10.1186/s12859-020-03748-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/23/2019] [Accepted: 09/11/2020] [Indexed: 12/28/2022] Open

Abstract

Background

Circular RNAs (circRNAs) are special noncoding RNA molecules with closed loop structures. Compared with the traditional linear RNA, circRNA is more stable and not easily degraded. Many studies have shown that circRNAs are involved in the regulation of various diseases and cancers. Determining the functions of circRNAs in mammalian cells is of great significance for revealing their mechanism of action in physiological and pathological processes, diagnosis and treatment of diseases. However, determining the functions of circRNAs on a large scale is a challenging task because of the high experimental costs.

Results

In this paper, we present a hierarchical deep learning model, DeepciRGO, which can effectively predict gene ontology functions of circRNAs. We build a heterogeneous network containing circRNA co-expressions, protein–protein interactions and protein–circRNA interactions. The topology features of proteins and circRNAs are calculated using a novel representation learning approach, HIN2Vec, across the heterogeneous network. Then, a deep multi-label hierarchical classification model is trained with the topology features to predict the biological process function in the gene ontology for each circRNA. In particular, we manually curated a benchmark dataset containing 185 GO annotations for 62 circRNAs, namely, circRNA2GO-62. DeepciRGO achieves promising performance on the circRNA2GO-62 dataset with a maximum F-measure of 0.412, a recall score of 0.400, and an accuracy of 0.425, which are significantly better than other state-of-the-art RNA function prediction methods. In addition, we demonstrate the considerable potential of integrating multiple interactions and association networks.

Conclusions

DeepciRGO will be a useful tool for accurately annotating circRNAs. The experimental results show that integrating multi-source data can help to improve the predictive performance of DeepciRGO. Moreover, the model can also combine RNA structure and sequence information to further optimize predictive performance.

Smaili FZ, Gao X, Hoehndorf R. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics 2020;36:2229-2236. [PMID: 31821406 PMCID: PMC7141863 DOI: 10.1093/bioinformatics/btz920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 12/06/2019] [Indexed: 12/30/2022] Open

Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2020;20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open

Wang RL. Semantic characterization of adverse outcome pathways. AQUATIC TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2020;222:105478. [PMID: 32278258 PMCID: PMC7393770 DOI: 10.1016/j.aquatox.2020.105478] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/27/2019] [Revised: 03/17/2020] [Accepted: 03/23/2020] [Indexed: 05/09/2023]

Abstract

This study was undertaken to systematically assess the utilities and performance of ontology-based semantic analysis in adverse outcome pathway (AOP) research. With an increasing number of AOPs developed by scientific domain experts to organize toxicity information and facilitate chemical risk assessment, there is a pressing need for objective approaches to evaluate the biological coherence and quality of these AOPs. Powered by ontologies covering a wide range of biological domains, abundant phenotypic data annotated ontologically, and some sophisticated knowledge computing tools, semantic analysis has great potential in this area of application. With the events in the AOP-Wiki first annotated into logical definitions and then grouped into phenotypic profiles by individual AOPs, the coherence and quality of AOPs were assessed at several levels: paired key event relationships (KER), all possible event pair combinations within AOPs, and the phenotypic profiles of AOPs, genes, biological pathways, human diseases, and selected chemicals. The semantic similarities were assessed at all these levels based on a unified cross-species vertebrate phenotype ontology encompassing the logical definitions of AOP events as well as many other domain ontologies. A substantial number of KERs and AOPs in the AOP-Wiki were found to be semantically coherent. These same coherent AOPs also mapped to many more genes, pathways, and diseases biologically aligned with the intended chain of events therein leading to their respective adverse outcomes. Significantly, these findings imply that semantic analysis should also have utilities in developing future AOPs by selecting candidate events from either the existing AOP-Wiki events or a broader collection of ontology terms semantically similar to the molecular initiating events or adverse outcomes of interest. 
In addition, semantic analysis enabled AOP networks to be constructed at the level of phenotypic profiles based on similarities, complementing those based on event sharing by bringing genes, pathways, diseases, and chemicals into the networks too, thus greatly expanding the biological scope and our understanding of AOPs.

Gallagher RV, Falster DS, Maitner BS, Salguero-Gómez R, Vandvik V, Pearse WD, Schneider FD, Kattge J, Poelen JH, Madin JS, Ankenbrand MJ, Penone C, Feng X, Adams VM, Alroy J, Andrew SC, Balk MA, Bland LM, Boyle BL, Bravo-Avila CH, Brennan I, Carthey AJR, Catullo R, Cavazos BR, Conde DA, Chown SL, Fadrique B, Gibb H, Halbritter AH, Hammock J, Hogan JA, Holewa H, Hope M, Iversen CM, Jochum M, Kearney M, Keller A, Mabee P, Manning P, McCormack L, Michaletz ST, Park DS, Perez TM, Pineda-Munoz S, Ray CA, Rossetto M, Sauquet H, Sparrow B, Spasojevic MJ, Telford RJ, Tobias JA, Violle C, Walls R, Weiss KCB, Westoby M, Wright IJ, Enquist BJ. Open Science principles for accelerating trait-based science across the Tree of Life. Nat Ecol Evol 2020;4:294-303. [PMID: 32066887 DOI: 10.1038/s41559-020-1109-6] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 03/31/2019] [Accepted: 01/10/2020] [Indexed: 01/22/2023]

Affiliation(s)

Rachael V Gallagher Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia.
Daniel S Falster Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
Brian S Maitner Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
Roberto Salguero-Gómez Department of Zoology, Oxford University, Oxford, UK.,Centre for Biodiversity and Conservation Science, University of Queensland, Brisbane, Queensland, Australia.,Evolutionary Demography Laboratory, Max Planck Institute for Demographic Research, Rostock, Germany
Vigdis Vandvik Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
William D Pearse Ecology Center and Department of Biology, Utah State University, Logan, UT, USA
Florian D Schneider
Jens Kattge Max Planck Institute for Biogeochemistry, Jena, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
Jorrit H Poelen
Joshua S Madin Hawai'i Institute of Marine Biology, University of Hawai'i at Manoa, Manoa, HI, USA
Markus J Ankenbrand Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Comprehensive Heart Failure Center, University Hospital Wuerzburg, Wuerzburg, Germany
Caterina Penone Institute of Plant Sciences, University of Bern, Bern, Switzerland
Xiao Feng Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
Vanessa M Adams Discipline of Geography and Spatial Sciences, University of Tasmania, Hobart, Tasmania, Australia
John Alroy Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
Samuel C Andrew Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
Meghan A Balk Bio5 Institute, University of Arizona, Tucson, AZ, USA
Lucie M Bland School of Life and Environmental Sciences, Centre for Integrative Ecology, Deakin University, Geelong, Victoria, Australia
Brad L Boyle Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
Catherine H Bravo-Avila Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
Ian Brennan Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
Alexandra J R Carthey Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
Renee Catullo Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
Brittany R Cavazos Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
Dalia A Conde Species360 Conservation Science Alliance, Bloomington, MN, USA.,Interdisciplinary Center on Population Dynamics, University of Southern Denmark, Odense, Denmark.,Department of Biology, University of Southern Denmark, Odense, Denmark
Steven L Chown School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
Belen Fadrique Department of Biology, University of Miami, Miami, FL, USA
Heloise Gibb Department of Ecology, Environment and Evolution and Centre for Future Landscapes, La Trobe University, Melbourne, Victoria, Australia
Aud H Halbritter Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
Jennifer Hammock National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
J Aaron Hogan International Center for Tropical Botany, Department of Biological Sciences, Florida International University, Miami, FL, USA
Hamish Holewa Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
Michael Hope Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
Colleen M Iversen Climate Change Science Institute and Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
Malte Jochum German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.,Institute of Plant Sciences, University of Bern, Bern, Switzerland.,Institute of Biology, Leipzig University, Leipzig, Germany
Michael Kearney School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
Alexander Keller Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany
Paula Mabee Department of Biology, University of South Dakota, Vermillion, SD, USA
Peter Manning Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
Luke McCormack Center for Tree Science, The Morton Arboretum, Lisle, IL, USA
Sean T Michaletz Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
Daniel S Park Department of Organismic and Evolutionary Biology and Harvard University Herbaria, Harvard University, Cambridge, MA, USA
Timothy M Perez Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
Silvia Pineda-Munoz School of Biological Sciences and School of Earth & Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA, USA
Courtenay A Ray School of Life Sciences, Arizona State University, Tempe, AZ, USA
Maurizio Rossetto National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Queensland Alliance of Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
Hervé Sauquet Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia.,National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Ecologie Systématique Evolution, Univ. Paris-Sud, CNRS, AgroParisTech, Universite Paris-Saclay, Orsay, France
Benjamin Sparrow TERN / School of Biological Sciences, Faculty of Science, The University of Adelaide, Adelaide, South Australia, Australia
Marko J Spasojevic Department of Evolution, Ecology, and Organismal Biology, University of California Riverside, Riverside, CA, USA
Richard J Telford Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
Joseph A Tobias Department of Life Sciences, Imperial College London, London, UK
Cyrille Violle CEFE, CNRS, Univ Montpellier, Université Paul Valéry Montpellier, Montpellier, France
Ramona Walls CyVerse, University of Arizona, Tucson, AZ, USA
Katherine C B Weiss School of Life Sciences, Arizona State University, Tempe, AZ, USA
Mark Westoby Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
Ian J Wright Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
Brian J Enquist Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA.,Santa Fe Institute, Santa Fe, NM, USA

Ju M, Short AD, Thompson P, Bakerly ND, Gkoutos GV, Tsaprouni L, Ananiadou S. Annotating and detecting phenotypic information for chronic obstructive pulmonary disease. JAMIA Open 2020;2:261-271. [PMID: 31984360 PMCID: PMC6951876 DOI: 10.1093/jamiaopen/ooz009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/19/2018] [Revised: 02/21/2019] [Accepted: 03/19/2019] [Indexed: 12/29/2022] Open

Abstract

Objectives

Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information.

Materials and methods

Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network first recognizes nested mentions, which are then fed into subsequent BiLSTM-CRF layers to help recognize the enclosing phenotype mentions.

Results

Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information.

Discussion

Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments.

Conclusion

The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.

Wegrzyn JL, Falk T, Grau E, Buehler S, Ramnath R, Herndon N. Cyberinfrastructure and resources to enable an integrative approach to studying forest trees. Evol Appl 2020;13:228-241. [PMID: 31892954 PMCID: PMC6935593 DOI: 10.1111/eva.12860] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/12/2019] [Revised: 08/11/2019] [Accepted: 08/14/2019] [Indexed: 12/19/2022] Open

Elmore SA, Cardiff R, Cesta MF, Gkoutos GV, Hoehndorf R, Keenan CM, McKerlie C, Schofield PN, Sundberg JP, Ward JM. A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals. ILAR J 2019;59:29-39. [PMID: 30476141 DOI: 10.1093/ilar/ily005] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/04/2017] [Revised: 05/04/2018] [Indexed: 12/14/2022] Open

Affiliation(s)

Susan A Elmore Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principal consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine.
Colin McKerlie Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principle consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. 
Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine
Paul N Schofield Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principle consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. 
Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine
John P Sundberg Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principle consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. 
Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine
Jerrold M Ward Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principle consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. 
Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine


Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics 2019;34:i901-i907. [PMID: 30423077 PMCID: PMC6129260 DOI: 10.1093/bioinformatics/bty559] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

Cianciarullo AM, Bonini-Domingos CR, Vizotto LD, Kobashi LS, Beçak ML, Beçak W. Whole-genome duplication and hemoglobin differentiation traits between allopatric populations of Brazilian Odontophrynus americanus species complex (Amphibia, Anura). Genet Mol Biol 2019;42:436-444. [PMID: 31259358 PMCID: PMC6726162 DOI: 10.1590/1678-4685-gmb-2017-0260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2017] [Accepted: 07/25/2018] [Indexed: 11/21/2022] Open

Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019;9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022] Open

Kafkas Ş, Hoehndorf R. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database (Oxford) 2019;2019:baz019. [PMID: 30809638 PMCID: PMC6391585 DOI: 10.1093/database/baz019] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2018] [Revised: 01/09/2019] [Accepted: 01/26/2019] [Indexed: 01/07/2023]

Abstract

Gene-phenotype associations play an important role in understanding the disease mechanisms which is a requirement for treatment development. A portion of gene-phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, there is still a vast amount of gene-phenotype associations buried in the biomedical literature. Given the large amount of literature data, we need automated text mining tools to alleviate the burden in manual curation of gene-phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene-phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene-phenotype associations from HPO and MGI respectively. We posit that candidate genes and their relevant diseases should be expressed with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology based approach to extract gene-phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene-disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene-disease associations respectively. Our manual analysis on the text mined data revealed that our method can accurately extract gene-phenotype associations which are not currently covered by the existing public gene-phenotype resources. Overall, results indicate that our method can precisely extract known as well as new gene-phenotype associations from literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
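The AUC values quoted in this abstract measure how well a scoring method ranks known associations above unknown ones. As a minimal sketch of that metric only (not the authors' pipeline), the following computes ROC AUC with the pairwise-comparison definition. The function name `roc_auc` and the toy scores and labels are hypothetical and chosen purely for illustration.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive
    is scored above a randomly chosen negative (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: one score per candidate gene-phenotype pair,
# label 1 if the pair is a known association and 0 otherwise.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

The quadratic pairwise loop is kept for readability; for large evaluation sets a rank-based computation would be the usual choice.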


Neveu P, Tireau A, Hilgert N, Nègre V, Mineau‐Cesari J, Brichet N, Chapuis R, Sanchez I, Pommier C, Charnomordic B, Tardieu F, Cabrera‐Bosquet L. Dealing with multi-source and multi-scale information in plant phenomics: the ontology-driven Phenotyping Hybrid Information System. THE NEW PHYTOLOGIST 2019;221:588-601. [PMID: 30152011 PMCID: PMC6585972 DOI: 10.1111/nph.15385] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/07/2018] [Accepted: 07/07/2018] [Indexed: 05/13/2023]

Endara L, Thessen AE, Cole HA, Walls R, Gkoutos G, Cao Y, Chong SS, Cui H. Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier. Biodivers Data J 2018;6:e29232. [PMID: 30532623 PMCID: PMC6281706 DOI: 10.3897/bdj.6.e29232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 11/20/2018] [Indexed: 11/21/2022] Open

Abstract

Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called "modifiers". With effort underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifiers, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly.

Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies.

Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using "broader synonym" or "not recommended" annotation properties. These annotations explicitly allow for the user to be aware of the semantic ambiguity associated with the terms and whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement, suggesting that differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies.

Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concepts alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
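The closed ordered list (five-bin) design described in this abstract can be illustrated with a small sketch. It is not the published OWL ontology: the class name `FiveBinList`, its methods, and the bin labels other than "rarely" and "usually" are hypothetical stand-ins. The sketch only shows the two stated constraints, a fixed set of five ordered bins and new terms admitted solely as synonyms of an existing bin.

```python
class FiveBinList:
    """Closed ordered list: exactly five bins, ordered from least to most."""

    def __init__(self, bins):
        if len(bins) != 5:
            raise ValueError("the closed design uses exactly five bins")
        # Map every known term (bin label or synonym) to its bin index.
        self._bin_of = {label: index for index, label in enumerate(bins)}

    def add_synonym(self, term, existing_term):
        """New terms may only join an existing bin; no new bins are created."""
        self._bin_of[term] = self._bin_of[existing_term]

    def level(self, term):
        return self._bin_of[term]

    def more_frequent_than(self, a, b):
        """Ordinal comparison between two terms."""
        return self.level(a) > self.level(b)

    def one_level_more_frequent_than(self, a, b):
        """Interval comparison: a sits exactly one bin above b."""
        return self.level(a) - self.level(b) == 1


# Hypothetical frequency bins; only "rarely" and "usually" appear in the abstract.
frequency = FiveBinList(["never", "rarely", "sometimes", "usually", "always"])
frequency.add_synonym("seldom", "rarely")

print(frequency.more_frequent_than("usually", "rarely"))
print(frequency.one_level_more_frequent_than("sometimes", "seldom"))
```

An open ordered list would differ by allowing a new class to be inserted between two existing ones, which this fixed-index representation deliberately does not support.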


Affiliation(s)

Lorena Endara University of Florida, Gainesville, United States of America
Anne E Thessen The Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
Heather A Cole Science and Technology Branch, Agriculture and Agri-Food Canada, Government of Canada, Ottawa, Canada
Ramona Walls CyVerse, Tucson, United States of America
Georgios Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
Yujie Cao Center for Studies of Information Resources, Wuhan University, Wuhan, China
Steven S. Chong National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, Santa Barbara, United States of America; University of Arizona, Tucson, United States of America
Hong Cui University of Arizona, Tucson, United States of America


Wang RL, Edwards S, Ives C. Ontology-based semantic mapping of chemical toxicities. Toxicology 2018;412:89-100. [PMID: 30468866 DOI: 10.1016/j.tox.2018.11.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2018] [Revised: 11/11/2018] [Accepted: 11/19/2018] [Indexed: 12/15/2022]

Abstract

This study was undertaken to evaluate the use of ontology-based semantic mapping (OS-Mapping) in chemical toxicity assessment. Nineteen chemical-species phenotypic profiles (CSPPs) were constructed by ontologically annotating the toxicity responses reported in more than seven hundred published studies of ten chemicals on six vertebrate species. The CSPPs were semantically compared to more than 29,000 publicly available phenotypic profiles of genes, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, and diseases based on a cross-species phenotype ontology. OS-Mapping was shown to differentiate chemical toxicities among themselves as well as within and across species. It also revealed cases of chemical by species interactions. In addition to confirming similar MOAs (mechanisms of action) for a few chemicals, OS-Mapping also generated novel insights into the MOAs underlying some seemingly different, yet phenotypically similar, classes of chemicals. The nature of a unified cross-species phenotype ontology and its representation of diverse knowledge domains allowed the construction of a complete phenotypic continuum for the 17α-ethynylestradiol_fathead minnow across the biological levels of organization, which complemented a similar one derived from the Comparative Toxicogenomics Database but based primarily on 17α-ethynylestradiol-induced molecular phenotypes. Overall, OS-Mapping has been demonstrated to offer a powerful approach to help bridge the gap between the molecular and non-molecular phenotypes of chemicals characterized by using high throughput or traditional omics methods and their apical endpoints of greater regulatory relevance, which are typically phenotypes found at the higher levels of biological organization. OS-Mapping also enables comparative toxicity assessment among chemicals, both within and across species. 
Furthermore, the semantic analysis of phenotypes can reveal additional novel MOAs for some well-known chemicals and discover candidate MOAs for chemicals that are less molecularly characterized. A full phenotypic continuum based on OS-Mapping will also be conducive to the future development of adverse outcome pathways. As phenomics continues to advance and the ontological annotation of literature becomes more automated, the power of OS-Mapping will be further enhanced.


Kulmanov M, Khan MA, Hoehndorf R. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2018;34:660-668. [PMID: 29028931 PMCID: PMC5860606 DOI: 10.1093/bioinformatics/btx624] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2017] [Accepted: 09/27/2017] [Indexed: 12/29/2022] Open

Smaili FZ, Gao X, Hoehndorf R. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics 2018;35:2133-2140. [DOI: 10.1093/bioinformatics/bty933] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2018] [Revised: 11/02/2018] [Accepted: 11/07/2018] [Indexed: 12/11/2022] Open

Boudellioua I, Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants. Sci Rep 2018;8:14681. [PMID: 30279426 PMCID: PMC6168481 DOI: 10.1038/s41598-018-32876-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Accepted: 09/18/2018] [Indexed: 12/12/2022] Open

Abstract

An increasing number of disorders have been identified for which two or more distinct alleles in two or more genes are required to either cause the disease or to significantly modify its onset, severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of alleles underlying digenic and oligogenic diseases in individual whole exome or whole genome sequences. Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical or non-human model organism research can provide useful information and improve variant prioritization for genetic diseases. Additional background knowledge about interactions between genes can be utilized to identify sets of variants in different genes in the same individual which may then contribute to the overall disease phenotype. We have developed OligoPVP, an algorithm that can be used to prioritize causative combinations of variants in digenic and oligogenic diseases, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods in the case of digenic diseases. Our results show that OligoPVP can efficiently prioritize sets of variants in digenic diseases using a phenotype-driven approach and identify etiologically important variants in whole genomes. OligoPVP naturally extends to oligogenic disease involving interactions between variants in two or more genes. It can be applied to the identification of multiple interacting candidate variants contributing to phenotype, where the action of modifier genes is suspected from pedigree analysis or failure of traditional causative variant identification.


Howe DG, Blake JA, Bradford YM, Bult CJ, Calvi BR, Engel SR, Kadin JA, Kaufman TC, Kishore R, Laulederkind SJF, Lewis SE, Moxon SAT, Richardson JE, Smith C. Model organism data evolving in support of translational medicine. Lab Anim (NY) 2018;47:277-289. [PMID: 30224793 DOI: 10.1038/s41684-018-0150-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2018] [Accepted: 08/13/2018] [Indexed: 02/07/2023]

Abstract

Model organism databases (MODs) have been collecting and integrating biomedical research data for 30 years and were designed to meet specific needs of each model organism research community. The contributions of model organism research to understanding biological systems would be hard to overstate. Modern molecular biology methods and cost reductions in nucleotide sequencing have opened avenues for direct application of model organism research to elucidating mechanisms of human diseases. Thus, the mandate for model organism research and databases has now grown to include facilitating use of these data in translational applications. Challenges in meeting this opportunity include the distribution of research data across many databases and websites, a lack of data format standards for some data types, and sustainability of scale and cost for genomic database resources like MODs. The issues of widely distributed data and application of data standards are some of the challenges addressed by FAIR (Findable, Accessible, Interoperable, and Re-usable) data principles. The Alliance of Genome Resources is now moving to address these challenges by bringing together expertly curated research data from fly, mouse, rat, worm, yeast, zebrafish, and the Gene Ontology consortium. Centralized multi-species data access, integration, and format standardization will lower the data utilization barrier in comparative genomics and translational applications and will provide a framework in which sustainable scale and cost can be addressed. This article presents a brief historical perspective on how the Alliance model organisms are complementary and how they have already contributed to understanding the etiology of human diseases. In addition, we discuss four challenges for using data from MODs in translational applications and how the Alliance is working to address them, in part by applying FAIR data principles. 
Ultimately, combined data from these animal models are more powerful than the sum of the parts.


Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. Ontology-based validation and identification of regulatory phenotypes. Bioinformatics 2018;34:i857-i865. [PMID: 30423068 PMCID: PMC6129279 DOI: 10.1093/bioinformatics/bty605] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open

Brown SDM, Holmes CC, Mallon AM, Meehan TF, Smedley D, Wells S. High-throughput mouse phenomics for characterizing mammalian gene function. Nat Rev Genet 2018;19:357-370. [PMID: 29626206 PMCID: PMC6582361 DOI: 10.1038/s41576-018-0005-2] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Rodríguez-García MÁ, Gkoutos GV, Schofield PN, Hoehndorf R. Integrating phenotype ontologies with PhenomeNET. J Biomed Semantics 2017;8:58. [PMID: 29258588 PMCID: PMC5735523 DOI: 10.1186/s13326-017-0167-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2017] [Accepted: 11/22/2017] [Indexed: 01/05/2023] Open

Staal YC, Pennings JL, Hessel EV, Piersma AH. Advanced Toxicological Risk Assessment by Implementation of Ontologies Operationalized in Computational Models. Applied In Vitro Toxicology 2017. [DOI: 10.1089/aivt.2017.0019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017;13:e1005500. [PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2016] [Revised: 05/01/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022] Open

Abstract

Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.

We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, but can also be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help with the problem of determining the causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) which augments existing strategies by using the similarity of the patient's phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.


Affiliation(s)

Imane Boudellioua King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Rozaimi B. Mahamad Razali King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Maxat Kulmanov King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Yasmeen Hashish King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Vladimir B. Bajic King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Eva Goncalves-Serra Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
Nadia Schoenmakers University of Cambridge Metabolic Research Laboratories, Wellcome Trust—Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
Georgios V. Gkoutos College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom; Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom * E-mail: (GVG); (PNS); (RH)
Paul N. Schofield Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom * E-mail: (GVG); (PNS); (RH)
Robert Hoehndorf King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia * E-mail: (GVG); (PNS); (RH)
