1
Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024; 227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024] Open
Abstract
PomBase (https://www.pombase.org), the model organism database (MOD) for fission yeast, was recently awarded Global Core Biodata Resource (GCBR) status by the Global Biodata Coalition (GBC; https://globalbiodata.org/) after a rigorous selection process. In this MOD review, we present PomBase's continuing growth and improvement over the last 2 years. We describe these improvements in the context of the qualitative GCBR indicators related to scientific quality, comprehensivity, accelerating science, user stories, and collaborations with other biodata resources. This review also showcases the depth of existing connections both within the biocuration ecosystem and between PomBase and its user community.
Affiliation(s)
- Kim M Rutherford
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Manuel Lera-Ramírez
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
2
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Front Plant Sci 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets, as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine-interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
  - Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
3
Clarke JL, Cooper LD, Poelchau MF, Berardini TZ, Elser J, Farmer AD, Ficklin S, Kumari S, Laporte MA, Nelson RT, Sadohara R, Selby P, Thessen AE, Whitehead B, Sen TZ. Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium. Database (Oxford) 2023; 2023:baad076. [PMID: 37971715 PMCID: PMC10653126 DOI: 10.1093/database/baad076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases, and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL: https://www.agbiodata.org/databases.
Affiliation(s)
- Jennifer L Clarke
  - Department of Statistics and Department of Food Science and Technology, University of Nebraska–Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Laurel D Cooper
  - Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Monica F Poelchau
  - USDA, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Ave, Beltsville 20705, USA
- Tanya Z Berardini
  - The Arabidopsis Information Resource and Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA, USA
- Justin Elser
  - Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Andrew D Farmer
  - National Center for Genome Resources, 2935 Rodeo Park Dr. E., Santa Fe, NM 87505, USA
- Stephen Ficklin
  - Department of Horticulture, Washington State University, 249 Clark Hall, PO Box 646414, Pullman, WA 99164, USA
- Sunita Kumari
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Marie-Angélique Laporte
  - Digital Inclusion, Bioversity International, Parc Scientifique Agropolis II, 1990 Bd de la Lironde, Montpellier 34397, France
- Rex T Nelson
  - USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Iowa State University, 716 Farmhouse Lane, Ames, IA 50011, USA
- Rie Sadohara
  - Department of Plant, Soil, and Microbial Sciences, Michigan State University, 1066 Bogue St, East Lansing, MI 48824, USA
- Peter Selby
  - School of Integrative Plant Science, College of Agriculture and Life Sciences, Cornell University, 215 Garden Avenue, Ithaca, NY 14850, USA
- Anne E Thessen
  - Department of Biomedical Informatics, University of Colorado Anschutz, 1890 N. Revere Court, Mailstop F600, Aurora CO 80045, USA
- Brandon Whitehead
  - Data Science and Informatics, Manaaki Whenua—Landcare Research, Ltd., Riddet Road, Massey University, Palmerston North 4472, New Zealand
- Taner Z Sen
  - USDA, Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany 94710, USA
  - Department of Bioengineering, University of California, 306 Stanley Hall, Berkeley, CA 94720, USA
4
Huttenhower C, Finn RD, McHardy AC. Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol 2023; 8:1960-1970. [PMID: 37783751 DOI: 10.1038/s41564-023-01484-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/16/2021] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
Microbiome data, metadata and analytical workflows have become 'big' in terms of volume and complexity. Although the infrastructure and technologies to share data have been established, the interdisciplinary and multi-omic nature of the field can make resources difficult to identify and use. Following best practices for data deposition requires substantial effort, with sometimes little obvious reward. Gaps remain where microbiome-specific resources for data sharing or reproducibility do not yet exist. We outline available best practices, challenges to their adoption and opportunities in data sharing in microbiome research. We showcase examples of best practices and advocate for their enforcement and incentivization for data sharing. This includes recognition of data curation and sharing endeavours by individuals, institutions, journals and funders. Opportunities for progress include enabling microbiome-specific databases to incorporate future methods for data analysis, integration and reuse.
Affiliation(s)
- Curtis Huttenhower
  - Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Departments of Biostatistics and Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Alice Carolyn McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
5
Ambalavanan R, Snead RS, Marczika J, Kozinsky K, Aman E. Advancing the Management of Long COVID by Integrating into Health Informatics Domain: Current and Future Perspectives. Int J Environ Res Public Health 2023; 20:6836. [PMID: 37835106 PMCID: PMC10572294 DOI: 10.3390/ijerph20196836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/20/2023] [Accepted: 09/22/2023] [Indexed: 10/15/2023]
Abstract
The ongoing COVID-19 pandemic has profoundly affected millions of lives globally, with some individuals experiencing persistent symptoms even after recovering. Understanding and managing the long-term sequelae of COVID-19 is crucial for research, prevention, and control. To effectively monitor the health of those affected, maintaining up-to-date health records is essential, and digital health informatics apps for surveillance play a pivotal role. In this review, we overview the existing literature on identifying and characterizing long COVID manifestations through hierarchical classification based on Human Phenotype Ontology (HPO). We outline the aspects of the National COVID Cohort Collaborative (N3C) and Researching COVID to Enhance Recovery (RECOVER) initiative in artificial intelligence (AI) to identify long COVID. Through knowledge exploration, we present a concept map of clinical pathways for long COVID, which offers insights into the data required and explores innovative frameworks for health informatics apps for tackling the long-term effects of COVID-19. This study achieves two main objectives by comprehensively reviewing long COVID identification and characterization techniques, making it the first paper to explore incorporating long COVID as a variable risk factor within a digital health informatics application. By achieving these objectives, it provides valuable insights on long COVID's challenges and impact on public health.
Affiliation(s)
- Radha Ambalavanan
  - The Self Research Institute, Broken Arrow, OK 74011, USA; (R.S.S.); (J.M.); (K.K.); (E.A.)
6
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
  - Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
  - The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
7
Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023; 14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023] Open
Abstract
MOTIVATION: Phenotypes are observable characteristics of an organism and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases where it is used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena. RESULTS: We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
Affiliation(s)
- Sarah M Alghamdi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
  - King Abdul-Aziz University, Faculty of Computing and Information Technology, 25732, Rabigh, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
8
Ruberte J, Schofield PN, Sundberg JP, Rodriguez-Baeza A, Carretero A, McKerlie C. Bridging mouse and human anatomies; a knowledge-based approach to comparative anatomy for disease model phenotyping. Mamm Genome 2023:10.1007/s00335-023-10005-4. [PMID: 37421464 PMCID: PMC10382392 DOI: 10.1007/s00335-023-10005-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Accepted: 06/13/2023] [Indexed: 07/10/2023]
Abstract
The laboratory mouse is the foremost mammalian model used for studying human diseases and is closely anatomically related to humans. Whilst knowledge about human anatomy has been collected throughout the history of mankind, the first comprehensive study of the mouse anatomy was published less than 60 years ago. This has been followed by the more recent publication of several books and resources on mouse anatomy. Nevertheless, to date, our understanding and knowledge of mouse anatomy is far from being at the same level as that of humans. In addition, the alignment between current mouse and human anatomy nomenclatures is far from being as developed as those existing between other species, such as domestic animals and humans. To close this gap, more in-depth mouse anatomical research is needed and it will be necessary to extend and refine the current vocabulary of mouse anatomical terms.
Affiliation(s)
- Jesús Ruberte
  - Center for Animal Biotechnology and Gene Therapy, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Barcelona, Spain
- Paul N Schofield
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- John P Sundberg
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Department of Dermatology, Vanderbilt University Medical Center, Nashville, TN, USA
- Ana Carretero
  - Center for Animal Biotechnology and Gene Therapy, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Barcelona, Spain
- Colin McKerlie
  - The Hospital for Sick Children, Toronto, Canada
  - Department of Lab Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Canada
9
Thessen AE, Cooper L, Swetnam TL, Hegde H, Reese J, Elser J, Jaiswal P. Using knowledge graphs to infer gene expression in plants. Front Artif Intell 2023; 6:1201002. [PMID: 37384147 PMCID: PMC10298150 DOI: 10.3389/frai.2023.1201002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 05/23/2023] [Indexed: 06/30/2023] Open
Abstract
Introduction: Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically-supported knowledge bases that exploit homologous structures and homologous genes. These types of structures that can apply knowledge about one species to another have the potential to enable the massive scaling up that is needed through in silico experimentation. Methods: We developed one such structure, a knowledge graph (KG), using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions. Results: A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways. Discussion: This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis and trans-regulatory components in the curated and inferred knowledge graph.
Affiliation(s)
- Anne E. Thessen
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Laurel Cooper
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Tyson L. Swetnam
  - BIO5 Institute, University of Arizona, Tucson, AZ, United States
- Harshad Hegde
  - Environmental Genomics and Systems Biology Division, Berkeley Lab (DOE), Berkeley, CA, United States
- Justin Reese
  - Environmental Genomics and Systems Biology Division, Berkeley Lab (DOE), Berkeley, CA, United States
- Justin Elser
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
10
Aleksander SA, Balhoff J, Carbon S, Cherry JM, Drabkin HJ, Ebert D, Feuermann M, Gaudet P, Harris NL, Hill DP, Lee R, Mi H, Moxon S, Mungall CJ, Muruganugan A, Mushayahama T, Sternberg PW, Thomas PD, Van Auken K, Ramsey J, Siegele DA, Chisholm RL, Fey P, Aspromonte MC, Nugnes MV, Quaglia F, Tosatto S, Giglio M, Nadendla S, Antonazzo G, Attrill H, Dos Santos G, Marygold S, Strelets V, Tabone CJ, Thurmond J, Zhou P, Ahmed SH, Asanitthong P, Luna Buitrago D, Erdol MN, Gage MC, Ali Kadhum M, Li KYC, Long M, Michalak A, Pesala A, Pritazahra A, Saverimuttu SCC, Su R, Thurlow KE, Lovering RC, Logie C, Oliferenko S, Blake J, Christie K, Corbani L, Dolan ME, Drabkin HJ, Hill DP, Ni L, Sitnikov D, Smith C, Cuzick A, Seager J, Cooper L, Elser J, Jaiswal P, Gupta P, Jaiswal P, Naithani S, Lera-Ramirez M, Rutherford K, Wood V, De Pons JL, Dwinell MR, Hayman GT, Kaldunski ML, Kwitek AE, Laulederkind SJF, Tutaj MA, Vedi M, Wang SJ, D'Eustachio P, Aimo L, Axelsen K, Bridge A, Hyka-Nouspikel N, Morgat A, Aleksander SA, Cherry JM, Engel SR, Karra K, Miyasato SR, Nash RS, Skrzypek MS, Weng S, Wong ED, Bakker E, Berardini TZ, Reiser L, Auchincloss A, Axelsen K, Argoud-Puy G, Blatter MC, Boutet E, Breuza L, Bridge A, Casals-Casas C, Coudert E, Estreicher A, Livia Famiglietti M, Feuermann M, Gos A, Gruaz-Gumowski N, Hulo C, Hyka-Nouspikel N, Jungo F, Le Mercier P, Lieberherr D, Masson P, Morgat A, Pedruzzi I, Pourcel L, Poux S, Rivoire C, Sundaram S, Bateman A, Bowler-Barnett E, Bye-A-Jee H, Denny P, Ignatchenko A, Ishtiaq R, Lock A, Lussi Y, Magrane M, Martin MJ, Orchard S, Raposo P, Speretta E, Tyagi N, Warner K, Zaru R, Diehl AD, Lee R, Chan J, Diamantakis S, Raciti D, Zarowiecki M, Fisher M, James-Zorn C, Ponferrada V, Zorn A, Ramachandran S, Ruzicka L, Westerfield M. The Gene Ontology knowledgebase in 2023. Genetics 2023; 224:iyad031. [PMID: 36866529 PMCID: PMC10158837 DOI: 10.1093/genetics/iyad031] [Citation(s) in RCA: 264] [Impact Index Per Article: 264.0] [Received: 12/13/2022] [Revised: 02/10/2023] [Accepted: 02/11/2023] [Indexed: 03/04/2023] Open
Abstract
The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO, a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations, which are evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs), which are mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project.
11
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. bioRxiv 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
| | - Meghan A. Balk
- National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
| | - Robyn Ball
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | | | - Anita R. Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | - Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Laura W. Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | | | | | - Julie A. McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | | | - Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | - Nicole Vasilevsky
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
| | | | | |
|
12
|
Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022; 15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022] Open
Abstract
Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene–disease associations. We found that mouse genotype–phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper. Editor's choice: We investigated the use of model organism phenotypes in the computational identification of disease genes, identifying several data biases and concluding that mouse model phenotypes contribute most to computational disease gene identification.
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
| |
|
13
|
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding the molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods that use different types of biomedical data, from phenotype to genotype, can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to how they use biomedical data: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we discuss and summarize research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and application of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
| | - Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
| | - Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
| |
|
14
|
Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021; 12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND The size, velocity, and heterogeneity of Big Data outclass conventional data management tools and require data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing number of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. RESULTS Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data.
The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. CONCLUSIONS We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany.
| |
|
15
|
Filice RW, Kahn CE. Biomedical Ontologies to Guide AI Development in Radiology. J Digit Imaging 2021; 34:1331-1341. [PMID: 34724143 PMCID: PMC8669056 DOI: 10.1007/s10278-021-00527-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/29/2021] [Revised: 04/27/2021] [Accepted: 10/13/2021] [Indexed: 10/25/2022] Open
Abstract
The advent of deep learning has engendered renewed and rapidly growing interest in artificial intelligence (AI) in radiology to analyze images, manipulate textual reports, and plan interventions. Applications of deep learning and other AI approaches must be guided by sound medical knowledge to assure that they are developed successfully and that they address important problems in biomedical research or patient care. To date, AI has been applied to a limited number of real-world radiology applications. As AI systems become more pervasive and are applied more broadly, they will benefit from medical knowledge on a larger scale, such as that available through computer-based approaches. A key approach to represent computer-based knowledge in a particular domain is an ontology. As defined in informatics, an ontology defines a domain's terms through their relationships with other terms in the ontology. Those relationships, then, define the terms' semantics, or "meaning." Biomedical ontologies commonly define the relationships between terms and more general terms, and can express causal, part-whole, and anatomic relationships. Ontologies express knowledge in a form that is both human-readable and machine-computable. Some ontologies, such as RSNA's RadLex radiology lexicon, have been applied to applications in clinical practice and research, and may be familiar to many radiologists. This article describes how ontologies can support research and guide emerging applications of AI in radiology, including natural language processing, image-based machine learning, radiomics, and planning.
Affiliation(s)
- Ross W Filice
- Department of Radiology, MedStar Georgetown University Hospital, Washington, DC, USA
| | - Charles E Kahn
- Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA.
| |
|
16
|
Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, Curcin V. Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021; 10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
Affiliation(s)
- Martin Chapman
- Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
| | - Shahzad Mumtaz
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Luke V Rasmussen
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Andreas Karwath
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Chuang Gao
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Dan Thayer
- SAIL Databank, Swansea University, Swansea, SA2 8PP, UK
| | - Jennifer A Pacheco
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Rachel L Richesson
- Department of Learning Health Sciences, University of Michigan Medical School, MI 48109, USA
| | - Emily Jefferson
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Spiros Denaxas
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
| | - Vasa Curcin
- Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
| |
|
17
|
Kafkas Ş, Althubaiti S, Gkoutos GV, Hoehndorf R, Schofield PN. Linking common human diseases to their phenotypes; development of a resource for human phenomics. J Biomed Semantics 2021; 12:17. [PMID: 34425897 PMCID: PMC8383460 DOI: 10.1186/s13326-021-00249-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 07/30/2021] [Indexed: 11/11/2022] Open
Abstract
Background In recent years a large volume of clinical genomics data has become available due to rapid advances in sequencing technologies. Efficient exploitation of this genomics data requires linkage to patient phenotype profiles. Current resources providing disease-phenotype associations are not comprehensive, and they often do not have broad coverage of the disease terminologies, particularly ICD-10, which is still the primary terminology used in clinical settings. Methods We developed two approaches to gather disease-phenotype associations. First, we used a text mining method that utilizes semantic relations in phenotype ontologies, and applies statistical methods to extract associations between diseases in ICD-10 and phenotype ontology classes from the literature. Second, we developed a semi-automatic way to collect ICD-10–phenotype associations from existing resources containing known relationships. Results We generated four datasets. Two of them are independent datasets linking diseases to their phenotypes based on text mining and semi-automatic strategies. The remaining two datasets are generated from these datasets and cover a subset of ICD-10 classes of common diseases contained in UK Biobank. We extensively validated our text mined and semi-automatically curated datasets by: comparing them against an expert-curated validation dataset containing disease–phenotype associations, measuring their similarity to disease–phenotype associations found in public databases, and assessing how well they could be used to recover gene–disease associations using phenotype similarity. Conclusion We find that our text mining method can produce phenotype annotations of diseases that are correct but often too general to have significant information content, or too specific to accurately reflect the typical manifestations of the sporadic disease. 
On the other hand, the datasets generated from integrating multiple knowledgebases are more complete (i.e., cover more of the required phenotype annotations for a given disease). We make all data freely available at 10.5281/zenodo.4726713. Supplementary Information The online version contains supplementary material available at (10.1186/s13326-021-00249-x).
Affiliation(s)
- Şenay Kafkas
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
| | - Sara Althubaiti
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia
| | - Georgios V Gkoutos
- Health Data Research UK, Midlands site, Edgbaston, Birmingham, B15 2TT, United Kingdom.,Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT, United Kingdom
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955, Saudi Arabia.
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
| |
|
18
|
Pendleton SC, Slater K, Karwath A, Gilbert RM, Davis N, Pesudovs K, Liu X, Denniston AK, Gkoutos GV, Braithwaite T. Development and application of the ocular immune-mediated inflammatory diseases ontology enhanced with synonyms from online patient support forum conversation. Comput Biol Med 2021; 135:104542. [PMID: 34139439 PMCID: PMC8404035 DOI: 10.1016/j.compbiomed.2021.104542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 05/27/2021] [Accepted: 05/30/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Unstructured text created by patients represents a rich, but relatively inaccessible resource for advancing patient-centred care. This study aimed to develop an ontology for ocular immune-mediated inflammatory diseases (OcIMIDo), as a tool to facilitate data extraction and analysis, illustrating its application to online patient support forum data. METHODS We developed OcIMIDo using clinical guidelines, domain expertise, and cross-references to classes from other biomedical ontologies. We developed an approach to add patient-preferred synonyms text-mined from oliviasvision.org online forum, using statistical ranking. We validated the approach with split-sampling and comparison to manual extraction. Using OcIMIDo, we then explored the frequency of OcIMIDo classes and synonyms, and their potential association with natural language sentiment expressed in each online forum post. FINDINGS OcIMIDo (version 1.2) includes 661 classes, describing anatomy, clinical phenotype, disease activity status, complications, investigations, interventions and functional impacts. It contains 1661 relationships and axioms, 2851 annotations, including 1131 database cross-references, and 187 patient-preferred synonyms. To illustrate OcIMIDo's potential applications, we explored 9031 forum posts, revealing frequent mention of different clinical phenotypes, treatments, and complications. Language sentiment analysis of each post was generally positive (median 0.12, IQR 0.01-0.24). In multivariable logistic regression, the odds of a post expressing negative sentiment were significantly associated with first posts as compared to replies (OR 3.3, 95% CI 2.8 to 3.9, p < 0.001). CONCLUSION We report the development and validation of a new ontology for inflammatory eye diseases, which includes patient-preferred synonyms, and can be used to explore unstructured patient or physician-reported text data, with many potential applications.
Affiliation(s)
- Samantha C Pendleton
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK.
| | - Karin Slater
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK
| | - Andreas Karwath
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK; Health Data Research, UK
| | - Rose M Gilbert
- Moorfields Eye Hospital NHS Foundation Trust, London, UK; Institute of Ophthalmology, University College London, UK
| | - Nicola Davis
- Olivia's Vision, Southampton Buildings, London, UK
| | - Konrad Pesudovs
- School of Optometry and Vision Science, University of New South Wales, Australia
| | - Xiaoxuan Liu
- University Hospitals Birmingham NHS Foundation Trust, UK; Institute of Inflammation and Ageing, University of Birmingham, UK
| | - Alastair K Denniston
- University Hospitals Birmingham NHS Foundation Trust, UK; Health Data Research, UK; Institute of Inflammation and Ageing, University of Birmingham, UK
| | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, UK; University Hospitals Birmingham NHS Foundation Trust, UK; Health Data Research, UK
| | - Tasanee Braithwaite
- University Hospitals Birmingham NHS Foundation Trust, UK; Institute of Applied Health Research, University of Birmingham, UK; The Medical Eye Unit, St Thomas' Hospital NHS Foundation Trust, London, UK
| |
|
19
|
Kulmanov M, Smaili FZ, Gao X, Hoehndorf R. Semantic similarity and machine learning with ontologies. Brief Bioinform 2021; 22:bbaa199. [PMID: 33049044 PMCID: PMC8293838 DOI: 10.1093/bib/bbaa199] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/07/2020] [Revised: 08/03/2020] [Accepted: 08/04/2020] [Indexed: 12/13/2022] Open
Abstract
Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are used in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in ontologies and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.
Affiliation(s)
| | | | - Xin Gao
- Computational Bioscience Research Center and lead of the Structural and Functional Bioinformatics Group at King Abdullah University of Science and Technology
| | | |
|
20
|
Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021; 37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation Over the past years, many computational methods have been developed to incorporate information about phenotypes for the disease–gene prioritization task. These methods generally compute the similarity between a patient's phenotypes and a database of gene–phenotype associations to find the most phenotypically similar match. The main limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms, such as the mouse and fish. Information about functions of gene products and anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models. Results We developed a novel graph-based machine-learning method for biomedical ontologies, which is able to exploit axioms in ontologies and other graph-structured data. Using our machine-learning method, we embed genes based on their associated phenotypes, functions of the gene products and anatomical location of gene expression. We then develop a machine-learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods. Furthermore, we extend phenotype-based gene prioritization methods significantly to all genes that are associated with phenotypes, functions or site of expression. Availability and implementation Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Chen
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Azza Althagafi
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.,Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| |
|
21
|
Liu-Wei W, Kafkas Ş, Chen J, Dimonaco NJ, Tegnér J, Hoehndorf R. DeepViral: prediction of novel virus-host interactions from protein sequences and infectious disease phenotypes. Bioinformatics 2021; 37:2722-2729. [PMID: 33682875 PMCID: PMC8428617 DOI: 10.1093/bioinformatics/btab147] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/13/2020] [Revised: 01/18/2021] [Accepted: 03/01/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus-host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. RESULTS We developed DeepViral, a deep learning based method that predicts protein-protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. AVAILABILITY Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.
Affiliation(s)
- Wang Liu-Wei
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Şenay Kafkas
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.,Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Jun Chen
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Nicholas J Dimonaco
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY23 3BQ, Wales, UK
| | - Jesper Tegnér
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.,Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia.,Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia
|
22
|
Hammad R, Barhoush M, Abed-alguni BH. A Semantic-Based Approach for Managing Healthcare Big Data: A Survey. JOURNAL OF HEALTHCARE ENGINEERING 2020; 2020:8865808. [PMID: 33489061 PMCID: PMC7787845 DOI: 10.1155/2020/8865808] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/29/2020] [Revised: 11/02/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022]
Abstract
Healthcare information systems can reduce the expenses of treatment, foresee episodes of pestilences, help stay away from preventable illnesses, and improve personal life satisfaction. As of late, considerable volumes of heterogeneous and differing medicinal services data are being produced from different sources covering clinic records of patients, lab results, and wearable devices, making it hard for conventional data processing to handle and manage this amount of data. Confronted with the difficulties and challenges facing the process of managing healthcare big data such as volume, velocity, and variety, healthcare information systems need to use new methods and techniques for managing and processing such data to extract useful information and knowledge. In the recent few years, a large number of organizations and companies have shown enthusiasm for using semantic web technologies with healthcare big data to convert data into knowledge and intelligence. In this paper, we review the state of the art on the semantic web for the healthcare industry. Based on our literature review, we will discuss how different techniques, standards, and points of view created by the semantic web community can participate in addressing the challenges related to healthcare big data.
|
23
|
Teletchea S, Teletchea F. STOREFISH 2.0: a database on the reproductive strategies of teleost fishes. Database (Oxford) 2020; 2020:baaa095. [PMID: 33216894 PMCID: PMC7678788 DOI: 10.1093/database/baaa095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/04/2020] [Revised: 09/04/2020] [Accepted: 10/14/2020] [Indexed: 01/08/2023]
Abstract
Teleost fishes show the most outstanding reproductive diversity of all vertebrates. Yet to date, no one has been able to decisively explain this striking variability nor to perform large-scale phylogenetic analyses of reproductive modes. Here, we describe STrategies Of REproduction in FISH (STOREFISH) 2.0, an online database easing the sharing of an original data set on reproduction published in 2007, enriched with automated data extraction and presentation to display the knowledge acquired on temperate freshwater fish species. STOREFISH 2.0 contains the information for 80 freshwater fish species and 50 traits from the analysis of 1219 references. It is anticipated that this new database could be useful for freshwater biodiversity research, conservation, assessment and management. Database URL: www.storefish.org.
Affiliation(s)
- Stéphane Teletchea
- UFIP, Université de Nantes, UMR CNRS 6286, 2 rue de la Houssinière, 44322 Nantes cedex 3, France
| | - Fabrice Teletchea
- University of Lorraine, INRAE, UR AFPA, 2 avenue de la Forêt de Haye - BP 20163, F-54000, Vandoeuvre-lès-Nancy Cedex, France
|
24
|
Deng L, Lin W, Wang J, Zhang J. DeepciRGO: functional prediction of circular RNAs through hierarchical deep neural networks using heterogeneous network features. BMC Bioinformatics 2020; 21:519. [PMID: 33183227 PMCID: PMC7659092 DOI: 10.1186/s12859-020-03748-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/23/2019] [Accepted: 09/11/2020] [Indexed: 12/28/2022] Open
Abstract
Background: Circular RNAs (circRNAs) are special noncoding RNA molecules with closed loop structures. Compared with the traditional linear RNA, circRNA is more stable and not easily degraded. Many studies have shown that circRNAs are involved in the regulation of various diseases and cancers. Determining the functions of circRNAs in mammalian cells is of great significance for revealing their mechanism of action in physiological and pathological processes, diagnosis and treatment of diseases. However, determining the functions of circRNAs on a large scale is a challenging task because of the high experimental costs. Results: In this paper, we present a hierarchical deep learning model, DeepciRGO, which can effectively predict gene ontology functions of circRNAs. We build a heterogeneous network containing circRNA co-expressions, protein–protein interactions and protein–circRNA interactions. The topology features of proteins and circRNAs are calculated using a novel representation learning approach HIN2Vec across the heterogeneous network. Then, a deep multi-label hierarchical classification model is trained with the topology features to predict the biological process function in the gene ontology for each circRNA. In particular, we manually curated a benchmark dataset containing 185 GO annotations for 62 circRNAs, namely, circRNA2GO-62. DeepciRGO achieves promising performance on the circRNA2GO-62 dataset with a maximum F-measure of 0.412, a recall score of 0.400, and an accuracy of 0.425, which are significantly better than other state-of-the-art RNA function prediction methods. In addition, we demonstrate the considerable potential of integrating multiple interactions and association networks. Conclusions: DeepciRGO will be a useful tool for accurately annotating circRNAs. The experimental results show that integrating multi-source data can help to improve the predictive performance of DeepciRGO. Moreover, the model can also combine RNA structure and sequence information to further optimize predictive performance.
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
| | - Wei Lin
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
| | - Jiacheng Wang
- School of Computer Science and Engineering, Central South University, Changsha, 410075, China
| | - Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000, China.
|
25
|
Smaili FZ, Gao X, Hoehndorf R. Formal axioms in biomedical ontologies improve analysis and interpretation of associated data. Bioinformatics 2020; 36:2229-2236. [PMID: 31821406 PMCID: PMC7141863 DOI: 10.1093/bioinformatics/btz920] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/15/2019] [Revised: 10/16/2019] [Accepted: 12/06/2019] [Indexed: 12/30/2022] Open
Abstract
Motivation: Over the past years, significant resources have been invested into formalizing biomedical ontologies. Formal axioms in ontologies have been developed and used to detect and ensure ontology consistency, find unsatisfiable classes, improve interoperability, guide ontology extension through the application of axiom-based design patterns and encode domain background knowledge. The domain knowledge of biomedical ontologies may also have the potential to provide background knowledge for machine learning and predictive modelling. Results: We use ontology-based machine learning methods to evaluate the contribution of formal axioms and ontology meta-data to the prediction of protein–protein interactions and gene–disease associations. We find that the background knowledge provided by the Gene Ontology and other ontologies significantly improves the performance of ontology-based prediction models through provision of domain-specific background knowledge. Furthermore, we find that the labels, synonyms and definitions in ontologies can also provide background knowledge that may be exploited for prediction. The axioms and meta-data of different ontologies contribute to improving data analysis in a context-specific manner. Our results have implications on the further development of formal knowledge bases and ontologies in the life sciences, in particular as machine learning methods are more frequently being applied. Our findings motivate the need for further development, and the systematic, application-driven evaluation and improvement, of formal axioms in ontologies. Availability and implementation: https://github.com/bio-ontology-research-group/tsoe. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fatima Zohra Smaili
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Xin Gao
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Robert Hoehndorf
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
|
26
|
Zhang W, Zhang H, Yang H, Li M, Xie Z, Li W. Computational resources associating diseases with genotypes, phenotypes and exposures. Brief Bioinform 2020; 20:2098-2115. [PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/31/2018] [Revised: 07/01/2018] [Indexed: 12/16/2022] Open
Abstract
The causes of a disease and its therapies are not only related to genotypes, but also associated with other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from many neutral factors is critical as well as difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate the omics data and discover associations among these factors. However, researchers and clinicians are experiencing difficulties in choosing appropriate resources from hundreds of relevant databases and software tools. Here, in order to assist the researchers and clinicians, we systematically review the public computational resources of human diseases related to genotypes, phenotypes, environment factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We finally conclude with a discussion of current challenges and future opportunities as well as prospects on this topic.
Affiliation(s)
- Wenliang Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Haiyue Zhang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Huan Yang
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
| | - Zhi Xie
- State Key Lab of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou 500040, China
| | - Weizhong Li
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China
|
27
|
Wang RL. Semantic characterization of adverse outcome pathways. AQUATIC TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2020; 222:105478. [PMID: 32278258 PMCID: PMC7393770 DOI: 10.1016/j.aquatox.2020.105478] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/27/2019] [Revised: 03/17/2020] [Accepted: 03/23/2020] [Indexed: 05/09/2023]
Abstract
This study was undertaken to systematically assess the utilities and performance of ontology-based semantic analysis in adverse outcome pathway (AOP) research. With an increasing number of AOPs developed by scientific domain experts to organize toxicity information and facilitate chemical risk assessment, there is a pressing need for objective approaches to evaluate the biological coherence and quality of these AOPs. Powered by ontologies covering a wide range of biological domains, abundant phenotypic data annotated ontologically, and some sophisticated knowledge computing tools, semantic analysis has great potential in this area of application. With the events in the AOP-Wiki first annotated into logical definitions and then grouped into phenotypic profiles by individual AOPs, the coherence and quality of AOPs were assessed at several levels: paired key event relationships (KER), all possible event pair combinations within AOPs, and the phenotypic profiles of AOPs, genes, biological pathways, human diseases, and selected chemicals. The semantic similarities were assessed at all these levels based on a unified cross-species vertebrate phenotype ontology encompassing the logical definitions of AOP events as well as many other domain ontologies. A substantial number of KERs and AOPs in the AOP-Wiki were found to be semantically coherent. These same coherent AOPs also mapped to many more genes, pathways, and diseases biologically aligned with the intended chain of events therein leading to their respective adverse outcomes. Significantly, these findings imply that semantic analysis should also have utilities in developing future AOPs by selecting candidate events from either the existing AOP-Wiki events or a broader collection of ontology terms semantically similar to the molecular initiating events or adverse outcomes of interest. In addition, semantic analysis enabled AOP networks to be constructed at the level of phenotypic profiles based on similarities, complementing those based on event sharing by bringing genes, pathways, diseases, and chemicals into the networks too, thus greatly expanding the biological scope and our understanding of AOPs.
Affiliation(s)
- Rong-Lin Wang
- Great Lakes Toxicology & Ecology Division, Center for Computational Toxicology & Exposure, U.S. Environmental Protection Agency, Cincinnati, OH, 45268, USA.
|
28
|
Gallagher RV, Falster DS, Maitner BS, Salguero-Gómez R, Vandvik V, Pearse WD, Schneider FD, Kattge J, Poelen JH, Madin JS, Ankenbrand MJ, Penone C, Feng X, Adams VM, Alroy J, Andrew SC, Balk MA, Bland LM, Boyle BL, Bravo-Avila CH, Brennan I, Carthey AJR, Catullo R, Cavazos BR, Conde DA, Chown SL, Fadrique B, Gibb H, Halbritter AH, Hammock J, Hogan JA, Holewa H, Hope M, Iversen CM, Jochum M, Kearney M, Keller A, Mabee P, Manning P, McCormack L, Michaletz ST, Park DS, Perez TM, Pineda-Munoz S, Ray CA, Rossetto M, Sauquet H, Sparrow B, Spasojevic MJ, Telford RJ, Tobias JA, Violle C, Walls R, Weiss KCB, Westoby M, Wright IJ, Enquist BJ. Open Science principles for accelerating trait-based science across the Tree of Life. Nat Ecol Evol 2020; 4:294-303. [PMID: 32066887 DOI: 10.1038/s41559-020-1109-6] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 03/31/2019] [Accepted: 01/10/2020] [Indexed: 01/22/2023]
Abstract
Synthesizing trait observations and knowledge across the Tree of Life remains a grand challenge for biodiversity science. Species traits are widely used in ecological and evolutionary science, and new data and methods have proliferated rapidly. Yet accessing and integrating disparate data sources remains a considerable challenge, slowing progress toward a global synthesis to integrate trait data across organisms. Trait science needs a vision for achieving global integration across all organisms. Here, we outline how the adoption of key Open Science principles (open data, open source and open methods) is transforming trait science, increasing transparency, democratizing access and accelerating global synthesis. To enhance widespread adoption of these principles, we introduce the Open Traits Network (OTN), a global, decentralized community welcoming all researchers and institutions pursuing the collaborative goal of standardizing and integrating trait data across organisms. We demonstrate how adherence to Open Science principles is key to the OTN community and outline five activities that can accelerate the synthesis of trait data across the Tree of Life, thereby facilitating rapid advances to address scientific inquiries and environmental issues. Lessons learned along the path to a global synthesis of trait data will provide a framework for addressing similarly complex data science and informatics challenges.
Affiliation(s)
- Rachael V Gallagher
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia.
| | - Daniel S Falster
- Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
| | - Brian S Maitner
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Roberto Salguero-Gómez
- Department of Zoology, Oxford University, Oxford, UK.,Centre for Biodiversity and Conservation Science, University of Queensland, Brisbane, Queensland, Australia.,Evolutionary Demography Laboratory, Max Planck Institute for Demographic Research, Rostock, Germany
| | - Vigdis Vandvik
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - William D Pearse
- Ecology Center and Department of Biology, Utah State University, Logan, UT, USA
| | | | - Jens Kattge
- Max Planck Institute for Biogeochemistry, Jena, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| | | | - Joshua S Madin
- Hawai'i Institute of Marine Biology, University of Hawai'i at Manoa, Manoa, HI, USA
| | - Markus J Ankenbrand
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Comprehensive Heart Failure Center, University Hospital Wuerzburg, Wuerzburg, Germany
| | - Caterina Penone
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
| | - Xiao Feng
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Vanessa M Adams
- Discipline of Geography and Spatial Sciences, University of Tasmania, Hobart, Tasmania, Australia
| | - John Alroy
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Samuel C Andrew
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Meghan A Balk
- Bio5 Institute, University of Arizona, Tucson, AZ, USA
| | - Lucie M Bland
- School of Life and Environmental Sciences, Centre for Integrative Ecology, Deakin University, Geelong, Victoria, Australia
| | - Brad L Boyle
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
| | - Catherine H Bravo-Avila
- Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
| | - Ian Brennan
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Alexandra J R Carthey
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Renee Catullo
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
| | - Brittany R Cavazos
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
| | - Dalia A Conde
- Species360 Conservation Science Alliance, Bloomington, MN, USA.,Interdisciplinary Center on Population Dynamics, University of Southern Denmark, Odense, Denmark.,Department of Biology, University of Southern Denmark, Odense, Denmark
| | - Steven L Chown
- School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
| | - Belen Fadrique
- Department of Biology, University of Miami, Miami, FL, USA
| | - Heloise Gibb
- Department of Ecology, Environment and Evolution and Centre for Future Landscapes, La Trobe University, Melbourne, Victoria, Australia
| | - Aud H Halbritter
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - Jennifer Hammock
- National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - J Aaron Hogan
- International Center for Tropical Botany, Department of Biological Sciences, Florida International University, Miami, FL, USA
| | - Hamish Holewa
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Michael Hope
- Commonwealth Scientific and Industrial Research Organisation (CSIRO), Canberra, Australian Capital Territory, Australia
| | - Colleen M Iversen
- Climate Change Science Institute and Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Malte Jochum
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.,Institute of Plant Sciences, University of Bern, Bern, Switzerland.,Institute of Biology, Leipzig University, Leipzig, Germany
| | - Michael Kearney
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Alexander Keller
- Department of Bioinformatics, Biocenter, University of Wuerzburg, Wuerzburg, Germany.,Center for Computational and Theoretical Biology, Biocenter, University of Wuerzburg, Wuerzburg, Germany
| | - Paula Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
| | - Peter Manning
- Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Frankfurt, Germany
| | - Luke McCormack
- Center for Tree Science, The Morton Arboretum, Lisle, IL, USA
| | - Sean T Michaletz
- Department of Botany and Biodiversity Research Centre, University of British Columbia, Vancouver, British Columbia, Canada
| | - Daniel S Park
- Department of Organismic and Evolutionary Biology and Harvard University Herbaria, Harvard University, Cambridge, MA, USA
| | - Timothy M Perez
- Department of Biology, University of Miami, Miami, FL, USA.,Fairchild Tropical Botanic Garden, Coral Gables, FL, USA
| | - Silvia Pineda-Munoz
- School of Biological Sciences and School of Earth & Atmospheric Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Courtenay A Ray
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Maurizio Rossetto
- National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Queensland Alliance of Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
| | - Hervé Sauquet
- Evolution and Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia.,National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, New South Wales, Australia.,Ecologie Systématique Evolution, Univ. Paris-Sud, CNRS, AgroParisTech, Universite Paris-Saclay, Orsay, France
| | - Benjamin Sparrow
- TERN / School of Biological Sciences, Faculty of Science, The University of Adelaide, Adelaide, South Australia, Australia
| | - Marko J Spasojevic
- Department of Evolution, Ecology, and Organismal Biology, University of California Riverside, Riverside, CA, USA
| | - Richard J Telford
- Department of Biological Sciences, University of Bergen, Bergen, Norway.,Bjerknes Centre for Climate Research, University of Bergen, Bergen, Norway
| | - Joseph A Tobias
- Department of Life Sciences, Imperial College London, London, UK
| | - Cyrille Violle
- CEFE, CNRS, Univ Montpellier, Université Paul Valéry Montpellier, Montpellier, France
| | | | | | - Mark Westoby
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Ian J Wright
- Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Brian J Enquist
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA.,Santa Fe Institute, Santa Fe, NM, USA
|
29
|
Ju M, Short AD, Thompson P, Bakerly ND, Gkoutos GV, Tsaprouni L, Ananiadou S. Annotating and detecting phenotypic information for chronic obstructive pulmonary disease. JAMIA Open 2020; 2:261-271. [PMID: 31984360 PMCID: PMC6951876 DOI: 10.1093/jamiaopen/ooz009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/19/2018] [Revised: 02/21/2019] [Accepted: 03/19/2019] [Indexed: 12/29/2022] Open
Abstract
Objectives: Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information. Materials and methods: Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network firstly recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognize enclosing phenotype mentions. Results: Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information. Discussion: Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments. Conclusion: The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.
Affiliation(s)
- Meizhi Ju
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
| | - Andrea D Short
- Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
| | - Paul Thompson
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
| | - Nawar Diar Bakerly
- Salford Royal NHS Foundation Trust; and School of Health Sciences, The University of Manchester, Manchester, UK
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, UK.,Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.,MRC Health Data Research UK (HDR UK).,NIHR Experimental Cancer Medicine Centre, Birmingham, UK.,NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, UK.,NIHR Biomedical Research Centre, Birmingham, UK
| | - Loukia Tsaprouni
- School of Health Sciences, Centre for Life and Sport Sciences, Birmingham City University, Birmingham, UK
| | - Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
|
30
|
Wegrzyn JL, Falk T, Grau E, Buehler S, Ramnath R, Herndon N. Cyberinfrastructure and resources to enable an integrative approach to studying forest trees. Evol Appl 2020; 13:228-241. [PMID: 31892954 PMCID: PMC6935593 DOI: 10.1111/eva.12860] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/12/2019] [Revised: 08/11/2019] [Accepted: 08/14/2019] [Indexed: 12/19/2022] Open
Abstract
Sequencing technologies and bioinformatic approaches are now available to resolve the challenges associated with complex and heterozygous genomes. Increased access to less expensive and more effective instrumentation will contribute to a wealth of high-quality plant genomes in the next few years. In the meantime, more than 370 tree species are associated with public projects in primary repositories that are interrogating expression profiles, identifying variants, or analyzing targeted capture without a high-quality reference genome. Genomic data from these projects generates sequences that represent intermediate assemblies for transcriptomes and genomes. These data contribute to forest tree biology, but the associated sequence remains trapped in supplemental files that are poorly integrated in plant community databases and comparative genomic platforms. Successful implementation of life science cyberinfrastructure is improving data standards, ontologies, analytic workflows, and integrated database platforms for both model and non-model plant species. Unique to forest trees with large populations that are long-lived, outcrossing, and genetically diverse, the phenotypic and environmental metrics associated with georeferenced populations are just as important as the genomic data sampled for each individual. To address questions related to forest health and productivity, cyberinfrastructure must keep pace with the magnitude of genomic and phenomic sampling of larger populations. This review examines the current landscape of cyberinfrastructure, with an emphasis on best practices and resources to align community data with the Findable, Accessible, Interoperable, and Reusable (FAIR) guidelines.
Affiliation(s)
- Jill L. Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
| | - Taylor Falk
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
| | - Emily Grau
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
| | - Sean Buehler
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
| | - Risharde Ramnath
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
| | - Nic Herndon
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
| |
|
31
|
Elmore SA, Cardiff R, Cesta MF, Gkoutos GV, Hoehndorf R, Keenan CM, McKerlie C, Schofield PN, Sundberg JP, Ward JM. A Review of Current Standards and the Evolution of Histopathology Nomenclature for Laboratory Animals. ILAR J 2019; 59:29-39. [PMID: 30476141 DOI: 10.1093/ilar/ily005] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/04/2017] [Revised: 05/04/2018] [Indexed: 12/14/2022]
Abstract
The need for international collaboration in rodent pathology has evolved since the 1970s and was initially driven by the new field of toxicologic pathology. First initiated by the World Health Organization's International Agency for Research on Cancer for rodents, it has evolved to include pathology of the major species (rats, mice, guinea pigs, nonhuman primates, pigs, dogs, fish, rabbits) used in medical research, safety assessment, and mouse pathology. The collaborative effort today is driven by the needs of the regulatory agencies in multiple countries, and by needs of research involving genetically engineered animals, for "basic" research and for more translational preclinical models of human disease. These efforts led to the establishment of an international rodent pathology nomenclature program. Since that time, multiple collaborations for standardization of laboratory animal pathology nomenclature and diagnostic criteria have been developed, and just a few are described herein. Recently, approaches to a nomenclature that is amenable to sophisticated computation have been made available and implemented for large-scale programs in functional genomics and aging. Most terminologies continue to evolve as the science of human and veterinary pathology continues to develop, but standardization and successful implementation remain critical for scientific communication now as ever in the history of veterinary nosology.
Affiliation(s)
- Susan A Elmore
- Susan A. Elmore, MS, DVM, DCVP, DABT, FIATP, is NTP Pathologist and Staff Scientist at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Robert D. Cardiff, MD, PhD, is Distinguished Professor of Pathology, Emeritus at the UCD Center for Comparative Medicine, University of California, and the Department of Pathology and Laboratory Medicine, School of Medicine, Davis, in Davis, California. Mark F. Cesta, DVM, PhD, DACVP, is NTP Pathologist and Staff Scientist, leading the effort for establishment of the online NTP Nonneoplastic Lesion Atlas at the National Toxicology Program, National Institute of Environmental Health Sciences in the Research Triangle Park, North Carolina. Georgios V. Gkoutos, PhD, DIC, is Professor of Clinical Bioinformatics at College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences Centre for Computational Biology, University of Birmingham in Birmingham, United Kingdom. Robert Hoehndorf, PhD, is Assistant Professor in Computer Science at the Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology in Thuwal, Kingdom of Saudi Arabia. Charlotte M. Keenan, VMD, DACVP, is a principal consultant at C.M. ToxPath Consulting in Doylestown, Pennsylvania, USA and leads the international STP effort for the publication of the harmonization of nomenclature and diagnostic criteria (INHAND) in toxicologic pathology. Colin McKerlie, DVM, DVSc, MRCVS, is a senior associate scientist in the Translational Medicine Research Program at The Hospital for Sick Children and a Professor in the Department of Pathobiology & Laboratory Medicine in the Faculty of Medicine at the University of Toronto, Toronto, Ontario, Canada. Paul N. Schofield, MA DPhil, is the University Reader in Biomedical Informatics at the Department of Physiology, Development & Neuroscience, University of Cambridge in Cambridge, United Kingdom and is also an adjunct professor at The Jackson Laboratory in Bar Harbor, Maine. John P. Sundberg, DVM, PhD, DACVP, is a professor at The Jackson Laboratory in Bar Harbor, Maine. Jerrold M. Ward, DVM, PhD, DACVP, FIATP, is a special volunteer at the National Cancer Institute, National Institutes of Health in Bethesda, MD and is also Adjunct Faculty at The Jackson Laboratory in Bar Harbor, Maine.
| | - Robert Cardiff
| | - Mark F Cesta
| | - Georgios V Gkoutos
| | - Robert Hoehndorf
| | - Charlotte M Keenan
| | - Colin McKerlie
| | - Paul N Schofield
| | - John P Sundberg
| | - Jerrold M Ward
| |
|
32
|
Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics 2019; 34:i901-i907. [PMID: 30423077 PMCID: PMC6129260 DOI: 10.1093/bioinformatics/bty559] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 01/09/2023]
Abstract
Motivation: In the past years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease’s (or patient’s) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse. Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprised of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network. Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE
Affiliation(s)
- Mona Alshahrani
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| |
|
33
|
Cianciarullo AM, Bonini-Domingos CR, Vizotto LD, Kobashi LS, Beçak ML, Beçak W. Whole-genome duplication and hemoglobin differentiation traits between allopatric populations of Brazilian Odontophrynus americanus species complex (Amphibia, Anura). Genet Mol Biol 2019; 42:436-444. [PMID: 31259358 PMCID: PMC6726162 DOI: 10.1590/1678-4685-gmb-2017-0260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/19/2017] [Accepted: 07/25/2018] [Indexed: 11/21/2022]
Abstract
Two allopatric populations of the Brazilian diploid and tetraploid Odontophrynus americanus species complex, both from São Paulo state, had their blood hemoglobin biochemically analyzed. In addition, these specimens were cytogenetically characterized. Biochemical characterization of hemoglobin expression showed a distinct banding pattern between the allopatric specimens. Besides this, two distinct phenotypes, not linked to ploidy, sex, or age, were observed in adult animals of both populations. Phenotype A exhibits a dark-colored body with small papillae, an ogival-shaped jaw with reduced interpupillary distance, and shorter hind limbs. Phenotype B shows a yellowish-colored body with larger papillae, an arch-shaped jaw with broader interpupillary distance, and longer hind limbs. Intermediate phenotypes were also found. Considering the geographical isolation of both populations, differences in chromosomal secondary constrictions, and distinct hemoglobin banding patterns, these data indicate that 2n and 4n populations represent cryptic species in the O. americanus species complex. The observed phenotypic diversity can be interpreted as population genetic variability. Eventually, future data may indicate a probable beginning of speciation in these Brazilian frogs. Such an inter- and intrapopulational differentiation/speciation process indicates that the taxonomy of the O. americanus species complex deserves further evaluation by the genomics and metabarcoding communities, also considering the pattern of hemoglobin expression, in South American frogs.
Affiliation(s)
- Claudia R Bonini-Domingos
- Department of Biology, Laboratory of Hemoglobins and Genetics of the Hematological Diseases, Universidade Estadual Paulista "Julio de Mesquita Filho" (UNESP), São José do Rio Preto, SP, Brazil
- Luiz D Vizotto
- Department of Zoology, Universidade Estadual Paulista "Julio de Mesquita Filho" (UNESP), São José do Rio Preto, SP, Brazil
- Leonardo S Kobashi
- Laboratory of Ecology and Evolution, Instituto Butantan, São Paulo, SP, Brazil
- Universidade Paulista (UNIP), São Paulo, SP, Brazil
- Willy Beçak
- Laboratory of Genetics, Instituto Butantan, São Paulo, SP, Brazil

34
Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019; 9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022] Open
Abstract
Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
Affiliation(s)
- Sarah M Alghamdi
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
- King Abdul-Aziz University, Faculty of Computing and Information Technology, Rabigh, 25732, Saudi Arabia
- Beth A Sundberg
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- John P Sundberg
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Paul N Schofield
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609, USA
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
- Robert Hoehndorf
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia

35
Kafkas Ş, Hoehndorf R. Ontology based text mining of gene-phenotype associations: application to candidate gene prediction. Database (Oxford) 2019; 2019:baz019. [PMID: 30809638 PMCID: PMC6391585 DOI: 10.1093/database/baz019] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/25/2018] [Revised: 01/09/2019] [Accepted: 01/26/2019] [Indexed: 01/07/2023]
Abstract
Gene-phenotype associations play an important role in understanding the disease mechanisms which is a requirement for treatment development. A portion of gene-phenotype associations are observed mainly experimentally and made publicly available through several standard resources such as MGI. However, there is still a vast amount of gene-phenotype associations buried in the biomedical literature. Given the large amount of literature data, we need automated text mining tools to alleviate the burden in manual curation of gene-phenotype associations and to develop comprehensive resources. In this study, we present an ontology-based approach in combination with statistical methods to text mine gene-phenotype associations from the literature. Our method achieved AUC values of 0.90 and 0.75 in recovering known gene-phenotype associations from HPO and MGI respectively. We posit that candidate genes and their relevant diseases should be expressed with similar phenotypes in publications. Thus, we demonstrate the utility of our approach by predicting disease candidate genes based on the semantic similarities of phenotypes associated with genes and diseases. To the best of our knowledge, this is the first study using an ontology based approach to extract gene-phenotype associations from the literature. We evaluated our disease candidate prediction model on the gene-disease associations from MGI. Our model achieved AUC values of 0.90 and 0.87 on OMIM (human) and MGI (mouse) datasets of gene-disease associations respectively. Our manual analysis on the text mined data revealed that our method can accurately extract gene-phenotype associations which are not currently covered by the existing public gene-phenotype resources. Overall, results indicate that our method can precisely extract known as well as new gene-phenotype associations from literature. All the data and methods are available at https://github.com/bio-ontology-research-group/genepheno.
Affiliation(s)
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia

36
Neveu P, Tireau A, Hilgert N, Nègre V, Mineau‐Cesari J, Brichet N, Chapuis R, Sanchez I, Pommier C, Charnomordic B, Tardieu F, Cabrera‐Bosquet L. Dealing with multi-source and multi-scale information in plant phenomics: the ontology-driven Phenotyping Hybrid Information System. The New Phytologist 2019; 221:588-601. [PMID: 30152011 PMCID: PMC6585972 DOI: 10.1111/nph.15385] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/07/2018] [Accepted: 07/07/2018] [Indexed: 05/13/2023]
Abstract
Phenomic datasets need to be accessible to the scientific community. Their reanalysis requires tracing relevant information on thousands of plants, sensors and events. The open-source Phenotyping Hybrid Information System (PHIS) is proposed for plant phenotyping experiments in various categories of installations (field, glasshouse). It unambiguously identifies all objects and traits in an experiment and establishes their relations via ontologies and semantics that apply to both field and controlled conditions. For instance, the genotype is declared for a plant or plot and is associated with all objects related to it. Events such as successive plant positions, anomalies and annotations are associated with objects so they can be easily retrieved. Its ontology-driven architecture is a powerful tool for integrating and managing data from multiple experiments and platforms, for creating relationships between objects and enriching datasets with knowledge and metadata. It interoperates with external resources via web services, thereby allowing data integration into other systems; for example, modelling platforms or external databases. It has the potential for rapid diffusion because of its ability to integrate, manage and visualize multi-source and multi-scale data, but also because it is based on 10 yr of trial and error in our groups.
Affiliation(s)
- Pascal Neveu
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Anne Tireau
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Nadine Hilgert
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Vincent Nègre
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Jonathan Mineau‐Cesari
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Nicolas Brichet
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Romain Chapuis
- UE DIASCOPE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Isabelle Sanchez
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Cyril Pommier
- INRA, UR1164 URGI – Research Unit in Genomics‐Info, INRA de Versailles‐Grignon, Route de Saint‐Cyr, Versailles 78026, France
- François Tardieu
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France

37
Endara L, Thessen AE, Cole HA, Walls R, Gkoutos G, Cao Y, Chong SS, Cui H. Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier. Biodivers Data J 2018; 6:e29232. [PMID: 30532623 PMCID: PMC6281706 DOI: 10.3897/bdj.6.e29232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2018] [Accepted: 11/20/2018] [Indexed: 11/21/2022] Open
Abstract
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called "modifiers". With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly.
Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies.
Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using "broader synonym" or "not recommended" annotation properties. These annotations explicitly allow for the user to be aware of the semantic ambiguity associated with the terms and whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreements, suggesting differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies.
Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concepts alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
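The difference between the two list designs described in this abstract can be illustrated with a small sketch. This is not the published OWL model: the class names `FiveBinList` and `OpenOrderedList` and the five frequency labels below are hypothetical stand-ins chosen for illustration. The closed design only accepts new terms as synonyms of existing bins, while the open design also allows a new class to be inserted anywhere in the order.

```python
from dataclasses import dataclass, field


@dataclass
class Bin:
    """One ordered class in a modifier list, with its synonyms."""
    label: str
    synonyms: set = field(default_factory=set)


class FiveBinList:
    """Closed ordered list: the set of bins is fixed at construction."""

    def __init__(self, labels):
        self.bins = [Bin(label) for label in labels]

    def rank(self, term):
        # Position in the order; a higher rank means "more frequent".
        for i, b in enumerate(self.bins):
            if term == b.label or term in b.synonyms:
                return i
        raise KeyError(term)

    def add_synonym(self, term, existing_term):
        # New terms can only join an existing bin as a synonym.
        self.bins[self.rank(existing_term)].synonyms.add(term)


class OpenOrderedList(FiveBinList):
    """Open ordered list: a new class may be inserted anywhere."""

    def insert_after(self, new_label, existing_term):
        self.bins.insert(self.rank(existing_term) + 1, Bin(new_label))


# Hypothetical labels; the actual bin labels are in the published OWL files.
labels = ["never", "rarely", "sometimes", "usually", "always"]

closed = FiveBinList(labels)
closed.add_synonym("seldom", "rarely")      # shares the rank of "rarely"

open_list = OpenOrderedList(labels)
open_list.insert_after("often", "sometimes")  # becomes a sixth class
```

With either design, comparing `rank()` values gives the ordering the abstract describes, for example that "usually" is more frequent than "rarely".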
Affiliation(s)
- Lorena Endara
- University of Florida, Gainesville, United States of America
- Anne E Thessen
- The Ronin Institute for Independent Scholarship, Montclair, NJ, United States of America
- Heather A Cole
- Science and Technology Branch, Agriculture and Agri-Food Canada, Government of Canada, Ottawa, Canada
- Ramona Walls
- CyVerse, Tucson, United States of America
- Georgios Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
- Yujie Cao
- Center for Studies of Information Resources, Wuhan University, Wuhan, China
- Steven S. Chong
- National Center for Ecological Analysis and Synthesis, University of California, Santa Barbara, Santa Barbara, United States of America
- University of Arizona, Tucson, United States of America
- Hong Cui
- University of Arizona, Tucson, United States of America

38
Wang RL, Edwards S, Ives C. Ontology-based semantic mapping of chemical toxicities. Toxicology 2018; 412:89-100. [PMID: 30468866 DOI: 10.1016/j.tox.2018.11.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/24/2018] [Revised: 11/11/2018] [Accepted: 11/19/2018] [Indexed: 12/15/2022]
Abstract
This study was undertaken to evaluate the use of ontology-based semantic mapping (OS-Mapping) in chemical toxicity assessment. Nineteen chemical-species phenotypic profiles (CSPPs) were constructed by ontologically annotating the toxicity responses reported in more than seven hundred published studies of ten chemicals on six vertebrate species. The CSPPs were semantically compared to more than 29,000 publicly available phenotypic profiles of genes, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, and diseases based on a cross-species phenotype ontology. OS-Mapping was shown to differentiate chemical toxicities among themselves as well as within and across species. It also revealed cases of chemical by species interactions. In addition to confirming similar MOAs (mechanisms of action) for a few chemicals, OS-Mapping also generated novel insights into the MOAs underlying some seemingly different, yet phenotypically similar, classes of chemicals. The nature of a unified cross-species phenotype ontology and its representation of diverse knowledge domains allowed the construction of a complete phenotypic continuum for the 17α-ethynylestradiol_fathead minnow across the biological levels of organization, which complemented a similar one derived from the Comparative Toxicogenomics Database but based primarily on 17α-ethynylestradiol-induced molecular phenotypes. Overall, OS-Mapping has been demonstrated to offer a powerful approach to help bridge the gap between the molecular and non-molecular phenotypes of chemicals characterized by using high throughput or traditional omics methods and their apical endpoints of greater regulatory relevance, which are typically phenotypes found at the higher levels of biological organization. OS-Mapping also enables comparative toxicity assessment among chemicals, both within and across species. 
Furthermore, the semantic analysis of phenotypes can reveal additional novel MOAs for some well-known chemicals and discover candidate MOAs for chemicals that are less molecularly characterized. A full phenotypic continuum based on OS-Mapping will also be conducive to the future development of adverse outcome pathways. As phenomics continues to advance and the ontological annotation of literature becomes more automated, the power of OS-Mapping will be further enhanced.
Affiliation(s)
- Rong-Lin Wang
- Exposure Methods and Measurements Division, National Exposure Research Laboratory, US EPA, Cincinnati, OH 45268, USA
- Stephen Edwards
- Research Computing Division, RTI International, Research Triangle Park, NC 27709, USA
- Cataia Ives
- Research Computing Division, RTI International, Research Triangle Park, NC 27709, USA

39
Kulmanov M, Khan MA, Hoehndorf R, Wren J. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2018; 34:660-668. [PMID: 29028931 PMCID: PMC5860606 DOI: 10.1093/bioinformatics/btx624] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Received: 05/10/2017] [Accepted: 09/27/2017] [Indexed: 12/29/2022] Open
Abstract
Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem.
Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.
Availability and implementation: Web server: http://deepgo.bio2vec.net. Source code: https://github.com/bio-ontology-research-group/deepgo
Supplementary information: Supplementary data are available at Bioinformatics online.
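One consequence of using the dependencies between GO classes is that predictions should be hierarchically consistent: a protein annotated with a class is implicitly annotated with every ancestor of that class. The sketch below illustrates that constraint as a simple post-processing step and is not the DeepGO network itself. The function name `propagate_scores` and the three-level toy hierarchy are hypothetical, and the class identifiers are placeholders, not real GO terms.

```python
def propagate_scores(scores, parents):
    """Raise each ancestor's score to at least the score of its descendants.

    scores:  dict of class id -> predicted score
    parents: dict of class id -> list of direct parent ids (assumed acyclic)
    """
    result = dict(scores)

    def push_up(term, value):
        # Walk towards the root, stopping once an ancestor already scores higher.
        for parent in parents.get(term, []):
            if result.get(parent, 0.0) < value:
                result[parent] = value
                push_up(parent, value)

    for term, value in scores.items():
        push_up(term, value)
    return result


# Toy hierarchy (placeholder ids): dna_binding -> binding -> root, transport -> root
parents = {"dna_binding": ["binding"], "binding": ["root"], "transport": ["root"]}
raw = {"root": 0.2, "binding": 0.5, "dna_binding": 0.9, "transport": 0.1}
consistent = propagate_scores(raw, parents)
```

After propagation no class scores higher than any of its ancestors, so thresholding the scores at any cut-off yields an annotation set that is closed under the hierarchy.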
Affiliation(s)
- Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
- Mohammed Asif Khan
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia

40
Smaili FZ, Gao X, Hoehndorf R. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics 2018; 35:2133-2140. [DOI: 10.1093/bioinformatics/bty933] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 08/05/2018] [Revised: 11/02/2018] [Accepted: 11/07/2018] [Indexed: 12/11/2022] Open
Affiliation(s)
- Fatima Zohra Smaili
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Xin Gao
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia

41
Boudellioua I, Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants. Sci Rep 2018; 8:14681. [PMID: 30279426 PMCID: PMC6168481 DOI: 10.1038/s41598-018-32876-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/17/2018] [Accepted: 09/18/2018] [Indexed: 12/12/2022] Open
Abstract
An increasing number of disorders have been identified for which two or more distinct alleles in two or more genes are required to either cause the disease or to significantly modify its onset, severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of alleles underlying digenic and oligogenic diseases in individual whole exome or whole genome sequences. Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical or non-human model organism research can provide useful information and improve variant prioritization for genetic diseases. Additional background knowledge about interactions between genes can be utilized to identify sets of variants in different genes in the same individual which may then contribute to the overall disease phenotype. We have developed OligoPVP, an algorithm that can be used to prioritize causative combinations of variants in digenic and oligogenic diseases, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods in the case of digenic diseases. Our results show that OligoPVP can efficiently prioritize sets of variants in digenic diseases using a phenotype-driven approach and identify etiologically important variants in whole genomes. OligoPVP naturally extends to oligogenic disease involving interactions between variants in two or more genes. It can be applied to the identification of multiple interacting candidate variants contributing to phenotype, where the action of modifier genes is suspected from pedigree analysis or failure of traditional causative variant identification.
Affiliation(s)
- Imane Boudellioua
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Maxat Kulmanov
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, UK
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, B15 2TT, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
- NIHR Experimental Cancer Medicine Centre, B15 2TT, Birmingham, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, B15 2TT, Birmingham, UK
- NIHR Biomedical Research Centre, B15 2TT, Birmingham, UK
- Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

42
Howe DG, Blake JA, Bradford YM, Bult CJ, Calvi BR, Engel SR, Kadin JA, Kaufman TC, Kishore R, Laulederkind SJF, Lewis SE, Moxon SAT, Richardson JE, Smith C. Model organism data evolving in support of translational medicine. Lab Anim (NY) 2018; 47:277-289. [PMID: 30224793 DOI: 10.1038/s41684-018-0150-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/14/2018] [Accepted: 08/13/2018] [Indexed: 02/07/2023]
Abstract
Model organism databases (MODs) have been collecting and integrating biomedical research data for 30 years and were designed to meet specific needs of each model organism research community. The contributions of model organism research to understanding biological systems would be hard to overstate. Modern molecular biology methods and cost reductions in nucleotide sequencing have opened avenues for direct application of model organism research to elucidating mechanisms of human diseases. Thus, the mandate for model organism research and databases has now grown to include facilitating use of these data in translational applications. Challenges in meeting this opportunity include the distribution of research data across many databases and websites, a lack of data format standards for some data types, and sustainability of scale and cost for genomic database resources like MODs. The issues of widely distributed data and application of data standards are some of the challenges addressed by FAIR (Findable, Accessible, Interoperable, and Re-usable) data principles. The Alliance of Genome Resources is now moving to address these challenges by bringing together expertly curated research data from fly, mouse, rat, worm, yeast, zebrafish, and the Gene Ontology consortium. Centralized multi-species data access, integration, and format standardization will lower the data utilization barrier in comparative genomics and translational applications and will provide a framework in which sustainable scale and cost can be addressed. This article presents a brief historical perspective on how the Alliance model organisms are complementary and how they have already contributed to understanding the etiology of human diseases. In addition, we discuss four challenges for using data from MODs in translational applications and how the Alliance is working to address them, in part by applying FAIR data principles. 
Ultimately, combined data from these animal models are more powerful than the sum of the parts.
Affiliation(s)
- Douglas G Howe
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA
- Yvonne M Bradford
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA
- Brian R Calvi
- Department of Biology, Indiana University, Bloomington, IN, USA
- Stacia R Engel
- Department of Genetics, Stanford University, Palo Alto, CA, USA
- Ranjana Kishore
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Stanley J F Laulederkind
- Department of Biomedical Engineering, Medical College of Wisconsin and Marquette University, Milwaukee, WI, USA
- Suzanna E Lewis
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Sierra A T Moxon
- The Institute of Neuroscience, University of Oregon, Eugene, OR, USA

43
Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. Ontology-based validation and identification of regulatory phenotypes. Bioinformatics 2018; 34:i857-i865. [PMID: 30423068 PMCID: PMC6129279 DOI: 10.1093/bioinformatics/bty605] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/17/2023] Open
Abstract
Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support to validate the mutual consistency of functional and phenotype annotations as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations.
Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. We also apply our method to the rule-based prediction of regulatory phenotypes from functions and demonstrate that we can predict these phenotypes with Fmax of up to 0.647.
Availability and implementation: https://github.com/bio-ontology-research-group/phenogocon.
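The Fmax value quoted in this abstract is a threshold-free summary of prediction quality. As commonly defined in CAFA-style evaluations, it is the maximum over score thresholds of the harmonic mean of average precision and average recall. The sketch below assumes that definition. The abstract does not specify the exact evaluation protocol, so the function name `f_max`, the threshold grid, and the toy data are assumptions made for illustration.

```python
def f_max(truth, scores, thresholds=None):
    """Fmax: best F-measure over score thresholds (CAFA-style, assumed).

    truth:  dict of item id -> non-empty set of true term ids
    scores: dict of item id -> dict of term id -> score in [0, 1]
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for item, true_terms in truth.items():
            predicted = {term for term, s in scores.get(item, {}).items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision is averaged only over items with a prediction at t.
                precisions.append(hits / len(predicted))
            # Recall is averaged over all items.
            recalls.append(hits / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data with placeholder gene and term ids.
truth = {"g1": {"A", "B"}, "g2": {"C"}}
scores = {
    "g1": {"A": 0.9, "B": 0.4, "D": 0.3},
    "g2": {"C": 0.8, "A": 0.6},
}
result = f_max(truth, scores)
```

Because the maximum is taken over thresholds, Fmax rewards a good ranking of terms without requiring the predictor to commit to a single cut-off.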
Collapse
Affiliation(s)
- Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Centre, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, UK
- NIHR Experimental Cancer Medicine Centre, Birmingham, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, UK
- NIHR Biomedical Research Centre, Birmingham, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Centre, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| |
|
44
|
Brown SDM, Holmes CC, Mallon AM, Meehan TF, Smedley D, Wells S. High-throughput mouse phenomics for characterizing mammalian gene function. Nat Rev Genet 2018; 19:357-370. [PMID: 29626206 PMCID: PMC6582361 DOI: 10.1038/s41576-018-0005-2] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Indexed: 12/17/2022]
Abstract
We are entering a new era of mouse phenomics, driven by large-scale and economical generation of mouse mutants coupled with increasingly sophisticated and comprehensive phenotyping. These studies are generating large, multidimensional gene-phenotype data sets, which are shedding new light on the mammalian genome landscape and revealing many hitherto unknown features of mammalian gene function. Moreover, these phenome resources provide a wealth of disease models and can be integrated with human genomics data as a powerful approach for the interpretation of human genetic variation and its relationship to disease. In the future, the development of novel phenotyping platforms allied to improved computational approaches, including machine learning, for the analysis of phenotype data will continue to enhance our ability to develop a comprehensive and powerful model of mammalian gene-phenotype space.
Affiliation(s)
| | - Chris C Holmes
- Nuffield Department of Medicine and Department of Statistics, University of Oxford, Oxford, UK.
| | | | - Terrence F Meehan
- European Molecular Biology Laboratory - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
| | | | | |
|
45
|
Rodríguez-García MÁ, Gkoutos GV, Schofield PN, Hoehndorf R. Integrating phenotype ontologies with PhenomeNET. J Biomed Semantics 2017; 8:58. [PMID: 29258588 PMCID: PMC5735523 DOI: 10.1186/s13326-017-0167-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 03/17/2017] [Accepted: 11/22/2017] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND: Integration and analysis of phenotype data from humans and model organisms is a key challenge in building our understanding of normal biology and pathophysiology. However, the range of phenotypes and anatomical details being captured in clinical and model organism databases presents complex problems when attempting to match classes across species and across phenotypes as diverse as behaviour and neoplasia. We have previously developed PhenomeNET, a system for disease gene prioritization that includes as one of its components an ontology designed to integrate phenotype ontologies. While not applicable to matching arbitrary ontologies, PhenomeNET can be used to identify related phenotypes in different species, including human, mouse, zebrafish, nematode worm, fruit fly, and yeast. RESULTS: Here, we apply PhenomeNET to identify related classes from two phenotype and two disease ontologies using automated reasoning. We demonstrate that we can identify a large number of mappings, some of which require automated reasoning and cannot easily be identified through lexical approaches alone. Combining automated reasoning with lexical matching further improves results in aligning ontologies. CONCLUSIONS: PhenomeNET can be used to align and integrate phenotype ontologies. The results can be utilized for biomedical analyses in which phenomena observed in model organisms are used to identify causative genes and mutations underlying human disease.
Affiliation(s)
- Miguel Ángel Rodríguez-García
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT, UK
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX, UK
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, PO Box 2882, Thuwal, 23955-6900, Saudi Arabia
| |
|
46
|
Staal YC, Pennings JL, Hessel EV, Piersma AH. Advanced Toxicological Risk Assessment by Implementation of Ontologies Operationalized in Computational Models. Applied In Vitro Toxicology 2017. [DOI: 10.1089/aivt.2017.0019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]
Affiliation(s)
- Yvonne C.M. Staal
- Center for Health Protection, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
| | - Jeroen L.A. Pennings
- Center for Health Protection, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
| | - Ellen V.S. Hessel
- Center for Health Protection, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
| | - Aldert H. Piersma
- Center for Health Protection, National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
| |
|
47
|
Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017; 13:e1005500. [PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/08/2016] [Revised: 05/01/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022] Open
Abstract
Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
Author summary: We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, but also can be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help with the problem of determining the causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) which augments existing strategies by using the similarity of the patient's phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.
Affiliation(s)
- Imane Boudellioua
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Rozaimi B. Mahamad Razali
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Maxat Kulmanov
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Yasmeen Hashish
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Vladimir B. Bajic
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| | - Eva Goncalves-Serra
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Nadia Schoenmakers
- University of Cambridge Metabolic Research Laboratories, Wellcome Trust—Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
| | - Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
| | - Paul N. Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom
| | - Robert Hoehndorf
- King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
| |
|