51
|
Rushing BR, Thessen AE, Soliman GA, Ramesh A, Sumner SCJ. The Exposome and Nutritional Pharmacology and Toxicology: A New Application for Metabolomics. Exposome 2023; 3:osad008. [PMID: 38766521 PMCID: PMC11101153 DOI: 10.1093/exposome/osad008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/22/2024]
Abstract
The exposome refers to all of the internal and external life-long exposures that an individual experiences. These exposures, either acute or chronic, are associated with changes in metabolism that will positively or negatively influence the health and well-being of individuals. Nutrients and other dietary compounds modulate similar biochemical processes and have the potential in some cases to counteract the negative effects of exposures or enhance their beneficial effects. We present herein the concept of Nutritional Pharmacology/Toxicology which uses high-information metabolomics workflows to identify metabolic targets associated with exposures. Using this information, nutritional interventions can be designed toward those targets to mitigate adverse effects or enhance positive effects. We also discuss the potential for this approach in precision nutrition where nutrients/diet can be used to target gene-environment interactions and other subpopulation characteristics. Deriving these "nutrient cocktails" presents an opportunity to modify the effects of exposures for more beneficial outcomes in public health.
Affiliation(s)
- Blake R. Rushing
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Anne E Thessen
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ghada A. Soliman
  - Department of Environmental, Occupational and Geospatial Health Sciences, City University of New York-Graduate School of Public Health and Health Policy, New York, NY, USA
- Aramandla Ramesh
  - Department of Biochemistry, Cancer Biology, Neuroscience & Pharmacology, Meharry Medical College, Nashville, TN, USA
- Susan CJ Sumner
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
52
|
Orlic-Milacic M, Rothfels K, Matthews L, Wright A, Jassal B, Shamovsky V, Trinh Q, Gillespie M, Sevilla C, Tiwari K, Ragueneau E, Gong C, Stephan R, May B, Haw R, Weiser J, Beavers D, Conley P, Hermjakob H, Stein LD, D'Eustachio P, Wu G. Pathway-based, reaction-specific annotation of disease variants for elucidation of molecular phenotypes. bioRxiv 2023:2023.10.18.562964. [PMID: 37904913 PMCID: PMC10614924 DOI: 10.1101/2023.10.18.562964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Disease variant annotation in the context of biological reactions and pathways can provide a standardized overview of molecular phenotypes of pathogenic mutations that is amenable to computational mining and mathematical modeling. Reactome, an open source, manually curated, peer-reviewed database of human biological pathways, provides annotations for over 4000 disease variants of close to 400 genes in the context of ∼800 disease reactions constituting ∼400 disease pathways. Functional annotation of disease variants proceeds from normal gene functions, through disease variants whose divergence from normal molecular behaviors has been experimentally verified, to extrapolation from molecular phenotypes of characterized variants to variants of unknown significance using criteria of the American College of Medical Genetics and Genomics (ACMG). Reactome's pathway-based, reaction-specific disease variant dataset and data model provide a platform to infer pathway output impacts of numerous human disease variants and model organism orthologs, complementing computational predictions of variant pathogenicity.
|
53
|
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA): computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
  - Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
  - The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
|
54
|
Aradhya S, Facio FM, Metz H, Manders T, Colavin A, Kobayashi Y, Nykamp K, Johnson B, Nussbaum RL. Applications of artificial intelligence in clinical laboratory genomics. Am J Med Genet C Semin Med Genet 2023; 193:e32057. [PMID: 37507620 DOI: 10.1002/ajmg.c.32057] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/15/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract
The transition from analog to digital technologies in clinical laboratory genomics is ushering in an era of "big data" in ways that will exceed human capacity to rapidly and reproducibly analyze those data using conventional approaches. Accurately evaluating complex molecular data to facilitate timely diagnosis and management of genomic disorders will require supportive artificial intelligence methods. These are already being introduced into clinical laboratory genomics to identify variants in DNA sequencing data, predict the effects of DNA variants on protein structure and function to inform clinical interpretation of pathogenicity, link phenotype ontologies to genetic variants identified through exome or genome sequencing to help clinicians reach diagnostic answers faster, correlate genomic data with tumor staging and treatment approaches, utilize natural language processing to identify critical published medical literature during analysis of genomic data, and use interactive chatbots to identify individuals who qualify for genetic testing or to provide pre-test and post-test education. With careful and ethical development and validation of artificial intelligence for clinical laboratory genomics, these advances are expected to significantly enhance the abilities of geneticists to translate complex data into clearly synthesized information for clinicians to use in managing the care of their patients at scale.
Affiliation(s)
- Swaroop Aradhya
  - Invitae Corporation, San Francisco, California, USA
  - Adjunct Clinical Faculty, Department of Pathology, Stanford University School of Medicine, Stanford, California, USA
- Hillery Metz
  - Invitae Corporation, San Francisco, California, USA
- Toby Manders
  - Invitae Corporation, San Francisco, California, USA
- Keith Nykamp
  - Invitae Corporation, San Francisco, California, USA
- Robert L Nussbaum
  - Invitae Corporation, San Francisco, California, USA
  - Volunteer Faculty, School of Medicine, University of California San Francisco, San Francisco, California, USA
|
55
|
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes: the Earth's code of life.
Affiliation(s)
- Roderic Guigó
  - Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
|
56
|
Alghamdi SM, Hoehndorf R. Improving the classification of cardinality phenotypes using collections. J Biomed Semantics 2023; 14:9. [PMID: 37550716 PMCID: PMC10405428 DOI: 10.1186/s13326-023-00290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 07/07/2023] [Indexed: 08/09/2023]
Abstract
MOTIVATION: Phenotypes are observable characteristics of an organism and they can be highly variable. Information about phenotypes is collected in a clinical context to characterize disease, and is also collected in model organisms and stored in model organism databases where it is used to understand gene functions. Phenotype data is also used in computational data analysis and machine learning methods to provide novel insights into disease mechanisms and support personalized diagnosis of disease. For mammalian organisms and in a clinical context, ontologies such as the Human Phenotype Ontology and the Mammalian Phenotype Ontology are widely used to formally and precisely describe phenotypes. We specifically analyze axioms pertaining to phenotypes of collections of entities within a body, and we find that some of the axioms in phenotype ontologies lead to inferences that may not accurately reflect the underlying biological phenomena. RESULTS: We reformulate the phenotypes of collections of entities using an ontological theory of collections. By reformulating phenotypes of collections in phenotype ontologies, we avoid potentially incorrect inferences pertaining to the cardinality of these collections. We apply our method to two phenotype ontologies and show that the reformulation not only removes some problematic inferences but also quantitatively improves biological data analysis.
Affiliation(s)
- Sarah M Alghamdi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
  - King Abdul-Aziz University, Faculty of Computing and Information Technology, 25732, Rabigh, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical, and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955, Thuwal, Saudi Arabia
|
57
|
Elkin J, Martin A, Courtier-Orgogozo V, Santos ME. Analysis of the genetic loci of pigment pattern evolution in vertebrates. Biol Rev Camb Philos Soc 2023; 98:1250-1277. [PMID: 37017088 DOI: 10.1111/brv.12952] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/02/2022] [Revised: 03/08/2023] [Accepted: 03/14/2023] [Indexed: 04/06/2023]
Abstract
Vertebrate pigmentation patterns are amongst the best characterised model systems for studying the genetic basis of adaptive evolution. The wealth of available data on the genetic basis for pigmentation evolution allows for analysis of trends and quantitative testing of evolutionary hypotheses. We employed Gephebase, a database of genetic variants associated with natural and domesticated trait variation, to examine trends in how cis-regulatory and coding mutations contribute to vertebrate pigmentation phenotypes, as well as factors that favour one mutation type over the other. We found that studies with lower ascertainment bias identified higher proportions of cis-regulatory mutations, and that cis-regulatory mutations were more common amongst animals harbouring a higher number of pigment cell classes. We classified pigmentation traits firstly according to their physiological basis and secondly according to whether they affect colour or pattern, and identified that carotenoid-based pigmentation and variation in pattern boundaries are preferentially associated with cis-regulatory change. We also classified genes according to their developmental, cellular, and molecular functions. We found a greater proportion of cis-regulatory mutations in genes implicated in upstream developmental processes compared to those involved in downstream cellular functions, and that ligands were associated with a higher proportion of cis-regulatory mutations than their respective receptors. Based on these trends, we discuss future directions for research in vertebrate pigmentation evolution.
Affiliation(s)
- Joel Elkin
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
- Arnaud Martin
  - Department of Biological Sciences, The George Washington University, 800 22nd St. NW, Suite 6000, Washington, DC, 20052, USA
- M Emília Santos
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
|
58
|
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A Tool for Annotating and Analyzing Treatments and Clinical Management of Human Disease. medRxiv 2023:2023.07.13.23292612. [PMID: 37503136 PMCID: PMC10370244 DOI: 10.1101/2023.07.13.23292612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Navigating the vast landscape of clinical literature to find optimal treatments and management strategies can be a challenging task, especially for rare diseases. To address this task, we introduce the Medical Action Ontology (MAxO), the first ontology specifically designed to organize medical procedures, therapies, and interventions in a structured way. Currently, MAxO contains 1757 medical action terms added through a combination of manual and semi-automated processes. MAxO was developed with logical structures that make it compatible with several other ontologies within the Open Biological and Biomedical Ontologies (OBO) Foundry. These cover a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. We have created a database of over 16000 annotations that describe diagnostic modalities for specific phenotypic abnormalities as defined by the Human Phenotype Ontology (HPO). Additionally, 413 annotations are provided for medical actions for 189 rare diseases. We have developed a web application called POET (https://poet.jax.org/) for the community to use to contribute MAxO annotations. MAxO provides a computational representation of treatments and other actions taken for the clinical management of patients. The development of MAxO is closely coupled to the Mondo Disease Ontology (Mondo) and the Human Phenotype Ontology (HPO) and expands the scope of our computational modeling of diseases and phenotypic features to include diagnostics and therapeutic actions. MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO).
Affiliation(s)
- Leigh C Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
- Michael A Gargano
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
- Sabrina Toro
  - University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Margaret P Adam
  - University of Washington School of Medicine, Seattle, WA, United States
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
- David Gomez-Andres
  - Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
- Rita Horvath
  - Department of Clinical Neurosciences, University of Cambridge, Robinson Way, CB2 0PY, Cambridge, UK
- Megan L Kraus
  - University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Markus S Ladewig
  - Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
- David Lewis-Smith
  - Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, United Kingdom
- Catharina Schuetz
  - Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Berthold Seitz
  - Department of Ophthalmology, Saarland University Hospital UKS, Homburg/Saar, Germany
- Morgan N Similuk
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, United States
- Teresa N Sparks
  - Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143
- Timmy Strauss
  - Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Emilia M Swietlik
  - Department of Medicine, University of Cambridge, Heart and Lung Research Institute, CB2 0BB, Cambridge, UK
- Melissa A Haendel
  - University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
|
59
|
Bao C, Wang S, Jiang L, Fang Z, Zou K, Lin J, Chen S, Fang H. OpenXGR: a web-server update for genomic summary data interpretation. Nucleic Acids Res 2023; 51:W387-W396. [PMID: 37158276 PMCID: PMC10320191 DOI: 10.1093/nar/gkad357] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/12/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
How to effectively convert genomic summary data into downstream knowledge discovery represents a major challenge in human genomics research. To address this challenge, we have developed efficient and effective approaches and tools. Extending our previously established software tools, we here introduce OpenXGR (http://www.openxgr.com), a newly designed web server that offers almost real-time enrichment and subnetwork analyses for a user-input list of genes, SNPs or genomic regions. It does so by leveraging ontologies, networks, and functional genomic datasets (such as promoter capture Hi-C, e/pQTL and enhancer-gene maps for linking SNPs or genomic regions to candidate genes). Six analysers are provided, each performing specific interpretations tailored to genomic summary data at various levels. Three enrichment analysers are designed to identify ontology terms enriched for input genes, as well as genes linked from input SNPs or genomic regions. Three subnetwork analysers allow users to identify gene subnetworks from input gene-, SNP- or genomic region-level summary data. With a step-by-step user manual, OpenXGR provides a user-friendly and all-in-one platform for interpreting summary data on the human genome, enabling more integrated and effective knowledge discovery.
Affiliation(s)
- Chaohui Bao
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shan Wang
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Lulu Jiang
  - Translational Health Sciences, University of Bristol, Bristol BS1 3NY, UK
- Zhongcheng Fang
  - Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Kexin Zou
  - School of Life Sciences, Central South University, Hunan 410083, China
- James Lin
  - High Performance Computing Center, Shanghai Jiao Tong University, Shanghai 200240, China
- Saijuan Chen
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Hai Fang
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
|
60
|
Hu Y, Comjean A, Attrill H, Antonazzo G, Thurmond J, Chen W, Li F, Chao T, Mohr SE, Brown NH, Perrimon N. PANGEA: a new gene set enrichment tool for Drosophila and common research organisms. Nucleic Acids Res 2023; 51:W419-W426. [PMID: 37125646 PMCID: PMC10320058 DOI: 10.1093/nar/gkad331] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/21/2023] [Revised: 03/28/2023] [Accepted: 04/29/2023] [Indexed: 05/02/2023]
Abstract
Gene set enrichment analysis (GSEA) plays an important role in large-scale data analysis, helping scientists discover the underlying biological patterns over-represented in a gene list resulting from, for example, an 'omics' study. Gene Ontology (GO) annotation is the most frequently used classification mechanism for gene set definition. Here we present a new GSEA tool, PANGEA (PAthway, Network and Gene-set Enrichment Analysis; https://www.flyrnai.org/tools/pangea/), developed to allow a more flexible and configurable approach to data analysis using a variety of classification sets. PANGEA allows GO analysis to be performed on different sets of GO annotations, for example excluding high-throughput studies. Beyond GO, PANGEA includes gene sets for pathway annotation and protein complex data from various resources, as well as expression and disease annotation from the Alliance of Genome Resources (Alliance). In addition, visualizations of results are enhanced by an option to view a network of gene-set-to-gene relationships. The tool also allows comparison of multiple input gene lists and provides accompanying visualisation tools for quick and easy comparison. This new tool will facilitate GSEA for Drosophila and other major model organisms based on high-quality annotated information available for these species.
Affiliation(s)
- Yanhui Hu
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Aram Comjean
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Helen Attrill
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Giulia Antonazzo
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Jim Thurmond
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Weihang Chen
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Fangge Li
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Tiffany Chao
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Stephanie E Mohr
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Nicholas H Brown
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Norbert Perrimon
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Boston, MA 02138, USA
|
61
|
Cuzick A, Seager J, Wood V, Urban M, Rutherford K, Hammond-Kosack KE. A framework for community curation of interspecies interactions literature. eLife 2023; 12:e84658. [PMID: 37401199 DOI: 10.7554/elife.84658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/02/2022] [Accepted: 05/18/2023] [Indexed: 07/05/2023]
Abstract
The quantity and complexity of data being generated and published in biology have increased substantially, but few methods exist for capturing knowledge about phenotypes derived from molecular interactions between diverse groups of species in a way that is amenable to data-driven biology and research. To improve access to this knowledge, we have constructed a framework for the curation of the scientific literature studying interspecies interactions, using data curated for the Pathogen-Host Interactions database (PHI-base) as a case study. The framework provides a curation tool, phenotype ontology, and controlled vocabularies to curate pathogen-host interaction data, at the level of the host, pathogen, strain, gene, and genotype. The concept of a multispecies genotype, the 'metagenotype,' is introduced to facilitate capturing changes in the disease-causing abilities of pathogens, and host resistance or susceptibility, observed by gene alterations. We report on this framework and describe PHI-Canto, a community curation tool for use by publication authors.
Affiliation(s)
- Alayne Cuzick
  - Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
- James Seager
  - Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Martin Urban
  - Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
- Kim Rutherford
  - Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Kim E Hammond-Kosack
  - Strategic area: Protecting Crops and the Environment, Rothamsted Research, Harpenden, United Kingdom
|
62
|
Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub: building and exchanging biological knowledge graphs. Bioinformatics 2023; 39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023]
Abstract
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.
RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
Affiliation(s)
- J Harry Caufield
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kevin Schaper
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland
- Harshad Hegde
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States
- Luca Cappelletti
  - Department of Computer Science, University of Milano, Milan 20126, Italy
- Sierra A T Moxon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Vida Ravanmehr
  - Department of Lymphoma-Myeloma, MD Anderson Cancer Center, Houston, TX 77030, United States
- Seth Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
- Katherina Cortes
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kent A Shefchek
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Glass Elsarboukh
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Jim Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, United States
- Tommaso Fontana
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan 20133, Italy
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Melissa A Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States

63
Danis D, Jacobsen JOB, Wagner AH, Groza T, Beckwith MA, Rekerle L, Carmody LC, Reese J, Hegde H, Ladewig MS, Seitz B, Munoz-Torres M, Harris NL, Rambla J, Baudis M, Mungall CJ, Haendel MA, Robinson PN. Phenopacket-tools: Building and validating GA4GH Phenopackets. PLoS One 2023; 18:e0285433. [PMID: 37196000 PMCID: PMC10191354 DOI: 10.1371/journal.pone.0285433] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/05/2023] [Accepted: 04/21/2023] [Indexed: 05/19/2023]
Abstract
The Global Alliance for Genomics and Health (GA4GH) is a standards-setting organization that is developing a suite of coordinated standards for genomics. The GA4GH Phenopacket Schema is a standard for sharing disease and phenotype information that characterizes an individual person or biosample. The Phenopacket Schema is flexible and can represent clinical data for any kind of human disease including rare disease, complex disease, and cancer. It also allows consortia or databases to apply additional constraints to ensure uniform data collection for specific goals. We present phenopacket-tools, an open-source Java library and command-line application for construction, conversion, and validation of phenopackets. Phenopacket-tools simplifies construction of phenopackets by providing concise builders, programmatic shortcuts, and predefined building blocks (ontology classes) for concepts such as anatomical organs, age of onset, biospecimen type, and clinical modifiers. Phenopacket-tools can be used to validate the syntax and semantics of phenopackets as well as to assess adherence to additional user-defined requirements. The documentation includes examples showing how to use the Java library and the command-line tool to create and validate phenopackets. We demonstrate how to create, convert, and validate phenopackets using the library or the command-line application. Source code, API documentation, a comprehensive user guide, and a tutorial can be found at https://github.com/phenopackets/phenopacket-tools. The library can be installed from the public Maven Central artifact repository and the application is available as a standalone archive. The phenopacket-tools library helps developers implement and standardize the collection and exchange of phenotypic and other clinical data for use in phenotype-driven genomic diagnostics, translational research, and precision medicine applications.
Affiliation(s)
- Daniel Danis
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Julius O. B. Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
- Alex H. Wagner
  - Departments of Pediatrics and Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, United States of America
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH, United States of America
- Martha A. Beckwith
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Lauren Rekerle
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Leigh C. Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
- Justin Reese
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Harshad Hegde
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Markus S. Ladewig
  - Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
- Berthold Seitz
  - Department of Ophthalmology, Saarland University Medical Center, Homburg/Saar, Germany
- Monica Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Nomi L. Harris
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Jordi Rambla
  - European Genome-Phenome Archive (EGA) in the Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Michael Baudis
  - University of Zurich and Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christopher J. Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Melissa A. Haendel
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Peter N. Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States of America
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, United States of America

64
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023]
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
  - Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
  - Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
  - National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
  - Department of Physics, University of Genova, Genova, Italy
  - Istituto Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
  - National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
  - National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
  - Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
  - Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
  - Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
  - CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy

65
Fisher M, James-Zorn C, Ponferrada V, Bell AJ, Sundararaj N, Segerdell E, Chaturvedi P, Bayyari N, Chu S, Pells T, Lotay V, Agalakov S, Wang DZ, Arshinoff BI, Foley S, Karimi K, Vize PD, Zorn AM. Xenbase: key features and resources of the Xenopus model organism knowledgebase. Genetics 2023; 224:iyad018. [PMID: 36755307 PMCID: PMC10158840 DOI: 10.1093/genetics/iyad018] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 11/01/2022] [Revised: 01/20/2023] [Accepted: 01/22/2023] [Indexed: 02/10/2023]
Abstract
Xenbase (https://www.xenbase.org/), the Xenopus model organism knowledgebase, is a web-accessible resource that integrates the diverse genomic and biological data from research on the laboratory frogs Xenopus laevis and Xenopus tropicalis. The goal of Xenbase is to accelerate discovery and empower Xenopus research, to enhance the impact of Xenopus research data, and to facilitate the dissemination of these data. Xenbase also enhances the value of Xenopus data through high-quality curation, data integration, bioinformatics tools optimized for Xenopus experiments, and links from Xenopus data to data from humans and other model organisms. Xenbase also plays an indispensable role in making Xenopus data interoperable and accessible to the broader biomedical community in accordance with FAIR principles. Xenbase provides annotated data updates to organizations such as NCBI, UniProtKB, Ensembl, the Gene Ontology consortium, and most recently, the Alliance of Genome Resources, a common clearing house for data from humans and model organisms. This article provides a brief overview of key and recently added features of Xenbase. New features include processing of Xenopus high-throughput sequencing data from the NCBI Gene Expression Omnibus; curation of anatomical, physiological, and expression phenotypes with the newly created Xenopus Phenotype Ontology; Xenopus Gene Ontology annotations; new anatomical drawings of the Normal Table of Xenopus development; and integration of the latest Xenopus laevis v10.1 genome annotations. Finally, we highlight areas for future development at Xenbase as we continue to support the Xenopus research community.
Affiliation(s)
- Malcolm Fisher
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Christina James-Zorn
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Virgilio Ponferrada
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Andrew J Bell
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Nivitha Sundararaj
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Erik Segerdell
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Praneet Chaturvedi
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Nadia Bayyari
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Stanley Chu
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Troy Pells
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Vaneet Lotay
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Sergei Agalakov
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Dong Zhuo Wang
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Bradley I Arshinoff
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Saoirse Foley
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kamran Karimi
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Peter D Vize
  - Xenbase, Department of Biological Sciences, University of Calgary, Calgary, AB T2N 1N4, Canada
- Aaron M Zorn
  - Xenbase, Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA

66
Kocere A, Lalonde RL, Mosimann C, Burger A. Lateral thinking in syndromic congenital cardiovascular disease. Dis Model Mech 2023; 16:dmm049735. [PMID: 37125615 PMCID: PMC10184679 DOI: 10.1242/dmm.049735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Syndromic birth defects are rare diseases that can present with seemingly pleiotropic comorbidities. Prime examples are rare congenital heart and cardiovascular anomalies that can be accompanied by forelimb defects, kidney disorders and more. Whether such multi-organ defects share a developmental link remains a key question with relevance to the diagnosis, therapeutic intervention and long-term care of affected patients. The heart, endothelial and blood lineages develop together from the lateral plate mesoderm (LPM), which also harbors the progenitor cells for limb connective tissue, kidneys, mesothelia and smooth muscle. This developmental plasticity of the LPM, which is founded on multi-lineage progenitor cells and shared transcription factor expression across different descendant lineages, has the potential to explain the seemingly disparate syndromic defects in rare congenital diseases. Combining patient genome-sequencing data with model organism studies has already provided a wealth of insights into complex LPM-associated birth defects, such as heart-hand syndromes. Here, we summarize developmental and known disease-causing mechanisms in early LPM patterning, address how defects in these processes drive multi-organ comorbidities, and outline how several cardiovascular and hematopoietic birth defects with complex comorbidities may be LPM-associated diseases. We also discuss strategies to integrate patient sequencing, data-aggregating resources and model organism studies to mechanistically decode congenital defects, including potentially LPM-associated orphan diseases. Eventually, linking complex congenital phenotypes to a common LPM origin provides a framework to discover developmental mechanisms and to anticipate comorbidities in congenital diseases affecting the cardiovascular system and beyond.
Affiliation(s)
- Agnese Kocere
  - University of Colorado School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, Aurora, CO 80045, USA
  - Department of Molecular Life Science, University of Zurich, 8057 Zurich, Switzerland
- Robert L. Lalonde
  - University of Colorado School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, Aurora, CO 80045, USA
- Christian Mosimann
  - University of Colorado School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, Aurora, CO 80045, USA
- Alexa Burger
  - University of Colorado School of Medicine, Anschutz Medical Campus, Department of Pediatrics, Section of Developmental Biology, Aurora, CO 80045, USA

67
Statzer C, Luthria K, Sharma A, Kann MG, Ewald CY. The Human Extracellular Matrix Diseasome Reveals Genotype-Phenotype Associations with Clinical Implications for Age-Related Diseases. Biomedicines 2023; 11:1212. [PMID: 37189830 PMCID: PMC10135578 DOI: 10.3390/biomedicines11041212] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/09/2023] [Revised: 04/07/2023] [Accepted: 04/14/2023] [Indexed: 05/17/2023]
Abstract
The extracellular matrix (ECM) is earning an increasingly relevant role in many disease states and aging. The analysis of these disease states is possible with the GWAS and PheWAS methodologies, and through our analysis, we aimed to explore the relationships between polymorphisms in the compendium of ECM genes (i.e., matrisome genes) in various disease states. A significant contribution on the part of ECM polymorphisms is evident in various types of disease, particularly those in the core-matrisome genes. Our results confirm previous links to connective-tissue disorders but also unearth new and underexplored relationships with neurological, psychiatric, and age-related disease states. Through our analysis of the drug indications for gene-disease relationships, we identify numerous targets that may be repurposed for age-related pathologies. The identification of ECM polymorphisms and their contributions to disease will play an integral role in future therapeutic developments, drug repurposing, precision medicine, and personalized care.
Affiliation(s)
- Cyril Statzer
  - Department of Health Sciences and Technology, Institute of Translational Medicine, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, CH-8603 Zurich, Switzerland
- Karan Luthria
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Arastu Sharma
  - Department of Health Sciences and Technology, Institute of Translational Medicine, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, CH-8603 Zurich, Switzerland
- Maricel G. Kann
  - Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Collin Y. Ewald
  - Department of Health Sciences and Technology, Institute of Translational Medicine, Eidgenössische Technische Hochschule Zürich, Schwerzenbach, CH-8603 Zurich, Switzerland

68
Hu Y, Comjean A, Attrill H, Antonazzo G, Thurmond J, Li F, Chao T, Mohr SE, Brown NH, Perrimon N. PANGEA: A New Gene Set Enrichment Tool for Drosophila and Common Research Organisms. bioRxiv 2023:2023.02.20.529262. [PMID: 36865134 PMCID: PMC9980003 DOI: 10.1101/2023.02.20.529262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
Abstract
Gene set enrichment analysis (GSEA) plays an important role in large-scale data analysis, helping scientists discover the underlying biological patterns over-represented in a gene list resulting from, for example, an 'omics' study. Gene Ontology (GO) annotation is the most frequently used classification mechanism for gene set definition. Here we present a new GSEA tool, PANGEA (PAthway, Network and Gene-set Enrichment Analysis; https://www.flyrnai.org/tools/pangea/), developed to allow a more flexible and configurable approach to data analysis using a variety of classification sets. PANGEA allows GO analysis to be performed on different sets of GO annotations, for example excluding high-throughput studies. Beyond GO, PANGEA includes gene sets for pathway annotation and protein complex data from various resources, as well as expression and disease annotation from the Alliance of Genome Resources (Alliance). In addition, visualisations of results are enhanced by an option to view a network of gene-set-to-gene relationships. The tool also allows comparison of multiple input gene lists and provides accompanying visualisation tools for quick and easy comparison. This new tool will facilitate GSEA for Drosophila and other major model organisms based on high-quality annotated information available for these species.
Affiliation(s)
- Yanhui Hu
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Aram Comjean
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Helen Attrill
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Giulia Antonazzo
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Jim Thurmond
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Fangge Li
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Tiffany Chao
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Stephanie E. Mohr
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Nicholas H. Brown
  - Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Norbert Perrimon
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Drosophila RNAi Screening Center, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Boston, MA 02138, USA

69
Veljković AN, Orlov YL, Mitić NS. BioGraph: Data Model for Linking and Querying Diverse Biological Metadata. Int J Mol Sci 2023; 24:ijms24086954. [PMID: 37108117 PMCID: PMC10138499 DOI: 10.3390/ijms24086954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 03/30/2023] [Accepted: 04/06/2023] [Indexed: 04/29/2023]
Abstract
Studying the association of gene function, diseases, and regulatory gene network reconstruction demands data compatibility. Data from different databases follow distinct schemas and are accessible in heterogeneous ways. Although the experiments differ, data may still be related to the same biological entities. Some entities may not be strictly biological, such as geolocations of habitats or paper references, but they provide a broader context for other entities. The same entities from different datasets can share similar properties, which may or may not be found within other datasets. Joint, simultaneous data fetching from multiple data sources is complicated for the end-user or, in many cases, unsupported and inefficient due to differences in data structures and ways of accessing the data. We propose BioGraph, a new model that enables connecting and retrieving information from linked biological data that originated from diverse datasets. We have tested the model on metadata collected from five diverse public datasets and successfully constructed a knowledge graph containing more than 17 million model objects, of which 2.5 million are individual biological entity objects. The model enables the selection of complex patterns and retrieval of matched results that can be discovered only by joining the data from multiple sources.
Affiliation(s)
- Aleksandar N Veljković
  - Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11158 Belgrade, Serbia
- Yuriy L Orlov
  - The Digital Health Institute, I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), 119991 Moscow, Russia
  - Institute of Cytology and Genetics SB RAS, 630090 Novosibirsk, Russia
  - Agrarian and Technological Institute, Peoples' Friendship University of Russia, 117198 Moscow, Russia
- Nenad S Mitić
  - Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11158 Belgrade, Serbia

70
James KN, Phadke S, Wong TC, Chowdhury S. Artificial Intelligence in the Genetic Diagnosis of Rare Disease. Clin Lab Med 2023; 43:127-143. [PMID: 36764805 DOI: 10.1016/j.cll.2022.09.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/11/2023]
Affiliation(s)
- Kiely N James
  - Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
- Sujal Phadke
  - Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
- Terence C Wong
  - Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
- Shimul Chowdhury
  - Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA

71
Chan LE, Thessen AE, Duncan WD, Matentzoglu N, Schmitt C, Grondin CJ, Vasilevsky N, McMurry JA, Robinson PN, Mungall CJ, Haendel MA. The Environmental Conditions, Treatments, and Exposures Ontology (ECTO): connecting toxicology and exposure to human health and beyond. J Biomed Semantics 2023; 14:3. [PMID: 36823605 PMCID: PMC9951428 DOI: 10.1186/s13326-023-00283-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/15/2022] [Accepted: 02/03/2023] [Indexed: 02/25/2023]
Abstract
BACKGROUND: Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive database represents the full spectrum of environmental health data. Due to a lack of interoperability between databases, tools for integrating these resources are needed. In this manuscript we present the Environmental Conditions, Treatments, and Exposures Ontology (ECTO), a species-agnostic ontology focused on exposure events that occur as a result of natural and experimental processes, such as diet, work, or research activities. ECTO is intended for use in harmonizing environmental health data resources to support cross-study integration and inference for mechanism discovery.
METHODS AND FINDINGS: ECTO is an ontology designed for describing organismal exposures such as toxicological research, environmental variables, dietary features, and patient-reported data from surveys. ECTO utilizes the base model established within the Exposure Ontology (ExO). ECTO is developed using a combination of manual curation and Dead Simple OWL Design Patterns (DOSDP), contains over 2700 environmental exposure terms, and incorporates chemical and environmental ontologies. ECTO is an Open Biological and Biomedical Ontology (OBO) Foundry ontology that is designed for interoperability, reuse, and axiomatization with other ontologies. ECTO terms have been utilized in axioms within the Mondo Disease Ontology to represent diseases caused or influenced by environmental factors, as well as for survey encoding for the Personalized Environment and Genes Study (PEGS).
CONCLUSIONS: We constructed ECTO to meet Open Biological and Biomedical Ontology (OBO) Foundry principles to increase translation opportunities between environmental health and other areas of biology. ECTO has a growing community of contributors consisting of toxicologists, public health epidemiologists, and health care providers to provide the necessary expertise for areas that have been identified previously as gaps.
Affiliation(s)
- Anne E Thessen
  - Oregon State University, Corvallis, OR, 97331, USA
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80054, USA
- Charles Schmitt
  - National Institute of Environmental Health Sciences, Durham, NC, 27709, USA
- Nicole Vasilevsky
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80054, USA
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80054, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Melissa A Haendel
  - Oregon State University, Corvallis, OR, 97331, USA
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80054, USA
72
Tsueng G, Cano MAA, Bento J, Czech C, Kang M, Pache L, Rasmussen LV, Savidge TC, Starren J, Wu Q, Xin J, Yeaman MR, Zhou X, Su AI, Wu C, Brown L, Shabman RS, Hughes LD. Developing a standardized but extendable framework to increase the findability of infectious disease datasets. Sci Data 2023; 10:99. [PMID: 36823157 PMCID: PMC9950378 DOI: 10.1038/s41597-023-01968-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open
Abstract
Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
Affiliation(s)
- Ginger Tsueng
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Marco A Alvarado Cano
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- José Bento
  - Department of Computer Science, Boston College, 245 Beacon St, Chestnut Hill, MA, 02467, USA
- Candice Czech
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Mengjia Kang
  - Division of Pulmonary and Critical Care, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Lars Pache
  - Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Tor C Savidge
  - Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Justin Starren
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Qinglong Wu
  - Texas Children's Microbiome Center & Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
- Jiwen Xin
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Michael R Yeaman
  - Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Divisions of Molecular Medicine and Infectious Diseases, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
  - Lundquist Institute for Infection & Immunity at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Xinghua Zhou
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Andrew I Su
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
  - Scripps Research Translational Institute, La Jolla, CA, 92037, USA
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Chunlei Wu
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
  - Scripps Research Translational Institute, La Jolla, CA, 92037, USA
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Liliana Brown
  - Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Reed S Shabman
  - Office of Genomics and Advanced Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD, 20852, USA
- Laura D Hughes
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
73
Xu H, Woicik A, Poon H, Altman RB, Wang S. Multilingual translation for zero-shot biomedical classification using BioTranslator. Nat Commun 2023; 14:738. [PMID: 36759510 PMCID: PMC9911740 DOI: 10.1038/s41467-023-36476-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/05/2022] [Accepted: 02/01/2023] [Indexed: 02/11/2023] Open
Abstract
Existing annotation paradigms rely on controlled vocabularies, where each data instance is classified into one term from a predefined set of controlled vocabularies. This paradigm restricts the analysis to concepts that are known and well-characterized. Here, we present the novel multilingual translation method BioTranslator to address this problem. BioTranslator takes a user-written textual description of a new concept and then translates this description to a non-text biological data instance. The key idea of BioTranslator is to develop a multilingual translation framework, where multiple modalities of biological data are all translated to text. We demonstrate how BioTranslator enables the identification of novel cell types using only a textual description and how BioTranslator can be further generalized to protein function prediction and drug target identification. Our tool frees scientists from limiting their analyses within predefined controlled vocabularies, enabling them to interact with biological data using free text.
Affiliation(s)
- Hanwen Xu
  - School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Addie Woicik
  - School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Russ B Altman
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Sheng Wang
  - School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
74
Doncheva NT, Morris JH, Holze H, Kirsch R, Nastou KC, Cuesta-Astroz Y, Rattei T, Szklarczyk D, von Mering C, Jensen LJ. Cytoscape stringApp 2.0: Analysis and Visualization of Heterogeneous Biological Networks. J Proteome Res 2023; 22:637-646. [PMID: 36512705 PMCID: PMC9904289 DOI: 10.1021/acs.jproteome.2c00651] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Received: 10/12/2022] [Indexed: 12/15/2022]
Abstract
Biological networks are often used to represent complex biological systems, which can contain several types of entities. Analysis and visualization of such networks is supported by the Cytoscape software tool and its many apps. While earlier versions of stringApp focused on providing intraspecies protein-protein interactions from the STRING database, the new stringApp 2.0 greatly improves the support for heterogeneous networks. Here, we highlight new functionality that makes it possible to create networks that contain proteins and interactions from STRING as well as other biological entities and associations from other sources. We exemplify this by complementing a published SARS-CoV-2 interactome with interactions from STRING. We have also extended stringApp with new data and query functionality for protein-protein interactions between eukaryotic parasites and their hosts. We show how this can be used to retrieve and visualize a cross-species network for a malaria parasite, its host, and its vector. Finally, the latest stringApp version has an improved user interface, allows retrieval of both functional associations and physical interactions, and supports group-wise enrichment analysis of different parts of a network to aid biological interpretation. stringApp is freely available at https://apps.cytoscape.org/apps/stringapp.
Affiliation(s)
- Nadezhda T. Doncheva
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- John H. Morris
  - Resource on Biocomputing, Visualization, and Informatics, University of California, San Francisco, California 94143, United States
- Henrietta Holze
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Rebecca Kirsch
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Katerina C. Nastou
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Yesid Cuesta-Astroz
  - Instituto Colombiano de Medicina Tropical, Universidad CES, 055413 Sabaneta, Colombia
- Thomas Rattei
  - Centre for Microbiology and Environmental Systems Science, University of Vienna, 1030 Vienna, Austria
- Damian Szklarczyk
  - Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Christian von Mering
  - Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Lars J. Jensen
  - Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
75
A Tissue-Specific and Toxicology-Focused Knowledge Graph. INFORMATION 2023. [DOI: 10.3390/info14020091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023] Open
Abstract
Molecular biology-focused knowledge graphs (KGs) are directed graphs that integrate information from heterogeneous sources of biological and biomedical data, such as ontologies and public databases. They provide a holistic view of biology, chemistry, and disease, allowing users to draw non-obvious connections between concepts through shared associations. While these massive graphs are constructed using carefully curated ontologies and annotations from public databases, much of the information relating the concepts is context specific. Two important variables that determine the applicability of a given ontology annotation are the species and (especially) the tissue type in which it takes place. Using a data-driven approach and the results from thousands of high-quality gene expression samples, we have constructed tissue-specific KGs (using liver, kidney, and heart as examples) that empirically validate the annotations provided by ontology curators. The resulting human-centered KGs are designed for toxicology applications but are generalizable to other areas of human biology, addressing the issue of tissue specificity that often limits the applicability of other large KGs. These knowledge graphs can serve as valuable tools for generating transparent explanations of experimental results in the form of mechanistic hypotheses that are highly relevant to the studied tissue. Because the data-driven relations are derived from a large collection of human in vitro data, these KGs are particularly well suited for in vitro toxicology applications.
76
Chandak P, Huang K, Zitnik M. Building a knowledge graph to enable precision medicine. Sci Data 2023; 10:67. [PMID: 36732524 PMCID: PMC9893183 DOI: 10.1038/s41597-023-01960-3] [Citation(s) in RCA: 96] [Impact Index Per Article: 48.0] [Received: 05/13/2022] [Accepted: 01/11/2023] [Indexed: 02/04/2023] Open
Abstract
Developing personalized diagnostic strategies and targeted treatments requires a deep understanding of disease biology and the ability to dissect the relationship between molecular and genetic factors and their phenotypic consequences. However, such knowledge is fragmented across publications, non-standardized repositories, and evolving ontologies describing various scales of biological organization between genotypes and clinical phenotypes. Here, we present PrimeKG, a multimodal knowledge graph for precision medicine analyses. PrimeKG integrates 20 high-quality resources to describe 17,080 diseases with 4,050,249 relationships representing ten major biological scales, including disease-associated protein perturbations, biological processes and pathways, anatomical and phenotypic scales, and the entire range of approved drugs with their therapeutic action, considerably expanding previous efforts in disease-rooted knowledge graphs. PrimeKG contains an abundance of 'indications', 'contraindications', and 'off-label use' drug-disease edges that are lacking in other knowledge graphs and can support AI analyses of how drugs affect disease-associated networks. We supplement PrimeKG's graph structure with language descriptions of clinical guidelines to enable multimodal analyses and provide instructions for continual updates of PrimeKG as new data become available.
Affiliation(s)
- Payal Chandak
  - Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, 02139, USA
- Kexin Huang
  - Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Harvard University, Boston, MA, 02115, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Harvard Data Science Initiative, Cambridge, MA, 02138, USA
77
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P. Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Meghan A. Balk
  - National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
- Robyn Ball
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Anita R. Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Laura W. Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L. Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A. McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Christopher J. Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Nicole Vasilevsky
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
78
Groza T, Gomez FL, Mashhadi HH, Muñoz-Fuentes V, Gunes O, Wilson R, Cacheiro P, Frost A, Keskivali-Bond P, Vardal B, McCoy A, Cheng TK, Santos L, Wells S, Smedley D, Mallon AM, Parkinson H. The International Mouse Phenotyping Consortium: comprehensive knockout phenotyping underpinning the study of human disease. Nucleic Acids Res 2023; 51:D1038-D1045. [PMID: 36305825 PMCID: PMC9825559 DOI: 10.1093/nar/gkac972] [Citation(s) in RCA: 207] [Impact Index Per Article: 103.5] [Received: 09/09/2022] [Revised: 10/05/2022] [Accepted: 10/14/2022] [Indexed: 01/30/2023] Open
Abstract
The International Mouse Phenotyping Consortium (IMPC; https://www.mousephenotype.org/) web portal makes available curated, integrated and analysed knockout mouse phenotyping data generated by the IMPC project, consisting of 85M data points and over 95,000 statistically significant phenotype hits mapped to human diseases. The IMPC portal delivers a substantial reference dataset that supports the enrichment of various domain-specific projects and databases, as well as the wider research and clinical community, where the IMPC genotype-phenotype knowledge contributes to the molecular diagnosis of patients affected by rare disorders. Data from 9,000 mouse lines and 750,000 images provide vital resources enabling the interpretation of the ignorome, and advancing our knowledge on mammalian gene function and the mechanisms underlying phenotypes associated with human diseases. The resource is widely integrated and the lines have been used in over 4,600 publications, indicating the value of the data and the materials.
Affiliation(s)
- Tudor Groza
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Federico Lopez Gomez
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Hamed Haseli Mashhadi
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Violeta Muñoz-Fuentes
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Osman Gunes
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Robert Wilson
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Anthony Frost
  - Mary Lyon Centre at MRC Harwell, Harwell Campus OX11 7UE, UK
- Bora Vardal
  - Mary Lyon Centre at MRC Harwell, Harwell Campus OX11 7UE, UK
- Aaron McCoy
  - Mary Lyon Centre at MRC Harwell, Harwell Campus OX11 7UE, UK
- Tsz Kwan Cheng
  - Mary Lyon Centre at MRC Harwell, Harwell Campus OX11 7UE, UK
- Luis Santos
  - Research Data Team, The Turing Institute, 96 Euston Rd, London NW1 2DB, UK
- Sara Wells
  - Mary Lyon Centre at MRC Harwell, Harwell Campus OX11 7UE, UK
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Ann-Marie Mallon
  - Research Data Team, The Turing Institute, 96 Euston Rd, London NW1 2DB, UK
- Helen Parkinson
  - European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton CB10 1SD, UK
79
Mendes de Farias T, Wollbrett J, Robinson-Rechavi M, Bastian F. Lessons learned to boost a bioinformatics knowledge base reusability, the Bgee experience. Gigascience 2022; 12:giad058. [PMID: 37589308 PMCID: PMC10433096 DOI: 10.1093/gigascience/giad058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 05/30/2023] [Accepted: 07/07/2023] [Indexed: 08/18/2023] Open
Abstract
BACKGROUND Enhancing interoperability of bioinformatics knowledge bases is a high-priority requirement to maximize data reusability and thus increase their utility such as the return on investment for biomedical research. A knowledge base may provide useful information for life scientists and other knowledge bases, but it only acquires exchange value once the knowledge base is (re)used, and without interoperability, the utility lies dormant. RESULTS In this article, we discuss several approaches to boost interoperability depending on the interoperable parts. The findings are driven by several real-world scenario examples that were mostly implemented by Bgee, a well-established gene expression knowledge base. To better justify the findings are transferable, for each Bgee interoperability experience, we also highlight similar implementations by major bioinformatics knowledge bases. Moreover, we discuss ten general main lessons learned. These lessons can be applied in the context of any bioinformatics knowledge base to foster data reusability. CONCLUSIONS This work provides pragmatic methods and transferable skills to promote reusability of bioinformatics knowledge bases by focusing on interoperability.
Affiliation(s)
- Tarcisio Mendes de Farias
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Julien Wollbrett
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Marc Robinson-Rechavi
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Frederic Bastian
  - SIB Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
80
Santangelo BE, Gillenwater LA, Salem NM, Hunter LE. Molecular cartooning with knowledge graphs. FRONTIERS IN BIOINFORMATICS 2022; 2:1054578. [PMID: 36568701 PMCID: PMC9772836 DOI: 10.3389/fbinf.2022.1054578] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/27/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022] Open
Abstract
Molecular "cartoons," such as pathway diagrams, provide a visual summary of biomedical research results and hypotheses. Their ubiquitous appearance within the literature indicates their universal application in mechanistic communication. A recent survey of pathway diagrams identified 64,643 pathway figures published between 1995 and 2019 with 1,112,551 mentions of 13,464 unique human genes participating in a wide variety of biological processes. Researchers generally create these diagrams using generic diagram editing software that does not itself embody any biomedical knowledge. Biomedical knowledge graphs (KGs) integrate and represent knowledge in a semantically consistent way, systematically capturing biomedical knowledge similar to that in molecular cartoons. KGs have the potential to provide context and precise details useful in drawing such figures. However, KGs cannot generally be translated directly into figures. They include substantial material irrelevant to the scientific point of a given figure and are often more detailed than is appropriate. How could KGs be used to facilitate the creation of molecular diagrams? Here we present a new approach towards cartoon image creation that utilizes the semantic structure of knowledge graphs to aid the production of molecular diagrams. We introduce a set of "semantic graphical actions" that select and transform the relational information between heterogeneous entities (e.g., genes, proteins, pathways, diseases) in a KG to produce diagram schematics that meet the scientific communication needs of the user. These semantic actions search, select, filter, transform, group, arrange, connect and extract relevant subgraphs from KGs based on meaning in biological terms, e.g., a protein upstream of a target in a pathway. 
To demonstrate the utility of this approach, we show how semantic graphical actions on KGs could have been used to produce three existing pathway diagrams in diverse biomedical domains: Down Syndrome, COVID-19, and neuroinflammation. Our focus is on recapitulating the semantic content of the figures, not the layout, glyphs, or other aesthetic aspects. Our results suggest that the use of KGs and semantic graphical actions to produce biomedical diagrams will reduce the effort required and improve the quality of this visual form of scientific communication.
81
Manfredi M, Savojardo C, Martelli PL, Casadio R. E-SNPs&GO: embedding of protein sequence and function improves the annotation of human pathogenic variants. Bioinformatics 2022; 38:5168-5174. [PMID: 36227117 PMCID: PMC9710551 DOI: 10.1093/bioinformatics/btac678] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/16/2022] [Revised: 09/14/2022] [Accepted: 10/10/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION The advent of massive DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms occurring in protein-coding regions and possibly changing their sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding, bypassing database searches for evolutionary information. We leverage the new encoding schemes for an efficient annotation of protein variants. RESULTS E-SNPs&GO is a novel method that, given an input protein sequence and a single amino acid variation, can predict whether the variation is related to diseases or not. The proposed method adopts an input encoding completely based on protein language models and embedding techniques, specifically devised to encode protein sequences and GO functional annotations. We trained our model on a newly generated dataset of 101 146 human protein single amino acid variants in 13 661 proteins, derived from public resources. When tested on a blind set comprising 10 266 variants, our method well compares to recent approaches released in literature for the same task, reaching a Matthews Correlation Coefficient score of 0.72. We propose E-SNPs&GO as a suitable, efficient and accurate large-scale annotator of protein variant datasets. AVAILABILITY AND IMPLEMENTATION The method is available as a webserver at https://esnpsandgo.biocomp.unibo.it. Datasets and predictions are available at https://esnpsandgo.biocomp.unibo.it/datasets. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
|
82
|
Verma A, Damrauer SM, Naseer N, Weaver J, Kripke CM, Guare L, Sirugo G, Kember RL, Drivas TG, Dudek SM, Bradford Y, Lucas A, Judy R, Verma SS, Meagher E, Nathanson KL, Feldman M, Ritchie MD, Rader DJ, BioBank FTPM. The Penn Medicine BioBank: Towards a Genomics-Enabled Learning Healthcare System to Accelerate Precision Medicine in a Diverse Population. J Pers Med 2022; 12:jpm12121974. [PMID: 36556195 PMCID: PMC9785650 DOI: 10.3390/jpm12121974] [Citation(s) in RCA: 62] [Impact Index Per Article: 20.7] [Received: 10/08/2022] [Revised: 11/17/2022] [Accepted: 11/19/2022] [Indexed: 12/02/2022] Open
Abstract
The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank at the University of Pennsylvania (Penn Medicine). A large variety of health-related information, ranging from diagnosis codes to laboratory measurements, imaging data and lifestyle information, is integrated with genomic and biomarker data in the PMBB to facilitate discoveries and translational science. To date, 174,712 participants have been enrolled into the PMBB, including approximately 30% of participants of non-European ancestry, making it one of the most diverse medical biobanks. A median of seven years of longitudinal EHR data is available on participants, who also consent to being recontacted. Herein, we describe the operations and infrastructure of the PMBB, summarize the phenotypic architecture of the enrolled participants, and use body mass index (BMI) as a proof-of-concept quantitative phenotype for PheWAS, LabWAS, and GWAS. The major representation of African-American participants in the PMBB addresses the essential need to expand the diversity in genetic and translational research. There is a critical need for a "medical biobank consortium" to facilitate replication, increase power for rare phenotypes and variants, and promote harmonized collaboration to optimize the potential for biological discovery and precision medicine.
Affiliation(s)
- Anurag Verma
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Correspondence: (A.V.); (D.J.R.)
| | - Scott M. Damrauer
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Surgery, Division of Vascular Surgery and Endovascular Therapy, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Nawar Naseer
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - JoEllen Weaver
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Colleen M. Kripke
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Lindsay Guare
- Department of Pathology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Giorgio Sirugo
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Rachel L. Kember
- Department of Psychiatry, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Theodore G. Drivas
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Scott M. Dudek
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yuki Bradford
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Anastasia Lucas
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Renae Judy
- Department of Surgery, Division of Vascular Surgery and Endovascular Therapy, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Shefali S. Verma
- Department of Pathology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Emma Meagher
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Katherine L. Nathanson
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Abramson Cancer Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Michael Feldman
- Department of Pathology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Marylyn D. Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Daniel J. Rader
- Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Correspondence: (A.V.); (D.J.R.)
|
83
|
Natale MI, Manzur GB, Lusso SB, Cella E, Giovo ME, Andrada R, Goitia J, Fernández MF, Della Giovanna PS, Guillamondegui MJ, Domínguez M, Gutiérrez O, Izquierdo A, Hernández Herrera H, Velázquez Perdomo LG, Mistchenko AS, Valinotto LE. Analysis of COL7A1 pathogenic variants in a large cohort of dystrophic epidermolysis bullosa patients from Argentina reveals a new genotype-phenotype correlation. Am J Med Genet A 2022; 188:3153-3161. [PMID: 35979658 DOI: 10.1002/ajmg.a.62957] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/26/2022] [Revised: 06/22/2022] [Accepted: 08/01/2022] [Indexed: 01/31/2023]
Abstract
Dystrophic epidermolysis bullosa (DEB) is a clinically heterogeneous heritable skin disorder, characterized by blistering of the skin and mucous membranes following minor trauma. Dominant (DDEB) and recessive (RDEB) forms are caused by pathogenic variants in the COL7A1 gene. Argentina's population has a heterogeneous genetic background, and little is known about the molecular basis of DEB in our country or in native South American populations. In this study, we present the prevalence and geographical distribution of pathogenic variants found in 181 patients from 136 unrelated families (31 DDEB and 105 RDEB). We detected 95 different variants: 59 were previously reported in the literature and 36 were novel, nine of which were detected in more than one family. The most prevalent pathogenic variants were identified in exon 73 in DDEB patients and in exon 3 in RDEB patients. We also report a new phenotype-genotype correlation found in 10 unrelated families presenting mild blistering and severe mucosal involvement. Molecular studies in populations with an unexplored genetic background like ours revealed a diversity of pathogenic variants, and we hope that these findings will contribute to the definition of targets for new gene therapies.
Affiliation(s)
- Mónica Inés Natale
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina
| | - Graciela Beatriz Manzur
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina; Rare Diseases of the Skin Unit, Dr. R. Gutierrez Children's Hospital, Buenos Aires, Argentina; Dermatology Department, Hospital de Clinicas "Jose de San Martín", Buenos Aires, Argentina
| | - Silvina Beatriz Lusso
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina
| | - Eliana Cella
- Pediatric Dermatology, Prof. Dr. Juan P. Garrahan Children's Hospital, Buenos Aires, Argentina
| | - María Elsa Giovo
- Pediatric Dermatology, La Santisima Trinidad Children's Hospital, Córdoba, Argentina
| | - Romina Andrada
- Dermatology, Avelino Castelan Children's Hospital, Resistencia, Chaco, Argentina
| | - Juana Goitia
- Pediatric Dermatology, Sor Maria Ludovica Children's Hospital, La Plata, Buenos Aires, Argentina
| | | | | | | | - Mariángeles Domínguez
- Pediatric Dermatology, Hospital General de Agudos "Carlos G. Durand", Buenos Aires, Argentina
| | - Olga Gutiérrez
- Pediatric Dermatology, Niños de Acosta Ñu Children's Hospital, San Lorenzo, Paraguay
| | - Agustín Izquierdo
- Bioinformatics, Translational Research Unit, Dr. R. Gutiérrez Children's Hospital, Buenos Aires, Argentina
| | - Heliana Hernández Herrera
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina; Dermatology Department, Hospital de Clinicas "Jose de San Martín", Buenos Aires, Argentina
| | - Luz Graciela Velázquez Perdomo
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina; Dermatology Department, Hospital de Clinicas "Jose de San Martín", Buenos Aires, Argentina
| | - Alicia Susana Mistchenko
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina
| | - Laura Elena Valinotto
- Center for Research in Genodermatoses and Epidermolysis Bullosa (CEDIGEA), University of Buenos Aires, Buenos Aires, Argentina; National Scientific and Technical Research Council (CONICET), Argentina
|
84
|
Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res 2022; 51:D1003-D1009. [PMID: 36243972 PMCID: PMC9825485 DOI: 10.1093/nar/gkac888] [Citation(s) in RCA: 204] [Impact Index Per Article: 68.0] [Received: 09/15/2022] [Revised: 09/28/2022] [Accepted: 10/03/2022] [Indexed: 01/30/2023] Open
Abstract
The HUGO Gene Nomenclature Committee (HGNC) assigns unique symbols and names to human genes. The HGNC database (www.genenames.org) currently contains over 43 000 approved gene symbols, over 19 200 of which are assigned to protein-coding genes, 14 000 to pseudogenes and nearly 9000 to non-coding RNA genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC nomenclature advisors and links to related genomic, clinical, and proteomic information. Here, we describe updates to our resource, including improvements to our search facility and new download features.
Affiliation(s)
- Ruth L Seal
- To whom correspondence should be addressed. Tel: +44 1223 494444; Fax: +44 1223 494446;
| | - Bryony Braschi
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Kristian Gray
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK; Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge CB2 0PT, UK
| | - Tamsin E M Jones
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Susan Tweedie
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Liora Haim-Vilmovsky
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Elspeth A Bruford
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK; Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge CB2 0PT, UK
|
85
|
Shen XY, Shi SH, Li H, Wang CC, Zhang Y, Yu H, Li YB, Liu B. The role of Gadd45b in neurologic and neuropsychiatric disorders: An overview. Front Mol Neurosci 2022; 15:1021207. [PMID: 36311022 PMCID: PMC9606402 DOI: 10.3389/fnmol.2022.1021207] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/17/2022] [Accepted: 09/21/2022] [Indexed: 11/26/2022] Open
Abstract
Growth arrest and DNA damage-inducible beta (Gadd45b) is directly intertwined with stress-induced DNA repair, cell cycle arrest, survival, and apoptosis. Previous research on Gadd45b has focused chiefly on non-neuronal cells. Accumulating evidence indicates that Gadd45b is extensively expressed in the nervous system and plays a critical role in epigenetic DNA demethylation, neuroplasticity, and neuroprotection. This article provides an overview of the preclinical and clinical effects of Gadd45b, as well as its hypothesized mechanisms of action, focusing on major psychosis, depression, autism, stroke, seizure, dementia, Parkinson’s disease, and autoimmune diseases of the nervous system.
Affiliation(s)
- Xiao-yue Shen
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Shu-han Shi
- College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
| | - Heng Li
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Cong-cong Wang
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Yao Zhang
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Hui Yu
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
| | - Yan-bin Li
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- Yan-bin Li,
| | - Bin Liu
- Department of Neurology, The First Affiliated Hospital of Shandong First Medical University and Shandong Provincial Qianfoshan Hospital, Jinan, China
- *Correspondence: Bin Liu,
|
86
|
Wood EC, Glen AK, Kvarfordt LG, Womack F, Acevedo L, Yoon TS, Ma C, Flores V, Sinha M, Chodpathumwan Y, Termehchy A, Roach JC, Mendoza L, Hoffman AS, Deutsch EW, Koslicki D, Ramsey SA. RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics 2022; 23:400. [PMID: 36175836 PMCID: PMC9520835 DOI: 10.1186/s12859-022-04932-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/25/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biomedical translational science is increasingly using computational reasoning on repositories of structured knowledge (such as UMLS, SemMedDB, ChEMBL, Reactome, DrugBank, and SMPDB) in order to facilitate discovery of new therapeutic targets and modalities. The NCATS Biomedical Data Translator project is working to federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions. Within that project and the broader field, there is a need for a framework that can efficiently and reproducibly build an integrated, standards-compliant, and comprehensive biomedical knowledge graph that can be downloaded in standard serialized form or queried via a public application programming interface (API). RESULTS To create a knowledge provider system within the Translator project, we have developed RTX-KG2, an open-source software system for building (and hosting a web API for querying) a biomedical knowledge graph; the system uses an Extract-Transform-Load approach to integrate 70 knowledge sources (including the aforementioned core six sources) into a knowledge graph with provenance information including (where available) citations. The semantic layer and schema for RTX-KG2 follow the standard Biolink model to maximize interoperability. RTX-KG2 is currently being used by multiple Translator reasoning agents, both in its downloadable form and via its SmartAPI-registered interface. Serializations of RTX-KG2 are available for download in both the pre-canonicalized form and in canonicalized form (in which synonyms are merged). The current canonicalized version (KG2.7.3) of RTX-KG2 contains 6.4M nodes and 39.3M edges with a hierarchy of 77 relationship types from Biolink. CONCLUSION RTX-KG2 is the first knowledge graph that integrates UMLS, SemMedDB, ChEMBL, DrugBank, Reactome, SMPDB, and 64 additional knowledge sources within a knowledge graph that conforms to the Biolink standard for its semantic layer and schema.
RTX-KG2 is publicly available for querying via its API at arax.rtx.ai/api/rtxkg2/v1.2/openapi.json. The code to build RTX-KG2 is publicly available at github:RTXteam/RTX-KG2.
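The abstract describes canonicalization only as a form "in which synonyms are merged". As a rough illustration of what such a step can look like, the sketch below merges synonymous node identifiers with a union-find structure and deduplicates the remapped edges. This is a generic technique under stated assumptions, not the RTX-KG2 implementation; the function name, the identifiers (`SRC_A:1`, `SRC_B:9`, `DIS:1`), and the rule of keeping the lexicographically smallest identifier as canonical are all hypothetical.

```python
def canonicalize(edges, synonym_pairs):
    """Merge synonymous node IDs and deduplicate the remapped edges.

    edges: iterable of (subject_id, predicate, object_id) triples.
    synonym_pairs: iterable of (id_a, id_b) pairs declared equivalent.
    Returns a sorted list of triples over canonical IDs.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in synonym_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the lexicographically smaller ID becomes canonical.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    merged = {(find(s), p, find(o)) for s, p, o in edges}
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical identifiers; two sources describe the same entity.
    edges = [
        ("SRC_A:1", "treats", "DIS:1"),
        ("SRC_B:9", "treats", "DIS:1"),
    ]
    print(canonicalize(edges, [("SRC_A:1", "SRC_B:9")]))
```

After merging, the two source-specific edges collapse into a single edge on the canonical node, which is one reason a canonicalized graph can have fewer nodes and edges than its pre-canonicalized form.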
Affiliation(s)
- E C Wood
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - Amy K Glen
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - Lindsey G Kvarfordt
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - Finn Womack
- Computer Science and Engineering, Penn State University, State College, PA, USA
| | - Liliana Acevedo
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - Timothy S Yoon
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - Chunyu Ma
- Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
| | - Veronica Flores
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | - Meghamala Sinha
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | | | - Arash Termehchy
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| | | | | | - Andrew S Hoffman
- Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen, The Netherlands
| | | | - David Koslicki
- Computer Science and Engineering, Penn State University, State College, PA, USA
- Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
- Department of Biology, Penn State University, State College, PA, USA
| | - Stephen A Ramsey
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
|
87
|
[Rare-disease data standards]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2022; 65:1126-1132. [PMID: 36149471 DOI: 10.1007/s00103-022-03591-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2022] [Accepted: 09/01/2022] [Indexed: 11/02/2022]
Abstract
The use of standardized data formats (data standards) in healthcare supports four main goals: (1) exchange of data, (2) integration of computer systems and tools, (3) data storage and archiving, and (4) support of federated databases. Standards are especially important for rare-disease research and clinical care. In this review, we introduce healthcare standards and present a selection of standards that are commonly used in the field of rare diseases. The Human Phenotype Ontology (HPO) is the most commonly used standard for annotating phenotypic abnormalities and supporting phenotype-driven analysis of diagnostic exome and genome sequencing. Numerous standards for diseases are available that support a range of needs. Online Mendelian Inheritance in Man (OMIM) and the Orphanet Rare Disease Ontology (ORDO) are the most important standards developed specifically for rare diseases. The Mondo Disease Ontology (Mondo) is a new disease ontology that aims to integrate data from a comprehensive range of current nosologies. New standards and schemas such as the Medical Action Ontology (MAxO) and the Global Alliance for Genomics and Health (GA4GH) phenopacket are being introduced to extend the scope of standards that support rare disease research. In order to provide optimal care for patients with rare diseases in different healthcare settings, it will be necessary to better integrate standards for rare disease with electronic healthcare resources such as the Fast Healthcare Interoperability Resources (FHIR) standard for healthcare data exchange.
|
88
|
Riggs ER, Bingaman TI, Barry CA, Behlmann A, Bluske K, Bostwick B, Bright A, Chen CA, Clause AR, Dharmadhikari AV, Ganapathi M, Gonzaga-Jauregui C, Grant AR, Hughes MY, Kim SR, Krause A, Liao J, Lumaka A, Mah M, Maloney CM, Mohan S, Osei-Owusu IA, Reble E, Rennie O, Savatt JM, Shimelis H, Siegert RK, Sneddon TP, Thaxton C, Toner KA, Tran KT, Webb R, Wilcox EH, Yin J, Zhuo X, Znidarsic M, Martin CL, Betancur C, Vorstman JAS, Miller DT, Schaaf CP. Clinical validity assessment of genes frequently tested on intellectual disability/autism sequencing panels. Genet Med 2022; 24:1899-1908. [PMID: 35616647 PMCID: PMC10200330 DOI: 10.1016/j.gim.2022.05.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/07/2022] [Revised: 04/28/2022] [Accepted: 05/02/2022] [Indexed: 12/27/2022] Open
Abstract
PURPOSE Neurodevelopmental disorders (NDDs), such as intellectual disability (ID) and autism spectrum disorder (ASD), exhibit genetic and phenotypic heterogeneity, making them difficult to differentiate without a molecular diagnosis. The Clinical Genome Resource Intellectual Disability/Autism Gene Curation Expert Panel (GCEP) uses systematic curation to distinguish ID/ASD genes that are appropriate for clinical testing (ie, with substantial evidence supporting their relationship to disease) from those that are not. METHODS Using the Clinical Genome Resource gene-disease validity curation framework, the ID/Autism GCEP classified genes frequently included on clinical ID/ASD testing panels as Definitive, Strong, Moderate, Limited, Disputed, Refuted, or No Known Disease Relationship. RESULTS As of September 2021, 156 gene-disease pairs have been evaluated. Although most (75%) were determined to have definitive roles in NDDs, 22 (14%) genes evaluated had either Limited or Disputed evidence. Such genes are currently not recommended for use in clinical testing owing to the limited ability to assess the effect of identified variants. CONCLUSION Our understanding of gene-disease relationships evolves over time; new relationships are discovered and previously-held conclusions may be questioned. Without periodic re-examination, inaccurate gene-disease claims may be perpetuated. The ID/Autism GCEP will continue to evaluate these claims to improve diagnosis and clinical care for NDDs.
Affiliation(s)
| | | | | | | | | | - Bret Bostwick
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | | | - Chun-An Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | | | - Avinash V Dharmadhikari
- Department of Pathology and Laboratory Medicine, Children's Hospital of Los Angeles, Los Angeles, CA; Keck School of Medicine, University of Southern California, Los Angeles, CA
| | - Mythily Ganapathi
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY
| | - Claudia Gonzaga-Jauregui
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
| | - Andrew R Grant
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; New York Medical College, Valhalla, NY
| | | | - Se Rin Kim
- National Human Genome Research Institute, Bethesda, MD
| | - Amanda Krause
- Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
| | - Jun Liao
- Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY
| | - Aimé Lumaka
- Laboratoire de Génétique Humaine, University of Liège, Liège, Belgium
| | - Michelle Mah
- Trillium Health Partners, Mississauga, Ontario, Canada
| | | | | | - Ikeoluwa A Osei-Owusu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Emma Reble
- St. Michael's Hospital, Unity Health Toronto, Toronto, Ontario, Canada
| | - Olivia Rennie
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Juliann M Savatt
- Autism & Developmental Medicine Institute, Geisinger, Danville, PA
| | - Hermela Shimelis
- Autism & Developmental Medicine Institute, Geisinger, Danville, PA
| | - Rebecca K Siegert
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Tam P Sneddon
- Department of Pathology and Laboratory Medicine, School of Medicine, The University of North Carolina, Chapel Hill, NC
| | - Courtney Thaxton
- Department of Pathology and Laboratory Medicine, School of Medicine, The University of North Carolina, Chapel Hill, NC
| | - Kelly A Toner
- Drexel University College of Medicine, Philadelphia, PA
| | - Kien Trung Tran
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
| | - Ryan Webb
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
| | - Emma H Wilcox
- The Warren Alpert Medical School of Brown University, Providence, RI
| | - Jiani Yin
- Department of Neurology, University of California Los Angeles, Los Angeles, CA
| | - Xinming Zhuo
- The Jackson Laboratory for Genomic Medicine, Farmington, CT
| | - Masa Znidarsic
- University Medical Center Ljubljana, Ljubljana, Slovenia
| | | | - Catalina Betancur
- Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine, Institut de Biologie Paris Seine, Paris, France
| | - Jacob A S Vorstman
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - David T Miller
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
| | - Christian P Schaaf
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Institute of Human Genetics, Heidelberg University Hospital, Heidelberg, Germany
|
89
|
Cheng KC, Burdine RD, Dickinson ME, Ekker SC, Lin AY, Lloyd KCK, Lutz CM, MacRae CA, Morrison JH, O'Connor DH, Postlethwait JH, Rogers CD, Sanchez S, Simpson JH, Talbot WS, Wallace DC, Weimer JM, Bellen HJ. Promoting validation and cross-phylogenetic integration in model organism research. Dis Model Mech 2022; 15:dmm049600. [PMID: 36125045 PMCID: PMC9531892 DOI: 10.1242/dmm.049600] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022] Open
Abstract
Model organism (MO) research provides a basic understanding of biology and disease due to the evolutionary conservation of the molecular and cellular language of life. MOs have been used to identify and understand the function of orthologous genes, proteins, cells and tissues involved in biological processes, to develop and evaluate techniques and methods, and to perform whole-organism-based chemical screens to test drug efficacy and toxicity. However, a growing richness of datasets and the rising power of computation raise an important question: How do we maximize the value of MOs? In-depth discussions in over 50 virtual presentations organized by the National Institutes of Health across more than 10 weeks yielded important suggestions for improving the rigor, validation, reproducibility and translatability of MO research. The effort clarified challenges and opportunities for developing and integrating tools and resources. Maintenance of critical existing infrastructure and the implementation of suggested improvements will play important roles in maintaining productivity and facilitating the validation of animal models of human biology and disease.
Affiliation(s)
- Keith C. Cheng
- Department of Pathology, Penn State College of Medicine, Hershey, PA 17033, USA
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Rebecca D. Burdine
- Department of Molecular Biology, Princeton University, Princeton, NJ 08540, USA
| | - Mary E. Dickinson
- Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX 77007, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77007, USA
| | - Stephen C. Ekker
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55906, USA
| | - Alex Y. Lin
- Department of Pathology, Penn State College of Medicine, Hershey, PA 17033, USA
| | - K. C. Kent Lloyd
- Mouse Biology Program, School of Medicine, University of California Davis, Davis, CA 95618, USA
- Department of Surgery, School of Medicine, University of California Davis, Davis, CA 95618, USA
| | - Cathleen M. Lutz
- The Jackson Laboratory, Genetic Resource Science, Bar Harbor, ME 04609, USA
| | - Calum A. MacRae
- Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 360 Longwood Avenue, Boston, MA 02215, USA
| | - John H. Morrison
- California National Primate Research Center, University of California Davis, Davis, CA 95616, USA
- Department of Neurology, University of California Davis, Davis, CA 95616, USA
| | - David H. O'Connor
- Department of Pathology and Laboratory Medicine, University of Wisconsin-Madison, Madison, WI 53711, USA
| | | | - Crystal D. Rogers
- School of Veterinary Medicine, University of California Davis, Davis, CA 95616, USA
| | - Susan Sanchez
- Department of Infectious Diseases, College of Veterinary Medicine, The University of Georgia, Athens, GA 30602, USA
| | - Julie H. Simpson
- Department of Molecular, Cell and Developmental Biology, University of California, Santa Barbara, CA 93117, USA
| | - William S. Talbot
- Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
| | - Douglas C. Wallace
- Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Jill M. Weimer
- Pediatrics and Rare Diseases Group, Sanford Research, Sioux Falls, SD 57104, USA
| | - Hugo J. Bellen
- Department of Molecular and Human Genetics, Neurological Research Institute (TCH), Baylor College of Medicine, Houston, TX 77007, USA
|
90
|
Chen H, Chen X, Zeng F, Fu A, Huang M. Prognostic value of SOX9 in cervical cancer: Bioinformatics and experimental approaches. Front Genet 2022; 13:939328. [PMID: 36003340 PMCID: PMC9394184 DOI: 10.3389/fgene.2022.939328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 06/30/2022] [Indexed: 11/13/2022] Open
Abstract
Among gynecological cancers, cervical cancer is a common malignancy and remains the leading cause of cancer-related death for women. However, the exact molecular pathogenesis of cervical cancer is not known. Hence, understanding the molecular mechanisms underlying cervical cancer pathogenesis will aid in the development of effective treatment modalities. In this research, we attempted to discern candidate biomarkers for cervical cancer by using multiple bioinformatics approaches. First, we performed differential expression analysis based on cervical squamous cell carcinoma and endocervical adenocarcinoma data from The Cancer Genome Atlas database, then used the differentially expressed genes for weighted gene co-expression network construction to find the gene module most relevant to cervical cancer. Next, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses were performed on the module genes, followed by protein–protein interaction network analysis with Cytoscape to find the key gene. Finally, we validated the key gene by using multiple online sites and experimental methods. Through weighted gene co-expression network analysis, we found that the turquoise module was the module most highly correlated with cervical cancer diagnosis. The biological processes of the module genes focused on cell proliferation, cell adhesion, and protein binding, while the Kyoto Encyclopedia of Genes and Genomes analysis showed significant enrichment in pathways related to cancer and the cell cycle. Among the module genes, SOX9 was identified as the hub gene, and its expression was associated with cervical cancer prognosis. Using TIMER2.0, we found that SOX9 expression correlates with cancer-associated fibroblast infiltration. Furthermore, cancer-associated fibroblast infiltration is linked to the prognosis of patients with cervical cancer. Compared with adjacent normal tissue, immunohistochemistry and real-time quantitative polymerase chain reaction (qPCR) showed that the protein and mRNA expression of SOX9 was higher in cervical cancer. Therefore, the SOX9 gene acts as an oncogene in cervical cancer, interacting with the immune infiltration of cancer-associated fibroblasts and thereby affecting the prognosis of patients with cervical cancer.
Affiliation(s)
- Huan Chen
- Department of Obstetrics and Gynecology, Zhu Zhou Central Hospital, Zhuzhou, Hunan, China
| | - Xupeng Chen
- Laboratory Medicine Center, Zhu Zhou Central Hospital, Zhuzhou, Hunan, China
| | - Fanhua Zeng
- Department of Obstetrics and Gynecology, Zhu Zhou Central Hospital, Zhuzhou, Hunan, China
| | - Aizhen Fu
- Department of Obstetrics and Gynecology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
| | - Meiyuan Huang
- Department of Pathology, Zhu Zhou Central Hospital, Zhuzhou, Hunan, China
- *Correspondence: Meiyuan Huang
|
91
|
Unni DR, Moxon SAT, Bada M, Brush M, Bruskiewich R, Caufield JH, Clemons PA, Dancik V, Dumontier M, Fecho K, Glusman G, Hadlock JJ, Harris NL, Joshi A, Putman T, Qin G, Ramsey SA, Shefchek KA, Solbrig H, Soman K, Thessen AE, Haendel MA, Bizon C, Mungall CJ, Acevedo L, Ahalt SC, Alden J, Alkanaq A, Amin N, Avila R, Balhoff J, Baranzini SE, Baumgartner A, Baumgartner W, Belhu B, Brandes M, Brandon N, Burtt N, Byrd W, Callaghan J, Cano MA, Carrell S, Celebi R, Champion J, Chen Z, Chen M, Chung L, Cohen K, Conlin T, Corkill D, Costanzo M, Cox S, Crouse A, Crowder C, Crumbley ME, Dai C, Dančík V, De Miranda Azevedo R, Deutsch E, Dougherty J, Duby MP, Duvvuri V, Edwards S, Emonet V, Fehrmann N, Flannick J, Foksinska AM, Gardner V, Gatica E, Glen A, Goel P, Gormley J, Greyber A, Haaland P, Hanspers K, He K, He K, Henrickson J, Hinderer EW, Hoatlin M, Hoffman A, Huang S, Huang C, Hubal R, Huellas‐Bruskiewicz K, Huls FB, Hunter L, Hyde G, Issabekova T, Jarrell M, Jenkins L, Johs A, Kang J, Kanwar R, Kebede Y, Kim KJ, Kluge A, Knowles M, Koesterer R, Korn D, Koslicki D, Krishnamurthy A, Kvarfordt L, Lee J, Leigh M, Lin J, Liu Z, Liu S, Ma C, Magis A, Mamidi T, Mandal M, Mantilla M, Massung J, Mauldin D, McClelland J, McMurry J, Mease P, Mendoza L, Mersmann M, Mesbah A, Might M, Morton K, Muller S, Muluka AT, Osborne J, Owen P, Patton M, Peden DB, Peene RC, Persaud B, Pfaff E, Pico A, Pollard E, Price G, Raj S, Reilly J, Riutta A, Roach J, Roper RT, Rosenblatt G, Rubin I, Rucka S, Rudavsky‐Brody N, Sakaguchi R, Santos E, Schaper K, Schmitt CP, Schurman S, Scott E, Seitanakis S, Sharma P, Shmulevich I, Shrestha M, Shrivastava S, Sinha M, Smith B, Southall N, Southern N, Stillwell L, Strasser MM, Su AI, Ta C, Thessen AE, Tinglin J, Tonstad L, Tran‐Nguyen T, Tropsha A, Vaidya G, Veenhuis L, Viola A, Grotthuss M, Wang M, Wang P, Watkins PB, Weber R, Wei Q, Weng C, Whitlock J, Williams MD, Williams A, Womack F, Wood E, Wu C, Xin JK, Xu H, Xu C, Yakaboski C, Yao Y, Yi H, Yilmaz A, Zheng M, Zhou X, Zhou E, Zhu Q, Zisk T. Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin Transl Sci 2022; 15:1848-1855. [PMID: 36125173 PMCID: PMC9372416 DOI: 10.1111/cts.13302] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 03/04/2022] [Revised: 04/27/2022] [Accepted: 05/02/2022] [Indexed: 12/12/2022] Open
Abstract
Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs have left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
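As a rough illustration of the idea described in this abstract (hierarchical categories plus predicates that constrain how entities relate), the sketch below builds a toy schema and checks edges against it. It is not the Biolink Model itself or its tooling: the category names, predicate rules, `EX:` identifiers, and the `validate` helper are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical mini-hierarchy: category -> parent category (None marks the root).
CATEGORY_PARENT = {
    "NamedThing": None,
    "BiologicalEntity": "NamedThing",
    "Gene": "BiologicalEntity",
    "Disease": "BiologicalEntity",
    "ChemicalEntity": "NamedThing",
}

# Hypothetical predicate rules: predicate -> (required subject category, required object category).
PREDICATE_RULES = {
    "associated_with": ("NamedThing", "NamedThing"),
    "treats": ("ChemicalEntity", "Disease"),
}


def is_a(category, ancestor):
    """Return True if `category` equals `ancestor` or descends from it."""
    while category is not None:
        if category == ancestor:
            return True
        category = CATEGORY_PARENT[category]
    return False


@dataclass(frozen=True)
class Node:
    id: str
    category: str


@dataclass(frozen=True)
class Edge:
    subject: Node
    predicate: str
    object: Node


def validate(edge):
    """Check that the edge's subject and object fit the predicate's rule."""
    subject_rule, object_rule = PREDICATE_RULES[edge.predicate]
    return is_a(edge.subject.category, subject_rule) and is_a(
        edge.object.category, object_rule
    )


# Placeholder identifiers, not real database entries.
gene = Node("EX:gene1", "Gene")
disease = Node("EX:disease1", "Disease")
chemical = Node("EX:chem1", "ChemicalEntity")
```

With this toy schema, `validate(Edge(chemical, "treats", disease))` is accepted, while `validate(Edge(gene, "treats", disease))` is rejected because a gene is not a chemical entity. A shared schema of this kind is what lets independently built graphs be checked and merged consistently.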
Grants
- OT3TR002019 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003445 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003449 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002515 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002584 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003434 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- RM1 HG010860 NHGRI NIH HHS
- OT2TR003433 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003435 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002517 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002027 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003422 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2 TR003434 NCATS NIH HHS
- OT2TR003441 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002020 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3 TR002019 NCATS NIH HHS
- OT2TR003448 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003428 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002520 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003427 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003436 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002514 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- R24 OD011883 NIH HHS
- OT2TR003443 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2 TR003443 NCATS NIH HHS
- OT3TR002025 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2 TR003428 NCATS NIH HHS
- OT2TR003437 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003450 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002026 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003430 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- U.S. Department of Energy
- National Human Genome Research Institute
- National Institutes of Health
Affiliation(s)
- Deepak R. Unni
- Genome Biology Unit, European Molecular Biology Laboratory Heidelberg Germany
- Division of Environmental Genomics and Systems Biology Lawrence Berkeley National Laboratory Berkeley California USA
| | - Sierra A. T. Moxon
- Division of Environmental Genomics and Systems Biology Lawrence Berkeley National Laboratory Berkeley California USA
| | - Michael Bada
- Center for Health AI University of Colorado Anschutz Medical Campus Aurora Colorado USA
| | - Matthew Brush
- Center for Health AI University of Colorado Anschutz Medical Campus Aurora Colorado USA
| | | | - J. Harry Caufield
- Division of Environmental Genomics and Systems Biology Lawrence Berkeley National Laboratory Berkeley California USA
| | - Paul A. Clemons
- Chemical Biology and Therapeutics Science Program Broad Institute Cambridge Massachusetts USA
| | - Vlado Dancik
- Chemical Biology and Therapeutics Science Program Broad Institute Cambridge Massachusetts USA
| | - Michel Dumontier
- Institute of Data Science Maastricht University Maastricht The Netherlands
| | - Karamarie Fecho
- Renaissance Computing Institute University of North Carolina at Chapel Hill Chapel Hill North Carolina USA
| | | | | | - Nomi L. Harris
- Division of Environmental Genomics and Systems Biology Lawrence Berkeley National Laboratory Berkeley California USA
| | - Arpita Joshi
- Institute for Systems Biology Seattle Washington USA
| | - Tim Putman
- Center for Health AI University of Colorado Anschutz Medical Campus Aurora Colorado USA
| | - Guangrong Qin
- Institute for Systems Biology Seattle Washington USA
| | - Stephen A. Ramsey
- Department of Biomedical Sciences Oregon State University Corvallis Oregon USA
| | - Kent A. Shefchek
- Center for Health AI University of Colorado Anschutz Medical Campus Aurora Colorado USA
| | | | - Karthik Soman
- Department of Neurology University of California San Francisco San Francisco California USA
| | - Anne E. Thessen
- Center for Health AI University of Colorado Anschutz Medical Campus Aurora Colorado USA
| | - Melissa A. Haendel
- Center for Health AI University of Colorado Anschutz Medical Campus Aurora Colorado USA
| | - Chris Bizon
- Renaissance Computing Institute University of North Carolina at Chapel Hill Chapel Hill North Carolina USA
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology Lawrence Berkeley National Laboratory Berkeley California USA
|
92
|
Truong TTT, Panizzutti B, Kim JH, Walder K. Repurposing Drugs via Network Analysis: Opportunities for Psychiatric Disorders. Pharmaceutics 2022; 14:1464. [PMID: 35890359 PMCID: PMC9319329 DOI: 10.3390/pharmaceutics14071464] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/01/2022] [Revised: 06/30/2022] [Accepted: 07/12/2022] [Indexed: 02/04/2023] Open
Abstract
Despite advances in pharmacology and neuroscience, the path to new medications for psychiatric disorders remains largely stagnant. Drug repurposing offers a more efficient pathway than de novo drug discovery, with lower cost and less risk. Various computational approaches have been applied to mine the vast amount of biomedical data generated over recent decades. Among these methods, network-based drug repurposing stands out as a potent tool for the comprehension of multiple domains of knowledge considering the interactions or associations of various factors. Aligned well with the poly-pharmacology paradigm shift in drug discovery, network-based approaches offer great opportunities to discover repurposing candidates for complex psychiatric disorders. In this review, we present the potential of network-based drug repurposing in psychiatry, focusing on the incentives for using network-centric repurposing, major network-based repurposing strategies and data resources, applications in psychiatry, and challenges of network-based drug repurposing. This review aims to provide readers with an update on network-based drug repurposing in psychiatry. We expect the repurposing approach to become a pivotal tool in the coming years to battle debilitating psychiatric disorders.
Affiliation(s)
- Trang T. T. Truong
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
| | - Bruna Panizzutti
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
| | - Jee Hyun Kim
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
- Mental Health Theme, The Florey Institute of Neuroscience and Mental Health, Parkville 3010, Australia
| | - Ken Walder
- IMPACT, The Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Deakin University, Geelong 3220, Australia
|
93
|
Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022; 15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022] Open
Abstract
Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene–disease associations. We found that mouse genotype–phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper. Editor's choice: We investigated the use of model organism phenotypes in the computational identification of disease genes, identifying several data biases and concluding that mouse model phenotypes contribute most to computational disease gene identification.
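To make "semantic similarity over phenotype ontologies" concrete, the sketch below scores two phenotype sets by the Jaccard overlap of their ancestor closures in a toy ontology. This is one common, simple measure and is not claimed to be the method used in the paper. The ontology, its term names, and the helper functions are hypothetical.

```python
# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS = {
    "root": set(),
    "abnormal_heart": {"root"},
    "abnormal_limb": {"root"},
    "septal_defect": {"abnormal_heart"},
    "valve_defect": {"abnormal_heart"},
    "short_limb": {"abnormal_limb"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms):
    """Union of the ancestor sets of every term in `terms`."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result


def jaccard_similarity(terms_a, terms_b):
    """Jaccard index of the two ancestor closures (0.0 if both are empty)."""
    closure_a, closure_b = closure(terms_a), closure(terms_b)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


# Placeholder annotations for a model-organism gene and two diseases.
model_gene_phenotypes = {"septal_defect"}
cardiac_disease_phenotypes = {"valve_defect"}
limb_disease_phenotypes = {"short_limb"}
```

In this toy setup the model-organism gene scores higher against the cardiac disease than against the limb disease, because the two heart phenotypes share the `abnormal_heart` ancestor. Ranking candidate genes by such a score is the basic intuition behind phenotype-based gene prioritization.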
Affiliation(s)
- Sarah M Alghamdi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
|
94
|
Heacock ML, Lopez AR, Amolegbe SM, Carlin DJ, Henry HF, Trottier BA, Velasco ML, Suk WA. Enhancing Data Integration, Interoperability, and Reuse to Address Complex and Emerging Environmental Health Problems. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2022; 56:7544-7552. [PMID: 35549252 PMCID: PMC9227711 DOI: 10.1021/acs.est.1c08383] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/09/2021] [Indexed: 05/21/2023]
Abstract
Environmental health sciences (EHS) span many diverse disciplines. Within the EHS community, the National Institute of Environmental Health Sciences Superfund Research Program (SRP) funds multidisciplinary research aimed to address pressing and complex issues on how people are exposed to hazardous substances and their related health consequences with the goal of identifying strategies to reduce exposures and protect human health. While disentangling the interrelationships that contribute to environmental exposures and their effects on human health over the course of life remains difficult, advances in data science and data sharing offer a path forward to explore data across disciplines to reveal new insights. Multidisciplinary SRP-funded teams are well-positioned to examine how to best integrate EHS data across diverse research domains to address multifaceted environmental health problems. As such, SRP supported collaborative research projects designed to foster and enhance the interoperability and reuse of diverse and complex data streams. This perspective synthesizes those experiences as a landscape view of the challenges identified while working to increase the FAIR-ness (Findable, Accessible, Interoperable, and Reusable) of EHS data and opportunities to address them.
Affiliation(s)
- Michelle L. Heacock
- Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
- Tel: 984-287-3267
| | | | - Sara M. Amolegbe
- Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
| | - Danielle J. Carlin
- Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
| | - Heather F. Henry
- Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
| | - Brittany A. Trottier
- Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
| | | | - William A. Suk
- Superfund Research Program, National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH), Department of Health and Human Services (DHHS), Research Triangle Park, North Carolina 27709, United States
|
95
|
Yates T, Lain A, Campbell J, FitzPatrick DR, Simpson TI. Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders. Database (Oxford) 2022; 2022:baac038. [PMID: 35670729 PMCID: PMC9216525 DOI: 10.1093/database/baac038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 03/26/2022] [Accepted: 05/25/2022] [Indexed: 11/24/2022]
Abstract
There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the wide-spread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for the extraction of categorical phenotypic descriptors from the full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-84% precision and 65-73% recall. Mean terms per paper increased from 9 in title + abstract, to 68 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than widely used manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. The area under the curve for receiver operating characteristic (ROC) curves increased by 5-10% through the use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines. Database URL: https://doi.org/10.1093/database/baac038.
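The precision and recall figures quoted in this abstract compare automatically extracted ontology terms against a curated reference set. A minimal sketch of that calculation follows. The term identifiers are placeholders rather than real Human Phenotype Ontology entries, the function name is hypothetical, and exact set matching is an assumption (the paper's evaluation protocol may differ).

```python
def precision_recall_f1(extracted, reference):
    """Compare extracted terms with a reference set using exact matching."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder term identifiers for one hypothetical paper.
extracted_terms = {"TERM:A", "TERM:B", "TERM:C", "TERM:D"}
reference_terms = {"TERM:B", "TERM:C", "TERM:D", "TERM:E", "TERM:F"}

precision, recall, f1 = precision_recall_f1(extracted_terms, reference_terms)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here three of the four extracted terms are correct (precision 0.75) and three of the five reference terms are recovered (recall 0.60). Reporting both values shows the trade-off between extracting spurious terms and missing true ones.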
Affiliation(s)
- T.M Yates
- MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK
- Transforming Genetic Medicine Initiative, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - A Lain
- Institute for Adaptive and Neural Computation, Informatics Forum, The University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
| | - J Campbell
- MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK
- Simons Initiative for the Developing Brain, The University of Edinburgh, Hugh Robson Building, George Square, Edinburgh EH8 9XF, UK
| | - D R FitzPatrick
- MRC Human Genetics Unit, Western General Hospital, Institute of Genetics and Cancer, The University of Edinburgh, Crewe Road South, Edinburgh EH4 2XU, UK
- Transforming Genetic Medicine Initiative, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Simons Initiative for the Developing Brain, The University of Edinburgh, Hugh Robson Building, George Square, Edinburgh EH8 9XF, UK
| | - T I Simpson
- Institute for Adaptive and Neural Computation, Informatics Forum, The University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
- Simons Initiative for the Developing Brain, The University of Edinburgh, Hugh Robson Building, George Square, Edinburgh EH8 9XF, UK
|
96
|
Larkindale J, Betourne A, Borens A, Boulanger V, Theurer Crider V, Gavin P, Burton J, Liwski R, Romero K, Walls R, Barrett JS. Innovations in Therapy Development for Rare Diseases Through the Rare Disease Cures Accelerator-Data and Analytics Platform. Ther Innov Regul Sci 2022; 56:768-776. [PMID: 35668316 DOI: 10.1007/s43441-022-00408-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/27/2022] [Accepted: 04/07/2022] [Indexed: 10/18/2022]
Abstract
Rare diseases impact the lives of an estimated 350 million people worldwide, and yet about 90% of rare diseases remain without an approved treatment. New technologies have become available, such as gene and oligonucleotide therapies, that offer great promise in treating rare diseases. However, progress toward the development of therapies to treat these diseases is hampered by a limited understanding of the course of each rare disease, how changes in disease progression occur and can be effectively measured over time, and challenges in designing and running clinical trials in diseases where the natural history is poorly characterized. Data that could be used to characterize the natural history of each disease has often been collected in various ways, including in electronic health records, patient-report registries, clinical natural history studies, and in past clinical trials. However, each data source contains a limited number of subjects and different data elements, and data is frequently kept proprietary in the hands of the study sponsor rather than shared widely across the rare disease community. The Rare Disease Cures Accelerator-Data and Analytics Platform (RDCA-DAP) is an FDA-funded effort to overcome these persistent challenges. By aggregating data across all rare diseases and making that data available to the community to support understanding of rare disease natural history and inform drug development, RDCA-DAP aims to accelerate the regulatory approval of new therapies. RDCA-DAP curates, standardizes, and tags data across rare disease datasets to make it findable within the database, and contains a built-in analytics platform to help visualize, interpret, and use it to support drug development. RDCA-DAP will coordinate data and tool resources across non-profit, commercial, and for-profit entities to serve a diverse array of rare disease stakeholders that includes academic researchers, drug developers, FDA reviewers and of course patients and their caregivers. Drug development programs utilizing the RDCA-DAP will be able to leverage existing data to support their efforts and reach definitive decisions on the efficacy of their therapeutics more efficiently and more rapidly than ever.
Affiliation(s)
- Jane Larkindale
- Rare Disease Cures Accelerator - Data Analysis Platform (RDCA-DAP), Tucson, USA
- Alexandre Betourne
- Rare Disease Cures Accelerator - Data Analysis Platform (RDCA-DAP), Tucson, USA
- Pamela Gavin
- National Organization for Rare Disorders (NORD), Danbury, CT, USA
- Jackson Burton
- Quantitative Medicine (QM) Groups, Critical Path Institute, Tucson, AZ, USA
- Klaus Romero
- Quantitative Medicine (QM) Groups, Critical Path Institute, Tucson, AZ, USA
- Jeffrey S Barrett
- Rare Disease Cures Accelerator - Data Analysis Platform (RDCA-DAP), Tucson, USA; Critical Path Institute, 1730 East River Road, Tucson, AZ, 85718-5893, USA
97
Dhombres F, Morgan P, Chaudhari BP, Filges I, Sparks TN, Lapunzina P, Roscioli T, Agarwal U, Aggarwal S, Beneteau C, Cacheiro P, Carmody LC, Collardeau‐Frachon S, Dempsey EA, Dufke A, Duyzend MH, el Ghosh M, Giordano JL, Glad R, Grinfelde I, Iliescu DG, Ladewig MS, Munoz‐Torres MC, Pollazzon M, Radio FC, Rodo C, Silva RG, Smedley D, Sundaramurthi JC, Toro S, Valenzuela I, Vasilevsky NA, Wapner RJ, Zemet R, Haendel MA, Robinson PN. Prenatal phenotyping: A community effort to enhance the Human Phenotype Ontology. AMERICAN JOURNAL OF MEDICAL GENETICS. PART C, SEMINARS IN MEDICAL GENETICS 2022; 190:231-242. [PMID: 35872606 PMCID: PMC9588534 DOI: 10.1002/ajmg.c.31989] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 04/14/2022] [Accepted: 07/01/2022] [Indexed: 01/07/2023]
Abstract
Technological advances in both genome sequencing and prenatal imaging are increasing our ability to accurately recognize and diagnose Mendelian conditions prenatally. Phenotype-driven early genetic diagnosis of fetal genetic disease can help to strategize treatment options and clinical preventive measures during the perinatal period, to plan in utero therapies, and to inform parental decision-making. Fetal phenotypes of genetic diseases are often unique and at present are not well understood; more comprehensive knowledge about prenatal phenotypes and computational resources have an enormous potential to improve diagnostics and translational research. The Human Phenotype Ontology (HPO) has been widely used to support diagnostics and translational research in human genetics. To better support prenatal usage, the HPO consortium conducted a series of workshops with a group of domain experts in a variety of medical specialties, diagnostic techniques, as well as diseases and phenotypes related to prenatal medicine, including perinatal pathology, musculoskeletal anomalies, neurology, medical genetics, hydrops fetalis, craniofacial malformations, cardiology, neonatal-perinatal medicine, fetal medicine, placental pathology, prenatal imaging, and bioinformatics. We expanded the representation of prenatal phenotypes in HPO by adding 95 new phenotype terms under the Abnormality of prenatal development or birth (HP:0001197) grouping term, and revised definitions, synonyms, and disease annotations for most of the 152 terms that existed before the beginning of this effort. The expansion of prenatal phenotypes in HPO will support phenotype-driven prenatal exome and genome sequencing for precision genetic diagnostics of rare diseases to support prenatal care.
Affiliation(s)
- Ferdinand Dhombres
- Sorbonne University, GRC26, INSERM, Limics, Armand Trousseau Hospital, Fetal Medicine Department, APHP, Paris, France
- Patricia Morgan
- American College of Medical Genetics and Genomics, Newborn Screening Translational Research Network, Bethesda, Maryland, USA
- Bimal P. Chaudhari
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, Ohio, USA
- Isabel Filges
- University Hospital Basel and University of Basel, Medical Genetics, Basel, Switzerland
- Teresa N. Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, California, USA
- Pablo Lapunzina
- CIBERER and Hospital Universitario La Paz, INGEMM‐Institute of Medical and Molecular Genetics, Madrid, Spain
- Tony Roscioli
- Neuroscience Research Australia (NeuRA), University of New South Wales, Sydney, New South Wales, Australia
- Umber Agarwal
- Department of Maternal and Fetal Medicine, Liverpool Women's NHS Foundation Trust, Liverpool, UK
- Shagun Aggarwal
- Department of Medical Genetics, Nizam's Institute of Medical Sciences, Hyderabad, Telangana, India
- Claire Beneteau
- Service de Génétique Médicale, UF 9321 de Fœtopathologie et Génétique, CHU de Nantes, Nantes, France
- Pilar Cacheiro
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C. Carmody
- Department of Genomic Medicine, The Jackson Laboratory, Farmington, Connecticut, USA
- Esther A. Dempsey
- St George's University of London, Molecular and Clinical Sciences Research Institute, London, UK
- Andreas Dufke
- University of Tübingen, Institute of Medical Genetics and Applied Genomics, Tübingen, Germany
- Jessica L. Giordano
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, USA
- Ragnhild Glad
- Department of Obstetrics and Gynecology, University Hospital of North Norway, Tromsø, Norway
- Ieva Grinfelde
- Department of Medical Genetics and Prenatal diagnosis, Children's University Hospital, Riga, Latvia
- Dominic G. Iliescu
- Department of Obstetrics and Gynecology, University of Medicine and Pharmacy Craiova, Craiova, Dolj, Romania
- Markus S. Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Saarland, Germany
- Monica C. Munoz‐Torres
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Marzia Pollazzon
- Azienda USL‐IRCCS di Reggio Emilia, Medical Genetics Unit, Reggio Emilia, Italy
- Carlota Rodo
- Vall d'Hebron Hospital Campus, Maternal & Fetal Medicine, Barcelona, Spain
- Raquel Gouveia Silva
- Hospital Santa Maria, Serviço de Genética, Departamento de Pediatria, Hospital de Santa Maria, Centro Hospitalar Universitário Lisboa Norte, Centro Académico de Medicina de Lisboa, Lisboa, Portugal
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Sabrina Toro
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Irene Valenzuela
- Hospital Vall d'Hebron, Clinical and Molecular Genetics Area, Barcelona, Spain
- Nicole A. Vasilevsky
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Ronald J. Wapner
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, USA
- Roni Zemet
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Melissa A Haendel
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Peter N. Robinson
- Department of Genomic Medicine, The Jackson Laboratory, Farmington, Connecticut, USA
98
Fujiwara T, Shin J, Yamaguchi A. Advances in the development of PubCaseFinder, including the new application programming interface and matching algorithm. Hum Mutat 2022; 43:734-742. [PMID: 35143083 PMCID: PMC9305291 DOI: 10.1002/humu.24341] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2021] [Revised: 01/17/2022] [Accepted: 02/07/2022] [Indexed: 11/11/2022]
Abstract
Over 10,000 rare genetic diseases have been identified, and millions of newborns are affected by severe rare genetic diseases each year. A variety of Human Phenotype Ontology (HPO)-based clinical decision support systems (CDSS) and patient repositories have been developed to support clinicians in diagnosing patients with suspected rare genetic diseases. In September 2017, we released PubCaseFinder (https://pubcasefinder.dbcls.jp), a web-based CDSS that provides ranked lists of genetic and rare diseases using HPO-based phenotypic similarities, where top-listed diseases represent the most likely differential diagnosis. We also developed a Matchmaker Exchange (MME) application programming interface (API) to query PubCaseFinder, which has been adopted by several patient repositories. In this paper, we describe notable updates regarding PubCaseFinder, the GeneYenta matching algorithm implemented in PubCaseFinder, and the PubCaseFinder API. The updated GeneYenta matching algorithm improves the performance of the CDSS automated differential diagnosis function. Moreover, the updated PubCaseFinder and new API empower patient repositories participating in MME and medical professionals to actively use HPO-based resources.
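As a rough illustration of the phenotype-similarity ranking described in the abstract above, the sketch below scores candidate diseases against a patient's HPO term set with a plain Jaccard index and sorts them from most to least similar. Jaccard is a stand-in chosen for brevity; it is not the GeneYenta algorithm and does not use the PubCaseFinder API. The function names, disease labels, and annotation sets are hypothetical, and the HPO identifiers are treated as opaque strings.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two term sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diseases(
    patient_terms: Set[str],
    disease_terms: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Return (disease, score) pairs, highest similarity first.

    Ties are broken alphabetically by disease label so the order is stable.
    """
    scored = [
        (name, jaccard(patient_terms, terms))
        for name, terms in disease_terms.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical patient and disease annotations (illustration only).
    patient = {"HP:0001250", "HP:0001263"}
    diseases = {
        "disease_A": {"HP:0001250", "HP:0001263", "HP:0000252"},
        "disease_B": {"HP:0000252"},
        "disease_C": {"HP:0001250"},
    }
    for name, score in rank_diseases(patient, diseases):
        print(f"{name}\t{score:.3f}")
```

A set-overlap score ignores the ontology hierarchy, so two closely related terms count as a complete mismatch; similarity measures that account for the hierarchy avoid that at the cost of needing the ontology graph.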
Affiliation(s)
- Toyofumi Fujiwara
- Database Center for Life Science, Joint Support‐Center for Data Science Research, Research Organization of Information and Systems, Kashiwa‐shi, Chiba‐ken, Japan
- Jae‐Moon Shin
- Database Center for Life Science, Joint Support‐Center for Data Science Research, Research Organization of Information and Systems, Kashiwa‐shi, Chiba‐ken, Japan
- Atsuko Yamaguchi
- Graduate School of Integrative Science and Engineering, Tokyo City University, Setagaya‐ku, Tokyo, Japan
99
Boycott KM, Azzariti DR, Hamosh A, Rehm HL. Seven years since the launch of the Matchmaker Exchange: The evolution of genomic matchmaking. Hum Mutat 2022; 43:659-667. [PMID: 35537081 PMCID: PMC9133175 DOI: 10.1002/humu.24373] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/15/2022] [Accepted: 03/22/2022] [Indexed: 11/09/2022]
Abstract
The Matchmaker Exchange (MME) was launched in 2015 to provide a robust mechanism to discover novel disease-gene relationships. It operates as a federated network connecting databases holding relevant data using a common application programming interface, where two or more users are looking for a match for the same gene (two-sided matchmaking). Seven years from its launch, it is clear that the MME is making outstanding contributions to understanding the morbid anatomy of the genome. The number of unique genes present across the MME has steadily increased over time; there are currently >13,520 unique genes (~68% of all protein-coding genes) connected across the MME's eight genomic matchmaking nodes, GeneMatcher, DECIPHER, PhenomeCentral, MyGene2, seqr, Initiative on Rare and Undiagnosed Disease, PatientMatcher, and the RD-Connect Genome-Phenome Analysis Platform. The collective data set accessible across the MME currently includes more than 120,000 cases from over 12,000 contributors in 98 countries. The discovery of potential new disease-gene relationships is happening daily and international collaborative teams are moving these advances forward to publication, now numbering well over 500. Expansion of data sharing into routine clinical practice by clinicians, genetic counselors, and clinical laboratories has ensured access to discovery for even more individuals with undiagnosed rare genetic diseases. Tens of thousands of patients and their family members have been directly or indirectly impacted by the discoveries facilitated by two-sided genomic matchmaking. MME supports further connections to the literature (PubCaseFinder) and to human and model organism resources (Monarch Initiative) and scientists (ModelMatcher). Efforts are now underway to explore additional approaches to matchmaking at the gene or variant level where there is only one querier (one-sided matchmaking). 
Genomic matchmaking has proven its utility over the past 7 years and will continue to facilitate discoveries in the years to come.
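The two-sided matchmaking idea described above, where a match exists only when two or more users have submitted the same gene, can be sketched as a simple grouping step. This is a simplified local illustration and not the MME API or any node's implementation; `two_sided_matches`, the submitter labels, and the gene symbols are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def two_sided_matches(
    submissions: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Group (submitter, gene) pairs and keep genes with >= 2 distinct submitters.

    Gene symbols are compared case-insensitively; repeated submissions of the
    same gene by one submitter count once, so they never match themselves.
    """
    by_gene: Dict[str, Set[str]] = defaultdict(set)
    for submitter, gene in submissions:
        by_gene[gene.upper()].add(submitter)
    return {
        gene: sorted(submitters)
        for gene, submitters in sorted(by_gene.items())
        if len(submitters) >= 2
    }


if __name__ == "__main__":
    # Hypothetical submissions (illustration only).
    submissions = [
        ("lab_A", "GENE1"),
        ("lab_B", "gene1"),
        ("lab_A", "GENE2"),
        ("lab_C", "GENE3"),
        ("lab_C", "GENE3"),
    ]
    for gene, submitters in two_sided_matches(submissions).items():
        print(gene, "->", ", ".join(submitters))
```

One-sided matchmaking, by contrast, would answer a query from a single user without requiring a second submitter, so it cannot be expressed as this kind of pairing.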
Affiliation(s)
- Kym M. Boycott
- Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
- Danielle R. Azzariti
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Ada Hamosh
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
100
Pais LS, Snow H, Weisburd B, Zhang S, Baxter SM, DiTroia S, O’Heir E, England E, Chao KR, Lemire G, Osei-Owusu I, VanNoy GE, Wilson M, Nguyen K, Arachchi H, Phu W, Solomonson M, Mano S, O’Leary M, Lovgren A, Babb L, Austin-Tse CA, Rehm HL, MacArthur DG, O’Donnell-Luria A. seqr: A web-based analysis and collaboration tool for rare disease genomics. Hum Mutat 2022; 43:698-707. [PMID: 35266241 PMCID: PMC9903206 DOI: 10.1002/humu.24366] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 10/19/2021] [Revised: 02/23/2022] [Accepted: 03/04/2022] [Indexed: 02/04/2023]
Abstract
Exome and genome sequencing have become the tools of choice for rare disease diagnosis, leading to large amounts of data available for analyses. To identify causal variants in these datasets, powerful filtering and decision support tools that can be efficiently used by clinicians and researchers are required. To address this need, we developed seqr - an open-source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets. To date, seqr is being used in several research pipelines and one clinical diagnostic lab. In our own experience through the Broad Institute Center for Mendelian Genomics, seqr has enabled analyses of over 10,000 families, supporting the diagnosis of more than 3,800 individuals with rare disease and discovery of over 300 novel disease genes. Here, we describe a framework for genomic analysis in rare disease that leverages seqr's capabilities for variant filtration, annotation, and causal variant identification, as well as support for research collaboration and data sharing. The seqr platform is available as open source software, allowing low-cost participation in rare disease research, and a community effort to support diagnosis and gene discovery in rare disease.
Affiliation(s)
- Lynn S. Pais
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Hana Snow
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Shifa Zhang
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Samantha M. Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Stephanie DiTroia
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Emily O’Heir
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Eleina England
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Katherine R. Chao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Gabrielle Lemire
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Ikeoluwa Osei-Owusu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Grace E. VanNoy
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Michael Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Kevin Nguyen
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Harindra Arachchi
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- William Phu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Stacy Mano
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Melanie O’Leary
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Alysia Lovgren
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Lawrence Babb
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Christina A. Austin-Tse
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Daniel G. MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, Massachusetts, USA