1
Zhang J, Jiang Q, Du Z, Geng Y, Hu Y, Tong Q, Song Y, Zhang HY, Yan X, Feng Z. Knowledge graph-derived feed efficiency analysis via pig gut microbiota. Sci Rep 2024; 14:13939. [PMID: 38886444 PMCID: PMC11182767 DOI: 10.1038/s41598-024-64835-6] [Received: 12/29/2023] [Accepted: 06/13/2024] [Indexed: 06/20/2024] Open
Abstract
Feed efficiency (FE) is essential for pig production and has been reported to be partially explained by gut microbiota. Despite an extensive body of research literature on this topic, studies of how gut microbiota regulate feed efficiency remain fragmented and mostly confined to unstructured or semi-structured free text. Meanwhile, structured databases for microbiota analysis are available, yet they often lack comprehensive coverage of the associated biological processes. Therefore, we devised an approach to construct a comprehensive knowledge graph by combining unstructured textual information with structured database information, and applied it to investigate the relationship between pig gut microbes and FE. Firstly, we created the pgmReading knowledge base and the domain ontology of pig gut microbiota by annotating, extracting, and integrating semantic information from 157 scientific publications. Secondly, we created pgmPubtator by using PubTator to expand the semantic information related to microbiota. Thirdly, we created pgmDatabase by mapping and combining the ADDAGMA, gutMGene, and KEGG databases based on the ontology. These three knowledge bases were integrated to form the Pig Gut Microbial Knowledge Graph (PGMKG). Additionally, we created five biological query cases to validate the performance of PGMKG. These cases not only allow us to identify the microbes with the most significant impact on FE but also provide insights into the metabolites produced by these microbes and the associated metabolic pathways. This study introduces PGMKG, which maps key microbes in pig feed efficiency and guides microbiota-targeted optimization.
Affiliation(s)
- Junmei Zhang
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Qin Jiang
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
- Yazhouwan National Laboratory (YNL), Sanya, 572025, China
| | - Zhihong Du
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yilin Geng
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yuren Hu
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Qichang Tong
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yunfeng Song
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Hong-Yu Zhang
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xianghua Yan
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China
| | - Zaiwen Feng
- National Key Laboratory of Agricultural Microbiology, College of Informatics, College of Animal Sciences and Technology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, 430070, China.
2
Zhao J, Gao J, Ma S, Chen X, Wang J. Predicting the potential risks posed by antidepressants as emerging contaminants in fish based on network pharmacological analysis. Toxicol In Vitro 2024; 99:105872. [PMID: 38851602 DOI: 10.1016/j.tiv.2024.105872] [Received: 01/24/2024] [Revised: 05/23/2024] [Accepted: 06/05/2024] [Indexed: 06/10/2024]
Abstract
This study conducted a network pharmacology-based analysis to simultaneously discern a broad spectrum of potential environmental risks and health effects of antidepressants, a common class of pharmaceutical emerging contaminants (PECs) with a complex pharmacological profile, and to predict in silico the adverse phenotypes that may occur in fish exposed to antidepressants and their mixtures under realistic exposure scenarios. Results showed that 24 of the 39 included antidepressants had been detected in aquatic environments across 50 countries. Using the environmentally realistic exposure scenario for China as an example, the predicted blood concentrations of antidepressant residues generated with the Fish Plasma Model ranged from 37.89 (Alprazolam) to 16,772.05 (Sertraline) ng/L in exposed fish. The hazard-based bioactivity network, built without regard to concentration data, comprised 148 potential targets and 701 antidepressant-target interactions. After filtering each antidepressant-target interaction using the predicted drug concentrations in the blood of fish under realistic exposure scenarios in China, a refined environmental risk-based network showed that 11 targets, including muscarinic acetylcholine receptor M1, alpha-2B adrenergic receptor, and serotonin 2A receptor, might be modulated in fish by antidepressants and their mixtures at concentrations equal to or below environmental exposure levels. Environmentally relevant concentrations of antidepressants in water samples from China might perturb behavior, stress response, phototaxis, and development in exposed fish.
Affiliation(s)
- Jinru Zhao
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
| | - Jian Gao
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
| | - Sijia Ma
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
| | - Xintong Chen
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
| | - Jun Wang
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China.
3
Maier A, Hartung M, Abovsky M, Adamowicz K, Bader GD, Baier S, Blumenthal DB, Chen J, Elkjaer ML, Garcia-Hernandez C, Helmy M, Hoffmann M, Jurisica I, Kotlyar M, Lazareva O, Levi H, List M, Lobentanzer S, Loscalzo J, Malod-Dognin N, Manz Q, Matschinske J, Mee M, Oubounyt M, Pastrello C, Pico AR, Pillich RT, Poschenrieder JM, Pratt D, Pržulj N, Sadegh S, Saez-Rodriguez J, Sarkar S, Shaked G, Shamir R, Trummer N, Turhan U, Wang RS, Zolotareva O, Baumbach J. Drugst.One - a plug-and-play solution for online systems medicine and network-based drug repurposing. Nucleic Acids Res 2024:gkae388. [PMID: 38783119 DOI: 10.1093/nar/gkae388] [Received: 02/01/2024] [Revised: 04/08/2024] [Accepted: 04/29/2024] [Indexed: 05/25/2024] Open
Abstract
In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.
Affiliation(s)
- Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Michael Hartung
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Mark Abovsky
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
| | - Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada
| | - Sylvie Baier
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - David B Blumenthal
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
| | - Jing Chen
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Maria L Elkjaer
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Department of Neurology, Odense University Hospital, Odense, Denmark
- Institute of Clinical Research, University of Southern Denmark, Odense, Denmark
- Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
| | | | - Mohamed Helmy
- Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Canada
- School of Public Health, University of Saskatchewan, Canada
- Department of Computer Science, University of Saskatchewan, Canada
- Department of Computer Science, Lakehead University, Canada
- Department of Computer Science, Idaho State University, USA
- Bioinformatics Institute (BII), A*STAR, Singapore
| | - Markus Hoffmann
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute for Advanced Study, Technical University of Munich, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, Bethesda, MD 20892, USA
| | - Igor Jurisica
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
- Departments of Medical Biophysics and Computer Science, University of Toronto, Toronto, Canada
- Institute of Neuroimmunology, Slovak Academy of Sciences, Bratislava, Slovakia
| | - Max Kotlyar
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
| | - Olga Lazareva
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Junior Clinical Cooperation Unit Multiparametric methods for early detection of prostate cancer, German Cancer Research Center (DKFZ), Heidelberg, Germany
- European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
| | - Hagai Levi
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Markus List
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Sebastian Lobentanzer
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | | | - Quirin Manz
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Julian Matschinske
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Miles Mee
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
| | - Mhaned Oubounyt
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Chiara Pastrello
- Division of Orthopaedic Surgery, Schroeder Arthritis Institute, Toronto, Canada
- Data Science Discovery Centre for Chronic Diseases, Krembil Research Institute, Toronto, ON M5T 0S8, Canada
| | - Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, 94158 California, USA
| | - Rudolf T Pillich
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Julian M Poschenrieder
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Dexter Pratt
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
| | - Nataša Pržulj
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Department of Computer Science, University College London, London WC1E 6BT, UK
- ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
| | - Sepideh Sadegh
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
- Clinical Genome Center, Department of Clinical Research, University of Southern Denmark, Odense, Denmark
| | - Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
| | - Suryadipto Sarkar
- Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), 91052 Erlangen, Germany
| | - Gideon Shaked
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel
| | - Nico Trummer
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Ugur Turhan
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Rui-Sheng Wang
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Olga Zolotareva
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
4
Dingemans AJM, Jansen S, van Reeuwijk J, de Leeuw N, Pfundt R, Schuurs-Hoeijmakers J, van Bon BW, Marcelis C, Ockeloen CW, Willemsen M, van der Sluijs PJ, Santen GWE, Kooy RF, Vulto-van Silfhout AT, Kleefstra T, Koolen DA, Vissers LELM, de Vries BBA. Prevalence of comorbidities in individuals with neurodevelopmental disorders from the aggregated phenomics data of 51,227 pediatric individuals. Nat Med 2024:10.1038/s41591-024-03005-7. [PMID: 38745008 DOI: 10.1038/s41591-024-03005-7] [Received: 05/26/2023] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
The prevalence of comorbidities in individuals with neurodevelopmental disorders (NDDs) is not well understood, yet these comorbidities are important for accurate diagnosis and prognosis in routine care and for characterizing the clinical spectrum of NDD syndromes. We thus developed PhenomAD-NDD, an aggregated database containing the comorbid phenotypic data of 51,227 individuals with NDD, all harmonized into Human Phenotype Ontology (HPO) terms, with 3,054 unique HPO terms in total. We demonstrate that almost all congenital anomalies are more prevalent in the NDD population than in the general population, and that the NDD baseline prevalence allows for an approximation of the enrichment of symptoms. For example, such analyses of 33 genetic NDDs show that 32% of enriched phenotypes are currently not reported in the clinical synopsis in Online Mendelian Inheritance in Man (OMIM). PhenomAD-NDD is open to all via an online visualization tool and allows us to determine the enrichment of symptoms in NDD.
Affiliation(s)
- Alexander J M Dingemans
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Sandra Jansen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Jeroen van Reeuwijk
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Nicole de Leeuw
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Rolph Pfundt
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Janneke Schuurs-Hoeijmakers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Bregje W van Bon
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Carlo Marcelis
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Charlotte W Ockeloen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Marjolein Willemsen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | | | - Gijs W E Santen
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands
| | - R Frank Kooy
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
| | - Anneke T Vulto-van Silfhout
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Tjitske Kleefstra
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - David A Koolen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Lisenka E L M Vissers
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Bert B A de Vries
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, the Netherlands.
5
Mohan S, McNulty S, Thaxton C, Elnagheeb M, Owens E, Flowers M, Nunnery T, Self A, Palus B, Gorokhova S, Kennedy A, Niu Z, Johari M, Maiga AB, Macalalad K, Clause AR, Beckmann JS, Bronicki L, Cooper ST, Ganesh VS, Kang PB, Kesari A, Lek M, Levy J, Rufibach L, Savarese M, Spencer MJ, Straub V, Tasca G, Weihl CC. Expert Panel Curation of 31 Genes in Relation to Limb Girdle Muscular Dystrophy. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.03.592369. [PMID: 38765987 PMCID: PMC11100593 DOI: 10.1101/2024.05.03.592369] [Indexed: 05/22/2024]
Abstract
Introduction: Limb girdle muscular dystrophies (LGMDs) are a group of genetically heterogeneous autosomal conditions with some degree of phenotypic homogeneity. LGMD is defined as having onset >2 years of age with progressive proximal weakness, elevated serum creatine kinase levels and dystrophic features on muscle biopsy. Advances in massively parallel sequencing have led to a surge in genes linked to LGMD. Methods: The ClinGen Muscular Dystrophies and Myopathies gene curation expert panel (MDM GCEP, formerly Limb Girdle Muscular Dystrophy GCEP) convened to evaluate the strength of evidence supporting gene-disease relationships (GDR) using the ClinGen gene-disease clinical validity framework to evaluate 31 genes implicated in LGMD. Results: The GDR was exclusively LGMD for 17 genes, whereas an additional 14 genes were related to a broader phenotype encompassing congenital weakness. Four genes (CAPN3, COL6A1, COL6A2, COL6A3) were split into two separate disease entities, based on each displaying both dominant and recessive inheritance patterns, resulting in curation of 35 GDRs. Of these, 30 (86%) were classified as Definitive, 4 (11%) as Moderate and 1 (3%) as Limited. Two genes, POMGNT1 and DAG1, though definitively related to myopathy, currently have insufficient evidence to support a relationship specifically with LGMD. Conclusions: The expert-reviewed assertions on the clinical validity of genes implicated in LGMDs form an invaluable resource for clinicians and molecular geneticists. We encourage the global neuromuscular community to publish case-level data that help clarify disputed or novel LGMD associations.
Affiliation(s)
- Shruthi Mohan
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Shannon McNulty
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Courtney Thaxton
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Marwa Elnagheeb
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Emma Owens
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - May Flowers
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Teagan Nunnery
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Autumn Self
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Brooke Palus
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
| | - Svetlana Gorokhova
- Aix Marseille Univ, INSERM, MMG, U 1251, Marseille, France
- Department of Medical Genetics, Timone Children's Hospital, APHM, Marseille, France
| | | | - Zhiyv Niu
- Department of Laboratory Medicine and Pathology, Mayo Clinic
| | - Mridul Johari
- Harry Perkins Institute of Medical Research, Centre for Medical Research, University of Western Australia, Nedlands, WA, Australia
- Folkhälsan Research Center, Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Finland
| | | | | | | | | | - Lucas Bronicki
- Children's Hospital of Eastern Ontario, Ottawa, Ontario, Canada
| | - Sandra T Cooper
- Kids Neuroscience Centre, Children's Hospital at Westmead; School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney; Functional Neuromics, Children's Medical Research Institute, Westmead, NSW, Australia
| | - Vijay S Ganesh
- Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, MA
- Department of Neurology, Brigham and Women's Hospital, Boston, MA
| | - Peter B Kang
- Greg Marzolf Jr. Muscular Dystrophy Center and Department of Neurology, University of Minnesota, Minneapolis, MN, USA
| | | | - Monkol Lek
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
| | | | | | - Marco Savarese
- Folkhälsan Research Center, Department of Medical and Clinical Genetics, Medicum, University of Helsinki, Finland
| | | | - Volker Straub
- John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trusts, Newcastle Upon Tyne, UK
| | - Giorgio Tasca
- John Walton Muscular Dystrophy Research Centre, Newcastle University and Newcastle Hospitals NHS Foundation Trusts, Newcastle Upon Tyne, UK
6
Claussnitzer M, Parikh VN, Wagner AH, Arbesfeld JA, Bult CJ, Firth HV, Muffley LA, Nguyen Ba AN, Riehle K, Roth FP, Tabet D, Bolognesi B, Glazer AM, Rubin AF. Minimum information and guidelines for reporting a multiplexed assay of variant effect. Genome Biol 2024; 25:100. [PMID: 38641812 PMCID: PMC11027375 DOI: 10.1186/s13059-024-03223-9] [Received: 06/18/2023] [Accepted: 03/25/2024] [Indexed: 04/21/2024] Open
Abstract
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
Affiliation(s)
- Melina Claussnitzer
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Cambridge, MA, 02142, USA
| | - Victoria N Parikh
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Alex H Wagner
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, 43210, USA
| | - Jeremy A Arbesfeld
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Carol J Bult
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Dept of Medical Genetics, Cambridge University Hospitals NHS Trust, Cambridge, UK
| | - Lara A Muffley
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
| | - Alex N Nguyen Ba
- Department of Biology, University of Toronto at Mississauga, Mississauga, ON, Canada
| | - Kevin Riehle
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Daniel Tabet
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
| | - Benedetta Bolognesi
- Institute for Bioengineering of Catalunya (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain.
| | - Andrew M Glazer
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
| | - Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia.
7
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024] Open
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Ignacio J Tripodi
  - Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
  - Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
8
Kilicoglu H, Ensan F, McInnes B, Wang LL. Semantics-enabled biomedical literature analytics. J Biomed Inform 2024; 150:104588. [PMID: 38244957 DOI: 10.1016/j.jbi.2024.104588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/22/2024]
Affiliation(s)
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois Urbana Champaign, Champaign, IL, USA
- Faezeh Ensan
  - Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada
- Bridget McInnes
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
- Lucy Lu Wang
  - Information School, University of Washington, Seattle, WA, USA
9
Wang X, Li H, Luo H, Zou Y, Li H, Qin Y, Song J. Evaluating ClinGen variant curation expert panels' application of PVS1 code. Eur J Med Genet 2024; 67:104909. [PMID: 38199457 DOI: 10.1016/j.ejmg.2024.104909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/02/2023] [Accepted: 01/07/2024] [Indexed: 01/12/2024]
Abstract
BACKGROUND The 2015 American College of Medical Genetics and Genomics (ACMG) and Association for Molecular Pathology (AMP) guidelines articulate that the effects of certain types of variants on gene function can often be seen as a complete absence of the gene product, by leading to a lack of transcription or to nonsense-mediated decay (NMD). However, detailed information on the different types of loss-of-function (LOF) variants, and refined steps incorporating the location of the variant, changes in strength levels, the NMD boundary, or any additional information pointing to a true null effect, were all left to expert judgement. As part of the Clinical Genome Resource (ClinGen) initiative, Variant Curation Expert Panels (VCEPs) are designated to make gene/disease-centric specifications in accordance with the ACMG/AMP guidelines, including a more detailed definition of what constitutes appropriate LOF evidence. Our goal was to evaluate the current LOF guidelines developed by the VCEPs and to analyse the previously curated variants with respect to the PVS1 criterion, giving those engaged in genetic data analysis a comprehensive understanding of this code. METHODS Our study evaluated 7 VCEPs for their LOF criteria (PVS1). Subsequently, we assessed the predictive criteria by considering the underlying disease mechanism, protein transcript, and variant types delineated. Then, we meticulously curated the LOF evidence referenced by each VCEP in their preliminary variant classification, thereby scrutinizing the recommendations put forth by VCEPs and their application in the interpretation of the aforementioned predictive criteria. On this basis, we compiled an extensive summary of the PVS1 evidence that the VCEPs applied when classifying their pilot variants, in order to analyze the VCEP criteria specifications and their use in interpreting LOF.
RESULTS We observed that the VCEPs discussed followed the majority of Sequence Variant Interpretation (SVI) recommendations concerning the application of this LOF criterion, except for some disease/gene-specific considerations. We highlighted the wide range of PVS1 strength levels approved by the VCEPs, reflecting the diversity of evidence for each variant type. In addition, we observed substantial differences between VCEPs in the approach used to determine relative strengths for different types of null variants and in the attitude towards the principles concerning variant location, NMD, and influence on protein function. CONCLUSIONS It is difficult to understand the intricacies of the predictive data (PVS1), which often requires expert-level knowledge of the disease/gene. The VCEP criteria specifications for the predictive evidence play an important role in making it more accessible for curators to apply the predictive data by providing details concerning this complex criterion. Despite this, we believe there is a need for more guidance on standardizing this process and ensuring consistency in the application of this predictive evidence.
Affiliation(s)
- Xiaoyan Wang
  - Medical Genetics Center, Maternal and Child Health Hospital of Hubei Province, Wuhan, Hubei, China
- Haibo Li
  - The Central Laboratory of Birth Defects Prevention and Control, Ningbo Women and Children's Hospital, 339 Liuting St, Ningbo City, Zhejiang Province, China
- Haiyan Luo
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, China
- Yongyi Zou
  - Department of Medical Genetics, Jiangxi Maternal and Child Health Hospital, Nanchang, China
- Haoxian Li
  - Center of Medical Genetics, Jiangmen Maternity and Child Health Care Hospital, Jiangmen, Guangdong, China
- Yayun Qin
  - Medical Genetics Center, Maternal and Child Health Hospital of Hubei Province, Wuhan, Hubei, China
- Jieping Song
  - Medical Genetics Center, Maternal and Child Health Hospital of Hubei Province, Wuhan, Hubei, China
10
Balachandran S, Prada-Medina CA, Mensah MA, Kakar N, Nagel I, Pozojevic J, Audain E, Hitz MP, Kircher M, Sreenivasan VKA, Spielmann M. STIGMA: Single-cell tissue-specific gene prioritization using machine learning. Am J Hum Genet 2024; 111:338-349. [PMID: 38228144 PMCID: PMC10870135 DOI: 10.1016/j.ajhg.2023.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Revised: 12/01/2023] [Accepted: 12/07/2023] [Indexed: 01/18/2024] [Open Access]
Abstract
Clinical exome and genome sequencing have revolutionized the understanding of human disease genetics. Yet many genes remain functionally uncharacterized, complicating the establishment of causal disease links for genetic variants. While several scoring methods have been devised to prioritize these candidate genes, these methods fall short of capturing the expression heterogeneity across cell subpopulations within tissues. Here, we introduce single-cell tissue-specific gene prioritization using machine learning (STIGMA), an approach that leverages single-cell RNA-seq (scRNA-seq) data to prioritize candidate genes associated with rare congenital diseases. STIGMA prioritizes genes by learning the temporal dynamics of gene expression across cell types during healthy organogenesis. To assess the efficacy of our framework, we applied STIGMA to mouse limb and human fetal heart scRNA-seq datasets. In a cohort of individuals with congenital limb malformation, STIGMA prioritized 469 variants in 345 genes, with UBA2 as a notable example. For congenital heart defects, we detected 34 genes harboring nonsynonymous de novo variants (nsDNVs) in two or more individuals from a set of 7,958 individuals, including the ortholog of Prdm1, which is associated with hypoplastic left ventricle and hypoplastic aortic arch. Overall, our findings demonstrate that STIGMA effectively prioritizes tissue-specific candidate genes by utilizing single-cell transcriptome data. The ability to capture the heterogeneity of gene expression across cell populations makes STIGMA a powerful tool for the discovery of disease-associated genes and facilitates the identification of causal variants underlying human genetic disorders.
Affiliation(s)
- Saranya Balachandran
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Cesar A Prada-Medina
  - Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Martin A Mensah
  - Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, 13353 Berlin, Germany; BIH Charité Digital Clinician Scientist Program, BIH Biomedical Innovation Academy, Anna-Louisa-Karsch-Strasse 2, 10178 Berlin, Germany; RG Development & Disease, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Naseebullah Kakar
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany; Department of Biotechnology, BUITEMS, Quetta, Pakistan
- Inga Nagel
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Jelena Pozojevic
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Enrique Audain
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck; Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Marc-Phillip Hitz
  - Institute of Medical Genetics, Carl von Ossietzky University, 26129 Oldenburg, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck; Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital of Schleswig-Holstein, 24105 Kiel, Germany
- Martin Kircher
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Varun K A Sreenivasan
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany
- Malte Spielmann
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Germany; Human Molecular Genetics Group, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany; DZHK e.V. (German Center for Cardiovascular Research), Partner Site Hamburg/Kiel/Lübeck
11
Putman TE, Schaper K, Matentzoglu N, Rubinetti V, Alquaddoomi F, Cox C, Caufield JH, Elsarboukh G, Gehrke S, Hegde H, Reese J, Braun I, Bruskiewich R, Cappelletti L, Carbon S, Caron A, Chan L, Chute C, Cortes K, De Souza V, Fontana T, Harris N, Hartley E, Hurwitz E, Jacobsen JB, Krishnamurthy M, Laraway B, McLaughlin J, McMurry J, Moxon ST, Mullen K, O’Neil S, Shefchek K, Stefancsik R, Toro S, Vasilevsky N, Walls R, Whetzel P, Osumi-Sutherland D, Smedley D, Robinson P, Mungall C, Haendel M, Munoz-Torres M. The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species. Nucleic Acids Res 2024; 52:D938-D949. [PMID: 38000386 PMCID: PMC10767791 DOI: 10.1093/nar/gkad1082] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2023] [Revised: 10/21/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023] [Open Access]
Abstract
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Affiliation(s)
- Tim E Putman
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kevin Schaper
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Vincent P Rubinetti
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Faisal S Alquaddoomi
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Corey Cox
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- J Harry Caufield
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Glass Elsarboukh
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sarah Gehrke
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Harshad Hegde
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Justin T Reese
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ian Braun
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Seth Carbon
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anita R Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD 21205, USA
- Katherina G Cortes
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Tommaso Fontana
  - Dipartimento di Informatica, Università degli Studi di Milano Statale, Milano, Italy
- Nomi L Harris
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Emily L Hartley
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Eric Hurwitz
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Madan Krishnamurthy
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Bryan J Laraway
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julie A McMurry
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sierra A T Moxon
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kathleen R Mullen
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Shawn T O’Neil
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kent A Shefchek
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Sabrina Toro
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ramona L Walls
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Patricia L Whetzel
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Christopher J Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Melissa A Haendel
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
12
Cooper L, Elser J, Laporte MA, Arnaud E, Jaiswal P. Planteome 2024 Update: Reference Ontologies and Knowledgebase for Plant Biology. Nucleic Acids Res 2024; 52:D1548-D1555. [PMID: 38055832 PMCID: PMC10767901 DOI: 10.1093/nar/gkad1028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/14/2023] [Accepted: 10/23/2023] [Indexed: 12/08/2023] [Open Access]
Abstract
The Planteome project (https://planteome.org/) provides a suite of reference and crop-specific ontologies and an integrated knowledgebase of plant genomics data. The plant genomics data in the Planteome has been obtained through manual and automated curation and sourced from more than 40 partner databases and resources. Here, we report on updates to the Planteome reference ontologies, namely, the Plant Ontology (PO), Trait Ontology (TO), the Plant Experimental Conditions Ontology (PECO), and integration of species/crop-specific vocabularies from our partners, the Crop Ontology (CO) into the TO ontology graph. Currently, 11 CO vocabularies are integrated into the Planteome with the addition of yam, sorghum, and potato since 2018. In addition, the size of the annotation database has increased by 34%, and the number of bioentities (genes, proteins, etc.) from 125 plant taxa has increased by 72%. We developed new tools to facilitate user requests and improvements to the CO vocabularies, and to allow fast searching and browsing of PO terms and definitions. These enhancements and future changes to automate the TO-CO mappings and knowledge discovery tools ensure that the Planteome will continue to be a valuable resource for plant biology.
Affiliation(s)
- Laurel Cooper
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Justin Elser
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Elizabeth Arnaud
  - Digital Inclusion, Biodiversity International, 34397 Montpellier, France
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
13
Mancuso CA, Johnson KA, Liu R, Krishnan A. Joint representation of molecular networks from multiple species improves gene classification. PLoS Comput Biol 2024; 20:e1011773. [PMID: 38198480 PMCID: PMC10805316 DOI: 10.1371/journal.pcbi.1011773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 01/23/2024] [Accepted: 12/20/2023] [Indexed: 01/12/2024] [Open Access]
Abstract
Network-based machine learning (ML) has the potential for predicting novel genes associated with nearly any health and disease context. However, this approach often uses network information from only the single species under consideration even though networks for most species are noisy and incomplete. While some recent methods have begun addressing this shortcoming by using networks from more than one species, they lack one or more key desirable properties: handling networks from more than two species simultaneously, incorporating many-to-many orthology information, or generating a network representation that is reusable across different types of and newly-defined prediction tasks. Here, we present GenePlexusZoo, a framework that casts molecular networks from multiple species into a single reusable feature space for network-based ML. We demonstrate that this multi-species network representation improves both gene classification within a single species and knowledge-transfer across species, even in cases where the inter-species correspondence is undetectable based on shared orthologous genes. Thus, GenePlexusZoo enables effectively leveraging the high evolutionary molecular, functional, and phenotypic conservation across species to discover novel genes associated with diverse biological contexts.
Affiliation(s)
- Christopher A. Mancuso
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Kayla A. Johnson
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Renming Liu
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
  - Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
14
Fernando PC, Mabee PM, Zeng E. Protein-protein interaction network module changes associated with the vertebrate fin-to-limb transition. Sci Rep 2023; 13:22594. [PMID: 38114646 PMCID: PMC10730527 DOI: 10.1038/s41598-023-50050-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 12/14/2023] [Indexed: 12/21/2023] [Open Access]
Abstract
Evolutionary phenotypic transitions, such as the fin-to-limb transition in vertebrates, result from modifications in related proteins and their interactions, often in response to a changing environment. Identifying these alterations in protein networks is crucial for a more comprehensive understanding of these transitions. However, previous research has not attempted to compare protein-protein interaction (PPI) networks associated with evolutionary transitions, and most experimental studies concentrate on a limited set of proteins. Therefore, the goal of this work was to develop a network-based platform for investigating the fin-to-limb transition using PPI networks. Quality-enhanced protein networks, constructed by integrating PPI networks with anatomy ontology data, were leveraged to compare protein modules for the paired fins (pectoral fin and pelvic fin) of fishes (zebrafish) to those of the paired limbs (forelimb and hindlimb) of mammals (mouse). This also included prediction of novel protein candidates and their validation by enrichment and homology analyses. Hub proteins such as shh and bmp4, which are crucial for module stability, were identified, and their changing roles throughout the transition were examined. Proteins with preserved roles during the fin-to-limb transition were more likely to be hub proteins. This study also addressed hypotheses regarding the role of non-preserved proteins associated with the transition.
Affiliation(s)
- Pasan C Fernando
  - Department of Plant Sciences, University of Colombo, Colombo, Sri Lanka
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
  - National Ecological Observatory Network, Battelle, 1625 38th St. #100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Departments of Preventive & Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Departments of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
  - Departments of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
15
Seaby EG, Thomas NS, Hunt D, Baralle D, Rehm HL, O’Donnell-Luria A, Ennis S. A Panel-Agnostic Strategy 'HiPPo' Improves Diagnostic Efficiency in the UK Genomic Medicine Service. Healthcare (Basel) 2023; 11:3179. [PMID: 38132069 PMCID: PMC10742528 DOI: 10.3390/healthcare11243179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] [Open Access]
Abstract
Genome sequencing is available as a clinical test in the UK through the Genomic Medicine Service (GMS). The GMS analytical strategy predominantly filters genome data on preselected gene panels. Whilst this reduces variants requiring assessment by reporting laboratories, pathogenic variants outside applied panels may be missed, and variants in genes without established disease-gene relationships are largely ignored. This study compares the analysis of a research exome to a GMS clinical genome for the same patients. For the research exome, we applied a panel-agnostic approach filtering for variants with High Pathogenic Potential (HiPPo) using ClinVar, allele frequency, and in silico prediction tools. We then restricted HiPPo variants to Gene Curation Coalition (GenCC) disease genes. These results were compared with the GMS genome panel-based approach. Twenty-four participants from eight families underwent parallel research exome and GMS genome sequencing. Exome HiPPo analysis identified a similar number of variants as the GMS panel-based approach. GMS genome analysis returned two pathogenic variants and one de novo variant. Exome HiPPo analysis returned the same variants plus an additional pathogenic variant and three further de novo variants in novel genes, where case series are underway. When HiPPo was restricted to GenCC disease genes, statistically fewer variants required assessment to identify more pathogenic variants than reported by the GMS, giving a diagnostic rate per variant assessed of 20% for HiPPo versus 3% for the GMS. With UK plans to sequence 5 million genomes, strategies are needed to optimise genome analysis beyond gene panels whilst minimising the burden of variants requiring clinical assessment.
Affiliation(s)
- Eleanor G. Seaby
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton SO16 6YD, Hampshire, UK; (D.H.); (D.B.); (S.E.)
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; (H.L.R.); (A.O.-L.)
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- Paediatric Infectious Diseases, Imperial College London, London W2 1NY, UK
| | - N. Simon Thomas
- Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury SP2 8BJ, UK;
| | - David Hunt
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton SO16 6YD, Hampshire, UK; (D.H.); (D.B.); (S.E.)
| | - Diana Baralle
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton SO16 6YD, Hampshire, UK; (D.H.); (D.B.); (S.E.)
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; (H.L.R.); (A.O.-L.)
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; (H.L.R.); (A.O.-L.)
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Sarah Ennis
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton SO16 6YD, Hampshire, UK; (D.H.); (D.B.); (S.E.)
| |
|
16
|
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A tool for annotating and analyzing treatments and clinical management of human disease. MED 2023; 4:913-927.e3. [PMID: 37963467 PMCID: PMC10842845 DOI: 10.1016/j.medj.2023.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 08/31/2023] [Accepted: 10/14/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology specifically designed to organize medical procedures, therapies, and interventions. METHODS MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry. Term development involves a blend of manual and semi-automated processes. Additionally, we have generated annotations detailing diagnostic modalities for specific phenotypic abnormalities defined by the Human Phenotype Ontology (HPO). We introduce a web application, POET, that facilitates MAxO annotations for specific medical actions for diseases using the Mondo Disease Ontology. FINDINGS MAxO encompasses 1,757 terms spanning a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. These terms annotate phenotypic features associated with specific diseases (using HPO and Mondo). Presently, there are over 16,000 MAxO diagnostic annotations that target HPO terms. Through POET, we have created 413 MAxO annotations specifying treatments for 189 rare diseases. CONCLUSIONS MAxO offers a computational representation of treatments and other actions taken for the clinical management of patients. Its development is closely coupled to Mondo and HPO, broadening the scope of our computational modeling of diseases and phenotypic features. We invite the community to contribute disease annotations using POET (https://poet.jax.org/). MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO). FUNDING NHGRI 1U24HG011449-01A1 and NHGRI 5RM1HG010860-04.
Affiliation(s)
- Leigh C Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Sabrina Toro
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | - Margaret P Adam
- University of Washington School of Medicine, Seattle, WA, USA
| | - Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - David Gomez-Andres
- Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
| | - Rita Horvath
- Department of Clinical Neurosciences, University of Cambridge, Robinson Way, Cambridge CB2 0PY, UK
| | - Megan L Kraus
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Markus S Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
| | - David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
| | - Hanns Lochmüller
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada; Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Brain and Mind Research Institute, University of Ottawa, Ottawa, Canada; Department of Neuropediatrics and Muscle Disorders, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; Centro Nacional de Análisis Genómico, Barcelona, Spain
| | | | | | - Catharina Schuetz
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Berthold Seitz
- Department of Ophthalmology, Saarland University Medical Center UKS, Homburg, Saar, Germany
| | - Morgan N Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Teresa N Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Timmy Strauss
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
| | - Emilia M Swietlik
- Department of Medicine, University of Cambridge, Heart and Lung Research Institute, Cambridge CB2 0BB, UK
| | - Rachel Thompson
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada
| | | | | | | | - Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA.
| |
|
17
|
Mah N, Kurtz A, Fuhr A, Seltmann S, Chen Y, Bultjer N, Dewender J, Lual A, Steeg R, Mueller SC. The Management of Data for the Banking, Qualification, and Distribution of Induced Pluripotent Stem Cells: Lessons Learned from the European Bank for Induced Pluripotent Stem Cells. Cells 2023; 12:2756. [PMID: 38067184 PMCID: PMC10705942 DOI: 10.3390/cells12232756] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/20/2023] [Revised: 11/17/2023] [Accepted: 11/20/2023] [Indexed: 12/18/2023] Open
Abstract
The European Bank for induced pluripotent Stem Cells (EBiSC) was established in 2014 as a non-profit project for the banking, quality control, and distribution of human iPSC lines for research around the world. EBiSC iPSCs are deposited from diverse laboratories internationally and, hence, a key activity for EBiSC is standardising not only the iPSC lines themselves but also the data associated with them. This includes enabling unique nomenclature for the cells, as well as applying uniformity to the data provided by the cell line generator versus quality control data generated by EBiSC, and providing mechanisms to share personal data in a secure and GDPR-compliant manner. A joint approach implemented by EBiSC and the human pluripotent stem cell registry (hPSCreg®) has provided a solution that enabled hPSCreg® to improve its registration platform for iPSCs and EBiSC to have a pipeline for the import, standardisation, storage, and management of data associated with EBiSC iPSCs. In this work, we describe the experience of cell line data management for iPSC banking throughout the course of EBiSC's development as a central European banking infrastructure and present a model for how this could be implemented by other iPSC repositories to increase the FAIRness of iPSC research globally.
Affiliation(s)
- Nancy Mah
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| | - Andreas Kurtz
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
- Berlin Institute of Health Center for Regenerative Therapies, Augustenburger Platz 1, 13353 Berlin, Germany
| | - Antonie Fuhr
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| | - Stefanie Seltmann
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| | - Ying Chen
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| | - Nils Bultjer
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| | - Johannes Dewender
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| | - Ayuen Lual
- European Collection of Authenticated Cell Cultures (ECACC), UK Health Security Agency, Porton Down, Salisbury SP4 0JG, UK;
| | - Rachel Steeg
- Fraunhofer UK Research Ltd., Technology and Innovation Centre, 99 George St., Glasgow G1 1RD, UK
| | - Sabine C. Mueller
- Fraunhofer-Institute für Biomedizinische Technik (IBMT), Joseph-von-Fraunhofer Weg 1, 66280 Sulzbach, Germany; (N.M.)
| |
|
18
|
Adamowicz K, Arend L, Maier A, Schmidt JR, Kuster B, Tsoy O, Zolotareva O, Baumbach J, Laske T. Proteomic meta-study harmonization, mechanotyping and drug repurposing candidate prediction with ProHarMeD. NPJ Syst Biol Appl 2023; 9:49. [PMID: 37816770 PMCID: PMC10564802 DOI: 10.1038/s41540-023-00311-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 09/25/2023] [Indexed: 10/12/2023] Open
Abstract
Proteomics technologies, which include a diverse range of approaches such as mass spectrometry-based, array-based, and others, are key technologies for the identification of biomarkers and disease mechanisms, referred to as mechanotyping. Despite over 15,000 published studies in 2022 alone, leveraging publicly available proteomics data for biomarker identification, mechanotyping and drug target identification is not readily possible. Proteomic data addressing similar biological/biomedical questions are made available by multiple research groups in different locations using different model organisms. Furthermore, not only are various organisms employed, but different assay systems, such as in vitro and in vivo systems, are also used. Finally, even though proteomics data are deposited in public databases, such as ProteomeXchange, they are provided at different levels of detail. Thus, data integration is hampered by non-harmonized usage of identifiers when reviewing the literature or performing meta-analyses to consolidate existing publications into a joint picture. To address this problem, we present ProHarMeD, a tool for harmonizing and comparing proteomics data gathered in multiple studies and for the extraction of disease mechanisms and putative drug repurposing candidates. It is available as a website, Python library and R package. ProHarMeD facilitates ID and name conversions between protein and gene levels, or organisms via ortholog mapping, and provides detailed logs on the loss and gain of IDs after each step. The web tool further determines IDs shared by different studies, proposes potential disease mechanisms as well as drug repurposing candidates automatically, and visualizes these results interactively. We apply ProHarMeD to a set of four studies on bone regeneration. First, we demonstrate the benefit of ID harmonization, which increases the number of shared genes between studies by 50%. Second, we identify a potential disease mechanism, with five corresponding drug targets, and the top 20 putative drug repurposing candidates, of which Fondaparinux, the candidate with the highest score, and multiple others are known to have an impact on bone regeneration. Hence, ProHarMeD allows users to harmonize multi-centric proteomics research data in meta-analyses, evaluates the success of the ID conversions and remappings, and finally, it closes the gaps between proteomics, disease mechanism mining and drug repurposing. It is publicly available at https://apps.cosy.bio/proharmed/ .
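Why identifier harmonization can increase the overlap between studies is easy to see in a toy example. The sketch below does not use the ProHarMeD library; the mapping table, accessions, and function names are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical lookup from protein accessions to gene symbols. A real
# workflow would query a mapping service instead of a hard-coded dict.
PROTEIN_TO_GENE = {
    "P00001": "GENE1",
    "P00002": "GENE2",
    "Q00003": "GENE3",
}


def harmonize(ids, mapping):
    """Map every ID to a gene symbol; return (mapped set, unmapped list)."""
    mapped, unmapped = set(), []
    for raw in ids:
        key = raw.strip().upper()
        if key in mapping:
            mapped.add(mapping[key])
        elif key in mapping.values():
            mapped.add(key)           # already a gene symbol
        else:
            unmapped.append(raw)      # logged so ID loss stays visible
    return mapped, unmapped


def shared(*id_sets):
    """IDs present in every study."""
    return set.intersection(*map(set, id_sets))
```

Two studies that report the same proteins under different identifier types share nothing before harmonization and may share several genes afterwards; the `unmapped` list plays the role of the per-step loss log mentioned in the abstract.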
Affiliation(s)
- Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany
| | - Lis Arend
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany
| | - Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany
| | - Johannes R Schmidt
- Department of Preclinical Development and Validation, Fraunhofer Institute for Cell Therapy and Immunology IZI, Leipzig, Germany
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technical University of Munich, Freising, Germany
| | - Olga Tsoy
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany
| | - Olga Zolotareva
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, 5230, Denmark
| | - Tanja Laske
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, 22607, Germany.
| |
|
19
|
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Nicola Bordin
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Jon Lees
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- University of Bristol, Bristol, United Kingdom
| | - Harry Scholes
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Shaimaa Hassan
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Helwan University, Faculty of Pharmacy, Cairo, Egypt
| | - Quentin Saintain
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Stephan Kamrad
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Christine Orengo
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Jürg Bähler
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| |
|
20
|
Feuer KL, Peng X, Yovo CK, Avramopoulos D. DPYSL2/CRMP2 isoform B knockout in human iPSC-derived glutamatergic neurons confirms its role in mTOR signaling and neurodevelopmental disorders. Mol Psychiatry 2023; 28:4353-4362. [PMID: 37479784 PMCID: PMC11138811 DOI: 10.1038/s41380-023-02186-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/05/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/23/2023]
Abstract
The DPYSL2/CRMP2 gene encodes a microtubule-stabilizing protein crucial for neurogenesis and is associated with numerous psychiatric and neurodegenerative disorders including schizophrenia, bipolar disorder, and Alzheimer's disease. DPYSL2 generates multiple RNA and protein isoforms, but few studies have differentiated between them. We previously reported an association of a functional variant in the DPYSL2-B isoform with schizophrenia (SCZ) and demonstrated in HEK293 cells that this variant reduced the length of cellular projections and created transcriptomic changes that captured schizophrenia etiology by disrupting mTOR signaling-mediated regulation. In the present study, we follow up on these results by creating, to our knowledge, the first models of endogenous DPYSL2-B knockout in human induced pluripotent stem cells (iPSCs) and neurons. CRISPR/Cas9-facilitated knockout of DPYSL2-B in iPSCs followed by Ngn2-induced differentiation to glutamatergic neurons showed a reduction in DPYSL2-B/CRMP2-B RNA and protein with no observable impact on DPYSL2-A/CRMP2-A. The average length of dendrites in knockout neurons was reduced by up to 58% compared to controls. Transcriptome analysis revealed disruptions in pathways highly relevant to psychiatric disease including mTOR signaling, cytoskeletal dynamics, immune function, calcium signaling, and cholesterol biosynthesis. We also observed a significant enrichment of the differentially expressed genes in SCZ-associated loci from genome-wide association studies (GWAS). Our findings expand our previous results to neuronal cells, clarify the functions of the human DPYSL2-B isoform and confirm its involvement in molecular pathologies shared between many psychiatric diseases.
Affiliation(s)
- Kyra L Feuer
- Predoctoral Training Program in Human Genetics, McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Xi Peng
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Christian K Yovo
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Dimitrios Avramopoulos
- McKusick-Nathans Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| |
|
21
|
McGowan E, Sanjak J, Mathé EA, Zhu Q. Integrative rare disease biomedical profile based network supporting drug repurposing or repositioning, a case study of glioblastoma. Orphanet J Rare Dis 2023; 18:301. [PMID: 37749605 PMCID: PMC10519087 DOI: 10.1186/s13023-023-02876-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2023] [Accepted: 08/24/2023] [Indexed: 09/27/2023] Open
Abstract
BACKGROUND Glioblastoma (GBM) is the most aggressive and common malignant primary brain tumor; however, treatment remains a significant challenge. This study aims to identify drug repurposing or repositioning candidates for GBM by developing an integrative rare disease profile network containing heterogeneous types of biomedical data. METHODS We developed a Glioblastoma-based Biomedical Profile Network (GBPN) by extracting and integrating biomedical information pertinent to GBM-related diseases from the NCATS GARD Knowledge Graph (NGKG). We further clustered the GBPN based on modularity classes, which resulted in multiple focused subgraphs, named mc_GBPN. We then identified high-influence nodes by performing network analysis over the mc_GBPN and validated those nodes that could be potential drug repurposing or repositioning candidates for GBM. RESULTS We developed the GBPN with 1,466 nodes and 107,423 edges and consequently the mc_GBPN with forty-one modularity classes. The ten most influential nodes were identified from the mc_GBPN. These notably include Riluzole, stem cell therapy, cannabidiol, and VK-0214, with proven evidence for treating GBM. CONCLUSION Our GBM-targeted network analysis allowed us to effectively identify potential candidates for drug repurposing or repositioning. Further validation will be conducted using other types of biomedical and clinical data and biological experiments. The findings could lead to less invasive treatments for glioblastoma while significantly reducing research costs by shortening the drug development timeline. Furthermore, this workflow can be extended to other disease areas.
Affiliation(s)
- Erin McGowan
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Jaleal Sanjak
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Ewy A Mathé
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA
| | - Qian Zhu
- Division of Pre-Clinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health (NIH), 9800 Medical Center Drive, Rockville, MD, 20850, USA.
| |
|
22
|
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. Bioinformatics 2023; 39:7273783. [PMID: 37707514 PMCID: PMC11015316 DOI: 10.1093/bioinformatics/btad570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 08/18/2023] [Accepted: 09/12/2023] [Indexed: 09/15/2023] Open
Abstract
SUMMARY Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. AVAILABILITY AND IMPLEMENTATION More information can be found at https://explorer.biothings.io and code is available at https://github.com/biothings/biothings_explorer.
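The idea of chaining annotated services to answer a multi-step graph query can be sketched without any network access. The example below is not the BioThings Explorer API; the service registry, entity identifiers, and the `run_query` function are invented for illustration, with in-memory lookups standing in for web service calls.

```python
# Each "service" is a stand-in for a web API: it declares the entity type
# it accepts, the type it returns, and a lookup function. The names and
# data are invented for illustration.
SERVICES = [
    {"input": "Disease", "output": "Gene",
     "call": lambda x: {"disease:1": ["gene:A", "gene:B"]}.get(x, [])},
    {"input": "Gene", "output": "Drug",
     "call": lambda x: {"gene:A": ["drug:X"],
                        "gene:B": ["drug:X", "drug:Y"]}.get(x, [])},
]


def run_query(start_ids, type_path, services=SERVICES):
    """Follow type_path (e.g. Disease -> Gene -> Drug) one hop at a time."""
    frontier = list(start_ids)
    for src, dst in zip(type_path, type_path[1:]):
        # Pick every service whose declared input/output types fit this hop.
        hop = [s for s in services if s["input"] == src and s["output"] == dst]
        if not hop:
            raise ValueError(f"no service maps {src} to {dst}")
        results = []
        for entity in frontier:
            for service in hop:
                for hit in service["call"](entity):
                    if hit not in results:   # de-duplicate, keep order
                        results.append(hit)
        frontier = results
    return frontier
```

Because each hop is resolved at query time from the declared input and output types, no merged graph has to be built or stored, which is the property the abstract attributes to a federated design.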
Affiliation(s)
- Jackson Callaghan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Colleen H Xu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Jiwen Xin
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Anders Riutta
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
| | - Eric Zhou
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Rohan Juneja
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Yao Yao
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Madhumita Narayan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Kristina Hanspers
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
| | - Ayushi Agrawal
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
| | - Alexander R Pico
- Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, CA 94158, United States
| | - Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| | - Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States
| |
|
23
|
Yankee TN, Oh S, Winchester EW, Wilderman A, Robinson K, Gordon T, Rosenfeld JA, VanOudenhove J, Scott DA, Leslie EJ, Cotney J. Integrative analysis of transcriptome dynamics during human craniofacial development identifies candidate disease genes. Nat Commun 2023; 14:4623. [PMID: 37532691 PMCID: PMC10397224 DOI: 10.1038/s41467-023-40363-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/25/2022] [Accepted: 07/25/2023] [Indexed: 08/04/2023] Open
Abstract
Craniofacial disorders arise in early pregnancy and are among the most common congenital defects. To fully understand how craniofacial disorders arise, it is essential to characterize gene expression during the patterning of the craniofacial region. To address this, we performed bulk and single-cell RNA-seq on human craniofacial tissue from 4-8 weeks post conception. Comparisons to dozens of other human tissues revealed 239 genes most strongly expressed during craniofacial development. Craniofacial-biased developmental enhancers were enriched within ±400 kb of these craniofacial-biased genes. Gene co-expression analysis revealed that regulatory hubs are enriched for known disease-causing genes and are resistant to mutation in the normal healthy population. Combining transcriptomic and epigenomic data, we identified 539 genes likely to contribute to craniofacial disorders. While most have not been previously implicated in craniofacial disorders, we demonstrate that this set of genes has increased levels of de novo mutations in orofacial clefting patients, warranting further study.
Affiliation(s)
- Tara N Yankee
- Graduate Program in Genetics and Developmental Biology, UConn Health, Farmington, CT, 06030, USA
| | - Sungryong Oh
- University of Connecticut School of Medicine, Department of Genetics and Genome Sciences, Farmington, CT, 06030, USA
| | | | - Andrea Wilderman
- Graduate Program in Genetics and Developmental Biology, UConn Health, Farmington, CT, 06030, USA
| | - Kelsey Robinson
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Tia Gordon
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Baylor Genetics Laboratory, Houston, TX, 77021, USA
| | - Jennifer VanOudenhove
- University of Connecticut School of Medicine, Department of Genetics and Genome Sciences, Farmington, CT, 06030, USA
| | - Daryl A Scott
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX, 77030, USA
| | - Elizabeth J Leslie
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Justin Cotney
- University of Connecticut School of Medicine, Department of Genetics and Genome Sciences, Farmington, CT, 06030, USA.
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA.
| |
|
24
|
Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub-building and exchanging biological knowledge graphs. Bioinformatics 2023; 39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION https://kghub.org.
Affiliation(s)
- J Harry Caufield
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kevin Schaper
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland
- Harshad Hegde
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States
- Luca Cappelletti
  - Department of Computer Science, University of Milano, Milan 20126, Italy
- Sierra A T Moxon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Vida Ravanmehr
  - Department of Lymphoma-Myeloma, MD Anderson Cancer Center, Houston, TX 77030, United States
- Seth Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
- Katherina Cortes
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kent A Shefchek
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Glass Elsarboukh
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Jim Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, United States
- Tommaso Fontana
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan 20133, Italy
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Melissa A Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
|
25
|
Banerjee J, Taroni JN, Allaway RJ, Prasad DV, Guinney J, Greene C. Machine learning in rare disease. Nat Methods 2023:10.1038/s41592-023-01886-z. [PMID: 37248386 DOI: 10.1038/s41592-023-01886-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/16/2021] [Accepted: 04/22/2023] [Indexed: 05/31/2023]
Abstract
High-throughput profiling methods (such as genomics or imaging) have accelerated basic research and made deep molecular characterization of patient samples routine. These approaches provide a rich portrait of genes, molecular pathways and cell types involved in disease phenotypes. Machine learning (ML) can be a useful tool for extracting disease-relevant patterns from high-dimensional datasets. However, depending upon the complexity of the biological question, machine learning often requires many samples to identify recurrent and biologically meaningful patterns. Rare diseases are inherently limited in clinical cases, leading to few samples to study. In this Perspective, we outline the challenges and emerging solutions for using ML for small sample sets, specifically in rare diseases. Advances in ML methods for rare diseases are likely to be informative for applications beyond rare diseases for which few samples exist with high-dimensional data. We propose that the method community prioritize the development of ML techniques for rare disease research.
Affiliation(s)
- Jaclyn N Taroni
  - Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, PA, USA
- Casey Greene
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
|
26
|
Callahan TJ, Stefanski AL, Wyrwa JM, Zeng C, Ostropolets A, Banda JM, Baumgartner WA, Boyce RD, Casiraghi E, Coleman BD, Collins JH, Deakyne Davies SJ, Feinstein JA, Lin AY, Martin B, Matentzoglu NA, Meeker D, Reese J, Sinclair J, Taneja SB, Trinkley KE, Vasilevsky NA, Williams AE, Zhang XA, Denny JC, Ryan PB, Hripcsak G, Bennett TD, Haendel MA, Robinson PN, Hunter LE, Kahn MG. Ontologizing health systems data at scale: making translational discovery a reality. NPJ Digit Med 2023; 6:89. [PMID: 37208468 PMCID: PMC10196319 DOI: 10.1038/s41746-023-00830-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/28/2023] [Indexed: 05/21/2023] Open
Abstract
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- William A Baumgartner
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15260, USA
- Elena Casiraghi
  - Computer Science, Università degli Studi di Milano, Milan, Italy
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Ben D Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Janine H Collins
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Sara J Deakyne Davies
  - Department of Research Informatics & Data Science, Analytics Resource Center, Children's Hospital Colorado, Aurora, CO, 80045, USA
- James A Feinstein
  - Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Asiyah Y Lin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Blake Martin
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Katy E Trinkley
  - Department of Family Medicine, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Andrew E Williams
  - Tufts Institute for Clinical Research and Health Policy Studies, Tufts University, Boston, MA, 02155, USA
- Xingmin A Zhang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Tellen D Bennett
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
|
27
|
McGowan E, Sanjak J, Mathé EA, Zhu Q. Integrative Rare Disease Biomedical Profile based Network Supporting Drug Repurposing, a case study of Glioblastoma. Research Square 2023:rs.3.rs-2809689. [PMID: 37131675 PMCID: PMC10153381 DOI: 10.21203/rs.3.rs-2809689/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
BACKGROUND Glioblastoma (GBM) is the most aggressive and common malignant primary brain tumor; however, treatment remains a significant challenge. This study aims to identify drug repurposing candidates for GBM by developing an integrative rare disease profile network containing heterogeneous types of biomedical data. METHODS We developed a Glioblastoma-based Biomedical Profile Network (GBPN) by extracting and integrating biomedical information pertinent to GBM-related diseases from the NCATS GARD Knowledge Graph (NGKG). We further clustered the GBPN based on modularity classes, which resulted in multiple focused subgraphs, named mc_GBPN. We then identified high-influence nodes by performing network analysis over the mc_GBPN and validated those nodes that could be potential drug repositioning candidates for GBM. RESULTS We developed the GBPN with 1,466 nodes and 107,423 edges and consequently the mc_GBPN with forty-one modularity classes. A list of the ten most influential nodes was identified from the mc_GBPN. These notably include Riluzole, stem cell therapy, cannabidiol, and VK-0214, with proven evidence for treating GBM. CONCLUSION Our GBM-targeted network analysis allowed us to effectively identify potential candidates for drug repurposing. This could lead to less invasive treatments for glioblastoma while significantly reducing research costs by shortening the drug development timeline. Furthermore, this workflow can be extended to other disease areas.
Affiliation(s)
- Erin McGowan
  - NCATS: National Center for Advancing Translational Sciences
- Jaleal Sanjak
  - NCATS: National Center for Advancing Translational Sciences
- Ewy A Mathé
  - NCATS: National Center for Advancing Translational Sciences
|
28
|
Callaghan J, Xu CH, Xin J, Cano MA, Riutta A, Zhou E, Juneja R, Yao Y, Narayan M, Hanspers K, Agrawal A, Pico AR, Wu C, Su AI. BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs. arXiv 2023:arXiv:2304.09344v1. [PMID: 37131885 PMCID: PMC10153288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Knowledge graphs are an increasingly common data structure for representing biomedical information. These knowledge graphs can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralization and integration of data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs for each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time. More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.
Affiliation(s)
- Jackson Callaghan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Colleen H Xu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Jiwen Xin
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Marco Alvarado Cano
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Anders Riutta
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Eric Zhou
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Rohan Juneja
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Yao Yao
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Madhumita Narayan
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Kristina Hanspers
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Ayushi Agrawal
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Alexander R Pico
  - Data Science and Biotechnology, Gladstone Institutes, University of California, San Francisco, San Francisco, CA 94158, USA
- Chunlei Wu
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
- Andrew I Su
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute
|
29
|
Taneja SB, Callahan TJ, Paine MF, Kane-Gill SL, Kilicoglu H, Joachimiak MP, Boyce RD. Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions. J Biomed Inform 2023; 140:104341. [PMID: 36933632 PMCID: PMC10150409 DOI: 10.1016/j.jbi.2023.104341] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/24/2022] [Revised: 01/09/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
BACKGROUND Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research. METHODS We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG. RESULTS The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. 
Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. CONCLUSION NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
Affiliation(s)
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Mary F Paine
  - Department of Pharmaceutical Sciences, College of Pharmacy and Pharmaceutical Sciences, Washington State University, Spokane, WA 99202, USA
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Marcin P Joachimiak
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
|
30
|
Herr BW, Hardi J, Quardokus EM, Bueckle A, Chen L, Wang F, Caron AR, Osumi-Sutherland D, Musen MA, Börner K. Specimen, biological structure, and spatial ontologies in support of a Human Reference Atlas. Sci Data 2023; 10:171. [PMID: 36973309 PMCID: PMC10043028 DOI: 10.1038/s41597-023-01993-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 01/30/2023] [Indexed: 03/29/2023] Open
Abstract
The Human Reference Atlas (HRA) is defined as a comprehensive, three-dimensional (3D) atlas of all the cells in the healthy human body. It is compiled by an international team of experts who develop standard terminologies that they link to 3D reference objects, describing anatomical structures. The third HRA release (v1.2) covers spatial reference data and ontology annotations for 26 organs. Experts access the HRA annotations via spreadsheets and view reference object models in 3D editing tools. This paper introduces the Common Coordinate Framework (CCF) Ontology v2.0.1 that interlinks specimen, biological structure, and spatial data, together with the CCF API that makes the HRA programmatically accessible and interoperable with Linked Open Data (LOD). We detail how real-world user needs and experimental data guide CCF Ontology design and implementation, present CCF Ontology classes and properties together with exemplary usage, and report on validation methods. The CCF Ontology graph database and API are used in the HuBMAP portal, HRA Organ Gallery, and other applications that support data queries across multiple, heterogeneous sources.
Affiliation(s)
- Bruce W Herr
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Josef Hardi
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
- Ellen M Quardokus
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Andreas Bueckle
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Lu Chen
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
- Fusheng Wang
  - Department of Computer Science, Stony Brook University, Stony Brook, NY, 11794, USA
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, 11794, USA
- Anita R Caron
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Mark A Musen
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
- Katy Börner
  - Department of Intelligent Systems Engineering, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, 47408, USA
|
31
|
Abood A, Mesner LD, Jeffery ED, Murali M, Lehe M, Saquing J, Farber CR, Sheynkman GM. Long-read proteogenomics to connect disease-associated sQTLs to the protein isoform effectors of disease. bioRxiv: the preprint server for biology 2023:2023.03.17.531557. [PMID: 36993769 PMCID: PMC10055087 DOI: 10.1101/2023.03.17.531557] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/31/2023]
Abstract
A major fraction of loci identified by genome-wide association studies (GWASs) lead to alterations in alternative splicing, but interpretation of how such alterations impact proteins is hindered by the technical limitations of short-read RNA-seq, which cannot directly link splicing events to full-length transcript or protein isoforms. Long-read RNA-seq represents a powerful tool to define and quantify transcript isoforms and, more recently, to infer protein isoform existence. Here we present a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes which colocalized with BMD associations (H4 PP ≥ 0.75). We generated deep coverage PacBio long-read RNA-seq data (N ≈ 22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. By casting the colocalized sQTLs directly onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Using these data, we created one of the first proteome-scale resources defining full-length isoforms impacted by colocalized sQTLs. Overall, we found that 74 sQTLs influenced isoforms likely impacted by nonsense-mediated decay (NMD) and 190 that potentially resulted in the expression of new protein isoforms. Finally, we identified colocalizing sQTLs in TPM2 for splice junctions between two mutually exclusive exons, and two different transcript termination sites, making it impossible to interpret without long-read RNA-seq data. siRNA-mediated knockdown in osteoblasts showed two TPM2 isoforms with opposing effects on mineralization. We expect our approach to be widely generalizable across diverse clinical traits and accelerate system-scale analyses of protein isoform activities modulated by GWAS loci.
|
32
|
Glen AK, Ma C, Mendoza L, Womack F, Wood EC, Sinha M, Acevedo L, Kvarfordt LG, Peene RC, Liu S, Hoffman AS, Roach JC, Deutsch EW, Ramsey SA, Koslicki D. ARAX: a graph-based modular reasoning tool for translational biomedicine. Bioinformatics 2023; 39:7031241. [PMID: 36752514 PMCID: PMC10027432 DOI: 10.1093/bioinformatics/btad082] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Revised: 12/17/2022] [Accepted: 02/07/2023] [Indexed: 04/12/2023] Open
Abstract
MOTIVATION With the rapidly growing volume of knowledge and data in biomedical databases, improved methods for knowledge-graph-based computational reasoning are needed in order to answer translational questions. Previous efforts to solve such challenging computational reasoning problems have contributed tools and approaches, but progress has been hindered by the lack of an expressive analysis workflow language for translational reasoning and by the lack of a reasoning engine (supporting that language) that federates semantically integrated knowledge-bases. RESULTS We introduce ARAX, a new reasoning system for translational biomedicine that provides a web browser user interface and an application programming interface (API). ARAX enables users to encode translational biomedical questions and to integrate knowledge across sources to answer the user's query and facilitate exploration of results. For ARAX, we developed new approaches to query planning, knowledge-gathering, reasoning and result ranking and dynamically integrate knowledge providers for answering biomedical questions. To illustrate ARAX's application and utility in specific disease contexts, we present several use-case examples. AVAILABILITY AND IMPLEMENTATION The source code and technical documentation for building the ARAX server-side software and its built-in knowledge database are freely available online (https://github.com/RTXteam/RTX). We provide a hosted ARAX service with a web browser interface at arax.rtx.ai and a web API endpoint at arax.rtx.ai/api/arax/v1.3/ui/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luis Mendoza
  - Institute for Systems Biology, Seattle, WA 98109, USA
- Finn Womack
  - Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- E C Wood
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Meghamala Sinha
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Liliana Acevedo
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Lindsey G Kvarfordt
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Ross C Peene
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Shaopeng Liu
  - Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- Andrew S Hoffman
  - Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen 6500GL, The Netherlands
- Jared C Roach
  - Institute for Systems Biology, Seattle, WA 98109, USA
|
33
|
James KN, Phadke S, Wong TC, Chowdhury S. Artificial Intelligence in the Genetic Diagnosis of Rare Disease. Clin Lab Med 2023; 43:127-143. [PMID: 36764805 DOI: 10.1016/j.cll.2022.09.023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/11/2023]
Affiliation(s)
- Kiely N James
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Sujal Phadke
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Terence C Wong
- Genomics, Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA
| | - Shimul Chowdhury
- Rady Children's Institute for Genomic Medicine, 7910 Frost Street, MC5129, San Diego, CA 92123, USA.
|
34
|
Seaby EG, Thomas NS, Hunt D, Baralle D, Rehm HL, O’Donnell-Luria A, Ennis S. A panel-agnostic strategy 'HiPPo' improves diagnostic efficiency in the UK Genome Medicine Service. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.01.31.23285025. [PMID: 36778464 PMCID: PMC9915838 DOI: 10.1101/2023.01.31.23285025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Genome sequencing is now available as a clinical test on the National Health Service (NHS) through the Genome Medicine Service (GMS). The GMS have set out an analytical strategy that predominantly filters genome data on a pre-selected gene panel(s). Whilst this approach reduces the number of variants requiring assessment by reporting laboratories, pathogenic variants outside of the gene panel applied may be missed, and candidate variants in novel genes are largely ignored. This study sought to compare a research exome analysis to an independent clinical genome analysis performed through the NHS for the same group of patients. When analysing the exome data, we applied a panel agnostic approach filtering for variants with High Pathogenic Potential (HiPPo) using ClinVar, allele frequency, and in silico prediction tools. We then compared this gene agnostic analysis to the panel-based approach as applied by the GMS to genome data. Later we restricted HiPPo variants to a panel of the Gene Curation Coalition (GenCC) morbid genes and compared the diagnostic yield with the variants filtered using the GMS strategy. 24 patients from 8 families underwent parallel research exome sequencing and GMS genome sequencing. HiPPo analysis applied to research exome data identified a similar number of variants as the gene panel-based approach applied by the GMS. GMS clinical genome analysis identified and returned 2 pathogenic variants and 3 variants of uncertain significance. HiPPo research exome analysis identified the same variants plus an additional pathogenic variant and a further 3 de novo variants of uncertain significance in novel genes, where case series and functional studies are underway. When HiPPo was restricted to GenCC disease genes (strong or definitive), the same pathogenic variants were identified yet statistically fewer variants required assessment to identify more diagnostic variants than reported by the GMS genome strategy. 
This gave a diagnostic rate per variant assessed of 20% for HiPPo restricted to GenCC versus 3% for the GMS panel-based approach. With plans to sequence 5 million more NHS patients, strategies are needed to optimise the full potential of genome data beyond gene panels whilst minimising the burden of variants that require clinical assessment.
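The panel-agnostic filtering described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the `Variant` fields, the ClinVar label encoding, and both cutoffs (`MAX_AF`, `MIN_SCORE`) are assumptions made for the example, as is the rule that a ClinVar pathogenic label is kept regardless of allele frequency.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    clinvar: str           # hypothetical encoding: "pathogenic", "benign", "" if absent
    allele_freq: float     # population allele frequency
    insilico_score: float  # hypothetical 0..1 scale, higher = more likely damaging


MAX_AF = 0.001   # assumed rarity cutoff, not taken from the paper
MIN_SCORE = 0.8  # assumed in silico cutoff, not taken from the paper


def is_high_potential(v):
    """Keep known pathogenic variants, or rare variants predicted damaging."""
    if v.clinvar in {"benign", "likely_benign"}:
        return False
    if v.clinvar in {"pathogenic", "likely_pathogenic"}:
        return True
    return v.allele_freq <= MAX_AF and v.insilico_score >= MIN_SCORE


def hippo_filter(variants, gene_panel=None):
    """Gene-agnostic filter; optionally restrict afterwards to a gene set."""
    kept = [v for v in variants if is_high_potential(v)]
    if gene_panel is not None:
        kept = [v for v in kept if v.gene in gene_panel]
    return kept


def diagnostic_rate(n_diagnostic, n_assessed):
    """Diagnostic variants found per variant that required assessment."""
    return n_diagnostic / n_assessed if n_assessed else 0.0
```

Passing a gene set as `gene_panel` mirrors the second step in the abstract, where the gene-agnostic result is restricted to a curated list of disease genes; `diagnostic_rate` is the per-variant ratio the abstract reports as 20% versus 3%.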
Affiliation(s)
- Eleanor G. Seaby
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- Paediatric Infectious Diseases, Imperial College London, London, W2 1NY, UK
| | - N. Simon Thomas
- Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
| | - David Hunt
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
| | - Diana Baralle
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
| | - Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Sarah Ennis
- Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
|
35
|
Holmgren S, Bell SM, Wignall J, Duncan CG, Kwok RK, Cronk R, Osborn K, Black S, Thessen A, Schmitt C. Workshop Report: Catalyzing Knowledge-Driven Discovery in Environmental Health Sciences through a Harmonized Language. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:2317. [PMID: 36767684 PMCID: PMC9915042 DOI: 10.3390/ijerph20032317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 01/20/2023] [Accepted: 01/25/2023] [Indexed: 06/18/2023]
Abstract
Harmonized language is essential to finding, sharing, and reusing large-scale, complex data. Gaps and barriers prevent the adoption of harmonized language approaches in environmental health sciences (EHS). To address this, the National Institute of Environmental Health Sciences and partners created the Environmental Health Language Collaborative (EHLC). The purpose of EHLC is to facilitate a community-driven effort to advance the development and adoption of harmonized language approaches in EHS. EHLC is a forum to pinpoint language harmonization gaps, to facilitate the development of, raise awareness of, and encourage the use of harmonization approaches and tools, and to develop new standards and recommendations. To ensure that EHLC's focus and structure would be sustainable long-term and meet the needs of the field, EHLC launched an inaugural workshop in September 2021 focused on "Developing Sustainable Language Solutions" and "Building a Sustainable Community". When the attendees were surveyed, 91% said harmonized language solutions would be of high value/benefit, and 60% agreed to continue contributing to EHLC efforts. Based on workshop discussions, future activities will focus on targeted collaborative use-case working groups in addition to offering education and training on ontologies, metadata, and standards, and developing an EHS language resource portal.
Affiliation(s)
- Stephanie Holmgren
- Office of Data Science, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
| | | | | | - Christopher G. Duncan
- Genes, Environment, and Health Branch, Division of Extramural Research and Training, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
| | - Richard K. Kwok
- Division of Neuroscience, National Institute on Aging (NIA), Bethesda, MD 20892, USA
| | - Ryan Cronk
- Health Sciences, ICF, Reston, VA 20190, USA
| | | | | | - Anne Thessen
- Center for Health Artificial Intelligence, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Charles Schmitt
- Office of Data Science, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
|
36
|
Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022; 6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]
Abstract
Networks, or graphs, are universal descriptors of systems of interacting elements. In biomedicine and healthcare, they can represent, for example, molecular interactions, signalling pathways, disease co-morbidities or healthcare systems. In this Perspective, we posit that representation learning can realize principles of network medicine, discuss successes and current limitations of the use of representation learning on graphs in biomedicine and healthcare, and outline algorithmic strategies that leverage the topology of graphs to embed them into compact vectorial spaces. We argue that graph representation learning will keep pushing forward machine learning for biomedicine and healthcare applications, including the identification of genetic variants underlying complex traits, the disentanglement of single-cell behaviours and their effects on health, the assistance of patients in diagnosis and treatment, and the development of safe and effective medicines.
Affiliation(s)
- Michelle M Li
- Bioinformatics and Integrative Genomics Program, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Kexin Huang
- Health Data Science Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Harvard Data Science Initiative, Cambridge, MA, USA.
|
37
|
Kelleher KJ, Sheils TK, Mathias SL, Yang JJ, Metzger V, Siramshetty V, Nguyen DT, Jensen LJ, Vidović D, Schürer S, Holmes J, Sharma K, Pillai A, Bologa C, Edwards J, Mathé E, Oprea T. Pharos 2023: an integrated resource for the understudied human proteome. Nucleic Acids Res 2022; 51:D1405-D1416. [PMID: 36624666 PMCID: PMC9825581 DOI: 10.1093/nar/gkac1033] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 09/15/2022] [Revised: 10/12/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022] Open
Abstract
The Illuminating the Druggable Genome (IDG) project aims to improve our understanding of understudied proteins and our ability to study them in the context of disease biology by perturbing them with small molecules, biologics, or other therapeutic modalities. Two main products from the IDG effort are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/), which curates and aggregates information, and Pharos (https://pharos.nih.gov/), a web interface for users to extract and visualize data from TCRD. Since the 2021 release, TCRD/Pharos has focused on developing visualization and analysis tools that help reveal higher-level patterns in the underlying data. The current iterations of TCRD and Pharos enable users to perform enrichment calculations based on subsets of targets, diseases, or ligands and to create interactive heat maps and UpSet charts of many types of annotations. Using several examples, we show how to address disease biology and drug discovery questions through enrichment calculations and UpSet charts.
Affiliation(s)
- Keith J Kelleher
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Timothy K Sheils
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Stephen L Mathias
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Jeremy J Yang
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Vincent T Metzger
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Vishal B Siramshetty
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen 2200, Copenhagen, Denmark
| | - Dušica Vidović
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Stephan C Schürer
- Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA; Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA; Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
| | - Jayme Holmes
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Karlie R Sharma
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Ajay Pillai
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
| | - Cristian G Bologa
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
| | - Jeremy S Edwards
- Correspondence may also be addressed to Jeremy Edwards. Tel: +1 505 277 6655;
| | - Ewy A Mathé
- To whom correspondence should be addressed. Tel: +1 301 402 8953;
| | - Tudor I Oprea
- Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
|
38
|
Tudini E, Andrews J, Lawrence DM, King-Smith SL, Baker N, Baxter L, Beilby J, Bennetts B, Beshay V, Black M, Boughtwood TF, Brion K, Cheong PL, Christie M, Christodoulou J, Chong B, Cox K, Davis MR, Dejong L, Dinger ME, Doig KD, Douglas E, Dubowsky A, Ellul M, Fellowes A, Fisk K, Fortuno C, Friend K, Gallagher RL, Gao S, Hackett E, Hadler J, Hipwell M, Ho G, Hollway G, Hooper AJ, Kassahn KS, Krishnaraj R, Lau C, Le H, San Leong H, Lundie B, Lunke S, Marty A, McPhillips M, Nguyen LT, Nones K, Palmer K, Pearson JV, Quinn MC, Rawlings LH, Sadedin S, Sanchez L, Schreiber AW, Sigalas E, Simsek A, Soubrier J, Stark Z, Thompson BA, U J, Vakulin CG, Wells AV, Wise CA, Woods R, Ziolkowski A, Brion MJ, Scott HS, Thorne NP, Spurdle AB. Shariant platform: Enabling evidence sharing across Australian clinical genetic-testing laboratories to support variant interpretation. Am J Hum Genet 2022; 109:1960-1973. [PMID: 36332611 PMCID: PMC9674965 DOI: 10.1016/j.ajhg.2022.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 10/10/2022] [Indexed: 11/06/2022] Open
Abstract
Sharing genomic variant interpretations across laboratories promotes consistency in variant assertions. A landscape analysis of Australian clinical genetic-testing laboratories in 2017 identified that, despite the national-accreditation-body recommendations encouraging laboratories to submit genotypic data to clinical databases, fewer than 300 variants had been shared to the ClinVar public database. Consultations with Australian laboratories identified resource constraints limiting routine application of manual processes, consent issues, and differences in interpretation systems as barriers to sharing. This information was used to define key needs and solutions required to enable national sharing of variant interpretations. The Shariant platform, using both the GRCh37 and GRCh38 genome builds, was developed to enable ongoing sharing of variant interpretations and associated evidence between Australian clinical genetic-testing laboratories. Where possible, two-way automated sharing was implemented so that disruption to laboratory workflows would be minimized. Terms of use were developed through consultation and currently restrict access to Australian clinical genetic-testing laboratories. Shariant was designed to store and compare structured evidence, to promote and record resolution of inter-laboratory classification discrepancies, and to streamline the submission of variant assertions to ClinVar. As of December 2021, more than 14,000 largely prospectively curated variant records from 11 participating laboratories have been shared. Discrepant classifications have been identified for 11% (28/260) of variants submitted by more than one laboratory. We have demonstrated that co-design with clinical laboratories is vital to developing and implementing a national variant-interpretation sharing effort. This approach has improved inter-laboratory concordance and enabled opportunities to standardize interpretation practices.
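The comparison step this abstract describes, flagging variants that more than one laboratory has classified differently, can be illustrated with a short sketch. It is not Shariant's implementation: the `(variant_id, lab, classification)` record shape and the function names are hypothetical, and classifications are compared as plain strings.

```python
from collections import defaultdict


def find_discrepancies(records):
    """Group classifications by variant and flag inter-laboratory disagreements.

    records: iterable of (variant_id, lab, classification) tuples (assumed shape).
    Returns (shared, discrepant): variants submitted by more than one lab, and
    the subset of those whose classifications differ between labs.
    """
    by_variant = defaultdict(dict)
    for variant_id, lab, classification in records:
        by_variant[variant_id][lab] = classification
    shared = {v: labs for v, labs in by_variant.items() if len(labs) > 1}
    discrepant = {v: labs for v, labs in shared.items()
                  if len(set(labs.values())) > 1}
    return shared, discrepant


def discrepancy_rate(n_discrepant, n_shared):
    """Fraction of multiply-submitted variants with conflicting classifications."""
    return n_discrepant / n_shared if n_shared else 0.0
```

With the figures reported in the abstract, `discrepancy_rate(28, 260)` is about 0.108, which rounds to the stated 11%.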
Affiliation(s)
- Emma Tudini
- Australian Genomics, Melbourne, VIC 3052, Australia; Population Health, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - James Andrews
- Australian Genomics, Melbourne, VIC 3052, Australia; Australian Cancer Research Foundation Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA 5000, Australia
| | - David M. Lawrence
- Australian Cancer Research Foundation Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA 5000, Australia
| | - Sarah L. King-Smith
- Australian Genomics, Melbourne, VIC 3052, Australia; Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Naomi Baker
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia; University of Melbourne, Melbourne, VIC 3052, Australia
| | | | - John Beilby
- PathWest Laboratory Medicine Western Australia, Perth, WA 6009, Australia; School of Biomedical Sciences, The University of Western Australia, Perth, WA 6009, Australia
| | - Bruce Bennetts
- Sydney Genome Diagnostics, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia; Disciplines of Child and Adolescent Health and Genomic Medicine, University of Sydney, Sydney, NSW 2145, Australia
| | - Victoria Beshay
- Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC 3052, Australia
| | - Michael Black
- Department of Diagnostic Genomics, PathWest Laboratory Medicine Western Australia, Perth, WA 6009, Australia
| | - Tiffany F. Boughtwood
- Australian Genomics, Melbourne, VIC 3052, Australia; Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia
| | | | - Pak Leng Cheong
- Department of Medical Genomics, Royal Prince Alfred Hospital, NSW Health Pathology, Sydney, NSW 2050, Australia; University of Sydney, Sydney, NSW 2006, Australia
| | - Michael Christie
- Department of Pathology, Royal Melbourne Hospital, Melbourne, VIC 3050, Australia
| | - John Christodoulou
- Australian Genomics, Melbourne, VIC 3052, Australia; Disciplines of Child and Adolescent Health and Genomic Medicine, University of Sydney, Sydney, NSW 2145, Australia; Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia; Department of Paediatrics, University of Melbourne, Melbourne, VIC 3052, Australia
| | - Belinda Chong
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia
| | - Kathy Cox
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Mark R. Davis
- Department of Diagnostic Genomics, PathWest Laboratory Medicine Western Australia, Perth, WA 6009, Australia; Centre for Medical Research, The University of Western Australia, Perth, WA 6009, Australia
| | - Lucas Dejong
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Marcel E. Dinger
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
| | - Kenneth D. Doig
- Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC 3052, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC 3052, Australia
| | - Evelyn Douglas
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Andrew Dubowsky
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Melissa Ellul
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Andrew Fellowes
- Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC 3052, Australia
| | - Katrina Fisk
- Sydney Genome Diagnostics, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia
| | - Cristina Fortuno
- Population Health, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Kathryn Friend
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | | | - Song Gao
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Emma Hackett
- Sydney Genome Diagnostics, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia
| | - Johanna Hadler
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Michael Hipwell
- Division of Molecular Medicine, NSW Health Pathology North, Newcastle, NSW 2305, Australia
| | - Gladys Ho
- Sydney Genome Diagnostics, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia; Disciplines of Child and Adolescent Health and Genomic Medicine, University of Sydney, Sydney, NSW 2145, Australia
| | - Georgina Hollway
- Garvan Institute of Medical Research, Sydney, NSW 2010, Australia; Cancer Research, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Amanda J. Hooper
- Department of Clinical Biochemistry, PathWest Laboratory Medicine Western Australia, Fiona Stanley Hospital Network, Perth, WA 6150, Australia; School of Medicine, The University of Western Australia, Perth, WA 6009, Australia
| | - Karin S. Kassahn
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia; Adelaide Medical School, The University of Adelaide, Adelaide, SA 5000, Australia
| | - Rahul Krishnaraj
- Sydney Genome Diagnostics, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, NSW 2145, Australia
| | - Chiyan Lau
- Pathology Queensland, Brisbane, QLD 4006, Australia; The University of Queensland, Brisbane, QLD 4072, Australia
| | - Huong Le
- Department of Medical Genomics, Royal Prince Alfred Hospital, NSW Health Pathology, Sydney, NSW 2050, Australia
| | - Huei San Leong
- Department of Pathology, Peter MacCallum Cancer Centre, Melbourne, VIC 3052, Australia
| | - Ben Lundie
- Pathology Queensland, Brisbane, QLD 4006, Australia
| | - Sebastian Lunke
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia; University of Melbourne, Melbourne, VIC 3052, Australia
| | - Anthony Marty
- Melbourne Genomics Health Alliance, Melbourne, VIC 3052, Australia
| | - Mary McPhillips
- Division of Molecular Medicine, NSW Health Pathology North, Newcastle, NSW 2305, Australia
| | - Lan T. Nguyen
- Department of Clinical Biochemistry, PathWest Laboratory Medicine Western Australia, Fiona Stanley Hospital Network, Perth, WA 6150, Australia
| | - Katia Nones
- Cancer Research, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Kristen Palmer
- Genomics Statewide Services, New South Wales Health Pathology, Newcastle, NSW 2300, Australia
| | - John V. Pearson
- Genome Informatics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Michael C.J. Quinn
- Australian Genomics, Melbourne, VIC 3052, Australia; Genetic Health Queensland, Royal Brisbane and Women’s Hospital, Brisbane, QLD 4006, Australia
| | - Lesley H. Rawlings
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Simon Sadedin
- Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia; University of Melbourne, Melbourne, VIC 3052, Australia; Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia
| | - Louisa Sanchez
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Andreas W. Schreiber
- Australian Cancer Research Foundation Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA 5000, Australia; School of Biological Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
| | - Emanouil Sigalas
- Department of Pathology, Royal Melbourne Hospital, Melbourne, VIC 3050, Australia
| | - Aygul Simsek
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Julien Soubrier
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia; School of Biological Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
| | - Zornitza Stark
- Australian Genomics, Melbourne, VIC 3052, Australia; Victorian Clinical Genetics Services, Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia; University of Melbourne, Melbourne, VIC 3052, Australia
| | - Bryony A. Thompson
- Department of Pathology, Royal Melbourne Hospital, Melbourne, VIC 3050, Australia
| | - James U
- Melbourne Genomics Health Alliance, Melbourne, VIC 3052, Australia
| | | | - Amanda V. Wells
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia
| | - Cheryl A. Wise
- Department of Diagnostic Genomics, PathWest Laboratory Medicine Western Australia, Perth, WA 6009, Australia
| | - Rick Woods
- Pathology Queensland, Brisbane, QLD 4006, Australia
| | - Andrew Ziolkowski
- Division of Molecular Medicine, NSW Health Pathology North, Newcastle, NSW 2305, Australia
| | - Marie-Jo Brion
- Australian Genomics, Melbourne, VIC 3052, Australia; Population Health, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Hamish S. Scott
- Australian Genomics, Melbourne, VIC 3052, Australia; Australian Cancer Research Foundation Cancer Genomics Facility, Centre for Cancer Biology, SA Pathology and University of South Australia, Adelaide, SA 5000, Australia; Genetics and Molecular Pathology, SA Pathology, Adelaide, SA 5000, Australia; Adelaide Medical School, The University of Adelaide, Adelaide, SA 5000, Australia
| | - Natalie P. Thorne
- Australian Genomics, Melbourne, VIC 3052, Australia; University of Melbourne, Melbourne, VIC 3052, Australia; Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia; Melbourne Genomics Health Alliance, Melbourne, VIC 3052, Australia; Walter and Eliza Hall Institute, Melbourne, VIC 3052, Australia
| | - Amanda B. Spurdle
- Australian Genomics, Melbourne, VIC 3052, Australia; Population Health, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia; Corresponding author
|
39
|
Ye C, Swiers R, Bonner S, Barrett I. A Knowledge Graph-Enhanced Tensor Factorisation Model for Discovering Drug Targets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3070-3080. [PMID: 35939454 DOI: 10.1109/tcbb.2022.3197320] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
The drug discovery and development process is a long and expensive one, costing over 1 billion USD on average per drug and taking 10-15 years. To reduce the high levels of attrition throughout the process, there has been a growing interest in applying machine learning methodologies to various stages of drug discovery and development in the recent decade, especially at the earliest stage - identification of druggable disease genes. In this paper, we have developed a new tensor factorisation model to predict potential drug targets (genes or proteins) for treating diseases. We created a three-dimensional data tensor consisting of 1,048 gene targets, 860 diseases and 230,011 evidence attributes and clinical outcomes connecting them, using data extracted from the Open Targets and PharmaProjects databases. We enriched the data with gene target representations learned from a drug discovery-oriented knowledge graph and applied our proposed method to predict the clinical outcomes for unseen gene target and disease pairs. We designed three evaluation strategies to measure the prediction performance and benchmarked several commonly used machine learning classifiers together with Bayesian matrix and tensor factorisation methods. The result shows that incorporating knowledge graph embeddings significantly improves the prediction accuracy and that training tensor factorisation alongside a dense neural network outperforms all other baselines. In summary, our framework combines two actively studied machine learning approaches to disease target identification, namely tensor factorisation and knowledge graph representation learning, which could be a promising avenue for further exploration in data-driven drug discovery.
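As a rough illustration of what factorising a gene × disease × evidence tensor means, the sketch below scores one tensor cell with a plain CP (PARAFAC) decomposition. It is a generic textbook form, not the Bayesian model or the neural-network hybrid the abstract describes; the rank, the toy dimensions, the random initialisation, and all function names are assumptions for the example.

```python
import random


def init_factors(n_rows, rank, seed):
    """Random factor matrix with one rank-length vector per entity (toy initialisation)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]


def cp_score(gene_vec, disease_vec, attr_vec):
    """CP reconstruction of one tensor cell: sum over r of g_r * d_r * a_r."""
    return sum(g * d * a for g, d, a in zip(gene_vec, disease_vec, attr_vec))


# Toy dimensions only; the paper's tensor is far larger.
RANK = 4
genes = init_factors(5, RANK, seed=0)
diseases = init_factors(3, RANK, seed=1)
attributes = init_factors(2, RANK, seed=2)

# Predicted value for gene 0, disease 1, evidence attribute 0.
prediction = cp_score(genes[0], diseases[1], attributes[0])
```

In this framing, enriching the model with knowledge-graph embeddings would amount to informing the gene vectors with representations learned elsewhere rather than learning them from the tensor alone; how the authors actually do this is not specified in the abstract.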
|
40
|
Wood EC, Glen AK, Kvarfordt LG, Womack F, Acevedo L, Yoon TS, Ma C, Flores V, Sinha M, Chodpathumwan Y, Termehchy A, Roach JC, Mendoza L, Hoffman AS, Deutsch EW, Koslicki D, Ramsey SA. RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics 2022; 23:400. [PMID: 36175836 PMCID: PMC9520835 DOI: 10.1186/s12859-022-04932-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/25/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biomedical translational science is increasingly using computational reasoning on repositories of structured knowledge (such as UMLS, SemMedDB, ChEMBL, Reactome, DrugBank, and SMPDB) in order to facilitate discovery of new therapeutic targets and modalities. The NCATS Biomedical Data Translator project is working to federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions. Within that project and the broader field, there is a need for a framework that can efficiently and reproducibly build an integrated, standards-compliant, and comprehensive biomedical knowledge graph that can be downloaded in standard serialized form or queried via a public application programming interface (API). RESULTS To create a knowledge provider system within the Translator project, we have developed RTX-KG2, an open-source software system for building (and hosting a web API for querying) a biomedical knowledge graph that uses an Extract-Transform-Load approach to integrate 70 knowledge sources (including the aforementioned core six sources) into a knowledge graph with provenance information including (where available) citations. The semantic layer and schema for RTX-KG2 follow the standard Biolink model to maximize interoperability. RTX-KG2 is currently being used by multiple Translator reasoning agents, both in its downloadable form and via its SmartAPI-registered interface. Serializations of RTX-KG2 are available for download in both the pre-canonicalized form and in canonicalized form (in which synonyms are merged). The current canonicalized version (KG2.7.3) of RTX-KG2 contains 6.4M nodes and 39.3M edges with a hierarchy of 77 relationship types from Biolink. CONCLUSION RTX-KG2 is the first knowledge graph that integrates UMLS, SemMedDB, ChEMBL, DrugBank, Reactome, SMPDB, and 64 additional knowledge sources within a knowledge graph that conforms to the Biolink standard for its semantic layer and schema. 
RTX-KG2 is publicly available for querying via its API at arax.rtx.ai/api/rtxkg2/v1.2/openapi.json . The code to build RTX-KG2 is publicly available at github:RTXteam/RTX-KG2 .
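The abstract describes a canonicalized form "in which synonyms are merged". A minimal sketch of that idea on toy data is given below; it is not the RTX-KG2 build code. The identifiers, the `synonym_of` mapping, and the `canonicalize` function are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def canonicalize(nodes, edges, synonym_of):
    """Merge synonymous nodes and remap edges onto the surviving identifiers.

    nodes: dict of node id -> name
    edges: list of (subject, predicate, object, source) tuples
    synonym_of: dict mapping a node id to the id chosen to represent it
    """

    def canon(node_id):
        # Nodes without a listed synonym represent themselves.
        return synonym_of.get(node_id, node_id)

    merged_nodes = {}
    for node_id, name in nodes.items():
        cid = canon(node_id)
        # Prefer the name attached to the representative identifier itself.
        if cid == node_id or cid not in merged_nodes:
            merged_nodes[cid] = name

    # Edges that become identical after remapping collapse into one edge
    # that keeps every contributing source as provenance.
    merged_edges = defaultdict(set)
    for subject, predicate, obj, source in edges:
        merged_edges[(canon(subject), predicate, canon(obj))].add(source)

    return merged_nodes, {key: sorted(srcs) for key, srcs in merged_edges.items()}


# Hypothetical toy graph: two identifiers for the same drug, from two sources.
nodes = {"A:1": "aspirin", "B:7": "acetylsalicylic acid", "C:3": "headache"}
edges = [
    ("A:1", "treats", "C:3", "source_x"),
    ("B:7", "treats", "C:3", "source_y"),
]
synonym_of = {"B:7": "A:1"}

canonical_nodes, canonical_edges = canonicalize(nodes, edges, synonym_of)
```

In this sketch the two "treats" edges collapse into one edge whose provenance lists both sources, which mirrors why a canonicalized graph has fewer nodes and edges than its pre-canonicalized counterpart.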
Affiliation(s)
- E C Wood
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Amy K Glen
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Lindsey G Kvarfordt
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Finn Womack
  - Computer Science and Engineering, Penn State University, State College, PA, USA
- Liliana Acevedo
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Timothy S Yoon
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Chunyu Ma
  - Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
- Veronica Flores
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Meghamala Sinha
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Arash Termehchy
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Andrew S Hoffman
  - Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen, The Netherlands
- David Koslicki
  - Computer Science and Engineering, Penn State University, State College, PA, USA; Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA; Department of Biology, Penn State University, State College, PA, USA
- Stephen A Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA; Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
|
41
|
Babbi G, Savojardo C, Baldazzi D, Martelli PL, Casadio R. Pathogenic variation types in human genes relate to diseases through Pfam and InterPro mapping. Front Mol Biosci 2022; 9:966927. [PMID: 36188216 PMCID: PMC9523224 DOI: 10.3389/fmolb.2022.966927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022] Open
Abstract
Grouping residue variations in a protein according to their physicochemical properties allows a dimensionality reduction of all the possible substitutions in a variant with respect to the wild type. Here, by using a large dataset of proteins with disease-related and benign variations, derived by merging Humsavar and ClinVar data, we investigate to what extent our physicochemical grouping procedure can help in determining whether patterns of variation types are related to specific groups of diseases and whether they occur in Pfam and/or InterPro gene domains. We download 75,145 germline disease-related and benign variations of 3,605 genes, group them according to physicochemical categories and map them into Pfam and InterPro gene domains. Statistically validated analysis indicates that each cluster of genes associated with Mondo anatomical system categorizations is characterized by a specific variation pattern. Patterns identify specific Pfam and InterPro domain–Mondo category associations. Our data suggest that the association of variation patterns with Mondo categories is unique and may help in associating gene variants with genetic diseases. This work corroborates, in a much larger data set, previous observations from our group.
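The dimensionality reduction mentioned in the abstract can be illustrated with a short sketch. The four residue groups below are an assumption chosen for illustration and are not necessarily the categories used by the authors; `GROUPS`, `CLASS_OF`, and `variation_type` are hypothetical names.

```python
from itertools import permutations

# Illustrative grouping of the 20 standard residues (an assumption,
# not the grouping defined in the paper).
GROUPS = {
    "apolar": "GAVPLIM",
    "aromatic": "FWY",
    "polar": "STCNQH",
    "charged": "DEKR",
}

# Reverse lookup: one-letter residue code -> physicochemical class.
CLASS_OF = {res: name for name, residues in GROUPS.items() for res in residues}


def variation_type(wild_type, variant):
    """Describe a substitution by the classes of the two residues involved."""
    return (CLASS_OF[wild_type], CLASS_OF[variant])


# All ordered substitutions between distinct residues (20 * 19 of them)
# fall into a much smaller set of class-level variation types.
all_substitutions = list(permutations(CLASS_OF, 2))
all_types = {variation_type(wt, var) for wt, var in all_substitutions}
```

With this particular grouping, the 380 possible single-residue substitutions reduce to 16 class-level variation types, so a substitution such as R to W is recorded simply as charged to aromatic.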
Affiliation(s)
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - *Correspondence: Pier Luigi Martelli; Rita Casadio
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
  - *Correspondence: Pier Luigi Martelli; Rita Casadio
|
42
|
Arora UP, Dumont BL. Meiotic drive in house mice: mechanisms, consequences, and insights for human biology. Chromosome Res 2022; 30:165-186. [PMID: 35829972 PMCID: PMC9509409 DOI: 10.1007/s10577-022-09697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Revised: 04/20/2022] [Accepted: 04/27/2022] [Indexed: 11/27/2022]
Abstract
Meiotic drive occurs when one allele at a heterozygous site cheats its way into a disproportionate share of functional gametes, violating Mendel's law of equal segregation. This genetic conflict typically imposes a fitness cost to individuals, often by disrupting the process of gametogenesis. The evolutionary impact of meiotic drive is substantial, and the phenomenon has been associated with infertility and reproductive isolation in a wide range of organisms. However, cases of meiotic drive in humans remain elusive, a finding that likely reflects the inherent challenges of detecting drive in our species rather than unique features of human genome biology. Here, we make the case that house mice (Mus musculus) present a powerful model system to investigate the mechanisms and consequences of meiotic drive and facilitate translational inferences about the scope and potential mechanisms of drive in humans. We first detail how different house mouse resources have been harnessed to identify cases of meiotic drive and the underlying mechanisms utilized to override Mendel's rules of inheritance. We then summarize the current state of knowledge of meiotic drive in the mouse genome. We profile known mechanisms leading to transmission bias at several established drive elements. We discuss how a detailed understanding of meiotic drive in mice can steer the search for drive elements in our own species. Lastly, we conclude with a prospective look into how new technologies and molecular tools can help resolve lingering mysteries about the prevalence and mechanisms of selfish DNA transmission in mammals.
Affiliation(s)
- Uma P Arora
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
  - Graduate School of Biomedical Sciences, Tufts University, 136 Harrison Ave, Boston, MA, 02111, USA
- Beth L Dumont
  - The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, 04609, USA
  - Graduate School of Biomedical Sciences, Tufts University, 136 Harrison Ave, Boston, MA, 02111, USA
|
43
|
Riggs ER, Bingaman TI, Barry CA, Behlmann A, Bluske K, Bostwick B, Bright A, Chen CA, Clause AR, Dharmadhikari AV, Ganapathi M, Gonzaga-Jauregui C, Grant AR, Hughes MY, Kim SR, Krause A, Liao J, Lumaka A, Mah M, Maloney CM, Mohan S, Osei-Owusu IA, Reble E, Rennie O, Savatt JM, Shimelis H, Siegert RK, Sneddon TP, Thaxton C, Toner KA, Tran KT, Webb R, Wilcox EH, Yin J, Zhuo X, Znidarsic M, Martin CL, Betancur C, Vorstman JAS, Miller DT, Schaaf CP. Clinical validity assessment of genes frequently tested on intellectual disability/autism sequencing panels. Genet Med 2022; 24:1899-1908. [PMID: 35616647 PMCID: PMC10200330 DOI: 10.1016/j.gim.2022.05.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/07/2022] [Revised: 04/28/2022] [Accepted: 05/02/2022] [Indexed: 12/27/2022] Open
Abstract
PURPOSE Neurodevelopmental disorders (NDDs), such as intellectual disability (ID) and autism spectrum disorder (ASD), exhibit genetic and phenotypic heterogeneity, making them difficult to differentiate without a molecular diagnosis. The Clinical Genome Resource Intellectual Disability/Autism Gene Curation Expert Panel (GCEP) uses systematic curation to distinguish ID/ASD genes that are appropriate for clinical testing (ie, with substantial evidence supporting their relationship to disease) from those that are not. METHODS Using the Clinical Genome Resource gene-disease validity curation framework, the ID/Autism GCEP classified genes frequently included on clinical ID/ASD testing panels as Definitive, Strong, Moderate, Limited, Disputed, Refuted, or No Known Disease Relationship. RESULTS As of September 2021, 156 gene-disease pairs have been evaluated. Although most (75%) were determined to have definitive roles in NDDs, 22 (14%) genes evaluated had either Limited or Disputed evidence. Such genes are currently not recommended for use in clinical testing owing to the limited ability to assess the effect of identified variants. CONCLUSION Our understanding of gene-disease relationships evolves over time; new relationships are discovered and previously-held conclusions may be questioned. Without periodic re-examination, inaccurate gene-disease claims may be perpetuated. The ID/Autism GCEP will continue to evaluate these claims to improve diagnosis and clinical care for NDDs.
Affiliation(s)
- Bret Bostwick
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
- Chun-An Chen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
- Avinash V Dharmadhikari
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Los Angeles, Los Angeles, CA; Keck School of Medicine, University of Southern California, Los Angeles, CA
- Mythily Ganapathi
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY
- Claudia Gonzaga-Jauregui
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
- Andrew R Grant
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; New York Medical College, Valhalla, NY
- Se Rin Kim
  - National Human Genome Research Institute, Bethesda, MD
- Amanda Krause
  - Division of Human Genetics, National Health Laboratory Service and School of Pathology, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Jun Liao
  - Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY
- Aimé Lumaka
  - Laboratoire de Génétique Humaine, University of Liège, Liège, Belgium
- Michelle Mah
  - Trillium Health Partners, Mississauga, Ontario, Canada
- Ikeoluwa A Osei-Owusu
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Emma Reble
  - St. Michael's Hospital, Unity Health Toronto, Toronto, Ontario, Canada
- Olivia Rennie
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Juliann M Savatt
  - Autism & Developmental Medicine Institute, Geisinger, Danville, PA
- Hermela Shimelis
  - Autism & Developmental Medicine Institute, Geisinger, Danville, PA
- Rebecca K Siegert
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Tam P Sneddon
  - Department of Pathology and Laboratory Medicine, School of Medicine, The University of North Carolina, Chapel Hill, NC
- Courtney Thaxton
  - Department of Pathology and Laboratory Medicine, School of Medicine, The University of North Carolina, Chapel Hill, NC
- Kelly A Toner
  - Drexel University College of Medicine, Philadelphia, PA
- Kien Trung Tran
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Ryan Webb
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA
- Emma H Wilcox
  - The Warren Alpert Medical School of Brown University, Providence, RI
- Jiani Yin
  - Department of Neurology, University of California Los Angeles, Los Angeles, CA
- Xinming Zhuo
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT
- Masa Znidarsic
  - University Medical Center Ljubljana, Ljubljana, Slovenia
- Catalina Betancur
  - Sorbonne Université, INSERM, CNRS, Neuroscience Paris Seine, Institut de Biologie Paris Seine, Paris, France
- Jacob A S Vorstman
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- David T Miller
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Christian P Schaaf
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Institute of Human Genetics, Heidelberg University Hospital, Heidelberg, Germany
|
44
|
Ochsner SA, Pillich RT, Rawool D, Grethe JS, McKenna NJ. Transcriptional regulatory networks of circulating immune cells in type 1 diabetes: A community knowledgebase. iScience 2022; 25:104581. [PMID: 35832893 PMCID: PMC9272393 DOI: 10.1016/j.isci.2022.104581] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/07/2022] [Revised: 06/01/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022] Open
Abstract
Investigator-generated transcriptomic datasets interrogating circulating immune cell (CIC) gene expression in clinical type 1 diabetes (T1D) have underappreciated re-use value. Here, we repurposed these datasets to create an open science environment for the generation of hypotheses around CIC signaling pathways whose gain or loss of function contributes to T1D pathogenesis. We first computed sets of genes that were preferentially induced or repressed in T1D CICs and validated these against community benchmarks. We then inferred and validated signaling node networks regulating expression of these gene sets, as well as differentially expressed genes in the original underlying T1D case:control datasets. In a set of three use cases, we demonstrated how informed integration of these networks with complementary digital resources supports substantive, actionable hypotheses around signaling pathway dysfunction in T1D CICs. Finally, we developed a federated, cloud-based web resource that exposes the entire data matrix for unrestricted access and re-use by the research community.
- Re-use of transcriptomic type 1 diabetes (T1D) circulating immune cells (CICs) datasets
- We generated transcriptional regulatory networks for T1D CICs
- Use cases generate substantive hypotheses around signaling pathway dysfunction in T1D CICs
- Networks are freely accessible on the web for re-use by the research community
Affiliation(s)
- Scott A. Ochsner
  - Department of Molecular, Baylor College of Medicine, Houston, TX 77030, USA
  - Cellular Biology and Medicine, Baylor College of Medicine, Houston, TX 77030, USA
- Rudolf T. Pillich
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Deepali Rawool
  - Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
- Jeffrey S. Grethe
  - Center for Research in Biological Systems, University of California San Diego, La Jolla, CA 92093, USA
- Neil J. McKenna
  - Department of Molecular, Baylor College of Medicine, Houston, TX 77030, USA
  - Cellular Biology and Medicine, Baylor College of Medicine, Houston, TX 77030, USA
  - Corresponding author
|
45
|
Königs C, Friedrichs M, Dietrich T. The heterogeneous pharmacological medical biochemical network PharMeBINet. Sci Data 2022; 9:393. [PMID: 35821017 PMCID: PMC9276653 DOI: 10.1038/s41597-022-01510-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/14/2022] [Accepted: 06/22/2022] [Indexed: 12/04/2022] Open
Abstract
Heterogeneous biomedical pharmacological databases are important for multiple fields in bioinformatics. Hetionet is a freely available database combining diverse entities and relationships from 29 public resources. Therefore, it is used as the basis for this project. Nineteen additional pharmacological, medical, and biological databases such as CTD, DrugBank, and ClinVar are parsed and integrated into Neo4j. Afterwards, the information is merged into the Hetionet structure. Different mapping methods are used, such as external identification systems or name mapping. The resulting open-source Neo4j database PharMeBINet has 2,869,407 different nodes with 66 labels and 15,883,653 relationships with 208 edge types. It is a heterogeneous database containing interconnected information on ADRs, diseases, drugs, genes, gene variations, proteins, and more. Relationships between these entities represent drug-drug interactions or drug-causes-ADR relations, to name a few. It has much potential for developing further data analyses including machine learning applications. A web application for accessing the database is free to use for everyone and available at https://pharmebi.net. Additionally, the database is deposited on Zenodo at 10.5281/zenodo.6578218.
Measurement(s): data integration objective
Technology Type(s): database creation objective
Affiliation(s)
- Cassandra Königs
  - Bielefeld University, Bioinformatics/Medical Informatics Department, Bielefeld, 33615, Germany
- Marcel Friedrichs
  - Bielefeld University, Bioinformatics/Medical Informatics Department, Bielefeld, 33615, Germany
- Theresa Dietrich
  - Bielefeld University, Bioinformatics/Medical Informatics Department, Bielefeld, 33615, Germany
|
46
|
Adamowicz K, Maier A, Baumbach J, Blumenthal DB. Online in silico validation of disease and gene sets, clusterings or subnetworks with DIGEST. Brief Bioinform 2022; 23:6618231. [PMID: 35753693 DOI: 10.1093/bib/bbac247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 05/25/2022] [Accepted: 05/26/2022] [Indexed: 11/12/2022] Open
Abstract
As the development of new drugs reaches its physical and financial limits, drug repurposing has become more important than ever. For mechanistically grounded drug repurposing, it is crucial to uncover the disease mechanisms and to detect clusters of mechanistically related diseases. Various methods for computing candidate disease mechanisms and disease clusters exist. However, in the absence of ground truth, in silico validation is challenging. This constitutes a major hurdle toward the adoption of in silico prediction tools by experimentalists who are often hesitant to carry out wet-lab validations for predicted candidate mechanisms without clearly quantified initial plausibility. To address this problem, we present DIGEST (in silico validation of disease and gene sets, clusterings or subnetworks), a Python-based validation tool available as a web interface (https://digest-validation.net), as a stand-alone package or over a REST API. DIGEST greatly facilitates in silico validation of gene and disease sets, clusterings or subnetworks via fully automated pipelines comprising disease and gene ID mapping, enrichment analysis, comparisons of shared genes and variants and background distribution estimation. Moreover, functionality is provided to automatically update the external databases used by the pipelines. DIGEST hence allows the user to assess the statistical significance of candidate mechanisms with regard to functional and genetic coherence and enables the computation of empirical $P$-values with just a few mouse clicks.
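The empirical P-values mentioned in the abstract compare an observed coherence score with scores obtained from a background distribution. A minimal sketch is shown below, assuming that higher scores mean more coherent sets and using the common add-one correction; this is a generic formulation and not necessarily the estimator implemented in DIGEST. The function name and the example scores are hypothetical.

```python
def empirical_p_value(observed, background_scores):
    """Fraction of background scores at least as extreme as the observed one.

    The add-one correction counts the observed score as part of the
    background, which keeps the estimate strictly above zero.
    """
    at_least_as_extreme = sum(1 for score in background_scores if score >= observed)
    return (1 + at_least_as_extreme) / (1 + len(background_scores))


# Hypothetical coherence scores of nine randomly drawn gene sets.
background = [1, 2, 3, 4, 5, 6, 7, 8, 9]

p_typical = empirical_p_value(5, background)   # (1 + 5) / (1 + 9)
p_extreme = empirical_p_value(10, background)  # (1 + 0) / (1 + 9)
```

With only nine background draws the smallest attainable value is 0.1, which shows why a background of this kind usually needs many random samples before small P-values can be resolved.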
Affiliation(s)
- Klaudia Adamowicz
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Andreas Maier
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- David B Blumenthal
  - Department Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
|
47
|
Alghamdi SM, Schofield PN, Hoehndorf R. How much do model organism phenotypes contribute to the computational identification of human disease genes? Dis Model Mech 2022; 15:275986. [PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/24/2021] [Accepted: 06/13/2022] [Indexed: 12/04/2022] Open
Abstract
Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype–phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to computational identification of disease-associated genes is not fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene–disease associations. We found that mouse genotype–phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper. Editor's choice: We investigated the use of model organism phenotypes in the computational identification of disease genes, identifying several data biases and concluding that mouse model phenotypes contribute most to computational disease gene identification.
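To make the idea of phenotype-based semantic similarity concrete, the sketch below compares phenotype sets after expanding each term with its ontology ancestors and then taking a Jaccard index. This is a deliberately simple stand-in, not the semantic similarity or machine learning methods used in the paper. The toy ontology, its term names, and the function names are hypothetical.

```python
# Hypothetical toy phenotype ontology: term -> list of parent terms.
PARENTS = {
    "abnormal heart morphology": ["cardiovascular phenotype"],
    "abnormal heart rate": ["cardiovascular phenotype"],
    "cardiovascular phenotype": ["phenotype"],
    "short stature": ["growth phenotype"],
    "growth phenotype": ["phenotype"],
    "phenotype": [],
}


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def similarity(phenotypes_a, phenotypes_b):
    """Jaccard index of the two ancestor-expanded phenotype sets."""
    a = with_ancestors(phenotypes_a)
    b = with_ancestors(phenotypes_b)
    return len(a & b) / len(a | b)


# Phenotypes of a model organism gene compared with two human diseases.
gene_phenotypes = {"abnormal heart morphology"}
sim_cardiac = similarity(gene_phenotypes, {"abnormal heart rate"})
sim_growth = similarity(gene_phenotypes, {"short stature"})
```

In this toy setting the gene scores higher against the cardiac disease than against the growth disorder because the two phenotype sets share a more specific ancestor, which is the intuition behind ranking candidate disease genes by phenotype similarity.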
Affiliation(s)
- Sarah M Alghamdi
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
- Paul N Schofield
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, CB2 3EG, Cambridge, UK
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia
|
48
|
Pagano-Márquez R, Córdoba-Caballero J, Martínez-Poveda B, Quesada AR, Rojano E, Seoane P, Ranea JAG, Ángel Medina M. Deepening the knowledge of rare diseases dependent on angiogenesis through semantic similarity clustering and network analysis. Brief Bioinform 2022; 23:6613395. [PMID: 35731990 PMCID: PMC9294413 DOI: 10.1093/bib/bbac220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 04/28/2022] [Accepted: 05/11/2022] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Angiogenesis is regulated by multiple genes whose variants can lead to different disorders. Among them, rare diseases are a heterogeneous group of pathologies, most of them genetic, whose information may be of interest to determine the still unknown genetic and molecular causes of other diseases. In this work, we use the information on rare diseases dependent on angiogenesis to investigate the genes that are associated with this biological process and to determine if there are interactions between the genes involved in its deregulation. RESULTS We propose a systemic approach supported by the use of pathological phenotypes to group diseases by semantic similarity. We grouped 158 angiogenesis-related rare diseases in 18 clusters based on their phenotypes. Of them, 16 clusters had traceable gene connections in a high-quality interaction network. These disease clusters are associated with 130 different genes. We searched for genes associated with angiogenesis through ClinVar pathogenic variants. Of the seven retrieved genes, our system confirms six. Furthermore, it allowed us to identify common affected functions among these disease clusters. AVAILABILITY https://github.com/ElenaRojano/angio_cluster. CONTACT seoanezonjic@uma.es and elenarojano@uma.es.
Affiliation(s)
- Raquel Pagano-Márquez
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain
- José Córdoba-Caballero
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain
- Beatriz Martínez-Poveda
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; CIBER de Enfermedades Cardiovasculares, CIBERCV, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain
- Ana R Quesada
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
- Elena Rojano
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
- Pedro Seoane
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain
- Juan A G Ranea
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
- Miguel Ángel Medina
  - Department of Molecular Biology and Biochemistry, University of Malaga, Andalucia Tech, Bulevar Louis Pasteur 31, E-29071, Malaga, Spain; Biomedical Research Institute of Malaga, IBIMA, Calle Doctor Miguel Diaz Recio 28, 29010, Malaga, Spain; CIBER de Enfermedades Raras, CIBERER, Av. Monforte de Lemos, 3-5, Pabellon 11, Planta 0, 28029, Madrid, Spain
|
49
|
Waldrop AM, Cheadle JB, Bradford K, Preiss A, Chew R, Holt JR, Kebede Y, Braswell N, Watson M, Hench V, Crerar A, Ball CM, Schreep C, Linebaugh PJ, Hiles H, Boyles R, Bizon C, Krishnamurthy A, Cox S. Dug: a semantic search engine leveraging peer-reviewed knowledge to query biomedical data repositories. Bioinformatics 2022; 38:3252-3258. [PMID: 35441678 PMCID: PMC9991886 DOI: 10.1093/bioinformatics/btac284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Revised: 03/04/2022] [Accepted: 04/15/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION As the number of public data resources continues to proliferate, identifying relevant datasets across heterogeneous repositories is becoming critical to answering scientific questions. To help researchers navigate this data landscape, we developed Dug: a semantic search tool for biomedical datasets utilizing evidence-based relationships from curated knowledge graphs to find relevant datasets and explain why those results are returned. RESULTS Developed through the National Heart, Lung and Blood Institute's (NHLBI) BioData Catalyst ecosystem, Dug has indexed more than 15 911 study variables from public datasets. On a manually curated search dataset, Dug's total recall (total relevant results/total results) of 0.79 outperformed default Elasticsearch's total recall of 0.76. When using synonyms or related concepts as search queries, Dug (0.36) far outperformed Elasticsearch (0.14) in terms of total recall with no significant loss in the precision of its top results. AVAILABILITY AND IMPLEMENTATION Dug is freely available at https://github.com/helxplatform/dug. An example Dug deployment is also available for use at https://search.biodatacatalyst.renci.org/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Alexander M Waldrop
  - Center for Genomics, Bioinformatics, and Translational Research, RTI International, Research Triangle Park, NC 27709-2194, USA
- John B Cheadle
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Kira Bradford
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Alexander Preiss
  - Center for Data Science, RTI International, Research Triangle Park, NC 27709-2194, USA
- Robert Chew
  - Center for Data Science, RTI International, Research Triangle Park, NC 27709-2194, USA
- Jonathan R Holt
  - Center for Data Science, RTI International, Research Triangle Park, NC 27709-2194, USA
- Yaphet Kebede
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Nathan Braswell
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Matt Watson
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Virginia Hench
  - Center for Genomics, Bioinformatics, and Translational Research, RTI International, Research Triangle Park, NC 27709-2194, USA
- Andrew Crerar
  - Center for Genomics, Bioinformatics, and Translational Research, RTI International, Research Triangle Park, NC 27709-2194, USA
- Chris M Ball
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Carl Schreep
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- P J Linebaugh
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Hannah Hiles
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Rebecca Boyles
  - Research Computing Division, RTI International, Research Triangle Park, NC 27709-2194, USA
- Chris Bizon
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
- Ashok Krishnamurthy
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA; Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7548, USA
- Steve Cox
  - Renaissance Computing Institute, University of Chapel Hill, North Carolina, Chapel Hill, NC 27599-7568, USA
|
50
|
Navarro AM, Orti F, Martínez-Pérez E, Alonso M, Simonetti FL, Iserte JA, Marino-Buslje C. DisPhaseDB: an integrative database of diseases related variations in liquid-liquid phase separation proteins. Comput Struct Biotechnol J 2022; 20:2551-2557. [PMID: 35685370 PMCID: PMC9156858 DOI: 10.1016/j.csbj.2022.05.004] [Received: 03/11/2022] [Revised: 05/03/2022] [Accepted: 05/03/2022] [Indexed: 11/29/2022]
Abstract
Phase separation proteins involved in membraneless organelles are increasingly implicated in several complex human diseases. DisPhaseDB integrates ten repositories for analyzing clinically relevant mutations in phase separation proteins. It contains over a million disease-related mutations mapped onto the protein sequences, along with extensive metadata. It is a comprehensive meta-database, implemented as a user-friendly website with visualization tools and downloadable datasets. DisPhaseDB will contribute to deciphering human disease mechanisms that are still not fully understood, viewed through the lens of phase separation.
Motivation Proteins involved in liquid–liquid phase separation (LLPS) and membraneless organelles (MLOs) are recognized to be decisive for many biological processes and also responsible for several diseases. Despite the recent explosion of research in the area, tools for analysis and data integration across different repositories are still lacking. Currently, there is no comprehensive, dedicated database that collects all disease-related variations in combination with the protein location, the biological role in the MLO, and all the metadata available for each protein and disease. Disease-related protein variants and additional features are dispersed, and the user has to navigate many databases, each with a different focus and format, and often not user friendly. Results We present DisPhaseDB, a database dedicated to disease-related variants of liquid–liquid phase separation proteins. It integrates 10 databases and contains 5,741 proteins, 1,660,059 variants, and 4,051 disease terms. It also offers intuitive navigation and an informative display. It constitutes a pivotal starting point for further analysis, encouraging the development of new computational tools. The database is freely available at http://disphasedb.leloir.org.ar.
|