1
Manders TR, Tan CA, Kobayashi Y, Wahl A, Araya C, Colavin A, Facio FM, Metz H, Reuter J, Frésard L, Padigepati SR, Stafford DA, Nussbaum RL, Nykamp K. Harnessing genotype and phenotype data for population-scale variant classification using large language models and bayesian inference. Hum Genet 2025:10.1007/s00439-025-02743-z. [PMID: 40266329 DOI: 10.1007/s00439-025-02743-z] [Received: 12/13/2024] [Accepted: 03/31/2025] [Indexed: 04/24/2025]
Abstract
Variants of Uncertain Significance (VUS) in genetic testing for hereditary diseases burden patients and clinicians, yet clinical data that could reduce VUS are underutilized due to a lack of scalable strategies. We assessed whether a machine learning approach using genotype and phenotype data could improve variant classification and reduce VUS. In this cohort study of a multi-step machine learning approach, patient data from test requisition forms were used to distinguish patients with molecular diagnoses from controls ("patient score"). A generative Bayesian model then used patient scores and variant classifications to infer variant pathogenicity ("variant score"). The study included 3.5 million patients referred for clinical genetic testing across various conditions. Primary outcomes were model- and gene-level discrimination, classification performance, probabilistic calibration, and concordance with orthogonal pathogenicity measures. Integration into a semi-quantitative classification framework was based on posterior pathogenicity probabilities matching PPV ≥ 0.99/NPV ≥ 0.95 thresholds, followed by expert review. We generated 1,334 clinical variant models (CVMs); 595 showed high performance in both machine learning steps (AUROC_patient ≥ 0.8 and AUROC_variant ≥ 0.8) on held-out data. High-confidence predictions from these CVMs provided evidence for 5,362 VUS observed in 200,174 patients, representing 23.4% of all VUS observations in these genes. In 17 frequently tested genes, CVMs reclassified over 1,000 unique VUS, reducing VUS report rates by 9-49% per condition. In conclusion, a scalable machine learning approach using underutilized clinical data improved variant classification and reduced VUS.
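The final step maps posterior pathogenicity probabilities onto PPV ≥ 0.99 and NPV ≥ 0.95 thresholds. A minimal sketch of one plausible reading follows. It assumes the posteriors are well calibrated, so a posterior can be compared directly with the target PPV or NPV. It also adds an empirical check on held-out labelled variants. The function names `evidence_call` and `empirical_ppv_npv` and the toy data are hypothetical and do not come from the authors' code.

```python
from typing import List, Sequence, Tuple

# Thresholds quoted in the abstract; treating them as direct posterior cutoffs is an assumption.
PPV_TARGET = 0.99
NPV_TARGET = 0.95


def evidence_call(p_pathogenic: float) -> str:
    """Map an (assumed calibrated) posterior pathogenicity probability to an evidence call."""
    if p_pathogenic >= PPV_TARGET:
        return "pathogenic"
    if 1.0 - p_pathogenic >= NPV_TARGET:
        return "benign"
    return "uninformative"


def empirical_ppv_npv(posteriors: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Check calls against held-out variants with known labels (1 = pathogenic, 0 = benign)."""
    calls: List[str] = [evidence_call(p) for p in posteriors]
    path_labels = [y for c, y in zip(calls, labels) if c == "pathogenic"]
    benign_labels = [y for c, y in zip(calls, labels) if c == "benign"]
    # NaN means no variant received that call, so the metric is undefined.
    ppv = sum(path_labels) / len(path_labels) if path_labels else float("nan")
    npv = benign_labels.count(0) / len(benign_labels) if benign_labels else float("nan")
    return ppv, npv


if __name__ == "__main__":
    toy_posteriors = [0.995, 0.999, 0.60, 0.02, 0.01, 0.03]
    toy_labels = [1, 1, 0, 0, 0, 1]
    print([evidence_call(p) for p in toy_posteriors])
    print(empirical_ppv_npv(toy_posteriors, toy_labels))
```

On the toy data the benign calls reach an empirical NPV of only 2/3. A check like this would flag that the posteriors are not calibrated well enough to be used directly.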
Affiliation(s)
- Toby R Manders
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Christopher A Tan
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Yuya Kobayashi
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Alexander Wahl
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Carlos Araya
  - Invitae Corporation, 1400 16th Street, San Francisco, CA, 94103, USA
  - Tapanti.org, PO Box #727, 836 Anacapa St, Santa Barbara, CA, 93102, USA
- Alexandre Colavin
  - Invitae Corporation, 1400 16th Street, San Francisco, CA, 94103, USA
  - Present Address: Threshold Health Inc, 1638 Myrtle Ave, San Diego, CA, 92103, USA
- Flavia M Facio
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
  - Present Address: GeneDx, 205/207 Perry Parkway, Gaithersburg, MD, 20877, USA
- Hillery Metz
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Jason Reuter
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Laure Frésard
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- David A Stafford
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
- Robert L Nussbaum
  - Invitae Corporation, 1400 16th Street, San Francisco, CA, 94103, USA
  - Present Address: Division of Medical Genetics, Department of Pediatrics, University of California San Francisco, 1825 4th St, San Francisco, CA, 94158, USA
- Keith Nykamp
  - Labcorp Genetics Inc, 1400 16th Street, San Francisco, CA, 94103, USA
  - Present Address: GeneDx, 205/207 Perry Parkway, Gaithersburg, MD, 20877, USA
2
Groza T, Rayabsri W, Gration D, Hariram H, Jamuar SS, Baynam G. First steps toward building natural history of diseases computationally: Lessons learned from the Noonan syndrome use case. Am J Hum Genet 2025:S0002-9297(25)00135-1. [PMID: 40245863 DOI: 10.1016/j.ajhg.2025.03.014] [Received: 10/26/2024] [Revised: 03/20/2025] [Accepted: 03/21/2025] [Indexed: 04/19/2025] [Open Access]
Abstract
Rare diseases (RDs) are conditions affecting fewer than 1 in 2,000 people. More than 7,000 have been identified; most are genetic in nature, and more than half affect children. Although each RD affects a small population, collectively between 3.5% and 5.9% of the global population, or 262.9-446.2 million people, live with an RD. Most RDs lack established treatment protocols, highlighting the need for proper care pathways addressing prognosis, diagnosis, and management. Advances in generative AI and large language models (LLMs) offer new opportunities to document the temporal progression of phenotypic features, addressing gaps in current knowledge bases. This study proposes an LLM-based framework to capture the natural history of diseases, focusing specifically on Noonan syndrome. The framework aims to document phenotypic trajectories, validate them against RD knowledge bases, and integrate the resulting insights into care coordination using electronic health record (EHR) data from the Undiagnosed Diseases Program Singapore.
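The framework aims to document phenotypic trajectories. Suppose an LLM has already extracted features and onset ages from EHR text (that step is not shown). One simple way to summarize a trajectory is the cumulative fraction of the cohort that has shown each feature by a given age. A minimal sketch follows, assuming hypothetical (patient, HPO term, onset age) tuples. It illustrates the idea only and is not the authors' pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (patient_id, hpo_id, onset_age_years): hypothetical records such as an LLM extractor might emit.
Record = Tuple[str, str, float]


def cumulative_onset(records: Iterable[Record], n_patients: int,
                     age_cutoffs: List[float]) -> Dict[str, List[float]]:
    """For each HPO term, the fraction of the cohort with onset at or before each cutoff age."""
    first_onset: Dict[Tuple[str, str], float] = {}
    for patient, term, age in records:
        key = (patient, term)
        # Keep only the earliest reported onset per patient and term.
        if key not in first_onset or age < first_onset[key]:
            first_onset[key] = age
    onsets_by_term: Dict[str, List[float]] = defaultdict(list)
    for (_, term), age in first_onset.items():
        onsets_by_term[term].append(age)
    return {
        term: [sum(a <= cutoff for a in ages) / n_patients for cutoff in age_cutoffs]
        for term, ages in onsets_by_term.items()
    }


if __name__ == "__main__":
    toy = [
        ("p1", "HP:0004322", 2.0),  # Short stature
        ("p2", "HP:0004322", 5.0),
        ("p1", "HP:0004322", 3.0),  # later duplicate mention, ignored
        ("p3", "HP:0001631", 0.0),  # Atrial septal defect
    ]
    print(cumulative_onset(toy, n_patients=3, age_cutoffs=[1.0, 4.0, 10.0]))
```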
Affiliation(s)
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street #07-01 Matrix, Singapore 138671, Singapore
  - SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
  - School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Kent Street, Bentley, WA 6102, Australia
- Warittha Rayabsri
  - Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA 6008, Australia
- Dylan Gration
  - Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA 6008, Australia
- Harshini Hariram
  - Medical Student, Division of Medical Education, School of Medical Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester M13 9PL, UK
- Saumya Shekhar Jamuar
  - SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
  - Genetics Service, Department of Paediatrics, KK Women's and Children's Hospital, 100 Bukit Timah Road, Singapore 229899, Singapore
  - SingHealth Duke-NUS Genomic Medicine Centre, 100 Bukit Timah Road, Singapore 229899, Singapore
- Gareth Baynam
  - Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
  - Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA 6008, Australia
  - Faculty of Health and Medical Sciences, University of Western Australia, 35 Stirling Highway, Crawley, WA 6009, Australia
3
Kumari M, Chauhan R, Garg P. MedKG: enabling drug discovery through a unified biomedical knowledge graph. Mol Divers 2025:10.1007/s11030-025-11164-z. [PMID: 40085402 DOI: 10.1007/s11030-025-11164-z] [Received: 11/26/2024] [Accepted: 03/07/2025] [Indexed: 03/16/2025]
Abstract
Biomedical knowledge graphs have emerged as powerful tools for drug discovery, but existing platforms often suffer from outdated information, limited accessibility, and insufficient integration of complex data. This study presents MedKG, a comprehensive and continuously updated knowledge graph designed to address these challenges in precision medicine and drug discovery. MedKG integrates data from 35 authoritative sources, encompassing 34 node types and 79 relationships. A Continuous Integration/Continuous Update pipeline keeps MedKG current, addressing a critical limitation of static knowledge bases. The integration of molecular embeddings enhances semantic analysis capabilities, bridging the gap between chemical structures and biological entities. To demonstrate MedKG's utility, MedLINK, a novel hybrid Relational Graph Convolutional Network for disease-drug link prediction, was developed and applied in case studies on clinical trial data. Furthermore, a web-based application with user-friendly APIs and visualization tools makes MedKG accessible to both technical and non-technical users; it is freely available at http://pitools.niper.ac.in/medkg/.
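The abstract describes MedLINK only as a hybrid Relational Graph Convolutional Network (R-GCN) for link prediction. For orientation, here is a minimal NumPy sketch of a generic R-GCN layer and a DistMult decoder that scores a candidate edge. The layer uses one weight matrix per relation type, normalizes neighbour messages by the number of neighbours under each relation, and adds a self-loop. The weights are untrained toy values, and nothing here reproduces MedLINK's actual architecture.

```python
import numpy as np


def rgcn_layer(h: np.ndarray, edges, w_self: np.ndarray, w_rel: np.ndarray) -> np.ndarray:
    """One generic R-GCN layer.

    h:      (n_nodes, d_in) float node features
    edges:  iterable of (src, rel, dst) index triples
    w_self: (d_in, d_out) self-loop weight
    w_rel:  (n_rel, d_in, d_out) one weight matrix per relation type
    """
    edges = list(edges)
    out = h @ w_self  # self-connection term
    # c_{i,r}: number of incoming neighbours of node i under relation r
    counts = {}
    for _, r, d in edges:
        counts[(d, r)] = counts.get((d, r), 0) + 1
    for s, r, d in edges:
        out[d] += (h[s] @ w_rel[r]) / counts[(d, r)]
    return np.maximum(out, 0.0)  # ReLU activation


def distmult_score(z: np.ndarray, rel_diag: np.ndarray, head: int, tail: int) -> float:
    """DistMult decoder: sum over k of z_head[k] * r[k] * z_tail[k]."""
    return float(np.sum(z[head] * rel_diag * z[tail]))


if __name__ == "__main__":
    h = np.eye(3)                      # toy one-hot features for 3 nodes
    edges = [(0, 0, 1), (2, 0, 1)]     # two edges of relation 0 into node 1
    z = rgcn_layer(h, edges, np.eye(3), np.eye(3)[None, :, :])
    print(z[1], distmult_score(z, np.ones(3), 0, 1))
```

In a trained model the weights and the relation vector `rel_diag` would be learned so that true disease-drug pairs score higher than corrupted ones.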
Affiliation(s)
- Madhavi Kumari
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S.A.S. Nagar, Sector 67, S.A.S. Nagar, Mohali, Punjab, 160062, India
- Rohit Chauhan
  - Department of Computer Science, National Institute of Technology (NIT), Durgapur, MG Road, Durgapur, West Bengal, 713209, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research (NIPER), S.A.S. Nagar, Sector 67, S.A.S. Nagar, Mohali, Punjab, 160062, India
4
Hier DB, Do TS, Obafemi-Ajayi T. A simplified retriever to improve accuracy of phenotype normalizations by large language models. Front Digit Health 2025; 7:1495040. [PMID: 40103736 PMCID: PMC11913805 DOI: 10.3389/fdgth.2025.1495040] [Received: 09/11/2024] [Accepted: 02/12/2025] [Indexed: 03/20/2025] [Open Access]
Abstract
Large language models have shown improved accuracy in phenotype term normalization tasks when augmented with retrievers that suggest candidate normalizations based on term definitions. In this work, we introduce a simplified retriever that enhances large language model accuracy by searching the Human Phenotype Ontology (HPO) for candidate matches using contextual word embeddings from BioBERT without the need for explicit term definitions. Testing this method on terms derived from the clinical synopses of Online Mendelian Inheritance in Man (OMIM®), we demonstrate that the normalization accuracy of GPT-4o increases from a baseline of 62% without augmentation to 85% with retriever augmentation. This approach is potentially generalizable to other biomedical term normalization tasks and offers an efficient alternative to more complex retrieval methods.
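The retriever embeds a phenotype term, ranks HPO terms by embedding similarity, and passes the best matches to the LLM as candidates. A minimal sketch follows, assuming precomputed embedding vectors: the toy 2-D vectors stand in for BioBERT embeddings. Cosine similarity, top-k selection, and the prompt wording are illustrative assumptions, not details stated in the abstract. The helper names `top_k_candidates` and `build_prompt` are hypothetical.

```python
from typing import Dict, List, Sequence

import numpy as np


def top_k_candidates(query_vec: Sequence[float], term_vecs: np.ndarray,
                     term_ids: List[str], k: int = 3) -> List[str]:
    """Return the k HPO IDs whose embeddings are most cosine-similar to the query embedding."""
    q = np.asarray(query_vec, dtype=float)
    t = np.asarray(term_vecs, dtype=float)
    sims = (t @ q) / (np.linalg.norm(t, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]  # highest similarity first
    return [term_ids[i] for i in order]


def build_prompt(term: str, candidates: List[str], labels: Dict[str, str]) -> str:
    """Hypothetical prompt asking the LLM to choose among the retrieved candidates."""
    options = "\n".join(f"- {c} {labels[c]}" for c in candidates)
    return (f"Normalize the phenotype term '{term}' to one HPO concept.\n"
            f"Candidates:\n{options}\nAnswer with the HPO ID only.")


if __name__ == "__main__":
    ids = ["HP:0001250", "HP:0001263", "HP:0000252"]
    labels = {"HP:0001250": "Seizure", "HP:0001263": "Global developmental delay",
              "HP:0000252": "Microcephaly"}
    toy_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # stand-ins for BioBERT vectors
    cands = top_k_candidates([1.0, 0.1], toy_vecs, ids, k=2)
    print(build_prompt("fits", cands, labels))
```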
Affiliation(s)
- Daniel B Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Thanh Son Do
  - Department of Computer Science, Missouri State University, Springfield, MO, United States
- Tayo Obafemi-Ajayi
  - Engineering Program, Missouri State University, Springfield, MO, United States
5
Shin J, Fujiwara T, Saitsu H, Yamaguchi A. Ontology-based expansion of virtual gene panels to improve diagnostic efficiency for rare genetic diseases. BMC Med Inform Decis Mak 2025; 25:59. [PMID: 39910609 PMCID: PMC11800421 DOI: 10.1186/s12911-025-02910-2] [Received: 08/15/2023] [Accepted: 01/30/2025] [Indexed: 02/07/2025] [Open Access]
Abstract
BACKGROUND Virtual Gene Panels (VGPs) comprising disease-associated causal genes are used in the diagnosis of rare genetic diseases to evaluate candidate genes identified by whole-genome and whole-exome sequencing. VGPs generated by the PanelApp software were used in a pilot study of the UK 100,000 Genomes Project to filter candidate genes, thus enhancing diagnostic efficiency for rare diseases. However, PanelApp also filtered out disease-causing genes in nearly 50% of the cases. METHODS Here, we propose several methods for designing optimized VGPs that leverage the hierarchical structure of the Mondo disease ontology to significantly improve diagnostic efficiency without excluding disease-causing genes. We also performed computational experiments on an evaluation dataset comprising 74 patients to determine the optimal VGP design method. RESULTS Our results demonstrate that the proposed method can significantly enhance the efficiency of rare disease diagnosis by automatically identifying candidate genes. The proposed method successfully designed VGPs that improve diagnostic efficiency without excluding disease-causing genes. CONCLUSION We have developed novel methods for VGP design that leverage the hierarchical structure of the Mondo disease ontology to improve the efficiency of rare genetic disease diagnosis. This approach identifies candidate genes without excluding disease-causing genes, thereby improving diagnostic efficiency.
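The methods use the Mondo hierarchy so that a panel built for one disease also covers genes of closely related diseases. A minimal sketch of one such strategy follows, using a hypothetical toy hierarchy. It climbs a chosen number of levels from the suspected disease, then collects the genes annotated to every descendant of the ancestors reached. The disease IDs and gene names are placeholders. The abstract does not specify whether this matches any of the authors' design methods.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def descendants(roots: Iterable[str], children: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable downward from the roots (roots included), by breadth-first search."""
    seen = set(roots)
    queue = deque(seen)
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def expand_panel(term: str, parents: Dict[str, List[str]], children: Dict[str, List[str]],
                 disease_genes: Dict[str, Set[str]], levels: int = 1) -> Set[str]:
    """Genes annotated to the subtree(s) rooted `levels` steps above `term`."""
    frontier = {term}
    for _ in range(levels):
        up = {p for t in frontier for p in parents.get(t, [])}
        if not up:  # reached the root; stop climbing
            break
        frontier = up
    panel: Set[str] = set()
    for d in descendants(frontier, children):
        panel |= disease_genes.get(d, set())
    return panel


if __name__ == "__main__":
    # Hypothetical hierarchy: D0 is the parent of D1 and D2; D1 is the parent of D3.
    children = {"D0": ["D1", "D2"], "D1": ["D3"]}
    parents = {"D1": ["D0"], "D2": ["D0"], "D3": ["D1"]}
    genes = {"D1": {"G1"}, "D2": {"G2"}, "D3": {"G3"}}
    for lv in range(3):
        print(lv, sorted(expand_panel("D3", parents, children, genes, levels=lv)))
```

The `levels` parameter trades panel size against the risk of omitting the causal gene. This is the same tension the abstract describes for PanelApp-filtered panels.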
Affiliation(s)
- Jaemoon Shin
  - Database Center for Life Science, Kashiwa, Chiba, Japan
6
Ma Y, Jiang D, Li J, Zheng G, Deng Y, Gou X, Gao S, Chen C, Zhou Y, Zhang Y, Deng C, Yao Y, Han H, Su J. Systematic dissection of pleiotropic loci and critical regulons in excitatory neurons and microglia relevant to neuropsychiatric and ocular diseases. Transl Psychiatry 2025; 15:24. [PMID: 39856056 PMCID: PMC11760387 DOI: 10.1038/s41398-025-03243-4] [Received: 06/18/2024] [Revised: 12/08/2024] [Accepted: 01/14/2025] [Indexed: 01/27/2025] [Open Access]
Abstract
Advancements in single-cell multimodal techniques have greatly enhanced our understanding of disease-relevant loci identified through genome-wide association studies (GWASs). To investigate the biological connections between the eye and brain, we integrated bulk and single-cell multiomic profiles with GWAS summary statistics for eight neuropsychiatric and five ocular diseases. Our analysis uncovered five latent factors explaining 61.7% of the genetic variance across these 13 diseases, revealing diverse correlational patterns among them. We identified 45 pleiotropic loci with 91 candidate genes that contribute to disease risk. By integrating GWAS and single-cell profiles, we implicated excitatory neurons and microglia as key contributors in the eye-brain connections. Polygenic enrichment analysis further identified 15 pleiotropic regulons in excitatory neurons and 16 in microglia that were linked to comorbid conditions. Functionally, excitatory neuron-specific regulons were involved in axon guidance and synaptic activity, while microglia-specific regulons were associated with immune response and cell activation. In sum, these findings underscore the genetic link between psychiatric disorders and ocular diseases.
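According to the abstract, five latent factors explain 61.7% of the genetic variance across the 13 diseases, but it does not say how the factors were estimated. To illustrate what a "share of variance explained" means, here is a minimal sketch based on the eigendecomposition of a hypothetical genetic correlation matrix. This is a PCA-style approximation; it is not factor analysis or genomic structural equation modelling, and it is not the authors' method.

```python
import numpy as np


def variance_explained(corr: np.ndarray, n_factors: int) -> float:
    """Fraction of total variance captured by the top n eigenvalues of a correlation matrix."""
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns eigenvalues in ascending order
    eigvals = np.clip(eigvals, 0.0, None)     # guard against small negatives from noisy estimates
    return float(eigvals[:n_factors].sum() / eigvals.sum())


if __name__ == "__main__":
    toy_rg = np.array([[1.0, 0.5], [0.5, 1.0]])  # hypothetical genetic correlation of two traits
    print(variance_explained(toy_rg, 1))
```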
Affiliation(s)
- Yunlong Ma
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Dingping Jiang
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Jingjing Li
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Gongwei Zheng
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yao Deng
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Xuanxuan Gou
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Shuaishuai Gao
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Cheng Chen
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yijun Zhou
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yaru Zhang
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Chunyu Deng
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yinghao Yao
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Haijun Han
  - School of Medicine, Hangzhou City University, Hangzhou, China
- Jianzhong Su
  - Oujiang Laboratory, Zhejiang Lab for Regenerative Medicine, Vision and Brain Health, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - National Engineering Research Center of Ophthalmology and Optometry, Eye Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang, China
7
Díaz-Santiago E, Moya-García AA, Pérez-García J, Yahyaoui R, Orengo C, Pazos F, Perkins JR, Ranea JAG. Better understanding the phenotypic effects of drugs through shared targets in genetic disease networks. Front Pharmacol 2025; 15:1470931. [PMID: 39911831 PMCID: PMC11794328 DOI: 10.3389/fphar.2024.1470931] [Received: 07/26/2024] [Accepted: 12/12/2024] [Indexed: 02/07/2025] [Open Access]
Abstract
Introduction Most drugs fail during development, and there is a clear, unmet need for approaches that explain mechanistically how drugs exert both their intended and adverse effects. One approach gaining traction in this field is to take disease data linking genes with pathological phenotypes and combine it with drug-target interaction data. Methods We introduce a methodology to associate drugs with effects, both intended and adverse, using a tripartite network approach that combines drug-target and target-phenotype data, in which targets can be represented as proteins or protein domains. Results We were able to detect associations for over 140,000 ChEMBL drugs and 3,800 phenotypes, represented as Human Phenotype Ontology (HPO) terms. The overlap of these results with the SIDER database of known drug side effects was up to 10 times higher than random, depending on the target type, disease database, and score threshold used. In terms of overlap with drug-phenotype pairs extracted from the literature, our methodology performed up to 17.47 times better than random. The top results include phenotype-drug associations that represent intended effects, particularly for cancers such as chronic myelogenous leukemia, which was linked with nilotinib. They also include adverse side effects, such as blurred vision being linked with tetracaine. Discussion This work represents an important advance in our understanding of how drugs cause intended effects and adverse side effects through their action on disease-causing genes, and has potential applications for drug development and repositioning.
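The tripartite network links a drug to a phenotype through the targets they share. Drug-target edges come from one source and target-phenotype edges from disease databases. A minimal sketch follows, assuming toy dictionaries and a deliberately simple score: the number of targets connecting a drug-phenotype pair. The abstract does not give the authors' actual scoring or thresholds. A second helper computes fold enrichment over random, the kind of comparison reported against SIDER and literature-derived pairs. All drug, target, and phenotype names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (drug, phenotype)


def drug_phenotype_scores(drug_targets: Dict[str, Set[str]],
                          target_phenotypes: Dict[str, Set[str]]) -> Counter:
    """Score each drug-phenotype pair by the number of targets connecting them."""
    scores: Counter = Counter()
    for drug, targets in drug_targets.items():
        for target in targets:
            for pheno in target_phenotypes.get(target, set()):
                scores[(drug, pheno)] += 1
    return scores


def fold_enrichment(predicted: Set[Pair], known: Set[Pair], n_possible_pairs: int) -> float:
    """Precision of the predictions divided by the precision expected from random pairs."""
    if not predicted or not known:
        return float("nan")
    observed = len(predicted & known) / len(predicted)
    expected = len(known) / n_possible_pairs
    return observed / expected


if __name__ == "__main__":
    drug_targets = {"drug_a": {"T1", "T2"}, "drug_b": {"T2"}}
    target_phenotypes = {"T1": {"P1"}, "T2": {"P1", "P2"}}
    scores = drug_phenotype_scores(drug_targets, target_phenotypes)
    predicted = {pair for pair, s in scores.items() if s >= 2}  # hypothetical score threshold
    known = {("drug_a", "P1"), ("drug_b", "P2")}                # stand-in for SIDER pairs
    print(scores, fold_enrichment(predicted, known, n_possible_pairs=4))
```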
Affiliation(s)
- Elena Díaz-Santiago
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- Jesús Pérez-García
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- Raquel Yahyaoui
  - Laboratory of Inherited Metabolic Diseases and Newborn Screening, Malaga Regional University Hospital, Malaga, Spain
  - Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Malaga, Spain
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Florencio Pazos
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
- James R. Perkins
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
  - Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Malaga, Spain
  - CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
- Juan A. G. Ranea
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
  - Instituto de Investigación Biomédica de Málaga y Plataforma en Nanomedicina-IBIMA Plataforma BIONAND, Malaga, Spain
  - CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
  - Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Instituto de Salud Carlos III (ISCIII), Madrid, Spain
8
Danis D, Bamshad MJ, Bridges Y, Caballero-Oteyza A, Cacheiro P, Carmody LC, Chimirri L, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Hansen P, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, Robinson PN. A corpus of GA4GH phenopackets: Case-level phenotyping for genomic diagnostics and discovery. HGG Adv 2025; 6:100371. [PMID: 39394689 PMCID: PMC11564936 DOI: 10.1016/j.xhgg.2024.100371] [Received: 05/29/2024] [Revised: 10/04/2024] [Accepted: 10/04/2024] [Indexed: 10/14/2024] [Open Access]
Abstract
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present Phenopacket Store. Phenopacket Store v.0.1.19 includes 6,668 phenopackets representing 475 Mendelian and chromosomal diseases associated with 423 genes and 3,834 unique pathogenic alleles curated from 959 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
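A phenopacket bundles a subject, observed and excluded phenotypic features, and diagnoses into one record. The sketch below is a hand-written record that loosely follows the JSON field names of Phenopacket Schema v2 (`id`, `subject`, `phenotypicFeatures`, `excluded`, `diseases`, `metaData`). It is simplified: `metaData`, for instance, omits fields the real schema requires. It has not been validated against the schema, and the case is illustrative rather than taken from Phenopacket Store. The helper `observed_hpo_ids` is hypothetical.

```python
import json
from typing import List

# Simplified, hand-written example; not validated against the official schema.
phenopacket = {
    "id": "example-case-1",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0004322", "label": "Short stature"}},
        {"type": {"id": "HP:0001631", "label": "Atrial septal defect"}},
        {"type": {"id": "HP:0001250", "label": "Seizure"}, "excluded": True},
    ],
    "diseases": [{"term": {"id": "OMIM:163950", "label": "Noonan syndrome 1"}}],
    "metaData": {"phenopacketSchemaVersion": "2.0"},
}


def observed_hpo_ids(packet: dict) -> List[str]:
    """HPO IDs of features that were observed, i.e. not marked as excluded."""
    return [f["type"]["id"] for f in packet.get("phenotypicFeatures", [])
            if not f.get("excluded", False)]


if __name__ == "__main__":
    print(json.dumps(phenopacket, indent=2))
    print(observed_hpo_ids(phenopacket))
```

Recording excluded features explicitly, as the schema allows, matters for phenotype-driven diagnostics: the absence of a feature can count against a candidate disease.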
Affiliation(s)
- Daniel Danis
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Michael J Bamshad
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA
  - Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
- Yasemin Bridges
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Andrés Caballero-Oteyza
  - Clinic for Immunology and Rheumatology, Hanover Medical School, Hanover, Germany
  - RESiST-Cluster of Excellence 2155, Hanover Medical School, Hanover, Germany
- Pilar Cacheiro
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C Carmody
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Leonardo Chimirri
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Jessica X Chong
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
  - Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Raymond Dalgleish
  - Department of Genetics, Genomics and Cancer Sciences, University of Leicester, Leicester, UK
- Peter J Freeman
  - Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- Adam S L Graefe
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
  - SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
  - Telethon Kids Institute, Nedlands, WA 6009, Australia
- Peter Hansen
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Adam Klocperk
  - Department of Immunology, 2nd Faculty of Medicine, Charles University and University Hospital in Motol, Prague, Czech Republic
- Maaike Kusters
  - Department of Paediatric Immunology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
  - University College London Institute of Child Health, London, UK
- Markus S Ladewig
  - Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- Allison J Marcello
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Teresa Mattina
  - Medical Genetics, University of Catania, Catania, Italy
  - Morgagni Foundation and Clinic, Catania, Italy
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Filip Rehburg
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bárbara C S Reis
  - Department of Allergy and Immunology, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
  - High Complexity Laboratory, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- Catharina Schuetz
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - German Center for Child and Adolescent Health (DZKJ), partner site Leipzig/Dresden, Dresden, Germany
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Timmy Strauss
  - Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sylvia Thun
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kyran Wissink
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Utrecht University, Utrecht, the Netherlands
- John F Wagstaff
  - Department of Genetics, Genomics and Cancer Sciences, University of Leicester, Leicester, UK
- David Zocche
  - North West Thames Regional Genetics Service, Northwick Park & St Mark's Hospitals, London, UK
- Peter N Robinson
  - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
  - ELLIS (European Laboratory for Learning and Intelligent Systems)
9
Bradshaw MS, Gibbs C, Martin S, Firman T, Gaskell A, Fosdick B, Layer R. Hypothesis generation for rare and undiagnosed diseases through clustering and classifying time-versioned biological ontologies. PLoS One 2024; 19:e0309205. [PMID: 39724242 DOI: 10.1371/journal.pone.0309205] [Received: 12/13/2023] [Accepted: 08/06/2024] [Indexed: 12/28/2024] [Open Access]
Abstract
Rare diseases affect 1 in 10 people in the United States, and despite increased genetic testing, up to half never receive a diagnosis. Even when advanced genome sequencing platforms are used to discover variants, the patient will remain undiagnosed if the literature connects none of the variants found in their genome to their phenotypes. When a direct variant-phenotype connection is not known, placing a patient's information in the larger context of phenotype relationships and protein-protein interactions may provide an indirect explanation. Databases such as STRING contain millions of protein-protein interactions, and the Human Phenotype Ontology (HPO) contains the relations of thousands of phenotypes. By integrating these networks and clustering the entities within them, we can potentially discover latent gene-to-phenotype connections. The historical records of STRING and HPO provide a unique opportunity to create a network time series for evaluating cluster significance. Working with Children's Hospital Colorado, we have generated promising hypotheses about latent gene-to-phenotype connections for 38 patients. We also provide potential answers for 14 patients listed on MyGene2. Clusters that our tool finds significant harbor 2.35 to 8.72 times as many gene-to-phenotype edges inferred from known drug interactions as clusters found to be insignificant. Our tool, BOCC, is available as a web app and a command line tool.
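Because STRING and HPO are versioned, a cluster found in an older release can be checked against a newer one. Gene-to-phenotype edges that appear only in the later release and fall inside the cluster support it. A minimal sketch of this counting step follows, assuming clusters are plain sets of node names and edges are unordered pairs. The abstract does not describe BOCC's actual significance test, and the helper names and node names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def as_edges(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Store edges as unordered pairs so (a, b) and (b, a) compare equal."""
    return {frozenset(p) for p in pairs}


def new_edges_within(cluster: Set[str], old_edges: Set[Edge], new_edges: Set[Edge]) -> int:
    """Count edges added between the old and new release whose endpoints both lie in the cluster."""
    added = new_edges - old_edges
    return sum(1 for e in added if e <= cluster)


if __name__ == "__main__":
    old = as_edges([("geneA", "geneB")])
    new = as_edges([("geneA", "geneB"), ("phenoX", "geneA"), ("geneC", "phenoX")])
    print(new_edges_within({"geneA", "geneB", "phenoX"}, old, new))
```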
Affiliation(s)
- Michael S Bradshaw
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
- Connor Gibbs
- Department of Statistics, Colorado State University, Fort Collins, CO, United States of America
- Skylar Martin
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
- Taylor Firman
- Precision Medicine Institute, Children's Hospital Colorado, Aurora, CO, United States of America
- Alisa Gaskell
- Precision Medicine Institute, Children's Hospital Colorado, Aurora, CO, United States of America
- Bailey Fosdick
- Department of Biostatistics & Informatics, Colorado School of Public Health, Aurora, CO, United States of America
- Ryan Layer
- Department of Computer Science, University of Colorado Boulder, Boulder, CO, United States of America
10
Stear BJ, Mohseni Ahooyi T, Simmons JA, Kollar C, Hartman L, Beigel K, Lahiri A, Vasisht S, Callahan TJ, Nemarich CM, Silverstein JC, Taylor DM. Petagraph: A large-scale unifying knowledge graph framework for integrating biomolecular and biomedical data. Sci Data 2024; 11:1338. [PMID: 39695169 PMCID: PMC11655564 DOI: 10.1038/s41597-024-04070-w] [Received: 05/24/2024] [Accepted: 11/04/2024] [Indexed: 12/20/2024]
Abstract
Over the past decade, there has been substantial growth in both the quantity and complexity of available biomedical data. In order to more efficiently harness this extensive data and alleviate challenges associated with integration of multi-omics data, we developed Petagraph, a biomedical knowledge graph that encompasses over 32 million nodes and 118 million relationships. Petagraph leverages more than 180 ontologies and standards in the Unified Biomedical Knowledge Graph (UBKG) to embed millions of quantitative genomics data points. Petagraph provides a cohesive data environment that enables users to efficiently analyze, annotate, and discern relationships within and across complex multi-omics datasets supported by UBKG's annotation scaffold. We demonstrate how queries on Petagraph can generate meaningful results across various research contexts and use cases.
Affiliation(s)
- Benjamin J Stear
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Taha Mohseni Ahooyi
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- J Alan Simmons
- Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
- Charles Kollar
- Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
- Lance Hartman
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Katherine Beigel
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Aditya Lahiri
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Shubha Vasisht
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Campus, New York, NY, USA
- Christopher M Nemarich
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan C Silverstein
- Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
- Deanne M Taylor
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman Medical School, Philadelphia, PA, USA
11
Wang Z, Yuan Y, Wang Z, Zhang W, Chen C, Duan Z, Peng S, Zheng J, He Y, Yang X. CancerPro: deciphering the pan-cancer prognostic landscape through combinatorial enrichment analysis and knowledge network insights. NAR Genom Bioinform 2024; 6:lqae157. [PMID: 39633722 PMCID: PMC11616677 DOI: 10.1093/nargab/lqae157] [Received: 03/25/2024] [Revised: 08/26/2024] [Accepted: 10/30/2024] [Indexed: 12/07/2024]
Abstract
Gene expression levels serve as valuable markers for assessing prognosis in cancer patients. To understand the mechanisms underlying prognosis and explore potential therapeutics across diverse cancers, we developed CancerPro (https://medcode.link/cancerpro). This knowledge network platform integrates comprehensive biomedical data on genes, drugs, diseases and pathways, along with their interactions. By integrating ontology and knowledge graph technologies, CancerPro offers a user-friendly interface for analyzing pan-cancer prognostic markers and exploring genes or drugs of interest. CancerPro implements three core functions: gene set enrichment analysis based on multiple annotations; in-depth drug analysis; and in-depth gene list analysis. Using CancerPro, we categorized genes and cancers into distinct groups and used network analysis to identify key biological pathways associated with unfavorable prognostic genes. The platform further pinpoints potential drug targets and explores potential links between prognostic markers and patient characteristics such as glutathione levels and obesity. For renal and prostate cancer, CancerPro identified risk genes linked to immune deficiency pathways and alternative splicing abnormalities. This research highlights CancerPro's potential as a valuable tool for researchers to explore pan-cancer prognostic markers and uncover novel therapeutic avenues. Its flexible tools support a wide range of biological investigations, making it a versatile asset in cancer research and beyond.
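Enrichment against an annotation set is often done as an over-representation test with the hypergeometric distribution. The abstract does not say which statistic CancerPro uses, so the following is only a minimal Python sketch of that common approach, not the platform's implementation. The gene and term names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def overrepresentation_p(query: set[str], term_genes: set[str], background: set[str]) -> float:
    """One-sided p-value that a gene list overlaps an annotation term at least as much as observed."""
    query = query & background  # only genes that could have been drawn count
    term_genes = term_genes & background
    k = len(query & term_genes)
    return hypergeom_sf(k, len(background), len(term_genes), len(query))


# Hypothetical 10-gene background with a 5-gene annotation term.
background = {f"G{i}" for i in range(10)}
term = {f"G{i}" for i in range(5)}
print(overrepresentation_p({"G0", "G1"}, term, background))  # 10/45, about 0.222
```

When many terms are tested at once, a multiple-testing correction such as Benjamini-Hochberg would normally be applied to the resulting p-values.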
Affiliation(s)
- Zhigang Wang
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
- Yize Yuan
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
- Zhe Wang
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
- Wenjia Zhang
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
- Chong Chen
- Department of Immunology, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
- Zhaojun Duan
- Department of Immunology, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
- Suyuan Peng
- Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Jie Zheng
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Xiaolin Yang
- Department of Biomedical Engineering, Institute of Basic Medical Sciences Chinese Academy of Medical Sciences, School of Basic Medicine Peking Union Medical College, Beijing 100005, China
12
Lin Z, Wang S, Cao Y, Lin J, Sun A, Huang W, Zhou J, Hong Q. Bioinformatics and validation reveal the potential target of curcumin in the treatment of diabetic peripheral neuropathy. Neuropharmacology 2024; 260:110131. [PMID: 39179172 DOI: 10.1016/j.neuropharm.2024.110131] [Received: 07/15/2023] [Revised: 08/18/2024] [Accepted: 08/20/2024] [Indexed: 08/26/2024]
Abstract
Diabetic peripheral neuropathy (DPN) is a common nerve-damaging complication of diabetes mellitus. Effective treatments are needed to alleviate and reverse diabetes-associated damage to the peripheral nerves. Curcumin is an effective neuroprotectant that plays a protective role in DPN driven by Schwann cell (SC) lesions. However, the molecular mechanism underlying this effect remains unclear. We therefore aimed to characterize the molecular mechanism of curcumin-mediated SC repair in order to improve the efficacy of curcumin in the clinical treatment of DPN. First, candidate target genes of curcumin in high-glucose-stimulated RSC96 cells (a rat SC line) were identified by RNA sequencing and bioinformatic analyses. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were carried out with Metascape, and eight algorithms in Cytoscape were then applied to identify four hub genes, namely Hmox1, Pten, Vegfa and Myc. Next, gene set enrichment analysis (GSEA) and Pearson correlation analysis showed that Hmox1 was significantly correlated with apoptosis. Subsequently, qRT-PCR, MTT assay, flow cytometry, caspase-3 activity detection and western blotting showed that, under high-glucose conditions, curcumin treatment increased RSC96 cell viability, reduced apoptosis, increased Hmox1, Pten, Vegfa and Myc expression, and up-regulated Akt phosphorylation. Finally, molecular docking predicted the binding site of curcumin on Hmox1. These results suggest that curcumin can reduce high-glucose-induced apoptosis of SCs and that Hmox1 is a potential target of curcumin. Our findings provide new insights into the mechanism of action of curcumin on SCs as a potential treatment for DPN.
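This abstract reports a Pearson-based analysis linking Hmox1 to apoptosis. The following minimal Python sketch shows how the Pearson coefficient is computed. It assumes one numeric value per sample for a gene's expression and for an apoptosis score, and the variable names and values are hypothetical. It omits the significance test that such an analysis would also need and is not the authors' pipeline.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and variances, all unnormalized; the 1/n factors cancel in the ratio.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined (division by zero) if either vector is constant


# Hypothetical per-sample values: a gene's expression vs an apoptosis score.
expression = [1.0, 2.0, 3.0, 4.0]
apoptosis_score = [2.1, 3.9, 6.2, 7.8]
print(round(pearson_r(expression, apoptosis_score), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.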
Affiliation(s)
- Ziqiang Lin
- Department of Anesthesiology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 Dade Road, Yuexiu District, Guangzhou, Guangdong, 510000, China; Department of Anesthesiology, The Third Affiliated Hospital of Southern Medical University, No. 183 Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong, 510000, China
- Suo Wang
- Department of Anesthesiology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 Dade Road, Yuexiu District, Guangzhou, Guangdong, 510000, China
- Yu Cao
- Department of Anesthesiology, The Third Affiliated Hospital of Southern Medical University, No. 183 Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong, 510000, China
- Jialing Lin
- Department of Anesthesiology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 Dade Road, Yuexiu District, Guangzhou, Guangdong, 510000, China
- Ailing Sun
- Department of Anesthesiology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 Dade Road, Yuexiu District, Guangzhou, Guangdong, 510000, China
- Wei Huang
- Department of Anesthesiology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 Dade Road, Yuexiu District, Guangzhou, Guangdong, 510000, China
- Jun Zhou
- Department of Anesthesiology, The Third Affiliated Hospital of Southern Medical University, No. 183 Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong, 510000, China
- Qingxiong Hong
- Department of Anesthesiology, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, No. 111 Dade Road, Yuexiu District, Guangzhou, Guangdong, 510000, China
13
Reese JT, Chimirri L, Bridges Y, Danis D, Caufield JH, Wissink K, McMurry JA, Graefe ASL, Casiraghi E, Valentini G, Jacobsen JOB, Haendel M, Smedley D, Mungall CJ, Robinson PN. Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools. medRxiv 2024:2024.07.22.24310816. [PMID: 39108510 PMCID: PMC11302616 DOI: 10.1101/2024.07.22.24310816] [Indexed: 08/12/2024]
Abstract
Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5,213 case reports using the Phenopacket Schema, the Human Phenotype Ontology and Mondo disease ontology. Prompts generated from each phenopacket were sent to three generative pretrained transformer (GPT) models. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. While the performance of LLMs for supporting differential diagnosis has been improving, it has not reached the level of commonly used traditional bioinformatics tools. Future research is needed to determine the best approach to incorporate LLMs into diagnostic pipelines.
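The headline comparison here (correct diagnosis ranked first in 23.6% vs 35.5% of cases) is a top-1 accuracy. The following minimal Python sketch computes top-k accuracy. It assumes each tool's output has already been mapped to shared disease identifiers, for example Mondo IDs. That mapping is the hard step the abstract attributes to unstructured LLM responses, and it is not shown here. The identifiers below are hypothetical placeholders.

```python
def top_k_accuracy(rankings: list[list[str]], truths: list[str], k: int = 1) -> float:
    """Fraction of cases whose correct diagnosis appears among the first k ranked candidates."""
    if len(rankings) != len(truths) or not truths:
        raise ValueError("rankings and truths must be non-empty and of equal length")
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical normalized disease identifiers for three cases.
llm_rankings = [["DISEASE_A", "DISEASE_B"], ["DISEASE_C", "DISEASE_A"], ["DISEASE_B"]]
correct = ["DISEASE_A", "DISEASE_A", "DISEASE_D"]
print(top_k_accuracy(llm_rankings, correct, k=1))  # 0.3333333333333333
print(top_k_accuracy(llm_rankings, correct, k=2))  # 0.6666666666666666
```

A case with no ranked output still counts in the denominator, so tools that fail to answer are penalized rather than silently dropped.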
Affiliation(s)
- Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monarch Initiative
- Leonardo Chimirri
- Monarch Initiative
- Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
- Yasemin Bridges
- Monarch Initiative
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Daniel Danis
- Monarch Initiative
- Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
- J Harry Caufield
- Monarch Initiative
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Kyran Wissink
- Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
- Julie A McMurry
- Monarch Initiative
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Adam SL Graefe
- Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
- Elena Casiraghi
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
- ELLIS-European Laboratory for Learning and Intelligent Systems
- Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milano, Italy
- ELLIS-European Laboratory for Learning and Intelligent Systems
- Julius OB Jacobsen
- Monarch Initiative
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Melissa Haendel
- Monarch Initiative
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Damian Smedley
- Monarch Initiative
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Christopher J Mungall
- Monarch Initiative
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
- Monarch Initiative
- Berlin Institute of Health at Charite Universitaetsmedizin Berlin, Berlin, Germany
- ELLIS-European Laboratory for Learning and Intelligent Systems
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
14
Yates TM, Ansari M, Thompson L, Hunt SE, Uhalte EC, Hobson RJ, Marsh JA, Wright CF, Firth HV. Curating genomic disease-gene relationships with Gene2Phenotype (G2P). Genome Med 2024; 16:127. [PMID: 39506859 PMCID: PMC11539801 DOI: 10.1186/s13073-024-01398-1] [Received: 03/01/2024] [Accepted: 10/21/2024] [Indexed: 11/08/2024]
Abstract
Genetically determined disorders are highly heterogeneous in clinical presentation and underlying molecular mechanism. The evidence underpinning these conditions in the peer-reviewed literature requires robust critical evaluation for diagnostic use. Here, we present a structured curation process for Gene2Phenotype (G2P). This process draws on multiple lines of clinical, bioinformatic and functional evidence. It utilises and extends existing terminologies, allows the molecular basis of disease to be precisely defined, and allows confidence levels to be attributed to a given gene-disease assertion. In-depth disease curation using this process will prove useful in applications including diagnostics, research and the development of targeted therapeutics. G2P: www.ebi.ac.uk/gene2phenotype
Affiliation(s)
- T Michael Yates
- School of Informatics, University of Edinburgh, Edinburgh, UK
- West of Scotland Clinical Genetics Service, Queen Elizabeth University Hospital, Glasgow, UK
- Morad Ansari
- South East Scotland Genetic Service, Western General Hospital, Edinburgh, UK
- Louise Thompson
- South East Scotland Genetic Service, Western General Hospital, Edinburgh, UK
- Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Elena Cibrian Uhalte
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Rachel J Hobson
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Caroline F Wright
- Institute of Clinical and Biomedical Clinical Sciences, University of Exeter, Exeter, UK
- Helen V Firth
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
15
Standley A, Xie J, Lau AW, Grote L, Gifford AJ. Working with Miraculous Mice: Mus musculus as a Model Organism. Curr Protoc 2024; 4:e70021. [PMID: 39435766 DOI: 10.1002/cpz1.70021] [Indexed: 10/23/2024]
Abstract
The laboratory mouse has been described as a "miracle" model organism, providing a window by which we may gain an understanding of ourselves. Since the first recorded mouse experiment in 1664, the mouse has become the most widely used animal model in biomedical research. Mice are ideally suited as a model organism because of their small size, short gestation period, large litter size, and genetic similarity to humans. This article provides a broad overview of the laboratory mouse as a model organism and is intended for undergraduates and those new to working with mice. We delve into the history of the laboratory mouse and outline important terminology to accurately describe research mice. The types of laboratory mice available to researchers are reviewed, including outbred stocks, inbred strains, immunocompromised mice, and genetically engineered mice. The critical role mice have played in advancing knowledge in the areas of oncology, immunology, and pharmacology is highlighted by examining the significant contribution of mice to Nobel Prize-winning research. International mouse mutagenesis programs and accurate phenotyping of mouse models are outlined. We also explain important considerations for working with mice, including animal ethics; the welfare principles of replacement, refinement, and reduction; and the choice of mouse model in experimental design. Finally, we present practical advice for maintaining a mouse colony, which involves adequate training of staff, the logistics of mouse housing, monitoring colony health, and breeding strategies. Useful resources for working with mice are also listed. The aim of this overview is to equip the reader with a broad appreciation of the enormous potential and some of the complexities of working with the laboratory mouse in a quest to improve human health. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC.
Affiliation(s)
- Anick Standley
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Jinhan Xie
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Angelica Wy Lau
- Garvan Institute of Medical Research, St Vincent's Clinical School, Darlinghurst, NSW, Australia
- Lauren Grote
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Andrew J Gifford
- Children's Cancer Institute, Lowy Cancer Research Centre, UNSW Sydney, Sydney, NSW, Australia
- Anatomical Pathology, NSW Health Pathology, Prince of Wales Hospital, Randwick, NSW, Australia
- School of Clinical Medicine, UNSW Medicine & Health, UNSW Sydney, Sydney, NSW, Australia
16
Slater K, Schofield PN, Wright J, Clift P, Irani A, Bradlow W, Aziz F, Gkoutos GV. Talking about diseases; developing a model of patient and public-prioritised disease phenotypes. NPJ Digit Med 2024; 7:263. [PMID: 39349692 PMCID: PMC11443070 DOI: 10.1038/s41746-024-01257-8] [Received: 12/19/2023] [Accepted: 09/11/2024] [Indexed: 10/04/2024]
Abstract
Deep phenotyping describes the use of standardised terminologies to create comprehensive phenotypic descriptions of biomedical phenomena. These characterisations facilitate secondary analysis, evidence synthesis, and practitioner awareness, thereby guiding patient care. The vast majority of this knowledge is derived from sources that describe an academic understanding of disease, including academic literature and experimental databases. Previous work indicates a gulf between the priorities, perspectives, and perceptions held by different healthcare stakeholders. Using social media data, we develop a phenotype model that represents a public perspective on disease and compare it with a model derived from a combination of existing academic phenotype databases. We identified 52,198 positive disease-phenotype associations from social media across 311 diseases. Across 304 diseases, we further identified 24,618 novel phenotype associations not shared by the biomedical and literature-derived phenotype model, of which we considered 14,531 significant. Manifestations of disease affecting quality of life, and those concerning endocrine, digestive, and reproductive diseases, were over-represented in the social media phenotype model. An expert clinical review found that social media-derived associations were considered as well established as those derived from the literature and were seen significantly more often in patient clinical encounters. The phenotype model recovered from social media presents a significantly different perspective from existing resources derived from biomedical databases and literature, providing a large number of associations novel to the latter dataset. We propose that the integration and interrogation of these public perspectives on disease can inform clinical awareness, improve secondary analysis, and bridge understanding and priorities across healthcare stakeholders.
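Once both phenotype models are represented as sets of (disease, phenotype) pairs, the shared-versus-novel counts reported in this abstract come down to set operations. The following minimal Python sketch works under that assumption. The type, function and identifier names are hypothetical. It does not reproduce the authors' association mining from social media or their significance testing.

```python
from typing import NamedTuple

Association = tuple[str, str]  # (disease, phenotype)


class ModelComparison(NamedTuple):
    shared: set[Association]
    novel: set[Association]
    diseases_with_novel: set[str]


def compare_models(social: set[Association], literature: set[Association]) -> ModelComparison:
    """Split social-media associations into those shared with, or novel relative to, a literature model."""
    novel = social - literature
    return ModelComparison(
        shared=social & literature,
        novel=novel,
        diseases_with_novel={disease for disease, _ in novel},
    )


social = {("D1", "P1"), ("D1", "P2"), ("D2", "P3")}
literature = {("D1", "P1"), ("D3", "P4")}
result = compare_models(social, literature)
print(len(result.novel), sorted(result.diseases_with_novel))  # 2 ['D1', 'D2']
```

This only works if both models use the same disease and phenotype vocabularies (for example, shared ontology terms). Otherwise identical associations would be miscounted as novel.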
Affiliation(s)
- Karin Slater
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Centre for Environmental Research and Justice, University of Birmingham, Birmingham, UK
- Centre for Health Data Science, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Paul N Schofield
- Department of Physiology, Development, and Neuroscience, University of Cambridge, Cambridge, UK
- Paul Clift
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Anushka Irani
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Division of Rheumatology, Mayo Clinic Florida, Jacksonville, FL, USA
- William Bradlow
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Furqan Aziz
- Centre for Health Data Science, University of Birmingham, Birmingham, UK
- School of Computing and Mathematical Sciences, University of Leicester, Leicester, UK
- Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Centre for Environmental Research and Justice, University of Birmingham, Birmingham, UK
- Centre for Health Data Science, University of Birmingham, Birmingham, UK
- University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
17
van Karnebeek CDM, O'Donnell-Luria A, Baynam G, Baudot A, Groza T, Jans JJM, Lassmann T, Letinturier MCV, Montgomery SB, Robinson PN, Sansen S, Mehrian-Shai R, Steward C, Kosaki K, Durao P, Sadikovic B. Leaving no patient behind! Expert recommendation in the use of innovative technologies for diagnosing rare diseases. Orphanet J Rare Dis 2024; 19:357. [PMID: 39334316 PMCID: PMC11438178 DOI: 10.1186/s13023-024-03361-0] [Received: 03/26/2024] [Accepted: 09/11/2024] [Indexed: 09/30/2024]
Abstract
Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed. In response, IRDiRC proposes the establishment of "a globally coordinated diagnostic and research pipeline". To help facilitate this, IRDiRC formed the Task Force on Integrating New Technologies for Rare Disease Diagnosis. This multi-stakeholder Task Force aims to provide an overview of the current state of innovative diagnostic technologies for clinicians and researchers, focusing on the patient's diagnostic journey. Herein, we provide an overview of a broad spectrum of emerging diagnostic technologies involving genomics, epigenomics and multi-omics, functional testing and model systems, data sharing, bioinformatics, and Artificial Intelligence (AI), highlighting their advantages, limitations, and the current state of clinical adoption. We provide expert recommendations outlining the stepwise application of these innovative technologies in diagnostic pathways while considering global differences in accessibility. The importance of FAIR (Findability, Accessibility, Interoperability, and Reusability) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) data management is emphasized, along with the need for enhanced and continuing education in medical genomics. We provide a perspective on future technological developments in genome diagnostics and their integration into clinical practice. Lastly, we summarize the challenges related to genomic diversity and accessibility, highlighting the significance of innovative diagnostic technologies, global collaboration, and equitable access to diagnosis and treatment for people living with rare diseases.
Affiliation(s)
- Clara D M van Karnebeek
- Departments of Pediatrics and Human Genetics, Emma Center for Personalized Medicine, Amsterdam Gastro-Enterology Endocrinology Metabolism, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, USA
- Gareth Baynam
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Anaïs Baudot
- Aix Marseille Univ, INSERM, Marseille Medical Genetics, MMG, Marseille, France
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital and Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
- European Molecular Biology Laboratory (EMBL-EBI), European Bioinformatics Institute, Hinxton, UK
- Judith J M Jans
- Department of Genetics, Section Metabolic Diagnostics, University Medical Center Utrecht, Utrecht, The Netherlands
- Ruty Mehrian-Shai
- Pediatric Brain Cancer Molecular Lab, Sheba Medical Center, Ramat Gan, Israel
- Patricia Durao
- The Cure and Action for Tay-Sachs (CATS) Foundation, Altrincham, UK
- Bekim Sadikovic
- Verspeeten Clinical Genome Centre, London Health Sciences, London, Canada
- Department of Pathology and Laboratory Medicine, Western University, London, Canada
18
Diener C, Thüre K, Engel A, Hart M, Keller A, Meese E, Fischer U. Paving the way to a neural fate - RNA signatures in naive and trans-differentiating mesenchymal stem cells. Eur J Cell Biol 2024; 103:151458. [PMID: 39341198 DOI: 10.1016/j.ejcb.2024.151458] [Received: 04/23/2024] [Revised: 09/18/2024] [Accepted: 09/21/2024] [Indexed: 09/30/2024]
Abstract
Mesenchymal stem cells (MSCs) derived from the embryonic mesoderm persist as a viable source of multipotent cells in adults and have a crucial role in tissue repair. One of the most promising aspects of MSCs is their ability to trans-differentiate into cell types outside of the mesodermal lineage, such as neurons. This characteristic positions MSCs as potential therapeutic tools for neurological disorders. However, the definition of a clear MSC signature is an ongoing topic of debate. Likewise, there is still a significant knowledge gap about functional alterations of MSCs during their transition to a neural fate. In this study, we focus on the dynamic expression of RNA in MSCs undergoing trans-differentiation compared with undifferentiated MSCs. To track and correlate changes in cellular signaling, we conducted high-throughput RNA expression profiling during the early time-course of human MSC neurogenic trans-differentiation. The expression of synapse maturation markers, including NLGN2 and NPTX1, increased during the first 24 h. The expression of neuron differentiation markers, such as GAP43, strongly increased during 48 h of trans-differentiation. The neural stem cell marker NES and neuron differentiation markers, including TUBB3 and ENO1, were highly expressed in MSCs and remained so during trans-differentiation. Pathway analyses revealed early changes in MSC signaling that can be linked to the acquisition of neuronal features. Furthermore, we identified microRNAs (miRNAs) as potential drivers of the cellular trans-differentiation process. We also determined potential risk factors related to the neural trans-differentiation process. These factors include the persistence of stemness features and the expression of factors involved in neurofunctional abnormalities and tumorigenic processes. In conclusion, our findings contribute valuable insights into the intricate landscape of MSCs during neural trans-differentiation. These insights can pave the way for the development of safer treatments for neurological disorders.
Affiliation(s)
- Caroline Diener
- Saarland University (USAAR), Institute of Human Genetics, Homburg 66421, Germany
- Konstantin Thüre
- Saarland University (USAAR), Institute of Human Genetics, Homburg 66421, Germany
- Annika Engel
- Saarland University (USAAR), Chair for Clinical Bioinformatics, Saarbrücken 66123, Germany; Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Center for Infection Research (HZI), Saarland University Campus, Saarbrücken 66123, Germany
- Martin Hart
- Saarland University (USAAR), Institute of Human Genetics, Homburg 66421, Germany
- Andreas Keller
- Saarland University (USAAR), Chair for Clinical Bioinformatics, Saarbrücken 66123, Germany; Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Center for Infection Research (HZI), Saarland University Campus, Saarbrücken 66123, Germany
- Eckart Meese
- Saarland University (USAAR), Institute of Human Genetics, Homburg 66421, Germany
- Ulrike Fischer
- Saarland University (USAAR), Institute of Human Genetics, Homburg 66421, Germany

19
Mazein I, Rougny A, Mazein A, Henkel R, Gütebier L, Michaelis L, Ostaszewski M, Schneider R, Satagopam V, Jensen LJ, Waltemath D, Wodke JAH, Balaur I. Graph databases in systems biology: a systematic review. Brief Bioinform 2024; 25:bbae561. [PMID: 39565895 PMCID: PMC11578065 DOI: 10.1093/bib/bbae561] [Received: 04/18/2024] [Revised: 09/28/2024] [Accepted: 10/21/2024] [Indexed: 11/22/2024]
Abstract
Graph databases are becoming increasingly popular across scientific disciplines, being highly suitable for storing and connecting complex heterogeneous data. In systems biology, they are used as a backend solution for biological data repositories, ontologies, networks, pathways, and knowledge graph databases. In this review, we analyse all publications using or mentioning graph databases retrieved from PubMed and PubMed Central full-text search, focusing on the top 16 available graph databases. Publications are categorized according to their domain and application, focusing on pathway and network biology and relevant ontologies and tools. We detail different approaches and highlight the advantages of outstanding resources, such as UniProtKB, Disease Ontology, and Reactome, which provide graph-based solutions. We discuss ongoing efforts of the systems biology community to standardize and harmonize knowledge graph creation and the maintenance of integrated resources. Outlining prospects, including the use of graph databases as a way of communication between biological data repositories, we conclude that efficient design, querying, and maintenance of graph databases will be key for knowledge generation in systems biology and other research fields with heterogeneous data.
Affiliation(s)
- Ilya Mazein
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
- Adrien Rougny
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
- Alexander Mazein
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
- Ron Henkel
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
- Lea Gütebier
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
- Lea Michaelis
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
- Marek Ostaszewski
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
- Venkata Satagopam
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg
- Lars Juhl Jensen
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Grønnegårdsvej 15, 1870 Frederiksberg C, Denmark
- Dagmar Waltemath
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
- Judith A H Wodke
- Medical Informatics Laboratory, University Medicine Greifswald, Walther-Rathenau-Straße 48, Greifswald 17475, Germany
- Irina Balaur
- Luxembourg Centre for Systems Biology, University of Luxembourg, 6 Avenue du Swing, Belvaux L-4367, Luxembourg

20
Liu Y, Gaunt TR. Triangulating evidence in health sciences with Annotated Semantic Queries. Bioinformatics 2024; 40:btae519. [PMID: 39171832 PMCID: PMC11377847 DOI: 10.1093/bioinformatics/btae519] [Received: 10/08/2023] [Revised: 07/31/2024] [Accepted: 08/19/2024] [Indexed: 08/23/2024]
Abstract
MOTIVATION Integrating information from data sources representing different study designs has the potential to strengthen evidence in population health research. However, this concept of evidence "triangulation" presents a number of challenges for systematically identifying and integrating relevant information. These include the harmonization of heterogeneous evidence with common semantic concepts and properties, as well as the prioritization of the retrieved evidence for triangulation with the question of interest. RESULTS We present Annotated Semantic Queries (ASQ), a natural language query interface to the integrated biomedical entities and epidemiological evidence in EpiGraphDB, which enables users to extract "claims" from a piece of unstructured text and then investigate evidence that could support or contradict the claims, or offer additional information relevant to the query. This approach has the potential to support the rapid review of preprints, grant applications, conference abstracts, and articles submitted for peer review. ASQ implements strategies to harmonize biomedical entities in different taxonomies and evidence from different sources, to facilitate evidence triangulation and interpretation. AVAILABILITY AND IMPLEMENTATION ASQ is openly available at https://asq.epigraphdb.org and its source code is available at https://github.com/mrcieu/epigraphdb-asq under the GPL-3.0 license.
Affiliation(s)
- Yi Liu
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, United Kingdom
- Tom R Gaunt
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, United Kingdom
- NIHR Bristol Biomedical Research Centre, University of Bristol, Bristol, BS8 2BN, United Kingdom

21
Wang S, Bao C, Yang S, Gao C, Lu C, Jiang L, Chen L, Wang Z, Fang H. XGRm: A Web Server for Interpreting Mouse Summary-level Genomic Data. J Mol Biol 2024; 436:168705. [PMID: 39237194 DOI: 10.1016/j.jmb.2024.168705] [Received: 01/31/2024] [Revised: 06/30/2024] [Accepted: 07/09/2024] [Indexed: 09/07/2024]
Abstract
We introduce XGR-model (or XGRm), a web server made accessible at http://www.xgrm.pro, with the aim of meeting the increasing demand for effectively interpreting summary-level genomic data in model organisms. Currently, it hosts two enrichment analysers and two subnetwork analysers to support enrichment and subnetwork analyses for user-input mouse genomic data, whether gene-centric or genomic region-centric. The enrichment analysers identify ontology term enrichments for input genes (GElyser) or for genes linked from input genomic regions (RElyser). The subnetwork analysers rely on our previously established network algorithm to identify gene subnetworks from input gene-centric summary data (GSlyser) or from input region-centric summary data (RSlyser), leveraging network information about either functional interactions or pathway-derived interactions. Collectively, XGRm offers an all-in-one solution for gaining systems biology insights into summary-level genomic data in mice, underpinned by our commitment to regular updates as well as natural extensions to other model organisms.
Affiliation(s)
- Shan Wang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Chaohui Bao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Siyue Yang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China; Faculty of Medical Laboratory Science, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Chenxu Gao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Chang Lu
- MRC Laboratory of Medical Sciences, Imperial College London, Hammersmith Hospital Campus, London W12 0HS, UK
- Lulu Jiang
- Translational Health Sciences, University of Bristol, Bristol BS1 3NY, UK
- Liye Chen
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Zheng Wang
- Medical Center of Hematology, Xinqiao Hospital of Army Medical University, State Key Laboratory of Trauma and Chemical Poisoning, Chongqing Key Laboratory of Hematology and Microenvironment, Chongqing 400037, China; Jinfeng Laboratory, Chongqing 401329, China; Bio-Med Informatics Research Center & Clinical Research Center, The Second Affiliated Hospital, Army Medical University, Chongqing 400037, China
- Hai Fang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China

22
Cavalleri E, Cabri A, Soto-Gomez M, Bonfitto S, Perlasca P, Gliozzo J, Callahan TJ, Reese J, Robinson PN, Casiraghi E, Valentini G, Mesiti M. An ontology-based knowledge graph for representing interactions involving RNA molecules. Sci Data 2024; 11:906. [PMID: 39174566 PMCID: PMC11341713 DOI: 10.1038/s41597-024-03673-7] [Received: 12/22/2023] [Accepted: 07/23/2024] [Indexed: 08/24/2024]
Abstract
The "RNA world" represents a novel frontier for the study of fundamental biological processes and human diseases and is paving the way for the development of new drugs tailored to each patient's biomolecular characteristics. Although scientific data about coding and non-coding RNA molecules are constantly produced and available from public repositories, they are scattered across different databases and a centralized, uniform, and semantically consistent representation of the "RNA world" is still lacking. We propose RNA-KG, a knowledge graph (KG) encompassing biological knowledge about RNAs gathered from more than 60 public databases, integrating functional relationships with genes, proteins, and chemicals and ontologically grounded biomedical concepts. To develop RNA-KG, we first identified, pre-processed, and characterized each data source; next, we built a meta-graph that provides an ontological description of the KG by representing all the bio-molecular entities and medical concepts of interest in this domain, as well as the types of interactions connecting them. Finally, we leveraged an instance-based semantically abstracted knowledge model to specify the ontological alignment according to which RNA-KG was generated. RNA-KG can be downloaded in different formats and also queried through a SPARQL endpoint. A thorough topological analysis of the resulting heterogeneous graph provides further insights into the characteristics of the "RNA world". RNA-KG can be explored and visualized directly, or analyzed with computational methods that infer biomedical knowledge from its heterogeneous nodes and edges. The resource can be easily updated with new experimental data, and specific views of the overall KG can be extracted according to the biomedical problem to be studied.
Affiliation(s)
- Emanuele Cavalleri
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Alberto Cabri
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Mauricio Soto-Gomez
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Sara Bonfitto
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Paolo Perlasca
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Jessica Gliozzo
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Justin Reese
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
- Berlin Institute of Health - Charité, Universitätsmedizin, Berlin, 13353, Germany
- ELLIS, European Laboratory for Learning and Intelligent Systems, Munich, Germany
- Elena Casiraghi
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- ELLIS, European Laboratory for Learning and Intelligent Systems, Munich, Germany
- Giorgio Valentini
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- ELLIS, European Laboratory for Learning and Intelligent Systems, Munich, Germany
- Marco Mesiti
- AnacletoLab, Computer Science Department, University of Milan, Milan, 20133, Italy
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA

23
Yuan H, Mancuso CA, Johnson K, Braasch I, Krishnan A. Computational strategies for cross-species knowledge transfer and translational biomedicine. arXiv 2024:arXiv:2408.08503v1. [PMID: 39184546 PMCID: PMC11343225] [Indexed: 08/27/2024]
Abstract
Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. We introduce the term "agnology" to describe the functional equivalence of molecular components regardless of evolutionary origin, as this concept is becoming pervasive in integrative data-driven models where the role of evolutionary origin can become unclear. Our review addresses four key areas of information and knowledge transfer across species: (1) transferring disease and gene annotation knowledge, (2) identifying agnologous molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying agnologous cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer.
Affiliation(s)
- Hao Yuan
- Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Christopher A. Mancuso
- Department of Biostatistics & Informatics, University of Colorado Anschutz Medical Campus
- Kayla Johnson
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Ingo Braasch
- Department of Integrative Biology; Genetics and Genome Science Program; Ecology, Evolution, and Behavior Program, Michigan State University
- Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus

24
Acs-Szabo L, Papp LA, Miklos I. Understanding the molecular mechanisms of human diseases: the benefits of fission yeasts. Microb Cell 2024; 11:288-311. [PMID: 39104724 PMCID: PMC11299203 DOI: 10.15698/mic2024.08.833] [Received: 03/06/2024] [Revised: 07/04/2024] [Accepted: 07/10/2024] [Indexed: 08/07/2024]
Abstract
The role of model organisms such as yeasts in life science research is crucial. Although baker's yeast (Saccharomyces cerevisiae) is the most popular model among yeasts, the contribution of the fission yeasts (Schizosaccharomyces) to life science is also indisputable. Since both types of yeast share several thousand orthologous genes with humans, they provide a simple research platform to investigate many fundamental molecular mechanisms and functions, thereby contributing to the understanding of the background of human diseases. In this review, we highlight the many advantages of fission yeasts over budding yeasts. As an example of the usefulness of fission yeasts in virus research, we present the most important research results related to the Human Immunodeficiency Virus Type 1 (HIV-1) Vpr protein. The potential role of fission yeasts in the study of prion biology is also discussed. Furthermore, we are keen to promote the emerging model yeast Schizosaccharomyces japonicus, a dimorphic species in the fission yeast genus. We propose the hyphal growth of S. japonicus as an unusual model for studying the invadopodia of human cancer cells, since the two seemingly different cell types can be compared on fundamental features. We also collect the latest laboratory protocols and bioinformatics tools for the fission yeasts to highlight the many possibilities available to the research community. In addition, we present several limiting factors that everyone should be aware of when working with yeast models.
Affiliation(s)
- Lajos Acs-Szabo
- Department of Genetics and Applied Microbiology, Faculty of Science and Technology, University of Debrecen, Debrecen, 4032, Hungary
- Laszlo Attila Papp
- Department of Genetics and Applied Microbiology, Faculty of Science and Technology, University of Debrecen, Debrecen, 4032, Hungary
- Ida Miklos
- Department of Genetics and Applied Microbiology, Faculty of Science and Technology, University of Debrecen, Debrecen, 4032, Hungary

25
Zhao J, Gao J, Ma S, Chen X, Wang J. Predicting the potential risks posed by antidepressants as emerging contaminants in fish based on network pharmacological analysis. Toxicol In Vitro 2024; 99:105872. [PMID: 38851602 DOI: 10.1016/j.tiv.2024.105872] [Received: 01/24/2024] [Revised: 05/23/2024] [Accepted: 06/05/2024] [Indexed: 06/10/2024]
Abstract
This study used a network pharmacology-based analysis to simultaneously identify a broad spectrum of potential environmental risks and health effects of antidepressants, a common class of pharmaceutical emerging contaminants (PECs) with a complex pharmacological profile, and to predict in silico the adverse phenotypes that may occur in fish exposed to antidepressants and their mixtures under realistic exposure scenarios. Results showed that 24 of the 39 antidepressants included had been detected in aquatic environments in 50 countries worldwide. Using the environmentally realistic exposure scenario for China as an example, the blood concentrations of antidepressant residues in exposed fish, predicted with the Fish Plasma Model, ranged from 37.89 ng/L (Alprazolam) to 16,772.05 ng/L (Sertraline). The hazard-based bioactivity network, built without regard to concentration data, comprised 148 potential targets and 701 antidepressant-target interactions. After each antidepressant-target interaction was filtered using the predicted drug concentrations in fish blood under realistic exposure scenarios in China, the refined environmental risk-based network showed that 11 targets, including the muscarinic acetylcholine receptor M1, the alpha-2B adrenergic receptor, and the serotonin 2A receptor, might be modulated in fish by antidepressants and their mixtures at concentrations equal to or below environmental exposure levels. Environmentally relevant concentrations of antidepressants in water samples from China might perturb behavior, stress response, phototaxis, and development in exposed fish.
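The Fish Plasma Model referred to above is usually written as a two-step estimate: a blood:water partition coefficient predicted from the octanol-water partition coefficient K_ow, followed by a steady-state plasma concentration obtained by scaling the water concentration. The form below is the one commonly quoted in the ecotoxicology literature, not an equation stated in this abstract; the coefficients 0.73 and -0.88 are an assumption to check against the study's methods before reuse.

```latex
\log_{10} P_{\mathrm{blood:water}} = 0.73\,\log_{10} K_{\mathrm{ow}} - 0.88
\qquad
C_{\mathrm{plasma}} = C_{\mathrm{water}} \times P_{\mathrm{blood:water}}
```

Under this formulation, a compound with log K_ow = 4 gives log P_blood:water = 0.73 × 4 - 0.88 = 2.04, so a water concentration of 1 ng/L maps to roughly 110 ng/L in plasma.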
Affiliation(s)
- Jinru Zhao
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
- Jian Gao
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
- Sijia Ma
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
- Xintong Chen
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China
- Jun Wang
- Hubei Province Key Laboratory of Occupational Hazard Identification and Control, Medical College, Wuhan University of Science and Technology, Wuhan, China

26
Joachimiak MP, Caufield JH, Harris NL, Kim H, Mungall CJ. Gene Set Summarization Using Large Language Models. arXiv 2024:arXiv:2305.13338v3. [PMID: 37292480 PMCID: PMC10246080] [Indexed: 06/10/2023]
Abstract
Molecular biologists frequently interpret gene lists derived from high-throughput experiments and computational analysis. This is typically done as a statistical enrichment analysis that measures the over- or under-representation of biological function terms associated with genes or their properties, based on curated assertions from a knowledge base (KB) such as the Gene Ontology (GO). Interpreting gene lists can also be framed as a textual summarization task, enabling Large Language Models (LLMs) to use scientific texts directly and avoid reliance on a KB. TALISMAN (Terminological ArtificiaL Intelligence SuMmarization of Annotation and Narratives) uses generative AI to perform gene set function summarization as a complement to standard enrichment analysis. This method can use different sources of gene functional information: (1) structured text derived from curated ontological KB annotations, (2) ontology-free narrative gene summaries, or (3) direct retrieval from the model. We demonstrate that these methods are able to generate plausible and biologically valid summary GO term lists for an input gene set. However, LLM-based approaches are unable to deliver reliable scores or p-values and often return terms that are not statistically significant. Crucially, in our experiments these methods were rarely able to recapitulate the most precise and informative term from standard enrichment analysis. We also observe minor differences depending on prompt input information, with GO term descriptions leading to higher recall but lower precision. Nevertheless, newer LLMs perform statistically significantly better than the oldest model across all performance metrics, suggesting that future models may lead to further improvements. Overall, the results are nondeterministic, with minor variations in prompt resulting in radically different term lists, true to the stochastic nature of LLMs. Our results show that, at this point, LLM-based methods are unsuitable as a replacement for standard term enrichment analysis; however, they may provide summarization benefits for implicit knowledge integration across extant but unstandardized knowledge, for large sets of features, and where the amount of information is difficult for humans to process.
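The "standard enrichment analysis" that TALISMAN is compared against is typically a one-sided hypergeometric (Fisher-type) test per ontology term. The sketch below is a minimal illustration using only the Python standard library, not the TALISMAN code or any specific enrichment tool; the term IDs (`TERM:A`, `TERM:B`) and gene names are hypothetical placeholders, the annotation dictionary stands in for a real GO annotation file, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(gene_set, annotations, background):
    """Return (term, overlap, p-value) tuples sorted by ascending p-value.

    annotations maps a term ID to the set of genes annotated with it;
    only genes inside the background universe are counted.
    """
    query = set(gene_set) & background
    N, n = len(background), len(query)
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        k = len(query & annotated)
        results.append((term, k, hypergeom_sf(k, N, len(annotated), n)))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes and two placeholder terms.
background = {f"G{i}" for i in range(1, 21)}
annotations = {
    "TERM:A": {"G1", "G2", "G3", "G4", "G5"},
    "TERM:B": {f"G{i}" for i in range(6, 16)},
}
for term, overlap, p in enrich({"G1", "G2", "G3", "G4"}, annotations, background):
    print(term, overlap, f"{p:.4g}")
```

With this toy input, TERM:A (all four query genes annotated) gets p = 5/4845, about 0.001, while TERM:B (no overlap) gets p = 1. This explicit p-value is exactly what the abstract notes LLM-based summaries cannot reliably supply.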
Affiliation(s)
- Marcin P Joachimiak
- Biosystems Data Science Department, Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- J Harry Caufield
- Biosystems Data Science Department, Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Nomi L Harris
- Biosystems Data Science Department, Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Christopher J Mungall
- Biosystems Data Science Department, Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA

27
Groza T, Gration D, Baynam G, Robinson PN. FastHPOCR: pragmatic, fast, and accurate concept recognition using the human phenotype ontology. Bioinformatics 2024; 40:btae406. [PMID: 38913850 PMCID: PMC11227366 DOI: 10.1093/bioinformatics/btae406] [Received: 03/25/2024] [Revised: 05/18/2024] [Accepted: 06/19/2024] [Indexed: 06/26/2024]
Abstract
MOTIVATION Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsic resource-intensive nature is not suitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to the basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts and efficient re-analysis of past data. RESULTS We developed a dictionary-based approach that uses a pre-built large collection of clusters of morphologically equivalent tokens to address lexical variability, together with a more effective CR step that restricts entity boundary detection to candidates consisting of tokens belonging to ontology concepts. Our method achieves state-of-the-art results (0.76 F1 on the GSC+ corpus) and processes 10 000 publication abstracts in 5 s. AVAILABILITY AND IMPLEMENTATION FastHPOCR is available as a Python package installable via pip. The source code is available at https://github.com/tudorgroza/fast_hpo_cr. A Java implementation of FastHPOCR will be made available as part of the Fenominal Java library available at https://github.com/monarch-initiative/fenominal. The up-to-date GSC-2024 corpus is available at https://github.com/tudorgroza/code-for-papers/tree/main/gsc-2024.
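As a rough illustration of the two ideas this abstract describes, here is a toy sketch in Python: collapsing morphological variants of a token into one cluster, and only attempting matches over runs of tokens that occur in ontology labels. It is not the FastHPOCR implementation or its API. The three-entry `LEXICON` and the `CLUSTERS` table are hypothetical stand-ins for dictionaries that would be built from the full HPO; the three HPO IDs used (Seizure, Short stature, Intellectual disability) are real terms. Matching within each candidate run is a simple greedy longest match, which is an assumption of this sketch.

```python
import re

# Hypothetical toy lexicon: token tuple -> HPO ID. A real system would build
# this from every HPO label and synonym.
LEXICON = {
    ("seizure",): "HP:0001250",
    ("short", "stature"): "HP:0004322",
    ("intellectual", "disability"): "HP:0001249",
}

# Hypothetical token clusters: surface form -> canonical member of its cluster.
CLUSTERS = {"seizures": "seizure", "disabilities": "disability"}

# Every token that appears in at least one concept label.
VOCAB = {tok for label in LEXICON for tok in label}


def normalize(token: str) -> str:
    t = token.lower()
    return CLUSTERS.get(t, t)


def recognize(text: str) -> list[tuple[str, str]]:
    tokens = [normalize(t) for t in re.findall(r"[A-Za-z]+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        # Candidate run: maximal stretch of tokens that belong to ontology labels.
        j = i
        while j < len(tokens) and tokens[j] in VOCAB:
            j += 1
        if j == i:
            i += 1
            continue
        # Within the run, take the longest lexicon match at each start position.
        k = i
        while k < j:
            for end in range(j, k, -1):
                span = tuple(tokens[k:end])
                if span in LEXICON:
                    hits.append((" ".join(span), LEXICON[span]))
                    k = end
                    break
            else:
                k += 1  # no concept starts at this token; move on
        i = j
    return hits


print(recognize("Patient with recurrent seizures, short stature and intellectual disabilities."))
```

For the example sentence this returns `[('seizure', 'HP:0001250'), ('short stature', 'HP:0004322'), ('intellectual disability', 'HP:0001249')]`. Words such as "patient" and "recurrent" are never considered as entity boundaries because they do not occur in any label, which is the pruning the abstract credits for speed.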
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
- Telethon Kids Institute, Nedlands, WA 6009, Australia
- School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Bentley, WA 6102, Australia
- SingHealth Duke-NUS Institute of Precision Medicine, Singapore 169609, Singapore
- Dylan Gration
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Subiaco, WA 6008, Australia
- Gareth Baynam
- Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
- Telethon Kids Institute, Nedlands, WA 6009, Australia
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Subiaco, WA 6008, Australia
- Faculty of Health and Medical Sciences, University of Western Australia, Crawley, WA 6009, Australia
- Peter N Robinson
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States

28
Montanaro G, Balhoff JP, Girón JC, Söderholm M, Tarasov S. Computable species descriptions and nanopublications: applying ontology-based technologies to dung beetles (Coleoptera, Scarabaeinae). Biodivers Data J 2024; 12:e121562. [PMID: 38912113 PMCID: PMC11190572 DOI: 10.3897/bdj.12.e121562] [Received: 02/23/2024] [Accepted: 05/22/2024] [Indexed: 06/25/2024]
Abstract
Background Taxonomy has long struggled with analysing vast amounts of phenotypic data due to computational and accessibility challenges. Ontology-based technologies provide a framework for modelling semantic phenotypes that are understandable by computers and compliant with FAIR principles. In this paper, we explore the use of Phenoscript, an emerging language designed for creating semantic phenotypes, to produce computable species descriptions. Our case study centers on the application of this approach to dung beetles (Coleoptera, Scarabaeinae). New information We illustrate the effectiveness of Phenoscript for creating semantic phenotypes. We also demonstrate the ability of the Phenospy Python package to automatically translate Phenoscript descriptions into natural language (NL), which eliminates the need for writing traditional NL descriptions. We introduce a computational pipeline that streamlines the generation of semantic descriptions and their conversion to NL. To demonstrate the power of the semantic approach, we apply simple semantic queries to the generated phenotypic descriptions. This paper addresses the current challenges in crafting semantic species descriptions and outlines the path towards future improvements. Furthermore, we discuss the promising integration of semantic phenotypes and nanopublications as emerging methods for sharing scientific information. Overall, our study highlights the pivotal role of ontology-based technologies in modernising taxonomy and aligning it with the evolving landscape of big data analysis and FAIR principles.
Affiliation(s)
- Giulio Montanaro
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
- James P. Balhoff
- RENCI, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Jennifer C. Girón
- Museum of Texas Tech University, Texas, United States of America
- Max Söderholm
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
- Sergei Tarasov
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
29
Danis D, Bamshad MJ, Bridges Y, Cacheiro P, Carmody LC, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, Robinson PN. A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery. medRxiv 2024:2024.05.29.24308104. [PMID: 38854034 PMCID: PMC11160806 DOI: 10.1101/2024.05.29.24308104] [Indexed: 06/11/2024]
Abstract
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
Affiliation(s)
- Daniel Danis
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Michael J Bamshad
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA
- Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
- Yasemin Bridges
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Pilar Cacheiro
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C Carmody
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Jessica X Chong
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle, WA 98195, USA
- Ben Coleman
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Raymond Dalgleish
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Peter J Freeman
- Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- Adam S L Graefe
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
- Telethon Kids Institute, Nedlands, WA 6009, Australia
- Julius O B Jacobsen
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Adam Klocperk
- Department of Immunology, 2nd Faculty of Medicine, Charles University and University Hospital in Motol, Prague, Czech Republic
- Maaike Kusters
- Department of Paediatric Immunology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- University College London Institute of Child Health, London, United Kingdom
- Markus S Ladewig
- Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- Anthony J Marcello
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Teresa Mattina
- Medical Genetics, University of Catania, Italy
- Morgagni Foundation and Clinic, Catania, Italy
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monica C Munoz-Torres
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Filip Rehburg
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bárbara C S Reis
- Department of Immunology, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- High Complexity Laboratory, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- Catharina Schuetz
- Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Timmy Strauss
- Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sylvia Thun
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kyran Wissink
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Utrecht University, Utrecht, the Netherlands
- David Zocche
- North West Thames Regional Genetics Service, Northwick Park & St Mark's Hospitals, London, UK
- Peter N Robinson
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- ELLIS-European Laboratory for Learning and Intelligent Systems
30
Rutherford KM, Lera-Ramírez M, Wood V. PomBase: a Global Core Biodata Resource-growth, collaboration, and sustainability. Genetics 2024; 227:iyae007. [PMID: 38376816 PMCID: PMC11075564 DOI: 10.1093/genetics/iyae007] [Received: 11/28/2023] [Accepted: 01/13/2024] [Indexed: 02/21/2024]
Abstract
PomBase (https://www.pombase.org), the model organism database (MOD) for fission yeast, was recently awarded Global Core Biodata Resource (GCBR) status by the Global Biodata Coalition (GBC; https://globalbiodata.org/) after a rigorous selection process. In this MOD review, we present PomBase's continuing growth and improvement over the last 2 years. We describe these improvements in the context of the qualitative GCBR indicators related to scientific quality, comprehensivity, accelerating science, user stories, and collaborations with other biodata resources. This review also showcases the depth of existing connections both within the biocuration ecosystem and between PomBase and its user community.
Affiliation(s)
- Kim M Rutherford
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Manuel Lera-Ramírez
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
31
Orlic-Milacic M, Rothfels K, Matthews L, Wright A, Jassal B, Shamovsky V, Trinh Q, Gillespie ME, Sevilla C, Tiwari K, Ragueneau E, Gong C, Stephan R, May B, Haw R, Weiser J, Beavers D, Conley P, Hermjakob H, Stein LD, D’Eustachio P, Wu G. Pathway-based, reaction-specific annotation of disease variants for elucidation of molecular phenotypes. Database (Oxford) 2024; 2024:baae031. [PMID: 38713862 PMCID: PMC11184451 DOI: 10.1093/database/baae031] [Received: 11/08/2023] [Revised: 02/23/2024] [Accepted: 04/01/2024] [Indexed: 05/09/2024]
Abstract
Germline and somatic mutations can give rise to proteins with altered activity, including both gain- and loss-of-function. The effects of these variants can be captured in disease-specific reactions and pathways that highlight the resulting changes to normal biology. A disease reaction is defined as an aberrant reaction in which a variant protein participates. A disease pathway is defined as a pathway that contains a disease reaction. Annotation of disease variants as participants of disease reactions and disease pathways can provide a standardized overview of molecular phenotypes of pathogenic variants that is amenable to computational mining and mathematical modeling. Reactome (https://reactome.org/), an open source, manually curated, peer-reviewed database of human biological pathways, in addition to providing annotations for >11 000 unique human proteins in the context of ∼15 000 wild-type reactions within more than 2000 wild-type pathways, also provides annotations for >4000 disease variants of close to 400 genes as participants of ∼800 disease reactions in the context of ∼400 disease pathways. Functional annotation of disease variants proceeds from normal gene functions, described in wild-type reactions and pathways, through disease variants whose divergence from normal molecular behaviors has been experimentally verified, to extrapolation from molecular phenotypes of characterized variants to variants of unknown significance using criteria of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Reactome's data model enables mapping of disease variant datasets to specific disease reactions within disease pathways, providing a platform to infer pathway output impacts of numerous human disease variants and model organism orthologs, complementing computational predictions of variant pathogenicity. Database URL: https://reactome.org/.
Affiliation(s)
- Marija Orlic-Milacic
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Karen Rothfels
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Lisa Matthews
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, 550 First Avenue, New York, NY 10016, USA
- Adam Wright
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Bijay Jassal
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Veronica Shamovsky
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, 550 First Avenue, New York, NY 10016, USA
- Quang Trinh
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Marc E Gillespie
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- College of Pharmacy and Health Sciences, St. John’s University, 8000 Utopia Parkway, Queens, NY 11439, USA
- Cristoffer Sevilla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Krishna Tiwari
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Eliot Ragueneau
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Chuqiao Gong
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Ralf Stephan
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Institute for Globally Distributed Open Research and Education (IGDORE)
- Bruce May
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Robin Haw
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Joel Weiser
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Deidre Beavers
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Rd., Portland, OR 97239, USA
- Patrick Conley
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Rd., Portland, OR 97239, USA
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
- Lincoln D Stein
- Adaptive Oncology, Ontario Institute for Cancer Research, 661 University Avenue Suite 510, Toronto, ON M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto, 1 King’s College Circle, Room 4386, Toronto, ON M5S 1A8, Canada
- Peter D’Eustachio
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, 550 First Avenue, New York, NY 10016, USA
- Guanming Wu
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, 3181 S.W. Sam Jackson Park Rd., Portland, OR 97239, USA
32
Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024; 40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024]
Abstract
MOTIVATION Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common argument is that more disease variants still await discovery, or that the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods are based on known gene-disease or gene-phenotype associations as training data and are applicable only to genes with associated phenotypes, thereby limiting their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability. RESULTS We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.
AVAILABILITY AND IMPLEMENTATION EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
Affiliation(s)
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
- Fernando Zhapa-Camacho
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
- SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence, King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia
33
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024]
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Ignacio J Tripodi
- Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
- Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
- Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
- Berlin Institute of Health at Charité - Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
- AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
- Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
- Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
- Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
- Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Lawrence E Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
34
Novoa J, López-Ibáñez J, Chagoyen M, Ranea JAG, Pazos F. CoMentG: comprehensive retrieval of generic relationships between biomedical concepts from the scientific literature. Database (Oxford) 2024; 2024:baae025. [PMID: 38564426 PMCID: PMC10986793 DOI: 10.1093/database/baae025] [Received: 11/03/2023] [Revised: 03/01/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024]
Abstract
The CoMentG resource contains millions of relationships between terms of biomedical interest obtained from the scientific literature. At the core of the system is a methodology for detecting significant co-mentions of concepts in the entire PubMed corpus. That method was applied to nine sets of terms covering the most important classes of biomedical concepts: diseases, symptoms/clinical signs, molecular functions, biological processes, cellular compartments, anatomic parts, cell types, bacteria and chemical compounds. We obtained more than 7 million relationships between more than 74 000 terms, and many types of relationships were not available in any other resource. As the terms were obtained from widely used resources and ontologies, the relationships are given using the standard identifiers provided by them and hence can be linked to other data. A web interface allows users to browse these associations, searching for relationships for a set of terms of interest provided as input, such as between a disease and its associated symptoms, underlying molecular processes or affected tissues. The results are presented in an interactive interface where the user can explore the reported relationships in different ways and follow links to other resources. Database URL: https://csbg.cnb.csic.es/CoMentG/.
Affiliation(s)
- Jorge Novoa
- Computational Systems Biology, National Center for Biotechnology (CNB-CSIC), c/ Darwin 3, Madrid 28049, Spain
- Javier López-Ibáñez
- Computational Systems Biology, National Center for Biotechnology (CNB-CSIC), c/ Darwin 3, Madrid 28049, Spain
- Mónica Chagoyen
- Computational Systems Biology, National Center for Biotechnology (CNB-CSIC), c/ Darwin 3, Madrid 28049, Spain
- Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Málaga, Avda. Cervantes 2, Málaga 29071, Spain
- CIBER de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Institute of Biomedical Research in Malaga and platform of nanomedicine (IBIMA platform BIONAND), Malaga 29071, Spain
- Spanish National Bioinformatics Institute (INB/ELIXIR-ES), Barcelona 08034, Spain
- Florencio Pazos
- Computational Systems Biology, National Center for Biotechnology (CNB-CSIC), c/ Darwin 3, Madrid 28049, Spain
35
Rosenberg FM, Kamali Z, Voorberg AN, Oude Munnink TH, van der Most PJ, Snieder H, Vaez A, Schuttelaar MLA. Transcriptomics- and Genomics-Guided Drug Repurposing for the Treatment of Vesicular Hand Eczema. Pharmaceutics 2024; 16:476. [PMID: 38675137 PMCID: PMC11054470 DOI: 10.3390/pharmaceutics16040476] [Received: 02/19/2024] [Revised: 03/22/2024] [Accepted: 03/26/2024] [Indexed: 04/28/2024]
Abstract
Vesicular hand eczema (VHE), a clinical subtype of hand eczema (HE), showed limited responsiveness to alitretinoin, the only approved systemic treatment for severe chronic HE. This emphasizes the need for alternative treatment approaches. Therefore, our study aimed to identify drug repurposing opportunities for VHE using transcriptomics and genomics data. We constructed a gene network by combining 52 differentially expressed genes (DEGs) from a VHE transcriptomics study with 3 quantitative trait locus (QTL) genes associated with HE. Through network analysis, clustering, and functional enrichment analyses, we investigated the underlying biological mechanisms of this network. Next, we leveraged drug-gene interactions and retrieved pharmaco-transcriptomics data from the DrugBank database to identify drug repurposing opportunities for (V)HE. We developed a drug ranking system, primarily based on efficacy, safety, and practical and pricing factors, to select the most promising drug repurposing candidates. Our results revealed that the (V)HE network comprised 78 genes that yielded several biological pathways underlying the disease. The drug-gene interaction search together with pharmaco-transcriptomics lookups revealed 123 unique drug repurposing opportunities. Based on our drug ranking system, our study identified the most promising drug repurposing opportunities (e.g., vitamin D analogues, retinoids, and immunomodulating drugs) that might be effective in treating (V)HE.
Affiliation(s)
- Fieke M. Rosenberg
- Department of Dermatology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Zoha Kamali
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Department of Bioinformatics, School of Advanced Medical Technologies, Isfahan University of Medical Sciences, Isfahan P.O. Box 81746-7346, Iran
- Angelique N. Voorberg
- Department of Dermatology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Thijs H. Oude Munnink
- Department of Clinical Pharmacy and Pharmacology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Peter J. van der Most
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Harold Snieder
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Ahmad Vaez
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
- Department of Bioinformatics, School of Advanced Medical Technologies, Isfahan University of Medical Sciences, Isfahan P.O. Box 81746-7346, Iran
- Marie L. A. Schuttelaar
- Department of Dermatology, University Medical Center Groningen, University of Groningen, 9713 GZ Groningen, The Netherlands
36
Gravel B, Renaux A, Papadimitriou S, Smits G, Nowé A, Lenaerts T. Prioritization of oligogenic variant combinations in whole exomes. Bioinformatics 2024; 40:btae184. [PMID: 38603604 PMCID: PMC11037482 DOI: 10.1093/bioinformatics/btae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 01/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. RESULTS We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores. AVAILABILITY AND IMPLEMENTATION Hop is available at https://github.com/oligogenic/HOP.
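Hop's headline result is a top-20 hit rate. This is the share of exomes in which the known pathogenic combination ranks among the first 20 candidates. The sketch below shows how such a rate can be computed once each exome has a ranked candidate list. It does not reproduce Hop's scoring, and the function names and toy variant labels are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# A digenic combination as an order-independent pair of variant labels.
Combination = FrozenSet[str]


def combo(a: str, b: str) -> Combination:
    """Build an order-independent pair of (hypothetical) variant labels."""
    return frozenset({a, b})


def top_k_hit_rate(
    rankings: Sequence[Sequence[Combination]],
    positives: Sequence[Combination],
    k: int = 20,
) -> float:
    """Fraction of exomes whose known causal combination is ranked in the top k.

    rankings[i] lists the candidate combinations for exome i, best first;
    positives[i] is the known pathogenic combination for that exome.
    """
    if len(rankings) != len(positives):
        raise ValueError("need exactly one known positive per ranked list")
    if not rankings:
        raise ValueError("no exomes given")
    hits = sum(1 for ranked, target in zip(rankings, positives) if target in ranked[:k])
    return hits / len(rankings)


# Toy data: three synthetic "exomes" with made-up variant labels.
rankings: List[List[Combination]] = [
    [combo("v1", "v2"), combo("v3", "v4")],  # causal pair ranked 1st
    [combo("v5", "v6"), combo("v1", "v7")],  # causal pair ranked 2nd
    [combo("v8", "v9")],                     # causal pair not ranked at all
]
positives = [combo("v2", "v1"), combo("v1", "v7"), combo("v2", "v3")]

print(top_k_hit_rate(rankings, positives, k=1))
print(top_k_hit_rate(rankings, positives, k=2))
```

With k=1 only the first exome counts as a hit. With k=2 the second exome counts too, so the rate rises from one third to two thirds. Using frozensets means a pair is matched regardless of the order in which its two variants are listed.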
Affiliation(s)
- Barbara Gravel
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Sofia Papadimitriou
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Brussels Interuniversity Genomics High Throughput core (BRIGHTcore), UZ Brussel, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), 1090 Brussels, Belgium
- Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Center of Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, 1070 Brussels, Belgium
- Ann Nowé
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
37
Zeng F, Wade A, Harbert K, Patel S, Holley JS, Dehghanpuor CK, Hopwood T, Marino S, Sophocleous A, Idris AI. Classical cannabinoid receptors as target in cancer-induced bone pain: a systematic review, meta-analysis and bioinformatics validation. Sci Rep 2024; 14:5782. [PMID: 38461339 PMCID: PMC10924854 DOI: 10.1038/s41598-024-56220-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 03/04/2024] [Indexed: 03/11/2024]
Abstract
To test the hypothesis that genetic and pharmacological modulation of the classical cannabinoid type 1 (CB1) and type 2 (CB2) receptors attenuates cancer-induced bone pain, we searched Medline, Web of Science and Scopus for relevant skeletal and non-skeletal cancer studies from inception to July 28, 2022. We identified 29 animal and 35 human studies. In mice, a meta-analysis of pooled studies showed that treatment of osteolysis-bearing males with the endocannabinoids AEA and 2-AG (mean difference [MD] -24.83, 95% confidence interval [95%CI] -34.89 to -14.76, p < 0.00001) or with the synthetic cannabinoid (CB) agonists ACPA, WIN55,212-2, CP55,940 (CB1/2-non-selective) and AM1241 (CB2-selective) (MD -28.73, 95%CI -45.43 to -12.02, p = 0.0008) is associated with a significant reduction in paw withdrawal frequency. Consistently, the synthetic agonists AM1241 and JWH015 (CB2-selective) increased paw withdrawal threshold (MD 0.89, 95%CI 0.79 to 0.99, p < 0.00001), and ACEA (CB1-selective), AM1241 and JWH015 (CB2-selective) reduced spontaneous flinches (MD -4.85, 95%CI -6.74 to -2.96, p < 0.00001) in osteolysis-bearing male mice. In rats, a significant increase in paw withdrawal threshold is associated with the administration of ACEA (CB1-selective), WIN55,212-2 (CB1/2-non-selective), JWH015 and AM1241 (CB2-selective) in osteolysis-bearing females (MD 8.18, 95%CI 6.14 to 10.21, p < 0.00001), and treatment with AM1241 (CB2-selective) increased paw withdrawal thermal latency in males (MD 3.94, 95%CI 2.13 to 5.75, p < 0.0001), confirming the analgesic capabilities of CB1/2 ligands in rodents. In humans, treatment of cancer patients with medical cannabis (standardized MD -0.19, 95%CI -0.35 to -0.02, p = 0.03) and with plant-derived delta-9-THC (20 mg) (MD 3.29, CI 2.24 to 4.33, p < 0.00001) or its synthetic derivative NIB (4 mg) (MD 2.55, 95%CI 1.58 to 3.51, p < 0.00001) is associated with a reduction in pain intensity.
In the bioinformatics validation, KEGG, GO and MPO pathway, function and process enrichment analyses of mouse, rat and human data revealed that CB1 and CB2 receptors are enriched in a range of nociceptive and sensory perception, inflammatory, immune-modulatory, and cancer pathways. Thus, we cautiously conclude that pharmacological modulators of CB1/2 receptors show promise in the treatment of cancer-induced bone pain; however, further assessment of their effects on bone pain in genetically engineered animal models and in cancer patients is warranted.
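The pooled estimates above are reported as mean differences with 95% confidence intervals. The abstract does not state whether a fixed-effect or random-effects model was used. The sketch below therefore assumes the simplest case: fixed-effect inverse-variance pooling of raw mean differences.

Under that assumption, each study's weight is the reciprocal of the variance of its mean difference. The pooled estimate is then the weighted average of the study MDs. The `Study` fields and the two studies are made-up inputs, not data from the review.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


@dataclass(frozen=True)
class Study:
    """Arm-level summary statistics for one two-arm study (made-up values)."""
    mean_treat: float
    sd_treat: float
    n_treat: int
    mean_ctrl: float
    sd_ctrl: float
    n_ctrl: int

    @property
    def mean_difference(self) -> float:
        return self.mean_treat - self.mean_ctrl

    @property
    def variance(self) -> float:
        # Variance of a difference between two independent group means.
        return self.sd_treat ** 2 / self.n_treat + self.sd_ctrl ** 2 / self.n_ctrl


def pool_fixed_effect(studies: Sequence[Study]) -> Tuple[float, Tuple[float, float], float]:
    """Inverse-variance fixed-effect pooled MD, its 95% CI, and a two-sided p-value."""
    if not studies:
        raise ValueError("need at least one study")
    weights = [1.0 / s.variance for s in studies]
    total = sum(weights)
    md = sum(w * s.mean_difference for w, s in zip(weights, studies)) / total
    se = math.sqrt(1.0 / total)
    z = md / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return md, (md - Z_975 * se, md + Z_975 * se), p


# Two hypothetical studies of a pain outcome; the second is larger, so it weighs more.
studies: List[Study] = [
    Study(mean_treat=10.0, sd_treat=2.0, n_treat=4, mean_ctrl=12.0, sd_ctrl=2.0, n_ctrl=4),
    Study(mean_treat=9.0, sd_treat=3.0, n_treat=18, mean_ctrl=13.0, sd_ctrl=3.0, n_ctrl=18),
]
md, (ci_low, ci_high), p = pool_fixed_effect(studies)
print(f"MD {md:.2f}, 95%CI {ci_low:.2f} to {ci_high:.2f}, p = {p:.2g}")
```

Random-effects models, such as DerSimonian-Laird, add a between-study variance term to each study's variance before weighting. The standardized MD reported for medical cannabis would also need each study's MD divided by a pooled SD. Neither variant is shown here.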
Affiliation(s)
- Feier Zeng
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Abbie Wade
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Kade Harbert
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Shrina Patel
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Joshua S Holley
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Cornelia K Dehghanpuor
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Thomas Hopwood
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
- Silvia Marino
- Department of Physiology and Cell Biology, University of Arkansas for Medical Sciences (UAMS), BioMed II, 238-2, Little Rock, AR, USA
- Antonia Sophocleous
- Department of Life Sciences, School of Sciences, European University Cyprus, 6 Diogenes Street, 1516, Nicosia, Cyprus
- Aymen I Idris
- Department of Oncology and Metabolism, University of Sheffield, Medical School, Beech Hill Road, Sheffield, S10 2RX, UK
38
Ermshaus A, Piechotta M, Rüter G, Keilholz U, Leser U, Benary M. preon: Fast and accurate entity normalization for drug names and cancer types in precision oncology. Bioinformatics 2024; 40:btae085. [PMID: 38383060 PMCID: PMC10918631 DOI: 10.1093/bioinformatics/btae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 01/15/2024] [Accepted: 02/20/2024] [Indexed: 02/23/2024]
Abstract
MOTIVATION In precision oncology (PO), clinicians aim to find the best treatment for any patient based on their molecular characterization. A major bottleneck is the manual annotation and evaluation of individual variants, which usually requires screening a range of knowledge bases. To integrate the vast amount of information held in different databases, fast and accurate methods for harmonizing databases with different types of information are necessary. An essential step for harmonization in PO is the normalization of tumor entities as well as of therapy options for patients. SUMMARY preon is a fast and accurate library for the normalization of drug names and cancer types in large-scale data integration. AVAILABILITY AND IMPLEMENTATION preon is implemented in Python and freely available via the PyPI repository. Source code and the data underlying this article are available in GitHub at https://github.com/ermshaua/preon/.
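Entity normalization maps the many spellings of a drug or cancer type onto one canonical identifier. The minimal sketch below uses a common dictionary-based design: clean the string, try an exact synonym lookup, and fall back to fuzzy matching via Python's `difflib`. It is a hypothetical illustration and is neither the preon API nor preon's algorithm. The concept IDs, synonym lists and the 0.85 cutoff are assumptions.

```python
import difflib
import re
from typing import Dict, List, Optional


def canonical_form(name: str) -> str:
    """Lowercase, replace punctuation/symbols with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


class SynonymNormalizer:
    """Map free-text drug or cancer-type names to concept IDs (hypothetical class).

    Illustrates dictionary-based entity normalization; not the preon API.
    """

    def __init__(self, synonyms: Dict[str, List[str]], cutoff: float = 0.85) -> None:
        self.cutoff = cutoff
        self.lookup: Dict[str, str] = {}
        for concept_id, names in synonyms.items():
            for name in names:
                self.lookup[canonical_form(name)] = concept_id

    def normalize(self, query: str) -> Optional[str]:
        key = canonical_form(query)
        # 1) Exact match on the cleaned string: a constant-time dictionary lookup.
        if key in self.lookup:
            return self.lookup[key]
        # 2) Fuzzy fallback: the closest stored synonym above the similarity cutoff.
        close = difflib.get_close_matches(key, list(self.lookup), n=1, cutoff=self.cutoff)
        return self.lookup[close[0]] if close else None


# Hypothetical concept IDs and synonym lists, for illustration only.
normalizer = SynonymNormalizer({
    "DRUG:imatinib": ["imatinib", "Gleevec", "STI-571"],
    "CANCER:nsclc": ["non-small cell lung cancer", "NSCLC"],
})
for query in ["Gleevec®", "STI571", "Non-Small-Cell Lung Cancer", "aspirin"]:
    print(query, "->", normalizer.normalize(query))
```

Trying the exact lookup first keeps the common case fast. The fuzzy step only runs for spellings the dictionary has not seen, such as "STI571" against the stored "STI-571". A query with no close synonym returns `None` rather than a forced match.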
Affiliation(s)
- Arik Ermshaus
- Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Michael Piechotta
- Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Gina Rüter
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, Berlin 10115, Germany
- Ulrich Keilholz
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, Berlin 10115, Germany
- Ulf Leser
- Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin 10099, Germany
- Manuela Benary
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, Berlin 10115, Germany
- Core Unit Bioinformatics (CUBI), Berlin Institute of Health, Charité – Universitätsmedizin Berlin, Berlin 10115, Germany
39
Margiotta-Casaluci L, Owen SF, Winter MJ. Cross-Species Extrapolation of Biological Data to Guide the Environmental Safety Assessment of Pharmaceuticals-The State of the Art and Future Priorities. Environ Toxicol Chem 2024; 43:513-525. [PMID: 37067359 DOI: 10.1002/etc.5634] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/21/2022] [Revised: 03/23/2023] [Accepted: 04/13/2023] [Indexed: 05/27/2023]
Abstract
The extrapolation of biological data across species is a key aspect of biomedical research and drug development. In this context, comparative biology considerations are applied with the goal of understanding human disease and guiding the development of effective and safe medicines. However, the widespread occurrence of pharmaceuticals in the environment and the need to assess the risk posed to wildlife have prompted a renewed interest in the extrapolation of pharmacological and toxicological data across the entire tree of life. To address this challenge, a biological "read-across" approach, based on the use of mammalian data to inform toxicity predictions in wildlife species, has been proposed as an effective way to streamline the environmental safety assessment of pharmaceuticals. Yet, how effective has this approach been, and are we any closer to being able to accurately predict environmental risk based on known human risk? We discuss the main theoretical and experimental advancements achieved in the last 10 years of research in this field. We propose that a better understanding of the functional conservation of drug targets across species and of the quantitative relationship between target modulation and adverse effects should be considered as future research priorities. This pharmacodynamic focus should be complemented with the application of higher-throughput experimental and computational approaches to accelerate the prediction of internal exposure dynamics. The translation of comparative (eco)toxicology research into real-world applications, however, relies on the (limited) availability of experts with the skill set needed to navigate the complexity of the problem; hence, we also call for synergistic multistakeholder efforts to support and strengthen comparative toxicology research and education at a global level. Environ Toxicol Chem 2024;43:513-525. © 2023 The Authors. 
Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.
Affiliation(s)
- Luigi Margiotta-Casaluci
- Institute of Pharmaceutical Science, Faculty of Life Sciences & Medicine, King's College London, London, United Kingdom
- Stewart F Owen
- Global Sustainability, AstraZeneca, Macclesfield, Cheshire, United Kingdom
- Matthew J Winter
- Biosciences, Faculty of Health and Life Sciences, University of Exeter, Exeter, Devon, United Kingdom
40
Yan Z, Ge F, Liu Y, Zhang Y, Li F, Song J, Yu DJ. TransEFVP: A Two-Stage Approach for the Prediction of Human Pathogenic Variants Based on Protein Sequence Embedding Fusion. J Chem Inf Model 2024; 64:1407-1418. [PMID: 38334115 DOI: 10.1021/acs.jcim.3c02019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024]
Abstract
Studying the effect of single amino acid variations (SAVs) on protein structure and function is integral to advancing our understanding of molecular processes, evolutionary biology, and disease mechanisms. Screening for deleterious variants is one of the crucial issues in precision medicine. Here, we propose a novel computational approach, TransEFVP, based on large-scale protein language model embeddings and a transformer-based neural network to predict disease-associated SAVs. The model adopts a two-stage architecture: the first stage fuses different feature embeddings through a transformer encoder, and in the second stage, a support vector machine model quantifies the pathogenicity of SAVs after dimensionality reduction. On blind test data, TransEFVP achieves a Matthews correlation coefficient of 0.751, an F1-score of 0.846, and an area under the receiver operating characteristic curve of 0.871, outperforming existing state-of-the-art methods. The benchmark results demonstrate that TransEFVP can be explored as an accurate and effective SAV pathogenicity prediction method. The data and code for TransEFVP are available at https://github.com/yzh9607/TransEFVP/tree/master for academic use.
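The three reported metrics are standard for binary classification. MCC and F1 are computed from the confusion matrix at a decision threshold, while AUROC is threshold-free and uses the raw scores. A minimal sketch of how each is derived from labels and scores on a held-out set is shown below. The toy data and the 0.5 threshold are assumptions, and this is not TransEFVP's evaluation code.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = pathogenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; uses all four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count one half.

    Pairwise O(P*N) version for clarity; rank-based formulas scale better.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (not TransEFVP output); 0.5 is an assumed threshold.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"MCC={mcc(tp, tn, fp, fn):.3f} F1={f1(tp, fp, fn):.3f} AUROC={auroc(y_true, scores):.3f}")
```

F1 ignores true negatives, whereas MCC uses all four cells. This is why MCC is often preferred when pathogenic and benign variants are imbalanced in a benchmark.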
Affiliation(s)
- Zihao Yan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan Road, Nanjing 210023, PR China
- Yan Liu
- Department of Computer Science, Yangzhou University, Yangzhou 225100, PR China
- Yumeng Zhang
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Fuyi Li
- South Australian immunoGENomics Cancer Institute (SAiGENCI), Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia 5005, Australia
- The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Victoria 3000, Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
41
Wyrwoll MJ, van der Heijden GW, Krausz C, Aston KI, Kliesch S, McLachlan R, Ramos L, Conrad DF, O'Bryan MK, Veltman JA, Tüttelmann F. Improved phenotypic classification of male infertility to promote discovery of genetic causes. Nat Rev Urol 2024; 21:91-101. [PMID: 37723288 DOI: 10.1038/s41585-023-00816-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/16/2023] [Indexed: 09/20/2023]
Abstract
An increasing number of genes are being described in the context of non-syndromic male infertility. Linking the underlying genetic causes of non-syndromic male infertility with clinical data from patients is important to establish new genotype-phenotype correlations. This process can be facilitated by using a universal nomenclature, but no standardized vocabulary is available in the field of non-syndromic male infertility. The International Male Infertility Genomics Consortium aimed to fill this gap by providing a standardized vocabulary containing nomenclature based on the Human Phenotype Ontology (HPO). The "HPO tree" was substantially revised compared with the previous version and is based on the clinical work-up of infertile men, including physical examination and hormonal assessment. Some causes of male infertility can already be suspected based on the patient's clinical history, whereas in other instances, a testicular biopsy is needed for diagnosis. We assembled 49 HPO terms that are linked in a logical hierarchy and showed examples of morphological features of spermatozoa and testicular histology of infertile men with identified genetic diagnoses to describe the phenotypes. This work will help to record patients' phenotypes systematically and facilitate communication between geneticists and andrologists. Collaboration across institutions will improve the identification of patients with the same phenotypes, which will promote the discovery of novel genetic causes for non-syndromic male infertility.
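Linking terms in a logical hierarchy lets annotations be compared at different levels of specificity. Under the usual ontology convention, a patient annotated with a specific term also carries all of that term's ancestors. A minimal sketch of this upward propagation is shown below, assuming a toy child-to-parent map. The labels are illustrative only and do not reproduce the actual HPO tree or its term IDs.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Toy is_a hierarchy (child -> parents). Labels are illustrative only and
# do not reproduce the actual HPO tree or its term IDs.
PARENTS: Dict[str, List[str]] = {
    "Male infertility": [],
    "Azoospermia": ["Male infertility"],
    "Oligozoospermia": ["Male infertility"],
    "Non-obstructive azoospermia": ["Azoospermia"],
    "Obstructive azoospermia": ["Azoospermia"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` via is_a edges (breadth-first), excluding itself."""
    seen: Set[str] = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def propagate(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Expand a patient's annotations so that each term also implies its ancestors."""
    closed: Set[str] = set()
    for term in terms:
        closed.add(term)
        closed |= ancestors(term, parents)
    return closed


patient_a = propagate(["Non-obstructive azoospermia"], PARENTS)
patient_b = propagate(["Obstructive azoospermia"], PARENTS)
patient_c = propagate(["Oligozoospermia"], PARENTS)
print(sorted(patient_a & patient_b))
print(sorted(patient_a & patient_c))
```

Two patients annotated with different specific terms still match at their shared ancestors. This is one way a common hierarchical vocabulary helps group patients with related phenotypes across institutions.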
Affiliation(s)
- Margot J Wyrwoll
- Institute of Reproductive Genetics, University of Münster, Münster, Germany
- Csilla Krausz
- Department of Biomedical, Experimental and Clinical Sciences "Mario Serio", University of Florence, University Hospital of Careggi (AOUC), Florence, Italy
- Kenneth I Aston
- Andrology and IVF Laboratory, Department of Surgery (Urology), University of Utah, Salt Lake City, UT, USA
- Sabine Kliesch
- Centre of Reproductive Medicine and Andrology, Department of Clinical and Surgical Andrology, University of Münster, Münster, Germany
- Robert McLachlan
- Department of Clinical Research, Hudson Institute of Medical Research, Melbourne, Victoria, Australia
- Liliana Ramos
- Department of Obstetrics and Gynecology, Radboud University Medical Center, Nijmegen, Netherlands
- Donald F Conrad
- Department of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR, USA
- Moira K O'Bryan
- School of BioSciences and Bio21 Institute, The University of Melbourne, Parkville, Victoria, Australia
- Joris A Veltman
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Frank Tüttelmann
- Institute of Reproductive Genetics, University of Münster, Münster, Germany
42
Groza T, Caufield H, Gration D, Baynam G, Haendel MA, Robinson PN, Mungall CJ, Reese JT. An evaluation of GPT models for phenotype concept recognition. BMC Med Inform Decis Mak 2024; 24:30. [PMID: 38297371 PMCID: PMC10829255 DOI: 10.1186/s12911-024-02439-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 01/24/2024] [Indexed: 02/02/2024]
Abstract
OBJECTIVE Clinical deep phenotyping and phenotype annotation play a critical role both in the diagnosis of patients with rare disorders and in building computationally tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (usually supported by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift towards large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation. MATERIALS AND METHODS The experimental setup of the study included seven prompts of various levels of specificity, two GPT models (gpt-3.5-turbo and gpt-4.0) and two established gold standard corpora for phenotype recognition, one consisting of publication abstracts and the other of clinical observations. RESULTS The best run, using in-context learning, achieved a document-level F1 score of 0.58 on publication abstracts and of 0.75 on clinical observations, as well as a mention-level F1 score of 0.7, which surpasses the current best-in-class tool. Without in-context learning, however, performance is significantly below that of existing approaches. CONCLUSION Our experiments show that gpt-4.0 surpasses state-of-the-art performance if the task is constrained to a subset of the target ontology where there is prior knowledge of the terms that are expected to be matched. While the results are promising, the non-deterministic nature of the outcomes, the high cost and the lack of concordance between different runs using the same prompt and input make the use of these LLMs challenging for this particular task.
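The study reports both document-level and mention-level F1. At document level, each document is reduced to the set of distinct concepts it contains, and predicted sets are compared with gold sets. A minimal sketch is shown below. It assumes micro-averaging over documents, because the abstract does not say which averaging was used, and it uses placeholder concept IDs.

```python
from typing import Sequence, Set, Tuple


def document_level_prf(
    gold: Sequence[Set[str]], predicted: Sequence[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-document concept sets.

    Each document is reduced to the set of distinct concept IDs it mentions,
    so where and how often a concept appears in the text is ignored.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # concepts found in both
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Placeholder concept IDs standing in for ontology terms.
gold = [{"T1", "T2"}, {"T3"}]
predicted = [{"T1", "T4"}, {"T3"}]
print(document_level_prf(gold, predicted))
```

A mention-level score would instead match each predicted text span and its concept against the annotated spans. It therefore also penalizes missed or mislocated repeat mentions, which the set-based document score ignores.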
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Telethon Kids Institute, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Kent St, Bentley, WA, 6102, Australia
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore, 169609, Singapore
- Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Dylan Gration
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA, 6008, Australia
- Gareth Baynam
- Rare Care Centre, Perth Children's Hospital, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Telethon Kids Institute, 15 Hospital Avenue, Nedlands, WA, 6009, Australia
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, 374 Bagot Road, Subiaco, WA, 6008, Australia
- Faculty of Health and Medical Sciences, University of Western Australia, 35 Stirling Hwy, Crawley, WA, 6009, Australia
- Melissa A Haendel
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT, 06032, USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
43
Tu KJ, Roy SK, Keepers Z, Gartia MR, Shukla HD, Biswal NC. Docetaxel radiosensitizes castration-resistant prostate cancer by downregulating CAV-1. Int J Radiat Biol 2024; 100:256-267. [PMID: 37747697 DOI: 10.1080/09553002.2023.2263553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 09/18/2023] [Indexed: 09/26/2023]
Abstract
PURPOSE Docetaxel (DXL), a noted radiosensitizer, is one of the few chemotherapy drugs approved for castration-resistant prostate cancer (CRPC), though only a fraction of CRPCs respond to it. CAV-1, a critical regulator of radioresistance, is known to modulate DXL and radiation effects. Combining DXL with radiotherapy may create a synergistic anticancer effect through CAV-1 and improve CRPC patients' response to therapy. Here, we investigate the effectiveness and molecular characteristics of DXL and radiation combination therapy in vitro. MATERIALS AND METHODS We used live/dead assays to determine the IC50 of DXL for PC3, DU-145, and TRAMP-C1 cells. Colony formation assays were used to determine the radioresponse of the same cells treated with radiation (4, 8, and 12 Gy) with or without DXL at its IC50. We performed gene expression analysis on public transcriptomic data collected from human-derived prostate cancer cell lines (C4-2, PC3, DU-145, and LNCaP) treated with DXL for 8, 16, and 72 hours. Cell cycle arrest and protein expression were assessed using flow cytometry and western blot, respectively. RESULTS Compared to radiation alone, combination therapy with DXL significantly increased CRPC cell death in PC3 (1.48-fold, p < .0001), DU-145 (1.64-fold, p < .05), and TRAMP-C1 (1.13-fold, p < .05) cells at 4 Gy of radiation. Gene expression analysis of CRPC cells treated with DXL revealed downregulation of genes related to cell cycle regulation and upregulation of genes related to immune activation and oxidative stress. Confirming these results, G2/M cell cycle arrest was significantly increased after treatment with DXL and radiation. CAV-1 protein expression decreased after DXL treatment in a dose-dependent manner; furthermore, CAV-1 copy number was strongly associated with poor response to therapy in CRPC patients. CONCLUSIONS Our results suggest that DXL sensitizes CRPC cells to radiation by downregulating CAV-1. 
DXL + radiation combination therapy may be effective at treating CRPC, especially subtypes associated with high CAV-1 expression, and should be studied further.
Affiliation(s)
- Kevin J Tu
- Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
- Sanjit K Roy
- Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, MD, USA
- Zachery Keepers
- Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, MD, USA
- Manas R Gartia
- Department of Mechanical and Industrial Engineering, Louisiana State University, Baton Rouge, LA, USA
- Hem D Shukla
- Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, MD, USA
- Nrusingh C Biswal
- Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, MD, USA
44
Gargano MA, Matentzoglu N, Coleman B, Addo-Lartey EB, Anagnostopoulos A, Anderton J, Avillach P, Bagley AM, Bakštein E, Balhoff JP, Baynam G, Bello SM, Berk M, Bertram H, Bishop S, Blau H, Bodenstein DF, Botas P, Boztug K, Čady J, Callahan TJ, Cameron R, Carbon S, Castellanos F, Caufield JH, Chan LE, Chute C, Cruz-Rojo J, Dahan-Oliel N, Davids JR, de Dieuleveult M, de Souza V, de Vries BBA, de Vries E, DePaulo JR, Derfalvi B, Dhombres F, Diaz-Byrd C, Dingemans AJM, Donadille B, Duyzend M, Elfeky R, Essaid S, Fabrizzi C, Fico G, Firth HV, Freudenberg-Hua Y, Fullerton JM, Gabriel DL, Gilmour K, Giordano J, Goes FS, Moses RG, Green I, Griese M, Groza T, Gu W, Guthrie J, Gyori B, Hamosh A, Hanauer M, Hanušová K, He Y(O, Hegde H, Helbig I, Holasová K, Hoyt CT, Huang S, Hurwitz E, Jacobsen JOB, Jiang X, Joseph L, Keramatian K, King B, Knoflach K, Koolen DA, Kraus M, Kroll C, Kusters M, Ladewig MS, Lagorce D, Lai MC, Lapunzina P, Laraway B, Lewis-Smith D, Li X, Lucano C, Majd M, Marazita ML, Martinez-Glez V, McHenry TH, McInnis MG, McMurry JA, Mihulová M, Millett CE, Mitchell PB, Moslerová V, Narutomi K, Nematollahi S, Nevado J, Nierenberg AA, Čajbiková NN, Nurnberger JI, Ogishima S, Olson D, Ortiz A, Pachajoa H, Perez de Nanclares G, Peters A, Putman T, Rapp CK, Rath A, Reese J, Rekerle L, Roberts A, Roy S, Sanders SJ, Schuetz C, Schulte EC, Schulze TG, Schwarz M, Scott K, Seelow D, Seitz B, Shen Y, Similuk MN, Simon ES, Singh B, Smedley D, Smith CL, Smolinsky JT, Sperry S, Stafford E, Stefancsik R, Steinhaus R, Strawbridge R, Sundaramurthi JC, Talapova P, Tenorio Castano JA, Tesner P, Thomas RH, Thurm A, Turnovec M, van Gijn ME, Vasilevsky NA, Vlčková M, Walden A, Wang K, Wapner R, Ware JS, Wiafe AA, Wiafe SA, Wiggins LD, Williams AE, Wu C, Wyrwoll MJ, Xiong H, Yalin N, Yamamoto Y, Yatham LN, Yocum AK, Young AH, Yüksel Z, Zandi PP, Zankl A, Zarante I, Zvolský M, Toro S, Carmody LC, Harris NL, Munoz-Torres MC, Danis D, Mungall CJ, Köhler S, Haendel MA, Robinson PN. The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Res 2024; 52:D1333-D1346. [PMID: 37953324 PMCID: PMC10767975 DOI: 10.1093/nar/gkad1005] [Citation(s) in RCA: 75] [Impact Index Per Article: 75.0] [Received: 09/13/2023] [Revised: 10/12/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023]
Abstract
The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2,239 new HPO terms and 49,235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.
Affiliation(s)
- Ben Coleman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joel Anderton
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Anita M Bagley
- Shriners Children's Northern California, Sacramento, CA, USA
- Eduard Bakštein
- National Institute of Mental Health, Klecany, Czech Republic
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Gareth Baynam
- Rare Care Centre, Perth Children's Hospital, Perth, Australia
- Michael Berk
- Deakin University, IMPACT - the Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Barwon Health, Geelong, Australia
- Holli Bertram
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Somer Bishop
- Department of Psychiatry and Behavioral Sciences, UCSF Weil Institute for Neuroscience, San Francisco, CA, USA
- Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David F Bodenstein
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada
- Kaan Boztug
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Jolana Čady
- Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Seth J Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- J Harry Caufield
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD 21287, USA
- Jaime Cruz-Rojo
- UDISGEN (Dysmorphology and Genetics Unit), 12 de Octubre Hospital, Madrid, Spain
- Noémi Dahan-Oliel
- Department of Clinical Research, Shriners Hospitals for Children, Montreal, Quebec, Canada
- Jon R Davids
- Shriners Children's Northern California, Sacramento, CA, USA
- Maud de Dieuleveult
- Département I&D, AP-HP, Banque Nationale de Données Maladies Rares, Paris, France
- Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Bert B A de Vries
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- J Raymond DePaulo
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Beata Derfalvi
- Department of Pediatrics, Dalhousie University, Halifax, NS, Canada
- Ferdinand Dhombres
- Fetal Medicine Department, Armand Trousseau Hospital, Sorbonne University, GRC26, INSERM, Limics, Paris, France
- Claudia Diaz-Byrd
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Alexander J M Dingemans
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Bruno Donadille
- St Antoine Hospital, Reference Center for Rare Growth Endocrine Disorders, Sorbonne University, AP-HP, INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Reem Elfeky
- Department of Immunology, GOS Hospital for Children NHS Foundation Trust, University College London, London, UK
- Shahim Essaid
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Giovanna Fico
- Bipolar and Depressive Disorders Unit, Institute of Neuroscience, Hospital Clinic, University of Barcelona, IDIBAPS, CIBERSAM, Barcelona, Catalonia, Spain
- Helen V Firth
- Addenbrooke's Hospital, Cambridge University Hospitals, Cambridge, UK
- Yun Freudenberg-Hua
- Department of Psychiatry, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Davera L Gabriel
- School of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
- Jessica Giordano
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- Fernando S Goes
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rachel Gore Moses
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Ian Green
- SNOMED International, London W2 6BD, UK
- Matthias Griese
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German Center for Lung Research (DZL), Munich, Germany
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital, Perth, Australia
- Julia Guthrie
- Department of Structural and Computational Biology, University of Vienna; Max Perutz Labs, Vienna, Austria
- Benjamin Gyori
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Ada Hamosh
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Marc Hanauer
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Kateřina Hanušová
- Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Harshad Hegde
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ingo Helbig
- Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kateřina Holasová
- Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Charles Tapley Hoyt
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Eric Hurwitz
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julius O B Jacobsen
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Lisa Joseph
- Neurodevelopmental and Behavioral Phenotyping Service, National Institute of Mental Health, Bethesda, MD, USA
- Kamyar Keramatian
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Bryan King
- Department of Psychiatry and Behavioral Sciences, UCSF Weil Institute for Neuroscience, San Francisco, CA, USA
- Katrin Knoflach
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German Center for Lung Research (DZL), Munich, Germany
- David A Koolen
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Megan L Kraus
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Carlo Kroll
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Maaike Kusters
- Immunology, NIHR Great Ormond Street Hospital BRC, London, UK
- Markus S Ladewig
- Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- David Lagorce
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Meng-Chuan Lai
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Pablo Lapunzina
- Institute of Medical and Molecular Genetics, Hospital Univ. La Paz, Madrid, Spain
- Bryan Laraway
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- David Lewis-Smith
- Translational and Clinical Research Institute, Henry Wellcome Building, Framlington Place, Newcastle University, Newcastle-Upon-Tyne NE14LP, UK
- Caterina Lucano
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Marzieh Majd
- Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Mary L Marazita
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Victor Martinez-Glez
- Center for Genomic Medicine, Parc Taulí Hospital Universitari, Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
- Toby H McHenry
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Melvin G McInnis
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Julie A McMurry
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Michaela Mihulová
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Caitlin E Millett
- Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Philip B Mitchell
- Discipline of Psychiatry & Mental Health, School of Clinical Medicine, Faculty of Medicine & Health, University of New South Wales, Sydney, NSW, Australia
- Veronika Moslerová
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Kenji Narutomi
- Okinawa Prefectural Nanbu Medical Center & Children's Medical Center
- Shahrzad Nematollahi
- School of Physical and Occupational Therapy, McGill University, Montreal, Quebec, Canada
- Julian Nevado
- Institute of Medical and Molecular Genetics, Hospital Univ. La Paz, Madrid, Spain
- Andrew A Nierenberg
- Dauten Family Center for Bipolar Treatment Innovation, Massachusetts General Hospital, Boston, MA, USA
- Nikola Novák Čajbiková
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- John I Nurnberger
- Stark Neurosciences Research Institute, Departments of Psychiatry and Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Daniel Olson
- Data Collaboration Center, Data Science, Critical Path Institute, Tucson, AZ, USA
- Abigail Ortiz
- Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Harry Pachajoa
- Centro de Investigaciones en Anomalías Congénitas y Enfermedades Raras (CIACER), Universidad Icesi, Cali, Colombia
- Guiomar Perez de Nanclares
- Molecular (epi) genetics lab, Bioaraba Health Research Institute, Araba University Hospital, Vitoria-Gasteiz, Spain
- Amy Peters
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Tim Putman
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christina K Rapp
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German Center for Lung Research (DZL), Munich, Germany
- Ana Rath
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lauren Rekerle
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Angharad M Roberts
- National Heart & Lung Institute & MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- Suzy Roy
- SNOMED International, London W2 6BD, UK
- Stephan J Sanders
- Department of Paediatrics, Institute of Developmental and Regenerative Medicine, University of Oxford, Oxford, UK
- Catharina Schuetz
- Universitätsklinikum Carl Gustav Carus, Medizinische Fakultät, TU, Dresden, Germany
- Eva C Schulte
- Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, LMU Munich, Munich, Germany
- Thomas G Schulze
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Martin Schwarz
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Katie Scott
- Department of Psychiatry, Dalhousie University, Halifax, NS, Canada
- Dominik Seelow
- Exploratory Diagnostic Sciences, Berliner Institut für Gesundheitsforschung - Charité, Berlin, Germany
- Berthold Seitz
- Department of Ophthalmology, Saarland University Medical Center UKS, Homburg/Saar, Germany
- Morgan N Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Eric S Simon
- Eisenberg Family Depression Center, University of Michigan, Ann Arbor, MI, USA
- Balwinder Singh
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Jake T Smolinsky
- Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Sarah Sperry
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Robin Steinhaus
- Exploratory Diagnostic Sciences, Berliner Institut für Gesundheitsforschung - Charité, Berlin, Germany
- Rebecca Strawbridge
- Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Polina Talapova
- Institute for Research and Health Policy Studies, Tufts Medicine, Boston, MA 02111, USA
- Pavel Tesner
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Rhys H Thomas
- Translational and Clinical Research Institute, Henry Wellcome Building, Framlington Place, Newcastle University, Newcastle-Upon-Tyne NE14LP, UK
- Audrey Thurm
- Neurodevelopmental and Behavioral Phenotyping Service, National Institute of Mental Health, Bethesda, MD, USA
- Marek Turnovec
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Marielle E van Gijn
- Department of Genetics, University Medical Center Groningen, Groningen, Netherlands
- Markéta Vlčková
- Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Anita Walden
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kai Wang
- Chinese HPO Consortium, Beijing, China
- Ron Wapner
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- James S Ware
- National Heart & Lung Institute & MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- Lisa D Wiggins
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Andrew E Williams
- Institute for Research and Health Policy Studies, Tufts Medicine, Boston, MA 02111, USA
- Chen Wu
- Chinese HPO Consortium, Beijing, China
- Margot J Wyrwoll
- Centre for Regenerative Medicine, Institute for Regeneration and Repair, Institute for Stem Cell Research, University of Edinburgh, Edinburgh, UK
- Hui Xiong
- Chinese HPO Consortium, Beijing, China
- Nefize Yalin
- Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Yasunori Yamamoto
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Japan
- Lakshmi N Yatham
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Anastasia K Yocum
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Allan H Young
- Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London & South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Monks Orchard Road, Beckenham, Kent, London SE5 8AF, UK
- Zafer Yüksel
- Department of Human Genetics, Bioscientia Healthcare GmbH, Ingelheim, Germany
- Peter P Zandi
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Andreas Zankl
- Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia
- Ignacio Zarante
- Institute of Human Genetics, Pontificia Universidad Javeriana, Bogotá, Colombia
- Miroslav Zvolský
- Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Sabrina Toro
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Leigh C Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Monica C Munoz-Torres
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Melissa A Haendel
- University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
45
Putman TE, Schaper K, Matentzoglu N, Rubinetti V, Alquaddoomi F, Cox C, Caufield JH, Elsarboukh G, Gehrke S, Hegde H, Reese J, Braun I, Bruskiewich R, Cappelletti L, Carbon S, Caron A, Chan L, Chute C, Cortes K, De Souza V, Fontana T, Harris N, Hartley E, Hurwitz E, Jacobsen JB, Krishnamurthy M, Laraway B, McLaughlin J, McMurry J, Moxon ST, Mullen K, O’Neil S, Shefchek K, Stefancsik R, Toro S, Vasilevsky N, Walls R, Whetzel P, Osumi-Sutherland D, Smedley D, Robinson P, Mungall C, Haendel M, Munoz-Torres M. The Monarch Initiative in 2024: an analytic platform integrating phenotypes, genes and diseases across species. Nucleic Acids Res 2024; 52:D938-D949. [PMID: 38000386 PMCID: PMC10767791 DOI: 10.1093/nar/gkad1082] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/18/2023] [Revised: 10/21/2023] [Accepted: 11/02/2023] [Indexed: 11/26/2023]
Abstract
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.
Affiliation(s)
- Tim E Putman
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kevin Schaper
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Vincent P Rubinetti
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Faisal S Alquaddoomi
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Corey Cox
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- J Harry Caufield
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Glass Elsarboukh
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sarah Gehrke
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Harshad Hegde
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Justin T Reese
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ian Braun
- Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Lauren E Chan
- College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD 21205, USA
- Katherina G Cortes
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Tommaso Fontana
- Dipartimento di Informatica, Università degli Studi di Milano Statale, Milano, Italy
- Nomi L Harris
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Emily L Hartley
- Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Eric Hurwitz
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julius O B Jacobsen
- William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Madan Krishnamurthy
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Bryan J Laraway
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julie A McMurry
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sierra A T Moxon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Kathleen R Mullen
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Shawn T O’Neil
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kent A Shefchek
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Sabrina Toro
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Ramona L Walls
- Data Collaboration Center, Critical Path Institute, Tucson, AZ 85718, USA
- Patricia L Whetzel
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Christopher J Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Melissa A Haendel
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Monica C Munoz-Torres
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
46
Bertolini E, Babbi G, Savojardo C, Martelli PL, Casadio R. MultifacetedProtDB: a database of human proteins with multiple functions. Nucleic Acids Res 2024; 52:D494-D501. [PMID: 37791887 PMCID: PMC10767882 DOI: 10.1093/nar/gkad783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 10/05/2023]
Abstract
MultifacetedProtDB is a database of multifunctional human proteins deriving information from other databases, including UniProt, GeneCards, Human Protein Atlas (HPA), Human Phenotype Ontology (HPO) and MONDO. It collects under the label 'multifaceted' multitasking proteins described in the literature as pleiotropic, multidomain, promiscuous (in relation to enzymes catalysing multiple substrates) and moonlighting (with two or more molecular functions), which are difficult to retrieve with a direct search in existing non-specific databases. The study of multifunctional proteins is an expanding research area aiming to elucidate the complexities of biological processes, particularly in humans, where multifunctional proteins play roles in various processes, including signal transduction, metabolism, gene regulation and cellular communication, and are often involved in disease onset and progression. The webserver allows searching by gene, protein and any associated structural and functional information, like available structures from PDB, structural models and interactors, using multiple filters. Protein entries are supplemented with comprehensive annotations including EC number, GO terms (biological processes, molecular functions, and cellular components), pathways from Reactome, subcellular localization from UniProt, tissue and cell type expression from HPA, and associated diseases following MONDO, Orphanet and OMIM classification. MultifacetedProtDB is freely available as a web server at: https://multifacetedprotdb.biocomp.unibo.it/.
Affiliation(s)
- Elisa Bertolini
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Giulia Babbi
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
47
Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024; 52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 42] [Impact Index Per Article: 42.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024]
Abstract
Machine learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Affiliation(s)
- Max Schubach
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Thorben Maass
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Lusiné Nazaretyan
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Röner
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
- Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
48
Carmody LC, Gargano MA, Toro S, Vasilevsky NA, Adam MP, Blau H, Chan LE, Gomez-Andres D, Horvath R, Kraus ML, Ladewig MS, Lewis-Smith D, Lochmüller H, Matentzoglu NA, Munoz-Torres MC, Schuetz C, Seitz B, Similuk MN, Sparks TN, Strauss T, Swietlik EM, Thompson R, Zhang XA, Mungall CJ, Haendel MA, Robinson PN. The Medical Action Ontology: A tool for annotating and analyzing treatments and clinical management of human disease. Med 2023; 4:913-927.e3. [PMID: 37963467 PMCID: PMC10842845 DOI: 10.1016/j.medj.2023.10.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/13/2023] [Revised: 08/31/2023] [Accepted: 10/14/2023] [Indexed: 11/16/2023]
Abstract
BACKGROUND Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology specifically designed to organize medical procedures, therapies, and interventions. METHODS MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry. Term development involves a blend of manual and semi-automated processes. Additionally, we have generated annotations detailing diagnostic modalities for specific phenotypic abnormalities defined by the Human Phenotype Ontology (HPO). We introduce a web application, POET, that facilitates MAxO annotations for specific medical actions for diseases using the Mondo Disease Ontology. FINDINGS MAxO encompasses 1,757 terms spanning a wide range of biomedical domains, from human anatomy and investigations to the chemical and protein entities involved in biological processes. These terms annotate phenotypic features associated with specific disease (using HPO and Mondo). Presently, there are over 16,000 MAxO diagnostic annotations that target HPO terms. Through POET, we have created 413 MAxO annotations specifying treatments for 189 rare diseases. CONCLUSIONS MAxO offers a computational representation of treatments and other actions taken for the clinical management of patients. Its development is closely coupled to Mondo and HPO, broadening the scope of our computational modeling of diseases and phenotypic features. We invite the community to contribute disease annotations using POET (https://poet.jax.org/). MAxO is available under the open-source CC-BY 4.0 license (https://github.com/monarch-initiative/MAxO). FUNDING NHGRI 1U24HG011449-01A1 and NHGRI 5RM1HG010860-04.
Affiliation(s)
- Leigh C Carmody
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Sabrina Toro
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Margaret P Adam
- University of Washington School of Medicine, Seattle, WA, USA
- Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David Gomez-Andres
- Pediatric Neurology, Vall d'Hebron Institut de Recerca (VHIR), Hospital Universitari Vall d'Hebron, Vall d'Hebron Barcelona Hospital Campus, Passeig Vall d'Hebron 119-129, 08035 Barcelona, Spain
- Rita Horvath
- Department of Clinical Neurosciences, University of Cambridge, Robinson Way, Cambridge CB2 0PY, UK
- Megan L Kraus
- University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Markus S Ladewig
- Department of Ophthalmology, Klinikum Saarbrücken, Saarbrücken, Germany
- David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Hanns Lochmüller
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada; Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada; Brain and Mind Research Institute, University of Ottawa, Ottawa, Canada; Department of Neuropediatrics and Muscle Disorders, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany; Centro Nacional de Análisis Genómico, Barcelona, Spain
- Catharina Schuetz
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Berthold Seitz
- Department of Ophthalmology, Saarland University Medical Center UKS, Homburg, Saar, Germany
- Morgan N Similuk
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
- Teresa N Sparks
- Department of Obstetrics, Gynecology, & Reproductive Sciences, University of California, San Francisco, San Francisco, CA 94143, USA
- Timmy Strauss
- Department of Pediatrics, Medizinische Fakultät Carl Gustav Carus, Technische Universität Dresden, 01307 Dresden, Germany
- Emilia M Swietlik
- Department of Medicine, University of Cambridge, Heart and Lung Research Institute, Cambridge CB2 0BB, UK
- Rachel Thompson
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, Canada
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
49
Mohr SE, Kim AR, Hu Y, Perrimon N. Finding information about uncharacterized Drosophila melanogaster genes. Genetics 2023; 225:iyad187. [PMID: 37933691 PMCID: PMC10697813 DOI: 10.1093/genetics/iyad187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/02/2023] [Indexed: 11/08/2023]
Abstract
Genes that have been identified in the genome but remain uncharacterized with regards to function offer an opportunity to uncover novel biological information. Novelty is exciting but can also be a barrier. If nothing is known, how does one start planning and executing experiments? Here, we provide a recommended information-mining workflow and a corresponding guide to accessing information about uncharacterized Drosophila melanogaster genes, such as those assigned only a systematic coding gene identifier. The available information can provide insights into where and when the gene is expressed, what the function of the gene might be, whether there are similar genes in other species, whether there are known relationships to other genes, and whether any other features have already been determined. In addition, available information about relevant reagents can inspire and facilitate experimental studies. Altogether, mining available information can help prioritize genes for further study, as well as provide starting points for experimental assays and other analyses.
Affiliation(s)
- Stephanie E Mohr
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Ah-Ram Kim
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
50
Groza T, Wu H, Dinger ME, Danis D, Hilton C, Bagley A, Davids JR, Luo L, Lu Z, Robinson PN. Term-BLAST-like alignment tool for concept recognition in noisy clinical texts. Bioinformatics 2023; 39:btad716. [PMID: 38001031 PMCID: PMC10710372 DOI: 10.1093/bioinformatics/btad716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/20/2023] [Accepted: 11/23/2023] [Indexed: 11/26/2023]
Abstract
MOTIVATION Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts. RESULTS Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT) leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely used tools. Experimental results show an increase of 10% in recall on scientific publications and 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches. AVAILABILITY AND IMPLEMENTATION Fenominal is a Java library that implements TBLAT for named CR of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.
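The k-mer screening step described in this abstract can be illustrated with a minimal Python sketch. It is not the TBLAT or Fenominal implementation (Fenominal is a Java library), and it covers only candidate screening, not the scoring of candidates against spelling-error patterns. The trigram size, the 0.5 screening threshold, and the function names below are hypothetical choices made for illustration.

```python
from collections import Counter


def kmer_counts(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers in a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmers(a: Counter, b: Counter) -> int:
    """Size of the multiset intersection of two k-mer count tables."""
    return sum((a & b).values())


def screen_candidates(token: str, terms: list[str], k: int = 3,
                      min_fraction: float = 0.5) -> list[str]:
    """Keep ontology terms whose shared k-mer count with the token reaches
    min_fraction of the term's own k-mer count (hypothetical threshold).
    Surviving candidates would then go to a separate scoring step."""
    query = kmer_counts(token, k)
    hits = []
    for term in terms:
        term_kmers = kmer_counts(term, k)
        total = sum(term_kmers.values())
        # Skip terms shorter than k, which have no k-mers to compare.
        if total and shared_kmers(query, term_kmers) / total >= min_fraction:
            hits.append(term)
    return hits


if __name__ == "__main__":
    # A misspelled token still shares most trigrams with the intended term.
    print(screen_candidates("hypotonai", ["Hypotonia", "Hypertonia", "Seizure"]))
```

Here the misspelling "hypotonai" shares 5 of the 7 trigrams of "Hypotonia" but only 2 of the 8 trigrams of "Hypertonia", so only "Hypotonia" passes the screen. Counting shared k-mers is cheap enough to prune a large term list before a more expensive alignment-style scoring step is applied.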
Affiliation(s)
- Tudor Groza
- Rare Care Centre, Perth Children’s Hospital, Nedlands, WA 6009, Australia
- Genetics and Rare Diseases Program, Telethon Kids Institute, Nedlands, WA 6009, Australia
- Honghan Wu
- Institute of Health Informatics, University College London, London WC1E 6BT, United Kingdom
- Marcel E Dinger
- Pryzm Health, Sydney, NSW 2089, Australia
- School of Life and Environmental Sciences, Faculty of Science, University of Sydney, NSW 2006, Australia
- Daniel Danis
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Coleman Hilton
- Shriners Children’s Corporate Headquarters, Tampa, FL 33607, United States
- Anita Bagley
- Shriners Children's Northern California, Sacramento, CA 95817, United States
- Jon R Davids
- Shriners Children's Northern California, Sacramento, CA 95817, United States
- Ling Luo
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, United States