1
Callahan TJ, Tripodi IJ, Stefanski AL, Cappelletti L, Taneja SB, Wyrwa JM, Casiraghi E, Matentzoglu NA, Reese J, Silverstein JC, Hoyt CT, Boyce RD, Malec SA, Unni DR, Joachimiak MP, Robinson PN, Mungall CJ, Cavalleri E, Fontana T, Valentini G, Mesiti M, Gillenwater LA, Santangelo B, Vasilevsky NA, Hoehndorf R, Bennett TD, Ryan PB, Hripcsak G, Kahn MG, Bada M, Baumgartner WA, Hunter LE. An open source knowledge graph ecosystem for the life sciences. Sci Data 2024; 11:363. [PMID: 38605048 PMCID: PMC11009265 DOI: 10.1038/s41597-024-03171-w] [Received: 07/26/2023] [Accepted: 03/21/2024] [Indexed: 04/13/2024]
Abstract
Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Ignacio J Tripodi
  - Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder, Boulder, CO, 80301, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Jonathan C Silverstein
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Charles Tapley Hoyt
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA, 02115, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15206, USA
- Scott A Malec
  - Division of Translational Informatics, University of New Mexico School of Medicine, Albuquerque, NM, 87131, USA
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Peter N Robinson
  - Berlin Institute of Health at Charité-Universitätsmedizin, 10117, Berlin, Germany
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Emanuele Cavalleri
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
  - ELLIS, European Laboratory for Learning and Intelligent Systems, Milan Unit, Italy
- Marco Mesiti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Via Celoria 18, 20133, Milan, Italy
- Lucas A Gillenwater
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Brook Santangelo
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Data Collaboration Center, Critical Path Institute, 1840 E River Rd. Suite 100, Tucson, AZ, 85718, USA
- Robert Hoehndorf
  - Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
- Tellen D Bennett
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael Bada
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- William A Baumgartner
  - Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
2
Gargano MA, Matentzoglu N, Coleman B, Addo-Lartey EB, Anagnostopoulos A, Anderton J, Avillach P, Bagley AM, Bakštein E, Balhoff JP, Baynam G, Bello SM, Berk M, Bertram H, Bishop S, Blau H, Bodenstein DF, Botas P, Boztug K, Čady J, Callahan TJ, Cameron R, Carbon S, Castellanos F, Caufield JH, Chan LE, Chute C, Cruz-Rojo J, Dahan-Oliel N, Davids JR, de Dieuleveult M, de Souza V, de Vries BBA, de Vries E, DePaulo JR, Derfalvi B, Dhombres F, Diaz-Byrd C, Dingemans AJM, Donadille B, Duyzend M, Elfeky R, Essaid S, Fabrizzi C, Fico G, Firth HV, Freudenberg-Hua Y, Fullerton JM, Gabriel DL, Gilmour K, Giordano J, Goes FS, Moses RG, Green I, Griese M, Groza T, Gu W, Guthrie J, Gyori B, Hamosh A, Hanauer M, Hanušová K, He Y(O, Hegde H, Helbig I, Holasová K, Hoyt CT, Huang S, Hurwitz E, Jacobsen JOB, Jiang X, Joseph L, Keramatian K, King B, Knoflach K, Koolen DA, Kraus M, Kroll C, Kusters M, Ladewig MS, Lagorce D, Lai MC, Lapunzina P, Laraway B, Lewis-Smith D, Li X, Lucano C, Majd M, Marazita ML, Martinez-Glez V, McHenry TH, McInnis MG, McMurry JA, Mihulová M, Millett CE, Mitchell PB, Moslerová V, Narutomi K, Nematollahi S, Nevado J, Nierenberg AA, Čajbiková NN, Nurnberger JI, Ogishima S, Olson D, Ortiz A, Pachajoa H, Perez de Nanclares G, Peters A, Putman T, Rapp CK, Rath A, Reese J, Rekerle L, Roberts A, Roy S, Sanders SJ, Schuetz C, Schulte EC, Schulze TG, Schwarz M, Scott K, Seelow D, Seitz B, Shen Y, Similuk MN, Simon ES, Singh B, Smedley D, Smith CL, Smolinsky JT, Sperry S, Stafford E, Stefancsik R, Steinhaus R, Strawbridge R, Sundaramurthi JC, Talapova P, Tenorio Castano JA, Tesner P, Thomas RH, Thurm A, Turnovec M, van Gijn ME, Vasilevsky NA, Vlčková M, Walden A, Wang K, Wapner R, Ware JS, Wiafe AA, Wiafe SA, Wiggins LD, Williams AE, Wu C, Wyrwoll MJ, Xiong H, Yalin N, Yamamoto Y, Yatham LN, Yocum AK, Young AH, Yüksel Z, Zandi PP, Zankl A, Zarante I, Zvolský M, Toro S, Carmody LC, Harris NL, Munoz-Torres MC, Danis D, Mungall CJ, Köhler S, Haendel MA, Robinson PN. 
The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic Acids Res 2024; 52:D1333-D1346. [PMID: 37953324 PMCID: PMC10767975 DOI: 10.1093/nar/gkad1005] [Received: 09/13/2023] [Revised: 10/12/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023]
Abstract
The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English. Since our last report, a total of 2239 new HPO terms and 49235 new HPO annotations were developed, many in collaboration with external groups in the fields of psychiatry, arthrogryposis, immunology and cardiology. The Medical Action Ontology (MAxO) is a new effort to model treatments and other measures taken for clinical management. Finally, the HPO consortium is contributing to efforts to integrate the HPO and the GA4GH Phenopacket Schema into electronic health records (EHRs) with the goal of more standardized and computable integration of rare disease data in EHRs.
Affiliation(s)
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Joel Anderton
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Anita M Bagley
  - Shriners Children's Northern California, Sacramento, CA, USA
- Eduard Bakštein
  - National Institute of Mental Health, Klecany, Czech Republic
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Gareth Baynam
  - Rare Care Centre, Perth Children's Hospital, Perth, Australia
- Michael Berk
  - Deakin University, IMPACT - the Institute for Mental and Physical Health and Clinical Translation, School of Medicine, Barwon Health, Geelong, Australia
- Holli Bertram
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Somer Bishop
  - Department of Psychiatry and Behavioral Sciences, UCSF Weil Institute for Neuroscience, San Francisco, CA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- David F Bodenstein
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, Canada
- Kaan Boztug
  - St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- Jolana Čady
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, NY, NY, USA
- Seth J Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- J Harry Caufield
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD 21287, USA
- Jaime Cruz-Rojo
  - UDISGEN (Dysmorphology and Genetics Unit), 12 de Octubre Hospital, Madrid, Spain
- Noémi Dahan-Oliel
  - Department of Clinical Research, Shriners Hospitals for Children, Montreal, Quebec, Canada
- Jon R Davids
  - Shriners Children's Northern California, Sacramento, CA, USA
- Maud de Dieuleveult
  - Département I&D, AP-HP, Banque Nationale de Données Maladies Rares, Paris, France
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Bert B A de Vries
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- J Raymond DePaulo
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Beata Derfalvi
  - Department of Pediatrics, Dalhousie University, Halifax, NS, Canada
- Ferdinand Dhombres
  - Fetal Medicine Department, Armand Trousseau Hospital, Sorbonne University, GRC26, INSERM, Limics, Paris, France
- Claudia Diaz-Byrd
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Alexander J M Dingemans
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Bruno Donadille
  - St Antoine Hospital, Reference Center for Rare Growth Endocrine Disorders, Sorbonne University, AP-HP, INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Reem Elfeky
  - Department of Immunology, GOS Hospital for Children NHS Foundation Trust, University College London, London, UK
- Shahim Essaid
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Giovanna Fico
  - Bipolar and Depressive Disorders Unit, Institute of Neuroscience, Hospital Clinic, University of Barcelona, IDIBAPS, CIBERSAM, Barcelona, Catalonia, Spain
- Helen V Firth
  - Addenbrooke's Hospital, Cambridge University Hospitals, Cambridge, UK
- Yun Freudenberg-Hua
  - Department of Psychiatry, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Davera L Gabriel
  - School of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
- Jessica Giordano
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- Fernando S Goes
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rachel Gore Moses
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Ian Green
  - SNOMED International, London W2 6BD, UK
- Matthias Griese
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German center for Lung research (DZL), Munich, Germany
- Tudor Groza
  - Rare Care Centre, Perth Children's Hospital, Perth, Australia
- Julia Guthrie
  - Department of Structural and Computational Biology, University of Vienna; Max Perutz Labs, Vienna, Austria
- Benjamin Gyori
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Ada Hamosh
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Marc Hanauer
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Kateřina Hanušová
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Harshad Hegde
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Ingo Helbig
  - Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kateřina Holasová
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Charles Tapley Hoyt
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Eric Hurwitz
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Julius O B Jacobsen
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Lisa Joseph
  - Neurodevelopmental and Behavioral Phenotyping Service, National Institute of Mental Health, Bethesda, MD, USA
- Kamyar Keramatian
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Bryan King
  - Department of Psychiatry and Behavioral Sciences, UCSF Weil Institute for Neuroscience, San Francisco, CA, USA
- Katrin Knoflach
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German center for Lung research (DZL), Munich, Germany
- David A Koolen
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Megan L Kraus
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Carlo Kroll
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Maaike Kusters
  - Immunology, NIHR Great Ormond Street Hospital BRC, London, UK
- Markus S Ladewig
  - Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- David Lagorce
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Meng-Chuan Lai
  - Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, Canada
- Pablo Lapunzina
  - Institute of Medical and Molecular Genetics, Hospital Univ. La Paz, Madrid, Spain
- Bryan Laraway
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- David Lewis-Smith
  - Translational and Clinical Research Institute, Henry Wellcome Building, Framlington Place, Newcastle University, Newcastle-Upon-Tyne NE14LP, UK
- Caterina Lucano
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Marzieh Majd
  - Department of Psychiatry, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Mary L Marazita
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Victor Martinez-Glez
  - Center for Genomic Medicine, Parc Taulí Hospital Universitari, Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
- Toby H McHenry
  - Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Melvin G McInnis
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Michaela Mihulová
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Caitlin E Millett
  - Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, USA
- Philip B Mitchell
  - Discipline of Psychiatry & Mental Health, School of Clinical Medicine, Faculty of Medicine & Health, University of New South Wales, Sydney, NSW, Australia
- Veronika Moslerová
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Kenji Narutomi
  - Okinawa Prefectural Nanbu Medical Center & Children's Medical Center
- Shahrzad Nematollahi
  - School of Physical and Occupational Therapy, McGill University, Montreal, Quebec, Canada
- Julian Nevado
  - Institute of Medical and Molecular Genetics, Hospital Univ. La Paz, Madrid, Spain
- Andrew A Nierenberg
  - Dauten Family Center for Bipolar Treatment Innovation, Massachusetts General Hospital, Boston, MA, USA
- Nikola Novák Čajbiková
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- John I Nurnberger
  - Stark Neurosciences Research Institute, Departments of Psychiatry and Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Daniel Olson
  - Data Collaboration Center, Data Science, Critical Path Institute, Tucson, AZ, USA
- Abigail Ortiz
  - Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- Harry Pachajoa
  - Centro de Investigaciones en Anomalías Congénitas y Enfermedades Raras (CIACER), Universidad Icesi, Cali, Colombia
- Guiomar Perez de Nanclares
  - Molecular (epi) genetics lab, Bioaraba Health Research Institute, Araba University Hospital, Vitoria-Gasteiz, Spain
- Amy Peters
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Tim Putman
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christina K Rapp
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, German center for Lung research (DZL), Munich, Germany
- Ana Rath
  - INSERM, US14 - Orphanet, Plateforme Maladies Rares, Paris, France
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lauren Rekerle
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Angharad M Roberts
  - National Heart & Lung Institute & MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- Suzy Roy
  - SNOMED International, London W2 6BD, UK
- Stephan J Sanders
  - Department of Paediatrics, Institute of Developmental and Regenerative Medicine, University of Oxford, Oxford, UK
- Catharina Schuetz
  - Universitätsklinikum Carl Gustav Carus, Medizinische Fakultät, TU, Dresden, Germany
- Eva C Schulte
  - Institute of Psychiatric Phenomics and Genomics (IPPG), LMU University Hospital, LMU Munich, Munich, Germany
- Thomas G Schulze
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Martin Schwarz
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Katie Scott
  - Department of Psychiatry, Dalhousie University, Halifax, NS, Canada
- Dominik Seelow
  - Exploratory Diagnostic Sciences, Berliner Institut für Gesundheitsforschung - Charité, Berlin, Germany
- Berthold Seitz
  - Department of Ophthalmology, Saarland University Medical Center UKS, Homburg/Saar, Germany
- Morgan N Similuk
  - National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Eric S Simon
  - Eisenberg Family Depression Center, University of Michigan, Ann Arbor, MI, USA
- Balwinder Singh
  - Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Damian Smedley
  - William Harvey Research Institute, Queen Mary University of London, London, UK
- Jake T Smolinsky
  - Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ, USA
- Sarah Sperry
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Robin Steinhaus
  - Exploratory Diagnostic Sciences, Berliner Institut für Gesundheitsforschung - Charité, Berlin, Germany
- Rebecca Strawbridge
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Polina Talapova
  - Institute for Research and Health Policy Studies, Tufts Medicine, Boston, MA 2111, USA
- Pavel Tesner
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Rhys H Thomas
  - Translational and Clinical Research Institute, Henry Wellcome Building, Framlington Place, Newcastle University, Newcastle-Upon-Tyne NE14LP, UK
- Audrey Thurm
  - Neurodevelopmental and Behavioral Phenotyping Service, National Institute of Mental Health, Bethesda, MD, USA
- Marek Turnovec
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Marielle E van Gijn
  - Department of Genetics, University Medical Center Groningen, Groningen, Netherlands
- Markéta Vlčková
  - Department of Biology and Medical Genetics, 2nd Medical Faculty of Charles University and University Hospital Motol, Prague, Czech Republic
- Anita Walden
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Kai Wang
  - Chinese HPO Consortium, Beijing, China
- Ron Wapner
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- James S Ware
  - National Heart & Lung Institute & MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- Lisa D Wiggins
  - National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Andrew E Williams
  - Institute for Research and Health Policy Studies, Tufts Medicine, Boston, MA 2111, USA
- Chen Wu
  - Chinese HPO Consortium, Beijing, China
- Margot J Wyrwoll
  - Centre for Regenerative Medicine, Institute for Regeneration and Repair, Institute for Stem Cell Research, University of Edinburgh, Edinburgh, UK
- Hui Xiong
  - Chinese HPO Consortium, Beijing, China
- Nefize Yalin
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Yasunori Yamamoto
  - Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Japan
- Lakshmi N Yatham
  - Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Anastasia K Yocum
  - Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- Allan H Young
  - Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London & South London and Maudsley NHS Foundation Trust, Bethlem Royal Hospital, Monks Orchard Road, Beckenham, Kent, London SE5 8AF, UK
- Zafer Yüksel
  - Department of Human Genetics, Bioscientia Healthcare GmbH, Ingelheim, Germany
- Peter P Zandi
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Andreas Zankl
  - Faculty of Medicine and Health, The University of Sydney, Camperdown, Australia
- Ignacio Zarante
  - Institute of Human Genetics, Pontificia Universidad Javeriana, Bogotá, Colombia
- Miroslav Zvolský
  - Institute of Health Information and Statistics of the Czech Republic, Prague, Czech Republic
- Sabrina Toro
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Leigh C Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Monica C Munoz-Torres
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Daniel Danis
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Melissa A Haendel
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
3
Stefanucci L, Collins J, Sims MC, Barrio-Hernandez I, Sun L, Burren OS, Perfetto L, Bender I, Callahan TJ, Fleming K, Guerrero JA, Hermjakob H, Martin MJ, Stephenson J, Paneerselvam K, Petrovski S, Porras P, Robinson PN, Wang Q, Watkins X, Frontini M, Laskowski RA, Beltrao P, Di Angelantonio E, Gomez K, Laffan M, Ouwehand WH, Mumford AD, Freson K, Carss K, Downes K, Gleadall N, Megy K, Bruford E, Vuckovic D. The effects of pathogenic and likely pathogenic variants for inherited hemostasis disorders in 140 214 UK Biobank participants. Blood 2023; 142:2055-2068. [PMID: 37647632 PMCID: PMC10733830 DOI: 10.1182/blood.2023020118] [Received: 03/01/2023] [Revised: 08/04/2023] [Accepted: 08/04/2023] [Indexed: 09/01/2023]
Abstract
Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.
Affiliation(s)
- Luca Stefanucci
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - British Heart Foundation, BHF Centre of Research Excellence, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Janine Collins
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Haematology, Barts Health NHS Trust, London, United Kingdom
- Matthew C. Sims
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Haematology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom
  - Department of Oncology and Metabolism, University of Sheffield, Sheffield, United Kingdom
- Inigo Barrio-Hernandez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Luanluan Sun
  - Department of Public Health and Primary Care, BHF Cardiovascular Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
- Oliver S. Burren
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Livia Perfetto
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
  - Department of Biology and Biotechnology “C.Darwin,” Sapienza University of Rome, Rome, Italy
- Isobel Bender
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
- Tiffany J. Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY
- Kathryn Fleming
  - School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
- Jose A. Guerrero
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Haematology, Barts Health NHS Trust, London, United Kingdom
- Henning Hermjakob
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Maria J. Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- James Stephenson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- NIHR BioResource
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - British Heart Foundation, BHF Centre of Research Excellence, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Haematology, Barts Health NHS Trust, London, United Kingdom
  - Department of Haematology, Sheffield Teaching Hospitals NHS Foundation Trust, Royal Hallamshire Hospital, Sheffield, United Kingdom
  - Department of Oncology and Metabolism, University of Sheffield, Sheffield, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
  - Department of Public Health and Primary Care, BHF Cardiovascular Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
  - Department of Biology and Biotechnology “C.Darwin,” Sapienza University of Rome, Rome, Italy
  - Department of Biochemistry, University of Oxford, Oxford, United Kingdom
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY
  - School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
  - Centre for Genomics Research, Discovery Sciences, AstraZeneca, Cambridge, United Kingdom
  - Department of Medicine, Austin Health, The University of Melbourne, Melbourne, Australia
  - Genomic Medicine, The Jackson Laboratory, Farmington, CT
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences RILD Building, University of Exeter Medical School, Exeter, United Kingdom
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
  - Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
  - NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Cambridge, United Kingdom
  - Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
  - Health Data Science Centre, Human Technopole, Milan, Italy
  - Haemophilia Centre and Thrombosis Unit, Royal Free London NHS Foundation Trust, London, United Kingdom
  - Department of Haematology, Imperial College Healthcare NHS Trust, London, United Kingdom
  - Department of Immunology and Inflammation, Centre for Haematology, Imperial College London, London, United Kingdom
  - Department of Haematology, University College London Hospitals NHS Trust, London, United Kingdom
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KULeuven, Leuven, Belgium
  - Cambridge Genomics Laboratory, Cambridge University Hospitals National Health Service Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
- Kalpana Paneerselvam
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Slavé Petrovski
  - Centre for Genomics Research, Discovery Sciences, AstraZeneca, Cambridge, United Kingdom
  - Department of Medicine, Austin Health, The University of Melbourne, Melbourne, Australia
- Pablo Porras
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Peter N. Robinson
  - Genomic Medicine, The Jackson Laboratory, Farmington, CT
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT
- Quanli Wang
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Xavier Watkins
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Mattia Frontini
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - British Heart Foundation, BHF Centre of Research Excellence, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Clinical and Biomedical Sciences, Faculty of Health and Life Sciences RILD Building, University of Exeter Medical School, Exeter, United Kingdom
- Roman A. Laskowski
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Pedro Beltrao
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Emanuele Di Angelantonio
  - British Heart Foundation, BHF Centre of Research Excellence, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Public Health and Primary Care, BHF Cardiovascular Epidemiology Unit, University of Cambridge, Cambridge, United Kingdom
  - Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
  - NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Cambridge, United Kingdom
  - Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
  - Health Data Science Centre, Human Technopole, Milan, Italy
- Keith Gomez
  - Haemophilia Centre and Thrombosis Unit, Royal Free London NHS Foundation Trust, London, United Kingdom
- Mike Laffan
  - Department of Haematology, Imperial College Healthcare NHS Trust, London, United Kingdom
  - Department of Immunology and Inflammation, Centre for Haematology, Imperial College London, London, United Kingdom
- Willem H. Ouwehand
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Department of Haematology, University College London Hospitals NHS Trust, London, United Kingdom
- Andrew D. Mumford
  - School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
- Kathleen Freson
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KULeuven, Leuven, Belgium
- Keren Carss
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, United Kingdom
- Kate Downes
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - Cambridge Genomics Laboratory, Cambridge University Hospitals National Health Service Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Nick Gleadall
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - National Health Service Blood and Transplant, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Karyn Megy
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Elspeth Bruford
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, United Kingdom
- Dragana Vuckovic
  - Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
4
Antony B, Blau H, Casiraghi E, Loomba JJ, Callahan TJ, Laraway BJ, Wilkins KJ, Antonescu CC, Valentini G, Williams AE, Robinson PN, Reese JT, Murali TM. Predictive models of long COVID. EBioMedicine 2023; 96:104777. [PMID: 37672869 PMCID: PMC10494314 DOI: 10.1016/j.ebiom.2023.104777] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/14/2023] [Revised: 07/24/2023] [Accepted: 08/15/2023] [Indexed: 09/08/2023] Open
Abstract
BACKGROUND: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future.
METHODS: We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741).
FINDINGS: LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75.
INTERPRETATION: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology.
FUNDING: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
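For readers unfamiliar with the AUROC values quoted in the findings, the metric can be read as the probability that a randomly chosen positive case (here, a patient labelled with long COVID) receives a higher predicted score than a randomly chosen negative case. The following is a minimal sketch of that definition in plain Python. It is not the authors' code; the function name `auroc` and the toy labels and scores are made up for illustration.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper for illustration; quadratic in the number of examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 = long COVID label, 0 = no label.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
```

Under this reading, an AUROC of 0.75 means roughly three in four such positive/negative pairs are ordered correctly, while 0.5 corresponds to chance.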
Affiliation(s)
- Blessy Antony
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, 24061, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Elena Casiraghi
  - AnacletoLab, Computer Science Department, Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy; Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; ELLIS - European Laboratory for Learning and Intelligent Systems, Milan Unit, Milan, 20133, Italy
- Johanna J Loomba
  - Integrated Translational Health Research Institute of Virginia, University of Virginia, Charlottesville, VA, 22904, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Bryan J Laraway
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Kenneth J Wilkins
  - Biostatistics Program, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, 20814, USA
- Giorgio Valentini
  - AnacletoLab, Computer Science Department, Dipartimento di Informatica, Università degli Studi di Milano, Milan, 20133, Italy; ELLIS - European Laboratory for Learning and Intelligent Systems, Milan Unit, Milan, 20133, Italy
- Andrew E Williams
  - Institute for Clinical Research and Health Policy Studies, Tufts University School of Medicine, Boston, MA, 02111, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, 06269, USA
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- T M Murali
  - Department of Computer Science, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, 24061, USA
5
Spotnitz M, Acharya N, Cimino JJ, Murphy S, Namjou B, Crimmins N, Walunas T, Liu C, Crosslin D, Benoit B, Rosenthal E, Pacheco JA, Ostropolets A, Reyes Nieva H, Patterson JS, Richter LR, Callahan TJ, Elhussein A, Pang C, Kiryluk K, Nestor J, Khan A, Mohan S, Minty E, Chung W, Wei WQ, Natarajan K, Weng C. A metadata framework for computational phenotypes. JAMIA Open 2023; 6:ooad032. [PMID: 37181728 PMCID: PMC10168627 DOI: 10.1093/jamiaopen/ooad032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 04/10/2023] [Accepted: 04/21/2023] [Indexed: 05/16/2023] Open
Abstract
With the burgeoning development of computational phenotypes, it is increasingly difficult to identify the right phenotype for the right tasks. This study uses a mixed-methods approach to develop and evaluate a novel metadata framework for retrieval of and reusing computational phenotypes. Twenty active phenotyping researchers from 2 large research networks, Electronic Medical Records and Genomics and Observational Health Data Sciences and Informatics, were recruited to suggest metadata elements. Once consensus was reached on 39 metadata elements, 47 new researchers were surveyed to evaluate the utility of the metadata framework. The survey consisted of 5-Likert multiple-choice questions and open-ended questions. Two more researchers were asked to use the metadata framework to annotate 8 type-2 diabetes mellitus phenotypes. More than 90% of the survey respondents rated metadata elements regarding phenotype definition and validation methods and metrics positively with a score of 4 or 5. Both researchers completed annotation of each phenotype within 60 min. Our thematic analysis of the narrative feedback indicates that the metadata framework was effective in capturing rich and explicit descriptions and enabling the search for phenotypes, compliance with data standards, and comprehensive validation metrics. Current limitations were its complexity for data collection and the entailed human costs.
Affiliation(s)
- Matthew Spotnitz
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Nripendra Acharya
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- James J Cimino
  - Informatics Institute, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Shawn Murphy
  - Laboratory of Computer Science, Mass General Brigham, Boston, Massachusetts, USA
  - Department of Neurology, Mass General Brigham, Boston, Massachusetts, USA
- Bahram Namjou
  - Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Nancy Crimmins
  - Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Theresa Walunas
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Cong Liu
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- David Crosslin
  - Division of Biomedical Informatics and Genomics, Tulane University School of Medicine, New Orleans, Louisiana, USA
- Barbara Benoit
  - Department of Research Information Science & Computing, Mass General Brigham, Boston, Massachusetts, USA
- Jennifer A Pacheco
  - Center for Genetic Medicine, Northwestern University, Chicago, Illinois, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Harry Reyes Nieva
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Jason S Patterson
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Lauren R Richter
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Ahmed Elhussein
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Chao Pang
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Krzysztof Kiryluk
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Jordan Nestor
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Atlas Khan
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Sumit Mohan
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
  - Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York, USA
- Evan Minty
  - Department of Medicine, University of Calgary, Calgary, Alberta, Canada
- Wendy Chung
  - Department of Pediatrics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, USA
- Karthik Natarajan
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University Irving Medical Center, New York, New York, USA
6
Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub-building and exchanging biological knowledge graphs. Bioinformatics 2023; 39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.
RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
AVAILABILITY AND IMPLEMENTATION: https://kghub.org.
Affiliation(s)
- J Harry Caufield
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kevin Schaper
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Deepak R Unni
  - SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland
- Harshad Hegde
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States
- Luca Cappelletti
  - Department of Computer Science, University of Milano, Milan 20126, Italy
- Sierra A T Moxon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Vida Ravanmehr
  - Department of Lymphoma-Myeloma, MD Anderson Cancer Center, Houston, TX 77030, United States
- Seth Carbon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
- Katherina Cortes
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Kent A Shefchek
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Glass Elsarboukh
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Jim Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, United States
- Tommaso Fontana
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan 20133, Italy
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Melissa A Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
- Justin T Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
7
Cappelletti L, Fontana T, Casiraghi E, Ravanmehr V, Callahan TJ, Cano C, Joachimiak MP, Mungall CJ, Robinson PN, Reese J, Valentini G. GRAPE for fast and scalable graph processing and random-walk-based embedding. Nat Comput Sci 2023; 3:552-568. [PMID: 38177435 PMCID: PMC10768636 DOI: 10.1038/s43588-023-00465-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/13/2021] [Accepted: 05/12/2023] [Indexed: 01/06/2024]
Abstract
Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
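To make the phrase "random-walk-based methods" concrete, the sketch below samples uniform first-order random walks from a small graph; walks like these are the sequences that DeepWalk-style embedding methods then feed to a sequence model. This is plain Python written for illustration only: it does not use GRAPE's API and says nothing about its Rust implementation. The function name `sample_walks`, its parameters, and the toy graph are assumptions.

```python
import random


def sample_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks from every node of an adjacency-list graph.

    Hypothetical helper: each step moves to a neighbour chosen uniformly at
    random; a walk stops early only if it reaches a node with no neighbours.
    """
    rng = random.Random(seed)  # local generator keeps runs reproducible
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Toy undirected graph (invented), stored as symmetric adjacency lists.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
toy_walks = sample_walks(toy_graph, walk_length=5, walks_per_node=2)
```

A pure-Python loop like this is adequate for toy graphs only; the scale described in the abstract (millions of nodes, billions of edges) is what motivates compact data structures and parallel walk sampling.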
Affiliation(s)
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
  - National Laboratory in Artificial Intelligence and Intelligent Systems, Consorzio Interuniversitario Nazionale per l'Informatica, Rome, Italy
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Vida Ravanmehr
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Department of Lymphoma and Myeloma, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Carlos Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Marcin P Joachimiak
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
  - National Laboratory in Artificial Intelligence and Intelligent Systems, Consorzio Interuniversitario Nazionale per l'Informatica, Rome, Italy
  - European Laboratory for Learning and Intelligent Systems, Tübingen, Germany
  - Data Science Research Center, Università degli Studi di Milano, Milan, Italy
8
Callahan TJ, Stefanski AL, Wyrwa JM, Zeng C, Ostropolets A, Banda JM, Baumgartner WA, Boyce RD, Casiraghi E, Coleman BD, Collins JH, Deakyne Davies SJ, Feinstein JA, Lin AY, Martin B, Matentzoglu NA, Meeker D, Reese J, Sinclair J, Taneja SB, Trinkley KE, Vasilevsky NA, Williams AE, Zhang XA, Denny JC, Ryan PB, Hripcsak G, Bennett TD, Haendel MA, Robinson PN, Hunter LE, Kahn MG. Ontologizing health systems data at scale: making translational discovery a reality. NPJ Digit Med 2023; 6:89. [PMID: 37208468 PMCID: PMC10196319 DOI: 10.1038/s41746-023-00830-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/28/2023] [Indexed: 05/21/2023] Open
Abstract
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68-99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies our algorithm presents new opportunities to advance EHR-based deep phenotyping.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Adrianne L Stefanski
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Chenjie Zeng
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Juan M Banda
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30303, USA
- William A Baumgartner
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15260, USA
- Elena Casiraghi
  - Computer Science, Università degli Studi di Milano, Milan, Italy
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Ben D Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Janine H Collins
  - Department of Haematology, University of Cambridge, Cambridge, UK
- Sara J Deakyne Davies
  - Department of Research Informatics & Data Science, Analytics Resource Center, Children's Hospital Colorado, Aurora, CO, 80045, USA
- James A Feinstein
  - Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Asiyah Y Lin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Blake Martin
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, 15260, USA
- Katy E Trinkley
  - Department of Family Medicine, University of Colorado Anschutz School of Medicine, Aurora, CO, 80045, USA
- Nicole A Vasilevsky
  - Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Andrew E Williams
  - Tufts Institute for Clinical Research and Health Policy Studies, Tufts University, Boston, MA, 02155, USA
- Xingmin A Zhang
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Patrick B Ryan
  - Janssen Research and Development, Raritan, NJ, 08869, USA
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, 10032, USA
- Tellen D Bennett
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Lawrence E Hunter
  - Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
- Michael G Kahn
  - Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
9
Callahan TJ, Stefanski AL, Ostendorf DM, Wyrwa JM, Davies SJD, Hripcsak G, Hunter LE, Kahn MG. Characterizing Patient Representations for Computational Phenotyping. AMIA Annu Symp Proc 2023; 2022:319-328. [PMID: 37128436 PMCID: PMC10148332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Patient representation learning methods create rich representations of complex data and have potential to further advance the development of computational phenotypes (CP). Currently, these methods are either applied to small predefined concept sets or all available patient data, limiting the potential for novel discovery and reducing the explainability of the resulting representations. We report on an extensive, data-driven characterization of the utility of patient representation learning methods for the purpose of CP development or automatization. We conducted ablation studies to examine the impact of patient representations, built using data from different combinations of data types and sampling windows on rare disease classification. We demonstrated that the data type and sampling window directly impact classification and clustering performance, and these results differ by rare disease group. Our results, although preliminary, exemplify the importance of and need for data-driven characterization in patient representation-based CP development pipelines.
Affiliation(s)
- Tiffany J Callahan
  - Columbia University, New York, NY, 10032, USA
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Jordan M Wyrwa
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
  - Children's Hospital Colorado, Aurora, CO, 80045, USA
- Lawrence E Hunter
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Michael G Kahn
  - University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
10
Malec SA, Taneja SB, Albert SM, Elizabeth Shaaban C, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: a use case studying depression as a risk factor for Alzheimer's disease. J Biomed Inform 2023; 142:104368. [PMID: 37086959 DOI: 10.1016/j.jbi.2023.104368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 03/03/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]
Abstract
BACKGROUND: Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ content-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables combining roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data.
METHODS: We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and published observational studies and examined a selection of confounding and combined role variables in-depth.
RESULTS: Our search identified 128 confounders, including 58 phenotypes, 47 drugs, 35 genes, 23 collider, and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles.
CONCLUSION: Our findings suggest combining machine reading and KG could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression with AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
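The translation of confounder, mediator, and collider definitions into graph queries, as described in the abstract above, can be illustrated with a minimal sketch. It assumes a toy directed graph in which an edge (a, b) reads "a causes b"; the node names, the edge set, and the `causal_roles` helper are all hypothetical and are not taken from the paper, which queried a much larger KG.

```python
# Toy causal graph: an edge (a, b) reads "a causes b".
# All nodes and edges are invented for illustration only.
EDGES = {
    ("X", "Y"),          # exposure -> outcome
    ("conf", "X"), ("conf", "Y"),    # common cause of X and Y
    ("X", "med"), ("med", "Y"),      # lies on a path from X to Y
    ("X", "coll"), ("Y", "coll"),    # common effect of X and Y
    ("mixed", "X"), ("mixed", "Y"), ("X", "mixed"),  # plays two roles
}


def causal_roles(edges, exposure, outcome):
    """Classify every other node by its direct edges to exposure and outcome."""
    nodes = {n for edge in edges for n in edge} - {exposure, outcome}
    roles = {}
    for v in sorted(nodes):
        found = set()
        if (v, exposure) in edges and (v, outcome) in edges:
            found.add("confounder")
        if (exposure, v) in edges and (v, outcome) in edges:
            found.add("mediator")
        if (exposure, v) in edges and (outcome, v) in edges:
            found.add("collider")
        roles[v] = found
    return roles


roles = causal_roles(EDGES, "X", "Y")
for variable, found in roles.items():
    print(variable, sorted(found))
```

A variable such as `mixed`, which satisfies more than one pattern, corresponds to what the abstract calls a combined-role variable. The sketch only inspects direct edges; longer causal paths would need a path search.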
Affiliation(s)
- Scott A Malec
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA USA
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
- Steven M Albert
  - Department of Behavioral and Community Health Sciences, School of Public Health, University of Pittsburgh, Pittsburgh, PA USA
- C Elizabeth Shaaban
  - Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA USA
- Helmet T Karim
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA USA; Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA USA
- Arthur S Levine
  - Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA USA; The Brain Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA USA
- Paul Munro
  - School of Computing and Information, University of Pittsburgh, Pittsburgh, PA USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY USA
- Richard D Boyce
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA USA; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA USA
11
Taneja SB, Callahan TJ, Paine MF, Kane-Gill SL, Kilicoglu H, Joachimiak MP, Boyce RD. Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions. J Biomed Inform 2023; 140:104341. [PMID: 36933632 PMCID: PMC10150409 DOI: 10.1016/j.jbi.2023.104341] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/24/2022] [Revised: 01/09/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]
Abstract
BACKGROUND: Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research.
METHODS: We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG.
RESULTS: The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature.
CONCLUSION: NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.
Affiliation(s)
- Sanya B Taneja
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15206, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Mary F Paine
  - Department of Pharmaceutical Sciences, College of Pharmacy and Pharmaceutical Sciences, Washington State University, Spokane, WA 99202, USA
- Halil Kilicoglu
  - School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Marcin P Joachimiak
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Richard D Boyce
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA
12
Casiraghi E, Wong R, Hall M, Coleman B, Notaro M, Evans MD, Tronieri JS, Blau H, Laraway B, Callahan TJ, Chan LE, Bramante CT, Buse JB, Moffitt RA, Stürmer T, Johnson SG, Raymond Shao Y, Reese J, Robinson PN, Paccanaro A, Valentini G, Huling JD, Wilkins KJ. A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative. J Biomed Inform 2023; 139:104295. [PMID: 36716983 PMCID: PMC10683778 DOI: 10.1016/j.jbi.2023.104295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 01/16/2023] [Accepted: 01/21/2023] [Indexed: 02/01/2023]
Abstract
Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients' predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm's parameters and data-related modeling choices are also both crucial and challenging. In this paper we propose a novel framework to numerically evaluate strategies for handling missing data in the context of statistical analysis, with a particular focus on multiple imputation techniques. We demonstrate the feasibility of our approach on a large cohort of type-2 diabetes patients provided by the National COVID Cohort Collaborative (N3C) Enclave, where we explored the influence of various patient characteristics on outcomes related to COVID-19. Our analysis included classic multiple imputation techniques as well as simple complete-case Inverse Probability Weighted models. Extensive experiments show that our approach can effectively highlight the most promising and performant missing-data handling strategy for our case study. Moreover, our methodology allowed a better understanding of the behavior of the different models and of how it changed as we modified their parameters. Our method is general and can be applied to different research fields and on datasets containing heterogeneous types.
Affiliation(s)
- Elena Casiraghi
  - AnacletoLab, Department of Computer Science "Giovanni degli Antoni", Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Rachel Wong
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Margaret Hall
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Marco Notaro
  - AnacletoLab, Department of Computer Science "Giovanni degli Antoni", Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
- Michael D Evans
  - Biostatistical Design and Analysis Center, Clinical and Translational Science Institute, University of Minnesota, Minneapolis, MN, USA
- Jena S Tronieri
  - Department of Psychiatry, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, USA
- Bryan Laraway
  - University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
- Lauren E Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, USA
- Carolyn T Bramante
  - Division of General Internal Medicine, University of Minnesota, Minneapolis, MN, USA
- John B Buse
  - NC Translational and Clinical Sciences Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Division of Endocrinology, Department of Medicine, University of North Carolina School of Medicine, USA
- Richard A Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Til Stürmer
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Steven G Johnson
  - Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Yu Raymond Shao
  - Harvard-MIT Division of Health Sciences and Technology (HST), 260 Longwood Ave, Boston, USA; Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, USA
- Justin Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Alberto Paccanaro
  - School of Applied Mathematics (EMAp), Fundação Getúlio Vargas, Rio de Janeiro, Brazil; Department of Computer Science, Royal Holloway, University of London, Egham, UK
- Giorgio Valentini
  - AnacletoLab, Department of Computer Science "Giovanni degli Antoni", Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
- Jared D Huling
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Kenneth J Wilkins
  - Biostatistics Program, Office of the Director, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
13
Callahan TJ, Stefanski AL, Kim JD, Baumgartner WA, Wyrwa JM, Hunter LE. Knowledge-Driven Mechanistic Enrichment of the Preeclampsia Ignorome. Pac Symp Biocomput 2023; 28:371-382. [PMID: 36540992 PMCID: PMC9782728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Preeclampsia is a leading cause of maternal and fetal morbidity and mortality. Currently, the only definitive treatment of preeclampsia is delivery of the placenta, which is central to the pathogenesis of the disease. Transcriptional profiling of human placenta from pregnancies complicated by preeclampsia has been extensively performed to identify differentially expressed genes (DEGs). The decisions to investigate DEGs experimentally are biased by many factors, causing many DEGs to remain uninvestigated. A set of DEGs which are associated with a disease experimentally, but which have no known association to the disease in the literature are known as the ignorome. Preeclampsia has an extensive body of scientific literature, a large pool of DEG data, and only one definitive treatment. Tools facilitating knowledge-based analyses, which are capable of combining disparate data from many sources in order to suggest underlying mechanisms of action, may be a valuable resource to support discovery and improve our understanding of this disease. In this work we demonstrate how a biomedical knowledge graph (KG) can be used to identify novel preeclampsia molecular mechanisms. Existing open source biomedical resources and publicly available high-throughput transcriptional profiling data were used to identify and annotate the function of currently uninvestigated preeclampsia-associated DEGs. Experimentally investigated genes associated with preeclampsia were identified from PubMed abstracts using text-mining methodologies. The relative complement of the text-mined- and meta-analysis-derived lists were identified as the uninvestigated preeclampsia-associated DEGs (n=445), i.e., the preeclampsia ignorome. Using the KG to investigate relevant DEGs revealed 53 novel clinically relevant and biologically actionable mechanistic associations.
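The "relative complement" step in the abstract above amounts to a set difference: genes found by the expression meta-analysis minus genes already linked to the disease in the literature. A minimal sketch follows, using placeholder gene symbols that are purely hypothetical and not drawn from the study.

```python
# Placeholder gene symbols, chosen only to make the set logic visible.
meta_analysis_degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
text_mined_genes = {"GENE_B", "GENE_D", "GENE_E"}

# Relative complement: DEGs that have no literature-reported association.
ignorome = meta_analysis_degs - text_mined_genes

print(sorted(ignorome))
```

Genes that appear only in the text-mined list (here `GENE_E`) are excluded, because the ignorome is defined relative to the experimentally derived DEG list.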
Affiliation(s)
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
14
Reese JT, Blau H, Casiraghi E, Bergquist T, Loomba JJ, Callahan TJ, Laraway B, Antonescu C, Coleman B, Gargano M, Wilkins KJ, Cappelletti L, Fontana T, Ammar N, Antony B, Murali TM, Caufield JH, Karlebach G, McMurry JA, Williams A, Moffitt R, Banerjee J, Solomonides AE, Davis H, Kostka K, Valentini G, Sahner D, Chute CG, Madlock-Brown C, Haendel MA, Robinson PN. Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes. EBioMedicine 2023; 87:104413. [PMID: 36563487 PMCID: PMC9769411 DOI: 10.1016/j.ebiom.2022.104413] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 07/30/2022] [Revised: 11/23/2022] [Accepted: 11/29/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.
METHODS: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning.
FINDINGS: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.
INTERPRETATION: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
FUNDING: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.
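The assignment of new patients to existing clusters by maximum similarity, mentioned in the abstract above, can be sketched in a few lines. This is only an illustration under stated assumptions: Jaccard overlap of phenotype-term sets stands in for the ontology-based semantic similarity used in the study, and the patients, terms, cluster labels, and the `assign_cluster` helper are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical patients that already carry a cluster label.
clustered = {
    "p1": ({"dyspnea", "cough"}, "pulmonary"),
    "p2": ({"dyspnea", "hypoxemia"}, "pulmonary"),
    "p3": ({"anxiety", "insomnia"}, "neuropsychiatric"),
}


def assign_cluster(new_terms, clustered):
    """Return the cluster label of the most similar already-clustered patient."""
    best_id = max(clustered, key=lambda pid: jaccard(new_terms, clustered[pid][0]))
    return clustered[best_id][1]


label = assign_cluster({"insomnia", "anxiety", "fatigue"}, clustered)
print(label)
```

A semantic similarity measure would also credit related but non-identical ontology terms, which plain set overlap cannot do, so this shows only the shape of the assignment step.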
Affiliation(s)
- Justin T Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Elena Casiraghi
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Johanna J Loomba
  - The Integrated Translational Health Research Institute of Virginia (iTHRIV), University of Virginia, Charlottesville, VA, USA
- Tiffany J Callahan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Bryan Laraway
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Michael Gargano
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Kenneth J Wilkins
  - Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Nariman Ammar
  - Health Science Center, University of Tennessee, Memphis, TN, USA
- Blessy Antony
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- J Harry Caufield
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Guy Karlebach
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA
- Julie A McMurry
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Andrew Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA; Tufts University School of Medicine, Institute for Clinical Research and Health Policy Studies, Boston, MA, USA; Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Richard Moffitt
  - Department of Biomedical Informatics and Stony Brook Cancer Center, Stony Brook University, Stony Brook, NY, USA
- Kristin Kostka
  - Northeastern University, OHDSI Center at the Roux Institute, Boston, MA, USA
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università Degli Studi di Milano, Milan, Italy
- Christopher G Chute
  - Schools of Medicine, Public Health and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Melissa A Haendel
  - Departments of Biomedical Informatics and Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
15
Coleman B, Casiraghi E, Callahan TJ, Blau H, Chan L, Laraway B, Clark KB, Re'em Y, Gersing KR, Wilkins K, Harris NL, Valentini G, Haendel MA, Reese J, Robinson PN. Post-COVID Phenotypic Manifestations are Associated with New-Onset Psychiatric Disease: Findings from the NIH N3C and RECOVER Studies. medRxiv 2022:2022.07.08.22277388. [PMID: 36380762 PMCID: PMC9645424 DOI: 10.1101/2022.07.08.22277388] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/30/2023]
Abstract
Acute COVID-19 infection can be followed by diverse clinical manifestations referred to as Post Acute Sequelae of SARS-CoV-2 Infection (PASC). Studies have shown an increased risk of being diagnosed with new-onset psychiatric disease following a diagnosis of acute COVID-19. However, it was unclear whether non-psychiatric PASC-associated manifestations (PASC-AMs) are associated with an increased risk of new-onset psychiatric disease following COVID-19. A retrospective EHR cohort study of 1,603,767 individuals with acute COVID-19 was performed to evaluate whether non-psychiatric PASC-AMs are associated with new-onset psychiatric disease. Data were obtained from the National COVID Cohort Collaborative (N3C), which has EHR data from 65 clinical organizations. EHR codes were mapped to 151 non-psychiatric PASC-AMs recorded 28-120 days following SARS-CoV-2 diagnosis and before diagnosis of new-onset psychiatric disease. Association of newly diagnosed psychiatric disease with age, sex, race, pre-existing comorbidities, and PASC-AMs in seven categories was assessed by logistic regression. There was a significant association between six categories and newly diagnosed anxiety, mood, and psychotic disorders, with odds ratios highest for cardiovascular (1.35, 1.27-1.42) PASC-AMs. Secondary analysis revealed that the proportions of 95 individual clinical features significantly differed between patients diagnosed with different psychiatric disorders. Our study provides evidence for association between non-psychiatric PASC-AMs and the incidence of newly diagnosed psychiatric disease. Significant associations were found for features related to multiple organ systems. This information could prove useful in understanding risk stratification for new-onset psychiatric disease following COVID-19. Prospective studies are needed to corroborate these findings.
FUNDING: NCATS U24 TR002306.
16
Coleman B, Casiraghi E, Blau H, Chan L, Haendel MA, Laraway B, Callahan TJ, Deer RR, Wilkins KJ, Reese J, Robinson PN. Risk of new-onset psychiatric sequelae of COVID-19 in the early and late post-acute phase. World Psychiatry 2022; 21:319-320. [PMID: 35524622 PMCID: PMC9077621 DOI: 10.1002/wps.20992] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022] Open
Affiliation(s)
- Ben Coleman
  - Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica "Giovanni degli Antoni", Università di Milano, Milan, Italy
  - CINI, Infolife National Laboratory, Rome, Italy
- Hannah Blau
  - Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Lauren Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Bryan Laraway
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Tiffany J Callahan
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Rachel R Deer
  - University of Texas Medical Branch, Galveston, TX, USA
- Kenneth J Wilkins
  - Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
17
Matentzoglu N, Balhoff JP, Bello SM, Bizon C, Brush M, Callahan TJ, Chute CG, Duncan WD, Evelo CT, Gabriel D, Graybeal J, Gray A, Gyori BM, Haendel M, Harmse H, Harris NL, Harrow I, Hegde HB, Hoyt AL, Hoyt CT, Jiao D, Jiménez-Ruiz E, Jupp S, Kim H, Koehler S, Liener T, Long Q, Malone J, McLaughlin JA, McMurry JA, Moxon S, Munoz-Torres MC, Osumi-Sutherland D, Overton JA, Peters B, Putman T, Queralt-Rosinach N, Shefchek K, Solbrig H, Thessen A, Tudorache T, Vasilevsky N, Wagner AH, Mungall CJ. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database (Oxford) 2022; 2022:6591806. [PMID: 35616100 PMCID: PMC9216545 DOI: 10.1093/database/baac035] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/15/2021] [Revised: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 02/03/2023]
Abstract
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
Affiliation(s)
- James P Balhoff
  - RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
- Chris Bizon
  - RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
- Matthew Brush
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Chris T Evelo
  - Maastricht University, Maastricht 6211 LK, The Netherlands
- Alasdair Gray
  - Department of Computer Science, Heriot-Watt University, Edinburgh, Currie EH14 4AS, UK
- Melissa Haendel
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Henriette Harmse
  - European Bioinformatics Institute (EMBL-EBI), Hinxton CB10 1SD, UK
- Nomi L Harris
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Harshad B Hegde
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Amelia L Hoyt
  - Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Dazhi Jiao
  - Johns Hopkins University, Baltimore, MD 21210, USA
- Ernesto Jiménez-Ruiz
  - City University of London, London EC1V 0HB, UK
  - University of Oslo, Oslo 0315, Norway
- Simon Jupp
  - SciBite Limited, Bio Data Innovation Centre, Wellcome Genome Campus, Hinxton, Saffron Walden CB10 1DR, UK
- Qinqin Long
  - Leiden University Medical Center, Leiden 2333 ZA, The Netherlands
- James Malone
  - BenchSci, 25 York St Suite 1100, Toronto, ON M5J 2V5, Canada
- Julie A McMurry
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Sierra Moxon
  - Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Bjoern Peters
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Tim Putman
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Kent Shefchek
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Anne Thessen
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Nicole Vasilevsky
  - University of Colorado Anschutz Medical Campus, Aurora, CO 80217, USA
- Alex H Wagner
  - The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH 43205, USA
  - The Ohio State University College of Medicine, Columbus, OH 43210, USA
18
Reese JT, Coleman B, Chan L, Blau H, Callahan TJ, Cappelletti L, Fontana T, Bradwell KR, Harris NL, Casiraghi E, Valentini G, Karlebach G, Deer R, McMurry JA, Haendel MA, Chute CG, Pfaff E, Moffitt R, Spratt H, Singh JA, Mungall CJ, Williams AE, Robinson PN. NSAID use and clinical outcomes in COVID-19 patients: a 38-center retrospective cohort study. Virol J 2022; 19:84. [PMID: 35570298 PMCID: PMC9107579 DOI: 10.1186/s12985-022-01813-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/23/2021] [Accepted: 05/04/2022] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use. METHODS A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of 19,746 COVID-19 inpatients was constructed by matching cases (treated with NSAIDs at the time of admission) and 19,746 controls (not treated) from 857,061 patients with COVID-19 available for analysis. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis. RESULTS Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR: 0.57 95% CI: 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51 95% CI: 0.47-0.56), invasive ventilation (OR: 0.59 95% CI: 0.55-0.64), AKI (OR: 0.67 95% CI: 0.63-0.72), or ECMO (OR: 0.51 95% CI: 0.36-0.7). In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations. 
CONCLUSIONS Study interpretation is limited by the observational design. Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database.
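For readers unfamiliar with the E-values mentioned in the RESULTS, the following is a minimal sketch of the standard VanderWeele-Ding formula, E = RR + sqrt(RR × (RR − 1)), with protective estimates inverted first. Treating each reported odds ratio as an approximate risk ratio is an assumption made here for illustration; the abstract does not say which conversion or which bound the authors used, so these values need not match the reported range of 1.9 to 3.3. The function name `e_value` is hypothetical.

```python
import math


def e_value(ratio: float) -> float:
    """E-value for a risk ratio, using the VanderWeele-Ding formula."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Protective estimates (ratio < 1) are inverted before applying the formula.
    rr = 1.0 / ratio if ratio < 1.0 else ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# Point estimates quoted in the abstract, treated as approximate risk ratios
# (an illustrative assumption, not the authors' stated procedure).
odds_ratios = {
    "severity": 0.57,
    "all-cause mortality": 0.51,
    "invasive ventilation": 0.59,
    "AKI": 0.67,
    "ECMO": 0.51,
}

for outcome, odds_ratio in odds_ratios.items():
    print(f"{outcome}: OR={odds_ratio:.2f}, E-value={e_value(odds_ratio):.2f}")
```

A larger E-value means a stronger unmeasured confounder would be needed to fully explain the association; an estimate of exactly 1 yields an E-value of 1.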
Affiliation(s)
- Justin T Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
- Lauren Chan
  - Translational and Integrative Sciences Center, Oregon State University, Corvallis, OR, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Tiffany J Callahan
  - Computational Bioscience, University of Colorado Anschutz Medical Campus, Boulder, CO, USA
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento Di Informatica, Università Degli Studi Di Milano, Milan, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento Di Informatica, Università Degli Studi Di Milano, Milan, Italy
- Nomi L Harris
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento Di Informatica, Università Degli Studi Di Milano, Milan, Italy
  - CINI, National Laboratory in Artificial Intelligence and Intelligent Systems-AIIS, Rome, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento Di Informatica, Università Degli Studi Di Milano, Milan, Italy
  - CINI, National Laboratory in Artificial Intelligence and Intelligent Systems-AIIS, Rome, Italy
- Guy Karlebach
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Rachel Deer
  - University of Texas Medical Branch, Galveston, TX, USA
- Julie A McMurry
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Emily Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Richard Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Heidi Spratt
  - University of Texas Medical Branch, Galveston, TX, USA
- Jasvinder A Singh
  - University of Alabama at Birmingham, Birmingham, AL, USA
  - Medicine Service, VA Medical Center, Birmingham, AL, USA
- Christopher J Mungall
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Andrew E Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA
  - Institute for Clinical Research and Health Policy Studies, Tufts University School of Medicine, Boston, USA
  - OHDSI Center at the Roux Institute, Northeastern University, Boston, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
19
Reese JT, Coleman B, Chan L, Blau H, Callahan TJ, Cappelletti L, Fontana T, Bradwell KR, Harris NL, Casiraghi E, Valentini G, Karlebach G, Deer R, McMurry JA, Haendel MA, Chute CG, Pfaff E, Moffitt R, Spratt H, Singh J, Mungall CJ, Williams AE, Robinson PN. NSAID use and clinical outcomes in COVID-19 patients: A 38-center retrospective cohort study. medRxiv 2021:2021.04.13.21255438. [PMID: 33907758 PMCID: PMC8077581 DOI: 10.1101/2021.04.13.21255438] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/25/2023]
Abstract
BACKGROUND Non-steroidal anti-inflammatory drugs (NSAIDs) are commonly used to reduce pain, fever, and inflammation but have been associated with complications in community-acquired pneumonia. Observations shortly after the start of the COVID-19 pandemic in 2020 suggested that ibuprofen was associated with an increased risk of adverse events in COVID-19 patients, but subsequent observational studies failed to demonstrate increased risk and in one case showed reduced risk associated with NSAID use. METHODS A 38-center retrospective cohort study was performed that leveraged the harmonized, high-granularity electronic health record data of the National COVID Cohort Collaborative. A propensity-matched cohort of COVID-19 inpatients was constructed by matching cases (treated with NSAIDs) and controls (not treated) from 857,061 patients with COVID-19. The primary outcome of interest was COVID-19 severity in hospitalized patients, which was classified as: moderate, severe, or mortality/hospice. Secondary outcomes were acute kidney injury (AKI), extracorporeal membrane oxygenation (ECMO), invasive ventilation, and all-cause mortality at any time following COVID-19 diagnosis. RESULTS Logistic regression showed that NSAID use was not associated with increased COVID-19 severity (OR: 0.57 95% CI: 0.53-0.61). Analysis of secondary outcomes using logistic regression showed that NSAID use was not associated with increased risk of all-cause mortality (OR 0.51 95% CI: 0.47-0.56), invasive ventilation (OR: 0.59 95% CI: 0.55-0.64), AKI (OR: 0.67 95% CI: 0.63-0.72), or ECMO (OR: 0.51 95% CI: 0.36-0.7). In contrast, the odds ratios indicate reduced risk of these outcomes, but our quantitative bias analysis showed E-values of between 1.9 and 3.3 for these associations, indicating that comparatively weak or moderate confounder associations could explain away the observed associations.
CONCLUSIONS Study interpretation is limited by the observational design. Recording of NSAID use may have been incomplete. Our study demonstrates that NSAID use is not associated with increased COVID-19 severity, all-cause mortality, invasive ventilation, AKI, or ECMO in COVID-19 inpatients. A conservative interpretation in light of the quantitative bias analysis is that there is no evidence that NSAID use is associated with risk of increased severity or the other measured outcomes. Our findings are the largest EHR-based analysis of the effect of NSAIDs on outcome in COVID-19 patients to date. Our results confirm and extend analogous findings in previous observational studies using a large cohort of patients drawn from 38 centers in a nationally representative multicenter database.
Affiliation(s)
- Justin T Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Lauren Chan
  - Translational and Integrative Sciences Center, Oregon State University, Corvallis, OR, USA
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Tiffany J Callahan
  - Computational Bioscience, University of Colorado Anschutz Medical Campus, Boulder, CO, USA
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Luca Cappelletti
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
- Tommaso Fontana
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
- Nomi L Harris
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
  - CINI, National Laboratory in Artificial Intelligence and Intelligent Systems-AIIS, Roma, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
  - CINI, National Laboratory in Artificial Intelligence and Intelligent Systems-AIIS, Roma, Italy
- Guy Karlebach
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Rachel Deer
  - University of Texas Medical Branch, Galveston, TX, USA
- Julie A McMurry
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Melissa A Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher G Chute
  - Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
- Emily Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Richard Moffitt
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
- Heidi Spratt
  - University of Texas Medical Branch, Galveston, TX, USA
- Jasvinder Singh
  - University of Alabama at Birmingham, Birmingham, AL, USA
  - Medicine Service, VA Medical Center, Birmingham, AL, USA
- Christopher J Mungall
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Andrew E Williams
  - Tufts Medical Center Clinical and Translational Science Institute, Tufts Medical Center, Boston, MA, USA
  - Tufts University School of Medicine, Institute for Clinical Research and Health Policy Studies
  - Northeastern University, OHDSI Center at the Roux Institute
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
20
Coleman B, Casiraghi E, Blau H, Chan L, Haendel M, Laraway B, Callahan TJ, Deer RR, Wilkins K, Reese J, Robinson PN. Increased risk of psychiatric sequelae of COVID-19 is highest early in the clinical course. medRxiv 2021:2021.11.30.21267071. [PMID: 34909790 PMCID: PMC8669857 DOI: 10.1101/2021.11.30.21267071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
BACKGROUND COVID-19 has been shown to increase the risk of adverse mental health consequences. A recent electronic health record (EHR)-based observational study showed an almost two-fold increased risk of new-onset mental illness in the first 90 days following a diagnosis of acute COVID-19. METHODS We used the National COVID Cohort Collaborative, a harmonized EHR repository with 2,965,506 COVID-19 positive patients, and compared cohorts of COVID-19 patients with comparable controls. Patients were propensity score-matched to control for confounding factors. We estimated the hazard ratio (COVID-19:control) for new-onset of mental illness for the first year following diagnosis. We additionally estimated the change in risk for new-onset mental illness between the periods of 21-120 and 121-365 days following infection. FINDINGS We find a significant increase in incidence of new-onset mental disorders in the period of 21-120 days following COVID-19 (3.8%, 3.6-4.0) compared to patients with respiratory tract infections (3%, 2.8-3.2). We further show that the risk for new-onset mental illness decreases over the first year following COVID-19 diagnosis compared to other respiratory tract infections and demonstrate a reduced (non-significant) hazard ratio over the period of 121-365 days following diagnosis. Similar findings are seen for new-onset anxiety disorders but not for mood disorders. INTERPRETATION Patients who have recovered from COVID-19 are at an increased risk for developing new-onset mental illness, especially anxiety disorders. This risk is most prominent in the first 120 days following infection. FUNDING National Center for Advancing Translational Sciences (NCATS).
Affiliation(s)
- Ben Coleman
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
- Elena Casiraghi
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Italy
- Hannah Blau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Lauren Chan
  - College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
- Melissa Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Bryan Laraway
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Tiffany J Callahan
  - University of Colorado Anschutz Medical Campus, Center for Health AI, Aurora, CO 80045, USA
- Rachel R Deer
  - University of Texas Medical Branch, Galveston, TX 77550, USA
- Ken Wilkins
  - Biostatistics Program, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Justin Reese
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
21
Deer RR, Rock MA, Vasilevsky N, Carmody L, Rando H, Anzalone AJ, Basson MD, Bennett TD, Bergquist T, Boudreau EA, Bramante CT, Byrd JB, Callahan TJ, Chan LE, Chu H, Chute CG, Coleman BD, Davis HE, Gagnier J, Greene CS, Hillegass WB, Kavuluru R, Kimble WD, Koraishy FM, Köhler S, Liang C, Liu F, Liu H, Madhira V, Madlock-Brown CR, Matentzoglu N, Mazzotti DR, McMurry JA, McNair DS, Moffitt RA, Monteith TS, Parker AM, Perry MA, Pfaff E, Reese JT, Saltz J, Schuff RA, Solomonides AE, Solway J, Spratt H, Stein GS, Sule AA, Topaloglu U, Vavougios GD, Wang L, Haendel MA, Robinson PN. Characterizing Long COVID: Deep Phenotype of a Complex Condition. EBioMedicine 2021; 74:103722. [PMID: 34839263 PMCID: PMC8613500 DOI: 10.1016/j.ebiom.2021.103722] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Received: 09/10/2021] [Revised: 10/22/2021] [Accepted: 11/15/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Numerous publications describe the clinical manifestations of post-acute sequelae of SARS-CoV-2 (PASC or "long COVID"), but they are difficult to integrate because of heterogeneous methods and the lack of a standard for denoting the many phenotypic manifestations. Patient-led studies are of particular importance for understanding the natural history of COVID-19, but integration is hampered because they often use different terms to describe the same symptom or condition. This significant disparity in patient versus clinical characterization motivated the proposed ontological approach to specifying manifestations, which will improve capture and integration of future long COVID studies. METHODS The Human Phenotype Ontology (HPO) is a widely used standard for exchange and analysis of phenotypic abnormalities in human disease but has not yet been applied to the analysis of COVID-19. FINDINGS We identified 303 articles published before April 29, 2021, curated 59 relevant manuscripts that described clinical manifestations in 81 cohorts three weeks or more following acute COVID-19, and mapped 287 unique clinical findings to HPO terms. We present layperson synonyms and definitions that can be used to link patient self-report questionnaires to standard medical terminology. Long COVID clinical manifestations are not assessed consistently across studies, and most manifestations have been reported with a wide range of synonyms by different authors. Across at least 10 cohorts, authors reported 31 unique clinical features corresponding to HPO terms; the most commonly reported feature was Fatigue (median 45.1%) and the least commonly reported was Nausea (median 3.9%), but the reported percentages varied widely between studies.
INTERPRETATION Translating long COVID manifestations into computable HPO terms will improve analysis, data capture, and classification of long COVID patients. If researchers, clinicians, and patients share a common language, then studies can be compared/pooled more effectively. Furthermore, mapping lay terminology to HPO will help patients assist clinicians and researchers in creating phenotypic characterizations that are computationally accessible, thereby improving the stratification, diagnosis, and treatment of long COVID. FUNDING U24TR002306; UL1TR001439; P30AG024832; GBMF4552; R01HG010067; UL1TR002535; K23HL128909; UL1TR002389; K99GM145411.
Affiliation(s)
- Rachel R Deer
- University of Texas Medical Branch, Galveston, TX, USA.
| | | | - Nicole Vasilevsky
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative
| | - Leigh Carmody
- Monarch Initiative; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Halie Rando
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Alfred J Anzalone
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE, USA
| | - Marc D Basson
- Department of Surgery, University of North Dakota School of Medicine and Health Sciences
| | - Tellen D Bennett
- Section of Informatics and Data Science, Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | - Eilis A Boudreau
- Department of Neurology; Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239
| | - Carolyn T Bramante
- Departments of Internal Medicine and Pediatrics, University of Minnesota Medical School, Minneapolis, MN 55455
| | - James Brian Byrd
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109
| | - Tiffany J Callahan
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Lauren E Chan
- Monarch Initiative; College of Public Health and Human Sciences, Oregon State University, Corvallis, OR, USA
| | - Haitao Chu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN USA
| | - Christopher G Chute
- Johns Hopkins University, Schools of Medicine, Public Health, and Nursing, Baltimore, MD, USA
| | - Ben D Coleman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
| | | | - Joel Gagnier
- Departments of Orthopaedic Surgery & Epidemiology, University of Michigan, Ann Arbor, MI, USA
| | - Casey S Greene
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - William B Hillegass
- University of Mississippi Medical Center, University of Mississippi Medical Center, Jackson, MS, USA; Departments of Data Science and Medicine
| | | | - Wesley D Kimble
- West Virginia Clinical and Translational Science Institute, West Virginia University, Morgantown, WV, USA
| | | | | | - Chen Liang
- Arnold School of Public Health, University of South Carolina, Columbia, SC, USA
| | - Feifan Liu
- Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, USA
| | - Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, MN, USA
| | | | - Charisse R Madlock-Brown
- Department of Diagnostic and Health Sciences, University of Tennessee Health Science Center, 920 Madison Ave. Suite 518N, Memphis TN 38613
| | - Nicolas Matentzoglu
- Monarch Initiative; Semanticly Ltd; European Bioinformatics Institute (EMBL-EBI)
| | - Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, University of Kansas Medical Center
| | - Julie A McMurry
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative
| | - Douglas S McNair
- Quantitative Sciences, Global Health Div., Gates Foundation, Seattle, WA 98109, USA
| | | | | | - Ann M Parker
- Pulmonary and Critical Care Medicine, Johns Hopkins University, Schools of Medicine, Baltimore, MD, USA
| | - Mallory A Perry
- Children's Hospital of Philadelphia Research Institute, Philadelphia, PA, USA
| | | | - Justin T Reese
- Monarch Initiative; Lawrence Berkeley National Laboratory
| | - Joel Saltz
- Stony Brook University; Biomedical Informatics
| | | | - Anthony E Solomonides
- Outcomes Research Network, Research Institute, NorthShore University HealthSystem, Evanston, IL 60201, USA; Institute for Translational Medicine, University of Chicago, Chicago, IL, USA
| | - Julian Solway
- Institute for Translational Medicine, University of Chicago, Chicago, IL, USA
| | - Heidi Spratt
- University of Texas Medical Branch, Galveston, TX, USA
| | - Gary S Stein
- University of Vermont Larner College of Medicine, Departments of Biochemistry and Surgery, Burlington, Vermont 05405
| | | | | | - George D Vavougios
- Department of Computer Science and Telecommunications, University of Thessaly, Papasiopoulou 2 - 4, P.C.; 131 - Galaneika, Lamia, Greece; Department of Neurology, Athens Naval Hospital 70 Deinokratous Street, P.C. 115 21 Athens, Greece; Department of Respiratory Medicine, Faculty of Medicine, University of Thessaly, Biopolis, P.C. 41500 Larissa, Greece
| | - Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, MN, USA
| | - Melissa A Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Monarch Initiative.
| | - Peter N Robinson
- Monarch Initiative; The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA.
|
22
|
Hernandez LAR, Callahan TJ, Banda JM. A biomedically oriented automatically annotated Twitter COVID-19 dataset. Genomics Inform 2021; 19:e21. [PMID: 34638168 PMCID: PMC8510871 DOI: 10.5808/gi.21011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/12/2021] [Accepted: 07/26/2021] [Indexed: 01/08/2023] Open
Abstract
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations don't generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.
Affiliation(s)
| | - Tiffany J. Callahan
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Juan M. Banda
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA
|
23
|
Robles Hernandez LA, Callahan TJ, Banda JM. A Biomedically oriented automatically annotated Twitter COVID-19 Dataset. ArXiv 2021:arXiv:2107.12565v1. [PMID: 34341767 PMCID: PMC8328063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, as well as the sequelae that recovered COVID-19 cases present (Long-COVID). However, manually curated social media datasets are difficult to come by due to the expensive costs of manual annotation and the efforts needed to identify the correct texts. When datasets are available, they are usually very small and their annotations do not generalize well over time or to larger sets of documents. As part of the 2021 Biomedical Linked Annotation Hackathon, we release our dataset of over 120 million automatically annotated tweets for biomedical research purposes. Incorporating best practices, we identify tweets with potentially high clinical relevance. We evaluated our work by comparing several SpaCy-based annotation frameworks against a manually annotated gold-standard dataset. Selecting the best method to use for automatic annotation, we then annotated 120 million tweets and released them publicly for future downstream usage within the biomedical domain.
Affiliation(s)
| | - Tiffany J. Callahan
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045 USA
| | - Juan M. Banda
- Department of Computer Science, Georgia State University, Atlanta, Georgia, 30303 USA
|
24
|
Rando HM, Bennett TD, Byrd JB, Bramante C, Callahan TJ, Chute CG, Davis HE, Deer R, Gagnier J, Koraishy FM, Liu F, McMurry JA, Moffitt RA, Pfaff ER, Reese JT, Relevo R, Robinson PN, Saltz JH, Solomonides A, Sule A, Topaloglu U, Haendel MA. Challenges in defining Long COVID: Striking differences across literature, Electronic Health Records, and patient-reported information. medRxiv 2021:2021.03.20.21253896. [PMID: 33791733 PMCID: PMC8010765 DOI: 10.1101/2021.03.20.21253896] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Indexed: 12/11/2022]
Abstract
Since late 2019, the novel coronavirus SARS-CoV-2 has introduced a wide array of health challenges globally. In addition to a complex acute presentation that can affect multiple organ systems, increasing evidence points to long-term sequelae being common and impactful. The worldwide scientific community is forging ahead to characterize a wide range of outcomes associated with SARS-CoV-2 infection; however, the underlying assumptions in these studies have varied so widely that the resulting data are difficult to compare. Formal definitions are needed in order to design robust and consistent studies of Long COVID that consistently capture variation in long-term outcomes. Even the condition itself goes by three terms, most widely "Long COVID", but also "COVID-19 syndrome (PACS)" or "post-acute sequelae of SARS-CoV-2 infection (PASC)". In the present study, we investigate the definitions used in the literature published to date and compare them against data available from electronic health records and patient-reported information collected via surveys. Long COVID holds the potential to produce a second public health crisis on the heels of the pandemic itself. Proactive efforts to identify the characteristics of this heterogeneous condition are imperative for a rigorous scientific effort to investigate and mitigate this threat.
Affiliation(s)
- Halie M. Rando
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA
| | - Tellen D. Bennett
- Center for Health AI and Section of Informatics and Data Science, Department of Pediatrics, University of Colorado School of Medicine, University of Colorado, Aurora, CO, USA
| | | | | | - Tiffany J. Callahan
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Computational Bioscience, University of Colorado Anschutz Medical Campus, Boulder, CO, USA
| | - Christopher G. Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA
| | | | - Rachel Deer
- The University of Texas Medical Branch at Galveston, Galveston, TX, USA
| | - Joel Gagnier
- Computational Bioscience, University of Colorado Anschutz Medical Campus, Boulder, CO, USA
| | | | - Feifan Liu
- University of Massachusetts Medical School Worcester, Worcester, MA, USA
| | - Julie A. McMurry
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Richard A. Moffitt
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
| | - Emily R. Pfaff
- Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Justin T. Reese
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Rose Relevo
- Oregon Health & Science University, Portland, OR, USA
| | - Peter N. Robinson
- The Jackson Laboratory For Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT, USA
| | - Joel H. Saltz
- Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA
| | | | - Anupam Sule
- Saint Joseph Mercy Health System, Ypsilanti, MI, USA
| | - Umit Topaloglu
- School of Medicine, Wake Forest University, Winston Salem, NC, USA
| | - Melissa A. Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
|
25
|
Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, Danis D, Balagura G, Baynam G, Brower AM, Callahan TJ, Chute CG, Est JL, Galer PD, Ganesan S, Griese M, Haimel M, Pazmandi J, Hanauer M, Harris NL, Hartnett M, Hastreiter M, Hauck F, He Y, Jeske T, Kearney H, Kindle G, Klein C, Knoflach K, Krause R, Lagorce D, McMurry JA, Miller JA, Munoz-Torres M, Peters RL, Rapp CK, Rath AM, Rind SA, Rosenberg A, Segal MM, Seidel MG, Smedley D, Talmy T, Thomas Y, Wiafe SA, Xian J, Yüksel Z, Helbig I, Mungall CJ, Haendel MA, Robinson PN. The Human Phenotype Ontology in 2021. Nucleic Acids Res 2021; 49:D1207-D1217. [PMID: 33264411 PMCID: PMC7778952 DOI: 10.1093/nar/gkaa1043] [Citation(s) in RCA: 476] [Impact Index Per Article: 158.7] [Received: 09/17/2020] [Revised: 10/11/2020] [Accepted: 11/16/2020] [Indexed: 12/21/2022] Open
Abstract
The Human Phenotype Ontology (HPO, https://hpo.jax.org) was launched in 2008 to provide a comprehensive logical standard to describe and computationally analyze phenotypic abnormalities found in human disease. The HPO is now a worldwide standard for phenotype exchange. The HPO has grown steadily since its inception due to considerable contributions from clinical experts and researchers from a diverse range of disciplines. Here, we present recent major extensions of the HPO for neurology, nephrology, immunology, pulmonology, newborn screening, and other areas. For example, the seizure subontology now reflects the International League Against Epilepsy (ILAE) guidelines and these enhancements have already shown clinical validity. We present new efforts to harmonize computational definitions of phenotypic abnormalities across the HPO and multiple phenotype ontologies used for animal models of disease. These efforts will benefit software such as Exomiser by improving the accuracy and scope of cross-species phenotype matching. The computational modeling strategy used by the HPO to define disease entities and phenotypic features and distinguish between them is explained in detail. We also report on recent efforts to translate the HPO into indigenous languages. Finally, we summarize recent advances in the use of HPO in electronic health record systems.
Affiliation(s)
| | - Michael Gargano
- Monarch Initiative
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Nicolas Matentzoglu
- Monarch Initiative
- Semanticly Ltd, London, UK
- European Bioinformatics Institute (EMBL-EBI)
| | - Leigh C Carmody
- Monarch Initiative
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - David Lewis-Smith
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Clinical Neurosciences, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
| | - Nicole A Vasilevsky
- Monarch Initiative
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University
| | | | - Ganna Balagura
- Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, and Maternal and Child Health, University of Genoa, Genoa, Italy
- Pediatric Neurology and Muscular Diseases Unit, IRCCS ‘G. Gaslini’ Institute, Genoa, Italy
| | - Gareth Baynam
- Western Australian Register of Developmental Anomalies, King Edward Memorial Hospital, Perth, Australia
- Telethon Kids Institute and the Division of Paediatrics, Faculty of Health and Medical Sciences, University of Western Australia, Perth, Australia
| | - Amy M Brower
- American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
| | - Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Colorado, USA
| | | | - Johanna L Est
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Peter D Galer
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Shiva Ganesan
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Matthias Griese
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Ludwig-Maximilians University, German Center for Lung Research (DZL), Munich, Germany
| | - Matthias Haimel
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
| | - Julia Pazmandi
- Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases, Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
| | - Marc Hanauer
- INSERM, US14-Orphanet, Plateforme Maladies Rares, Paris, France
| | - Nomi L Harris
- Monarch Initiative
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA, USA
| | - Michael J Hartnett
- American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
| | - Maximilian Hastreiter
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Fabian Hauck
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- German Centre for Infection Research (DZIF), Munich, Germany
| | - Yongqun He
- Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Tim Jeske
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Hugh Kearney
- FutureNeuro, SFI Research Centre for Chronic and Rare Neurological Diseases, Ireland
| | - Gerhard Kindle
- Institute for Immunodeficiency, Center for Chronic Immunodeficiency (CCI), Faculty of Medicine, Medical Center - University of Freiburg, Freiburg, Germany
- Centre for Biobanking FREEZE, Faculty of Medicine, Medical Center - University of Freiburg, Freiburg, Germany
| | - Christoph Klein
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
| | - Katrin Knoflach
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Ludwig-Maximilians University, German Center for Lung Research (DZL), Munich, Germany
| | - Roland Krause
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4367 Belvaux, Luxembourg
| | - David Lagorce
- INSERM, US14-Orphanet, Plateforme Maladies Rares, Paris, France
| | - Julie A McMurry
- Monarch Initiative
- Translational and Integrative Sciences Center, Department of Environmental and Molecular Toxicology, Oregon State University, OR, USA
| | - Jillian A Miller
- American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
| | - Monica C Munoz-Torres
- Monarch Initiative
- Translational and Integrative Sciences Center, Department of Environmental and Molecular Toxicology, Oregon State University, OR, USA
| | - Rebecca L Peters
- American College of Medical Genetics and Genomics (ACMG), Bethesda, MD, USA
| | - Christina K Rapp
- Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, Ludwig-Maximilians-Universität München, Munich, Germany
- Ludwig-Maximilians University, German Center for Lung Research (DZL), Munich, Germany
| | - Ana M Rath
- INSERM, US14-Orphanet, Plateforme Maladies Rares, Paris, France
| | - Shahmir A Rind
- WA Register of Developmental Anomalies
- Curtin University, Western Australia, Australia
| | - Avi Z Rosenberg
- Division of Kidney-Urologic Pathology, Johns Hopkins University, Baltimore, MD 21205, USA
| | | | - Markus G Seidel
- Research Unit for Pediatric Hematology and Immunology, Division of Pediatric Hemato-Oncology, Department of Pediatrics and Adolescent Medicine, Medical University of Graz, Graz, Austria
| | - Damian Smedley
- The William Harvey Research Institute, Charterhouse Square, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Tomer Talmy
- Genomic Research Department, Emedgene Technologies, Tel Aviv, Israel
- Faculty of Medicine, Hebrew University Hadassah Medical School, Jerusalem, Israel
| | - Yarlalu Thomas
- West Australian Register of Developmental Anomalies, East Perth, WA, Australia
| | | | - Julie Xian
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, PA, USA
| | - Zafer Yüksel
- Human Genetics, Bioscientia GmbH, Ingelheim, Germany
| | - Ingo Helbig
- Department of Neurology, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Christopher J Mungall
- Monarch Initiative
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley CA, USA
| | - Melissa A Haendel
- Monarch Initiative
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University
- Translational and Integrative Sciences Center, Department of Environmental and Molecular Toxicology, Oregon State University, OR, USA
| | - Peter N Robinson
- Monarch Initiative
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032, USA
|
26
|
Reese JT, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Shefchek KA, Good BM, Balhoff JP, Fontana T, Blau H, Matentzoglu N, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ. KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response. Patterns (N Y) 2021; 2:100155. [PMID: 33196056 PMCID: PMC7649624 DOI: 10.1016/j.patter.2020.100155] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 08/17/2020] [Revised: 10/02/2020] [Accepted: 11/05/2020] [Indexed: 02/06/2023]
Abstract
Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
Affiliation(s)
- Justin T. Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Deepak Unni
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Tiffany J. Callahan
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, USA
| | - Luca Cappelletti
- Department of Computer Science, University of Milano, 20122 Milan, Italy
| | - Vida Ravanmehr
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Kent A. Shefchek
- Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Benjamin M. Good
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
| | - Tommaso Fontana
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
| | - Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | | | - Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Monica C. Munoz-Torres
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Melissa A. Haendel
- Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Marcin P. Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
27
|
Thessen AE, Grondin CJ, Kulkarni RD, Brander S, Truong L, Vasilevsky NA, Callahan TJ, Chan LE, Westra B, Willis M, Rothenberg SE, Jarabek AM, Burgoon L, Korrick SA, Haendel MA. Community Approaches for Integrating Environmental Exposures into Human Models of Disease. Environ Health Perspect 2020; 128:125002. [PMID: 33369481 PMCID: PMC7769179 DOI: 10.1289/ehp7215] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/09/2020] [Revised: 11/30/2020] [Accepted: 12/04/2020] [Indexed: 05/03/2023]
Abstract
BACKGROUND: A critical challenge in genomic medicine is identifying the genetic and environmental risk factors for disease. Currently, the available data links a majority of known coding human genes to phenotypes, but the environmental component of human disease is extremely underrepresented in these linked data sets. Without environmental exposure information, our ability to realize precision health is limited, even with the promise of modern genomics. Achieving integration of gene, phenotype, and environment will require extensive translation of data into a standard, computable form and the extension of the existing gene/phenotype data model. The data standards and models needed to achieve this integration do not currently exist. OBJECTIVES: Our objective is to foster development of community-driven data-reporting standards and a computational model that will facilitate the inclusion of exposure data in computational analysis of human disease. To this end, we present a preliminary semantic data model and use cases and competency questions for further community-driven model development and refinement. DISCUSSION: There is a real desire by the exposure science, epidemiology, and toxicology communities to use informatics approaches to improve their research workflow, gain new insights, and increase data reuse. Critical to success is the development of a community-driven data model for describing environmental exposures and linking them to existing models of human disease. https://doi.org/10.1289/EHP7215.
Affiliation(s)
- Anne E. Thessen
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
- Ronin Institute for Independent Scholarship, Montclair, New Jersey, USA
| | - Cynthia J. Grondin
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA
| | - Resham D. Kulkarni
- Biomedical Informatics and Data Science, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA
| | - Susanne Brander
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
| | - Lisa Truong
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
| | - Nicole A. Vasilevsky
- Oregon Clinical & Translational Research Institute, Oregon Health & Science University, Portland, Oregon, USA
- Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, Oregon, USA
| | - Tiffany J. Callahan
- Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado, USA
- Department of Pharmacology, School of Medicine, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado, USA
| | - Lauren E. Chan
- Nutrition, Oregon State University, Corvallis, Oregon, USA
| | - Brian Westra
- University Libraries, University of Iowa, Iowa City, Iowa, USA
| | - Mary Willis
- School of Biological and Population Health Sciences, College of Public Health and Human Sciences, Oregon State University, Corvallis, Oregon, USA
| | - Sarah E. Rothenberg
- School of Biological and Population Health Sciences, College of Public Health and Human Sciences, Oregon State University, Corvallis, Oregon, USA
| | - Annie M. Jarabek
- Center for Public Health and Environmental Assessment, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
| | - Lyle Burgoon
- U.S. Army Engineering Research and Development Center, Vicksburg, Mississippi, USA
| | - Susan A. Korrick
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Melissa A. Haendel
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, USA
|
28
|
Reese JT, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Shefchek KA, Good BM, Balhoff JP, Fontana T, Blau H, Matentzoglu N, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ. KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response. Patterns (N Y) 2020; 2:100155. [PMID: 33196056 PMCID: PMC7649624 DOI: 10.1016/j.patter.2020.100155] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework also can be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
Affiliation(s)
- Justin T. Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA (corresponding author)
| | - Deepak Unni
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Tiffany J. Callahan
- Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, USA
| | - Luca Cappelletti
- Department of Computer Science, University of Milano, 20122 Milan, Italy
| | - Vida Ravanmehr
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Seth Carbon
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Kent A. Shefchek
- Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Benjamin M. Good
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
| | - Tommaso Fontana
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
| | - Hannah Blau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | | | - Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Monica C. Munoz-Torres
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Melissa A. Haendel
- Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
| | - Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Marcin P. Joachimiak
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| |
|
29
|
Reese J, Unni D, Callahan TJ, Cappelletti L, Ravanmehr V, Carbon S, Fontana T, Blau H, Matentzoglu N, Harris NL, Munoz-Torres MC, Robinson PN, Joachimiak MP, Mungall CJ. KG-COVID-19: a framework to produce customized knowledge graphs for COVID-19 response. bioRxiv 2020:2020.08.17.254839. [PMID: 32839776 PMCID: PMC7444288 DOI: 10.1101/2020.08.17.254839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
Integrated, up-to-date data about SARS-CoV-2 and coronavirus disease 2019 (COVID-19) is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, is much different from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates biomedical data to produce knowledge graphs (KGs) for COVID-19 response. This KG framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics. BIGGER PICTURE An effective response to the COVID-19 pandemic relies on integration of many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications, including machine learning tasks, hypothesis-based querying, and browsable user interfaces that enable researchers to explore COVID-19 data and discover relationships.
|
30
|
Abstract
Knowledge-based biomedical data science involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey recent progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as progress on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing to construct knowledge graphs, and the expansion of novel knowledge-based approaches to clinical and biological domains.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Ignacio J Tripodi
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Harrison Pielke-Lombardo
  - Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Lawrence E Hunter
  - Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
|
31
|
Kim JD, Wang Y, Fujiwara T, Okuda S, Callahan TJ, Cohen KB. Open Agile text mining for bioinformatics: the PubAnnotation ecosystem. Bioinformatics 2020; 35:4372-4380. [PMID: 30937439 PMCID: PMC6821251 DOI: 10.1093/bioinformatics/btz227] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/25/2018] [Revised: 03/16/2019] [Accepted: 03/29/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Most currently available text mining tools share two characteristics that make them less than optimal for use by biomedical researchers: they require extensive specialist skills in natural language processing and they were built on the assumption that they should optimize global performance metrics on representative datasets. This is a problem because most end-users are not natural language processing specialists and because biomedical researchers often care less about global metrics like F-measure or representative datasets than they do about more granular metrics such as precision and recall on their own specialized datasets. Thus, there are fundamental mismatches between the assumptions of much text mining work and the preferences of potential end-users. RESULTS This article introduces the concept of Agile text mining, and presents the PubAnnotation ecosystem as an example implementation. The system approaches the problems from two perspectives: it allows the reformulation of text mining by biomedical researchers from the task of assembling a complete system to the task of retrieving warehoused annotations, and it makes it possible to do very targeted customization of the pre-existing system to address specific end-user requirements. Two use cases are presented: assisted curation of the GlycoEpitope database, and assessing coverage in the literature of pre-eclampsia-associated genes. AVAILABILITY AND IMPLEMENTATION The three tools that make up the ecosystem, PubAnnotation, PubDictionaries and TextAE are publicly available as web services, and also as open source projects. The dictionaries and the annotation datasets associated with the use cases are all publicly available through PubDictionaries and PubAnnotation, respectively.
Affiliation(s)
- Jin-Dong Kim
  - Database Center for Life Science, Research Organization of Information and Systems, Kashiwa, Chiba, Japan
- Yue Wang
  - Database Center for Life Science, Research Organization of Information and Systems, Kashiwa, Chiba, Japan
- Toyofumi Fujiwara
  - Database Center for Life Science, Research Organization of Information and Systems, Kashiwa, Chiba, Japan
- Shujiro Okuda
  - Graduate School of Medical and Dental Sciences, Niigata University, Niigata, Japan
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA
- K Bretonnel Cohen
  - Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA
  - Université Paris-Saclay, LIMSI-ILES, France
|
32
|
Tripodi IJ, Callahan TJ, Westfall JT, Meitzer NS, Dowell RD, Hunter LE. Applying knowledge-driven mechanistic inference to toxicogenomics. Toxicol In Vitro 2020; 66:104877. [PMID: 32387679 DOI: 10.1016/j.tiv.2020.104877] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/20/2020] [Revised: 04/13/2020] [Accepted: 04/23/2020] [Indexed: 02/07/2023]
Abstract
When considering toxic chemicals in the environment, a mechanistic, causal explanation of toxicity may be preferred over a statistical or machine learning-based prediction by itself. Elucidating a mechanism of toxicity is, however, a costly and time-consuming process that requires the participation of specialists from a variety of fields, often relying on animal models. We present an innovative mechanistic inference framework (MechSpy), which can be used as a hypothesis generation aid to narrow the scope of mechanistic toxicology analysis. MechSpy generates hypotheses of the most likely mechanisms of toxicity, by combining a semantically-interconnected knowledge representation of human biology, toxicology and biochemistry with gene expression time series on human tissue. Using vector representations of biological entities, MechSpy seeks enrichment in a manually curated list of high-level mechanisms of toxicity, represented as biochemically- and causally-linked ontology concepts. Besides predicting the canonical mechanism of toxicity for many well-studied compounds, we experimentally validated some of our predictions for other chemicals without an established mechanism of toxicity. This mechanistic inference framework is an advantageous tool for predictive toxicology, and the first of its kind to produce a mechanistic explanation for each prediction. MechSpy can be modified to include additional mechanisms of toxicity, and is generalizable to other types of mechanisms of human biology.
Affiliation(s)
- Ignacio J Tripodi
  - University of Colorado, Computer Science / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA
- Tiffany J Callahan
  - University of Colorado Anschutz Medical Campus, Computational Bioscience, Denver, CO 80045, USA
- Jessica T Westfall
  - University of Colorado, Molecular, Cellular and Developmental Biology, Boulder, CO 80309, USA
- Robin D Dowell
  - University of Colorado, Molecular, Cellular and Developmental Biology / Interdisciplinary Quantitative Biology, Boulder, CO 80309, USA
- Lawrence E Hunter
  - University of Colorado Anschutz Medical Campus, Computational Bioscience / Interdisciplinary Quantitative Biology, Denver, CO 80045, USA
|
33
|
Zhang XA, Yates A, Vasilevsky N, Gourdine JP, Callahan TJ, Carmody LC, Danis D, Joachimiak MP, Ravanmehr V, Pfaff ER, Champion J, Robasky K, Xu H, Fecho K, Walton NA, Zhu RL, Ramsdill J, Mungall CJ, Köhler S, Haendel MA, McDonald CJ, Vreeman DJ, Peden DB, Bennett TD, Feinstein JA, Martin B, Stefanski AL, Hunter LE, Chute CG, Robinson PN. Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery. NPJ Digit Med 2019; 2:32. [PMID: 31119199 PMCID: PMC6527418 DOI: 10.1038/s41746-019-0110-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 11/26/2018] [Accepted: 04/18/2019] [Indexed: 12/22/2022] Open
Abstract
Electronic Health Record (EHR) systems typically define laboratory test results using the Laboratory Observation Identifier Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resource (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
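The mapping step described in this abstract, from a LOINC-coded result to an HPO term, can be illustrated with a minimal sketch. This is not the authors' software: the `HpoFinding` class, the `ANNOTATIONS` table, the helper names, and the glucose reference range are hypothetical and chosen only for illustration. It assumes each (LOINC code, interpretation) pair has been annotated in advance with one HPO term, and that a normal result is recorded as a negated abnormality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HpoFinding:
    term_id: str           # HPO identifier
    label: str             # human-readable HPO label
    negated: bool = False  # True when the result rules the phenotype out


# Hypothetical annotation table: (LOINC code, interpretation) -> HPO finding.
# LOINC 2345-7 is "Glucose [Mass/volume] in Serum or Plasma".
ANNOTATIONS = {
    ("2345-7", "H"): HpoFinding("HP:0003074", "Hyperglycemia"),
    ("2345-7", "L"): HpoFinding("HP:0001943", "Hypoglycemia"),
    ("2345-7", "N"): HpoFinding(
        "HP:0011015", "Abnormal blood glucose concentration", negated=True
    ),
}


def interpret(value: float, low: float, high: float) -> str:
    """Reduce a numeric result to a low/normal/high interpretation code."""
    if value < low:
        return "L"
    if value > high:
        return "H"
    return "N"


def to_hpo(loinc: str, value: float, low: float, high: float) -> Optional[HpoFinding]:
    """Return the annotated HPO finding, or None if the test is not annotated."""
    return ANNOTATIONS.get((loinc, interpret(value, low, high)))


# Assumed reference range of 70-99 mg/dL, for illustration only.
finding = to_hpo("2345-7", 250.0, 70.0, 99.0)
```

Because distinct tests with comparable interpretations map to the same or related HPO terms, results keyed this way can be grouped through the ontology hierarchy, which is the integration benefit the abstract describes.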
Affiliation(s)
- Amy Yates
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Nicole Vasilevsky
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
- J. P. Gourdine
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
  - Library, Oregon Health and Science University, Portland, OR 97239 USA
- Tiffany J. Callahan
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Leigh C. Carmody
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Daniel Danis
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Marcin P. Joachimiak
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Vida Ravanmehr
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
- Emily R. Pfaff
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- James Champion
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Kimberly Robasky
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
  - Genetics Department, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
  - School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Hao Xu
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Karamarie Fecho
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Nephi A. Walton
  - Genomic Medicine Institute, Geisinger Health System, Danville, PA 17822 USA
- Richard L. Zhu
  - Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Justin Ramsdill
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
- Christopher J. Mungall
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA
- Sebastian Köhler
  - Charité Centrum für Therapieforschung, Charité - Universitätsmedizin Berlin Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, 10117 Germany
  - Einstein Center Digital Future, Berlin, 10117 Germany
- Melissa A. Haendel
  - Oregon Clinical and Translational Research Institute, Oregon Health and Science University, Portland, OR 97239 USA
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR 97239 USA
  - Linus Pauling Institute and Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR 97331 USA
- Clement J. McDonald
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Daniel J. Vreeman
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202 USA
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, IN 46202 USA
- David B. Peden
  - North Carolina Translational and Clinical Sciences Institute (NC TraCS), University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
  - Division of Allergy, Immunology and Rheumatology, Department of Pediatrics, University of North Carolina, Chapel Hill, NC 27599 USA
  - University of North Carolina Center for Environmental Medicine, Asthma and Lung Biology, University of North Carolina, Chapel Hill, NC 27599 USA
- Tellen D. Bennett
  - Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- James A. Feinstein
  - Adult and Child Consortium for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine, Aurora, CO 80045 USA
- Blake Martin
  - Department of Pediatrics, Section of Pediatric Critical Care, University of Colorado School of Medicine, Aurora, CO 80045 USA
- Adrianne L. Stefanski
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Lawrence E. Hunter
  - Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045 USA
- Christopher G. Chute
  - Institute for Clinical and Translational Research, Johns Hopkins University, Baltimore, MD 21202 USA
- Peter N. Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032 USA
  - Institute for Systems Genomics, University of Connecticut, Farmington, CT 06032 USA
|
34
|
Bennett TD, Callahan TJ, Feinstein JA, Ghosh D, Lakhani SA, Spaeder MC, Szefler SJ, Kahn MG. Data Science for Child Health. J Pediatr 2019; 208:12-22. [PMID: 30686480 PMCID: PMC6486872 DOI: 10.1016/j.jpeds.2018.12.041] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/22/2018] [Revised: 12/11/2018] [Accepted: 12/18/2018] [Indexed: 12/12/2022]
Affiliation(s)
- Tellen D Bennett
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; CU Data Science to Patient Value (D2V), University of Colorado School of Medicine, Aurora, CO; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- Tiffany J Callahan
  - Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- James A Feinstein
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO
- Debashis Ghosh
  - CU Data Science to Patient Value (D2V), University of Colorado School of Medicine, Aurora, CO; Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
- Saquib A Lakhani
  - Pediatric Genomics Discovery Program, Department of Pediatrics, Yale University School of Medicine, New Haven, CT
- Michael C Spaeder
  - Pediatric Critical Care, University of Virginia School of Medicine, Charlottesville, VA
- Stanley J Szefler
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO
- Michael G Kahn
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO; Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO
|
35
|
Stefanski AL, Martinez N, Peterson LK, Callahan TJ, Treacy E, Luck M, Friend SF, Hermesch A, Maltepe E, Phang T, Dragone LL, Winn VD. Murine trophoblast-derived and pregnancy-associated exosome-enriched extracellular vesicle microRNAs: Implications for placenta driven effects on maternal physiology. PLoS One 2019; 14:e0210675. [PMID: 30730971 PMCID: PMC6366741 DOI: 10.1371/journal.pone.0210675] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/19/2018] [Accepted: 12/28/2018] [Indexed: 12/23/2022] Open
Abstract
The role of extracellular vesicles (EVs), specifically exosomes, in intercellular communication likely plays a key role in placental orchestration of pregnancy and maternal immune sensing of the fetus. While murine models are powerful tools to study pregnancy and maternal-fetal immune interactions, in contrast to human placental exosomes, the content of murine placental and pregnancy exosomes remains largely understudied. Using a recently developed in vitro culture technique, murine trophoblast stem cells derived from B6 mice were differentiated into syncytial-like cells. EVs from the conditioned media, as well as from pregnant and non-pregnant sera, were enriched for exosomes. The RNA composition of these murine trophoblast-derived and pregnancy-associated exosome-enriched-EVs (ExoE-EVs) was determined using RNA-sequencing analysis and expression levels confirmed by qRT-PCR. Differentially abundant miRNAs were detected in syncytial differentiated ExoE-EVs, particularly from the X chromosome cluster (mmu-miR-322-3p, mmu-miR-322-5p, mmu-miR-503-5p, mmu-miR-542-3p, and mmu-miR-450a-5p). These were confirmed to be increased in pregnant mouse sera ExoE-EVs by qRT-PCR analysis. Interestingly, fifteen miRNAs were only present within the pregnancy-derived ExoE-EVs compared to non-pregnant controls. Mmu-miR-292-3p and mmu-miR-183-5p were noted to be some of the most abundant miRNAs in syncytial ExoE-EVs and were also present at higher levels in pregnant versus non-pregnant sera ExoE-EVs. The bioinformatics tool, MultiMir, was employed to query publicly available databases of predicted miRNA-target interactions. This analysis reveals that the X-chromosome miRNAs are predicted to target ubiquitin-mediated proteolysis and intracellular signaling pathways. Knowing the cargo of placental and pregnancy-specific ExoE-EVs as well as the predicted biological targets informs studies using murine models to examine not only maternal-fetal immune interactions but also the physiologic consequences of placental-maternal communication.
Affiliation(s)
- Adrianne L. Stefanski
  - Department of Obstetrics and Gynecology, University of Colorado School of Medicine, Aurora, CO, United States of America
  - Department of Medicine, Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado School of Medicine, Aurora, CO, United States of America
- Nadine Martinez
  - Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA, United States of America
- Lisa K. Peterson
  - Department of Pediatrics, National Jewish Health, Denver, CO, United States of America
- Tiffany J. Callahan
  - Computational Biosciences Program, University of Colorado School of Medicine, Aurora, CO, United States of America
- Eric Treacy
  - Department of Pediatrics, National Jewish Health, Denver, CO, United States of America
- Marisa Luck
  - Department of Pediatrics, National Jewish Health, Denver, CO, United States of America
- Samantha F. Friend
  - Department of Pediatrics, National Jewish Health, Denver, CO, United States of America
- Amy Hermesch
  - Department of Obstetrics and Gynecology, University of Colorado School of Medicine, Aurora, CO, United States of America
- Emin Maltepe
  - Department of Medicine, University of Colorado School of Medicine, Aurora, CO, United States of America
- Tzu Phang
  - Department of Medicine, University of Colorado School of Medicine, Aurora, CO, United States of America
- Leonard L. Dragone
  - Department of Pediatrics, National Jewish Health, Denver, CO, United States of America
  - Department of Pediatrics, University of Colorado School of Medicine, Aurora, CO, United States of America
- Virginia D. Winn
  - Department of Obstetrics and Gynecology, University of Colorado School of Medicine, Aurora, CO, United States of America
  - Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA, United States of America
|
36
|
Hastings-Tolsma M, Foster SW, Brucker MC, Nodine P, Burpo R, Camune B, Griggs J, Callahan TJ. Nature and scope of certified nurse-midwifery practice: A workforce study. J Clin Nurs 2018; 27:4000-4017. [PMID: 29679403 PMCID: PMC7992184 DOI: 10.1111/jocn.14489] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Accepted: 04/11/2018] [Indexed: 11/28/2022]
Abstract
AIMS AND OBJECTIVES To describe the nature and scope of nurse-midwifery practice in Texas and to determine legislative priorities and practice barriers. BACKGROUND Across the globe, midwives are the largest group of maternity care providers, although little is known about midwifery practice. With a looming shortage of midwives, there is a pressing need to understand midwives' work environment and scope of practice. DESIGN Mixed methods research utilising prospective descriptive survey and interview. METHODS An online survey was administered to nurse-midwives practicing in the state of Texas (N = 449) with a subset (n = 10) telephone interviewed. Descriptive and inferential statistics and content analysis were performed. RESULTS The survey was completed by 141 midwives with eight interviewed. Most were older, Caucasian and held a master's degree. A majority worked full-time, were in clinical practice in larger urban areas and were employed by a hospital or physician group. Care was most commonly provided for Hispanic and White women; approximately a quarter could care for greater numbers of patients. Most did not clinically teach midwifery students. Physician practice agreements were believed unnecessary and prescriptive authority requirements restrictive. Legislative issues were typically followed through the professional organisation or social media sites; most felt a lack of competence to influence health policy decisions. While most were satisfied with current clinical practice, a majority planned a change in the next 3 to 5 years. CONCLUSIONS An ageing midwifery workforce, not representative of the race/ethnicity of the populations served, is underutilised with practice requirements that limit provision of services. Health policy changes are needed to ensure unrestricted practice. RELEVANCE TO CLINICAL PRACTICE Robust midwifery workforce data are needed as well as a midwifery board which tracks availability and accessibility of midwives. Educators should consider training models promoting long-term service in underserved areas, and development of skills crucial for impacting health policy change.
Affiliation(s)
- Mary C. Brucker
  - School of Nursing, Georgetown University, Washington, District of Columbia
- Priscilla Nodine
  - College of Nursing, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado
- Rebecca Burpo
  - School of Nursing, Texas Tech University Health Sciences Center, Lubbock, Texas
- Barbara Camune
  - Louise Herrington School of Nursing, Baylor University, Dallas, Texas
- Tiffany J. Callahan
  - Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado
|
37
|
Toler S, Stapleton S, Kertsburg K, Callahan TJ, Hastings-Tolsma M. Screening for postpartum anxiety: A quality improvement project to promote the screening of women suffering in silence. Midwifery 2018; 62:161-170. [PMID: 29684795 PMCID: PMC8040026 DOI: 10.1016/j.midw.2018.03.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/05/2017] [Revised: 02/28/2018] [Accepted: 03/17/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND Postpartum anxiety is a mental health problem that has largely been ignored by maternity care providers despite an estimated incidence as high as 28.9%. Though postpartum anxiety may or may not be accompanied by depression, and while screening for postpartum depression has become more commonplace, postpartum anxiety is often not assessed or addressed. PURPOSE The purpose of this pilot quality improvement project was to implement a screening, treatment and referral program for postpartum anxiety in the birth centre environment. PROCEDURES Midwives from 10 geographically diverse birth centres, and all members of the American Association of Birth Centres, were recruited to participate in the project. An online video was developed which detailed postpartum anxiety, screening through use of the anxiety subscale of the Edinburgh Postnatal Depression Scale and a toolkit for treatment and/or referral for screen positive patients. Participants entered patient scores into the Perinatal Data Registry of the American Association of Birth Centres. Individual interviews of midwives were conducted following the 10-week pilot period. MAIN FINDINGS There were a total of 387 participants across 9 participating sites. Among all screened participants with follow-up data (n = 382), 9.69% (n = 37) were lost to follow-up. Among all participants screened with the Edinburgh Postpartum Depression Scale-3A and Edinburgh Postpartum Depression Scale (n = 318), 12.58% (n = 40) had a positive Edinburgh Postpartum Depression Scale-3A score of greater than six. Of all screened participants with an Edinburgh Postpartum Depression Scale score, 15 (6.98%) had an Edinburgh Postpartum Depression Scale score of less than 12 and an Edinburgh Postpartum Depression Scale-3A score greater than six, and would not have received follow-up care if only screened for postpartum depression. Midwife participants expressed heightened awareness of the need to screen and felt screening was easy to integrate into clinical practice. CONCLUSIONS The Edinburgh Postpartum Depression Scale-3A is a valid, easy-to-use tool which should be considered for use in clinical practice. Modification of the electronic health record can serve as an important impetus triggering screening and treatment. It is important that clinicians are educated on the prevalence of postpartum anxiety, its risk factors, symptoms and implications.
Affiliation(s)
- Sarah Toler
  - Louise Herrington School of Nursing, Baylor University, Dallas, TX 75211, United States
- Susan Stapleton
  - Research Committee Chair, American Association of Birth Centers, Perkiomenville, PA 180474, United States
- Kim Kertsburg
  - Licensed Clinical Social Worker, Dallas Postpartum Support, Dallas, TX 75231, United States
- Tiffany J Callahan
  - Computational Bioscience, University of Colorado Denver Anschutz Medical Campus, Aurora, CO 80045, United States
- Marie Hastings-Tolsma
  - Louise Herrington School of Nursing, Baylor University, Dallas, TX 75246, United States
|
38
|
Cohen KB, Xia J, Zweigenbaum P, Callahan TJ, Hargraves O, Goss F, Ide N, Névéol A, Grouin C, Hunter LE. Three Dimensions of Reproducibility in Natural Language Processing. LREC Int Conf Lang Resour Eval 2018; 2018:156-165. [PMID: 29911205 PMCID: PMC5998676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Despite considerable recent attention to problems with reproducibility of scientific research, there is a striking lack of agreement about the definition of the term. That is a problem, because the lack of a consensus definition makes it difficult to compare studies of reproducibility, and thus to have even a broad overview of the state of the issue in natural language processing. This paper proposes an ontology of reproducibility in that field. Its goal is to enhance both future research and communication about the topic, and retrospective meta-analyses. We show that three dimensions of reproducibility, corresponding to three kinds of claims in natural language processing papers, can account for a variety of types of research reports. These dimensions are reproducibility of a conclusion, of a finding, and of a value. Three biomedical natural language processing papers by the authors of this paper are analyzed with respect to these dimensions.
Affiliation(s)
- K Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine
- LIMSI, CNRS, Université Paris-Saclay
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado School of Medicine
- Foster Goss
- Department of Emergency Medicine, University of Colorado
- Lawrence E Hunter
- Computational Bioscience Program, University of Colorado School of Medicine
|
39
|
McNeely HL, Ream TL, Thrasher JM, Dziadkowiec O, Callahan TJ. Utilization of a biomedical device (VeinViewer®) to assist with peripheral intravenous catheter (PIV) insertion for pediatric nurses. J Spec Pediatr Nurs 2018; 23:e12208. [PMID: 29427533 PMCID: PMC8056604 DOI: 10.1111/jspn.12208] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/11/2017] [Revised: 11/09/2017] [Accepted: 12/14/2017] [Indexed: 11/29/2022]
Abstract
PURPOSE Vascular access in pediatric patients can be challenging even with the currently available technological resources. This nurse-driven research study explored time, cost, and resources for intravenous access to determine if a biomedical device, VeinViewer® Vision, would facilitate improvements in pediatric access. In addition, this study looked at nurse perceptions of skills and confidence around intravenous insertion and if the use of the VeinViewer® impacted these perceptions. Literature examining pediatric intravenous access success rates compared with nurse perceived skills and confidence is lacking. DESIGN Nonblinded randomized control trial of pediatric nurses working in an acute care hospital setting. METHODS A preliminary needs assessment solicited feedback from nurses regarding their practice, perceived skills, and confidence with placing peripheral intravenous catheters (PIVs). Due to the results of the preliminary needs assessment, a research study was designed and 40 nurses were recruited to participate. The nurses were randomized into either a VeinViewer® or standard practice group. Nurse participants placed intravenous catheters on hospitalized pediatric patients using established procedures while tracking data for the study. RESULTS Needs assessment showed a majority of nurses felt a biomedical device would be helpful in building their intravenous insertion skills and their confidence. The study results did not demonstrate any clinically significant differences between VeinViewer® use and standard practice for intravenous catheter insertion in pediatric patients for success of placement, number of attempts, or overall cost. In addition, no difference was noted between nurses in either group on perceived skills or confidence with insertion of PIVs. 
PRACTICE IMPLICATIONS The ongoing need for resources focused on building nurse skills and confidence for PIV insertion was highlighted and organizations should continue to direct efforts toward developing skills and competency for staff that are responsible for pediatric vascular access. This study illustrates the importance of data-driven decision-making for expensive hospital-funded equipment purchases. This nursing led research study highlights how perceptions do not always align with outcomes. The lessons gleaned from this study may aid in decision-making around pediatric intravenous access practice.
Affiliation(s)
- Heidi L McNeely
- Clinical Nurse Specialist, Children's Hospital Colorado, Aurora, Colorado, USA
- Theresa L Ream
- Charge Nurse Liaison, Children's Hospital Colorado, Aurora, Colorado, USA
- Jodi M Thrasher
- Clinical Practice Specialist, Children's Hospital Colorado, Aurora, Colorado, USA
- Oliwier Dziadkowiec
- Center for Research & Nursing Scholarship, University of Colorado Denver College of Nursing, Aurora, Colorado, USA
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado, USA
|
40
|
Callahan TJ, Baumgartner WA, Bada M, Stefanski AL, Tripodi I, White EK, Hunter LE. OWL-NETS: Transforming OWL Representations for Improved Network Inference. Pac Symp Biocomput 2018; 23:133-144. [PMID: 29218876 PMCID: PMC5737627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Our knowledge of the biological mechanisms underlying complex human disease is largely incomplete. While Semantic Web technologies, such as the Web Ontology Language (OWL), provide powerful techniques for representing existing knowledge, well-established OWL reasoners are unable to account for missing or uncertain knowledge. Inductive inference methods, like machine learning and network inference, are vital for extending our current knowledge. Therefore, robust methods which facilitate inductive inference on rich OWL-encoded knowledge are needed. Here, we propose OWL-NETS (NEtwork Transformation for Statistical learning), a novel computational method that reversibly abstracts OWL-encoded biomedical knowledge into a network representation tailored for network inference. Using several examples built with the Open Biomedical Ontologies, we show that OWL-NETS can leverage existing ontology-based knowledge representations and network inference methods to generate novel, biologically relevant hypotheses. Further, the lossless transformation of OWL-NETS allows for seamless integration of inferred edges back into the original knowledge base, extending its coverage and completeness.
Affiliation(s)
- Tiffany J Callahan
- Computational Bioscience Program, University of Colorado Denver Anschutz Medical Campus, Aurora, CO 80045, USA
|
41
|
Callahan TJ, Bauck AE, Bertoch D, Brown J, Khare R, Ryan PB, Staab J, Zozus MN, Kahn MG. A Comparison Of Data Quality Assessment Checks In Six Data Sharing Networks. eGEMs (Generating Evidence & Methods to improve patient outcomes) 2017. [DOI: 10.13063/2327-9214.1287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
|
42
|
Yadav P, Jezek E, Bouillon P, Callahan TJ, Bada M, Hunter LE, Cohen KB. Semantic Relations in Compound Nouns: Perspectives from Inter-Annotator Agreement. Stud Health Technol Inform 2017; 245:644-648. [PMID: 29295175 PMCID: PMC7781293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Semantic relations have been studied for decades without yet reaching consensus on the set of these relations. However, biomedical language processing and ontologies rely on these relations, so it is important to be able to evaluate their suitability. In this paper we examine the role of inter-annotator agreement in choosing between competing proposals regarding the set of such relations. The experiments consisted of labeling the semantic relations between two elements of noun-noun compounds (e.g. cell migration). Two judges annotated a dataset of terms from the biomedical domain using two competing sets of relations and analyzed the inter-annotator agreement. With no training and little documentation, agreement on this task was fairly high and disagreements were consistent. The results support the utility of the relation-based approach to semantic representation.
Affiliation(s)
- Prabha Yadav
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Pierrette Bouillon
- Faculté de Traduction et d’Interprétation, Université de Genève, Switzerland
- Tiffany J. Callahan
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Michael Bada
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- Lawrence E. Hunter
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
- K. Bretonnel Cohen
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado 80045, USA
|
43
|
Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw ST, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. eGEMs (Generating Evidence & Methods to improve patient outcomes) 2016; 4:1244. [PMID: 27713905 PMCID: PMC5051581 DOI: 10.13063/2327-9214.1244] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0] [Indexed: 11/24/2022]
Abstract
Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized to a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data is ‘fit’ for specific uses. Materials and Methods: DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions that were grouped into an overall conceptual framework. Feedback received from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The harmonized terminology and logical framework’s inclusiveness was evaluated against ten published DQ terminologies. Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories: (1) Conformance (2) Completeness and (3) Plausibility and two DQ assessment contexts: (1) Verification and (2) Validation. Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified with organizational data, or validated against an accepted gold standard, depending on proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning to multiple published DQ terminologies. 
Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data such as administrative, research, and patient-reported data. Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.
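The top level of the framework described in this abstract, three categories crossed with two assessment contexts, can be written down as a small data structure. The sketch below is illustrative only: the class and function names (`DQCategory`, `DQContext`, `dq_grid`) are hypothetical, and the subcategories of Conformance and Plausibility are left out because the abstract does not name them.

```python
from enum import Enum
from itertools import product


class DQCategory(Enum):
    """The three data quality categories listed in the abstract."""
    CONFORMANCE = "Conformance"
    COMPLETENESS = "Completeness"
    PLAUSIBILITY = "Plausibility"


class DQContext(Enum):
    """The two assessment contexts listed in the abstract."""
    VERIFICATION = "Verification"  # data verified with organizational data
    VALIDATION = "Validation"      # data validated against an accepted gold standard


def dq_grid():
    """Return every (category, context) pair, in declaration order."""
    return list(product(DQCategory, DQContext))


if __name__ == "__main__":
    for category, context in dq_grid():
        print(f"{category.value} / {context.value}")
```

Enumerating the pairs this way yields six cells, each of which could hold the data quality checks that a given network defines for that category and context.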
Affiliation(s)
- Hossein Estiri
- University of Washington, Institute of Translational Health Sciences
|
44
|
Olson J, Aldrich H, Callahan TJ, Matthews EE, Gance-Cleveland B. Characterization of Childhood Obesity and Behavioral Factors. J Pediatr Health Care 2016; 30:444-52. [PMID: 26614274 PMCID: PMC7783778 DOI: 10.1016/j.pedhc.2015.10.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/09/2015] [Accepted: 10/23/2015] [Indexed: 10/22/2022]
Abstract
INTRODUCTION Childhood obesity is a major public health threat in the United States. Recent data indicate that 34.2% of children ages 6 to 11 years are overweight or obese. The purpose of this study is to describe childhood obesity levels and identify risk behaviors in two school-based health centers in Michigan, one urban and one rural. METHODS This study is a secondary data analysis from a multicenter comparative effectiveness trial. Multiple logistic regression was used to examine behavioral factors associated with overweight/obesity in children. RESULTS In this sample (n = 105), 41.9% were obese and 16.2% were overweight. The duration of sleep per night (p = .04) and the frequency of eating breakfast (p = .04) were significant predictors of being overweight/obese. DISCUSSION Health care providers in school-based health centers must be comfortable assessing, preventing, and treating childhood obesity in this high-risk group of patients. Interventions should encourage children to eat breakfast daily and to get adequate sleep.
|
45
|
Thayer RE, Montanaro E, Weiland BJ, Callahan TJ, Bryan AD. Exploring the relationship of functional network connectivity to latent trajectories of alcohol use and risky sex. Curr HIV Res 2015; 12:293-300. [PMID: 25053362 DOI: 10.2174/1570162x12666140721124441] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/01/2013] [Revised: 04/06/2014] [Accepted: 04/06/2014] [Indexed: 11/22/2022]
Abstract
Alcohol use is a major risk factor associated with unprotected sexual behavior, leading to higher risk of sexually transmitted infections (STI) including the human immunodeficiency virus (HIV). Emerging largely cross-sectional data suggest functional network connectivity strength is associated with problematic alcohol use, and as evidence supports a relationship between risky sexual behaviors and alcohol use, we hypothesized that functional connectivity might be associated with both categories of risk behavior. As part of a sexual risk reduction intervention study, juvenile justice-involved adolescents (N = 239) underwent a baseline functional magnetic resonance imaging scan and completed questionnaires about their alcohol use and risky sexual behavior at 3-month intervals over 12 months of follow up. To test both cross-sectional and longitudinal relationships between alcohol use and sexual risk behaviors, we estimated a parallel process latent growth model that simultaneously modeled the trajectories of alcohol use and sexual risk behavior. Functional connectivity strength was included as an exogenous variable to evaluate its relationship with level of risk and change in risk over time in both behaviors. Associations were found between baseline alcohol use and risky sex, and between longitudinal trajectories of alcohol use and risky sex. Network functional connectivity strength of the dorsal default mode network was associated with initial and longitudinal alcohol use, which may suggest that self-awareness of the effects of alcohol could serve as a useful target to decrease subsequent risky sexual behavior in adolescence.
Affiliation(s)
- Angela D Bryan
- Department of Psychology & Neuroscience, University of Colorado Boulder, Muenzinger D244, 345 UCB, Boulder, CO 80309-0345, USA.
|
46
|
Thayer RE, Callahan TJ, Weiland BJ, Hutchison KE, Bryan AD. Associations between fractional anisotropy and problematic alcohol use in juvenile justice-involved adolescents. Am J Drug Alcohol Abuse 2014; 39:365-71. [PMID: 24200206 DOI: 10.3109/00952990.2013.834909] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
BACKGROUND Studies have shown associations between heavy alcohol use and white matter alterations in adolescence. Youth involved with the juvenile justice system engage in high levels of risk behavior generally and alcohol use in particular as compared to their non-justice-involved peers. OBJECTIVES This study explored white matter integrity among justice-involved adolescents. Analyses examined fractional anisotropy (FA) and mean diffusivity (MD) between adolescents with low and high levels of problematic alcohol use as assessed by the Alcohol Use Disorders Identification Test (AUDIT). METHODS Participants (N = 125; 80% male; 14-18 years) completed measures assessing psychological status and substance use followed by diffusion tensor imaging (DTI). DTI data for low (n = 51) and high AUDIT (n = 74) adolescents were subjected to cluster-based group comparisons on skeletonized FA and MD data. RESULTS Whole-brain analyses revealed significantly lower FA in clusters in the right and left posterior corona radiata (PCR) and right superior longitudinal fasciculus (SLF) in the high AUDIT group, as well as one cluster in the right anterior corona radiata that showed higher FA in the high AUDIT group. No differences in MD were identified. Exploratory analyses correlated cluster FA with measures of additional risk factors. FA in the right SLF and left PCR was negatively associated with impulsivity. CONCLUSION Justice-involved adolescents with alcohol use problems generally showed poorer FA than their low problematic alcohol use peers. Future research should aim to better understand the nature of the relationship between white matter development and alcohol use specifically as well as risk behavior more generally.
Affiliation(s)
- Rachel E Thayer
- Department of Psychology and Neuroscience, University of Colorado Boulder, Boulder, CO, USA
|
47
|
Abstract
OBJECTIVE The purpose of this study was to economically evaluate Project MARS (Motivating Adolescents to Reduce Sexual Risk; T. J. Callahan, E. A. Montanaro, R. E. Magnan, & A. D. Bryan, 2013, "Project MARS: Design of a multi-behavior intervention trial for justice-involved youth," Translational Behavioral Medicine, Vol. 3, pp. 122-130), an ongoing, randomized, sexual-risk-reduction intervention for justice-involved youth. We consider the effect of including viral STIs in the economic analysis, and explore the impact of the MARS intervention on the perceived cost of acquiring STIs to justice-involved youth. METHOD 206 participants, ages 14 to 18, participated in a sexual-risk-reduction intervention that included screening and treatment for chlamydia and gonorrhea. A Bernoulli probability model was used to estimate averted STIs attributable to the MARS intervention. The economic benefit of averted STIs was monetized using the direct medical cost of treatment. In addition, we used a contingent valuation (willingness-to-pay) model to investigate the impact of Project MARS on participants' perceived cost of acquiring an STI. RESULTS Using the standard outcome domains typically used to evaluate STI interventions, Project MARS resulted in a reduction of $2.08 in direct medical costs for every $1 spent. When viral STIs were added to the economic model, a considerable increase in averted direct medical costs ($2.68 for every $1 spent) was found. Preliminary contingent valuation estimates suggest that participants' willingness-to-pay for averted STIs significantly increased after receiving the MARS intervention. CONCLUSION From an economic perspective, Project MARS is a worthwhile program to adopt. Future attention should be given to the impact of behavioral interventions on viral infections.
Affiliation(s)
- Bern C Dealy
- Department of Economics, University of New Mexico, Albuquerque, NM 87131-0001, USA
|
48
|
Callahan TJ, Caldwell Hooper AE, Thayer RE, Magnan RE, Bryan AD. Relationships between marijuana dependence and condom use intentions and behavior among justice-involved adolescents. AIDS Behav 2013; 17:2715-24. [PMID: 23370834 PMCID: PMC3676463 DOI: 10.1007/s10461-013-0417-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/25/2023]
Abstract
The current study examined the relationships among marijuana dependence, a theoretical model of condom use intentions, and subsequent condom use behavior in justice-involved adolescents. Participants completed baseline measures of prior sexual and substance use behavior. Of the original 720 participants, 649 (90.13%) completed follow-up measures 6 months later. There were high levels of marijuana use (58.7% met criteria for dependence) and risky sexual behavior among participants. Baseline model constructs were associated with condom use intentions, and intentions were a significant predictor of condom use at follow-up. Marijuana dependence did not significantly influence the relationships between model constructs, nor did it moderate the relationship of model constructs with subsequent condom use. Findings suggest that the theoretical model of condom use intentions is equally valid regardless of marijuana dependence status, suggesting that interventions to reduce sexual risk behavior among both marijuana dependent and non-dependent justice-involved adolescents can be appropriately based on the model.
Affiliation(s)
- Tiffany J Callahan
- Department of Psychology and Neuroscience, University of Colorado Boulder, Campus Box 345, Muenzinger Psychology Building, RM. D244, Boulder, CO, 80309, USA
|
49
|
Magnan RE, Callahan TJ, Ladd BO, Claus ED, Hutchison KE, Bryan AD. Evaluating an Integrative Theoretical Framework for HIV Sexual Risk among Juvenile Justice involved Adolescents. J AIDS Clin Res 2013; 4:217. [PMID: 25126447 PMCID: PMC4128495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Juvenile justice involved youth are at great risk for negative outcomes of risky sexual behavior including HIV/AIDS. Given the strong connection between alcohol use and risky sex in this population, it is important to consider alcohol use in interventions designed to decrease risky sexual behavior. This paper provides support for an integrative translational model that incorporates psychosocial, neurobiological, and genetic factors to better predict alcohol-related sexual risk behavior. Specifically, we present the design, methods, and baseline data from a complex randomized control trial, Project SHARP (Sexual Health and Adolescent Risk Prevention) in order to illustrate how this broad array of factors can best predict alcohol-related sexual risk behavior. Participants were justice-involved adolescents (n=284) who completed an fMRI and self-report assessments prior to randomization to either a sexual risk plus alcohol risk reduction group intervention or to an information-only contact control group intervention. Structural equation modeling was utilized and findings supported the hypothesized relationships in the translational model. Preliminary data suggest that interventions among justice-involved adolescents targeting alcohol-related sexual risk behavior may be more effective if a biopsychosocial approach is considered.
Affiliation(s)
- Renee E Magnan
- Department of Psychology, Washington State University Vancouver, USA. Corresponding author: Renee Magnan, Department of Psychology, WSU Vancouver, 14204 NE Salmon Creek Ave, Vancouver, WA 98686, USA; Tel: 360-546-9403; Fax: 360-546-9038
- Angela D Bryan
- University of Colorado Boulder, USA; Center on Alcoholism, Substance Abuse and Addictions, USA
|
50
|
Abstract
BACKGROUND Marijuana and alcohol use are associated with increased sexual risk behavior among justice-involved youth. A multi-behavior intervention may reduce all three risk behaviors. PURPOSE To examine the relationships among multiple risk behaviors and the Theory of Planned Behavior (TPB) constructs guiding the development of the MARS (Motivating Adolescents to Reduce Sexual risk) intervention. We describe the MARS study design to inform the process through which a multi-behavior intervention trial can be implemented and evaluated. METHODS Participants completed questionnaires prior to randomization to one of three interventions. RESULTS Relationships were found between TPB constructs and risk behavior. A single latent variable was inadequate to capture all three risk behaviors. CONCLUSIONS Interventions to reduce sexual risk behavior can include content related to the role of substance use in influencing sexual risk behavior with only minimal modifications to the curriculum, and preliminary data suggest a common theory can apply across risk behaviors.
|