1
|
Lee K, Mai Y, Liu Z, Raja K, Jun T, Ma M, Wang T, Ai L, Calay E, Oh W, Schadt E, Wang X. CriteriaMapper: establishing the automatic identification of clinical trial cohorts from electronic health records by matching normalized eligibility criteria and patient clinical characteristics. Sci Rep 2024; 14:25387. [PMID: 39455879 PMCID: PMC11511882 DOI: 10.1038/s41598-024-77447-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2024] [Accepted: 10/22/2024] [Indexed: 10/28/2024] Open
Abstract
The use of electronic health records (EHRs) holds the potential to enhance clinical trial activities. However, the identification of eligible patients within EHRs presents considerable challenges. We aimed to develop a CriteriaMapper system for phenotyping eligibility criteria, enabling the identification of patients from EHRs with clinical characteristics that match those criteria. We utilized clinical trial eligibility criteria and patient EHRs from the Mount Sinai Database. The CriteriaMapper system was developed to normalize the criteria using national standard terminologies and in-house databases, facilitating computability and queryability to bridge clinical trial criteria and EHRs. The system employed rule-based pattern recognition and manual annotation. Our system normalized 367 out of 640 unique eligibility criteria attributes, covering various medical conditions including non-small cell lung cancer, small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, Crohn's disease, non-alcoholic steatohepatitis, and sickle cell anemia. About 174 criteria were encoded with standard terminologies and 193 were normalized using the in-house reference tables. The agreement between automated and manual normalization was high (Cohen's Kappa = 0.82), and patient matching demonstrated a 0.94 F1 score. Our system has proven effective on EHRs from multiple institutions, showing broad applicability and promising improved clinical trial processes, leading to better patient selection, and enhanced clinical research outcomes.
Collapse
Affiliation(s)
- K Lee
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA.
| | - Y Mai
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - Z Liu
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - K Raja
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - T Jun
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - M Ma
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - T Wang
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - L Ai
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - E Calay
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
| | - W Oh
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
- Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY, 10029, USA
| | - E Schadt
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA
- Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY, 10029, USA
| | - X Wang
- GeneDx (Sema4), 333 Ludlow Street, Stamford, CT, 06902, USA.
| |
Collapse
|
2
|
Slater L, Gkoutos GV, Schofield PN, Hoehndorf R. Using AberOWL for fast and scalable reasoning over BioPortal ontologies. J Biomed Semantics 2016; 7:49. [PMID: 27502585 PMCID: PMC4976511 DOI: 10.1186/s13326-016-0090-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2016] [Accepted: 07/08/2016] [Indexed: 11/30/2022] Open
Abstract
Background Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. Methods We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. Results and conclusions We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.
Collapse
Affiliation(s)
- Luke Slater
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom.
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom.,Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, United Kingdom
| | - Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, England, CB2 3EG, United Kingdom
| | - Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia
| |
Collapse
|
3
|
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res 2015; 44:D1214-9. [PMID: 26467479 PMCID: PMC4702775 DOI: 10.1093/nar/gkv1031] [Citation(s) in RCA: 622] [Impact Index Per Article: 62.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2015] [Accepted: 09/28/2015] [Indexed: 12/31/2022] Open
Abstract
ChEBI is a database and ontology containing information about chemical entities of biological interest. It currently includes over 46,000 entries, each of which is classified within the ontology and assigned multiple annotations including (where relevant) a chemical structure, database cross-references, synonyms and literature citations. All content is freely available and can be accessed online at http://www.ebi.ac.uk/chebi. In this update paper, we describe recent improvements and additions to the ChEBI offering. We have substantially extended our collection of endogenous metabolites for several organisms including human, mouse, Escherichia coli and yeast. Our front-end has also been reworked and updated, improving the user experience, removing our dependency on Java applets in favour of embedded JavaScript components and moving from a monthly release update to a 'live' website. Programmatic access has been improved by the introduction of a library, libChEBI, in Java, Python and Matlab. Furthermore, we have added two new tools, namely an analysis tool, BiNChE, and a query tool for the ontology, OntoQuery.
Collapse
Affiliation(s)
- Janna Hastings
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Gareth Owen
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Adriano Dekker
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Marcus Ennis
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Namrata Kale
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Venkatesh Muthukrishnan
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Steve Turner
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Neil Swainston
- Manchester Centre for Integrative Systems Biology, University of Manchester, UK
| | - Pedro Mendes
- Manchester Centre for Integrative Systems Biology, University of Manchester, UK
| | - Christoph Steinbeck
- Cheminformatics and Metabolism, European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| |
Collapse
|
4
|
Hoehndorf R, Slater L, Schofield PN, Gkoutos GV. Aber-OWL: a framework for ontology-based data access in biology. BMC Bioinformatics 2015; 16:26. [PMID: 25627673 PMCID: PMC4384359 DOI: 10.1186/s12859-015-0456-9] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2014] [Accepted: 01/09/2015] [Indexed: 11/10/2022] Open
Abstract
Background Many ontologies have been developed in biology and these ontologies increasingly contain large volumes of formalized knowledge commonly expressed in the Web Ontology Language (OWL). Computational access to the knowledge contained within these ontologies relies on the use of automated reasoning. Results We have developed the Aber-OWL infrastructure that provides reasoning services for bio-ontologies. Aber-OWL consists of an ontology repository, a set of web services and web interfaces that enable ontology-based semantic access to biological data and literature. Aber-OWL is freely available at http://aber-owl.net. Conclusions Aber-OWL provides a framework for automatically accessing information that is annotated with ontologies or contains terms used to label classes in ontologies. When using Aber-OWL, access to ontologies and data annotated with them is not merely based on class names or identifiers but rather on the knowledge the ontologies contain and the inferences that can be drawn from it.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia. .,Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia.
| | - Luke Slater
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia. .,Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955-6900, Saudi Arabia. .,Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB, UK.
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK.
| | - Georgios V Gkoutos
- Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB, UK.
| |
Collapse
|
5
|
Magka D, Krötzsch M, Horrocks I. A rule-based ontological framework for the classification of molecules. J Biomed Semantics 2014; 5:17. [PMID: 24735706 PMCID: PMC4040517 DOI: 10.1186/2041-1480-5-17] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2013] [Accepted: 01/15/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A variety of key activities within life sciences research involves integrating and intelligently managing large amounts of biochemical information. Semantic technologies provide an intuitive way to organise and sift through these rapidly growing datasets via the design and maintenance of ontology-supported knowledge bases. To this end, OWL-a W3C standard declarative language- has been extensively used in the deployment of biochemical ontologies that can be conveniently organised using the classification facilities of OWL-based tools. One of the most established ontologies for the chemical domain is ChEBI, an open-access dictionary of molecular entities that supplies high quality annotation and taxonomical information for biologically relevant compounds. However, ChEBI is being manually expanded which hinders its potential to grow due to the limited availability of human resources. RESULTS In this work, we describe a prototype that performs automatic classification of chemical compounds. The software we present implements a sound and complete reasoning procedure of a formalism that extends datalog and builds upon an off-the-shelf deductive database system. We capture a wide range of chemical classes that are not expressible with OWL-based formalisms such as cyclic molecules, saturated molecules and alkanes. Furthermore, we describe a surface 'less-logician-like' syntax that allows application experts to create ontological descriptions of complex biochemical objects without prior knowledge of logic. In terms of performance, a noticeable improvement is observed in comparison with previous approaches. Our evaluation has discovered subsumptions that are missing from the manually curated ChEBI ontology as well as discrepancies with respect to existing subclass relations. We illustrate thus the potential of an ontology language suitable for the life sciences domain that exhibits a favourable balance between expressive power and practical feasibility. CONCLUSIONS Our proposed methodology can form the basis of an ontology-mediated application to assist biocurators in the production of complete and error-free taxonomies. Moreover, such a tool could contribute to a more rapid development of the ChEBI ontology and to the efforts of the ChEBI team to make annotated chemical datasets available to the public. From a modelling point of view, our approach could stimulate the adoption of a different and expressive reasoning paradigm based on rules for which state-of-the-art and highly optimised reasoners are available; it could thus pave the way for the representation of a broader spectrum of life sciences and biomedical knowledge.
Collapse
Affiliation(s)
- Despoina Magka
- Department of Computer Science, University of Oxford, Oxford, UK
| | - Markus Krötzsch
- Department of Computer Science, Technical University of Dresden, Dresden, Germany
| | - Ian Horrocks
- Department of Computer Science, University of Oxford, Oxford, UK
| |
Collapse
|
6
|
Hoehndorf R, Haendel M, Stevens R, Rebholz-Schuhmann D. Thematic series on biomedical ontologies in JBMS: challenges and new directions. J Biomed Semantics 2014; 5:15. [PMID: 24602198 PMCID: PMC4006457 DOI: 10.1186/2041-1480-5-15] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2014] [Accepted: 02/09/2014] [Indexed: 01/08/2023] Open
Abstract
Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are becoming a central part of biological and biomedical research, a communication channel to publish frequent updates and latest developments on them would be an advantage. Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the latest developments in research on biomedical ontologies and provide a venue for publishing newly developed ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research in biomedical ontologies and intend to work closely together with the research community towards this aim. Researchers and working groups are encouraged to provide feedback on novel developments and special topics to be integrated into the existing publication cycles.
Collapse
Affiliation(s)
- Robert Hoehndorf
- Department of Computer Science, Aberystwyth University, Llandinam Building, SY23 3DB Aberystwyth, UK
| | - Melissa Haendel
- OHSU Library and Department of Medical Informatics, Portland, Oregon, USA
- Department of Medical Informatics and Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Robert Stevens
- School of Computer Science, The University of Manchester, Oxford Road, M13 9PL Manchester, UK
| | - Dietrich Rebholz-Schuhmann
- Department of Computational Linguistics, University of Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|