1
Purpura A, Bettencourt-Silva J, Mulligan N, Yadete T, Njoku K, Liu J, Stappenbeck T. Automatic Mapping of Terminology Items with Transformers. AMIA Annu Symp Proc 2024; 2023:599-607. [PMID: 38222370 PMCID: PMC10785948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Biomedical ontologies are a key component in many systems for the analysis of textual clinical data. They are employed to organize information about a given domain, relying on a hierarchy of different classes. Each class maps a concept to items in a terminology developed by domain experts. These mappings are then leveraged to organize the information extracted by Natural Language Processing (NLP) models to build knowledge graphs for inferences. The creation of these associations, however, requires extensive manual review. In this paper, we present an automated approach and repeatable framework to learn a mapping between ontology classes and terminology terms derived from vocabularies in the Unified Medical Language System (UMLS) Metathesaurus. According to our evaluation, the proposed system achieves performance close to that of humans and provides a substantial improvement over existing systems developed by the National Library of Medicine to assist researchers through this process.
Affiliation(s)
- Tesfaye Yadete
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Kingsley Njoku
  - Department of Internal Medicine, Morehouse School of Medicine, Atlanta, GA, USA
- Julia Liu
  - Department of Internal Medicine, Morehouse School of Medicine, Atlanta, GA, USA
- Thaddeus Stappenbeck
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
2
Kartoun U, Njoku K, Yadete T, Ravid S, Koski E, Ogallo W, Bettencourt-Silva J, Mulligan N, Hu J, Liu J, Stappenbeck T, Anand V. Subtyping Gastrointestinal Surgical Outcomes from Real World Data: A Comprehensive Analysis of UK Biobank. AMIA Annu Symp Proc 2024; 2023:426-435. [PMID: 38222374 PMCID: PMC10785930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Chronic gastrointestinal (GI) conditions, such as inflammatory bowel diseases (IBD), offer a promising opportunity to create classification systems that can enhance the accuracy of predicting the most effective therapies and prognosis for each patient. Here, we present a novel methodology to explore disease subtypes using our open-sourced BiomedSciAI toolkit. Applying methods available in this toolkit on the UK Biobank, including subpopulation-based feature selection and multi-dimensional subset scanning, we aimed to discover unique subgroups from GI surgery cohorts. Of a 12,073-patient cohort, a subgroup of 440 IBD patients was discovered with an increased risk of a subsequent GI surgery (OR: 2.21, 95% CI [1.81-2.69]). We iteratively demonstrate the discovery process using an additional cohort (with a narrower definition of GI surgery). Our results show that the iterative process can refine the subgroup discovery process and generate novel hypotheses to investigate determinants of treatment response.
Affiliation(s)
- Kingsley Njoku
  - Department of Internal Medicine, Morehouse School of Medicine, Atlanta, GA, USA
- Tesfaye Yadete
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Sivan Ravid
  - Healthcare Informatics, IBM Research-Haifa, Mount Carmel Haifa, Israel
- Julia Liu
  - Department of Internal Medicine, Morehouse School of Medicine, Atlanta, GA, USA
- Thaddeus Stappenbeck
  - Department of Inflammation and Immunity, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
3
Upegui H, Bettencourt-Silva J. Identifying social aspects in real world data to support health outcomes. Saf Health Work 2022. [DOI: 10.1016/j.shaw.2021.12.1485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
4
Sbodio ML, Mulligan N, Speichert S, Lopez V, Bettencourt-Silva J. Encoding Health Records into Pathway Representations for Deep Learning. Stud Health Technol Inform 2021; 287:8-12. [PMID: 34795069 DOI: 10.3233/shti210800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient's data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network (CNN). Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the size of the training dataset on autoencoder performance. The source code for generating pathways from health records is provided as open source.
5
Bettencourt-Silva J, Cullen C, Mulligan N, Di Bari A, Gleize M. Combining digital trends and scientific literature: a case study on Social Determinants of COVID-19. Eur J Public Health 2021. [PMCID: PMC8574790 DOI: 10.1093/eurpub/ckab164.280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Background: Digital sources such as Internet-based search tools have created prospects for augmenting traditional public health surveillance. These can support the identification of emerging public health concerns, intervention evaluation or policy making. Social Determinants of Health (SDoH) are key factors linked to health outcomes, yet they are poorly recorded and seldom used. Scientific papers have reported SDoH impacts on various outcomes, yet this data has not been explored for population trend analyses. We present an analysis of our approach to combining insights mined from PubMed with online search trends.
Methods: The PubMed-2019 database was used to build a knowledge graph (KG) of connected health and social concepts based on relative co-occurrences found throughout the abstracts. We then observed Google search trends for 10 SDoH concepts at the outset of the 2020 pandemic (March-May) and compared them with the previous 4 years. For concepts with increasing trends, the KG was used to identify other potentially relevant concepts. Subsequently, we continued observing online trends for a further 12 months in order to examine the KG concepts' trends.
Results: Our analysis showed that Food Security and Unemployment trended the highest compared to previous years. These became the seed concepts used to traverse the KG, where the top 20 relevant concepts were identified for each seed. An analysis of the 8 concepts overlapping both seeds showed 6 with increasing trends during follow-up. Correlation coefficients were computed and positive relations observed (Unemployment+Anxiety, r=.72; Distress+Food Security, r=.62). Data will be discussed.
Conclusions: Positive correlations were observed between data from a 2019 PubMed KG and online search trends during COVID-19. These preliminary results suggest value in combining these digital sources to strengthen public health systems. This is important to understand the interactions between health and social factors and identify emerging trends.
Key messages: Combining data from scientific research papers with online search trends yielded interesting results that may further complement traditional surveillance systems and clinical case ascertainment. Social determinants of health data should play an increasingly important part in complementing public health surveillance systems and in strengthening population trend analyses.
Affiliation(s)
- C Cullen
  - IBM Watson Health, Dublin, Ireland
- M Gleize
  - IBM Research Europe, Dublin, Ireland
6
Bettencourt-Silva J, Mulligan N, Cullen C, Kotoulas S. Bridging Clinical and Social Determinants of Health Using Unstructured Data. Stud Health Technol Inform 2018; 255:70-74. [PMID: 30306909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
There is a growing interest in identifying, weighing and accounting for the impact of health determinants that lie outside of the traditional healthcare system, yet there is a remarkable paucity of data and sources to sustain these efforts. Decision support systems would greatly benefit from leveraging models which are able to extend and use such cross-domain knowledge. This paper describes an approach to identify and explore related social and clinical terms based on large corpora of unstructured data. Using word embedding techniques on relevant sources of knowledge, we have identified terms that appear close together in the high-dimensional space. In particular, having created a model with cross-domain knowledge on the social determinants of health, we have been able to demonstrate that it is possible to surface terms in this domain when querying for related clinical terms, thereby creating a bridge between the social and clinical determinants of health. This is a promising approach with significant applicability in decision support efforts in healthcare.
7
Lopez V, Mccarthy G, Bettencourt-Silva J, Sbodio M, Mulligan N, Cucci F, Deparis S, Hennessy C, Yadav N, Kelly K, Olsen R, Cullen C, Kotoulas S. Using Semantic Technologies to Extract Highlights from Care Notes. Stud Health Technol Inform 2017; 245:1331. [PMID: 29295412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
We propose a cognitive system for patient-centric care that leverages and combines natural language processing, semantics, and learning from users over time to support care professionals working with large volumes of patient notes. The proposed methods highlight the entities embedded in the unstructured data to provide a holistic semantic view of an individual. A user-based evaluation is presented, showing consensus between the users and the system.
8
Bettencourt-Silva J, De La Iglesia B, Donell S, Rayward-Smith V. On creating a patient-centric database from multiple Hospital Information Systems. Methods Inf Med 2011; 51:210-20. [PMID: 21818520 DOI: 10.3414/me10-01-0069] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 09/16/2010] [Accepted: 05/16/2011] [Indexed: 11/09/2022]
Abstract
BACKGROUND: The information present in Hospital Information Systems (HIS) is heterogeneous and is used primarily by health practitioners to support and improve patient care. Conducting clinical research, data analyses or knowledge discovery projects using electronic patient data in secondary care centres relies on accurate data collection, which is often an ad-hoc process poorly described in the literature.
OBJECTIVES: This paper aims at facilitating and expanding on the process of retrieving and collating patient-centric data from multiple HIS for the purpose of creating a research database. The development of a process roadmap for this purpose illustrates and exposes the constraints and drawbacks of undertaking such work in secondary care centres.
METHODS: A data collection exercise was carried out using a combined approach based on segments of well-established data mining and knowledge discovery methodologies, previous work on clinical data integration and local expert consultation. A case study on prostate cancer was carried out at an English regional National Health Service (NHS) hospital.
RESULTS: The process for data retrieval described in this paper allowed patient-centric data, pertaining to the case study on prostate cancer, to be successfully collected from multiple heterogeneous hospital sources, and collated in a format suitable for further clinical research.
CONCLUSIONS: The data collection exercise described in this paper exposes the lengthy and difficult journey of retrieving and collating patient-centric, multi-source data from a hospital, which is indeed a non-trivial task, and one which will greatly benefit from further attention from researchers and hospital IT management.
Affiliation(s)
- J Bettencourt-Silva
  - School of Computing Sciences, University of East Anglia, Norwich, United Kingdom.