Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rotmensch M, Halpern Y, Tlimat A, Horng S, Sontag D. Learning a Health Knowledge Graph from Electronic Medical Records. Sci Rep 2017;7:5994. [PMID: 28729710 PMCID: PMC5519723 DOI: 10.1038/s41598-017-05778-z] [Citation(s) in RCA: 99] [Impact Index Per Article: 14.1] [Received: 03/03/2017] [Accepted: 06/01/2017] [Indexed: 12/03/2022] Open

Cited by Other Article(s)

Hammoud M, Douglas S, Darmach M, Alawneh S, Sanyal S, Kanbour Y. Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study. JMIR AI 2024;3:e46875. [PMID: 38875676 PMCID: PMC11091811 DOI: 10.2196/46875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 06/15/2023] [Accepted: 03/02/2024] [Indexed: 06/16/2024]

Abstract

BACKGROUND

Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients are increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches.

OBJECTIVE

This study aims to evaluate and report the accuracies of a few known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics.

METHODS

We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others.

RESULTS

The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the differences in the M1, F1-score, and NDCG results between the best-performing and worst-performing symptom checkers (that is, the ranges) were 65.3%, 39.2%, and 74.2%, respectively. The same was observed among the participating human physicians, whereby the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% using F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% using M1 and NDCG, respectively.

CONCLUSIONS

The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)-based one, shedding light on the promise of AI in improving the diagnostic capabilities of symptom checkers, especially as AI keeps advancing exponentially.
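As a rough illustration of three of the metrics named in the METHODS part of the abstract above (M1, F1-score, and NDCG), the following Python sketch scores a single differential list. The study's exact definitions are not given in this listing, so the set-based F1, the graded relevance values, and all function names and sample diagnoses here are assumptions made for illustration only.

```python
import math

def m1(differential, main_diagnosis):
    """Return 1.0 if the main diagnosis is ranked first, else 0.0."""
    return 1.0 if differential and differential[0] == main_diagnosis else 0.0

def f1_score(differential, gold):
    """Harmonic mean of precision and recall over sets of diagnoses."""
    returned, gold = set(differential), set(gold)
    hits = len(returned & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(returned)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

def ndcg(differential, relevance):
    """NDCG of a ranked list; `relevance` maps diagnosis -> gain (assumed grading)."""
    dcg = sum(relevance.get(d, 0.0) / math.log2(rank + 2)
              for rank, d in enumerate(differential))
    ideal = sorted(relevance.values(), reverse=True)[:len(differential)]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical vignette: the main diagnosis carries gain 2, a secondary one gain 1.
differential = ["migraine", "tension headache", "sinusitis"]
relevance = {"tension headache": 2.0, "migraine": 1.0}

print(m1(differential, "tension headache"))       # main diagnosis is not ranked first
print(f1_score(differential, relevance.keys()))   # 2 of 3 returned diagnoses are correct
print(ndcg(differential, relevance))              # below 1.0: the ranking is not ideal
```

M1 only rewards the top position, whereas NDCG gives partial credit for a correct diagnosis that appears lower in the list, which is one reason the two metrics can rank the same systems differently.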

Woodman RJ, Koczwara B, Mangoni AA. Applying precision medicine principles to the management of multimorbidity: the utility of comorbidity networks, graph machine learning, and knowledge graphs. Front Med (Lausanne) 2024;10:1302844. [PMID: 38404463 PMCID: PMC10885565 DOI: 10.3389/fmed.2023.1302844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 12/22/2023] [Indexed: 02/27/2024] Open

Abstract

The current management of patients with multimorbidity is suboptimal, with either a single-disease approach to care or treatment guideline adaptations that result in poor adherence due to their complexity. Although this has resulted in calls for more holistic and personalized approaches to prescribing, progress toward these goals has remained slow. With the rapid advancement of machine learning (ML) methods, promising approaches now also exist to accelerate the advance of precision medicine in multimorbidity. These include analyzing disease comorbidity networks, using knowledge graphs that integrate knowledge from different medical domains, and applying network analysis and graph ML. Multimorbidity disease networks have been used to improve disease diagnosis, treatment recommendations, and patient prognosis. Knowledge graphs that combine different medical entities connected by multiple relationship types integrate data from different sources, allowing for complex interactions and creating a continuous flow of information. Network analysis and graph ML can then extract the topology and structure of networks and reveal hidden properties, including disease phenotypes, network hubs, and pathways; predict drugs for repurposing; and determine safe and more holistic treatments. In this article, we describe the basic concepts of creating bipartite and unipartite disease and patient networks and review the use of knowledge graphs, graph algorithms, graph embedding methods, and graph ML within the context of multimorbidity. Specifically, we provide an overview of the application of graph theory for studying multimorbidity, the methods employed to extract knowledge from graphs, and examples of the application of disease networks for determining the structure and pathways of multimorbidity, identifying disease phenotypes, predicting health outcomes, and selecting safe and effective treatments. 
In today's modern data-hungry, ML-focused world, such network-based techniques are likely to be at the forefront of developing robust clinical decision support tools for safer and more holistic approaches to treating older patients with multimorbidity.
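The bipartite and unipartite networks mentioned in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the patient records are invented, and the edge weight (number of patients sharing two diseases) is only one of several possible comorbidity measures.

```python
from collections import defaultdict
from itertools import combinations

def project_to_disease_network(patient_diseases):
    """Project a bipartite patient-disease network onto the disease nodes.

    Each edge weight counts the patients who have both diseases.
    """
    weights = defaultdict(int)
    for diseases in patient_diseases.values():
        # Sort so that each undirected pair is always stored in one order.
        for a, b in combinations(sorted(set(diseases)), 2):
            weights[(a, b)] += 1
    return dict(weights)

def weighted_degree(network):
    """Sum of edge weights per disease; high values suggest network hubs."""
    degree = defaultdict(int)
    for (a, b), w in network.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)

# Hypothetical bipartite input: patient identifier -> diagnosed diseases.
patients = {
    "p1": ["diabetes", "hypertension", "ckd"],
    "p2": ["diabetes", "hypertension"],
    "p3": ["hypertension", "copd"],
}

network = project_to_disease_network(patients)
degree = weighted_degree(network)
print(network)
print(max(degree, key=degree.get))  # disease with the largest weighted degree
```

The same projection applied to the patient side (linking patients who share diseases) would give the unipartite patient network.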

Wen J, Zhang T, Ye S, Zhang P, Han R, Chen X, Huang R, Chen A, Li Q. Quantitative patient graph analysis for transient ischemic attack risk factor distribution based on electronic medical records. Heliyon 2024;10:e22766. [PMID: 38163107 PMCID: PMC10755279 DOI: 10.1016/j.heliyon.2023.e22766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 10/26/2023] [Accepted: 11/19/2023] [Indexed: 01/03/2024] Open

Liu Z, Cao Q, Du N, Shu H, Zhong E, Jiang N, Chen Q, Shen Y, Chen K. FIT-graph: A multi-grained evolutionary graph based framework for disease diagnosis. Artif Intell Med 2024;147:102735. [PMID: 38184359 DOI: 10.1016/j.artmed.2023.102735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 10/04/2023] [Accepted: 11/28/2023] [Indexed: 01/08/2024]

Huang L, Chen Q, Lan W. Predicting drug-drug interactions based on multi-view and multichannel attention deep learning. Health Inf Sci Syst 2023;11:50. [PMID: 37941825 PMCID: PMC10628064 DOI: 10.1007/s13755-023-00250-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 09/25/2023] [Indexed: 11/10/2023] Open

Ma X, Wang M, Lin S, Zhang Y, Zhang Y, Ouyang W, Liu X. Knowledge and data-driven prediction of organ failure in critical care patients. Health Inf Sci Syst 2023;11:7. [PMID: 36703901 PMCID: PMC9871106 DOI: 10.1007/s13755-023-00210-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 01/02/2023] [Indexed: 01/24/2023] Open

Abstract

Purpose

The early detection of organ failure mitigates the risk of post-intensive care syndrome and long-term functional impairment. The aim of this study is to predict organ failure in real-time for critical care patients based on a data-driven and knowledge-driven machine learning method (DKM) and provide explanations for the prediction by incorporating a medical knowledge graph.

Methods

The cohort of this study was a subset of the 4,386 adult Intensive Care Unit (ICU) patients from the MIMIC-III dataset collected between 2001 and 2012, and the primary outcome was the Delta Sequential Organ Failure Assessment (SOFA) score. A real-time Delta SOFA score prediction model was developed with two key components: an improved deep learning temporal convolutional network (S-TCN) and a graph-embedding feature extraction method based on a medical knowledge graph. Entities and relations related to organ failure were extracted from the Unified Medical Language System to build the medical knowledge graph, and patient data were mapped onto the graph to extract the embeddings. We measured the performance of our DKM approach with cross-validation to avoid the formation of biased assessments.

Results

An area under the receiver operating characteristic curve (AUC) of 0.973, a precision of 0.923, an NPV of 0.989, and an F1 score of 0.927 were achieved using the DKM approach, which significantly outperformed the baseline methods. Additionally, the performance remained stable following external validation on the eICU dataset, which consists of 2,816 admissions (AUC = 0.981, precision = 0.860, NPV = 0.984). Visualization of feature importance for the Delta SOFA score and their relationships on the basic clinical medical (BCM) knowledge graph provided a model explanation.

Conclusion

The use of an improved TCN model and a medical knowledge graph led to substantial improvement in prediction accuracy, providing generalizability and an independent explanation for organ failure prediction in critical care patients. These findings show the potential of incorporating prior domain knowledge into machine learning models to inform care and service planning.

Supplementary Information

The online version of this article contains supplementary material available at 10.1007/s13755-023-00210-5.
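For readers less familiar with the classification metrics reported in the Results of the abstract above (precision, NPV, and F1 score), the sketch below shows how they follow from a binary confusion matrix. The counts are invented for illustration and are not taken from the study; the function name is likewise hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute precision, recall, NPV, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Negative predictive value: share of predicted negatives that are truly negative.
    npv = tn / (tn + fn) if tn + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "npv": npv, "f1": f1}

# Hypothetical counts, NOT the study's data.
metrics = binary_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With many more negatives than positives, as in this made-up example, NPV can be very high even when recall is moderate, so it is usually read alongside precision and F1 rather than alone.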

Feng J, Zhang R, Chen D, Shi L. A Visualization Method of Knowledge Graphs for the Computation and Comprehension of Ultrasound Reports. Biomimetics (Basel) 2023;8:560. [PMID: 38132500 PMCID: PMC10741754 DOI: 10.3390/biomimetics8080560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 10/30/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023] Open

Blaudin de Thé FX, Baudier C, Andrade Pereira R, Lefebvre C, Moingeon P. Transforming drug discovery with a high-throughput AI-powered platform: A 5-year experience with Patrimony. Drug Discov Today 2023;28:103772. [PMID: 37717933 DOI: 10.1016/j.drudis.2023.103772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/01/2023] [Accepted: 09/12/2023] [Indexed: 09/19/2023]

Schiano di Cola V, Chiaro D, Prezioso E, Izzo S, Giampaolo F. Insight Extraction From E-Health Bookings by Means of Hypergraph and Machine Learning. IEEE J Biomed Health Inform 2023;27:4649-4659. [PMID: 37018305 DOI: 10.1109/jbhi.2022.3233498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]

Hou Y, Yeung J, Xu H, Su C, Wang F, Zhang R. From Answers to Insights: Unveiling the Strengths and Limitations of ChatGPT and Biomedical Knowledge Graphs. Research Square 2023:rs.3.rs-3185632. [PMID: 37577545 PMCID: PMC10418534 DOI: 10.21203/rs.3.rs-3185632/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]

Abstract

Purpose

Large Language Models (LLMs) have shown exceptional performance in various natural language processing tasks, benefiting from their language generation capabilities and ability to acquire knowledge from unstructured text. However, in the biomedical domain, LLMs face limitations that lead to inaccurate and inconsistent answers. Knowledge Graphs (KGs) have emerged as valuable resources for organizing structured information. Biomedical Knowledge Graphs (BKGs) have gained significant attention for managing diverse and large-scale biomedical knowledge. The objective of this study is to assess and compare the capabilities of ChatGPT and existing BKGs in question-answering, biomedical knowledge discovery, and reasoning tasks within the biomedical domain.

Methods

We conducted a series of experiments to assess the performance of ChatGPT and the BKGs in various aspects of querying existing biomedical knowledge, knowledge discovery, and knowledge reasoning. Firstly, we tasked ChatGPT with answering questions sourced from the "Alternative Medicine" sub-category of Yahoo! Answers and recorded the responses. Additionally, we queried BKG to retrieve the relevant knowledge records corresponding to the questions and assessed them manually. In another experiment, we formulated a prediction scenario to assess ChatGPT's ability to suggest potential drug/dietary supplement repurposing candidates. Simultaneously, we utilized BKG to perform link prediction for the same task. The outcomes of ChatGPT and BKG were compared and analyzed. Furthermore, we evaluated ChatGPT and BKG's capabilities in establishing associations between pairs of proposed entities. This evaluation aimed to assess their reasoning abilities and the extent to which they can infer connections within the knowledge domain.

Results

The results indicate that ChatGPT with GPT-4.0 outperforms both GPT-3.5 and BKGs in providing existing information. However, BKGs demonstrate higher reliability in terms of information accuracy. ChatGPT exhibits limitations in performing novel discoveries and reasoning, particularly in establishing structured links between entities compared to BKGs.

Conclusions

To address the limitations observed, future research should focus on integrating LLMs and BKGs to leverage the strengths of both approaches. Such integration would optimize task performance and mitigate potential risks, leading to advancements in knowledge within the biomedical field and contributing to the overall well-being of individuals.

Walke D, Micheel D, Schallert K, Muth T, Broneske D, Saake G, Heyer R. The importance of graph databases and graph learning for clinical applications. Database (Oxford) 2023;2023:baad045. [PMID: 37428679 PMCID: PMC10332447 DOI: 10.1093/database/baad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 05/26/2023] [Accepted: 06/16/2023] [Indexed: 07/12/2023]

Caufield JH, Putman T, Schaper K, Unni DR, Hegde H, Callahan TJ, Cappelletti L, Moxon SAT, Ravanmehr V, Carbon S, Chan LE, Cortes K, Shefchek KA, Elsarboukh G, Balhoff J, Fontana T, Matentzoglu N, Bruskiewich RM, Thessen AE, Harris NL, Munoz-Torres MC, Haendel MA, Robinson PN, Joachimiak MP, Mungall CJ, Reese JT. KG-Hub-building and exchanging biological knowledge graphs. Bioinformatics 2023;39:btad418. [PMID: 37389415 PMCID: PMC10336030 DOI: 10.1093/bioinformatics/btad418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 05/09/2023] [Accepted: 06/29/2023] [Indexed: 07/01/2023] Open

Affiliation(s)

J Harry Caufield Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Tim Putman Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Kevin Schaper Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Deepak R Unni SIB Swiss Institute of Bioinformatics, Basel 1015, Switzerland
Harshad Hegde Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Tiffany J Callahan Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, United States
Luca Cappelletti Department of Computer Science, University of Milano, Milan 20126, Italy
Sierra A T Moxon Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Vida Ravanmehr Department of Lymphoma-Myeloma, MD Anderson Cancer Center, Houston, TX 77030, United States
Seth Carbon Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Lauren E Chan College of Public Health and Human Sciences, Oregon State University, Corvallis, OR 97331, United States
Katherina Cortes Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Kent A Shefchek Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Glass Elsarboukh Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Jim Balhoff Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, United States
Tommaso Fontana Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan 20133, Italy
Nicolas Matentzoglu Semanticly, Athens, Greece
Richard M Bruskiewich STAR Informatics, Delphinai Corporation, Sooke, BC V9Z 0M3, Canada
Anne E Thessen Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Nomi L Harris Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Monica C Munoz-Torres Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Melissa A Haendel Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, United States
Peter N Robinson The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, United States
Marcin P Joachimiak Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Christopher J Mungall Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States
Justin T Reese Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States

Hou Y, Yeung J, Xu H, Su C, Wang F, Zhang R. From Answers to Insights: Unveiling the Strengths and Limitations of ChatGPT and Biomedical Knowledge Graphs. medRxiv 2023:2023.06.09.23291208. [PMID: 37398259 PMCID: PMC10312889 DOI: 10.1101/2023.06.09.23291208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]

Murali L, Gopakumar G, Viswanathan DM, Nedungadi P. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. J Biomed Inform 2023:104403. [PMID: 37230406 DOI: 10.1016/j.jbi.2023.104403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/16/2023] [Accepted: 05/19/2023] [Indexed: 05/27/2023]

Cenikj G, Strojnik L, Angelski R, Ogrinc N, Koroušić Seljak B, Eftimov T. From language models to large-scale food and biomedical knowledge graphs. Sci Rep 2023;13:7815. [PMID: 37188766 DOI: 10.1038/s41598-023-34981-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 05/10/2023] [Indexed: 05/17/2023] Open

Su C, Hou Y, Zhou M, Rajendran S, Maasch JRA, Abedi Z, Zhang H, Bai Z, Cuturrufo A, Guo W, Chaudhry FF, Ghahramani G, Tang J, Cheng F, Li Y, Zhang R, DeKosky ST, Bian J, Wang F. Biomedical discovery through the integrative biomedical knowledge hub (iBKH). iScience 2023;26:106460. [PMID: 37020958 PMCID: PMC10068563 DOI: 10.1016/j.isci.2023.106460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 09/20/2022] [Accepted: 03/16/2023] [Indexed: 04/01/2023] Open

Affiliation(s)

Chang Su Department of Health Service Administration and Policy, College of Public Health, Temple University, Philadelphia, PA 19122, USA
Yu Hou Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA; Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
Manqi Zhou Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
Suraj Rajendran Tri-Institutional Computational Biology & Medicine Program, Cornell University, New York, NY 10065, USA
Jacqueline R.M. A. Maasch Department of Computer Science, Cornell Tech, New York, NY 10044, USA
Zehra Abedi Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
Haotan Zhang Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
Zilong Bai Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA
Anthony Cuturrufo Computer Science, Cornell University, Ithaca, NY 14850, USA
Winston Guo Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
Fayzan F. Chaudhry Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
Gregory Ghahramani Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA
Jian Tang Mila-Quebec AI Institute and HEC Montreal, Montreal, QC H2S 3H1, Canada
Feixiong Cheng Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA; Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA; Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
Yue Li School of Computer Science, McGill University, Montreal, QC H3A 0C6, Canada
Rui Zhang Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
Steven T. DeKosky Department of Neurology, College of Medicine, University of Florida, Gainesville, FL 32610, USA
Jiang Bian Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
Fei Wang Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, USA

Gao J, He S, Hu J, Chen G. A hybrid system to understand the relations between assessments and plans in progress notes. J Biomed Inform 2023;141:104363. [PMID: 37054961 DOI: 10.1016/j.jbi.2023.104363] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/06/2023] [Revised: 04/05/2023] [Accepted: 04/07/2023] [Indexed: 04/15/2023]

Gani MO, Kethireddy S, Adib R, Hasan U, Griffin P, Adibuzzaman M. Structural causal model with expert augmented knowledge to estimate the effect of oxygen therapy on mortality in the ICU. Artif Intell Med 2023;137:102493. [PMID: 36868692 PMCID: PMC9992896 DOI: 10.1016/j.artmed.2023.102493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 01/17/2023] [Accepted: 01/17/2023] [Indexed: 02/01/2023]

Getzen E, Ungar L, Mowery D, Jiang X, Long Q. Mining for equitable health: Assessing the impact of missing data in electronic health records. J Biomed Inform 2023;139:104269. [PMID: 36621750 PMCID: PMC10391553 DOI: 10.1016/j.jbi.2022.104269] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 05/12/2022] [Revised: 11/25/2022] [Accepted: 12/07/2022] [Indexed: 01/07/2023]

Abstract

Electronic health records (EHR) are collected as a routine part of healthcare delivery, and have great potential to be utilized to improve patient health outcomes. They contain multiple years of health information to be leveraged for risk prediction, disease detection, and treatment evaluation. However, they do not have a consistent, standardized format across institutions, particularly in the United States, and can present significant analytical challenges: they contain multi-scale data from heterogeneous domains and include both structured and unstructured data. Data for individual patients are collected at irregular time intervals and with varying frequencies. In addition to the analytical challenges, EHR can reflect inequity: patients belonging to different groups will have differing amounts of data in their health records. Many of these issues can contribute to biased data collection. The consequence is that the data for under-served groups may be less informative partly due to more fragmented care, which can be viewed as a type of missing data problem. For EHR data in this complex form, there is currently no framework for introducing realistic missing values. There has also been little to no work in assessing the impact of missing data in EHR. In this work, we first introduce a terminology to define three levels of EHR data and then propose a novel framework for simulating realistic missing data scenarios in EHR to adequately assess their impact on predictive modeling. We incorporate the use of a medical knowledge graph to capture dependencies between medical events to create a more realistic missing data framework. In an intensive care unit setting, we found that missing data have greater negative impact on the performance of disease prediction models in groups that tend to have less access to healthcare, or seek less healthcare. We also found that the impact of missing data on disease prediction models is stronger when using the knowledge graph framework to introduce realistic missing values as opposed to random event removal.

Chen A, Lu R, Han R, Huang R, Qin G, Wen J, Li Q, Zhang Z, Jiang W. Building Practical Risk Prediction Models for Nasopharyngeal Carcinoma Screening with Patient Graph Analysis and Machine Learning. Cancer Epidemiol Biomarkers Prev 2023;32:274-280. [PMID: 36480263 DOI: 10.1158/1055-9965.epi-22-0792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Revised: 09/07/2022] [Accepted: 12/06/2022] [Indexed: 12/13/2022] Open

Knowledge mining of unstructured information: application to cyber domain. Sci Rep 2023;13:1714. [PMID: 36720897 PMCID: PMC9889742 DOI: 10.1038/s41598-023-28796-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 01/24/2023] [Indexed: 02/01/2023] Open

Yang Y, Lu Y, Yan W. A comprehensive review on knowledge graphs for complex diseases. Brief Bioinform 2023;24:6931722. [PMID: 36528805 DOI: 10.1093/bib/bbac543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Revised: 11/02/2022] [Accepted: 11/10/2022] [Indexed: 12/23/2022] Open

A weighted-link graph neural network for lung cancer knowledge classification. Appl Intell 2023. [DOI: 10.1007/s10489-022-04437-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]

Griesser A, Bidmon S. A Process Related View on the Usage of Electronic Health Records from the Patients' Perspective: A Systematic Review. J Med Syst 2022;47:2. [PMID: 36580132 PMCID: PMC9800349 DOI: 10.1007/s10916-022-01886-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 11/02/2022] [Indexed: 12/30/2022]

Cho HN, Ahn I, Gwon H, Kang HJ, Kim Y, Seo H, Choi H, Kim M, Han J, Kee G, Jun TJ, Kim YH. Heterogeneous graph construction and HinSAGE learning from electronic medical records. Sci Rep 2022;12:21152. [PMID: 36477457 PMCID: PMC9729175 DOI: 10.1038/s41598-022-25693-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 12/02/2022] [Indexed: 12/12/2022] Open

Affiliation(s)

Ha Na Cho Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Imjin Ahn Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Hansle Gwon Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Hee Jun Kang Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Yunha Kim Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Hyeram Seo Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Heejung Choi Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Minkyoung Kim Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Jiye Han Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Gaeun Kee Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)
Tae Joon Jun Big Data Research Center, Asan Institute for Life Sciences, Asan Medical Center, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.413967.e; ISNI: 0000 0001 0842 2126)
Young-Hak Kim Division of Cardiology, Department of Internal Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88, Olympicro 43 Gil, Songpagu, 05505 Seoul, Republic of Korea (GRID: grid.267370.7; ISNI: 0000 0004 0533 4667)

Li MM, Huang K, Zitnik M. Graph representation learning in biomedicine and healthcare. Nat Biomed Eng 2022;6:1353-1369. [PMID: 36316368 PMCID: PMC10699434 DOI: 10.1038/s41551-022-00942-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2021] [Accepted: 08/09/2022] [Indexed: 11/11/2022]

Wang Y, Han X, Hao X, Zhu T, Shu H. A Curriculum Batching Strategy for Automatic ICD Coding with Deep Multi-Label Classification Models. Healthcare (Basel) 2022;10:healthcare10122397. [PMID: 36553921 PMCID: PMC9777784 DOI: 10.3390/healthcare10122397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2022] [Revised: 11/21/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022] Open

Chen A, Huang R, Wu E, Han R, Wen J, Li Q, Zhang Z, Shen B. The Generation of a Lung Cancer Health Factor Distribution Using Patient Graphs Constructed From Electronic Medical Records: Retrospective Study. J Med Internet Res 2022;24:e40361. [PMID: 36427233 PMCID: PMC9736747 DOI: 10.2196/40361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 09/09/2022] [Accepted: 10/25/2022] [Indexed: 11/27/2022] Open

Abstract

BACKGROUND

Electronic medical records (EMRs) of patients with lung cancer (LC) capture a variety of health factors. Understanding the distribution of these factors will help identify key factors for risk prediction in preventive screening for LC.

OBJECTIVE

We aimed to generate an integrated biomedical graph from EMR data and Unified Medical Language System (UMLS) ontology for LC, and to generate an LC health factor distribution from a hospital EMR of approximately 1 million patients.

METHODS

The data were collected from 2 sets of 1397 patients with and those without LC. A patient-centered health factor graph was plotted with 108,000 standardized data, and a graph database was generated to integrate the graphs of patient health factors and the UMLS ontology. With the patient graph, we calculated the connection delta ratio (CDR) for each of the health factors to measure the relative strength of the factor's relationship to LC.

RESULTS

The patient graph had 93,000 relations between the 2794 patient nodes and 650 factor nodes. An LC graph with 187 related biomedical concepts and 188 horizontal biomedical relations was plotted and linked to the patient graph. Searching the integrated biomedical graph with any number or category of health factors resulted in graphical representations of relationships between patients and factors, while searches using any patient presented the patient's health factors from the EMR and the LC knowledge graph (KG) from the UMLS in the same graph. Sorting the health factors by CDR in descending order generated a distribution of health factors for LC. The top 70 CDR-ranked factors of disease, symptom, medical history, observation, and laboratory test categories were verified to be concordant with those found in the literature.

CONCLUSIONS

By collecting standardized data of thousands of patients with and those without LC from the EMR, it was possible to generate a hospital-wide patient-centered health factor graph for graph search and presentation. The patient graph could be integrated with the UMLS KG for LC and thus enable hospitals to bring continuously updated international standard biomedical KGs from the UMLS for clinical use in hospitals. CDR analysis of the graph of patients with LC generated a CDR-sorted distribution of health factors, in which the top CDR-ranked health factors were concordant with the literature. The resulting distribution of LC health factors can be used to help personalize risk evaluation and preventive screening recommendations.

Xiao G, Pfaff E, Prud'hommeaux E, Booth D, Sharma DK, Huo N, Yu Y, Zong N, Ruddy KJ, Chute CG, Jiang G. FHIR-Ontop-OMOP: Building clinical knowledge graphs in FHIR RDF with the OMOP Common data Model. J Biomed Inform 2022;134:104201. [PMID: 36089199 PMCID: PMC9561043 DOI: 10.1016/j.jbi.2022.104201] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2022] [Revised: 08/04/2022] [Accepted: 09/04/2022] [Indexed: 11/26/2022]

Abstract

BACKGROUND

Knowledge graphs (KGs) play a key role to enable explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) against heterogeneous electronic health records (EHRs) has been desired by the research and healthcare AI communities. From the standardization perspective, community-based standards such as the Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) are increasingly used to represent and standardize EHR data for clinical data analytics; however, the potential of such a standard on building CKG has not been well investigated.

OBJECTIVE

To develop and evaluate methods and tools that expose the OMOP CDM-based clinical data repositories into virtual clinical KGs that are compliant with FHIR Resource Description Framework (RDF) specification.

METHODS

We developed a system called FHIR-Ontop-OMOP to generate virtual clinical KGs from the OMOP relational databases. We leveraged an OMOP CDM-based Medical Information Mart for Intensive Care (MIMIC-III) data repository to evaluate the FHIR-Ontop-OMOP system in terms of the faithfulness of data transformation and the conformance of the generated CKGs to the FHIR RDF specification.

RESULTS

A beta version of the system has been released. A total of more than 100 data element mappings from 11 OMOP CDM clinical data, health system and vocabulary tables were implemented in the system, covering 11 FHIR resources. The generated virtual CKG from MIMIC-III contains 46,520 instances of FHIR Patient, 716,595 instances of Condition, 1,063,525 instances of Procedure, 24,934,751 instances of MedicationStatement, 365,181,104 instances of Observations, and 4,779,672 instances of CodeableConcept. Patient counts identified by five pairs of SQL (over the MIMIC database) and SPARQL (over the virtual CKG) queries were identical, ensuring the faithfulness of the data transformation. Generated CKG in RDF triples for 100 patients were fully conformant with the FHIR RDF specification.

CONCLUSION

The FHIR-Ontop-OMOP system can expose OMOP database as a FHIR-compliant RDF graph. It provides a meaningful use case demonstrating the potentials that can be enabled by the interoperability between FHIR and OMOP CDM. Generated clinical KGs in FHIR RDF provide a semantic foundation to enable explainable AI applications in healthcare.

Jing F, Ren H, Cheng W, Wang X, Zhang Q. Knowledge-enhanced attentive learning for answer selection in community question answering systems. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]

Brakefield WS, Ammar N, Shaban-Nejad A. An Urban Population Health Observatory for Disease Causal Pathway Analysis and Decision Support: Underlying Explainable Artificial Intelligence Model. JMIR Form Res 2022;6:e36055. [PMID: 35857363 PMCID: PMC9350817 DOI: 10.2196/36055] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Revised: 05/03/2022] [Accepted: 06/07/2022] [Indexed: 01/16/2023] Open

Abstract

Background

Many researchers have aimed to develop chronic health surveillance systems to assist in public health decision-making. Several digital health solutions created lack the ability to explain their decisions and actions to human users.

Objective

This study sought to (1) expand our existing Urban Population Health Observatory (UPHO) system by incorporating a semantics layer; (2) cohesively employ machine learning and semantic/logical inference to provide measurable evidence and detect pathways leading to undesirable health outcomes; (3) provide clinical use case scenarios and design case studies to identify socioenvironmental determinants of health associated with the prevalence of obesity, and (4) design a dashboard that demonstrates the use of UPHO in the context of obesity surveillance using the provided scenarios.

Methods

The system design includes a knowledge graph generation component that provides contextual knowledge from relevant domains of interest. This system leverages semantics using concepts, properties, and axioms from existing ontologies. In addition, we used the publicly available US Centers for Disease Control and Prevention 500 Cities data set to perform multivariate analysis. A cohesive approach that employs machine learning and semantic/logical inference reveals pathways leading to diseases.

Results

In this study, we present 2 clinical case scenarios and a proof-of-concept prototype design of a dashboard that provides warnings, recommendations, and explanations and demonstrates the use of UPHO in the context of obesity surveillance, treatment, and prevention. While exploring the case scenarios using a support vector regression machine learning model, we found that poverty, lack of physical activity, education, and unemployment were the most important predictive variables that contribute to obesity in Memphis, TN.

Conclusions

The application of UPHO could help reduce health disparities and improve urban population health. The expanded UPHO feature incorporates an additional level of interpretable knowledge to enhance physicians, researchers, and health officials' informed decision-making at both patient and community levels.

International Registered Report Identifier (IRRID)

RR2-10.2196/28269

Construction of Disease-Symptom Knowledge Graph from Web-Board Documents. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

A lattice LSTM-based framework for knowledge graph construction from power plants maintenance reports. SERVICE ORIENTED COMPUTING AND APPLICATIONS 2022. [DOI: 10.1007/s11761-022-00338-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Graph neural network approaches for drug-target interactions. Curr Opin Struct Biol 2022;73:102327. [DOI: 10.1016/j.sbi.2021.102327] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Revised: 11/22/2021] [Accepted: 12/13/2021] [Indexed: 01/06/2023]

Weng H, Chen J, Ou A, Lao Y. Leveraging Representation Learning for the Construction and Application of Knowledge Graph for Traditional Chinese Medicine (Preprint). JMIR Med Inform 2022;10:e38414. [PMID: 36053574 PMCID: PMC9482071 DOI: 10.2196/38414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 07/04/2022] [Accepted: 07/27/2022] [Indexed: 11/13/2022] Open

Lan G, Liu T, Wang X, Pan X, Huang Z. A semantic web technology index. Sci Rep 2022;12:3672. [PMID: 35256665 PMCID: PMC8901930 DOI: 10.1038/s41598-022-07615-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2021] [Accepted: 01/13/2022] [Indexed: 01/22/2023] Open

Process knowledge graph modeling techniques and application methods for ship heterogeneous models. Sci Rep 2022;12:2911. [PMID: 35190625 PMCID: PMC8861156 DOI: 10.1038/s41598-022-06940-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2021] [Accepted: 02/09/2022] [Indexed: 12/03/2022] Open

Richens JG, Buchard A. Artificial Intelligence for Medical Diagnosis. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_29] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Risk Assessment of Alpine Skiing Events Based on Knowledge Graph: A Focus on Meteorological Conditions. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2021. [DOI: 10.3390/ijgi10120835] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Li Z, Zhong Q, Yang J, Duan Y, Wang W, Wu C, He K. DeepKG: an end-to-end deep learning-based workflow for biomedical knowledge graph extraction, optimization and applications. Bioinformatics 2021;38:1477-1479. [PMID: 34788369 PMCID: PMC8689937 DOI: 10.1093/bioinformatics/btab767] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Revised: 10/11/2021] [Accepted: 11/01/2021] [Indexed: 01/05/2023] Open

Sentiment Analysis in Twitter Based on Knowledge Graph and Deep Learning Classification. ELECTRONICS 2021. [DOI: 10.3390/electronics10222739] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Lu Z, Sim JA, Wang JX, Forrest CB, Krull KR, Srivastava D, Hudson MM, Robison LL, Baker JN, Huang IC. Natural Language Processing and Machine Learning Methods to Characterize Unstructured Patient-Reported Outcomes: Validation Study. J Med Internet Res 2021;23:e26777. [PMID: 34730546 PMCID: PMC8600437 DOI: 10.2196/26777] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2020] [Revised: 03/20/2021] [Accepted: 08/12/2021] [Indexed: 01/22/2023] Open

Abstract

BACKGROUND

Assessing patient-reported outcomes (PROs) through interviews or conversations during clinical encounters provides insightful information about survivorship.

OBJECTIVE

This study aims to test the validity of natural language processing (NLP) and machine learning (ML) algorithms in identifying different attributes of pain interference and fatigue symptoms experienced by child and adolescent survivors of cancer versus the judgment by PRO content experts as the gold standard to validate NLP/ML algorithms.

METHODS

This cross-sectional study focused on child and adolescent survivors of cancer, aged 8 to 17 years, and caregivers, from whom 391 meaning units in the pain interference domain and 423 in the fatigue domain were generated for analyses. Data were collected from the After Completion of Therapy Clinic at St. Jude Children's Research Hospital. Experienced pain interference and fatigue symptoms were reported through in-depth interviews. After verbatim transcription, analyzable sentences (ie, meaning units) were semantically labeled by 2 content experts for each attribute (physical, cognitive, social, or unclassified). Two NLP/ML methods were used to extract and validate the semantic features: bidirectional encoder representations from transformers (BERT) and Word2vec plus one of the ML methods, the support vector machine or extreme gradient boosting. Receiver operating characteristic and precision-recall curves were used to evaluate the accuracy and validity of the NLP/ML methods.

RESULTS

Compared with Word2vec/support vector machine and Word2vec/extreme gradient boosting, BERT demonstrated higher accuracy in both symptom domains, with 0.931 (95% CI 0.905-0.957) and 0.916 (95% CI 0.887-0.941) for problems with cognitive and social attributes on pain interference, respectively, and 0.929 (95% CI 0.903-0.953) and 0.917 (95% CI 0.891-0.943) for problems with cognitive and social attributes on fatigue, respectively. In addition, BERT yielded superior areas under the receiver operating characteristic curve for cognitive attributes on pain interference and fatigue domains (0.923, 95% CI 0.879-0.997; 0.948, 95% CI 0.922-0.979) and superior areas under the precision-recall curve for cognitive attributes on pain interference and fatigue domains (0.818, 95% CI 0.735-0.917; 0.855, 95% CI 0.791-0.930).

CONCLUSIONS

The BERT method performed better than the other methods. As an alternative to using standard PRO surveys, collecting unstructured PROs via interviews or conversations during clinical encounters and applying NLP/ML methods can facilitate PRO assessment in child and adolescent cancer survivors.

System for evaluating the reliability and novelty of medical scientific papers. J Informetr 2021. [DOI: 10.1016/j.joi.2021.101188] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Zanotto BS, Beck da Silva Etges AP, Dal Bosco A, Cortes EG, Ruschel R, De Souza AC, Andrade CMV, Viegas F, Canuto S, Luiz W, Ouriques Martins S, Vieira R, Polanczyk C, André Gonçalves M. Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers. JMIR Med Inform 2021;9:e29120. [PMID: 34723829 PMCID: PMC8593798 DOI: 10.2196/29120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Revised: 06/27/2021] [Accepted: 08/05/2021] [Indexed: 01/20/2023] Open

Abstract

BACKGROUND

With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management.

OBJECTIVE

This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs.

METHODS

Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results.

RESULTS

The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future.

CONCLUSIONS

Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.

Affiliation(s)

Bruna Stella Zanotto National Institute of Health Technology Assessment - INCT/IATS (CNPQ 465518/2014-1), Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Graduate Program in Epidemiology, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana Paula Beck da Silva Etges National Institute of Health Technology Assessment - INCT/IATS (CNPQ 465518/2014-1), Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; School of Technology, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, Brazil
Avner Dal Bosco School of Technology, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, Brazil
Eduardo Gabriel Cortes Graduate Program of Computer Science, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Renata Ruschel National Institute of Health Technology Assessment - INCT/IATS (CNPQ 465518/2014-1), Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana Claudia De Souza Brazilian Stroke Network, Hospital Moinhos de Vento, Porto Alegre, Brazil
Claudio M V Andrade Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Felipe Viegas Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Sergio Canuto Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Washington Luiz Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Sheila Ouriques Martins Brazilian Stroke Network, Hospital Moinhos de Vento, Porto Alegre, Brazil
Renata Vieira Centro Interdisciplinar de História, Culturas e Sociedades (CIDEHUS), Universidade de Évora, Évora, Portugal
Carisi Polanczyk National Institute of Health Technology Assessment - INCT/IATS (CNPQ 465518/2014-1), Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil; Graduate Program in Epidemiology, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Marcos André Gonçalves Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Jiang Y, Gao X, Su W, Li J. Systematic Knowledge Management of Construction Safety Standards Based on Knowledge Graphs: A Case Study in China. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021;18:ijerph182010692. [PMID: 34682437 PMCID: PMC8536078 DOI: 10.3390/ijerph182010692] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Revised: 10/01/2021] [Accepted: 10/03/2021] [Indexed: 12/02/2022]

Jonnagaddala J, Chen A, Batongbacal S, Nekkantti C. The OpenDeID corpus for patient de-identification. Sci Rep 2021;11:19973. [PMID: 34620985 PMCID: PMC8497517 DOI: 10.1038/s41598-021-99554-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Accepted: 09/28/2021] [Indexed: 11/18/2022] Open

Yu Y, Huang K, Zhang C, Glass LM, Sun J, Xiao C. SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics 2021;37:2988-2995. [PMID: 33769494 DOI: 10.1093/bioinformatics/btab207] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Revised: 02/07/2021] [Accepted: 03/24/2021] [Indexed: 02/02/2023] Open

Constructing public health evidence knowledge graph for decision-making support from COVID-19 literature of modelling study. JOURNAL OF SAFETY SCIENCE AND RESILIENCE 2021. [PMCID: PMC8361008 DOI: 10.1016/j.jnlssr.2021.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/03/2023]

Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021;23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open

Abstract

Graph is a natural data structure for describing complex systems, which contains a set of objects and relationships. Ubiquitous real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in vast bioinformatics scenarios with data represented in Euclidean domain. However, rich relational information between biological elements is retained in the non-Euclidean biomedical graphs, which is not learning friendly to classic machine learning methods. Graph representation learning aims to embed graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently raised widespread interest in both machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from molecular level to genomics, pharmaceutical and healthcare systems level. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.

Prediction of Bladder Cancer Treatment Side Effects Using an Ontology-Based Reasoning for Enhanced Patient Health Safety. INFORMATICS 2021. [DOI: 10.3390/informatics8030055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023] Open

Abstract

Predicting potential cancer treatment side effects at time of prescription could decrease potential health risks and achieve better patient satisfaction. This paper presents a new approach, founded on evidence-based medical knowledge, using as much information and proof as possible to help a computer program to predict bladder cancer treatment side effects and support the oncologist’s decision. This will help in deciding treatment options for patients with bladder malignancies. Bladder cancer knowledge is complex and requires simplification before any attempt to represent it in a formal or computerized manner. In this work we rely on the capabilities of OWL ontologies to seamlessly capture and conceptualize the required knowledge about this type of cancer and the underlying patient treatment process. Our ontology allows case-based reasoning to effectively predict treatment side effects for a given set of contextual information related to a specific medical case. The ontology is enriched with proofs and evidence collected from online biomedical research databases using “web crawlers”. We have exclusively designed the crawler algorithm to search for the required knowledge based on a set of specified keywords. Results from the study presented 80.3% of real reported bladder cancer treatment side-effects prediction and were close to really occurring adverse events recorded within the collected test samples when applying the approach. Evidence-based medicine combined with semantic knowledge-based models is prominent in generating predictions related to possible health concerns. The integration of a diversity of knowledge and evidence into one single integrated knowledge-base could dramatically enhance the process of predicting treatment risks and side effects applied to bladder cancer oncotherapy.