1
|
Wang SY, Ravindranath R, Stein JD. Prediction Models for Glaucoma in a Multicenter Electronic Health Records Consortium: The Sight Outcomes Research Collaborative. Ophthalmology Science 2024; 4:100445. [PMID: 38317869 PMCID: PMC10838906 DOI: 10.1016/j.xops.2023.100445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 11/22/2023] [Accepted: 12/01/2023] [Indexed: 02/07/2024]
Abstract
Purpose: Advances in artificial intelligence have enabled the development of predictive models for glaucoma. However, most work is single-center, and uncertainty exists regarding the generalizability of such models. The purpose of this study was to build and evaluate machine learning (ML) approaches to predict glaucoma progression requiring surgery using data from a large multicenter consortium of electronic health records (EHR). Design: Cohort study. Participants: Thirty-six thousand five hundred forty-eight patients with glaucoma, as identified by International Classification of Diseases (ICD) codes, from 6 academic eye centers participating in the Sight OUtcomes Research Collaborative (SOURCE). Methods: We developed ML models to predict whether patients with glaucoma would progress to glaucoma surgery in the coming year (identified by Current Procedural Terminology codes) using the following modeling approaches: (1) penalized logistic regression (lasso, ridge, and elastic net); (2) tree-based models (random forest, gradient boosted machines, and XGBoost); and (3) deep learning models. Model input features included demographics, diagnosis codes, medications, and clinical information (intraocular pressure, visual acuity, refractive status, and central corneal thickness) available from structured EHR data. One site was reserved as an "external site" test set (N = 1550); of the patients from the remaining sites, 10% each were randomly selected to be in development and test sets, with the remaining 27 999 reserved for model training. Main Outcome Measures: Evaluation metrics included area under the receiver operating characteristic curve (AUROC) on the test set and the external site. Results: Six thousand nineteen (16.5%) of 36 548 patients underwent glaucoma surgery. Overall, the AUROC ranged from 0.735 to 0.771 on the random test set and from 0.706 to 0.754 on the external test site, with the XGBoost and random forest models performing best, respectively. The greatest performance decrease from the random test set to the external test site was observed for the penalized regression models. Conclusions: Machine learning models developed using structured EHR data can reasonably predict whether glaucoma patients will need surgery, with reasonable generalizability to an external site. Additional research is needed to investigate the impact of protected class characteristics such as race or gender on model performance and fairness. Financial Disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
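The evaluation design in this abstract (one site held out entirely, the remaining patients split at random into development, test, and training sets, and AUROC reported on both test sets) can be sketched in a few lines. This is an illustrative sketch only: the record layout, the function names `split_by_site` and `auroc`, and the default fractions are assumptions made for the example, not the authors' code.

```python
import random
from typing import Dict, List, Sequence


def split_by_site(records: List[dict], external_site, dev_frac: float = 0.1,
                  test_frac: float = 0.1, seed: int = 0) -> Dict[str, List[dict]]:
    """Hold out one site as an external test set, then split the rest at random.

    Each record is assumed (hypothetically) to be a dict with a "site" key.
    """
    external = [r for r in records if r["site"] == external_site]
    internal = [r for r in records if r["site"] != external_site]
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(internal)
    n_dev = int(len(internal) * dev_frac)
    n_test = int(len(internal) * test_frac)
    return {
        "dev": internal[:n_dev],
        "test": internal[n_dev:n_dev + n_test],
        "train": internal[n_dev + n_test:],
        "external": external,
    }


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Computing `auroc` separately on the `"test"` and `"external"` partitions gives the two numbers whose gap indicates how well a model generalizes to an unseen site. The pairwise loop is quadratic and is meant for clarity rather than for cohorts of this size.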
Affiliation(s)
- Sophia Y. Wang
  - Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
- Rohith Ravindranath
  - Department of Ophthalmology, Byers Eye Institute, Stanford University, Palo Alto, California
- Joshua D. Stein
  - Department of Ophthalmology & Visual Sciences, University of Michigan Kellogg Eye Center, Ann Arbor, Michigan
|
2
|
Ke Y, Yang R, Liu N. Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study. J Med Internet Res 2024; 26:e48330. [PMID: 38630522 PMCID: PMC11063894 DOI: 10.2196/48330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 08/01/2023] [Accepted: 01/14/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND Intensive care research has predominantly relied on conventional methods like randomized controlled trials. However, the increasing popularity of open-access, free databases in the past decade has opened new avenues for research, offering fresh insights. Leveraging machine learning (ML) techniques enables the analysis of trends in a vast number of studies. OBJECTIVE This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and those conducted with open-access databases (OADs). METHODS We used ML to analyze publications in the Web of Science database. Articles were categorized into "OAD" and "traditional intensive care" (TIC) studies. OAD studies were those that used the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), or Pediatric Intensive Care database. TIC studies included all other intensive care studies. Uniform manifold approximation and projection was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 topic-unique identification numbers and to categorize topics into 22 topic families. RESULTS A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC and 1301 articles as OAD studies. TIC studies experienced exponential growth over the last 2 decades, culminating in a peak of 16,378 articles in 2021, while OAD studies demonstrated a consistent upsurge since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies exhibited broader coverage than OAD studies, suggesting a more extensive research scope. CONCLUSIONS This study analyzed ICU research, providing valuable insights from a large number of publications. OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis.
Affiliation(s)
- Yuhe Ke
  - Division of Anesthesiology and Perioperative Medicine, Singapore General Hospital, Singapore, Singapore
- Rui Yang
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
- Nan Liu
  - Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, Singapore, Singapore
|
3
|
Mekkes NJ, Groot M, Hoekstra E, de Boer A, Dagkesamanskaia E, Bouwman S, Wehrens SMT, Herbert MK, Wever DD, Rozemuller A, Eggen BJL, Huitinga I, Holtman IR. Identification of clinical disease trajectories in neurodegenerative disorders with natural language processing. Nat Med 2024; 30:1143-1153. [PMID: 38472295 PMCID: PMC11031398 DOI: 10.1038/s41591-024-02843-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 01/31/2024] [Indexed: 03/14/2024]
Abstract
Neurodegenerative disorders exhibit considerable clinical heterogeneity and are frequently misdiagnosed. This heterogeneity is often neglected and difficult to study. Therefore, innovative data-driven approaches utilizing substantial autopsy cohorts are needed to address this complexity and improve diagnosis, prognosis and fundamental research. We present clinical disease trajectories from 3,042 Netherlands Brain Bank donors, encompassing 84 neuropsychiatric signs and symptoms identified through natural language processing. This unique resource provides valuable new insights into neurodegenerative disorder symptomatology. To illustrate, we identified signs and symptoms that differed between frequently misdiagnosed disorders. In addition, we performed predictive modeling and identified clinical subtypes of various brain disorders, indicative of neural substructures being differently affected. Finally, integrating clinical diagnosis information revealed a substantial proportion of inaccurately diagnosed donors that masquerade as another disorder. The unique datasets allow researchers to study the clinical manifestation of signs and symptoms across neurodegenerative disorders, and identify associated molecular and cellular features.
Affiliation(s)
- Nienke J Mekkes
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Machine Learning Lab, Data Science Center in Health, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Minke Groot
  - The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
- Eric Hoekstra
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Alyse de Boer
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Ekaterina Dagkesamanskaia
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Machine Learning Lab, Data Science Center in Health, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Sander Bouwman
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Sophie M T Wehrens
  - The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
- Megan K Herbert
  - The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
- Dennis D Wever
  - The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
- Bart J L Eggen
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Inge Huitinga
  - The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Inge R Holtman
  - Department of Biomedical Sciences, Section Molecular Neurobiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - Machine Learning Lab, Data Science Center in Health, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
  - The Netherlands Brain Bank, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
|
4
|
V JP, S AAV, P GK, N K K. A novel attention-based cross-modal transfer learning framework for predicting cardiovascular disease. Comput Biol Med 2024; 170:107977. [PMID: 38217974 DOI: 10.1016/j.compbiomed.2024.107977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/19/2023] [Accepted: 01/08/2024] [Indexed: 01/15/2024]
Abstract
Cardiovascular disease (CVD) remains a leading cause of death globally, presenting significant challenges in early detection and treatment. The complexity of CVD arises from its multifaceted nature, influenced by a combination of genetic, environmental, and lifestyle factors. Traditional diagnostic approaches often struggle to effectively integrate and interpret the heterogeneous data associated with CVD. Addressing this challenge, we introduce a novel Attention-Based Cross-Modal (ABCM) transfer learning framework. This framework innovatively merges diverse data types, including clinical records, medical imagery, and genetic information, through an attention-driven mechanism. This mechanism adeptly identifies and focuses on the most pertinent attributes from each data source, thereby enhancing the model's ability to discern intricate interrelationships among various data types. Our extensive testing and validation demonstrate that the ABCM framework significantly surpasses traditional single-source models and other advanced multi-source methods in predicting CVD. Specifically, our approach achieves an accuracy of 93.5%, precision of 92.0%, recall of 94.5%, and an impressive area under the curve (AUC) of 97.2%. These results not only underscore the superior predictive capability of our model but also highlight its potential in offering more accurate and early detection of CVD. The integration of cross-modal data through attention-based mechanisms provides a deeper understanding of the disease, paving the way for more informed clinical decision-making and personalized patient care.
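The abstract does not give the ABCM architecture, so the following is only a generic sketch of what attention-weighted fusion of several data sources can look like: each modality embedding is scored against a query vector, the scores are normalized with a softmax, and the embeddings are combined by their weights. The function names, the dot-product scoring, and the fixed query vector are all assumptions for illustration and should not be read as the authors' method.

```python
import math
from typing import Dict, List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings: Dict[str, List[float]],
                   query: List[float]) -> Tuple[Dict[str, float], List[float]]:
    """Fuse per-modality embeddings with dot-product attention.

    embeddings: hypothetical mapping of modality name to a vector; all
                vectors and the query must share one dimension.
    Returns the attention weight per modality and the fused vector.
    """
    names = list(embeddings)
    # Relevance score of each modality: dot product with the query.
    scores = [sum(q * v for q, v in zip(query, embeddings[n])) for n in names]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the modality vectors, coordinate by coordinate.
    fused = [sum(w * embeddings[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    return dict(zip(names, weights)), fused
```

In a trained model the query and the embeddings would be learned; here they are plain lists so that the weighting step itself is easy to inspect. Modalities whose embeddings align more closely with the query receive larger weights and therefore contribute more to the fused representation.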
Affiliation(s)
- Jothi Prakash V
  - Karpagam College of Engineering, Myleripalayam Village, Coimbatore, 641032, Tamil Nadu, India
- Arul Antran Vijay S
  - Karpagam College of Engineering, Myleripalayam Village, Coimbatore, 641032, Tamil Nadu, India
- Ganesh Kumar P
  - College of Engineering, Guindy, Anna University, Chennai, 600025, Tamil Nadu, India
- Karthikeyan N K
  - Coimbatore Institute of Technology, Peelamedu, Coimbatore, 641014, Tamil Nadu, India
|
5
|
Oss Boll H, Amirahmadi A, Ghazani MM, Morais WOD, Freitas EPD, Soliman A, Etminani F, Byttner S, Recamonde-Mendoza M. Graph neural networks for clinical risk prediction based on electronic health records: A survey. J Biomed Inform 2024; 151:104616. [PMID: 38423267 DOI: 10.1016/j.jbi.2024.104616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 02/21/2024] [Accepted: 02/23/2024] [Indexed: 03/02/2024]
Abstract
OBJECTIVE This study aims to comprehensively review the use of graph neural networks (GNNs) for clinical risk prediction based on electronic health records (EHRs). The primary goal is to provide an overview of the state-of-the-art of this subject, highlighting ongoing research efforts and identifying existing challenges in developing effective GNNs for improved prediction of clinical risks. METHODS A search was conducted in the Scopus, PubMed, ACM Digital Library, and Embase databases to identify relevant English-language papers that used GNNs for clinical risk prediction based on EHR data. The study includes original research papers published between January 2009 and May 2023. RESULTS Following the initial screening process, 50 articles were included in the data collection. A significant increase in publications from 2020 was observed, with most selected papers focusing on diagnosis prediction (n = 36). The study revealed that the graph attention network (GAT) (n = 19) was the most prevalent architecture, and MIMIC-III (n = 23) was the most common data resource. CONCLUSION GNNs are relevant tools for predicting clinical risk by accounting for the relational aspects among medical events and entities and managing large volumes of EHR data. Future studies in this area may address challenges such as EHR data heterogeneity, multimodality, and model interpretability, aiming to develop more holistic GNN models that can produce more accurate predictions, be effectively implemented in clinical settings, and ultimately improve patient care.
Affiliation(s)
- Heloísa Oss Boll
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, Porto Alegre, 91501-970, RS, Brazil
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Ali Amirahmadi
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Mirfarid Musavian Ghazani
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Wagner Ourique de Morais
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Edison Pignaton de Freitas
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, Porto Alegre, 91501-970, RS, Brazil
- Amira Soliman
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Farzaneh Etminani
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Stefan Byttner
  - School of Information Technology, Halmstad University, Kristian IV:s väg 3, Halmstad, 301 18, Sweden
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Avenida Bento Gonçalves, 9500, Porto Alegre, 91501-970, RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre (HCPA), Av. Protásio Alves, 211, Bloco C, Porto Alegre, 90035-903, RS, Brazil
|
6
|
Kim J, Villarreal M, Arya S, Hernandez A, Moreira A. Bridging the Gap: Exploring Bronchopulmonary Dysplasia through the Lens of Biomedical Informatics. J Clin Med 2024; 13:1077. [PMID: 38398389 PMCID: PMC10889493 DOI: 10.3390/jcm13041077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 02/07/2024] [Accepted: 02/12/2024] [Indexed: 02/25/2024] Open
Abstract
Bronchopulmonary dysplasia (BPD), a chronic lung disease predominantly affecting premature infants, poses substantial clinical challenges. This review delves into the promise of biomedical informatics (BMI) in reshaping BPD research and care. We commence by highlighting the escalating prevalence and healthcare impact of BPD, emphasizing the necessity for innovative strategies to comprehend its intricate nature. To this end, we introduce BMI as a potent toolset adept at managing and analyzing extensive, diverse biomedical data. The challenges intrinsic to BPD research are addressed, underscoring the inadequacies of conventional approaches and the compelling need for data-driven solutions. We subsequently explore how BMI can revolutionize BPD research, encompassing genomics and personalized medicine to reveal potential biomarkers and individualized treatment strategies. Predictive analytics emerges as a pivotal facet of BMI, enabling early diagnosis and risk assessment for timely interventions. Moreover, we examine how mobile health technologies facilitate real-time monitoring and enhance patient engagement, ultimately refining BPD management. Ethical and legal considerations surrounding BMI implementation in BPD research are discussed, accentuating issues of privacy, data security, and informed consent. In summation, this review highlights BMI's transformative potential in advancing BPD research, addressing challenges, and opening avenues for personalized medicine and predictive analytics.
Affiliation(s)
- Jennifer Kim
  - Division of Neonatology, Department of Pediatrics, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Mariela Villarreal
  - Division of Neonatology, Department of Pediatrics, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Shreyas Arya
  - Division of Neonatal-Perinatal Medicine, Dayton Children’s Hospital, Dayton, OH 45404, USA
- Antonio Hernandez
  - Division of Neonatology, Department of Pediatrics, University of Texas Health San Antonio, San Antonio, TX 78229, USA
- Alvaro Moreira
  - Division of Neonatology, Department of Pediatrics, University of Texas Health San Antonio, San Antonio, TX 78229, USA
|
7
|
Miranda O, Fan P, Qi X, Wang H, Brannock MD, Kosten TR, Ryan ND, Kirisci L, Wang L. DeepBiomarker2: Prediction of Alcohol and Substance Use Disorder Risk in Post-Traumatic Stress Disorder Patients Using Electronic Medical Records and Multiple Social Determinants of Health. J Pers Med 2024; 14:94. [PMID: 38248795 PMCID: PMC10817272 DOI: 10.3390/jpm14010094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/03/2024] [Accepted: 01/10/2024] [Indexed: 01/23/2024] Open
Abstract
Prediction of high-risk events amongst patients with mental disorders is critical for personalized interventions. We developed DeepBiomarker2 by leveraging deep learning and natural language processing to analyze lab tests, medication use, diagnosis, social determinants of health (SDoH) parameters, and psychotherapy for outcome prediction. To increase the model's interpretability, we further refined our contribution analysis to identify key features by scaling with a factor from a reference feature. We applied DeepBiomarker2 to analyze the EMR data of 38,807 patients from the University of Pittsburgh Medical Center diagnosed with post-traumatic stress disorder (PTSD) to determine their risk of developing alcohol and substance use disorder (ASUD). DeepBiomarker2 predicted whether a PTSD patient would have a diagnosis of ASUD within the following 3 months with an average c-statistic (receiver operating characteristic AUC) of 0.93 and average F1 score, precision, and recall of 0.880, 0.895, and 0.866 in the test sets, respectively. Our study found that the medications clindamycin, enalapril, penicillin, valacyclovir, Xarelto/rivaroxaban, moxifloxacin, and atropine and the SDoH parameters access to psychotherapy, living in zip codes with a high normalized vegetative index, Gini index, and low-income segregation may have potential to reduce the risk of ASUDs in PTSD. In conclusion, the integration of SDoH information, coupled with the refined feature contribution analysis, empowers DeepBiomarker2 to accurately predict ASUD risk. Moreover, the model can further identify potential indicators of increased risk along with medications with beneficial effects.
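For readers less familiar with the classification metrics quoted in this abstract, the sketch below shows how precision, recall, and F1 relate. The function names and the confusion-matrix counts are hypothetical; the only figures taken from the abstract are the reported averages (precision 0.895, recall 0.866, F1 0.880). Because those are averages over several test sets, the harmonic mean of the averaged precision and recall only needs to land close to the averaged F1, not match it exactly.

```python
from typing import Tuple


def harmonic_mean(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, harmonic_mean(precision, recall)


# Consistency check against the averages reported in the abstract.
f1_from_averages = harmonic_mean(0.895, 0.866)
```

Here `f1_from_averages` comes out within 0.001 of the reported 0.880, so the three published figures are mutually consistent.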
Affiliation(s)
- Oshin Miranda
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences/School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Peihao Fan
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences/School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Xiguang Qi
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences/School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Haohan Wang
  - School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL 61820, USA
- Thomas R. Kosten
  - Menninger Department of Psychiatry, Baylor College of Medicine, Houston, TX 77030, USA
- Neal David Ryan
  - Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Levent Kirisci
  - Center for Education and Drug Abuse Research, Department of Pharmaceutical Sciences/School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lirong Wang
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences/School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
8
|
Wu Y, Wang X, Zhou M, Huang Z, Liu L, Cong L. Application of eHealth Tools in Anticoagulation Management After Cardiac Valve Replacement: Scoping Review Coupled With Bibliometric Analysis. JMIR Mhealth Uhealth 2024; 12:e48716. [PMID: 38180783 PMCID: PMC10799280 DOI: 10.2196/48716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/20/2023] [Accepted: 12/07/2023] [Indexed: 01/06/2024] Open
Abstract
BACKGROUND Anticoagulation management can effectively prevent complications in patients undergoing cardiac valve replacement (CVR). The emergence of eHealth tools provides new prospects for the management of long-term anticoagulants. However, there is no comprehensive summary of the application of eHealth tools in anticoagulation management after CVR. OBJECTIVE Our objective is to clarify the current state, trends, benefits, and challenges of using eHealth tools in the anticoagulation management of patients after CVR and provide future directions and recommendations for development in this field. METHODS This scoping review follows the 5-step framework developed by Arksey and O'Malley. We searched 5 databases (PubMed, MEDLINE, Web of Science, CINAHL, and Embase) using keywords such as "eHealth," "anticoagulation," and "valve replacement." We included papers on the practical application of eHealth tools and excluded papers describing the underlying mechanisms for developing eHealth tools. The search covered the period from database inception to March 1, 2023. The study findings were reported according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Additionally, VOSviewer (version 1.6.18) was used to construct visualization maps of countries, institutions, authors, and keywords to investigate the internal relations of included literature and to explore research hotspots and frontiers. RESULTS This study included 25 studies that fulfilled the criteria. There were 27,050 participants in total, with the sample size of the included studies ranging from 49 to 13,219. The eHealth tools mainly include computer-based support systems, electronic health records, telemedicine platforms, and mobile apps. Compared to traditional anticoagulation management, eHealth tools can improve time in therapeutic range and life satisfaction. However, no significant impact was observed in terms of economic benefits and anticoagulation-related complications. Bibliometric analysis suggests the potential for increased collaboration and opportunities among countries and academic institutions. Italy had the widest cooperative relationships. Machine learning and artificial intelligence are popular research directions in anticoagulation management. CONCLUSIONS eHealth tools exhibit promise for clinical applications in anticoagulation management after CVR, with the potential to enhance postoperative rehabilitation. Further high-quality research is needed to explore the economic benefits of eHealth tools in long-term anticoagulant therapy and the potential to reduce the occurrence of adverse events.
Affiliation(s)
- Ying Wu
  - Center for Moral Culture, Hunan Normal University, Changsha, China
  - School of Medicine, Hunan Normal University, Changsha, China
- Xiaohui Wang
  - School of Medicine, Hunan Normal University, Changsha, China
- Mengyao Zhou
  - School of Medicine, Hunan Normal University, Changsha, China
- Zhuoer Huang
  - School of Medicine, Hunan Normal University, Changsha, China
- Lijuan Liu
  - Teaching and Research Section of Clinical Nursing, Xiangya Hospital of Central South University, Changsha, China
- Li Cong
  - School of Medicine, Hunan Normal University, Changsha, China
|
9
|
Grzenda A, Widge AS. Electronic health records and stratified psychiatry: bridge to precision treatment? Neuropsychopharmacology 2024; 49:285-290. [PMID: 37667021 PMCID: PMC10700348 DOI: 10.1038/s41386-023-01724-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 08/24/2023] [Accepted: 08/27/2023] [Indexed: 09/06/2023]
Abstract
The use of a stratified psychiatry approach that combines electronic health records (EHR) data with machine learning (ML) is one potentially fruitful path toward rapidly improving precision treatment in clinical practice. This strategy, however, requires confronting pervasive methodological flaws as well as deficiencies in transparency and reporting in the current conduct of ML-based studies for treatment prediction. EHR data shares many of the same data quality issues as other types of data used in ML prediction, plus some unique challenges. To fully leverage EHR data's power for patient stratification, increased attention to data quality and collection of patient-reported outcome data is needed.
Affiliation(s)
- Adrienne Grzenda
  - Department of Psychiatry & Biobehavioral Sciences, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA, USA
  - Olive View-UCLA Medical Center, Sylmar, CA, USA
- Alik S Widge
  - Department of Psychiatry & Behavioral Sciences, University of Minnesota, Minneapolis, MN, USA
|
10
|
Jordan DM, Vy HMT, Do R. A deep learning transformer model predicts high rates of undiagnosed rare disease in large electronic health systems. medRxiv: the preprint server for health sciences 2023:2023.12.21.23300393. [PMID: 38196638 PMCID: PMC10775679 DOI: 10.1101/2023.12.21.23300393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty finding diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, it is likely that large electronic health record (EHR) systems include high numbers of participants suffering from undiagnosed rare disease. While this has been shown in detail for specific diseases, these studies are expensive and time-consuming and have only been feasible to perform for a handful of the thousands of known rare diseases. The bulk of these undiagnosed cases are effectively hidden, with no straightforward way to differentiate them from healthy controls. The ability to access them at scale would enormously expand our capacity to study and develop drugs for rare diseases, adding to tools aimed at increasing availability of study cohorts for rare disease. In this study, we train a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 participants in the UK Biobank and validate it on an independent cohort of 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis, with the number of participants predicted to be at risk ranging from 85 to 22,000 for different diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests. They are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease, as well as demonstrating the enormous potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.
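The enrichment figures in this abstract are odds ratios. As a reminder of how such a figure is obtained from a 2x2 table of model flags against known diagnoses, a minimal sketch follows. The function name, the argument names, and the example counts are hypothetical and are not taken from the study; the abstract does not state how the authors pooled results across diagnoses.

```python
def odds_ratio(flagged_cases: int, flagged_noncases: int,
               unflagged_cases: int, unflagged_noncases: int) -> float:
    """Odds ratio for a 2x2 table of model flag versus known diagnosis.

    OR = (flagged_cases / flagged_noncases)
         / (unflagged_cases / unflagged_noncases)
    """
    if flagged_noncases == 0 or unflagged_cases == 0:
        # The ratio is undefined when either denominator cell is empty.
        raise ValueError("odds ratio is undefined with a zero denominator cell")
    return (flagged_cases * unflagged_noncases) / (flagged_noncases * unflagged_cases)


# Hypothetical counts: 20 of 100 flagged patients are known cases,
# versus 5 of 100 unflagged patients.
example_or = odds_ratio(20, 80, 5, 95)
```

With these made-up counts the odds of a known diagnosis are 4.75 times higher among flagged patients. An OR of 48.0, as reported for the UK Biobank cross-validation cohorts, indicates a far stronger concentration of known cases among the patients the model flags.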
Affiliation(s)
- Daniel M. Jordan
  - Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ha My T. Vy
  - Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ron Do
  - Center for Genomic Data Analytics, Charles Bronfman Institute for Personalized Medicine, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
11
|
Zhang J, Xu Y, Ye B, Zhao Y, Sun X, Meng Q, Zhang Y, Cui L. EAPR: explainable and augmented patient representation learning for disease prediction. Health Inf Sci Syst 2023; 11:53. [PMID: 37974902 PMCID: PMC10645955 DOI: 10.1007/s13755-023-00256-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023] Open
Abstract
Patient representation learning aims to encode meaningful information from a patient's electronic health records (EHR) as a mathematical representation. Recent advances in deep learning have given patient representation learning methods greater representational power, allowing the learned representations to significantly improve the performance of disease prediction models. However, the inherent shortcomings of deep learning models, such as the need for massive amounts of labeled data and their lack of explainability, limit further improvement of deep learning-based patient representation learning methods. In particular, learning robust patient representations is challenging when patient data is missing or insufficient. Although data augmentation techniques can address this deficiency, the complex data processing further weakens the explainability of patient representation learning models. To address these challenges, this paper proposes Explainable and Augmented Patient Representation Learning for disease prediction (EAPR). EAPR uses data augmentation controlled by a confidence interval to enhance patient representations when patient data are limited. Moreover, EAPR uses two-stage gradient backpropagation to address the loss of explainability caused by the complex data augmentation process. Experimental results on real clinical data validate the effectiveness and explainability of the proposed approach.
Affiliation(s)
- Jiancheng Zhang
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Yonghui Xu
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Bicui Ye
  - Wuzhou Red Cross Hospital, Wuzhou, China
  - Jinan University, Jinan, China
- Yibowen Zhao
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Xiaofang Sun
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Qi Meng
  - Department of Radiology, Qilu Hospital of Shandong University, Jinan, China
- Yang Zhang
  - Department of Radiology, Qilu Hospital of Shandong University, Jinan, China
- Lizhen Cui
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
|
12
|
Sadegh-Zadeh SA, Sakha H, Movahedi S, Fasihi Harandi A, Ghaffari S, Javanshir E, Ali SA, Hooshanginezhad Z, Hajizadeh R. Advancing prognostic precision in pulmonary embolism: A clinical and laboratory-based artificial intelligence approach for enhanced early mortality risk stratification. Comput Biol Med 2023; 167:107696. [PMID: 37979394 DOI: 10.1016/j.compbiomed.2023.107696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/20/2023]
Abstract
BACKGROUND Acute pulmonary embolism (PE) is a critical medical emergency that necessitates prompt identification and intervention. Accurate prognostication of early mortality is vital for recognizing patients at elevated risk for unfavourable outcomes and administering suitable therapy. Machine learning (ML) algorithms hold promise for enhancing the precision of early mortality prediction in PE patients. OBJECTIVE To devise an ML algorithm for early mortality prediction in PE patients by employing clinical and laboratory variables. METHODS This study utilized diverse oversampling techniques to improve the performance of various machine learning models including ANN, SVM, DT, RF, and AdaBoost for early mortality prediction. Appropriate oversampling methods were chosen for each model based on algorithm characteristics and dataset properties. Predictor variables included four lab tests, eight physiological time series indicators, and two general descriptors. Evaluation used metrics such as accuracy, F1 score, precision, recall, Area Under the Curve (AUC), and Receiver Operating Characteristic (ROC) curves, providing a comprehensive view of the models' predictive abilities. RESULTS The findings indicated that the RF model with random oversampling exhibited superior performance among the five models assessed, achieving elevated accuracy and precision alongside high recall for predicting the death class. The oversampling approaches effectively equalized the sample distribution among the classes and enhanced the models' performance. CONCLUSIONS The suggested ML technique can efficiently prognosticate mortality in patients afflicted with acute PE. The RF model with random oversampling can aid healthcare professionals in making well-informed decisions regarding the treatment of patients with acute PE. The study underscores the significance of oversampling methods in managing imbalanced data and emphasizes the potential of ML algorithms in refining early mortality prediction for PE patients.
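Random oversampling, the technique credited for the best RF result above, duplicates minority-class records until the classes are the same size. The sketch below shows the general idea with the standard library only; it is not the authors' pipeline, and the function name and toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up): 6 survivors (label 0) and 2 deaths (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling is normally applied to the training split only, so that duplicated records cannot leak into the evaluation data.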
Affiliation(s)
- Seyed-Ali Sadegh-Zadeh
  - Department of Computing, School of Digital, Technologies and Arts, Staffordshire University, Stoke-on-Trent, England, United Kingdom
- Hanie Sakha
  - Department of Computing, School of Digital, Technologies and Arts, Staffordshire University, Stoke-on-Trent, England, United Kingdom
- Samad Ghaffari
  - Cardiovascular Research Centre, Tabriz University of Medical Sciences, Tabriz, Iran
- Elnaz Javanshir
  - Cardiovascular Research Centre, Tabriz University of Medical Sciences, Tabriz, Iran
- Syed Ahsan Ali
  - Health Education England West Midlands, Birmingham, England, United Kingdom
- Zahra Hooshanginezhad
  - Department of Cardiovascular Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Reza Hajizadeh
  - Department of Cardiology, Urmia University of Medical Sciences, Urmia, Iran
|
13
|
Serghiou S, Rough K. Deep Learning for Epidemiologists: An Introduction to Neural Networks. Am J Epidemiol 2023; 192:1904-1916. [PMID: 37139570 DOI: 10.1093/aje/kwad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2022] [Revised: 11/30/2022] [Accepted: 04/24/2023] [Indexed: 05/05/2023] Open
Abstract
Deep learning methods are increasingly being applied to problems in medicine and health care. However, few epidemiologists have received formal training in these methods. To bridge this gap, this article introduces the fundamentals of deep learning from an epidemiologic perspective. Specifically, this article reviews core concepts in machine learning (e.g., overfitting, regularization, and hyperparameters); explains several fundamental deep learning architectures (convolutional neural networks, recurrent neural networks); and summarizes training, evaluation, and deployment of models. Conceptual understanding of supervised learning algorithms is the focus of the article; instructions on the training of deep learning models and applications of deep learning to causal learning are out of this article's scope. We aim to provide an accessible first step towards enabling the reader to read and assess research on the medical applications of deep learning and to familiarize readers with deep learning terminology and concepts to facilitate communication with computer scientists and machine learning engineers.
|
14
|
Hua Y, Wang L, Nguyen V, Rieu-Werden M, McDowell A, Bates DW, Foer D, Zhou L. A deep learning approach for transgender and gender diverse patient identification in electronic health records. J Biomed Inform 2023; 147:104507. [PMID: 37778672 PMCID: PMC10687838 DOI: 10.1016/j.jbi.2023.104507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 09/18/2023] [Accepted: 09/22/2023] [Indexed: 10/03/2023]
Abstract
BACKGROUND Although accurate identification of gender identity in the electronic health record (EHR) is crucial for providing equitable health care, particularly for transgender and gender diverse (TGD) populations, it remains a challenging task due to incomplete gender information in structured EHR fields. OBJECTIVE Using TGD identification as a case study, this research uses NLP and deep learning to build an accurate patient gender identity predictive model, aiming to tackle the challenges of identifying relevant patient-level information from EHR data and reducing annotation work. METHODS This study included adult patients in a large healthcare system in Boston, MA, between 4/1/2017 and 4/1/2022. To identify relevant information from massive clinical notes, we compiled a list of gender-related keywords through expert curation, literature review, and expansion via a fine-tuned BioWordVec model. This keyword list was used to pre-screen potential TGD individuals and create two datasets for model training, testing, and validation. Dataset I was a balanced dataset that contained clinician-confirmed TGD patients and cases without keywords. Dataset II contained cases with keywords. The performance of the deep learning model was compared to traditional machine learning and rule-based algorithms. RESULTS The final keyword list consists of 109 keywords, of which 58 (53.2%) were expanded by the BioWordVec model. Dataset I contained 3,150 patients (50% TGD) while Dataset II contained 200 patients (90% TGD). On Dataset I the deep learning model achieved an F1 score of 0.917, sensitivity of 0.854, and a precision of 0.980; and on Dataset II an F1 score of 0.969, sensitivity of 0.967, and precision of 0.972. The deep learning model significantly outperformed rule-based algorithms. CONCLUSION This is the first study to show that deep learning-integrated NLP algorithms can accurately identify gender identity using EHR data. Future work should leverage and evaluate additional diverse data sources to generate more generalizable algorithms.
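The F1 scores quoted above combine precision and sensitivity (recall) as a harmonic mean. As a minimal sketch, the snippet below recomputes F1 from the reported precision and sensitivity using the standard binary definition; whether the authors used exactly this definition or some averaged variant is not stated in the abstract, so that is an assumption here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported = {
    "Dataset I": {"precision": 0.980, "sensitivity": 0.854, "f1": 0.917},
    "Dataset II": {"precision": 0.972, "sensitivity": 0.967, "f1": 0.969},
}

for name, m in reported.items():
    recomputed = f1_score(m["precision"], m["sensitivity"])
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported F1 = {m['f1']:.3f}")
```

Under this definition the Dataset II figure is reproduced, while Dataset I comes out near 0.913 rather than the reported 0.917. The abstract does not explain the difference.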
Affiliation(s)
- Yining Hua
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Department of Epidemiology, Harvard T.H Chan School of Public Health, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Liqin Wang
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Vi Nguyen
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Meghan Rieu-Werden
  - Division of General Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Alex McDowell
  - Health Policy Research Institute, Mongan Institute, Massachusetts General Hospital, Boston, MA, USA; Department of Health Care Policy, Harvard Medical School, Boston, MA, USA.
- David W Bates
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Dinah Foer
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA; Division of Allergy and Clinical Immunology, Department of Medicine, Brigham and Women's Hospital, USA.
- Li Zhou
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
|
15
|
Sun Z, Lin M, Zhu Q, Xie Q, Wang F, Lu Z, Peng Y. A scoping review on multimodal deep learning in biomedical images and texts. J Biomed Inform 2023; 146:104482. [PMID: 37652343 PMCID: PMC10591890 DOI: 10.1016/j.jbi.2023.104482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/18/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
OBJECTIVE Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. METHODS In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. RESULT This study reviewed the current uses of multimodal deep learning on five tasks: (1) Report generation, (2) Visual question answering, (3) Cross-modal retrieval, (4) Computer-aided diagnosis, and (5) Semantic segmentation. CONCLUSION Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.
Affiliation(s)
- Zhaoyi Sun
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Mingquan Lin
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Qingqing Zhu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.
- Qianqian Xie
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Fei Wang
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA.
- Yifan Peng
  - Population Health Sciences, Weill Cornell Medicine, New York, NY 10016, USA.
|
16
|
Pungitore S, Subbian V. Assessment of Prediction Tasks and Time Window Selection in Temporal Modeling of Electronic Health Record Data: a Systematic Review. JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2023; 7:313-331. [PMID: 37637723 PMCID: PMC10449760 DOI: 10.1007/s41666-023-00143-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 04/12/2023] [Accepted: 07/28/2023] [Indexed: 08/29/2023]
Abstract
Temporal electronic health record (EHR) data are often preferred for clinical prediction tasks because they offer more complete representations of a patient's pathophysiology than static data. A challenge when working with temporal EHR data is problem formulation, which includes defining the time windows of interest and the prediction task. Our objective was to conduct a systematic review that assessed the definition and reporting of concepts relevant to temporal clinical prediction tasks. We searched PubMed® and IEEE Xplore® databases for studies from January 1, 2010 applying machine learning models to EHR data for patient outcome prediction. Publications applying time-series methods were selected for further review. We identified 92 studies and summarized them by clinical context and definition and reporting of the prediction problem. For the time windows of interest, 12 studies did not discuss window lengths, 57 used a single set of window lengths, and 23 evaluated the relationship between window length and model performance. We also found that 72 studies had appropriate reporting of the prediction task. However, evaluation of prediction problem formulation for temporal EHR data was complicated by heterogeneity in assessing and reporting of these concepts. Even among studies modeling similar clinical outcomes, there were variations in terminology used to describe the prediction problem, rationale for window lengths, and determination of the outcome of interest. As temporal modeling using EHR data expands, minimal reporting standards should include time-series specific concerns to promote rigor and reproducibility in future studies and facilitate model implementation in clinical settings. Supplementary Information The online version contains supplementary material available at 10.1007/s41666-023-00143-4.
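The "time windows of interest" discussed above are commonly an observation window, which supplies the model inputs, and a prediction window, in which the outcome is sought. The sketch below is one possible formulation, assuming half-open intervals and an optional gap between the two windows; the function name, parameters, and toy events are hypothetical, and individual studies define these boundaries differently, which is the heterogeneity the review describes.

```python
from datetime import datetime, timedelta


def split_by_windows(events, index_time, observation_days, prediction_days, gap_days=0):
    """Partition (timestamp, value) events around an index time.

    Observation window: [index_time - observation_days, index_time)
    Prediction window:  [index_time + gap_days, index_time + gap_days + prediction_days)
    """
    obs_start = index_time - timedelta(days=observation_days)
    pred_start = index_time + timedelta(days=gap_days)
    pred_end = pred_start + timedelta(days=prediction_days)

    observed = [e for e in events if obs_start <= e[0] < index_time]
    outcomes = [e for e in events if pred_start <= e[0] < pred_end]
    return observed, outcomes


# Toy events (made up) around an index time of 10 January 2023.
index_time = datetime(2023, 1, 10)
events = [
    (datetime(2023, 1, 1), "lab_a"),     # before the observation window
    (datetime(2023, 1, 6), "lab_b"),     # observed
    (datetime(2023, 1, 9), "lab_c"),     # observed
    (datetime(2023, 1, 10), "event_d"),  # first day of the prediction window
    (datetime(2023, 2, 5), "event_e"),   # inside the prediction window
    (datetime(2023, 3, 1), "event_f"),   # after the prediction window
]
observed, outcomes = split_by_windows(events, index_time, observation_days=5, prediction_days=30)
print([v for _, v in observed], [v for _, v in outcomes])
```

Reporting the window lengths, the gap, and whether the boundaries are inclusive is the kind of detail the review argues should be part of minimal reporting standards.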
Affiliation(s)
- Sarah Pungitore
  - Program in Applied Mathematics, Department of Mathematics, 617 N Santa Rita Ave, Tucson, AZ 85721 USA
- Vignesh Subbian
  - Department of Biomedical Engineering, The University of Arizona, Tucson, AZ 85721-0020 USA
  - Department of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721-0020 USA
|
17
|
Bosch D, Kuppen MCP, Tascilar M, Smilde TJ, Mulders PFA, Uyl-de Groot CA, van Oort IM. Reliability and Efficiency of the CAPRI-3 Metastatic Prostate Cancer Registry Driven by Artificial Intelligence. Cancers (Basel) 2023; 15:3808. [PMID: 37568624 PMCID: PMC10417512 DOI: 10.3390/cancers15153808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/19/2023] [Accepted: 07/23/2023] [Indexed: 08/13/2023] Open
Abstract
BACKGROUND Manual data collection is still the gold standard for disease-specific patient registries. However, CAPRI-3 uses text mining (an artificial intelligence (AI) technology) for patient identification and data collection. The aim of this study is to demonstrate the reliability and efficiency of this AI-driven approach. METHODS CAPRI-3 is an observational retrospective multicenter cohort registry on metastatic prostate cancer. We tested the patient-identification algorithm and automated data extraction through manual validation of the same patients in two pilots in 2019 and 2022. RESULTS Pilot one identified 2030 patients and pilot two 9464 patients. The negative predictive value of the algorithm was maximized to prevent false exclusions and reached 94.8%. The completeness and accuracy of the automated data extraction were 92.3% or higher, except for date fields and inaccessible data (images/pdf) (10-88.9%). Additional manual quality control took over 3 h less time per patient than the original fully manual CAPRI registry (105 vs. 300 min). CONCLUSIONS The CAPRI-3 patient-identification algorithm is a sound replacement for excluding ineligible candidates. The AI-driven data extraction is largely accurate and complete, but manual quality control is needed for less reliable and inaccessible data. Overall, the AI-driven approach of the CAPRI-3 registry is reliable and timesaving.
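Two figures in this abstract follow from simple arithmetic: the negative predictive value (NPV) of the patient-identification algorithm and the per-patient time saving. The sketch below shows both calculations. The abstract does not give the underlying counts, so the counts used for the NPV are hypothetical and were picked only so that the result equals the reported 94.8%; the minute values are the reported ones.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Share of algorithm-excluded patients who were truly ineligible."""
    excluded = true_negatives + false_negatives
    if excluded == 0:
        raise ValueError("no excluded patients to evaluate")
    return true_negatives / excluded


# Hypothetical counts (not from the study), chosen to reproduce 94.8%.
npv = negative_predictive_value(true_negatives=948, false_negatives=52)

# Reported per-patient times in minutes: fully manual registry vs.
# AI-driven extraction followed by manual quality control.
manual_minutes, ai_minutes = 300, 105
saved_hours = (manual_minutes - ai_minutes) / 60

print(f"NPV = {npv:.1%}, time saved = {saved_hours:.2f} h per patient")
```

The difference of 195 minutes is consistent with the abstract's statement that quality control took over 3 h less per patient.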
Affiliation(s)
- Dianne Bosch
  - Department of Urology, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Malou C. P. Kuppen
  - Department of Radiotherapy, Maastro Clinic, 6229 ET Maastricht, The Netherlands
- Metin Tascilar
  - Department of Medical Oncology, Isala Hospital, 8025 AB Zwolle, The Netherlands
- Tineke J. Smilde
  - Department of Medical Oncology, Jeroen Bosch Hospital, 5223 GZ 's-Hertogenbosch, The Netherlands
- Peter F. A. Mulders
  - Department of Urology, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Carin A. Uyl-de Groot
  - Erasmus School of Health Policy and Management, Erasmus University Rotterdam, 3062 PA Rotterdam, The Netherlands
- Inge M. van Oort
  - Department of Urology, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
|
18
|
Ma M, Hao X, Zhao J, Luo S, Liu Y, Li D. Predicting heart failure in-hospital mortality by integrating longitudinal and category data in electronic health records. Med Biol Eng Comput 2023:10.1007/s11517-023-02816-z. [PMID: 36959414 DOI: 10.1007/s11517-023-02816-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 03/02/2023] [Indexed: 03/25/2023]
Abstract
Heart failure is a life-threatening syndrome that is diagnosed in 3.6 million people worldwide each year. We propose a deep fusion learning model (DFL-IMP) that uses time series and category data from electronic health records to predict in-hospital mortality in patients with heart failure. We considered 41 time series features (platelets, white blood cells, urea nitrogen, etc.) and 17 category features (gender, insurance, marital status, etc.) as predictors, all of which were available within the time of the patient's last hospitalization, and a total of 7696 patients participated in the observational study. Our model was evaluated against different time windows. The best performance was achieved with an AUC of 0.914 when the observation window was 5 days and the prediction window was 30 days. The model outperformed the baseline models, including LR (0.708), RF (0.717), SVM (0.675), LSTM (0.757), GRU (0.759), GRU-U (0.766), and MTSSP (0.770). This tool allows us to predict the expected pathway of heart failure patients and intervene early in the treatment process, which has significant implications for improving the life expectancy of heart failure patients.
Affiliation(s)
- Meikun Ma
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Technology Research Center of Spatial Information Network Engineering of Shanxi, Taiyuan, 030024, China
- Xiaoyan Hao
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Jumin Zhao
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Intelligent Perception Engineering Technology Center of Shanxi, Taiyuan, 030024, China
- Shijie Luo
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Yi Liu
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Technology Research Center of Spatial Information Network Engineering of Shanxi, Taiyuan, 030024, China
  - College of Data Science, Taiyuan University of Technology, Taiyuan, 030024, China
- Dengao Li
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Technology Research Center of Spatial Information Network Engineering of Shanxi, Taiyuan, 030024, China
  - College of Data Science, Taiyuan University of Technology, Taiyuan, 030024, China
|
19
|
Zhong Y, Guo Y, Fang Y, Wu Z, Wang J, Hu W. Geometric and dosimetric evaluation of deep learning based auto-segmentation for clinical target volume on breast cancer. J Appl Clin Med Phys 2023:e13951. [PMID: 36920901 DOI: 10.1002/acm2.13951] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/23/2022] [Revised: 02/09/2023] [Accepted: 02/12/2023] [Indexed: 03/16/2023] Open
Abstract
BACKGROUND Recently, target auto-segmentation techniques based on deep learning (DL) have shown promising results. However, inaccurate target delineation will directly affect the treatment planning dose distribution and the effect of subsequent radiotherapy work. Evaluation based on geometric metrics alone may not be sufficient for target delineation accuracy assessment. The purpose of this paper is to validate the performance of automatic segmentation with dosimetric metrics and to construct new geometric evaluation metrics to comprehensively understand the dose-response relationship from the perspective of clinical application. MATERIALS AND METHODS A DL-based target segmentation model was developed using 186 manually delineated modified radical mastectomy breast cancer cases. The resulting DL model was used to generate alternative target contours in a new set of 48 patients. The Auto-plan was reoptimized to ensure the same optimized parameters as the reference Manual-plan. To assess the dosimetric impact of target auto-segmentation, not only common geometric metrics but also new spatial parameters with distance and relative volume ($R_V$) to target were used. Correlations were performed using Spearman's correlation between segmentation evaluation metrics and dosimetric changes. RESULTS Only strong ($|R^2| > 0.6$, p < 0.01) or moderate ($|R^2| > 0.4$, p < 0.01) Pearson correlation was established between the traditional geometric metric and three dosimetric evaluation indices to target (conformity index, homogeneity index, and mean dose). For organs at risk (OARs), inferior or no significant relationship was found between geometric parameters and dosimetric differences. Furthermore, we found that OARs dose distribution was affected by boundary error of target segmentation instead of distance and $R_V$ to target. CONCLUSIONS Current geometric metrics could reflect a certain degree of dose effect of target variation. To find target contour variations that do lead to OARs dosimetry changes, clinically oriented metrics that more accurately reflect how segmentation quality affects dosimetry should be constructed.
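The methods above relate segmentation metrics to dosimetric changes through Spearman's correlation, which is the Pearson correlation of the ranked values. The sketch below implements that definition with the standard library, using average ranks for ties. The per-patient numbers are invented for illustration, and the choice of an overlap score as the geometric metric is an assumption rather than something the abstract specifies.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-patient values: a geometric overlap score and the absolute
# change in target mean dose between the auto and manual plans.
overlap_score = [0.95, 0.90, 0.85, 0.80, 0.70]
mean_dose_change = [0.1, 0.3, 0.2, 0.6, 0.9]
rho = spearman(overlap_score, mean_dose_change)
print(f"Spearman rho = {rho:.2f}")
```

A strongly negative rho in such a comparison would indicate that poorer geometric agreement goes with larger dose changes; significance testing is left out of this sketch.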
Affiliation(s)
- Yang Zhong
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Clinical Research Center for Radiation Oncology, Shanghai, China
  - Shanghai Key Laboratory of Radiation Oncology, Shanghai, China
- Ying Guo
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Clinical Research Center for Radiation Oncology, Shanghai, China
  - Shanghai Key Laboratory of Radiation Oncology, Shanghai, China
- Yingtao Fang
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Clinical Research Center for Radiation Oncology, Shanghai, China
  - Shanghai Key Laboratory of Radiation Oncology, Shanghai, China
- Zhiqiang Wu
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Clinical Research Center for Radiation Oncology, Shanghai, China
  - Shanghai Key Laboratory of Radiation Oncology, Shanghai, China
- Jiazhou Wang
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Clinical Research Center for Radiation Oncology, Shanghai, China
  - Shanghai Key Laboratory of Radiation Oncology, Shanghai, China
- Weigang Hu
  - Department of Radiation Oncology, Fudan University Shanghai Cancer Center, Shanghai, China
  - Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
  - Shanghai Clinical Research Center for Radiation Oncology, Shanghai, China
  - Shanghai Key Laboratory of Radiation Oncology, Shanghai, China
|
20
|
Liang Y, Guo C. Heart failure disease prediction and stratification with temporal electronic health records data using patient representation. Biocybern Biomed Eng 2023. [DOI: 10.1016/j.bbe.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
|
21
|
Ding Y, Sun Y, Liu C, Jiang Q, Chen F, Cao Y. SERS-Based Biosensors Combined with Machine Learning for Medical Application. ChemistryOpen 2023; 12:e202200192. [PMID: 36627171 PMCID: PMC9831797 DOI: 10.1002/open.202200192] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/21/2022] [Revised: 12/09/2022] [Indexed: 01/12/2023] Open
Abstract
Surface-enhanced Raman spectroscopy (SERS) has shown strength in non-invasive, rapid, trace analysis and has been used in many fields in medicine. Machine learning (ML) is an algorithm that can imitate human learning styles and structure existing content with the knowledge to effectively improve learning efficiency. The integration of SERS and ML has a promising future in the medical field. In this review, we summarize the applications of SERS combined with ML in recent years, such as the recognition of biological molecules, rapid diagnosis of diseases, development of new immunoassay techniques, and enhancement of SERS capabilities in semi-quantitative measurements. Finally, the possible opportunities and challenges of combining SERS with ML are addressed.
Affiliation(s)
- Yan Ding
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Yang Sun
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Cheng Liu
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Qiao‐Yan Jiang
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Feng Chen
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
- Yue Cao
  - Department of Forensic Medicine, Nanjing Medical University, Nanjing 211166, P.R. China
|
22
|
Application of machine learning in predicting the risk of postpartum depression: A systematic review. J Affect Disord 2022; 318:364-379. [PMID: 36055532 DOI: 10.1016/j.jad.2022.08.070] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/25/2022] [Revised: 08/08/2022] [Accepted: 08/22/2022] [Indexed: 11/20/2022]
Abstract
BACKGROUND Postpartum depression (PPD) presents a serious health problem among women and their families. Machine learning (ML) is a rapidly advancing field with increasing utility in predicting PPD risk. We aimed to synthesize and evaluate the quality of studies on application of ML techniques in predicting PPD risk. METHODS We conducted a systematic search of eight databases, identifying English and Chinese studies on ML techniques for predicting PPD risk and ML techniques with performance metrics. Quality of the studies involved was evaluated using the Prediction Model Risk of Bias Assessment Tool. RESULTS Seventeen studies involving 62 prediction models were included. Supervised learning was the main ML technique employed and the common ML models were support vector machine, random forest and logistic regression. Five studies (30 %) reported both internal and external validation. Two studies involved model translation, but none were tested clinically. All studies showed a high risk of bias, and more than half showed high application risk. LIMITATIONS Including Chinese articles slightly reduced the reproducibility of the review. Model performance was not quantitatively analyzed owing to inconsistent metrics and the absence of methods for correlation meta-analysis. CONCLUSIONS Researchers have paid more attention to model development than to validation, and few have focused on improvement and innovation. Models for predicting PPD risk continue to emerge. However, few have achieved the acceptable quality standards. Therefore, ML techniques for successfully predicting PPD risk are yet to be deployed in clinical environments.
|
23
|
Luo M, Wang YT, Wang XK, Hou WH, Huang RL, Liu Y, Wang JQ. A multi-granularity convolutional neural network model with temporal information and attention mechanism for efficient diabetes medical cost prediction. Comput Biol Med 2022; 151:106246. [PMID: 36343403 DOI: 10.1016/j.compbiomed.2022.106246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 09/30/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
As the cost of diabetes treatment continues to grow, it is critical to accurately predict the medical costs of diabetes. Most medical cost studies based on convolutional neural networks (CNNs) ignore the importance of multi-granularity information of medical concepts and time interval characteristics of patients' multiple visit sequences, which reflect the frequency of patient visits and the severity of the disease. Therefore, this paper proposes a new end-to-end deep neural network structure, MST-CNN, for medical cost prediction. The MST-CNN model improves the representation quality of medical concepts by constructing a multi-granularity embedding model of medical concepts and incorporates a time interval vector to accurately measure the frequency of patient visits and form an accurate representation of medical events. Moreover, the MST-CNN model integrates a channel attention mechanism to adaptively adjust the channel weights to focus on significant medical features. The MST-CNN model systematically addresses the problem of deep learning models for temporal data representation. A case study and three comparative experiments are conducted using data collected from Pingjiang County. Through experiments, the methods used in the proposed model are analyzed, and the super contribution of the model performance is demonstrated.
Affiliation(s)
- Min Luo
- School of Business, Central South University, Changsha, 410083, PR China
- Yi-Ting Wang
- School of Business, Central South University, Changsha, 410083, PR China
- Xiao-Kang Wang
- School of Business, Central South University, Changsha, 410083, PR China
- Wen-Hui Hou
- School of Business, Central South University, Changsha, 410083, PR China
- Rui-Lu Huang
- School of Business, Central South University, Changsha, 410083, PR China
- Ye Liu
- School of Business, Central South University, Changsha, 410083, PR China
- Jian-Qiang Wang
- School of Business, Central South University, Changsha, 410083, PR China
|
24
|
Benchmarking emergency department prediction models with machine learning and public electronic health records. Sci Data 2022; 9:658. [PMID: 36302776 PMCID: PMC9610299 DOI: 10.1038/s41597-022-01782-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/19/2022] [Accepted: 10/14/2022] [Indexed: 11/26/2022] Open
Abstract
The demand for emergency department (ED) services is increasing across the globe, particularly during the current COVID-19 pandemic. Clinical triage and risk assessment have become increasingly challenging due to the shortage of medical resources and the strain on hospital infrastructure caused by the pandemic. As a result of the widespread use of electronic health records (EHRs), we now have access to a vast amount of clinical data, which allows us to develop prediction models and decision support systems to address these challenges. To date, there is no widely accepted clinical prediction benchmark related to the ED based on large-scale public EHRs. An open-source benchmark data platform would streamline research workflows by eliminating cumbersome data preprocessing, and facilitate comparisons among different studies and methodologies. Based on the Medical Information Mart for Intensive Care IV Emergency Department (MIMIC-IV-ED) database, we created a benchmark dataset and proposed three clinical prediction benchmarks. This study provides future researchers with insights, suggestions, and protocols for managing data and developing predictive tools for emergency care.
|
25
|
Computational drug repurposing based on electronic health records: a scoping review. NPJ Digit Med 2022; 5:77. [PMID: 35701544 PMCID: PMC9198008 DOI: 10.1038/s41746-022-00617-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/12/2021] [Accepted: 05/19/2022] [Indexed: 11/30/2022] Open
Abstract
Computational drug repurposing methods adapt Artificial intelligence (AI) algorithms for the discovery of new applications of approved or investigational drugs. Among the heterogeneous datasets, electronic health records (EHRs) datasets provide rich longitudinal and pathophysiological data that facilitate the generation and validation of drug repurposing. Here, we present an appraisal of recently published research on computational drug repurposing utilizing the EHR. Thirty-three research articles, retrieved from Embase, Medline, Scopus, and Web of Science between January 2000 and January 2022, were included in the final review. Four themes, (1) publication venue, (2) data types and sources, (3) method for data processing and prediction, and (4) targeted disease, validation, and released tools were presented. The review summarized the contribution of EHR used in drug repurposing as well as revealed that the utilization is hindered by the validation, accessibility, and understanding of EHRs. These findings can support researchers in the utilization of medical data resources and the development of computational methods for drug repurposing.
|