1
Guo Z, Felag J, Rozum JC, Correia RB, Wang X, Rocha LM. Focused digital cohort selection from social media using the metric backbone of biomedical knowledge graphs. J Biomed Inform 2025; 168:104847. [PMID: 40460925 DOI: 10.1016/j.jbi.2025.104847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 03/27/2025] [Accepted: 05/08/2025] [Indexed: 06/11/2025]
Abstract
Social media data allows researchers to construct large digital cohorts (groups of users who post health-related content) to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge because social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide-ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically relevant discourse. To filter relevant users on any social media, we have developed a general method and tested it on epilepsy discourse. We analyzed the text from posts by users who mention epilepsy drugs at least once in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We used a curated medical terminology dictionary to generate a knowledge graph (KG) from each social media site, whereby nodes represent terms, and edge weights denote the strength of association between pairs of terms in the collected text. Our method is based on computing the metric backbone of each KG, which yields the (sparsified) subgraph of edges that participate in shortest paths. By comparing the subset of users who contribute to the backbone to the subset who do not, we show that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. Furthermore, using human annotation of Instagram posts, we demonstrate that users who do not contribute to the backbone are much more likely to use dictionary terms in a manner inconsistent with their biomedical meaning and are rightly excluded from the cohort of interest.
Our metric backbone approach thus has several benefits: it yields focused user cohorts who engage in discourse relevant to a targeted biomedical problem; unlike engagement-based approaches, it can retain low-engagement users who nonetheless contribute meaningful biomedical insights and filter out very vocal users who contribute no relevant content; it is parameter-free and algebraically principled; it does not require classifiers or human curation; and it is simple to compute with the open-source code we provide.
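The backbone idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' released code: the proximity-to-distance map d = 1/p - 1, the toy term proximities, the user IDs, and the names `shortest_distances`, `metric_backbone`, and `edge_users` are all assumptions made for illustration. An edge is kept only when its direct distance is no longer than the shortest path between its endpoints.

```python
import heapq


def shortest_distances(adjacency, source):
    """Dijkstra: shortest-path distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in adjacency[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def metric_backbone(proximity):
    """Return the set of edges whose direct distance equals the shortest-path distance.

    proximity maps (term_u, term_v) -> association strength in (0, 1].
    Assumption: proximities become distances via d = 1/p - 1.
    """
    adjacency = {}
    for (u, v), p in proximity.items():
        d = 1.0 / p - 1.0
        adjacency.setdefault(u, {})[v] = d
        adjacency.setdefault(v, {})[u] = d
    backbone = set()
    for u in adjacency:
        dist = shortest_distances(adjacency, u)
        for v, d in adjacency[u].items():
            if d <= dist[v] + 1e-12:  # no indirect path is shorter
                backbone.add(frozenset((u, v)))
    return backbone


# Hypothetical toy knowledge graph of dictionary terms.
proximity = {
    ("seizure", "levetiracetam"): 0.5,  # distance 1
    ("levetiracetam", "aura"): 0.5,     # distance 1
    ("seizure", "aura"): 0.2,           # distance 4, longer than the indirect path (2)
}
backbone = metric_backbone(proximity)

# Hypothetical record of which users' posts support each edge.
edge_users = {
    frozenset(("seizure", "levetiracetam")): {"user_a"},
    frozenset(("levetiracetam", "aura")): {"user_b"},
    frozenset(("seizure", "aura")): {"user_c"},
}
all_users = set().union(*edge_users.values())
backbone_users = set().union(*(edge_users[edge] for edge in backbone))
excluded_users = all_users - backbone_users
print(sorted(backbone_users), sorted(excluded_users))
```

In this toy graph the direct seizure-aura edge is dropped because the path through levetiracetam is shorter, so a user who contributes only to that edge falls outside the focused cohort. No threshold is tuned anywhere, which matches the parameter-free property the abstract claims.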
Affiliation(s)
- Ziqi Guo
  - School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA
- Jack Felag
  - School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA
- Jordan C Rozum
  - School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA
- Rion Brattig Correia
  - School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA
- Xuan Wang
  - School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA; School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN, USA
- Luis M Rocha
  - School of Systems Science & Industrial Engineering, Binghamton University, Binghamton, NY, USA; Universidade Católica Portuguesa, Católica Biomedical Research Centre, Lisbon, Portugal
2
Correia RB, Rozum JC, Cross L, Felag J, Gallant M, Guo Z, Herr BW, Min A, Sanchez-Valle J, Stungis Rocha D, Valencia A, Wang X, Börner K, Miller W, Rocha LM. myAURA: a personalized health library for epilepsy management via knowledge graph sparsification and visualization. J Am Med Inform Assoc 2025:ocaf012. [PMID: 39890454 DOI: 10.1093/jamia/ocaf012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 12/06/2024] [Accepted: 01/14/2025] [Indexed: 02/03/2025] Open
Abstract
OBJECTIVES: Report the development of the patient-centered myAURA application and suite of methods designed to aid epilepsy patients, caregivers, and clinicians in making decisions about self-management and care.
MATERIALS AND METHODS: myAURA rests on an unprecedented collection of epilepsy-relevant heterogeneous data resources, such as biomedical databases, social media, and electronic health records (EHRs). We use a patient-centered biomedical dictionary to link the collected data in a multilayer knowledge graph (KG) computed with a generalizable, open-source methodology.
RESULTS: Our approach is based on a novel network sparsification method that uses the metric backbone of weighted graphs to discover important edges for inference, recommendation, and visualization. We demonstrate by studying drug-drug interaction from EHRs, extracting epilepsy-focused digital cohorts from social media, and generating a multilayer KG visualization. We also present our patient-centered design and pilot-testing of myAURA, including its user interface.
DISCUSSION: The ability to search and explore myAURA's heterogeneous data sources in a single, sparsified, multilayer KG is highly useful for a range of epilepsy studies and stakeholder support.
CONCLUSION: Our stakeholder-driven, scalable approach to integrating traditional and nontraditional data sources enables both clinical discovery and data-powered patient self-management in epilepsy and can be generalized to other chronic conditions.
Affiliation(s)
- Rion Brattig Correia
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902-6000, United States
- Jordan C Rozum
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902-6000, United States
- Leonard Cross
  - Luddy School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN 47408, United States
- Jack Felag
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902-6000, United States
- Michael Gallant
  - Luddy School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN 47408, United States
- Ziqi Guo
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902-6000, United States
- Bruce W Herr
  - Luddy School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN 47408, United States
- Aehong Min
  - Donald Bren School of Information & Computer Sciences, University of California, Irvine, CA 92697-3435, United States
- Jon Sanchez-Valle
  - Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Deborah Stungis Rocha
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902-6000, United States
- Alfonso Valencia
  - Life Sciences Department, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Xuan Wang
  - Luddy School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN 47408, United States
- Katy Börner
  - Luddy School of Informatics, Computing & Engineering, Indiana University, Bloomington, IN 47408, United States
- Wendy Miller
  - School of Nursing, Indiana University, Indianapolis, IN 46202, United States
- Luis M Rocha
  - School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, NY 13902-6000, United States
  - Universidade Católica Portuguesa, Católica Biomedical Research Centre, 1649-023 Lisboa, Portugal
3
Zhang G, Jin Q, Jered McInerney D, Chen Y, Wang F, Cole CL, Yang Q, Wang Y, Malin BA, Peleg M, Wallace BC, Lu Z, Weng C, Peng Y. Leveraging generative AI for clinical evidence synthesis needs to ensure trustworthiness. J Biomed Inform 2024; 153:104640. [PMID: 38608915 PMCID: PMC11217921 DOI: 10.1016/j.jbi.2024.104640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 04/14/2024]
Abstract
Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.
Affiliation(s)
- Gongbo Zhang
  - Columbia University, Department of Biomedical Informatics, New York, 10032, USA
- Qiao Jin
  - National Institutes of Health, National Library of Medicine, National Center for Biotechnology Information, Bethesda, 20894, USA
- Yong Chen
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics, Philadelphia 19104, USA
- Fei Wang
  - Weill Cornell Medicine, Department of Population Health Sciences, New York 10065, USA; Weill Cornell Medicine, Institute of AI for Digital Health, New York 10065, USA
- Curtis L Cole
  - Weill Cornell Medicine, Department of Population Health Sciences, New York 10065, USA; Weill Cornell Medicine, Department of Medicine, New York 10065, USA
- Qian Yang
  - Cornell University, Computing and Information Science, Ithaca 14853, USA
- Yanshan Wang
  - University of Pittsburgh, Department of Health Information Management, Pittsburgh 15260, USA
- Bradley A Malin
  - Vanderbilt University Medical Center, Department of Biomedical Informatics, Nashville 37203, USA; Vanderbilt University Medical Center, Department of Biostatistics, Nashville 37203, USA; Vanderbilt University, Department of Computer Science, Nashville 37212, USA
- Mor Peleg
  - University of Haifa, Department of Information Systems, Haifa 3498838, Israel
- Byron C Wallace
  - Northeastern University, the Khoury College of Computer Sciences, Boston 02115, USA
- Zhiyong Lu
  - National Institutes of Health, National Library of Medicine, National Center for Biotechnology Information, Bethesda, 20894, USA
- Chunhua Weng
  - Columbia University, Department of Biomedical Informatics, New York, 10032, USA
- Yifan Peng
  - Weill Cornell Medicine, Department of Population Health Sciences, New York 10065, USA
4
Zhou J, Zhao M, Yang Z, Chen L, Liu X. Exploring the Value of MRI Measurement of Hippocampal Volume for Predicting the Occurrence and Progression of Alzheimer's Disease Based on Artificial Intelligence Deep Learning Technology and Evidence-Based Medicine Meta-Analysis. J Alzheimers Dis 2024; 97:1275-1288. [PMID: 38277290 DOI: 10.3233/jad-230733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2024]
Abstract
BACKGROUND: Alzheimer's disease (AD), a major cause of dementia, lacks effective treatment. MRI-based hippocampal volume measurement using artificial intelligence offers new insights into early diagnosis and intervention in AD progression.
OBJECTIVE: This study, involving 483 AD patients, 756 patients with mild cognitive impairment (MCI), and 968 normal controls (NC), investigated the predictive capability of MRI-based hippocampus volume measurements for AD risk using artificial intelligence and evidence-based medicine.
METHODS: Utilizing data from ADNI and OASIS-brains databases, three convolutional neural networks (InceptionResNetv2, Densenet169, and SEResNet50) were employed for automated AD classification based on structural MRI imaging. A multitask deep learning model and a densely connected 3D convolutional network were utilized. Additionally, a systematic meta-analysis explored the value of MRI-based hippocampal volume measurement in predicting AD occurrence and progression, drawing on 23 eligible articles from PubMed and Embase databases.
RESULTS: InceptionResNetv2 outperformed other networks, achieving 99.75% accuracy and 100% AUC for AD-NC classification and 99.16% accuracy and 100% AUC for MCI-NC classification. Notably, at a 512×512 size, InceptionResNetv2 demonstrated a classification accuracy of 94.29% and an AUC of 98% for AD-NC and 97.31% accuracy and 98% AUC for MCI-NC.
CONCLUSIONS: The study concludes that MRI-based hippocampal volume changes effectively predict AD onset and progression, facilitating early intervention and prevention.
Affiliation(s)
- Jianguo Zhou
  - Department of Radiology, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
- Mingli Zhao
  - Department of Radiology, The Fourth People's Hospital of Lianyungang Affiliated to Nanjing Medical University Kangda, Lianyungang, China
- Zhou Yang
  - Department of Rehabilitation, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
- Liping Chen
  - Department of Rehabilitation, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
- Xiaoli Liu
  - Department of Rehabilitation, Lianyungang TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Lianyungang, China
5
Xie W, Fan K, Zhang S, Li L. Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature. J Biomed Semantics 2023; 14:5. [PMID: 37248476 PMCID: PMC10228061 DOI: 10.1186/s13326-023-00287-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Accepted: 04/29/2023] [Indexed: 05/31/2023] Open
Abstract
BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper.
RESULTS: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keywords query in PubMed, while the unscreened pool includes all other abstracts. At a prespecified recall rate of 0.95, DDI IR analysis precision is evaluated and compared. In the screened pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened pool and the unscreened pool. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool.
CONCLUSIONS: By integrating various sampling schemes and deep learning algorithms into AL, the DDI IR analysis from literature is significantly improved. Random negative sampling and positive sampling are highly effective methods for improving AL analysis where the positive and negative samples are extremely imbalanced.
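The evaluation criterion mentioned in this abstract, precision at a prespecified recall of 0.95, can be sketched as follows. The abstract does not say how the authors computed it, so the ranking-based procedure, the function name `precision_at_recall`, and the toy labels and scores are assumptions for illustration only.

```python
import math


def precision_at_recall(labels, scores, target_recall=0.95):
    """Precision at the smallest ranked cutoff whose recall reaches target_recall.

    labels: 1 for a relevant (positive) abstract, 0 otherwise.
    scores: classifier scores, higher meaning more likely relevant.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Number of positives that must be retrieved to meet the recall target.
    needed = math.ceil(target_recall * total_positives)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for retrieved, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives >= needed:
            return true_positives / retrieved
    raise ValueError("target recall cannot be reached")


# Hypothetical toy data: 4 relevant abstracts among 8 candidates.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
precision = precision_at_recall(labels, scores)
print(round(precision, 3))
```

With four positives, a recall of 0.95 requires retrieving all four, which here happens at rank six. Fixing recall first and then comparing precision is a common way to compare retrieval methods when positives are rare, as they are in the imbalanced setting the abstract describes.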
Affiliation(s)
- Weixin Xie
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
- Kunjie Fan
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
- Shijun Zhang
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA
- Lang Li
  - Department of Biomedical Informatics, Ohio State University, Columbus, OH 43210 USA