1
Jimenez Ramos M, Kendall TJ, Drozdov I, Fallowfield JA. A data-driven approach to decode metabolic dysfunction-associated steatotic liver disease. Ann Hepatol 2024; 29:101278. [PMID: 38135251 PMCID: PMC10907333 DOI: 10.1016/j.aohep.2023.101278] [Received: 11/30/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023]
Abstract
Metabolic dysfunction-associated steatotic liver disease (MASLD), defined by the presence of liver steatosis together with at least one of five cardiometabolic risk factors, is the most common cause of chronic liver disease worldwide, affecting around one in three people. Yet the clinical presentation of MASLD and the risk of progression to cirrhosis and adverse clinical outcomes are highly variable. It therefore represents both a global public health threat and a precision medicine challenge. Artificial intelligence (AI) is being investigated in MASLD to develop reproducible, quantitative, and automated methods to enhance patient stratification and to discover new biomarkers and therapeutic targets. This review details the different applications of AI and machine learning algorithms in MASLD, particularly in analyzing electronic health record, digital pathology, and imaging data. It also describes how specific MASLD consortia are leveraging multimodal data sources to spark research breakthroughs in the field. Using a new national-level 'data commons' (SteatoSITE) as an exemplar, the opportunities, as well as the technical challenges, of large-scale databases in MASLD research are highlighted.
Affiliation(s)
- Maria Jimenez Ramos
  - Centre for Inflammation Research, Institute for Regeneration and Repair, University of Edinburgh, Edinburgh BioQuarter, 4-5 Little France Drive, Edinburgh EH16 4UU, UK
- Timothy J Kendall
  - Centre for Inflammation Research, Institute for Regeneration and Repair, University of Edinburgh, Edinburgh BioQuarter, 4-5 Little France Drive, Edinburgh EH16 4UU, UK
  - Edinburgh Pathology, University of Edinburgh, 51 Little France Crescent, Old Dalkeith Rd, Edinburgh EH16 4SA, UK
- Ignat Drozdov
  - Bering Limited, 54 Portland Place, London, W1B 1DY, UK
- Jonathan A Fallowfield
  - Centre for Inflammation Research, Institute for Regeneration and Repair, University of Edinburgh, Edinburgh BioQuarter, 4-5 Little France Drive, Edinburgh EH16 4UU, UK
2
Gao J, Bonzel CL, Hong C, Varghese P, Zakir K, Gronsbell J. Semi-supervised ROC analysis for reliable and streamlined evaluation of phenotyping algorithms. J Am Med Inform Assoc 2024; 31:640-650. [PMID: 38128118 PMCID: PMC10873838 DOI: 10.1093/jamia/ocad226] [Received: 05/03/2023] [Revised: 09/22/2023] [Accepted: 11/20/2023] [Indexed: 12/23/2023]
Abstract
OBJECTIVE: High-throughput phenotyping will accelerate the use of electronic health records (EHRs) for translational research. A critical roadblock is the extensive medical supervision required for phenotyping algorithm (PA) estimation and evaluation. To address this challenge, numerous weakly-supervised learning methods have been proposed. However, there is a paucity of methods for reliably evaluating the predictive performance of PAs when a very small proportion of the data is labeled. To fill this gap, we introduce a semi-supervised approach (ssROC) for estimation of the receiver operating characteristic (ROC) parameters of PAs (eg, sensitivity, specificity).
MATERIALS AND METHODS: ssROC uses a small labeled dataset to nonparametrically impute missing labels. The imputations are then used for ROC parameter estimation to yield more precise estimates of PA performance relative to classical supervised ROC analysis (supROC) using only labeled data. We evaluated ssROC with synthetic, semi-synthetic, and EHR data from Mass General Brigham (MGB).
RESULTS: ssROC produced ROC parameter estimates with minimal bias and significantly lower variance than supROC in the simulated and semi-synthetic data. For the 5 PAs from MGB, the estimates from ssROC are 30% to 60% less variable than supROC on average.
DISCUSSION: ssROC enables precise evaluation of PA performance without demanding large volumes of labeled data. ssROC is also easily implementable in open-source R software.
CONCLUSION: When used in conjunction with weakly-supervised PAs, ssROC facilitates the reliable and streamlined phenotyping necessary for EHR-based research.
Affiliation(s)
- Jianhui Gao
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Clara-Lea Bonzel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
- Chuan Hong
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States
- Paul Varghese
  - Health Informatics, Verily Life Sciences, Cambridge, MA, United States
- Karim Zakir
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
3
Wen J, Hou J, Bonzel CL, Zhao Y, Castro VM, Gainer VS, Weisenfeld D, Cai T, Ho YL, Panickan VA, Costa L, Hong C, Gaziano JM, Liao KP, Lu J, Cho K, Cai T. LATTE: Label-efficient incident phenotyping from longitudinal electronic health records. Patterns (N Y) 2024; 5:100906. [PMID: 38264714 PMCID: PMC10801250 DOI: 10.1016/j.patter.2023.100906] [Received: 04/27/2023] [Revised: 09/06/2023] [Accepted: 12/01/2023] [Indexed: 01/25/2024]
Abstract
Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.
Affiliation(s)
- Jun Wen
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
- Jue Hou
  - University of Minnesota, Minneapolis, MN, USA
- Clara-Lea Bonzel
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
- Tianrun Cai
  - VA Boston Healthcare System, Boston, MA, USA
  - Mass General Brigham, Boston, MA, USA
- Yuk-Lam Ho
  - VA Boston Healthcare System, Boston, MA, USA
- Vidul A. Panickan
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
- J. Michael Gaziano
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women’s Hospital, Boston, MA, USA
- Katherine P. Liao
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women’s Hospital, Boston, MA, USA
- Junwei Lu
  - VA Boston Healthcare System, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Kelly Cho
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Brigham and Women’s Hospital, Boston, MA, USA
- Tianxi Cai
  - Harvard Medical School, Boston, MA, USA
  - VA Boston Healthcare System, Boston, MA, USA
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
4
Xiong X, Sweet SM, Liu M, Hong C, Bonzel CL, Panickan VA, Zhou D, Wang L, Costa L, Ho YL, Geva A, Mandl KD, Cheng S, Xia Z, Cho K, Gaziano JM, Liao KP, Cai T, Cai T. Knowledge-Driven Online Multimodal Automated Phenotyping System. medRxiv 2023:2023.09.29.23296239. [PMID: 37873131 PMCID: PMC10593060 DOI: 10.1101/2023.09.29.23296239] [Indexed: 10/25/2023]
Abstract
Though electronic health record (EHR) systems are a rich repository of clinical information with large potential, the use of EHR-based phenotyping algorithms is often hindered by inaccurate diagnostic records, the presence of many irrelevant features, and the requirement for a human-labeled training set. In this paper, we describe a knowledge-driven online multimodal automated phenotyping (KOMAP) system that i) generates a list of informative features by an online narrative and codified feature search engine (ONCE) and ii) enables the training of a multimodal phenotyping algorithm based on summary data. Powered by composite knowledge from multiple EHR sources, online article corpora, and a large language model, features selected by ONCE show high concordance with the state-of-the-art AI models (GPT4 and ChatGPT) and encourage large-scale phenotyping by providing a smaller but highly relevant feature set. Validation of the KOMAP system across four healthcare centers suggests that it can generate efficient phenotyping algorithms with robust performance. Compared to other methods requiring patient-level inputs and gold-standard labels, the fully online KOMAP provides a significant opportunity to enable multi-center collaboration.
5
Xie H, Li D, Wang Y, Kawai Y. An early warning model of type 2 diabetes risk based on POI visit history and food access management. PLoS One 2023; 18:e0288231. [PMID: 37494340 PMCID: PMC10370762 DOI: 10.1371/journal.pone.0288231] [Received: 01/15/2023] [Accepted: 06/22/2023] [Indexed: 07/28/2023]
Abstract
Type 2 diabetes (T2D) is a long-term, highly prevalent disease for which extensive data are available to support spatial-temporal mining of user case data. In this paper, we present a novel T2D food access early risk warning model that aims to raise health management awareness among susceptible populations. This model incorporates the representation of T2D-related food categories with graph convolutional networks (GCN), enabling diet risk visualization on a map from geotagged Twitter visit records. A long short-term memory (LSTM) module is used to enhance the performance of case temporal feature extraction and approximate location prediction. Through an analysis of the resulting data set, we highlight the effect that food category has on T2D early risk visualization and on user food access management on the map. Moreover, our proposed method can provide diet management suggestions to T2D-susceptible patients.
Affiliation(s)
- Huaze Xie
  - School of Computer Science and Technology, Hainan University, Haikou City, Hainan Province, China
- Da Li
  - Faculty of Engineering, Fukuoka University, Fukuoka City, Fukuoka State, Japan
- Yuanyuan Wang
  - Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Ube City, Yamaguchi State, Japan
- Yukiko Kawai
  - Division for Frontier Informatics, Kyoto Sangyo University, Kyoto City, Kyoto Prefecture, Japan
  - Cybermedia Center, Osaka University, Ibaraki City, Osaka Prefecture, Japan