1
Asowata OJ, Okekunle AP, Olaiya MT, Akinyemi J, Owolabi M, Akpa OM. Stroke risk prediction models: A systematic review and meta-analysis. J Neurol Sci 2024; 460:122997. [PMID: 38669758 DOI: 10.1016/j.jns.2024.122997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 04/28/2024]
Abstract
BACKGROUND Prediction algorithms/models are viable methods for identifying individuals at high risk of stroke across diverse populations for timely intervention. However, evidence summarizing the performance of these models is limited. This study examined the performance and weaknesses of existing stroke risk-score-prediction models (SRSMs) and whether performance varied by population and region. METHODS PubMed, EMBASE, and Web of Science were searched for articles on SRSMs from the earliest records until February 2022. The Prediction Model Risk of Bias Assessment Tool was used to assess the quality of eligible articles. The performance of the SRSMs was assessed by meta-analyzing C-statistic estimates (ranging from 0 to 1) from the identified studies to obtain an overall pooled C-statistic, using a random-effects model fitted by restricted maximum likelihood. RESULTS Overall, 17 articles (cohort study = 15, nested case-control study = 2) comprising 739,134 stroke cases from 6,396,594 participants from diverse populations/regions (Asia, n = 8; United States, n = 3; Europe and the United Kingdom, n = 6) were eligible for inclusion. The overall pooled C-statistic of SRSMs was 0.78 (95% CI: 0.75, 0.80; I² = 99.9%), with most SRSMs developed using cohort studies [0.78 (95% CI: 0.75, 0.80; I² = 99.9%)]. The subgroup analyses by geographical region, Asia [0.81 (95% CI: 0.79, 0.83; I² = 99.8%)], Europe and the United Kingdom [0.76 (95% CI: 0.69, 0.83; I² = 99.9%)], and the United States only [0.75 (95% CI: 0.72, 0.78; I² = 73.5%)], revealed relatively indifferent performance of SRSMs. CONCLUSION SRSM performance varied widely, and the pooled C-statistic of SRSMs suggested a fair predictive performance, with very few SRSMs validated in independent population group(s) from diverse world regions.
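The pooling step described in METHODS can be illustrated with a minimal sketch. It is not the authors' code: the abstract reports a restricted-maximum-likelihood fit, whereas this sketch swaps in the simpler closed-form DerSimonian-Laird estimator of the between-study variance. The function name `pool_random_effects` and the example inputs are hypothetical.

```python
import math

def pool_random_effects(estimates, std_errors):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled, ci_low, ci_high, i2_percent). This is an illustrative
    substitute for the REML fit reported in the abstract.
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    # Cochran's Q measures dispersion of studies around the fixed estimate.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, i2

# Hypothetical C-statistics and standard errors, not taken from the review.
pooled, low, high, i2 = pool_random_effects([0.72, 0.78, 0.83], [0.010, 0.015, 0.012])
```

In practice C-statistics are often pooled on the logit scale; the sketch pools them directly to keep the arithmetic visible.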
Affiliation(s)
- Osahon Jeffery Asowata
  - Department of Epidemiology and Medical Statistics, University of Ibadan, 200284, Nigeria
- Akinkunmi Paul Okekunle
  - Department of Epidemiology and Medical Statistics, University of Ibadan, 200284, Nigeria; Department of Medicine, College of Medicine, University of Ibadan, 200284, Nigeria; Research Institute of Human Ecology, Seoul National University, 08826, Republic of Korea
- Muideen Tunbosun Olaiya
  - Stroke and Ageing Research, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC 3168, Australia
- Joshua Akinyemi
  - Department of Epidemiology and Medical Statistics, University of Ibadan, 200284, Nigeria
- Mayowa Owolabi
  - Department of Medicine, College of Medicine, University of Ibadan, 200284, Nigeria; Lebanese American University, 1102 2801 Beirut, Lebanon; Center for Genomic and Precision Medicine, College of Medicine, University of Ibadan, 200284, Nigeria
- Onoja M Akpa
  - Department of Epidemiology and Medical Statistics, University of Ibadan, 200284, Nigeria; Preventive Cardiology Research Unit, Institute of Cardiovascular Diseases, College of Medicine, University of Ibadan, 200284, Nigeria; Division of Epidemiology, Biostatistics and Environmental Health, School of Public Health, University of Memphis, Memphis, USA
2
Cai Y, Cai YQ, Tang LY, Wang YH, Gong M, Jing TC, Li HJ, Li-Ling J, Hu W, Yin Z, Gong DX, Zhang GW. Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review. BMC Med 2024; 22:56. [PMID: 38317226 PMCID: PMC10845808 DOI: 10.1186/s12916-024-03273-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Accepted: 01/23/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND A comprehensive overview of artificial intelligence (AI) for cardiovascular disease (CVD) prediction and a screening tool of AI models (AI-Ms) for independent external validation are lacking. This systematic review aims to identify, describe, and appraise AI-Ms of CVD prediction in the general and special populations and develop a new independent validation score (IVS) for AI-Ms replicability evaluation. METHODS PubMed, Web of Science, Embase, and IEEE library were searched up to July 2021. Data extraction and analysis were performed for the populations, distribution, predictors, algorithms, etc. The risk of bias was evaluated with the prediction risk of bias assessment tool (PROBAST). Subsequently, we designed IVS for model replicability evaluation with five steps in five items, including transparency of algorithms, performance of models, feasibility of reproduction, risk of reproduction, and clinical implication, respectively. The review is registered in PROSPERO (No. CRD42021271789). RESULTS In 20,887 screened references, 79 articles (82.5% in 2017-2021) were included, which contained 114 datasets (67 in Europe and North America, but 0 in Africa). We identified 486 AI-Ms, of which the majority were in development (n = 380), but none of them had undergone independent external validation. A total of 66 idiographic algorithms were found; however, 36.4% were used only once and only 39.4% over three times. A large number of different predictors (range 5-52,000, median 21) and large-span sample size (range 80-3,660,000, median 4466) were observed. All models were at high risk of bias according to PROBAST, primarily due to the incorrect use of statistical methods. IVS analysis confirmed only 10 models as "recommended"; however, 281 and 187 were "not recommended" and "warning," respectively. 
CONCLUSION AI has led the digital revolution in the field of CVD prediction, but is still in the early stage of development owing to defects in research design, reporting, and evaluation systems. The IVS we developed may contribute to independent external validation and the development of this field.
Affiliation(s)
- Yue Cai
  - China Medical University, Shenyang, 110122, China
- Yu-Qing Cai
  - China Medical University, Shenyang, 110122, China
- Li-Ying Tang
  - China Medical University, Shenyang, 110122, China
- Yi-Han Wang
  - China Medical University, Shenyang, 110122, China
- Mengchun Gong
  - Digital Health China Co. Ltd, Beijing, 100089, China
- Tian-Ci Jing
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co. Ltd., Shenyang, 110001, China
  - Enduring Medicine Smart Innovation Research Institute, Shenyang, 110001, China
- Jesse Li-Ling
  - Institute of Genetic Medicine, School of Life Science, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610065, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, 610017, China
- Zhihua Yin
  - Department of Epidemiology, School of Public Health, China Medical University, Shenyang, 110122, China
- Da-Xin Gong
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China
- Guang-Wei Zhang
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China
3
El-Bouri WK, Sanders A, Lip GYH. Predicting acute and long-term mortality in a cohort of pulmonary embolism patients using machine learning. Eur J Intern Med 2023; 118:42-48. [PMID: 37487827 DOI: 10.1016/j.ejim.2023.07.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 06/22/2023] [Accepted: 07/10/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND Pulmonary embolism (PE) is a severe condition that causes significant mortality and morbidity. Due to its acute nature, scores have been developed to stratify patients at high risk of 30-day mortality. Here we develop a machine-learning based score to predict 30-day, 90-day, and 365-day mortality in PE patients. METHODS The Birmingham and Black Country Venous Thromboembolism registry (BBC-VTE) of 2183 venous thromboembolism patients is used. Random forests were trained on a 70% training cohort and tested against 30% held-out set. The outcomes of interest were 30-day, 90-day, and 365-day mortality. These were compared to the pulmonary embolism severity index (PESI) and simplified pulmonary embolism severity index (sPESI). Shapley values were used to determine important predictors. Oral anticoagulation at discharge was also investigated as a predictor of mortality. RESULTS The machine learning risk score predicted 30-day mortality with AUC 0.71 [95% CI: 0.63 - 0.78] compared to the sPESI AUC of 0.65 [95% CI: 0.57 - 0.73] and PESI AUC of 0.64 [95% CI: 0.56 - 0.72]. 90-day mortality and 365-day mortality were predicted with an AUC of 0.74 and 0.73 respectively. High counts of neutrophils, white blood cell counts, and c-reactive protein and low counts of haemoglobin were important for 30-day mortality prediction but progressively lost importance with time. Older age was an important predictor of high risk throughout. CONCLUSION Machine learning algorithms have improved on standard clinical risk stratification for PE patients. External cohort validation is required before incorporation into clinical workflows.
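The AUC values compared in RESULTS can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the function name `auc` and the example data are hypothetical and are not drawn from the BBC-VTE registry.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risks; labels: 1 for an event (e.g. death), 0 otherwise.
    Ties between a positive and a negative count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical held-out predictions: three of the four positive/negative
# pairs are ranked correctly.
example = auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

The quadratic pairwise loop is chosen for clarity; rank-based implementations give the same value more efficiently on large cohorts.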
Affiliation(s)
- Wahbi K El-Bouri
  - Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart & Chest Hospital, Liverpool, UK; Department of Cardiovascular and Metabolic Medicine, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool, UK
- Alexander Sanders
  - Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart & Chest Hospital, Liverpool, UK
- Gregory Y H Lip
  - Liverpool Centre for Cardiovascular Science, University of Liverpool and Liverpool Heart & Chest Hospital, Liverpool, UK; Department of Cardiovascular and Metabolic Medicine, Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool, UK
4
Liao X, Yao C, Zhang J, Liu LZ. Recent advancement in integrating artificial intelligence and information technology with real-world data for clinical decision-making in China: A scoping review. J Evid Based Med 2023; 16:534-546. [PMID: 37772921 DOI: 10.1111/jebm.12549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/31/2023] [Indexed: 09/30/2023]
Abstract
OBJECTIVE Striking innovations and advancements have been achieved with the use of artificial intelligence and healthcare information technology being integrated into clinical real-world data. The current scoping review aimed to provide an overview of the current status of artificial intelligence-/information technology-based clinical decision support tools in China. METHODS PubMed/MEDLINE, Embase, China National Knowledge Internet, and Wanfang data were searched for both English and Chinese literature. The gray literature search was conducted for commercially available tools. Original studies that focused on clinical decision support tools driven by artificial intelligence or information technology in China and were published between 2010 and February 2022 were included. Information extracted from each article was further synthesized by themes based on three types of clinical decision-making. RESULTS A total of 37 peer-reviewed publications and 13 commercially available tools were included in the final analysis. Among them, 32.0% were developed for disease diagnosis, 54.0% for risk prediction and classification, and 14.0% for disease management. Chronic diseases were the most popular therapeutic areas of exploration, with particular emphasis on cardiovascular and cerebrovascular diseases. Single-center electronic medical records were the mainstream data sources leveraged to inform clinical decision-making, with internal validation being predominately used for model evaluation. CONCLUSIONS To effectively promote the extensive use of real-world data and drive a paradigm shift in clinical decision-making in China, multidisciplinary collaboration of key stakeholders is urgently needed.
Affiliation(s)
- Xiwen Liao
  - Peking University Clinical Research Institute, Peking University First Hospital, Beijing, China
- Chen Yao
  - Peking University Clinical Research Institute, Peking University First Hospital, Beijing, China
  - Hainan Institute of Real World Data, Qionghai, Hainan, China
- Jun Zhang
  - Center for Observational and Real-world Evidence (CORE), MSD R&D (China) Co., Ltd., Beijing, China
- Larry Z Liu
  - Center for Observational and Real-world Evidence (CORE), Merck & Co Inc, Rahway, New Jersey, USA
  - Department of Population Health Sciences, Weill Cornell Medical College, New York City, New York, USA
5
Feng J, Zhang Q, Wu F, Peng J, Li Z, Chen Z. The Value of Applying Machine Learning in Predicting the Time of Symptom Onset in Stroke Patients: Systematic Review and Meta-Analysis. J Med Internet Res 2023; 25:e44895. [PMID: 37824198 PMCID: PMC10603565 DOI: 10.2196/44895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 04/02/2023] [Accepted: 09/14/2023] [Indexed: 10/13/2023] Open
Abstract
BACKGROUND Machine learning is a potentially effective method for identifying and predicting the time of the onset of stroke. However, the value of applying machine learning in this field remains controversial and debatable. OBJECTIVE We aimed to assess the value of applying machine learning in predicting the time of stroke onset. METHODS PubMed, Web of Science, Embase, and Cochrane were comprehensively searched. The C index and sensitivity with 95% CI were used as effect sizes. The risk of bias was evaluated using PROBAST (Prediction Model Risk of Bias Assessment Tool), and meta-analysis was conducted using R (version 4.2.0; R Core Team). RESULTS Thirteen eligible studies were included in the meta-analysis involving 55 machine learning models with 41 models in the training set and 14 in the validation set. The overall C index was 0.800 (95% CI 0.773-0.826) in the training set and 0.781 (95% CI 0.709-0.852) in the validation set. The sensitivity and specificity were 0.76 (95% CI 0.73-0.80) and 0.79 (95% CI 0.74-0.82) in the training set and 0.81 (95% CI 0.68-0.90) and 0.83 (95% CI 0.73-0.89) in the validation set, respectively. Subgroup analysis revealed that the accuracy of machine learning in predicting the time of stroke onset within 4.5 hours was optimal (training: 0.80, 95% CI 0.77-0.83; validation: 0.79, 95% CI 0.71-0.86). CONCLUSIONS Machine learning has ideal performance in identifying the time of stroke onset. More reasonable image segmentation and texture extraction methods in radiomics should be used to promote the value of applying machine learning in diverse ethnic backgrounds. TRIAL REGISTRATION PROSPERO CRD42022358898; https://www.crd.york.ac.uk/Prospero/display_record.php?RecordID=358898.
Affiliation(s)
- Jing Feng
  - Department of Neurology, Fifth People's Hospital of Jinan, Jinan, China
- Qizhi Zhang
  - Department of Neurology, Fifth People's Hospital of Jinan, Jinan, China
- Feng Wu
  - Department of Pulmonary Disease and Diabetes Mellitus, Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, China
- Jinxiang Peng
  - Medical Department, Hubei Enshi College, Enshi, China
- Ziwei Li
  - Experimental Center, Shandong University of Traditional Chinese Medicine, Jinan, China
- Zhuang Chen
  - Department of Cardiovascular Medicine, Fifth People's Hospital of Jinan, Jinan, China
6
Abegaz TM, Baljoon A, Kilanko O, Sherbeny F, Ali AA. Machine learning algorithms to predict major adverse cardiovascular events in patients with diabetes. Comput Biol Med 2023; 164:107289. [PMID: 37557056 DOI: 10.1016/j.compbiomed.2023.107289] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/19/2023] [Revised: 07/01/2023] [Accepted: 07/28/2023] [Indexed: 08/11/2023]
Abstract
BACKGROUND Major Adverse Cardiovascular Events (MACE) are common complications of type 2 diabetes mellitus (T2DM) that include myocardial infarction (MI), stroke, and heart failure (HF). The objective of the current study was to predict MACE among T2DM patients. METHODS Type 2 diabetes mellitus patients above 18 years old were recruited for the study from the All of Us Research Program. Eligible participants were those who took sodium-glucose cotransporter 2 inhibitors. Different machine learning algorithms, including random forest (RF), XGBoost, logistic regression (LR), and a weighted ensemble model (WEM), were employed. Clinical attributes, electrolytes and biomarkers were explored in predicting MACE. The feature importance was determined using mean decrease accuracy. RESULTS Overall, 9,059 subjects were included in the analyses, of which 5197 (57.4%) were females. The XGBoost model demonstrated a prediction accuracy of 0.80 [0.78-0.82], which was higher than that of the RF (0.78 [0.76-0.80]), the LR model (0.65 [0.62-0.67]), and the WEM (0.75 [0.73-0.76]). The classification accuracy of the models for stroke was more than 95%, which was higher than the prediction accuracy for MI (∼85%) and HF (∼80%). Phosphate, blood urea nitrogen and troponin levels were the major predictors of MACE. CONCLUSION The ML models showed acceptable performance in predicting MACE in T2DM patients, except the LR model. Phosphate, blood urea nitrogen, and other electrolytes were important predictors of MACE, which was consistent across the individual components of MACE, such as stroke, MI, and HF. These parameters can be calibrated as prognostic parameters of MACE events in T2DM patients.
Affiliation(s)
- Tadesse M Abegaz
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL, 32307, USA
- Ahmead Baljoon
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL, 32307, USA
- Oluwaseun Kilanko
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL, 32307, USA
- Fatimah Sherbeny
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL, 32307, USA
- Askal Ayalew Ali
  - Economic, Social and Administrative Pharmacy (ESAP), College of Pharmacy and Pharmaceutical Sciences, Institute of Public Health, Florida A&M University, Tallahassee, FL, 32307, USA
7
Qian X, Keerman M, Zhang X, Guo H, He J, Maimaitijiang R, Wang X, Ma J, Li Y, Ma R, Guo S. Study on the prediction model of atherosclerotic cardiovascular disease in the rural Xinjiang population based on survival analysis. BMC Public Health 2023; 23:1041. [PMID: 37264356 DOI: 10.1186/s12889-023-15630-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 04/07/2023] [Indexed: 06/03/2023] Open
Abstract
PURPOSE With the increase in aging and cardiovascular risk factors, the morbidity and mortality of atherosclerotic cardiovascular disease (ASCVD), represented by ischemic heart disease and stroke, continue to rise in China. For better prevention and intervention, relevant guidelines recommend using predictive models for early detection of ASCVD high-risk groups. Therefore, this study aims to establish a population ASCVD prediction model in rural areas of Xinjiang using survival analysis. METHODS Baseline cohort data were collected from September to December 2016 and followed up till June 2022. A total of 7975 residents (4054 males and 3920 females) aged 30-74 years were included in the analysis. The data set was divided according to different genders, and the training and test sets ratio was 7:3 for different genders. A Cox regression, Lasso-Cox regression, and random survival forest (RSF) model were established in the training set. The model parameters were determined by cross-validation and parameter tuning and then verified in the training set. Traditional ASCVD prediction models (Framingham and China-PAR models) were constructed in the test set. Different models' discrimination and calibration degrees were compared to find the optimal prediction model for this population according to different genders and further analyze the risk factors of ASCVD. RESULTS After 5.79 years of follow-up, 873 ASCVD events with a cumulative incidence of 10.19% were found (7.57% in men and 14.44% in women). By comparing the discrimination and calibration degrees of each model, the RSF showed the best prediction performance in males and females (male: Area Under Curve (AUC) 0.791 (95%CI 0.767,0.813), C statistic 0.780 (95%CI 0.730,0.829), Brier Score (BS):0.060, female: AUC 0.759 (95%CI 0.734,0.783) C statistic was 0.737 (95%CI 0.702,0.771), BS:0.110). 
Age, systolic blood pressure (SBP), apolipoprotein B (APOB), Visceral Adiposity Index (VAI), hip circumference (HC), and plasma arteriosclerosis index (AIP) are important predictors of ASCVD in the rural population of Xinjiang. CONCLUSION The performance of the ASCVD prediction model based on the RSF algorithm is better than that based on Cox regression, Lasso-Cox, and the traditional ASCVD prediction model in the rural population of Xinjiang.
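The Brier Score (BS) reported alongside the AUC and C statistic measures calibration-plus-accuracy as the mean squared difference between predicted risk and observed outcome. The sketch below shows the plain binary form only; a survival analysis such as this one would typically use a time-dependent version with censoring weights, which is not reproduced here. The function name `brier_score` and the inputs are hypothetical.

```python
def brier_score(predicted_risks, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0 means perfect predictions. This ignores censoring,
    so it is only an approximation of what a survival model would report.
    """
    if len(predicted_risks) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - y) ** 2 for p, y in zip(predicted_risks, outcomes))
    return total / len(outcomes)

# Hypothetical predictions for four residents, two of whom had an event.
score = brier_score([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1])
```

As a reference point, a constant forecast equal to the event rate p scores p(1 - p), so a useful model should fall below that value for its population.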
Affiliation(s)
- Xin Qian
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Mulatibieke Keerman
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Xianghui Zhang
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Heng Guo
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Jia He
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Remina Maimaitijiang
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Xinping Wang
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Jiaolong Ma
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Yu Li
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Rulin Ma
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
  - Department of Public Health, The Key Laboratory of Preventive Medicine, Shihezi University School of Medicine, Suite 816, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
- Shuxia Guo
  - Department of Public Health, Shihezi University School of Medicine, Suite 721, The Key Laboratory of Preventive Medicine, Building No. 1, Beier Road, Shihezi, 832000, Xinjiang, China
  - Department of NHC Key Laboratory of Prevention and Treatment of Central Asia High Incidence Diseases, The First Affiliated Hospital of Shihezi University Medical College, Shihezi, Xinjiang, China
8
Chahine Y, Magoon MJ, Maidu B, del Álamo JC, Boyle PM, Akoum N. Machine Learning and the Conundrum of Stroke Risk Prediction. Arrhythm Electrophysiol Rev 2023; 12:e07. [PMID: 37427297 PMCID: PMC10326666 DOI: 10.15420/aer.2022.34] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Accepted: 02/07/2023] [Indexed: 07/11/2023] Open
Abstract
Stroke is a leading cause of death worldwide. With escalating healthcare costs, early non-invasive stroke risk stratification is vital. The current paradigm of stroke risk assessment and mitigation is focused on clinical risk factors and comorbidities. Standard algorithms predict risk using regression-based statistical associations, which, while useful and easy to use, have moderate predictive accuracy. This review summarises recent efforts to deploy machine learning (ML) to predict stroke risk and enrich the understanding of the mechanisms underlying stroke. The surveyed body of literature includes studies comparing ML algorithms with conventional statistical models for predicting cardiovascular disease and, in particular, different stroke subtypes. Another avenue of research explored is ML as a means of enriching multiscale computational modelling, which holds great promise for revealing thrombogenesis mechanisms. Overall, ML offers a new approach to stroke risk stratification that accounts for subtle physiologic variants between patients, potentially leading to more reliable and personalised predictions than standard regression-based statistical associations.
Affiliation(s)
- Yaacoub Chahine
  - Division of Cardiology, University of Washington, Seattle, WA, US
- Matthew J Magoon
  - Department of Bioengineering, University of Washington, Seattle, WA, US
- Bahetihazi Maidu
  - Department of Mechanical Engineering, University of Washington, Seattle, WA, US
- Juan C del Álamo
  - Department of Mechanical Engineering, University of Washington, Seattle, WA, US
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, US
  - Center for Cardiovascular Biology, University of Washington, Seattle, WA, US
- Patrick M Boyle
  - Department of Bioengineering, University of Washington, Seattle, WA, US
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, US
  - Center for Cardiovascular Biology, University of Washington, Seattle, WA, US
- Nazem Akoum
  - Division of Cardiology, University of Washington, Seattle, WA, US
  - Department of Bioengineering, University of Washington, Seattle, WA, US
9
Liu Y, Luo Y, Naidech AM. Big Data in Stroke: How to Use Big Data to Make the Next Management Decision. Neurotherapeutics 2023; 20:744-757. [PMID: 36899137 PMCID: PMC10275829 DOI: 10.1007/s13311-023-01358-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Accepted: 02/17/2023] [Indexed: 03/12/2023] Open
Abstract
The last decade has seen significant advances in the accumulation of medical data, the computational techniques to analyze that data, and corresponding improvements in management. Interventions such as thrombolytics and mechanical thrombectomy improve patient outcomes after stroke in selected patients; however, significant gaps remain in our ability to select patients, predict complications, and understand outcomes. Big data and the computational methods needed to analyze it can address these gaps. For example, automated analysis of neuroimaging to estimate the volume of brain tissue that is ischemic and salvageable can help triage patients for acute interventions. Data-intensive computational techniques can perform complex risk calculations that are too cumbersome to be completed by humans, resulting in more accurate and timely prediction of which patients require increased vigilance for adverse events such as treatment complications. To handle the accumulation of complex medical data, a variety of advanced computational techniques referred to as machine learning and artificial intelligence now routinely complement traditional statistical inference. In this narrative review, we explore data-intensive techniques in stroke research, how it has informed the management of stroke patients, and how current work could shape clinical practice in the future.
Affiliation(s)
- Yuzhe Liu
  - Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Yuan Luo
  - Section of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Andrew M Naidech
  - Section of Neurocritical Care, Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
10
Huang R, Liu J, Wan TK, Siriwanna D, Woo YMP, Vodencarevic A, Wong CW, Chan KHK. Stroke mortality prediction based on ensemble learning and the combination of structured and textual data. Comput Biol Med 2023; 155:106176. [PMID: 36805232 DOI: 10.1016/j.compbiomed.2022.106176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 09/12/2022] [Accepted: 10/01/2022] [Indexed: 11/23/2022]
Abstract
For severe cerebrovascular diseases such as stroke, the prediction of short-term mortality of patients has tremendous medical significance. In this study, we combined machine learning models Random Forest classifier (RF), Adaptive Boosting (AdaBoost), Extremely Randomised Trees (ExtraTree) classifier, XGBoost classifier, TabNet, and DistilBERT to construct a multi-level prediction model that used bioassay data and radiology text reports from haemorrhagic and ischaemic stroke patients to predict six-month mortality. The performances of the prediction models were measured using the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), precision, recall, and F1-score. The prediction models were built with the use of data from 19,616 haemorrhagic stroke patients and 50,178 ischaemic stroke patients. Novel six-month mortality prediction models for these patients were developed, which enhanced the performance of the prediction models by combining laboratory test data, structured data, and textual radiology report data. The achieved performances were as follows: AUROC = 0.89, AUPRC = 0.70, precision = 0.52, recall = 0.78, and F1 score = 0.63 for haemorrhagic patients, and AUROC = 0.88, AUPRC = 0.54, precision = 0.34, recall = 0.80, and F1 score = 0.48 for ischaemic patients. Such models could be used for mortality risk assessment and early identification of high-risk stroke patients. This could contribute to more efficient utilisation of healthcare resources for stroke survivors.
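The F1 scores quoted above are the harmonic mean of precision and recall, so they can be cross-checked from the other two reported figures. The sketch below is a worked check rather than anything from the paper; the function name `f1_score` is hypothetical, and the inputs are the rounded values printed in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Rounded figures taken from the abstract.
f1_haemorrhagic = f1_score(0.52, 0.78)  # about 0.624
f1_ischaemic = f1_score(0.34, 0.80)     # about 0.477
```

The ischaemic value rounds to the reported 0.48. The haemorrhagic value comes out near 0.62 rather than the reported 0.63, a gap that is plausibly explained by the precision and recall themselves having been rounded to two decimals before publication.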
Affiliation(s)
- Ruixuan Huang
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Jundong Liu
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Tsz Kin Wan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
- Damrongrat Siriwanna
  - Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China
- Chi Wah Wong
  - Department of Applied AI and Data Science, City of Hope National Medical Center, Duarte, CA, 91010, United States
- Kei Hang Katie Chan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China; Department of Biomedical Sciences, City University of Hong Kong, Hong Kong, China; Department of Epidemiology and Center for Global Cardiometabolic Health, School of Public Health, Department of Medicine, The Warren Alpert School of Medicine, Brown University, Providence, RI, United States.
|
11
|
Qiu Y, Cheng S, Wu Y, Yan W, Hu S, Chen Y, Xu Y, Chen X, Yang J, Chen X, Zheng H. Development of rapid and effective risk prediction models for stroke in the Chinese population: a cross-sectional study. BMJ Open 2023; 13:e068045. [PMID: 36858471 PMCID: PMC9980356 DOI: 10.1136/bmjopen-2022-068045] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/03/2023] Open
Abstract
OBJECTIVES The purpose of this study was to use easily obtained and directly observable clinical features to establish predictive models to identify patients at increased risk of stroke. SETTING AND PARTICIPANTS A total of 46 240 valid records were obtained from 8 research centres and 14 communities in Jiangxi province, China, between February and September 2018. PRIMARY AND SECONDARY OUTCOME MEASURES The area under the receiver operating characteristic curve (AUC), sensitivity, specificity and accuracy were calculated to test the performance of the five models (logistic regression (LR), random forest (RF), decision tree (DT), extreme gradient boosting (XGBoost) and gradient boosting DT). The calibration curve was used to show calibration performance. RESULTS The results indicated that XGBoost (AUC: 0.924, accuracy: 0.873, sensitivity: 0.776, specificity: 0.916) and RF (AUC: 0.924, accuracy: 0.872, sensitivity: 0.778, specificity: 0.913) demonstrated excellent performance in predicting stroke. Physical inactivity, hypertension, meat-based diet and high salt intake were important prediction features of stroke. CONCLUSION The five machine learning models all had good predictive and discriminatory performance for stroke. The performance of RF and XGBoost was slightly better than that of LR, which was easier to interpret and less prone to overfitting. This work provides a rapid and accurate tool for stroke risk assessment, which can help to improve the efficiency of stroke screening medical services and the management of high-risk groups.
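The outcome measures named in this abstract (sensitivity, specificity and accuracy) are all derived from a confusion matrix. A minimal sketch of those definitions follows; the counts and the function name `classification_metrics` are hypothetical and are not data or code from the study:

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        # Share of true stroke cases that the model flags as positive.
        "sensitivity": tp / (tp + fn),
        # Share of non-stroke cases that the model correctly leaves unflagged.
        "specificity": tn / (tn + fp),
        # Share of all records classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=80, fp=20, tn=180, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is a prevalence-weighted mix of sensitivity and specificity, which is why a model can report high accuracy alongside a noticeably lower sensitivity when negatives outnumber positives.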
Affiliation(s)
- Yuexin Qiu
  - School of Public Health, Nanchang University, Nanchang, Jiangxi, China
  - Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
- Shiqi Cheng
  - Neurosurgery Department, Nanchang University Second Affiliated Hospital, Nanchang, Jiangxi, China
- Yuhang Wu
  - Department of Epidemiology and Health Statistics, Central South University, Changsha, Hunan, China
- Wei Yan
  - Institute of Chronic Non-communicable Diseases, Center for Disease Control and Prevention of Jiangxi Province, Nanchang, Jiangxi, China
- Songbo Hu
  - School of Public Health, Nanchang University, Nanchang, Jiangxi, China
  - Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
- Yiying Chen
  - Institute of Chronic Non-communicable Diseases, Center for Disease Control and Prevention of Jiangxi Province, Nanchang, Jiangxi, China
- Yan Xu
  - Institute of Chronic Non-communicable Diseases, Center for Disease Control and Prevention of Jiangxi Province, Nanchang, Jiangxi, China
- Xiaona Chen
  - Institute of Chronic Non-communicable Diseases, Center for Disease Control and Prevention of Jiangxi Province, Nanchang, Jiangxi, China
- Junsai Yang
  - School of Public Health, Nanchang University, Nanchang, Jiangxi, China
  - Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
- Xiaoyun Chen
  - School of Public Health, Nanchang University, Nanchang, Jiangxi, China
  - Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
- Huilie Zheng
  - School of Public Health, Nanchang University, Nanchang, Jiangxi, China
  - Key Laboratory of Preventive Medicine, Nanchang University, Nanchang, Jiangxi, China
|
12
|
Hsu W, Warren J, Riddle P. Multivariate Sequential Analytics for Cardiovascular Disease Event Prediction. Methods Inf Med 2022; 61:e149-e171. [PMID: 36564011 PMCID: PMC9788915 DOI: 10.1055/s-0042-1758687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022]
Abstract
BACKGROUND Automated clinical decision support for risk assessment is a powerful tool in combating cardiovascular disease (CVD), enabling targeted early intervention that could avoid issues of overtreatment or undertreatment. However, current CVD risk prediction models use observations at baseline without explicitly representing patient history as a time series. OBJECTIVE The aim of this study is to examine whether explicitly modelling the temporal dimension of patient history improves event prediction. METHODS This study investigates methods for multivariate sequential modelling, with a particular emphasis on long short-term memory (LSTM) recurrent neural networks. Data from a CVD decision support tool are linked to routinely collected national datasets, including pharmaceutical dispensing, hospitalization, laboratory test results, and deaths. The study uses a 2-year observation window and a 5-year prediction window. Selected methods are applied to the linked dataset. The experiments focus on CVD event prediction: CVD death or hospitalization in a 5-year interval was predicted for patients with a history of lipid-lowering therapy. RESULTS The experiments showed that temporal models are valuable for CVD event prediction over a 5-year interval. This is especially the case for LSTM, which produced the best predictive performance among all models compared, achieving an AUROC of 0.801 and an average precision of 0.425. The non-temporal comparator, a ridge classifier (RC), was highly competitive: trained on all quarterly data it achieved an AUROC of 0.799 and an average precision of 0.420, and trained on aggregated quarterly data (averaging time-varying features) it achieved an AUROC of 0.800 and an average precision of 0.421. CONCLUSION This study provides evidence that the use of deep temporal models, particularly LSTM, in clinical decision support for chronic disease would be advantageous, with LSTM significantly improving on commonly used regression models such as logistic regression and Cox proportional hazards on the task of CVD event prediction.
Affiliation(s)
- William Hsu
  - School of Computer Science, University of Auckland, Auckland, New Zealand. Address for correspondence: William Hsu, PhD, School of Computer Science, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
- Jim Warren
  - School of Computer Science, University of Auckland, Auckland, New Zealand
- Patricia Riddle
  - School of Computer Science, University of Auckland, Auckland, New Zealand
|
13
|
Yang WX, Wang FF, Pan YY, Xie JQ, Lu MH, You CG. Comparison of ischemic stroke diagnosis models based on machine learning. Front Neurol 2022; 13:1014346. [PMID: 36545400 PMCID: PMC9762505 DOI: 10.3389/fneur.2022.1014346] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/08/2022] [Accepted: 11/09/2022] [Indexed: 12/11/2022] Open
Abstract
Background The incidence, prevalence, and mortality of ischemic stroke (IS) continue to rise, resulting in a serious global disease burden. Prediction models have great value in the early prediction and diagnosis of IS. Methods The R software was used to screen the differentially expressed genes (DEGs) between IS and control samples in the datasets GSE16561, GSE58294, and GSE37587 and to perform enrichment analysis on the DEGs. The feature genes of IS were obtained by several machine learning algorithms, including least absolute shrinkage and selection operator (LASSO) logistic regression, support vector machine-recursive feature elimination (SVM-RFE), and Random Forest (RF). The IS diagnostic models were constructed based on transcriptomics by machine learning and artificial neural network (ANN). Results A total of 69 DEGs, mainly involved in immune and inflammatory responses, were identified. The pathways enriched in the IS group were complement and coagulation cascades, lysosome, PPAR signaling pathway, regulation of autophagy, and toll-like receptor signaling pathway. LASSO, SVM-RFE, and RF selected 17, 10, and 12 feature genes, respectively. The area under the curve (AUC) of the LASSO model in the training dataset, GSE22255, and GSE195442 was 0.969, 0.890, and 1.000, respectively. The AUC of the SVM-RFE model was 0.957, 0.805, and 1.000, respectively. The AUC of the RF model was 0.947, 0.935, and 1.000, respectively. The models have good sensitivity, specificity, and accuracy. The AUC of the LASSO+ANN, SVM-RFE+ANN, and RF+ANN models was 1.000, 0.995, and 0.997, respectively, in the training dataset. However, the AUC of the LASSO+ANN, SVM-RFE+ANN, and RF+ANN models was 0.688, 0.605, and 0.619, respectively, in the GSE22255 dataset. The AUC of the LASSO+ANN and RF+ANN models was 0.740 and 0.630, respectively, in the GSE195442 dataset. In the training dataset, the sensitivity, specificity, and accuracy of the LASSO+ANN model were 1.000, 1.000, and 1.000, respectively; those of the SVM-RFE+ANN model were 0.946, 0.982, and 0.964, respectively; and those of the RF+ANN model were 0.964, 1.000, and 0.982, respectively. In the test datasets, the sensitivity was very satisfactory; however, the specificity and accuracy were poor. Conclusion The LASSO, SVM-RFE, and RF models have good prediction abilities. However, the ANN model is efficient at classifying positive samples but unsuitable for classifying negative samples.
Affiliation(s)
- Wan-Xia Yang
  - Laboratory Medicine Center, Lanzhou University Second Hospital, Lanzhou, China
- Fang-Fang Wang
  - Laboratory Medicine Center, Lanzhou University Second Hospital, Lanzhou, China
- Yun-Yan Pan
  - Laboratory Medicine Center, Lanzhou University Second Hospital, Lanzhou, China
- Jian-Qin Xie
  - Anesthesiology Department, Lanzhou University Second Hospital, Lanzhou, China
- Ming-Hua Lu
  - Laboratory Medicine Center, Lanzhou University Second Hospital, Lanzhou, China
- Chong-Ge You
  - Laboratory Medicine Center, Lanzhou University Second Hospital, Lanzhou, China. Correspondence: Chong-Ge You
|
14
|
Hunter E, Kelleher JD. A review of risk concepts and models for predicting the risk of primary stroke. Front Neuroinform 2022; 16:883762. [DOI: 10.3389/fninf.2022.883762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 10/31/2022] [Indexed: 11/17/2022] Open
Abstract
Predicting an individual's risk of primary stroke is an important tool that can help to lower the burden of stroke for both the individual and society. There are a number of risk models and risk scores in existence but no review or classification designed to help the reader better understand how models differ and the reasoning behind these differences. In this paper we review the existing literature on primary stroke risk prediction models. From our literature review we identify key similarities and differences in the existing models. We find that models can differ in a number of ways, including the event type, the type of analysis, the model type and the time horizon. Based on these similarities and differences we have created a set of questions and a system to help answer those questions that modelers and readers alike can use to help classify and better understand the existing models as well as help to make necessary decisions when creating a new model.
|
15
|
Important Risk Factors in Patients with Nonvalvular Atrial Fibrillation Taking Dabigatran Using Integrated Machine Learning Scheme—A Post Hoc Analysis. J Pers Med 2022; 12:jpm12050756. [PMID: 35629177 PMCID: PMC9146635 DOI: 10.3390/jpm12050756] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/07/2022] [Revised: 04/29/2022] [Accepted: 05/03/2022] [Indexed: 02/06/2023] Open
Abstract
Our study aims to develop an effective integrated machine learning (ML) scheme to predict vascular events and bleeding in patients with nonvalvular atrial fibrillation taking dabigatran and to identify important risk factors. This study is a post-hoc analysis of the Randomized Evaluation of Long-Term Anticoagulant Therapy trial database. One traditional prediction method, logistic regression (LGR), and four ML techniques, namely naive Bayes, random forest (RF), classification and regression tree, and extreme gradient boosting (XGBoost), were combined to construct our scheme. The area under the receiver operating characteristic curve (AUC) of RF (0.780) and XGBoost (0.717) was higher than that of LGR (0.674) in predicting vascular events. In predicting bleeding, the AUC of RF (0.684) and XGBoost (0.618) was also higher than that of LGR (0.605). Our integrated ML feature selection scheme based on the two convincing prediction techniques identified age, history of congestive heart failure and myocardial infarction, smoking, kidney function, and body mass index as major variables of vascular events, and age, kidney function, smoking, bleeding history, concomitant use of specific drugs, and dabigatran dosage as major variables of bleeding. ML is an effective approach for analysing complex medical data. Our results may provide preliminary direction for precision medicine.
|