1
Hennebelle A, Ismail L, Materwala H, Al Kaabi J, Ranjan P, Janardhanan R. Secure and privacy-preserving automated machine learning operations into end-to-end integrated IoT-edge-artificial intelligence-blockchain monitoring system for diabetes mellitus prediction. Comput Struct Biotechnol J 2024; 23:212-233. [PMID: 38169966 PMCID: PMC10758733 DOI: 10.1016/j.csbj.2023.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 01/05/2024]
Abstract
Diabetes Mellitus, one of the leading causes of death worldwide, has no cure to date and, if left untreated, can lead to severe health complications such as retinopathy, limb amputation, cardiovascular diseases, and neuronal disease. Consequently, it is crucial to be able to monitor and predict the incidence of diabetes. Machine learning approaches have been proposed and evaluated in the literature for diabetes prediction. This paper proposes an IoT-edge-Artificial Intelligence (AI)-blockchain system for diabetes prediction based on risk factors. The proposed system is underpinned by blockchain to obtain a cohesive view of the risk factor data from patients across different hospitals and to ensure the security and privacy of users' data. We provide a comparative analysis of different medical sensors, devices, and methods to measure and collect the risk factor values in the system. Numerical experiments and comparative analysis were carried out within our proposed system using the most accurate random forest (RF) model and the two most used state-of-the-art machine learning approaches, Logistic Regression (LR) and Support Vector Machine (SVM), on three real-life diabetes datasets. The results show that the proposed system predicts diabetes using RF with 4.57% higher accuracy on average than LR and SVM, at the cost of 2.87 times longer execution time. Data balancing without feature selection does not yield a significant improvement. With feature selection, performance improves by 1.14% for the PIMA Indian dataset and 0.02% for the Sylhet dataset, while it decreases by 0.89% for MIMIC III.
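The abstract relies on blockchain to give a tamper-evident, cohesive view of risk-factor records contributed by several hospitals. The sketch below is only a minimal illustration of the hash-chaining idea behind that guarantee. It assumes a single in-process ledger with no consensus protocol, networking, encryption, or access control. The class names, field names, and example values (`Ledger`, `Block`, `hospital`, `glucose`, `bmi`) are hypothetical and are not taken from the authors' system.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    hospital: str       # hypothetical: which hospital contributed the record
    payload: dict       # hypothetical risk-factor values for one patient
    prev_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        # SHA-256 over a canonical JSON encoding of every field except the hash itself.
        body = json.dumps(
            {
                "index": self.index,
                "hospital": self.hospital,
                "payload": self.payload,
                "prev_hash": self.prev_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only, hash-linked list of records (no consensus or networking)."""

    def __init__(self) -> None:
        genesis = Block(0, "genesis", {}, "0" * 64)
        genesis.block_hash = genesis.compute_hash()
        self.chain = [genesis]

    def append(self, hospital: str, payload: dict) -> Block:
        prev = self.chain[-1]
        block = Block(prev.index + 1, hospital, payload, prev.block_hash)
        block.block_hash = block.compute_hash()
        self.chain.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must match its own hash and point at its predecessor's hash.
        if self.chain[0].block_hash != self.chain[0].compute_hash():
            return False
        for prev, cur in zip(self.chain, self.chain[1:]):
            if cur.prev_hash != prev.block_hash or cur.block_hash != cur.compute_hash():
                return False
        return True

    def cohesive_view(self) -> list:
        # All records from all hospitals, in the order they were appended.
        return [(b.hospital, b.payload) for b in self.chain[1:]]


ledger = Ledger()
ledger.append("Hospital A", {"glucose": 148, "bmi": 33.6})
ledger.append("Hospital B", {"glucose": 85, "bmi": 26.6})
print(ledger.is_valid())                 # True
ledger.chain[1].payload["glucose"] = 90  # tamper with a stored record
print(ledger.is_valid())                 # False
```

Each block stores its predecessor's hash, so editing any stored record invalidates the chain from that point on. A real multi-hospital deployment would additionally need a consensus mechanism among the participants.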
Affiliation(s)
- Alain Hennebelle
  - School of Computing and Information Systems, The University of Melbourne, Australia
- Leila Ismail
  - School of Computing and Information Systems, The University of Melbourne, Australia
  - Intelligent Distributed Computing and Systems Lab, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, United Arab Emirates
  - National Water and Energy Center, United Arab Emirates University, United Arab Emirates
- Huned Materwala
  - Intelligent Distributed Computing and Systems Lab, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, United Arab Emirates
  - National Water and Energy Center, United Arab Emirates University, United Arab Emirates
- Juma Al Kaabi
  - College of Medicine and Health Sciences, Department of Internal Medicine, United Arab Emirates University, United Arab Emirates
  - Tawam and Mediclinic Hospitals, Al Ain, Abu Dhabi, United Arab Emirates
- Priya Ranjan
  - School of Computer Science, Internet of Things Center of Excellence, University of Petroleum and Energy Studies, India
- Rajiv Janardhanan
  - Faculty of Medical & Health Sciences, SRM Institute of Science & Technology, India
2
Garcés-Jiménez A, Polo-Luque ML, Gómez-Pulido JA, Rodríguez-Puyol D, Gómez-Pulido JM. Predictive health monitoring: Leveraging artificial intelligence for early detection of infectious diseases in nursing home residents through discontinuous vital signs analysis. Comput Biol Med 2024; 174:108469. [PMID: 38636331 DOI: 10.1016/j.compbiomed.2024.108469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 04/08/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
This research addresses the problem of detecting acute respiratory, urinary tract, and other infectious diseases in elderly nursing home residents using machine learning algorithms. The study analyzes data extracted from multiple vital signs and other contextual information for diagnostic purposes. The daily data collection process is subject to sampling constraints due to weekends, holidays, shift changes, staff turnover, and equipment breakdowns, resulting in numerous nulls, repeated readings, outliers, and meaningless values. The short time series generated also pose a challenge to analysis, preventing the extraction of seasonal information or consistent trends. Because data are collected blindly, most of them come from periods when residents are healthy, which makes the dataset heavily imbalanced. This study proposes a data cleaning process and then builds a mechanism that reproduces the basal activity of the residents to improve disease classification. The results show that the proposed basal module-assisted machine learning techniques can anticipate diagnoses 2, 3, or 4 days before doctors decide to start antibiotic treatment, achieving an area under the curve (AUC) of 0.857. The contributions of this work are: (1) a new data cleaning process; (2) the analysis of contextual information to improve data quality; (3) the generation of a baseline measure for relative comparison; and (4) the use of either binary (disease/no disease) or multiclass classification, differentiating among types of infections and showing the advantages of multiclass over binary classification. From a medical point of view, the early detection of infectious diseases in institutionalized individuals is novel.
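The basal module compares each new vital-sign reading against the resident's own healthy baseline rather than a population norm. That idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The plausible-range limits, the rule that drops identical consecutive readings as duplicate entries, and the use of the median as the basal level are all hypothetical choices.

```python
from statistics import median

# Hypothetical plausible range for heart rate (beats/min); the study's actual
# cleaning rules are not given in the abstract.
PLAUSIBLE_RANGE = (30.0, 200.0)


def clean(readings):
    """Drop nulls, physiologically implausible values, and repeated readings."""
    cleaned = []
    low, high = PLAUSIBLE_RANGE
    for value in readings:
        if value is None or not (low <= value <= high):
            continue
        if cleaned and cleaned[-1] == value:
            continue  # assumption: an identical consecutive value is a duplicate entry
        cleaned.append(value)
    return cleaned


def basal_level(healthy_readings):
    """Resident-specific baseline: median of cleaned readings from healthy periods."""
    values = clean(healthy_readings)
    if not values:
        raise ValueError("no usable healthy-period readings")
    return median(values)


def relative_deviation(value, basal):
    """New reading expressed relative to the resident's own baseline."""
    return (value - basal) / basal


healthy = [72, None, 72, 75, 70, 250, 74, 71]
basal = basal_level(healthy)
print(basal)                          # 72
print(relative_deviation(90, basal))  # 0.25
```

Relative deviations like these could then serve as classifier inputs in place of raw readings. That is one plausible way to use a baseline "for relative comparison", not necessarily the one the study adopted.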
Affiliation(s)
- Alberto Garcés-Jiménez
  - Department of Computer Science, Universidad de Alcalá, Polytechnic School, Alcala de Henares, 28805, Spain
- María-Luz Polo-Luque
  - Department of Nursing and Physiotherapy, Universidad de Alcalá, Faculty of Medicine and Health Sciences, Alcala de Henares, 28805, Spain
- Juan A Gómez-Pulido
  - Department of Technologies of Computers and Communications, Universidad de Extremadura, School of Technology, Cáceres, 10003, Spain
- Diego Rodríguez-Puyol
  - Department of Medicine and Medical Specialties, Research Foundation of the University Hospital Príncipe de Asturias, Campus Científico Tecnológico, Alcala de Henares, 28805, Spain
- José M Gómez-Pulido
  - Department of Computer Science, Universidad de Alcalá, Polytechnic School, Alcala de Henares, 28805, Spain
3
Woodman RJ, Koczwara B, Mangoni AA. Applying precision medicine principles to the management of multimorbidity: the utility of comorbidity networks, graph machine learning, and knowledge graphs. Front Med (Lausanne) 2024; 10:1302844. [PMID: 38404463 PMCID: PMC10885565 DOI: 10.3389/fmed.2023.1302844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 12/22/2023] [Indexed: 02/27/2024]
Abstract
The current management of patients with multimorbidity is suboptimal, with either a single-disease approach to care or treatment guideline adaptations that result in poor adherence due to their complexity. Although this has resulted in calls for more holistic and personalized approaches to prescribing, progress toward these goals has remained slow. With the rapid advancement of machine learning (ML) methods, promising approaches now also exist to accelerate the advance of precision medicine in multimorbidity. These include analyzing disease comorbidity networks, using knowledge graphs that integrate knowledge from different medical domains, and applying network analysis and graph ML. Multimorbidity disease networks have been used to improve disease diagnosis, treatment recommendations, and patient prognosis. Knowledge graphs that combine different medical entities connected by multiple relationship types integrate data from different sources, allowing for complex interactions and creating a continuous flow of information. Network analysis and graph ML can then extract the topology and structure of networks and reveal hidden properties, including disease phenotypes, network hubs, and pathways; predict drugs for repurposing; and determine safe and more holistic treatments. In this article, we describe the basic concepts of creating bipartite and unipartite disease and patient networks and review the use of knowledge graphs, graph algorithms, graph embedding methods, and graph ML within the context of multimorbidity. Specifically, we provide an overview of the application of graph theory for studying multimorbidity, the methods employed to extract knowledge from graphs, and examples of the application of disease networks for determining the structure and pathways of multimorbidity, identifying disease phenotypes, predicting health outcomes, and selecting safe and effective treatments. 
In today's data-hungry, ML-focused world, such network-based techniques are likely to be at the forefront of developing robust clinical decision support tools for safer and more holistic approaches to treating older patients with multimorbidity.
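The bipartite and unipartite networks mentioned above can be illustrated with a one-mode projection of a patient-disease bipartite graph. The sketch below is a minimal example. It assumes each patient is given as a set of disease labels (the patient IDs and diseases are made up) and that projected edge weights count shared neighbours. It uses plain Python dictionaries rather than any particular graph library.

```python
from collections import defaultdict
from itertools import combinations


def project_diseases(patient_diseases):
    """Disease-disease (comorbidity) network from a patient-disease bipartite graph.

    Two diseases are linked if at least one patient has both; the edge weight
    is the number of such shared patients.
    """
    weights = defaultdict(int)
    for diseases in patient_diseases.values():
        for a, b in combinations(sorted(diseases), 2):
            weights[(a, b)] += 1
    return dict(weights)


def project_patients(patient_diseases):
    """Patient-patient network: patients linked by the number of diseases they share."""
    weights = {}
    for p, q in combinations(sorted(patient_diseases), 2):
        shared = len(patient_diseases[p] & patient_diseases[q])
        if shared:
            weights[(p, q)] = shared
    return weights


records = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"diabetes", "hypertension", "ckd"},
    "p3": {"hypertension", "ckd"},
}
print(project_diseases(records))
# {('diabetes', 'hypertension'): 2, ('ckd', 'diabetes'): 1, ('ckd', 'hypertension'): 2}
print(project_patients(records))
# {('p1', 'p2'): 2, ('p1', 'p3'): 1, ('p2', 'p3'): 2}
```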
Affiliation(s)
- Richard John Woodman
  - Flinders Health and Medical Research Institute, College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
- Bogda Koczwara
  - Flinders Health and Medical Research Institute, College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
  - Department of Medical Oncology, Flinders Medical Centre, Southern Adelaide Local Health Network, Adelaide, SA, Australia
- Arduino Aleksander Mangoni
  - Flinders Health and Medical Research Institute, College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
  - Department of Clinical Pharmacology, Flinders Medical Centre, Southern Adelaide Local Health Network, Adelaide, SA, Australia
4
Zhang J, Xu Y, Ye B, Zhao Y, Sun X, Meng Q, Zhang Y, Cui L. EAPR: explainable and augmented patient representation learning for disease prediction. Health Inf Sci Syst 2023; 11:53. [PMID: 37974902 PMCID: PMC10645955 DOI: 10.1007/s13755-023-00256-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Patient representation learning aims to encode meaningful information from a patient's Electronic Health Records (EHR) as a mathematical representation. Recent advances in deep learning have given patient representation learning methods greater representational power, allowing the learned representations to significantly improve the performance of disease prediction models. However, inherent shortcomings of deep learning models, such as the need for massive amounts of labeled data and a lack of explainability, limit further improvement of deep learning-based patient representation learning methods. In particular, learning robust patient representations is challenging when patient data are missing or insufficient. Although data augmentation techniques can tackle this deficiency, the complex data processing further reduces the explainability of patient representation learning models. To address these challenges, this paper proposes Explainable and Augmented Patient Representation Learning for disease prediction (EAPR). EAPR uses data augmentation controlled by confidence intervals to enhance patient representations when patient data are limited. Moreover, EAPR uses two-stage gradient backpropagation to address the lack of explainability that the complex data augmentation process introduces into patient representation learning models. Experimental results on real clinical data validate the effectiveness and explainability of the proposed approach.
Affiliation(s)
- Jiancheng Zhang
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Yonghui Xu
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Bicui Ye
  - Wuzhou Red Cross Hospital, Wuzhou, China
  - Jinan University, Jinan, China
- Yibowen Zhao
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Xiaofang Sun
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
- Qi Meng
  - Department of Radiology, Qilu Hospital of Shandong University, Jinan, China
- Yang Zhang
  - Department of Radiology, Qilu Hospital of Shandong University, Jinan, China
- Lizhen Cui
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China
  - School of Software, Shandong University, Jinan, China
5
Lu H, Uddin S. Embedding-based link predictions to explore latent comorbidity of chronic diseases. Health Inf Sci Syst 2023; 11:2. [PMID: 36593862 PMCID: PMC9803807 DOI: 10.1007/s13755-022-00206-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/13/2022] [Indexed: 12/31/2022]
Abstract
Purpose Comorbidity describes a patient simultaneously having more than one chronic disease. It is a significant health issue that affects people worldwide. This study aims to use machine learning and graph theory to predict the comorbidity of chronic diseases. Methods A patient-disease bipartite graph was constructed from administrative claim data, and bipartite graph projection was used to create the comorbidity network. For the link prediction task, three graph machine learning embedding-based models (node2vec, graph neural networks, and a hand-crafted approach), each with different variants, were applied to the comorbidity network and their performance compared. The study also considered three commonly used similarity-based link prediction approaches (Jaccard coefficient, Adamic-Adar index, and Resource allocation index) for comparison. Results The embedding-based hand-crafted features technique outperformed the remaining similarity-based and embedding-based models. In particular, the hand-crafted technique with the extreme gradient boosting classifier achieved the highest accuracy (91.67%), followed by the same technique with the logistic regression classifier (90.26%). For this shallow embedding method, the Jaccard coefficient and the degree centrality of the original chronic disease were the most important features for comorbidity prediction. Conclusion The proposed framework can be used to predict the comorbidity of chronic diseases at an early stage of hospital admission. The predictions could therefore be valuable for medical practice, giving healthcare providers more control over their services and lowering expenses.
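The three similarity-based baselines named in the abstract have standard definitions in terms of node neighbourhoods N(.):

- Jaccard = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
- Adamic-Adar = sum over common neighbours w of 1 / log|N(w)|
- Resource allocation = sum over common neighbours w of 1 / |N(w)|

The sketch below computes them on an unweighted comorbidity network, assuming the network is given as an edge list. The disease labels are placeholders. The ranking helper is a hypothetical illustration of scoring unconnected pairs as candidate latent comorbidities. The abstract also reports the Jaccard coefficient among the most important features of the hand-crafted approach.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # A common neighbour w touches both u and v, so deg(w) >= 2 and log(deg(w)) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def resource_allocation(adj, u, v):
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


def rank_candidate_links(adj, score):
    """Score every unconnected node pair and return them highest score first."""
    nodes = sorted(adj)
    candidates = [
        (score(adj, u, v), u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if v not in adj[u]
    ]
    return sorted(candidates, reverse=True)


adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")])
print(rank_candidate_links(adj, jaccard))  # [(1.0, 'A', 'D')]
```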
Affiliation(s)
- Haohui Lu
  - School of Project Management, Faculty of Engineering, The University of Sydney, Level 2, 21 Ross Street, Forest Lodge, NSW 2037 Australia
- Shahadat Uddin
  - School of Project Management, Faculty of Engineering, The University of Sydney, Level 2, 21 Ross Street, Forest Lodge, NSW 2037 Australia
6
Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res 2023; 35:2363-2397. [PMID: 37682491 PMCID: PMC10627901 DOI: 10.1007/s40520-023-02552-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/13/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023]
Abstract
The increasing access to health data worldwide is driving a resurgence in machine learning research, including data-hungry deep learning algorithms. More computationally efficient algorithms now offer unique opportunities to enhance diagnosis, risk stratification, and individualised approaches to patient management. Such opportunities are particularly relevant for the management of older patients, a group characterised by complex multimorbidity patterns and significant interindividual variability in homeostatic capacity, organ function, and response to treatment. Clinical tools that use machine learning algorithms to determine the optimal choice of treatment are slowly gaining the necessary approval from governing bodies and being implemented in healthcare, with significant implications for virtually all medical disciplines during the next phase of digital medicine. Beyond obtaining regulatory approval, a crucial element in implementing these tools is the trust and support of the people who use them. In this context, a better understanding of artificial intelligence and machine learning algorithms among clinicians provides an appreciation of the possible benefits, risks, and uncertainties, and improves the chances of successful adoption. This review provides a broad taxonomy of machine learning algorithms, followed by a more detailed description of each algorithm class, their purpose and capabilities, and examples of their applications, particularly in geriatric medicine. Additional focus is given to the clinical implications and challenges of relying on devices with reduced interpretability and to the progress made in counteracting the latter through the development of explainable machine learning.
Affiliation(s)
- Richard J Woodman
  - Centre of Epidemiology and Biostatistics, College of Medicine and Public Health, Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia
- Arduino A Mangoni
  - Discipline of Clinical Pharmacology, College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
  - Department of Clinical Pharmacology, Flinders Medical Centre, Southern Adelaide Local Health Network, Adelaide, SA, Australia
7
Jiang L, Xia Z, Zhu R, Gong H, Wang J, Li J, Wang L. Diabetes risk prediction model based on community follow-up data using machine learning. Prev Med Rep 2023; 35:102358. [PMID: 37654514 PMCID: PMC10465943 DOI: 10.1016/j.pmedr.2023.102358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023]
Abstract
Diabetes is a chronic metabolic disease characterized by hyperglycemia. Follow-up management of patients with diabetes takes place mostly in the community, but the relationship between key lifestyle indicators recorded during community follow-up and the risk of diabetes is unclear. To explore this association, 252,176 follow-up records of patients with diabetes from 2016 to 2023 were obtained from Haizhu District, Guangzhou. The key lifestyle indicators that affect diabetes were identified from the follow-up data, and the optimal feature subset was obtained through feature selection to accurately assess the risk of diabetes. A diabetes risk assessment model based on a random forest classifier was designed using optimal feature parameter selection and algorithm model comparison, achieving an accuracy of 91.24% and an area under the ROC curve (AUC) of 97%. To improve the applicability of the model in clinical practice and everyday life, a diabetes risk scorecard was designed and tested on the original data, achieving an accuracy of 95.15% with high model reliability. The diabetes risk prediction model based on community follow-up big data mining can be used by community doctors for large-scale risk screening and early warning based on patient follow-up data, further promoting diabetes prevention and control strategies. It can also be used in wearable devices or intelligent biosensors for individual patient self-examination, in order to improve lifestyle and reduce risk factor levels.
Affiliation(s)
- Liangjun Jiang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zhenhua Xia
  - Electronics & Information School of Yangtze University, Jingzhou, China
- Ronghui Zhu
  - Shenzhen Nanshan Medical Group HQ, Shenzhen, China
- Haimei Gong
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Jing Wang
  - E-link Wisdom Co., Ltd, Shenzhen, China
- Juan Li
  - Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Lei Wang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
8
Zhou D, Qiu H, Wang L, Shen M. Risk prediction of heart failure in patients with ischemic heart disease using network analytics and stacking ensemble learning. BMC Med Inform Decis Mak 2023; 23:99. [PMID: 37221512 DOI: 10.1186/s12911-023-02196-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 05/15/2023] [Indexed: 05/25/2023]
Abstract
BACKGROUND Heart failure (HF) is a major complication following ischemic heart disease (IHD) and adversely affects outcomes. Early prediction of HF risk in patients with IHD is beneficial for timely intervention and for reducing disease burden. METHODS Two cohorts were established from hospital discharge records in Sichuan, China, during 2015-2019: cases (patients first diagnosed with IHD and then with HF, N = 11,862) and controls (IHD patients without HF, N = 25,652). A directed personal disease network (PDN) was constructed for each patient. The PDNs were then merged to generate a baseline disease network (BDN) for each of the two cohorts, identifying patients' health trajectories and complex progression patterns. The differences between the BDNs of the two cohorts were represented as a disease-specific network (DSN). Three novel network features were extracted from the PDN and DSN to represent the similarity of disease patterns and specificity trends from IHD to HF. A stacking-based ensemble model, DXLR, was proposed to predict HF risk in IHD patients using the novel network features and basic demographic features (i.e., age and sex). The SHapley Additive exPlanations (SHAP) method was applied to analyze feature importance in the DXLR model. RESULTS Compared with six traditional machine learning models, our DXLR model exhibited the highest AUC (0.934 ± 0.004), accuracy (0.857 ± 0.007), precision (0.723 ± 0.014), recall (0.892 ± 0.012) and F1 score (0.798 ± 0.010). The feature importance analysis showed that the novel network features ranked as the top three features, playing a notable role in predicting HF risk in IHD patients. The feature comparison experiment also indicated that our novel network features were superior to those proposed by the state-of-the-art study in improving the performance of the prediction model, with an increase in AUC of 19.9%, in accuracy of 18.7%, in precision of 30.7%, in recall of 37.4%, and in F1 score of 33.7%.
CONCLUSIONS Our proposed approach, which combines network analytics and ensemble learning, effectively predicts HF risk in patients with IHD. This highlights the potential value of network-based machine learning for disease risk prediction using administrative data.
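The construction of the PDN, BDN, and DSN can be illustrated with a small sketch. The abstract does not define the edge weights or the three network features, so everything here is an assumption:

- A PDN links consecutive distinct diagnoses in time order.
- A cohort's BDN weights each edge by the fraction of patients whose PDN contains it.
- The DSN is the case BDN minus the control BDN.
- `dsn_score` is a single hypothetical feature, not one of the paper's three.

The ICD-10-style codes are illustrative only.

```python
from collections import Counter


def personal_disease_network(diagnoses):
    """Directed edges between consecutive distinct diagnoses, in time order.

    diagnoses: list of (time, disease_code) tuples for one patient.
    """
    codes = [code for _, code in sorted(diagnoses)]
    return {(a, b) for a, b in zip(codes, codes[1:]) if a != b}


def baseline_disease_network(pdns):
    """Merge PDNs: edge weight = fraction of cohort patients whose PDN has the edge."""
    counts = Counter(edge for pdn in pdns for edge in pdn)
    return {edge: c / len(pdns) for edge, c in counts.items()}


def disease_specific_network(case_bdn, control_bdn):
    """Edge-wise difference of BDNs (positive weight = more typical of cases)."""
    edges = set(case_bdn) | set(control_bdn)
    return {e: case_bdn.get(e, 0.0) - control_bdn.get(e, 0.0) for e in edges}


def dsn_score(pdn, dsn):
    """Hypothetical patient feature: total DSN weight of the patient's own edges."""
    return sum(dsn.get(edge, 0.0) for edge in pdn)


cases = [[(1, "I25"), (2, "I10"), (3, "I50")], [(1, "I25"), (2, "I50")]]
controls = [[(1, "I25"), (2, "I10")], [(1, "I25"), (2, "E11")]]
case_bdn = baseline_disease_network([personal_disease_network(d) for d in cases])
control_bdn = baseline_disease_network([personal_disease_network(d) for d in controls])
dsn = disease_specific_network(case_bdn, control_bdn)
new_patient = personal_disease_network([(1, "I25"), (2, "I10"), (3, "I50")])
print(dsn_score(new_patient, dsn))  # 0.5
```

Features of this kind, together with age and sex, would then be fed to the classifier. The stacking step of DXLR itself is not reproduced here.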
Affiliation(s)
- Dejia Zhou
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu, Sichuan, 611731, P.R. China
- Hang Qiu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu, Sichuan, 611731, P.R. China
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China
- Liya Wang
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China
- Minghui Shen
  - Health Information Center of Sichuan Province, Chengdu, China
9
Kumar R, Maheshwari S, Sharma A, Linda S, Kumar S, Chatterjee I. Ensemble learning-based early detection of influenza disease. Multimed Tools Appl 2023:1-21. [PMID: 37362719 PMCID: PMC10199437 DOI: 10.1007/s11042-023-15848-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 08/16/2022] [Accepted: 05/15/2023] [Indexed: 06/28/2023]
Abstract
Seasonal influenza is a respiratory illness that affects all age groups worldwide in many ways. Its symptoms include fever, chills, aches, pains, headaches, fatigue, cough, and weakness. Seasonal influenza can cause mild to severe illness and sometimes leads to death. Early detection of influenza is an important research area, and various studies show that machine learning techniques for this task have attracted many researchers' attention. In this paper, influenza is detected early across all age groups using various machine learning techniques. The Influenza Research Database and Human Surveillance Records datasets are used. Data analysis is undertaken, and ensemble-based stacked algorithms are implemented on the whole dataset. The performance of the different models is evaluated using several performance metrics. Overall, the study proposes efficient machine learning models that can be implemented to provide a cheaper and quicker diagnostic tool for detecting influenza.
Affiliation(s)
- Ranjan Kumar
  - Department of Computer Science, Aryabhatta College, University of Delhi, Delhi, 110021 India
- Sajal Maheshwari
  - Department of Computer Science, Aryabhatta College, University of Delhi, Delhi, 110021 India
- Anushka Sharma
  - Department of Computer Science, Aryabhatta College, University of Delhi, Delhi, 110021 India
- Sonal Linda
  - Department of Computer Science, Aryabhatta College, University of Delhi, Delhi, 110021 India
- Subhash Kumar
  - Department of Physics, Acharya Narendra Dev College, University of Delhi, Delhi, 110019 India
- Indranath Chatterjee
  - Department of Computer Engineering, Tongmyong University, Busan, 48520 South Korea
  - School of Technology, Woxsen University, Hyderabad, Telangana 500033 India
10
Choudhary GI, Fränti P. Predicting onset of disease progression using temporal disease occurrence networks. Int J Med Inform 2023; 175:105068. [PMID: 37104895 DOI: 10.1016/j.ijmedinf.2023.105068] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/04/2022] [Revised: 03/27/2023] [Accepted: 04/05/2023] [Indexed: 04/29/2023]
Abstract
OBJECTIVE Early recognition and prevention are crucial for reducing the risk of disease progression. This study aimed to develop a novel technique based on a temporal disease occurrence network to analyze and predict disease progression. METHODS This study used a total of 3.9 million patient records. Patient health records were transformed into temporal disease occurrence networks, and a supervised depth-first search was used to find frequent disease sequences to predict the onset of disease progression. Diseases were represented as nodes in the network, and edges between nodes represented diseases that co-occurred in temporal order within a patient cohort. The node- and edge-level attributes contained meta-information about patients' gender, age group, and identity as labels where the disease occurred. These attributes guided the depth-first search to identify frequent disease occurrences in specific genders and age groups. The patient's history was matched against the most frequent disease occurrences, and the obtained sequences were merged to generate a ranked list of diseases with their conditional probability and relative risk. RESULTS The proposed method outperformed the other methods compared. Specifically, when predicting a single disease, the method achieved an area under the receiver operating characteristic curve (AUC) of 0.65 and an F1-score of 0.11. When predicting a set of diseases relative to ground truth, it achieved an AUC of 0.68 and an F1-score of 0.13. CONCLUSION The ranked list generated by the proposed method, which includes the probability of occurrence and relative risk score, can provide physicians with valuable information about the sequential development of diseases in patients. This information can help physicians take timely preventive measures based on the best available information.
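The conditional probability and relative risk used to rank follow-on diseases can be sketched from temporally ordered histories. This minimal example makes the following assumptions:

- Each patient history is an ordered list of diagnoses.
- P(b | a) is the share of patients with disease a in whom disease b appears later.
- Relative risk compares that share with the rate of b among patients who never had a.

It omits the paper's supervised depth-first search and its gender and age-group stratification. The helper names and toy data are hypothetical.

```python
def follows(history, a, b):
    """True if disease b appears after the first occurrence of disease a."""
    if a not in history:
        return False
    return b in history[history.index(a) + 1:]


def conditional_probability(histories, a, b):
    """P(b occurs after a | patient has a)."""
    with_a = [h for h in histories if a in h]
    if not with_a:
        return 0.0
    return sum(follows(h, a, b) for h in with_a) / len(with_a)


def relative_risk(histories, a, b):
    """Risk of b after a, relative to the risk of b among patients without a."""
    without_a = [h for h in histories if a not in h]
    if not without_a:
        return float("nan")
    unexposed = sum(b in h for h in without_a) / len(without_a)
    exposed = conditional_probability(histories, a, b)
    if unexposed == 0:
        return float("inf") if exposed > 0 else float("nan")
    return exposed / unexposed


def rank_next_diseases(histories, patient_history, candidates):
    """Rank diseases the patient does not yet have by their best conditional
    probability given any disease already in the history (ties alphabetical)."""
    scored = []
    for b in candidates:
        if b in patient_history:
            continue
        best = max(
            (conditional_probability(histories, a, b) for a in patient_history),
            default=0.0,
        )
        scored.append((b, best))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


histories = [
    ["hypertension", "diabetes", "ckd"],
    ["hypertension", "diabetes"],
    ["hypertension", "ckd"],
    ["diabetes"],
    ["asthma"],
]
# ckd and diabetes tie at 2/3 given hypertension; asthma scores 0.
print(rank_next_diseases(histories, ["hypertension"], ["diabetes", "ckd", "asthma"]))
print(relative_risk(histories, "hypertension", "diabetes"))  # (2/3) / (1/2)
```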
Affiliation(s)
- P Fränti
  - School of Computing, University of Eastern Finland
11
Lu H, Uddin S. Disease Prediction Using Graph Machine Learning Based on Electronic Health Data: A Review of Approaches and Trends. Healthcare (Basel) 2023; 11:healthcare11071031. [PMID: 37046958 PMCID: PMC10094099 DOI: 10.3390/healthcare11071031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 03/11/2023] [Accepted: 04/01/2023] [Indexed: 04/07/2023]
Abstract
Graph machine learning (ML) methods have recently attracted great attention and have made significant progress in graph applications. To date, most graph ML approaches have been evaluated on social networks, but they have not been comprehensively reviewed in the health informatics domain. This study reviews graph ML methods and their applications to disease prediction based on electronic health data at two levels: node classification and link prediction. Commonly used graph ML approaches at these two levels are shallow embedding and graph neural networks (GNN). This study performs a comprehensive search to identify articles that applied or proposed graph ML models for disease prediction using electronic health data. We considered journals and conferences from four digital library databases (i.e., PubMed, Scopus, ACM Digital Library, and IEEE Xplore). Based on the identified articles, we review the present status of and trends in graph ML approaches for disease prediction using electronic health data. Even though GNN-based models have achieved outstanding results compared with traditional ML methods in a wide range of disease prediction tasks, they still face interpretability and dynamic graph challenges. Although the field of disease prediction using ML techniques is still emerging, GNN-based models have the potential to be an excellent approach for disease prediction, which can be used in medical diagnosis, treatment, and prognosis.
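As a concrete reference point for the GNN family, the propagation rule of the widely used graph convolutional network (GCN) of Kipf and Welling is shown below. It is paired with a common inner-product decoder for link prediction. This is a standard textbook formulation offered for illustration, not a model taken from the reviewed articles. Here $A$ is the adjacency matrix of an $N$-node graph (for example, patients or diseases), $H^{(l)}$ holds the node representations at layer $l$ (with $H^{(0)}$ the input node features), $W^{(l)}$ is a learnable weight matrix, $\sigma$ is a nonlinearity such as ReLU, and $z_u$ is the final embedding of node $u$.

$$
\tilde{A} = A + I_N, \qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)
$$

$$
\hat{y}_{uv} = \operatorname{sigmoid}\!\left(z_u^{\top} z_v\right)
$$

For node classification, a softmax over the last layer's output gives per-node class probabilities. For link prediction, $\hat{y}_{uv}$ scores a candidate edge, such as a possible comorbidity between diseases $u$ and $v$.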
Affiliation(s)
- Haohui Lu
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
- Shahadat Uddin
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
12
Qi H, Song X, Liu S, Zhang Y, Wong KKL. KFPredict: An ensemble learning prediction framework for diabetes based on fusion of key features. Comput Methods Programs Biomed 2023; 231:107378. [PMID: 36731312 DOI: 10.1016/j.cmpb.2023.107378] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/14/2022] [Revised: 10/30/2022] [Accepted: 01/25/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE Diabetes is a disease that requires early detection and early treatment; complications are likely to occur in the late stages of the disease and threaten patients' lives. Therefore, to diagnose diabetic patients as early as possible, it is necessary to establish a model that can accurately predict diabetes. METHODOLOGY This paper proposes an ensemble learning framework, KFPredict, which combines multi-input models with key features and machine learning algorithms. We first propose a multi-input neural network model (KF_NN) that fuses key features and uses a decision tree-based recursive feature elimination algorithm and the correlation coefficient method to screen out the key feature inputs and secondary feature inputs of the model. We then ensemble KF_NN with three machine learning algorithms (i.e., Support Vector Machine, Random Forest and K-Nearest Neighbors) by soft voting to form our classifier for diabetes prediction. RESULTS Our framework demonstrates good prediction results on the test set, with a sensitivity of 0.85, a specificity of 0.98, and an accuracy of 93.5%. Compared with single prediction methods, KFPredict's accuracy is up to 18.18% higher. We also compared KFPredict with existing prediction methods; it still shows good prediction performance, with accuracy improved by up to 14.93%. CONCLUSION This paper constructs a diabetes prediction framework that combines multi-input models with key features and machine learning algorithms. Using the PIMA diabetes dataset as test data, the experiments show that the framework achieves good prediction results.
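The soft-voting step described above can be illustrated with a minimal sketch. It assumes each base model already outputs class probabilities for a patient; the four probability rows are invented numbers standing in for KF_NN, SVM, RF and KNN, and `soft_vote` is a hypothetical helper, not part of KFPredict.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[List[float], int]:
    """Soft voting: average per-class probabilities across models.

    model_probs[m][c] is model m's probability for class c for one patient.
    Returns the (optionally weighted) mean probability per class and the
    index of the class with the highest mean probability.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    if weights is None:
        weights = [1.0] * n_models  # equal weight for every model
    total = sum(weights)
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return averaged, winner


# Hypothetical probabilities for one patient; classes = [non-diabetic, diabetic].
probs = [
    [0.30, 0.70],  # stand-in for the multi-input neural network
    [0.55, 0.45],  # stand-in for SVM
    [0.40, 0.60],  # stand-in for random forest
    [0.60, 0.40],  # stand-in for KNN
]
avg, label = soft_vote(probs)
print(avg, label)
```

Here a hard majority vote would be a 2-2 tie, whereas averaging the probabilities gives the diabetic class about 0.54 and the non-diabetic class about 0.46, so `label` is 1. Passing `weights` yields a weighted soft vote; whether KFPredict weights its members is not stated in the abstract.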
Affiliation(s)
- Huamei Qi
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiaomeng Song
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Shengzong Liu
- School of Information Technology and Management, Hunan University of Finance and Economics, Changsha, 410075, China
- Yan Zhang
- Department of Computing, School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow, G4 0BA, UK
- Kelvin K L Wong
- School of Electrical and Electronic Engineering, The University of Adelaide, North Terrace, Adelaide SA 5000, Australia
13
Abstract
A broad range of medical diagnoses is based on analyzing disease images obtained through high-tech digital devices. The application of artificial intelligence (AI) to the assessment of medical images has enabled accurate evaluations to be performed automatically, which in turn has reduced physicians' workload, decreased diagnostic errors and time, and improved performance in the prediction and detection of various diseases. AI techniques based on medical image processing are an essential area of research that uses advanced computer algorithms for prediction, diagnosis, and treatment planning, with a remarkable impact on decision-making procedures. Machine Learning (ML) and Deep Learning (DL), as advanced AI techniques, are the two main subfields applied in healthcare to diagnose diseases, discover medications, and identify patient risk factors. The success of ML and DL algorithms has been accompanied by the advancement of electronic medical records and big data technologies in recent years. ML includes neural network and fuzzy logic algorithms with various applications in automating forecasting and diagnosis. DL is an ML technique that, unlike classical neural network algorithms, does not rely on expert feature extraction. DL algorithms with high-performance computation give promising results in medical image analysis tasks such as fusion, segmentation, recording, and classification. Support Vector Machine (SVM) as an ML method and Convolutional Neural Network (CNN) as a DL method are usually the most widely used techniques for analyzing and diagnosing diseases. This review aims to cover recent AI techniques for diagnosing and predicting numerous diseases, such as cancers and heart, lung, skin, genetic, and neural disorders, which perform more precisely than specialists and without human error. AI's existing challenges and limitations in the medical area are also discussed.
Affiliation(s)
- Nafiseh Ghaffar Nia
- College of Engineering and Computer Science, The University of Tennessee at Chattanooga, Chattanooga, TN 37403 USA
- Erkan Kaplanoglu
- College of Engineering and Computer Science, The University of Tennessee at Chattanooga, Chattanooga, TN 37403 USA
- Ahad Nasab
- College of Engineering and Computer Science, The University of Tennessee at Chattanooga, Chattanooga, TN 37403 USA
14
Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metab Syndr 2022; 14:196. [PMID: 36572938 PMCID: PMC9793536 DOI: 10.1186/s13098-022-00969-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022]
Abstract
Diabetes, a metabolic illness, is characterized by elevated blood glucose. This abnormal increase can cause critical damage to other organs such as the kidneys, eyes, heart, nerves, and blood vessels. Therefore, its prediction, prognosis, and management are essential to prevent harmful effects and to recommend more useful treatments. For these goals, machine learning algorithms have attracted considerable attention and have been developed successfully. This review surveys recently proposed machine learning (ML) and deep learning (DL) models for these objectives. The reported results indicate that ML and DL algorithms are promising approaches for controlling blood glucose and diabetes. However, they should be improved and evaluated on large datasets to confirm their applicability.
15
Yan SR, Guo W, Mohammadzadeh A, Rathinasamy S. Optimal deep learning control for modernized microgrids. Appl Intell 2022. [DOI: 10.1007/s10489-022-04298-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
16
Rashid J, Batool S, Kim J, Wasif Nisar M, Hussain A, Juneja S, Kushwaha R. An Augmented Artificial Intelligence Approach for Chronic Diseases Prediction. Front Public Health 2022; 10:860396. [PMID: 35433587 PMCID: PMC9008324 DOI: 10.3389/fpubh.2022.860396] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/22/2022] [Accepted: 02/22/2022] [Indexed: 12/23/2022]
Abstract
Chronic diseases are increasing in prevalence and mortality worldwide. Early diagnosis has therefore become an important research area for improving patient survival rates. Several studies have reported classification approaches for predicting specific diseases. In this paper, we propose a novel augmented artificial intelligence approach using an artificial neural network (ANN) with particle swarm optimization (PSO) to predict five prevalent chronic diseases: breast cancer, diabetes, heart attack, hepatitis, and kidney disease. Seven classification algorithms are compared to evaluate the proposed model's prediction performance. The ANN prediction model constructed with a PSO-based feature extraction approach outperforms other state-of-the-art classification approaches in accuracy. Our proposed approach achieved the highest accuracy, 99.67%, with PSO. However, the classification model's performance is found to depend on the attributes of the data used for classification. Our results on various chronic disease datasets are compared with, and shown to outperform, other benchmark approaches. In addition, our optimized ANN processing is shown to require less time than random forest (RF), deep learning, and support vector machine (SVM) based methods. Our study could play a role in the early diagnosis of chronic diseases in hospitals, including through the development of online diagnosis systems.
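PSO-based feature extraction, treated here as feature-subset selection, can be sketched with the binary variant of particle swarm optimization: each particle is a 0/1 mask over features, and a sigmoid turns each velocity into the probability that a bit is 1. This is a generic sketch, not the authors' implementation; `binary_pso`, its hyperparameters (`w`, `c1`, `c2`, `v_max`) and the toy relevance scores are assumptions. In the paper's setting the fitness would presumably be the ANN's validation accuracy on the selected features.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_pso(
    n_features: int,
    fitness: Callable[[List[int]], float],
    n_particles: int = 20,
    n_iters: int = 50,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    v_max: float = 4.0,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Binary PSO over feature masks; higher fitness is better."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best masks
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that bit d is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical per-feature relevance scores; each selected feature costs 0.3,
# so the best subset keeps features 0, 2 and 4 (fitness 1.5).
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.0]


def toy_fitness(mask: List[int]) -> float:
    return sum(r - 0.3 for r, bit in zip(relevance, mask) if bit)


best_mask, best_fit = binary_pso(len(relevance), toy_fitness)
print(best_mask, best_fit)
```

Because the global best is only replaced by a strictly better mask, extra iterations can never make the returned fitness worse; the per-feature penalty is one simple way to favour smaller subsets.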
Affiliation(s)
- Junaid Rashid
- Department of Computer Science and Engineering, Kongju National University, Cheonan, South Korea
- Saba Batool
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
- Jungeun Kim
- Department of Computer Science and Engineering, Kongju National University, Cheonan, South Korea
- *Correspondence: Jungeun Kim
- Muhammad Wasif Nisar
- Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
- Amir Hussain
- Data Science and Cyber Analytics Research Group, Edinburgh Napier University, Edinburgh, United Kingdom
- Sapna Juneja
- Department of Computer Science, KIET Group of Institutions, Ghaziabad, India
- Riti Kushwaha
- Department of Computer Science, Bennett University, Greater Noida, India
17
Hu Z, Qiu H, Wang L, Shen M. Network analytics and machine learning for predicting length of stay in elderly patients with chronic diseases at point of admission. BMC Med Inform Decis Mak 2022; 22:62. [PMID: 35272654 PMCID: PMC8915508 DOI: 10.1186/s12911-022-01802-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Accepted: 03/07/2022] [Indexed: 11/13/2022]
Abstract
Background An aging population with a burden of chronic diseases puts increasing pressure on health care systems. Early prediction of hospital length of stay (LOS) can help optimize the allocation of medical resources and improve healthcare quality. However, the data available at the point of admission (PoA) are limited, making it difficult to forecast the LOS accurately. Methods In this study, we proposed a novel approach combining network analytics and machine learning to predict the LOS of elderly patients with chronic diseases at the PoA. Two networks, a multimorbidity network (MN) and a patient similarity network (PSN), were constructed, and novel network features were created from them. Five machine learning models (eXtreme Gradient Boosting, Gradient Boosting Decision Tree, Random Forest, Linear Support Vector Machine, and Deep Neural Network) were developed with different input feature sets to compare their performance. Results The experimental results indicated that the network features significantly improved the performance of the prediction models, suggesting that the MN and PSN are useful for LOS prediction. Conclusion Our predictive framework, which integrates network science with data mining, can forecast the LOS effectively at the PoA and provide decision support for hospital managers, highlighting the potential value of network-based machine learning in the healthcare field.
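The two networks can be sketched in a few lines under assumptions the abstract does not spell out: the multimorbidity network (MN) is taken as a disease co-occurrence graph whose edge weight counts patients with both diagnoses, and the PSN edge weight is taken as the Jaccard index of two patients' diagnosis sets. The summed-degree feature `patient_mn_feature` and the diagnosis labels are hypothetical illustrations, not the paper's feature definitions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def build_multimorbidity_network(patients: List[Set[str]]) -> Counter:
    """Disease-disease edges weighted by the number of patients who have both."""
    edges: Counter = Counter()
    for diagnoses in patients:
        for a, b in combinations(sorted(diagnoses), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges: Counter) -> Dict[str, int]:
    """Sum of edge weights incident to each disease node."""
    degree: Counter = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)


def patient_mn_feature(diagnoses: Set[str], degree: Dict[str, int]) -> int:
    """Hypothetical patient-level MN feature: summed weighted degree of the patient's diseases."""
    return sum(degree.get(d, 0) for d in diagnoses)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed PSN edge weight: overlap of two patients' diagnosis sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical admissions, one set of diagnosis labels per patient.
patients = [{"HTN", "T2D"}, {"HTN", "T2D", "CKD"}, {"COPD"}, {"T2D", "CKD"}]
edges = build_multimorbidity_network(patients)
degree = weighted_degree(edges)
features = [patient_mn_feature(p, degree) for p in patients]
psn_weight = jaccard(patients[0], patients[3])
print(dict(edges), features, psn_weight)
```

Features like these are available at admission because they depend only on the recorded diagnoses, which is what makes network features usable at the PoA; they would simply be appended to each patient's input vector for the five models.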
Affiliation(s)
- Zhixu Hu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, People's Republic of China
- Hang Qiu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, People's Republic of China; Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, People's Republic of China
- Liya Wang
- Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, People's Republic of China
- Minghui Shen
- Health Information Center of Sichuan Province, Chengdu, People's Republic of China
18
Howlader KC, Satu MS, Awal MA, Islam MR, Islam SMS, Quinn JMW, Moni MA. Machine learning models for classification and identification of significant attributes to detect type 2 diabetes. Health Inf Sci Syst 2022; 10. [PMID: 35178244 PMCID: PMC8828812 DOI: 10.1007/s13755-021-00168-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2021] [Accepted: 10/27/2021] [Indexed: 12/15/2022]
Abstract
Type 2 Diabetes (T2D) is a chronic disease characterized by abnormally high blood glucose levels due to insulin resistance and reduced pancreatic insulin production. The challenge of this work is to identify T2D-associated features that can distinguish T2D sub-types for prognosis and treatment purposes. We thus employed machine learning (ML) techniques to categorize T2D patients using the Pima Indian Diabetes Dataset from the Kaggle ML repository. After data preprocessing, several feature selection techniques were used to extract feature subsets, and a range of classification techniques were used to analyze them. We then compared the classification results to identify the best classifiers by considering accuracy, kappa statistic, area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and logarithmic loss (logloss). To evaluate the performance of the different classifiers, we investigated their outcomes using summary statistics with a resampling distribution. Generalized Boosted Regression modeling showed the highest accuracy (90.91%), kappa statistic (78.77%), and specificity (85.19%). In addition, Sparse Distance Weighted Discrimination, the Generalized Additive Model using LOESS, and Boosted Generalized Additive Models gave the maximum sensitivity (100%), the highest AUROC (95.26%), and the lowest logarithmic loss (30.98%), respectively. Notably, the Generalized Additive Model using LOESS was the top-ranked algorithm according to the non-parametric Friedman test. Of the features identified by these machine learning models, glucose level, body mass index, diabetes pedigree function, and age were consistently identified as the best and most frequently accurate outcome predictors. These results indicate the utility of ML methods for constructing improved prediction models for T2D and for identifying outcome predictors in this Pima Indian population.
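Most of the evaluation metrics listed above can be derived from a classifier's predicted probabilities and the 2x2 confusion matrix. The sketch below uses the standard definitions of accuracy, Cohen's kappa, sensitivity, specificity and binary log loss (AUROC is omitted); the 0.5 threshold, the clipping constant `eps`, and the example labels and probabilities are assumptions for illustration, not values from the study.

```python
import math
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    threshold: float = 0.5,
    eps: float = 1e-15,
) -> Dict[str, float]:
    """Accuracy, Cohen's kappa, sensitivity, specificity and log loss for 0/1 labels."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    accuracy = (tp + tn) / n
    # Expected chance agreement from the row/column totals of the confusion matrix.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate

    # Binary log loss; probabilities are clipped to avoid log(0).
    total = 0.0
    for t, p in zip(y_true, y_prob):
        q = min(max(p, eps), 1 - eps)
        total += t * math.log(q) + (1 - t) * math.log(1 - q)
    logloss = -total / n

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "logloss": logloss,
    }


# Hypothetical labels (1 = diabetic) and predicted probabilities for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6, 0.4, 0.7]
print(binary_metrics(y_true, y_prob))
```

For these eight patients the matrix is TP = 3, FN = 1, TN = 3, FP = 1, giving accuracy, sensitivity and specificity of 0.75 and a kappa of 0.5, since chance agreement is 0.5 when both classes are equally frequent in labels and predictions.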
19
Lu H, Uddin S. A disease network-based recommender system framework for predictive risk modelling of chronic diseases and their comorbidities. Appl Intell 2022. [DOI: 10.1007/s10489-021-02963-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
20
Abstract
Chronic disease prediction is a critical task in healthcare. Existing studies fulfil this requirement by applying machine learning techniques to patient features, but they suffer from high-dimensional data and a high level of bias. To address these issues, we propose a framework for predicting chronic disease based on Graph Neural Networks (GNNs). We begin by projecting a patient-disease bipartite graph to create a weighted patient network (WPN) that captures latent relationships among patients. We then use GNN-based techniques to build prediction models. These models use features extracted from the WPN to create robust patient representations for chronic disease prediction. We compare GNN-based models with machine learning methods on cardiovascular disease and chronic pulmonary disease prediction. The results show that our framework improves the accuracy of chronic disease prediction. The model with attention mechanisms achieves an accuracy of 93.49% for cardiovascular disease prediction and 89.15% for chronic pulmonary disease prediction. Furthermore, visualising the last hidden layers of the GNN-based models reveals patterns that separate the two cohorts, demonstrating the discriminative strength of the framework. The proposed framework can help stakeholders improve health management systems for patients at risk of developing chronic diseases and conditions.
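The bipartite projection, and the kind of neighbourhood aggregation a GNN performs on the resulting WPN, can be sketched as follows. Assumptions: the WPN edge weight is the number of diseases two patients share, and `propagate` is a single parameter-free, GCN-style weighted-averaging step. Real GNN layers, including the attention-based model reported above, learn weight matrices and (for attention) the neighbour weights themselves. All patient identifiers, diseases and feature values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def project_to_patients(patient_diseases: Dict[str, Set[str]]) -> Dict[Edge, int]:
    """Project a patient-disease bipartite graph onto patients.

    Two patients are linked if they share a disease; the edge weight is the
    number of diseases they share.
    """
    disease_to_patients: Dict[str, Set[str]] = defaultdict(set)
    for patient, diseases in patient_diseases.items():
        for disease in diseases:
            disease_to_patients[disease].add(patient)
    weights: Dict[Edge, int] = defaultdict(int)
    for members in disease_to_patients.values():
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


def propagate(features: Dict[str, List[float]], weights: Dict[Edge, int]) -> Dict[str, List[float]]:
    """One parameter-free, GCN-style step: weighted mean over self (weight 1) and neighbours."""
    neighbours: Dict[str, List[Tuple[str, float]]] = {p: [(p, 1.0)] for p in features}
    for (a, b), w in weights.items():
        neighbours[a].append((b, float(w)))
        neighbours[b].append((a, float(w)))
    out: Dict[str, List[float]] = {}
    for p, nbrs in neighbours.items():
        total = sum(w for _, w in nbrs)
        dim = len(features[p])
        out[p] = [sum(w * features[q][k] for q, w in nbrs) / total for k in range(dim)]
    return out


# Hypothetical cohort and a single (already scaled) feature per patient.
patient_diseases = {
    "p1": {"HTN", "T2D"},
    "p2": {"T2D", "CKD"},
    "p3": {"HTN", "T2D", "CKD"},
    "p4": {"COPD"},
}
wpn = project_to_patients(patient_diseases)
features = {"p1": [1.0], "p2": [0.0], "p3": [0.8], "p4": [0.2]}
print(wpn)
print(propagate(features, wpn))
```

After one step, p1 and p2 both move to 0.65 and p3 to 0.56 because they are strongly connected through shared diagnoses, while the isolated p4 keeps its own value; this smoothing over similar patients is the intuition behind WPN-based representations.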
Affiliation(s)
- Haohui Lu
- School of Project Management, Faculty of Engineering, The University of Sydney, 21 Ross St, Forest Lodge, NSW, 2037, Australia
- Shahadat Uddin
- School of Project Management, Faculty of Engineering, The University of Sydney, 21 Ross St, Forest Lodge, NSW, 2037, Australia
21
Uddin S, Imam T, Hossain ME, Gide E, Sianaki OA, Moni MA, Mohammed AA, Vandana V. Intelligent type 2 diabetes risk prediction from administrative claim data. Inform Health Soc Care 2021; 47:243-257. [PMID: 34672859 DOI: 10.1080/17538157.2021.1988957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Type 2 diabetes is a chronic, costly disease and a serious global population health problem. Yet the disease is manageable and preventable if there is an early warning. This study aims to apply supervised machine learning algorithms to develop predictive models for type 2 diabetes using administrative claim data. Following guidelines from the Elixhauser Comorbidity Index, 31 variables were considered. Five supervised machine learning algorithms were used to develop type 2 diabetes prediction models. Principal component analysis was applied to rank the variables' importance in the predictive models. Random forest (RF) showed the highest accuracy (85.06%) among the algorithms, closely followed by k-nearest neighbor (84.48%). The analysis further revealed RF to be a high-performing algorithm irrespective of data imbalance. As revealed by the principal component analysis, patient age is the most important predictor of type 2 diabetes, followed by a comorbid condition (i.e., solid tumor without metastasis). This study's finding of RF as the best-performing classifier is consistent with other works reporting the promise of tree-based algorithms for public data. Thus, the outcome can guide the design of automated surveillance of patients at risk of developing diabetes from administrative claim information and will be useful to health regulators and insurers.
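The abstract does not say how principal component analysis was used to rank variables. One common heuristic, assumed here, ranks variables by the absolute value of their loadings on the first principal component; the sketch finds that component by power iteration on the sample covariance matrix. In practice variables would usually be standardized first so that scale does not dominate; the toy columns below share a scale, and `first_pc_loadings`, `rank_variables` and the variable names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def first_pc_loadings(rows: Sequence[Sequence[float]], n_iter: int = 200, seed: int = 0) -> List[float]:
    """Loadings of the first principal component via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in range(d)]  # random positive start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # start vector lies in the null space; nothing to rank
        v = [x / norm for x in w]
    return v


def rank_variables(names: Sequence[str], loadings: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort variables by absolute first-PC loading, largest first."""
    return sorted(zip(names, loadings), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data on a common scale: var_a and var_b move together, var_c barely varies.
names = ["var_a", "var_b", "var_c"]
rows = [[1, 1, 0], [2, 2, 1], [3, 3, 0], [4, 4, 1], [5, 5, 0]]
loadings = first_pc_loadings(rows)
print(rank_variables(names, loadings))
```

Because `var_a` and `var_b` move together and carry most of the variance, they share the top rank with loadings near 0.707, while the weakly varying, uncorrelated `var_c` has a near-zero loading.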
Affiliation(s)
- Shahadat Uddin
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW, Australia
- Tasadduq Imam
- School of Business and Law, CQUniversity, Melbourne, VIC, Australia
- Md Ekramul Hossain
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW, Australia
- Ergun Gide
- School of Engineering and Technology, CQUniversity, Sydney, NSW, Australia
- Omid Ameri Sianaki
- College of Engineering and Science, Victoria University, Sydney, NSW, Australia; Victoria University Business School, Melbourne, Victoria, Australia
- Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, Australia
- Vandana Vandana
- College of Engineering and Science, Victoria University, Sydney, NSW, Australia