1
Yang Q, Meng W, Zhuang P, Anton S, Wu Y, Yin R. AutoRADP: An Interpretable Deep Learning Framework to Predict Rapid Progression for Alzheimer's Disease and Related Dementias Using Electronic Health Records. medRxiv: The Preprint Server for Health Sciences 2025:2025.04.06.25325337. [PMID: 40297450 PMCID: PMC12036374 DOI: 10.1101/2025.04.06.25325337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2025]
Abstract
Alzheimer's disease (AD) and AD-related dementias (ADRD) exhibit heterogeneous progression rates, with rapid progression (RP) posing significant challenges for timely intervention and treatment. The increasingly available patient-centered electronic health records (EHRs) have made it possible to develop advanced machine learning models for risk prediction of disease progression by leveraging comprehensive clinical, demographic, and laboratory data. In this study, we propose AutoRADP, an interpretable autoencoder-based framework that predicts rapid AD/ADRD progression using both structured and unstructured EHR data from UFHealth. AutoRADP incorporates a rule-based natural language processing method to extract critical cognitive assessments from clinical notes, combined with feature selection techniques to identify essential structured EHR features. To address the data imbalance issue, we implement a hybrid sampling strategy that combines similarity-based and clustering-based upsampling. Additionally, by utilizing SHapley Additive exPlanations (SHAP) values, we provide interpretable predictions, shedding light on the key factors driving the rapid progression of AD/ADRD. We demonstrate that AutoRADP outperforms existing methods, highlighting the potential of our framework to advance precision medicine by enabling accurate and interpretable predictions of rapid AD/ADRD progression, and thereby supporting improved clinical decision-making and personalized interventions.
2
Qi X, Han Y, Zhang Y, Ma N, Liu Z, Zhai J, Guo H. Development and validation of a support vector machine-based nomogram for diagnosis of obstetric antiphospholipid syndrome. Clin Chim Acta 2025; 568:120122. [PMID: 39765286 DOI: 10.1016/j.cca.2025.120122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 01/02/2025] [Accepted: 01/04/2025] [Indexed: 01/12/2025]
Abstract
BACKGROUND: Antiphospholipid Syndrome (APS) is a systemic autoimmune disorder characterized by arterial or venous thrombosis and/or pregnancy complications. This study aims to develop a diagnostic model for Obstetric APS (OAPS) using the Support Vector Machine (SVM) algorithm.
METHODS: Data were retrospectively collected from 102 patients with OAPS and 80 healthy controls (HC). Participants were randomly allocated to a training set and a validation set. The training set comprised 72 OAPS patients and 52 HCs, while the validation set included 30 OAPS patients and 24 HCs. Univariate logistic regression analysis and the LASSO method were employed to screen feature variables. The selected feature variables were then used to construct a diagnostic model based on the SVM algorithm, which was then validated within the training set.
RESULTS: An optimal subset of 12 clinical features was selected. This set of features showed strong predictive performance in both the training and validation datasets, with Area Under the Curve (AUC) values of 0.969 and 0.942, sensitivities of 0.875 and 0.867, and specificities of 0.929 and 0.875, respectively. Furthermore, the nomogram yielded a Concordance Index (C-index) of 0.851 across the entire dataset. Decision curve analysis demonstrated that the combined nomogram and the TAT nomogram offer greater net benefit than nomograms based on other individual clinical indicators within the dataset.
CONCLUSION: The SVM-based model can effectively diagnose patients with OAPS.
Affiliation(s)
- Xuan Qi
  - Department of Rheumatism and Immunology, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Yan Han
  - Department of Fertility, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Yue Zhang
  - Department of Rheumatism and Immunology, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Nianqiang Ma
  - Department of Emergency, The Fourth Hospital of Shijiazhuang, Shijiazhuang, Hebei 050000, PR China
- Zhifeng Liu
  - Department of Rheumatism and Immunology, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Jiajia Zhai
  - Department of Fertility, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China
- Huifang Guo
  - Department of Rheumatism and Immunology, The Second Hospital of Hebei Medical University, Shijiazhuang, Hebei 050000, PR China.
3
Cardamone NC, Olfson M, Schmutte T, Ungar L, Liu T, Cullen SW, Williams NJ, Marcus SC. Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study. JMIR Med Inform 2025; 13:e65454. [PMID: 39864953 PMCID: PMC11884378 DOI: 10.2196/65454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 11/25/2024] [Accepted: 11/30/2024] [Indexed: 01/28/2025] Open
Abstract
Background: Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable.
Objective: This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR dataset of individuals with mental health disorders seen in emergency departments (EDs).
Methods: Using a dataset from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks as follows: (1) classify terms into "mental health" or "physical health", (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories.
Results: There was high agreement between the LLM and clinical experts when categorizing 4553 terms as "mental health" or "physical health" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70).
Conclusions: The LLM displayed high agreement with clinical experts when classifying EHR terms into certain mental health or physical health term categories. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, the use of LLMs presents an alternative to manual human coding, presenting great potential to create interpretable features for prediction models.
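The agreement statistic reported in this abstract (κ) is Cohen's kappa, which corrects raw agreement between two raters for the agreement expected by chance. The following is a minimal sketch of that computation, not the study's code; the function name `cohens_kappa` and the six toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (not study data): an LLM versus a clinician consensus.
llm = ["mental", "mental", "physical", "physical", "mental", "physical"]
clinician = ["mental", "physical", "physical", "physical", "mental", "physical"]
kappa = cohens_kappa(llm, clinician)
print(round(kappa, 3))
```

In this toy case the raters agree on 5 of 6 terms, chance agreement is 0.5, and kappa is therefore 2/3. The confidence intervals quoted in the abstract would need an additional step (for example, bootstrapping over terms), which this sketch omits.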
Affiliation(s)
- Nicholas C Cardamone
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Mark Olfson
  - Department of Psychiatry, the New York State Psychiatric Institute, New York, NY, United States
- Timothy Schmutte
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Lyle Ungar
  - Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Tony Liu
  - Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Sara W Cullen
  - School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA, United States
- Steven C Marcus
  - School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA, United States
4
Hancox Z, Pang A, Conaghan PG, Kingsbury SR, Clegg A, Relton SD. A systematic review of networks for prognostic prediction of health outcomes and diagnostic prediction of health conditions within Electronic Health Records. Artif Intell Med 2024; 158:102999. [PMID: 39488091 DOI: 10.1016/j.artmed.2024.102999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/01/2024] [Accepted: 10/16/2024] [Indexed: 11/04/2024]
Abstract
BACKGROUND AND OBJECTIVE: Using graph theory, Electronic Health Records (EHRs) can be represented graphically to exploit the relational dependencies of the multiple information formats to improve Machine Learning (ML) prediction models. In this systematic qualitative review, we explore the question: How are graphs used on EHRs to predict diagnoses and health outcomes?
METHODOLOGY: The search strategy identified studies that used patient-level graph representations of EHRs with ML to predict health outcomes and diagnoses. We conducted our search on MEDLINE, Web of Science and Scopus.
RESULTS: 832 studies were identified by the search strategy, of which 27 studies were selected for data extraction. Following data extraction, 18 studies used ML with patient-level graph-based representations of EHRs to predict health outcomes and diagnoses. Models ranged from traditional ML to neural network-based models. MIMIC-III was the most used dataset (n = 6, where n is the number of occurrences), followed by National Health Insurance Research Database (NHIRD) (n = 4) and eICU Collaborative Research Database (eICU) (n = 4). The most predicted health outcomes were mortality (n = 9; 21%), hospital readmission (n = 9; 21%), and treatment success (n = 4; 9%). Model performance varied across outcomes: mortality prediction (Area Under the Receiver Operating Characteristic (AUROC): 72.1 - 91.6; Area Under Precision-Recall Curve (AUPRC): 34.8 - 81.3) and readmission prediction (AUROC: 63.7 - 85.8; AUPRC: 39.86 - 84.7). Only one paper had a low Risk of Bias (RoB) that applied to our research question (4%).
CONCLUSION: Graph-based representations of EHRs for predicting individual health outcomes and diagnoses require further research before the results can be applied clinically. The use of graph representations appears to improve EHR representation and predictive performance compared to baseline ML methods in multiple fields of medicine.
Affiliation(s)
- Zoe Hancox
  - University of Leeds, Leeds, United Kingdom.
- Allan Pang
  - University of Leeds, Leeds, United Kingdom; Royal Centre for Defence Medicine, Research & Clinical Innovation (RCI), ICT Centre, Vincent Drive, Birmingham, United Kingdom.
- Philip G Conaghan
  - Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, United Kingdom; NIHR Leeds Biomedical Research Centre, United Kingdom
- Sarah R Kingsbury
  - Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, United Kingdom; NIHR Leeds Biomedical Research Centre, United Kingdom
5
Abuhantash F, Abu Hantash MK, AlShehhi A. Comorbidity-based framework for Alzheimer's disease classification using graph neural networks. Sci Rep 2024; 14:21061. [PMID: 39256497 PMCID: PMC11387500 DOI: 10.1038/s41598-024-72321-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 09/05/2024] [Indexed: 09/12/2024] Open
Abstract
Alzheimer's disease (AD), the most prevalent form of dementia, requires early prediction for timely intervention. Current deep learning approaches, particularly those using traditional neural networks, face challenges such as handling high-dimensional data, interpreting complex relationships, and managing data bias. To address these limitations, we propose a framework utilizing graph neural networks (GNNs), which excel in modeling relationships within graph-structured data. Our study employs GNNs on data from the Alzheimer's Disease Neuroimaging Initiative for binary and multi-class classification across the three stages of AD: cognitively normal (CN), mild cognitive impairment (MCI), and Alzheimer's disease (AD). By incorporating comorbidity data derived from electronic health records, we achieved the most effective multi-classification results. Notably, the GNN model (Chebyshev Convolutional Neural Networks) demonstrated superior performance with a 0.98 accuracy in multi-class classification and 0.99, 0.93, and 0.94 in the AD/CN, AD/MCI, and CN/MCI binary tasks, respectively. The model's robustness was further validated using the Australian Imaging, Biomarker & Lifestyle dataset as an external validation set. This work contributes to the field by offering a robust, accurate, and cost-effective method for early AD prediction (CN vs. MCI), addressing key challenges in existing deep learning approaches.
Affiliation(s)
- Ferial Abuhantash
  - Department of Biomedical Engineering and Biotechnology, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Mohd Khalil Abu Hantash
  - Department of Biomedical Engineering and Biotechnology, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Aamna AlShehhi
  - Department of Biomedical Engineering and Biotechnology, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates.
  - Healthcare Engineering Innovation Group (HEIG), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates.
6
Lu H, Uddin S. Embedding-based link predictions to explore latent comorbidity of chronic diseases. Health Inf Sci Syst 2023; 11:2. [PMID: 36593862 PMCID: PMC9803807 DOI: 10.1007/s13755-022-00206-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/13/2022] [Indexed: 12/31/2022] Open
Abstract
Purpose: Comorbidity is a term used to describe when a patient simultaneously has more than one chronic disease. Comorbidity is a significant health issue that affects people worldwide. This study aims to use machine learning and graph theory to predict the comorbidity of chronic diseases.
Methods: A patient-disease bipartite graph was constructed from administrative claim data. The bipartite graph projection approach was used to create the comorbidity network. For the link prediction task, three graph machine learning embedding-based models (node2vec, graph neural networks and a hand-crafted approach) with different variants were applied to the comorbidity network to compare their performance. This study also considered three commonly used similarity-based link prediction approaches (Jaccard coefficient, Adamic-Adar index and Resource allocation index) for performance comparison.
Results: The embedding-based hand-crafted features technique outperformed the remaining similarity-based and embedding-based models. In particular, the hand-crafted technique with the extreme gradient boosting classifier achieved the highest accuracy (91.67%), followed by the same technique with the logistic regression classifier (90.26%). For this shallow embedding method, the Jaccard coefficient and the degree centrality of the original chronic disease were the most important features for comorbidity prediction.
Conclusion: The proposed framework can be used to predict the comorbidity of chronic disease at an early stage of hospital admission. Thus, the prediction outcome could be valuable for medical practice, giving healthcare providers more control over their services and lowering expenses.
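The three similarity-based baselines named in this abstract score a candidate disease pair from the neighbours the two diseases share in the comorbidity network. The sketch below shows the standard definitions on a toy graph; it is not the authors' implementation, and the edge list, disease names, and helper names (`build_adjacency`, `jaccard`, `adamic_adar`, `resource_allocation`) are hypothetical.

```python
import math


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbours divided by all neighbours of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree) over shared neighbours (degree 1 is skipped)."""
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)


def resource_allocation(adj, u, v):
    """Sum of 1 / degree over shared neighbours."""
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


# Hypothetical comorbidity network: an edge means two diseases co-occur.
edges = [
    ("diabetes", "hypertension"),
    ("diabetes", "ckd"),
    ("hypertension", "ckd"),
    ("hypertension", "copd"),
    ("ckd", "heart_failure"),
]
adj = build_adjacency(edges)

# Score a pair that is not yet linked in the toy network.
pair = ("diabetes", "copd")
scores = {
    "jaccard": jaccard(adj, *pair),
    "adamic_adar": adamic_adar(adj, *pair),
    "resource_allocation": resource_allocation(adj, *pair),
}
print(scores)
```

Here the pair shares one neighbour (hypertension, degree 3), so the Jaccard score is 1/2, the Adamic-Adar score is 1/ln 3, and the resource allocation score is 1/3. Scores like these can be ranked directly or, as the abstract describes for the hand-crafted approach, fed to a classifier as features.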
Affiliation(s)
- Haohui Lu
  - School of Project Management, Faculty of Engineering, The University of Sydney, Level 2, 21 Ross Street, Forest Lodge, NSW 2037 Australia
- Shahadat Uddin
  - School of Project Management, Faculty of Engineering, The University of Sydney, Level 2, 21 Ross Street, Forest Lodge, NSW 2037 Australia
7
Mews S, Surmann B, Hasemann L, Elkenkamp S. Markov-modulated marked Poisson processes for modeling disease dynamics based on medical claims data. Stat Med 2023; 42:3804-3815. [PMID: 37308135 DOI: 10.1002/sim.9832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 05/26/2023] [Accepted: 06/01/2023] [Indexed: 06/14/2023]
Abstract
We explore Markov-modulated marked Poisson processes (MMMPPs) as a natural framework for modeling patients' disease dynamics over time based on medical claims data. In claims data, observations do not only occur at random points in time but are also informative, that is, driven by unobserved disease levels, as poor health conditions usually lead to more frequent interactions with the health care system. Therefore, we model the observation process as a Markov-modulated Poisson process, where the rate of health care interactions is governed by a continuous-time Markov chain. Its states serve as proxies for the patients' latent disease levels and further determine the distribution of additional data collected at each observation time, the so-called marks. Overall, MMMPPs jointly model observations and their informative time points by comprising two state-dependent processes: the observation process (corresponding to the event times) and the mark process (corresponding to event-specific information), which both depend on the underlying states. The approach is illustrated using claims data from patients diagnosed with chronic obstructive pulmonary disease by modeling their drug use and the interval lengths between consecutive physician consultations. The results indicate that MMMPPs are able to detect distinct patterns of health care utilization related to disease processes and reveal interindividual differences in the state-switching dynamics.
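To make the observation process described in this abstract concrete, the sketch below simulates a two-state Markov-modulated Poisson process: a continuous-time Markov chain switches between latent states, and health care contacts arrive at a rate that depends on the current state. It is an illustration under stated assumptions, not the authors' model: the function name `simulate_mmpp`, the restriction to two states, and all rate values are hypothetical, and the mark process (event-specific data such as drug use) is left out.

```python
import random


def simulate_mmpp(switch_rates, event_rates, horizon, seed=0):
    """Simulate a two-state Markov-modulated Poisson process on [0, horizon).

    switch_rates[s]: rate of leaving latent state s (the chain alternates 0 <-> 1).
    event_rates[s]:  Poisson rate of observations while in latent state s.
    Returns a list of (event_time, latent_state) tuples in time order.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    events = []
    while t < horizon:
        # Exponential sojourn time in the current latent state.
        end = min(t + rng.expovariate(switch_rates[state]), horizon)
        # Within the sojourn, events form a homogeneous Poisson process,
        # so inter-event gaps are exponential with the state's event rate.
        s = t
        while True:
            s += rng.expovariate(event_rates[state])
            if s >= end:
                break
            events.append((s, state))
        t = end
        state = 1 - state  # switch to the other latent state
    return events


# Hypothetical rates: state 1 stands for poorer health, with more frequent contacts.
events = simulate_mmpp(switch_rates=[0.1, 0.2], event_rates=[0.5, 4.0], horizon=100.0)
times = [time for time, _ in events]
print(len(events), "simulated health care contacts")
```

In real claims data the latent state is not observed, so fitting such a model means estimating the switching and event rates from the event times (and marks) alone; the simulation only shows how informative observation times arise from the latent states.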
Affiliation(s)
- Sina Mews
  - Department of Business Administration and Economics, Bielefeld University, Bielefeld, Germany
- Bastian Surmann
  - Department for Health Economics and Health Care Management, Bielefeld University, Bielefeld, Germany
- Lena Hasemann
  - Department for Health Economics and Health Care Management, Bielefeld University, Bielefeld, Germany
- Svenja Elkenkamp
  - Department for Health Economics and Health Care Management, Bielefeld University, Bielefeld, Germany
8
Tsiampalis T, Panagiotakos D. Methodological issues of the electronic health records' use in the context of epidemiological investigations, in light of missing data: a review of the recent literature. BMC Med Res Methodol 2023; 23:180. [PMID: 37559072 PMCID: PMC10410989 DOI: 10.1186/s12874-023-02004-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/02/2022] [Accepted: 07/27/2023] [Indexed: 08/11/2023] Open
Abstract
BACKGROUND: Electronic health records (EHRs) are widely accepted to enhance health care quality, patient monitoring, and early prevention of various diseases, even when the information in them is incomplete or missing.
AIM: The present review sought to investigate the impact of EHR implementation on healthcare quality and medical decision-making in the context of epidemiological investigations, considering missing or incomplete data.
METHODS: Google Scholar, Medline (via PubMed) and Scopus were searched for studies investigating the impact of EHR implementation on healthcare quality and medical decision-making, as well as for studies investigating ways of dealing with missing data and their impact on medical decisions and on the development of prediction models. Electronic searches were carried out up to 2022.
RESULTS: EHRs were shown to be an increasingly important tool for physicians, decision makers, and patients: they can make national healthcare systems more convenient for patients and doctors, improve the quality of health care, and save money. As for missing-data handling techniques, several investigators have proposed methodologies, yet there is no wide consensus or acceptance in the scientific community, and crucial gaps remain to be addressed.
CONCLUSIONS: The present investigation established the importance of implementing EHRs in clinical practice and, at the same time, pointed out the knowledge gap regarding missing-data handling techniques.
Affiliation(s)
- Thomas Tsiampalis
  - Department of Nutrition and Dietetics, School of Health Sciences and Education, Harokopio University, Athens, Greece
- Demosthenes Panagiotakos
  - Department of Nutrition and Dietetics, School of Health Sciences and Education, Harokopio University, Athens, Greece.
  - Faculty of Health, University of Canberra, Canberra, Australia.
9
Amirahmadi A, Ohlsson M, Etminani K. Deep learning prediction models based on EHR trajectories: A systematic review. J Biomed Inform 2023; 144:104430. [PMID: 37380061 DOI: 10.1016/j.jbi.2023.104430] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/07/2022] [Revised: 06/08/2023] [Accepted: 06/17/2023] [Indexed: 06/30/2023]
Abstract
BACKGROUND: Electronic health records (EHRs) are generated at an ever-increasing rate. EHR trajectories, the temporal aspect of health records, facilitate predicting patients' future health-related risks. They enable healthcare systems to increase the quality of care through early identification and primary prevention. Deep learning techniques have shown great capacity for analyzing complex data and have been successful for prediction tasks using complex EHR trajectories. This systematic review aims to analyze recent studies to identify challenges, knowledge gaps, and ongoing research directions.
METHODS: For this systematic review, we searched Scopus, PubMed, IEEE Xplore, and ACM databases from Jan 2016 to April 2022 using search terms centered around EHR, deep learning, and trajectories. The selected papers were then analyzed according to publication characteristics, objectives, and their solutions to existing challenges, such as the model's capacity to deal with intricate data dependencies, data insufficiency, and explainability.
RESULTS: After removing duplicates and out-of-scope papers, 63 papers were selected, which showed rapid growth in the number of studies in recent years. Predicting all diseases in the next visit and the onset of cardiovascular diseases were the most common targets. Different contextual and non-contextual representation learning methods are employed to retrieve important information from the sequence of EHR trajectories. Recurrent neural networks and the time-aware attention mechanism for modeling long-term dependencies, self-attention, convolutional neural networks, graphs for representing inner visit relations, and attention scores for explainability were frequently used among the reviewed publications.
CONCLUSIONS: This systematic review demonstrated how recent breakthroughs in deep learning methods have facilitated the modeling of EHR trajectories. Research on improving the ability of graph neural networks, attention mechanisms, and cross-modal learning to analyze intricate dependencies among EHRs has shown good progress. There is a need to increase the number of publicly available EHR trajectory datasets to allow for easier comparison among different models. Also, very few developed models can handle all aspects of EHR trajectory data.
Affiliation(s)
- Ali Amirahmadi
  - Center for Applied Intelligent Systems Research, Halmstad University, Sweden.
- Mattias Ohlsson
  - Center for Applied Intelligent Systems Research, Halmstad University, Sweden; Computational Biology & Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Sweden
- Kobra Etminani
  - Center for Applied Intelligent Systems Research, Halmstad University, Sweden
10
Greenberg ZF, Graim KS, He M. Towards artificial intelligence-enabled extracellular vesicle precision drug delivery. Adv Drug Deliv Rev 2023:114974. [PMID: 37356623 DOI: 10.1016/j.addr.2023.114974] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 02/06/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 06/27/2023]
Abstract
Extracellular Vesicles (EVs), particularly exosomes, have recently emerged in nanomedicine as a drug delivery approach due to their superior biocompatibility, circulating stability, and bioavailability in vivo. However, EV heterogeneity makes molecular targeting precision a critical challenge. Deciphering the key molecular drivers that control EV tissue targeting specificity is greatly needed. Artificial intelligence (AI) brings powerful prediction ability for guiding the rational design of engineered EVs in precision control for drug delivery. This review focuses on cutting-edge nano-delivery via integrating large-scale EV data with AI to develop AI-directed EV therapies and illuminate the clinical translation potential. We briefly review the current status of EVs in drug delivery, including the current frontier, limitations, and considerations to advance the field. Subsequently, we detail the future of AI in drug delivery and its impact on precision EV delivery. Our review discusses the current universal challenge of standardization and critical considerations when using AI combined with EVs for precision drug delivery. Finally, we conclude this review with a perspective on future clinical translation led by a combined effort of AI and EV research.
Affiliation(s)
- Zachary F Greenberg
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA
- Kiley S Graim
  - Department of Computer & Information Science & Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, 32610, USA
- Mei He
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida, 32610, USA.
11
Lu H, Uddin S. Disease Prediction Using Graph Machine Learning Based on Electronic Health Data: A Review of Approaches and Trends. Healthcare (Basel) 2023; 11:healthcare11071031. [PMID: 37046958 PMCID: PMC10094099 DOI: 10.3390/healthcare11071031] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/30/2022] [Revised: 03/11/2023] [Accepted: 04/01/2023] [Indexed: 04/07/2023] Open
Abstract
Graph machine-learning (ML) methods have recently attracted great attention and have made significant progress in graph applications. To date, most graph ML approaches have been evaluated on social networks, but they have not been comprehensively reviewed in the health informatics domain. This study reviews graph ML methods and their applications to disease prediction based on electronic health data at two levels: node classification and link prediction. Commonly used graph ML approaches for these two levels are shallow embedding and graph neural networks (GNN). This study performs a comprehensive search to identify articles that applied or proposed graph ML models for disease prediction using electronic health data. We considered journals and conferences from four digital library databases (i.e., PubMed, Scopus, ACM digital library, and IEEEXplore). Based on the identified articles, we review the present status of and trends in graph ML approaches for disease prediction using electronic health data. Even though GNN-based models have achieved outstanding results compared with traditional ML methods in a wide range of disease prediction tasks, they still face interpretability and dynamic graph challenges. Though the disease prediction field using ML techniques is still emerging, GNN-based models have the potential to be an excellent approach for disease prediction, which can be used in medical diagnosis, treatment, and the prognosis of diseases.
Affiliation(s)
- Haohui Lu
  - School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
- Shahadat Uddin
  - School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, Sydney, NSW 2037, Australia
12
Dhiyanesh B, Rameshkumar M, Karthick K, Radha R. Cloud computing and machine learning for analysis of health care data based on neuro fuzzy logistic regression. Journal of Intelligent & Fuzzy Systems 2023. [DOI: 10.3233/jifs-223280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]
Abstract
Healthcare data is among the most sensitive information processed through machine learning and cloud computing in healthcare organizations. The use of Electronic Health Records (EHRs) is on the rise, and attention must be paid to how the data generated by healthcare applications are used. Much sensitive data is associated with various health care domains, particularly neurology and cardiology. Previous approaches, such as manual data records, had significant disadvantages, and disease prediction based on such records was found to be ineffective, resulting in improper diagnoses. These data records require special attention, and current frameworks focused on these areas must implement sophisticated technologies to predict specific patterns. To address these concerns, the proposed work integrates the Neuro Fuzzy Logistic Regression (NFLR) machine learning algorithm with cloud computing storage management. The use of cloud storage reduces data duplication when storing EHRs, while the proposed ML algorithm accurately predicts the disease. In the proposed research, features are extracted using a specific algorithm, Self-organizing Clustering (SOC), which forms clustered data with the highest weight. To select the maximum number of features and to predict disease risk factors, the S2NO and NFLR algorithms are used in this work. Further, database storage estimation with fuzzy rules, logistic analysis, and other aspects such as experimental learning of different ML tools and data privacy constraints related to healthcare are considered in this paper.
Affiliation(s)
- B. Dhiyanesh
- CSE, Dr. N.G.P. Institute of Technology, Coimbatore, Tamil Nadu, India
| | - M. Rameshkumar
- CSE, AVS College of Technology, Salem, Tamil Nadu, India
| | - K. Karthick
- IT, Sona College of Technology, Salem, Tamil Nadu, India
| | - R. Radha
- EEE, Study World College of Engineering, Coimbatore, Tamil Nadu, India
| |
|
13
|
Saroja T, Kalpana Y. Adaptive Weight Dynamic Butterfly Optimization Algorithm (ADBOA)-Based Feature Selection and Classifier for Chronic Kidney Disease (CKD) Diagnosis. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2023. [DOI: 10.1142/s1469026823410018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Abstract
Chronic Kidney Disease (CKD) is a universal issue for the well-being of people, as it results in morbidity and death with the onset of additional diseases. Because there are no clear early symptoms of CKD, people frequently miss them. Timely identification of CKD allows individuals to obtain proper medication to prevent the development of the disease. Machine learning techniques (MLT) can strongly assist doctors in achieving this aim due to their rapid and precise determination capabilities. Many MLT encounter inappropriate features in most databases that might lower the classifier’s performance. Missing values are filled using K-Nearest Neighbor (KNN). The Adaptive Weight Dynamic Butterfly Optimization Algorithm (AWDBOA) is a nature-inspired feature selection (FS) technique with good exploration, exploitation, and convergence that does not get trapped in local optima. Operators used in Local Search Algorithm-Based Mutation (LSAM) and the Butterfly Optimization Algorithm (BOA), which use diversity and generations of adaptive weights to features for enhancing FS, are modified in this work. Simultaneously, an adaptive weight value is added for FS from the database. Following the identification of features, six MLT are used in classification tasks, namely Logistic Regression (LOG), Random Forest (RF), Support Vector Machine (SVM), KNN, Naive Bayes (NB), and Feed Forward Neural Network (FFNN). The CKD databases were retrieved from the UCI (University of California, Irvine) machine learning repository. Precision, Recall, F1-Score, Sensitivity, Specificity, and accuracy are compared to assess this work’s classification framework against existing approaches.
|
14
|
Stevens CAT, Lyons ARM, Dharmayat KI, Mahani A, Ray KK, Vallejo-Vaz AJ, Sharabiani MTA. Ensemble machine learning methods in screening electronic health records: A scoping review. Digit Health 2023; 9:20552076231173225. [PMID: 37188075 PMCID: PMC10176785 DOI: 10.1177/20552076231173225] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Accepted: 04/14/2023] [Indexed: 05/17/2023] Open
Abstract
Background Electronic health records provide the opportunity to identify undiagnosed individuals likely to have a given disease using machine learning techniques, and who could then benefit from more medical screening and case finding, reducing the number needed to screen with convenience and healthcare cost savings. Ensemble machine learning models combining multiple prediction estimates into one are often said to provide better predictive performances than non-ensemble models. Yet, to our knowledge, no literature review summarises the use and performances of different types of ensemble machine learning models in the context of medical pre-screening. Method We aimed to conduct a scoping review of the literature reporting the derivation of ensemble machine learning models for screening of electronic health records. We searched EMBASE and MEDLINE databases across all years applying a formal search strategy using terms related to medical screening, electronic health records and machine learning. Data were collected, analysed, and reported in accordance with the PRISMA scoping review guideline. Results A total of 3355 articles were retrieved, of which 145 articles met our inclusion criteria and were included in this study. Ensemble machine learning models were increasingly employed across several medical specialties and often outperformed non-ensemble approaches. Ensemble machine learning models with complex combination strategies and heterogeneous classifiers often outperformed other types of ensemble machine learning models but were also less used. Ensemble machine learning models methodologies, processing steps and data sources were often not clearly described. Conclusions Our work highlights the importance of deriving and comparing the performances of different types of ensemble machine learning models when screening electronic health records and underscores the need for more comprehensive reporting of machine learning methodologies employed in clinical research.
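As a minimal illustration of what "combining multiple prediction estimates into one" can mean, the sketch below shows two of the simplest combination strategies, hard (majority) voting and soft (probability-averaging) voting. The function names, the 0.5 threshold, and the example model outputs are assumptions made for illustration; they are not taken from the review, which covers many more elaborate strategies.

```python
from collections import Counter
from statistics import mean


def majority_vote(labels):
    """Hard voting: return the label predicted by the most base models.

    Ties are resolved in favour of the label that appears first in `labels`.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, threshold=0.5):
    """Soft voting: average the base models' risk scores, then threshold the mean."""
    return int(mean(probabilities) >= threshold)


# Hypothetical outputs of three base models for a single patient record.
hard_predictions = [1, 0, 1]        # class labels
risk_scores = [0.62, 0.41, 0.58]    # predicted probabilities of disease

print(majority_vote(hard_predictions))
print(soft_vote(risk_scores))
```

Both functions reduce several base-model outputs to one screening decision; heterogeneous ensembles of the kind the review highlights would typically replace the plain mean with a learned combiner.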
Affiliation(s)
- Christophe AT Stevens
- Imperial Centre for Cardiovascular Disease Prevention (ICCP), Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
| | - Alexander RM Lyons
- Imperial Centre for Cardiovascular Disease Prevention (ICCP), Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
| | - Kanika I Dharmayat
- Imperial Centre for Cardiovascular Disease Prevention (ICCP), Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
| | - Alireza Mahani
- Quantitative Research, Davidson Kempner Capital Management, New York, NY, USA
| | - Kausik K Ray
- Imperial Centre for Cardiovascular Disease Prevention (ICCP), Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
| | - Antonio J Vallejo-Vaz
- Imperial Centre for Cardiovascular Disease Prevention (ICCP), Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
- Department of Medicine, Faculty of Medicine, University of Seville, Sevilla, Spain
- Clinical Epidemiology and Vascular Risk, Instituto de Biomedicina de Sevilla (IBiS), IBiS/Hospital Universitario Virgen del Rocío/Universidad de Sevilla/CSIC, Sevilla, Spain
| | - Mansour TA Sharabiani
- Department of Primary Care and Public Health, School of Public Health, Imperial College London, London, UK
| |
|
15
|
Cardozo G, Tirloni SF, Pereira Moro AR, Marques JLB. Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2022; 3:e40473. [PMID: 36644762 PMCID: PMC9828303 DOI: 10.2196/40473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 08/28/2022] [Accepted: 10/31/2022] [Indexed: 11/05/2022]
Abstract
Background In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results separately as individual values. However, physicians need to analyze a set of results to propose a supposed diagnosis, which leads us to think that sets of laboratory tests may contain more information than those presented separately for each result. In this way, the processes of medical laboratories can be strongly affected by these techniques. Objective In this sense, we sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases. Methods The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Heading database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author being consulted in cases of disagreement. Results Following the defined requirements, 40 studies presenting good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works that have used this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count. Conclusions Finally, we conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved to be advantageous and innovative for medical laboratories. It is making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases.
Affiliation(s)
- Glauco Cardozo
- Federal Institute of Santa Catarina Florianópolis Brazil
| | | | | | | |
|
16
|
Zhou X, Li X, Zhang Z, Han Q, Deng H, Jiang Y, Tang C, Yang L. Support vector machine deep mining of electronic medical records to predict the prognosis of severe acute myocardial infarction. Front Physiol 2022; 13:991990. [PMID: 36246101 PMCID: PMC9558165 DOI: 10.3389/fphys.2022.991990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 08/17/2022] [Indexed: 11/13/2022] Open
Abstract
Cardiovascular disease is currently one of the most important diseases causing death in China and the world, and acute myocardial infarction is a major cause of cardiovascular disease. This study provides an analytical technique for predicting the prognosis of patients with severe acute myocardial infarction using a support vector machine (SVM) technique based on information gleaned from electronic medical records in the Medical Information Mart for Intensive Care (MIMIC)-III database. The MIMIC-III database provided 4785 electronic medical records for inclusion in the model development after screening 7070 electronic medical records of patients admitted to the intensive care unit for treatment of acute myocardial infarction. Adopting the APS-III score as the criterion for identifying anticipated risk, the dimensions of data information incorporated into the mathematical model design were found using correlation coefficient matrix heatmaps and ordered logistic analysis. An automated prognostic risk-prediction model was developed using SVM, and the fit was evaluated by 5-fold cross-validation. We used a grid search method to further optimize the parameters and improve the model fit. The excellent generalization ability of SVM was fully verified by calculating the 95% confidence interval of the area under the receiver operating characteristic curve (AUC) for six algorithms (linear discriminant, tree, Kernel Naive Bayes, RUSBoost, KNN, and SVM). Compared to the remaining five models, its confidence interval was the narrowest with higher fitting accuracy and better performance. The patient prognostic risk prediction model constructed using SVM had a relatively impressive accuracy (92.2%) and AUC value (0.98). In this study, a model was designed for fitting that can maximize the potential information to be gleaned in the electronic medical records data. It was demonstrated that SVM models based on electronic medical records data can offer an effective solution for clinical disease prognostic risk assessment and improved clinical outcomes and have great potential for clinical application in the clinical treatment of myocardial infarction.
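The evaluation loop described in this abstract (cross-validation combined with a grid search over model parameters) can be sketched in plain Python as follows. This is only an outline under stated assumptions: the parameter names `C` and `gamma`, the grid values, and the `toy_evaluate` scoring function are hypothetical stand-ins, since the abstract does not report the grid or the training code. In a real pipeline `evaluate` would fit an SVM on the training indices and return a validation metric on the held-out fold.

```python
from itertools import product


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validated_score(params, n_samples, evaluate, k=5):
    """Mean of evaluate(params, train_idx, test_idx) over the k folds."""
    scores = [evaluate(params, train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


def grid_search(param_grid, n_samples, evaluate, k=5):
    """Return the parameter combination with the best cross-validated score."""
    names = sorted(param_grid)
    candidates = [dict(zip(names, values))
                  for values in product(*(param_grid[name] for name in names))]
    return max(candidates, key=lambda p: cross_validated_score(p, n_samples, evaluate, k))


def toy_evaluate(params, train_idx, test_idx):
    """Hypothetical score that peaks at C=10, gamma=0.1; stands in for model fitting."""
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2


best = grid_search({"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]}, n_samples=12,
                   evaluate=toy_evaluate)
print(best)
```

Contiguous folds are used here for brevity; shuffled or stratified folds are the more common choice for imbalanced clinical outcomes.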
Affiliation(s)
- Xingyu Zhou
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
- Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen, China
| | - Xianying Li
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
| | - Zijun Zhang
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
| | - Qinrong Han
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
| | - Huijiao Deng
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
| | - Yi Jiang
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
| | - Chunxiao Tang
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
| | - Lin Yang
- Zhuhai Campus of Zunyi Medical University, Zhuhai, China
- Key Laboratory of Human-Machine Intelligence-Synergy Systems, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (CAS), Shenzhen, China
- *Correspondence: Lin Yang,
| |
|
17
|
An improved pairing-free certificateless aggregate signature scheme for healthcare wireless medical sensor networks. PLoS One 2022; 17:e0268484. [PMID: 35816499 PMCID: PMC9273098 DOI: 10.1371/journal.pone.0268484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 05/01/2022] [Indexed: 11/19/2022] Open
Abstract
In healthcare wireless medical sensor networks (HWMSNs), the medical sensor nodes are employed to collect medical data which is transmitted to doctors for diagnosis and treatment. In HWMSNs, medical data is vulnerable to various attacks through public channels. In addition, leakage of patients’ information happens frequently. Hence, secure communication and privacy preservation are major concerns in HWMSNs. To solve the above issues, Zhan et al. put forward a pairing-free certificateless aggregate signature (PF-CLAS) scheme. However, according to our cryptanalysis, the malicious medical sensor node (MSNi) can generate the forged signature by replacing the public key in the PF-CLAS scheme. Hence, to address this security flaw, we design the improved PF-CLAS scheme that can achieve unforgeability, anonymity, and traceability. Since we have changed the construction of the partial private key, the improved PF-CLAS scheme can resist Type I and Type II attacks under the Elliptic Curve Discrete Logarithm assumption. In terms of the performance evaluation, the proposed scheme outperforms related CLAS schemes, which is more suitable for HWMSNs environments.
|
18
|
Cardozo G, Pintarelli GB, Andreis GR, Lopes ACW, Marques JLB. Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. BIOMED RESEARCH INTERNATIONAL 2022; 2022:8114049. [PMID: 35392258 PMCID: PMC8983182 DOI: 10.1155/2022/8114049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/21/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 12/28/2022]
Abstract
Most patients with diabetes mellitus are asymptomatic, which leads to delayed and more complex treatment. At the same time, most individuals are routinely subjected to standard clinical laboratory examinations, which create large health datasets over a lifetime. Computer processing has been used to search for health anomalies and predict diseases using clinical examinations. This work studied machine learning models to support the screening of diabetes through routine laboratory tests using data from laboratory tests of 62,496 patients. The classification and regression models used were K-nearest neighbor, support vector machines, naïve Bayes, random forest models, and artificial neural networks. Glycated hemoglobin, a test used for diabetes diagnosis, was used as the target. Regression models calculated glycated hemoglobin directly, and their outputs were later classified. The performance of the classification models was studied under various subdataset partitions and combinations (e.g., healthy, prediabetic, and diabetic, as well as non-healthy and non-diabetic). The best single performance was achieved with the artificial neural network model when detecting prediabetes or diabetes. The artificial neural network classification model scored 78.1%, 78.7%, and 78.4% for sensitivity, precision, and F1 score, respectively, when identifying the non-healthy group. Other models also had good results, depending on what is desired. Machine learning-based models can predict glycated hemoglobin values from routine laboratory tests and can be used as a screening tool to refer a patient for further testing.
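The three metrics quoted for the neural network are related: the F1 score is the harmonic mean of precision and sensitivity (recall). The short check below, a sketch using only the figures reported in the abstract, confirms that the reported values are mutually consistent. The function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


sensitivity = 0.781  # reported sensitivity (recall)
precision = 0.787    # reported precision

f1 = f1_score(precision, sensitivity)
print(f"{f1:.1%}")  # prints 78.4%
```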
Affiliation(s)
- Glauco Cardozo
- Academic Department of Health and Services, Federal Institute of Santa Catarina, Florianopolis, SC 88020-300, Brazil
- Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
| | - Guilherme Brasil Pintarelli
- Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
| | - Guilherme Rettore Andreis
- Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
| | | | - Jefferson Luiz Brum Marques
- Institute of Biomedical Engineering, Federal University of Santa Catarina, Florianopolis, SC 88040-900, Brazil
| |
|
19
|
Avina-Bravo EG, Cassirame J, Escriba C, Acco P, Fourniols JY, Soto-Romero G. Smart Electrically Assisted Bicycles as Health Monitoring Systems: A Review. SENSORS (BASEL, SWITZERLAND) 2022; 22:468. [PMID: 35062429 PMCID: PMC8780236 DOI: 10.3390/s22020468] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2021] [Revised: 12/24/2021] [Accepted: 01/05/2022] [Indexed: 05/03/2023]
Abstract
This paper aims to provide a review of electrically assisted bicycles (also known as e-bikes) used for recovery of the rider's physical and physiological information, monitoring of their health state, and adjusting the "medical" assistance accordingly. E-bikes have proven to be an excellent way to do physical activity while commuting, thus improving the user's health and reducing air pollutant emissions. Such devices can also be seen as the first step to help unhealthy sedentary people to start exercising with reduced strain. Based on this analysis, the need to have e-bikes with artificial intelligence (AI) systems that recover and process a large amount of data is discussed in depth. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were used to complete the relevant papers' search and selection in this systematic review.
Affiliation(s)
- Eli Gabriel Avina-Bravo
- Laboratory for Analysis and Architecture of Systems (LAAS), University of Toulouse, F-31077 Toulouse, France
| | - Johan Cassirame
- EA4660, Culture, Sport, Health and Society Department and Exercise Performance, University of Bourgogne-France Comté, 25000 Besançon, France
- EA7507, Laboratoire Performance Santé Métrologie Société, 51100 Reims, France
- Société Mtraining, R&D Division, 25480 Ecole Valentin, France
| | - Christophe Escriba
- Laboratory for Analysis and Architecture of Systems (LAAS), University of Toulouse, F-31077 Toulouse, France
| | - Pascal Acco
- Laboratory for Analysis and Architecture of Systems (LAAS), University of Toulouse, F-31077 Toulouse, France
| | - Jean-Yves Fourniols
- Laboratory for Analysis and Architecture of Systems (LAAS), University of Toulouse, F-31077 Toulouse, France
| | - Georges Soto-Romero
- Laboratory for Analysis and Architecture of Systems (LAAS), University of Toulouse, F-31077 Toulouse, France
| |
|
20
|
Uddin S, Imam T, Hossain ME, Gide E, Sianaki OA, Moni MA, Mohammed AA, Vandana V. Intelligent type 2 diabetes risk prediction from administrative claim data. Inform Health Soc Care 2021; 47:243-257. [PMID: 34672859 DOI: 10.1080/17538157.2021.1988957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Type 2 diabetes is a chronic, costly disease and is a serious global population health problem. Yet, the disease is well manageable and preventable if there is an early warning. This study aims to apply supervised machine learning algorithms for developing predictive models for type 2 diabetes using administrative claim data. Following guidelines from the Elixhauser Comorbidity Index, 31 variables were considered. Five supervised machine learning algorithms were used for developing type 2 diabetes prediction models. Principal component analysis was applied to rank variables' importance in predictive models. Random forest (RF) showed the highest accuracy (85.06%) among the algorithms, closely followed by the k-nearest neighbor (84.48%). The analysis further revealed RF as a high performing algorithm irrespective of data imbalance. As revealed by the principal component analysis, patient age is the most important predictor for type 2 diabetes, followed by a comorbid condition (i.e., solid tumor without metastasis). This study's finding of RF as the best performing classifier is consistent with the promise of tree-based algorithms for public data in other works. Thus, the outcome can guide in designing automated surveillance of patients at risk of forming diabetes from administrative claim information and will be useful to health regulators and insurers.
Affiliation(s)
- Shahadat Uddin
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW, Australia
| | - Tasadduq Imam
- School of Business and Law, CQUniversity, Melbourne, VIC, Australia
| | - Md Ekramul Hossain
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW, Australia
| | - Ergun Gide
- School of Engineering and Technology, CQUniversity, Sydney, NSW, Australia
| | - Omid Ameri Sianaki
- College of Engineering and Science, Victoria University, Sydney, NSW, Australia; Victoria University Business School, Melbourne, Victoria, Australia
| | - Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, Australia
| | | | - Vandana Vandana
- College of Engineering and Science, Victoria University, Sydney, NSW, Australia
| |
|
21
|
Rashed-Al-Mahfuz M, Haque A, Azad A, Alyami SA, Quinn JMW, Moni MA. Clinically Applicable Machine Learning Approaches to Identify Attributes of Chronic Kidney Disease (CKD) for Use in Low-Cost Diagnostic Screening. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE-JTEHM 2021; 9:4900511. [PMID: 33948393 PMCID: PMC8075287 DOI: 10.1109/jtehm.2021.3073629] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/13/2020] [Revised: 02/21/2021] [Accepted: 04/12/2021] [Indexed: 11/07/2022]
Abstract
OBJECTIVE Chronic kidney disease (CKD) is a major public health concern worldwide. High costs of late-stage diagnosis and insufficient testing facilities can contribute to high morbidity and mortality rates in CKD patients, particularly in less developed countries. Thus, early diagnosis aided by vital parameter analytics using affordable computer-aided diagnosis could not only reduce diagnosis costs but improve patient management and outcomes. METHODS In this study, we developed machine learning models using selective key pathological categories to identify clinical test attributes that will aid in accurate early diagnosis of CKD. Such an approach will save time and costs for diagnostic screening. We have also evaluated the performance of several classifiers with k-fold cross-validation on optimized datasets derived using these selected clinical test attributes. RESULTS Our results suggest that the optimized datasets with important attributes perform well in diagnosis of CKD using our proposed machine learning models. Furthermore, we evaluated clinical test attributes based on urine and blood tests along with clinical parameters that have low costs of acquisition. The predictive models with the optimized and pathologically categorized attributes set yielded high levels of CKD diagnosis accuracy with random forest (RF) classifier being the best performing. CONCLUSIONS Our machine learning approach has yielded effective predictive analytics for CKD screening which can be developed as a resource to facilitate improved CKD screening for enhanced and timely treatment plans.
Affiliation(s)
- Md Rashed-Al-Mahfuz
- Department of Computer Science and Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh
| | - Abedul Haque
- Department of Hematopathology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Akm Azad
- iThree Institute, University of Technology Sydney, NSW 2007, Australia
| | - Salem A Alyami
- Department of Mathematics and Statistics, Imam Muhammad Ibn Saud Islamic University, Riyadh 13318, Saudi Arabia
| | - Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
| | - Mohammad Ali Moni
- WHO Collaborating Centre of eHealth, School of Public Health and Community Medicine, University of New South Wales, Sydney, NSW 2052, Australia
| |
|
22
|
Rana HK, Akhtar MR, Islam MB, Ahmed MB, Lió P, Huq F, Quinn JMW, Moni MA. Machine Learning and Bioinformatics Models to Identify Pathways that Mediate Influences of Welding Fumes on Cancer Progression. Sci Rep 2020; 10:2795. [PMID: 32066756 PMCID: PMC7026442 DOI: 10.1038/s41598-020-57916-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 08/21/2019] [Accepted: 12/21/2019] [Indexed: 12/13/2022] Open
Abstract
Welding generates and releases fumes that are hazardous to human health. Welding fumes (WFs) are a complex mix of metallic oxides, fluorides and silicates that can cause or exacerbate health problems in exposed individuals. In particular, WF inhalation over an extended period carries an increased risk of cancer, but how WFs may influence cancer behaviour or growth is unclear. To address this issue we employed a quantitative analytical framework to identify the gene expression effects of WFs that may affect the subsequent behaviour of the cancers. We examined datasets of transcript analyses made using microarray studies of WF-exposed tissues and of cancers, including datasets from colorectal cancer (CC), prostate cancer (PC), lung cancer (LC) and gastric cancer (GC). We constructed gene-disease association networks, identified signaling and ontological pathways, clustered the protein-protein interaction network using multilayer network topology, and analyzed the survival function of the significant genes using the Cox proportional hazards (Cox PH) model and the product-limit (PL) estimator. We observed that WF exposure causes altered expression of many genes (36, 13, 25 and 17 respectively) whose expression is also altered in CC, PC, LC and GC. Gene-disease association networks, signaling and ontological pathways, the protein-protein interaction network, and survival functions of the significant genes suggest ways that WFs may influence the progression of CC, PC, LC and GC. This quantitative analytical framework has identified potentially novel mechanisms by which tissue WF exposure may lead to changes in tissue gene expression that affect cancer behaviour and, thus, cancer progression, growth or establishment.
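The product-limit (PL) estimator mentioned above is more widely known as the Kaplan-Meier estimator: at each distinct event time, the running survival probability is multiplied by one minus the fraction of at-risk subjects who experienced the event. The sketch below is a generic, dependency-free implementation on made-up follow-up data; the function name and the example values are assumptions for illustration, not the study's data or code.

```python
def product_limit(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  : follow-up duration for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators for five subjects.
curve = product_limit([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is what distinguishes this estimator from a simple proportion of survivors.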
Affiliation(s)
- Humayan Kabir Rana
- Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh
| | - Mst Rashida Akhtar
- Department of Computer Science and Engineering, Varendra University, Rajshahi, Bangladesh
| | - M Babul Islam
- Department of Electrical and Electronic Engineering, University of Rajshahi, Rajshahi, Bangladesh
| | - Mohammad Boshir Ahmed
- Bio-electronics Materials Laboratory, School of Materials Science and Engineering, Gwangju Institute of Science and Technology, 261 Cheomdan-gwagiro, Buk-gu, Gwangju, 500-712, Republic of Korea
| | - Pietro Lió
- Computer Laboratory, Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
| | - Fazlul Huq
- Discipline of Pathology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
| | - Julian M W Quinn
- Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
| | - Mohammad Ali Moni
- Discipline of Pathology, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia; Bone Biology Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
| |
|
23
|
Hossain ME, Uddin S, Khan A, Moni MA. A Framework to Understand the Progression of Cardiovascular Disease for Type 2 Diabetes Mellitus Patients Using a Network Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:E596. [PMID: 31963383 PMCID: PMC7013570 DOI: 10.3390/ijerph17020596] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/09/2020] [Accepted: 01/14/2020] [Indexed: 12/13/2022]
Abstract
The prevalence of chronic disease comorbidity has increased worldwide. Comorbidity (i.e., the presence of multiple chronic diseases) is associated with adverse health outcomes in terms of mobility and quality of life as well as financial burden. Understanding the progression of comorbidities can provide valuable insights towards the prevention and better management of chronic diseases. Administrative data can be used in this regard as they contain semantic information on patients' health conditions. Most studies in this field are focused on understanding the progression of one chronic disease rather than multiple diseases. This study aims to understand the progression of two chronic diseases in the Australian health context. It specifically focuses on the comorbidity progression of cardiovascular disease (CVD) in patients with type 2 diabetes mellitus (T2DM), as the prevalence of these chronic diseases in Australians is high. A research framework is proposed to understand and represent the progression of CVD in patients with T2DM using graph theory and social network analysis techniques. Two study cohorts (i.e., patients with both T2DM and CVD and patients with only T2DM) were selected from an administrative dataset obtained from an Australian health insurance company. Two baseline disease networks were constructed from these two selected cohorts. A final disease network from two baseline disease networks was then generated by weight adjustments in a normalized way. The prevalence of renal failure, fluid and electrolyte disorders, hypertension and obesity was significantly higher in patients with both CVD and T2DM than patients with only T2DM. This showed that these chronic diseases occurred frequently during the progression of CVD in patients with T2DM. The proposed network-based model may potentially help the healthcare provider to understand high-risk diseases and the progression patterns between the recurrence of T2DM and CVD. Also, the framework could be useful for stakeholders including governments and private health insurers to adopt appropriate preventive health management programs for patients at a high risk of developing multiple chronic diseases.
Affiliation(s)
- Md Ekramul Hossain
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
| | - Shahadat Uddin
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
| | - Arif Khan
- Complex Systems Research Group, Faculty of Engineering, The University of Sydney, Darlington, NSW 2008, Australia
| | - Mohammad Ali Moni
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
| |
|
24
|
Hossain MA, Saiful Islam SM, Quinn JM, Huq F, Moni MA. Machine learning and bioinformatics models to identify gene expression patterns of ovarian cancer associated with disease progression and mortality. J Biomed Inform 2019; 100:103313. [DOI: 10.1016/j.jbi.2019.103313] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 02/16/2019] [Revised: 09/20/2019] [Accepted: 10/13/2019] [Indexed: 02/07/2023]
|