1
Zhao A, Bai H, Bao X, Liao K, Ren H, Hu H. Model-driven high-throughput zebrafish embryo assay for evaluating whole effluent toxicity variation across 100 full-scale wastewater treatment plants. WATER RESEARCH 2025; 281:123675. [PMID: 40273605 DOI: 10.1016/j.watres.2025.123675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2025] [Revised: 03/26/2025] [Accepted: 04/17/2025] [Indexed: 04/26/2025]
Abstract
The zebrafish embryo is a valuable model for evaluating whole effluent toxicity (WET). However, the widely recognized acute toxicity indicator, based on International Organization for Standardization (ISO) methods, requires large numbers of embryos and is often time-consuming due to its complex experimental procedures. In this study, we propose an alternative to the conventional reliance on ISO standards by developing a model-driven high-throughput assay that utilizes actual wastewater, enabling rapid LC10 (the lethal concentration at which 10% of the test organisms are affected) prediction through machine learning techniques and multidimensional indicators derived from streamlined experimental procedures. We compared three streamlined toxicity assays (developmental toxicity, behavioral toxicity, and vascular toxicity) along with five different models. Among these, the Lasso model based on behavioral toxicity emerged as the most effective, achieving an R² value of 0.893 while reducing experimental time by 5- to 8-fold. Furthermore, fivefold cross-validation confirmed its robust predictive accuracy. The application of this model-driven high-throughput assay across 100 wastewater treatment plants in China highlights the crucial role of biological treatment, particularly aerobic processes and secondary sedimentation, in reducing toxicity, thereby providing valuable insights into their functions. This high-throughput assay not only surpasses the ISO standard method in efficiency but also substantially decreases embryo usage, facilitating rapid WET assessments of actual wastewater with larger sample sizes.
Affiliation(s)
- Aixia Zhao
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, Jiangsu, PR China
- Hongwei Bai
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, Jiangsu, PR China
- Xingchen Bao
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, Jiangsu, PR China
- Kewei Liao
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, Jiangsu, PR China
- Hongqiang Ren
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, Jiangsu, PR China
- Haidong Hu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, Jiangsu, PR China.
2
Kim J, Han BS, Ha JE, Park MS, Kwon S, Cho BJ. Prediction of the Cause of Fundus-Obscuring Vitreous Hemorrhage Using Machine Learning. Diagnostics (Basel) 2025; 15:371. [PMID: 39941302 PMCID: PMC11817034 DOI: 10.3390/diagnostics15030371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2024] [Revised: 01/28/2025] [Accepted: 02/01/2025] [Indexed: 02/16/2025] Open
Abstract
Objectives: This study aimed to predict the unknown etiology of fundus-obscuring vitreous hemorrhage (FOVH) based on preoperative conditions using machine learning (ML) and to identify key preoperative factors. Methods: Medical records of 223 eyes from 204 patients who underwent vitrectomy for FOVH of unknown etiology between January 2012 and July 2022 were retrospectively reviewed. Preoperative data, including demographic information, systemic disease, ophthalmic history, and retinal status of the unaffected eye, were collected. The postoperatively identified etiologies of FOVH were categorized into six groups: proliferative diabetic retinopathy (PDR), retinal vein occlusion (RVO) or rupture of retinal arterial macroaneurysm, neovascular age-related macular degeneration (nAMD), retinal tear, Terson syndrome, and other causes. Four ML algorithms were trained and evaluated using seven-fold cross-validation. Results: The ML algorithms achieved mean accuracies of 76.2% for artificial neural network, 74.5% for XG-Boost, 74.4% for LASSO logistic regression, and 68.5% for decision tree. Key predictive factors commonly selected by the ML algorithms included PDR in the fellow eye, underlying diabetes mellitus, subarachnoid hemorrhage, and a history of retinal tear, RVO, or nAMD in the affected eye. Conclusions: The unknown etiology of FOVH could be predicted preoperatively with considerable accuracy by ML algorithms. Previous ophthalmic conditions in the affected eye and the condition of the fellow eye were important variables for prediction. This approach might assist in determining appropriate treatment plans.
Affiliation(s)
- Jinsoo Kim
  - Department of Ophthalmology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang 14068, Republic of Korea
- Bo Sook Han
  - Department of Ophthalmology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang 14068, Republic of Korea
- Joo Eun Ha
  - Department of Ophthalmology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang 14068, Republic of Korea
- Min Seon Park
  - Department of Ophthalmology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang 14068, Republic of Korea
- Soonil Kwon
  - Department of Ophthalmology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang 14068, Republic of Korea
- Bum-Joo Cho
  - Department of Ophthalmology, Hallym University Sacred Heart Hospital, Hallym University College of Medicine, Anyang 14068, Republic of Korea
  - Medical Artificial Intelligence Center, Hallym University Medical Center, Anyang 14068, Republic of Korea
3
Shi X, Wang D, Li L, Wang Y, Ning R, Yu S, Gao N. Algal classification and Chlorophyll-a concentration determination using convolutional neural networks and three-dimensional fluorescence data matrices. ENVIRONMENTAL RESEARCH 2025; 266:120500. [PMID: 39631647 DOI: 10.1016/j.envres.2024.120500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/13/2024] [Accepted: 11/30/2024] [Indexed: 12/07/2024]
Abstract
In recent years, the frequency of harmful algal blooms has increased, leading to the release of large quantities of toxins and compounds that cause unpleasant odors and tastes, significantly compromising drinking water quality. Chlorophyll-a (Chl-a) is commonly used as a proxy for algal biomass. However, current methods for measuring Chl-a concentration face challenges in accurately quantifying algae by categories and effectively adapting to natural aquatic environments. This study combined convolutional neural networks (CNNs) and three-dimensional fluorescence data matrices to address these challenges. The algal classification model achieved over 99.5% accuracy in identifying thirteen types of algal samples, with class activation maps showing that the model primarily focused on algal pigment regions. In determining Chl-a concentrations of each algal species in mixed algae solutions (Microcystis aeruginosa, Cyclotella, and Chlorella), the Chl-a models demonstrated Mean Absolute Percentage Errors (MAPEs) ranging from 6.55% to 10.56% in the ultrapure water background, 11.57%-14.12% in the Qingcaosha Reservoir raw water background, and 21.46%-123.37% in the Lake Taihu raw water background. After calibration, the models were significantly improved, achieving MAPEs ranging from 11.86% to 14.18% in the Lake Taihu raw water background. Discrepancies in determination performance indicated that the intensity and locations of characteristic algal pigment fluorescence peaks greatly influenced the Chl-a models' accuracy. This research introduces a novel approach for algal classification and Chl-a concentration determination in water bodies, with significant potential for practical applications.
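For reference, the Mean Absolute Percentage Error (MAPE) quoted in this abstract is conventionally the mean of |predicted - measured| / |measured|, expressed as a percentage. The sketch below follows that standard definition only; the function name `mape` and the sample values are illustrative assumptions and are not taken from the study.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent.

    Assumes the conventional definition; every measured value must be non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical Chl-a readings (arbitrary units), for illustration only.
example_measured = [10.0, 20.0, 40.0]
example_predicted = [11.0, 18.0, 40.0]
example_mape = mape(example_measured, example_predicted)
```

Because each error is divided by the measured value, small measured concentrations can inflate the MAPE, which is one reason the metric can vary widely between water backgrounds.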
Affiliation(s)
- Xujie Shi
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
- Denghui Wang
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
- Lei Li
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai, 200092, China.
- Yang Wang
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
- Rongsheng Ning
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
- Shuili Yu
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
- Naiyun Gao
  - State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, Shanghai, 200092, China
4
Pahlevani M, Rajabi E, Taghavi M, VanBerkel P. Developing a decision support tool to predict delayed discharge from hospitals using machine learning. BMC Health Serv Res 2025; 25:56. [PMID: 39799370 PMCID: PMC11724564 DOI: 10.1186/s12913-024-12195-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2024] [Accepted: 12/30/2024] [Indexed: 01/15/2025] Open
Abstract
BACKGROUND: The growing demand for healthcare services challenges patient flow management in health systems. Alternative Level of Care (ALC) patients who no longer need acute care yet face discharge barriers contribute to prolonged stays and hospital overcrowding. Predicting these patients at admission allows for better resource planning, reducing bottlenecks, and improving flow. This study addresses three objectives: identifying likely ALC patients, identifying key predictive features, and preparing guidelines for early ALC identification at admission. METHODS: Data from Nova Scotia Health (2015-2022) covering patient demographics, diagnoses, and clinical information were extracted. Data preparation involved managing outliers, feature engineering, handling missing values, transforming categorical variables, and standardizing. Data imbalance was addressed using class weights, random oversampling, and the Synthetic Minority Over-Sampling Technique (SMOTE). Three ML classifiers, Random Forest (RF), Artificial Neural Network (ANN), and eXtreme Gradient Boosting (XGB), were tested to classify patients as ALC or not. Also, to ensure accurate ALC prediction at admission, only features available at that time were used in a separate model iteration. RESULTS: Model performance was assessed using recall, F1-Score, and AUC metrics. The XGB model with SMOTE achieved the highest performance, with a recall of 0.95 and an AUC of 0.97, excelling in identifying ALC patients. The next best models were XGB with random oversampling and ANN with class weights. When limited to admission-only features, the XGB with SMOTE still performed well, achieving a recall of 0.91 and an AUC of 0.94, demonstrating its effectiveness in early ALC prediction. Additionally, the analysis identified diagnosis 1, patient age, and entry code as the top three predictors of ALC status. CONCLUSIONS: The results demonstrate the potential of ML models to predict ALC status at admission. The findings support real-time decision-making to improve patient flow and reduce hospital overcrowding. The ALC guideline groups patients first by diagnosis, then by age, and finally by entry code, categorizing prediction outcomes into three probability ranges: below 30%, 30-70%, and above 70%. This framework assesses whether ALC status can be accurately predicted at admission or during the patient's stay before discharge.
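The three probability ranges named in this abstract can be illustrated with a small sketch. It assumes the predicted ALC probability is a number between 0 and 1 and that the boundary values 0.30 and 0.70 fall in the middle band; the abstract does not state how boundaries are handled, and the function name `alc_probability_band` is hypothetical.

```python
def alc_probability_band(probability):
    """Map a predicted ALC probability (0 to 1) to one of three bands.

    Boundary handling (0.30 and 0.70 both fall in the middle band) is an
    assumption; the source only names the three ranges.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < 0.30:
        return "below 30%"
    if probability <= 0.70:
        return "30-70%"
    return "above 70%"


# Hypothetical model outputs, for illustration only.
example_bands = [alc_probability_band(p) for p in (0.12, 0.55, 0.91)]
```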
Affiliation(s)
- Mahsa Pahlevani
  - Department of Industrial Engineering, Dalhousie University, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Enayat Rajabi
  - Management Science Department, Cape Breton University, 1250 Grand Lake Road, Sydney, B1M 1A2, NS, Canada
- Majid Taghavi
  - Department of Industrial Engineering, Dalhousie University, PO Box 15000, Halifax, B3H 4R2, NS, Canada.
  - Sobey School of Business, Saint Mary's University, 923 Robie St., Halifax, B3H 3C3, NS, Canada.
- Peter VanBerkel
  - Department of Industrial Engineering, Dalhousie University, PO Box 15000, Halifax, B3H 4R2, NS, Canada
5
Visvanathan P, Vincent PDR. Prediction of Gait Neurodegenerative Diseases by Variational Mode Decomposition Using Machine Learning Algorithms. APPLIED ARTIFICIAL INTELLIGENCE 2024; 38. [DOI: 10.1080/08839514.2024.2389375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 05/21/2024] [Accepted: 07/27/2024] [Indexed: 01/03/2025]
Affiliation(s)
- P.M. Durai Raj Vincent
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
6
Tang CY, Gao C, Prasai K, Li T, Dash S, McElroy JA, Hang J, Wan XF. Prediction models for COVID-19 disease outcomes. Emerg Microbes Infect 2024; 13:2361791. [PMID: 38828796 PMCID: PMC11182058 DOI: 10.1080/22221751.2024.2361791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 05/26/2024] [Indexed: 06/05/2024]
Abstract
SARS-CoV-2 has caused over 6.9 million deaths and continues to produce lasting health consequences. COVID-19 manifests broadly from no symptoms to death. In a retrospective cross-sectional study, we developed personalized risk assessment models that predict clinical outcomes for individuals with COVID-19 and inform targeted interventions. We sequenced viruses from SARS-CoV-2-positive nasopharyngeal swab samples between July 2020 and July 2022 from 4450 individuals in Missouri and retrieved associated disease courses, clinical history, and urban-rural classification. We integrated these data to develop machine learning-based predictive models to predict hospitalization, ICU admission, and long COVID. The mean age was 38.3 years (standard deviation = 21.4) with 55.2% (N = 2453) females and 44.8% (N = 1994) males (not reported, N = 4). Our analyses revealed a comprehensive set of predictors for each outcome, encompassing human, environment, and virus genome-wide genetic markers. Immunosuppression, cardiovascular disease, older age, cardiac, gastrointestinal, and constitutional symptoms, rural residence, and specific amino acid substitutions were associated with hospitalization. ICU admission was associated with acute respiratory distress syndrome, ventilation, bacterial co-infection, rural residence, and non-wild type SARS-CoV-2 variants. Finally, long COVID was associated with hospital admission, ventilation, and female sex. Overall, we developed risk assessment models that offer the capability to identify patients with COVID-19 necessitating enhanced monitoring or early interventions. Of importance, we demonstrate the value of including key elements of virus, host, and environmental factors to predict patient outcomes, serving as a valuable platform in the field of personalized medicine with the potential for adaptation to other infectious diseases.
Affiliation(s)
- Cynthia Y. Tang
  - Center for Influenza and Emerging Infectious Diseases, University of Missouri, Columbia, Missouri, USA
  - Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia, Missouri, USA
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Cheng Gao
  - Center for Influenza and Emerging Infectious Diseases, University of Missouri, Columbia, Missouri, USA
  - Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia, Missouri, USA
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
  - Department of Electrical Engineering & Computer Science, College of Engineering, University of Missouri, Columbia, Missouri, USA
- Kritika Prasai
  - Center for Influenza and Emerging Infectious Diseases, University of Missouri, Columbia, Missouri, USA
  - Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia, Missouri, USA
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
  - Department of Electrical Engineering & Computer Science, College of Engineering, University of Missouri, Columbia, Missouri, USA
- Tao Li
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland, USA
- Shreya Dash
  - Center for Influenza and Emerging Infectious Diseases, University of Missouri, Columbia, Missouri, USA
  - Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia, Missouri, USA
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
- Jane A. McElroy
  - Family and Community Medicine, University of Missouri, Columbia, Missouri, USA
- Jun Hang
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland, USA
- Xiu-Feng Wan
  - Center for Influenza and Emerging Infectious Diseases, University of Missouri, Columbia, Missouri, USA
  - Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia, Missouri, USA
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri, USA
  - Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
  - Department of Electrical Engineering & Computer Science, College of Engineering, University of Missouri, Columbia, Missouri, USA
7
Kang S, Koo JH. Exploring stigma experiences of scattered-site public housing residents and its characteristics based on social contact theory. PLoS One 2024; 19:e0313005. [PMID: 39509379 PMCID: PMC11542776 DOI: 10.1371/journal.pone.0313005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 10/17/2024] [Indexed: 11/15/2024] Open
Abstract
Governments worldwide have been striving to efficiently manage public rental housing. However, the stigma associated with public rental housing persists as a significant challenge. In response, the scattered-site public housing strategy has been introduced as an alternative to traditional large-scale rental housing. The objective of this study was to evaluate the effectiveness of this strategy in reducing the stigma within Seoul metropolitan city. The empirical analysis utilized 2019 Seoul Public Housing Occupant data and a binary logistic regression model. The main findings indicate that residents of scattered-site public housing experience significantly lower levels of stigmatization compared to residents of other public housing types. Notably, the stigmatization experienced by scattered-site public housing residents is lower not only compared to independent public housing residents but also to those in socially mixed public housing, which is typically advantageous for reducing stigmatization. This suggests that residents of scattered-site public housing are statistically more free from both external and internal stigmatization. In addition, a unique characteristic found only in scattered-site public housing is that as residents form closer relationships with their neighbors, they experience more stigmatization. This implies that as scattered-site public housing residents form closer relationships with their neighbors, their identity as public housing residents can become exposed, potentially leading to increased stigmatization.
Affiliation(s)
- Sungik Kang
  - Department of Urban and Regional Development, Hanyang University, Seoul, South Korea
- Ja-Hoon Koo
  - Department of Urban and Regional Development, Hanyang University, Seoul, South Korea
8
Sánchez-Marqués R, García V, Sánchez JS. A data-centric machine learning approach to improve prediction of glioma grades using low-imbalance TCGA data. Sci Rep 2024; 14:17195. [PMID: 39060383 PMCID: PMC11282236 DOI: 10.1038/s41598-024-68291-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 07/22/2024] [Indexed: 07/28/2024] Open
Abstract
Accurate prediction and grading of gliomas play a crucial role in evaluating brain tumor progression, assessing overall prognosis, and treatment planning. In addition to neuroimaging techniques, identifying molecular biomarkers that can guide the diagnosis, prognosis and prediction of the response to therapy has aroused the interest of researchers in their use together with machine learning and deep learning models. Most of the research in this field has been model-centric, meaning it has been based on finding better performing algorithms. However, in practice, improving data quality can result in a better model. This study investigates a data-centric machine learning approach to determine its potential benefits in predicting glioma grades. We report six performance metrics to provide a complete picture of model performance. Experimental results indicate that standardization and oversizing the minority class increase the prediction performance of four popular machine learning models and two classifier ensembles applied on a low-imbalanced data set consisting of clinical factors and molecular biomarkers. The experiments also show that the two classifier ensembles significantly outperform three of the four standard prediction models. Furthermore, we conduct a comprehensive descriptive analysis of the glioma data set to identify relevant statistical characteristics and discover the most informative attributes using four feature ranking algorithms.
Affiliation(s)
- Raquel Sánchez-Marqués
  - Fundación Estatal, Salud, Infancia y Bienestar Social, 28029, Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Infecciosas (CIBERINFEC), Instituto de Salud Carlos III, 28029, Madrid, Spain
- Vicente García
  - Dept. Electrical and Computer Engineering, Instituto de Ingeniería y Tecnología, Universidad Autónoma de Ciudad Juárez, 32310, Ciudad Juárez, Mexico.
- J Salvador Sánchez
  - Dept. Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, 12071, Castelló, Spain
9
Scala A, Trunfio TA, Improta G. The classification algorithms to support the management of the patient with femur fracture. BMC Med Res Methodol 2024; 24:150. [PMID: 39014322 PMCID: PMC11251118 DOI: 10.1186/s12874-024-02276-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 07/05/2024] [Indexed: 07/18/2024] Open
Abstract
Effectiveness in health care is a specific characteristic of each intervention and outcome evaluated. Especially with regard to surgical interventions, organization, structure and processes play a key role in determining this parameter. In addition, health care services by definition operate in a context of limited resources, so rationalization of service organization becomes the primary goal for health care management. This aspect becomes even more relevant for those surgical services for which there are high volumes. Therefore, data analysis could play a significant role in supporting and optimizing the management of patients undergoing surgical procedures. To this end, this study used different classification algorithms to characterize the process of patients undergoing surgery for a femoral neck fracture. The models showed significant accuracy, with values of 81%, and parameters such as Anaemia and Gender proved to be determining risk factors for the patient's length of stay. The predictive power of the implemented model is assessed and discussed in view of its capability to support the management and optimisation of the hospitalisation process for femoral neck fracture, and is compared with different models in order to identify the most promising algorithms. In the end, the support of artificial intelligence algorithms lays the basis for building more accurate decision-support tools for healthcare practitioners.
Affiliation(s)
- Arianna Scala
  - Department of Public Health, University of Naples "Federico II", Naples, Italy
- Teresa Angela Trunfio
  - Department of Advanced Biomedical Sciences, University of Naples "Federico II", Naples, Italy.
- Giovanni Improta
  - Department of Public Health, University of Naples "Federico II", Naples, Italy
  - Interdepartmental Research Center on Management and Innovation in Healthcare, University of Naples "Federico II", Naples, Italy
10
Montolío A, Cegoñino J, Garcia-Martin E, Pérez Del Palomar A. The macular retinal ganglion cell layer as a biomarker for diagnosis and prognosis in multiple sclerosis: A deep learning approach. Acta Ophthalmol 2024; 102:e272-e284. [PMID: 37300357 DOI: 10.1111/aos.15722] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/10/2022] [Revised: 05/12/2023] [Accepted: 05/28/2023] [Indexed: 06/12/2023]
Abstract
PURPOSE: The macular ganglion cell layer (mGCL) is a strong potential biomarker of axonal degeneration in multiple sclerosis (MS). For this reason, this study aims to develop a computer-aided method to facilitate diagnosis and prognosis in MS. METHODS: This paper combines a cross-sectional study of 72 MS patients and 30 healthy control subjects for diagnosis and a 10-year longitudinal study of the same MS patients for the prediction of disability progression, during which the mGCL was measured using optical coherence tomography (OCT). Deep neural networks were used as an automatic classifier. RESULTS: For MS diagnosis, the greatest accuracy (90.3%) was achieved using 17 features as inputs. The neural network architecture comprised the input layer, two hidden layers and the output layer with softmax activation. For the prediction of disability progression 8 years later, an accuracy of 81.9% was achieved with a neural network comprising two hidden layers and 400 epochs. CONCLUSION: We present evidence that by applying deep learning techniques to clinical and mGCL thickness data it is possible to identify MS and predict the course of the disease. This approach potentially constitutes a non-invasive, low-cost, easy-to-implement and effective method.
Affiliation(s)
- Alberto Montolío
  - Biomaterials Group, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
  - Mechanical Engineering Department, University of Zaragoza, Zaragoza, Spain
- José Cegoñino
  - Biomaterials Group, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
  - Mechanical Engineering Department, University of Zaragoza, Zaragoza, Spain
- Elena Garcia-Martin
  - Ophthalmology Department, Miguel Servet University Hospital, Zaragoza, Spain
  - GIMSO Research and Innovation Group, Aragon Institute for Health Research (IIS Aragon), Zaragoza, Spain
- Amaya Pérez Del Palomar
  - Biomaterials Group, Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
  - Mechanical Engineering Department, University of Zaragoza, Zaragoza, Spain
11
Kren J, Skambath I, Kuppler P, Buschschlüter S, Detrez N, Burhan S, Huber R, Brinkmann R, Bonsanto MM. Mechanical characteristics of glioblastoma and peritumoral tumor-free human brain tissue. Acta Neurochir (Wien) 2024; 166:102. [PMID: 38396016 PMCID: PMC10891200 DOI: 10.1007/s00701-024-06009-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 02/16/2024] [Indexed: 02/25/2024]
Abstract
BACKGROUND: The diagnosis of a brain tumor is a serious event for the affected patient. Surgical resection is a crucial part of the treatment of brain tumors. However, the distinction between tumor and brain tissue can be difficult, even for experienced neurosurgeons. This is especially true in the case of gliomas. In this project we examined whether the biomechanical parameters elasticity and stress relaxation behavior are suitable as additional differentiation criteria between tumorous (glioblastoma multiforme; glioblastoma, IDH-wildtype; GBM) and non-tumorous, peritumoral tissue. METHODS: Indentation measurements were used to examine non-tumorous human brain tissue and GBM samples for the biomechanical properties of elasticity and stress-relaxation behavior. The results of these measurements were then used in a classification algorithm (Logistic Regression) to distinguish between tumor and non-tumor. RESULTS: Differences could be found in elasticity spread and relaxation behavior between tumorous and non-tumorous tissue. Classification was successful with a sensitivity/recall of 83% (sd = 12%) and a precision of 85% (sd = 9%) for detecting tumorous tissue. CONCLUSION: The findings imply that the data on mechanical characteristics, with particular attention to stress relaxation behavior, can serve as an extra element in differentiating tumorous brain tissue from non-tumorous brain tissue.
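The sensitivity/recall and precision figures reported in this abstract follow standard definitions: recall is the share of truly tumorous samples that are flagged as tumorous, and precision is the share of flagged samples that are truly tumorous. A minimal Python sketch of those definitions is given here; the function name `precision_recall` and the label lists are hypothetical and do not reproduce the study's data or its logistic regression model.

```python
def precision_recall(true_labels, predicted_labels, positive=1):
    """Return (precision, recall) for the class marked as `positive`."""
    true_pos = false_pos = false_neg = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            true_pos += 1
        elif guess == positive:
            false_pos += 1
        elif truth == positive:
            false_neg += 1
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    # Return 0.0 when a denominator is empty rather than dividing by zero.
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical labels: 1 = tumorous, 0 = non-tumorous (illustration only).
example_truth = [1, 1, 1, 0, 0, 1]
example_guess = [1, 1, 0, 0, 1, 1]
example_precision, example_recall = precision_recall(example_truth, example_guess)
```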
Affiliation(s)
- Jessica Kren
  - Department of Neurosurgery, University Hospital Schleswig-Holstein, Luebeck, Germany.
- Isabelle Skambath
  - Department of Neurosurgery, University Hospital Schleswig-Holstein, Luebeck, Germany
- Patrick Kuppler
  - Department of Neurosurgery, University Hospital Schleswig-Holstein, Luebeck, Germany
- Nicolas Detrez
  - Medizinisches Laserzentrum Lübeck GmbH, Luebeck, Germany
- Sazgar Burhan
  - Institute of Biomedical Optics, University of Luebeck, Luebeck, Germany
- Robert Huber
  - Institute of Biomedical Optics, University of Luebeck, Luebeck, Germany
- Ralf Brinkmann
  - Medizinisches Laserzentrum Lübeck GmbH, Luebeck, Germany
- Matteo Mario Bonsanto
  - Department of Neurosurgery, University Hospital Schleswig-Holstein, Luebeck, Germany
| |
|
12
|
Chen M, Wei Z, Li L, Zhang K. Edge computing-based proactive control method for industrial product manufacturing quality prediction. Sci Rep 2024; 14:1288. [PMID: 38218746 PMCID: PMC10787841 DOI: 10.1038/s41598-024-51974-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 01/11/2024] [Indexed: 01/15/2024] Open
Abstract
With the emergence of intelligent manufacturing, new-generation information technologies such as big data and artificial intelligence are rapidly integrating with the manufacturing industry. One of the primary applications is to assist manufacturing plants in predicting product quality. Traditional predictive models primarily focus on establishing high-precision classification or regression models, with less emphasis on imbalanced data, which is a specific but common scenario in quality prediction in practical industrial environments. A SMOTE-XGBoost quality prediction active control method based on joint hyperparameter optimization is proposed to address the problem of imbalanced data classification in product quality prediction. In addition, edge computing technology is introduced to address issues in industrial manufacturing, such as the large bandwidth load and resource limitations associated with traditional cloud computing models. Finally, the practicality and effectiveness of the proposed method are validated through a case study of a brake disc production line. Experimental results indicate that the proposed method outperforms other classification methods in brake disc quality prediction.
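The abstract does not describe how its SMOTE step is configured, so the following is only a minimal sketch of the general idea behind the synthetic minority oversampling technique: new minority-class points are created by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, the parameters `k` and `seed`, and the sample points are illustrative assumptions, not the authors' implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D feature vectors for the rare (defective) class.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
synthetic = smote(minority, n_new=6)
print(len(synthetic))
```

The synthetic points would then be appended to the training set only, so that the classifier sees a more balanced class distribution while the evaluation data stay untouched.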
Affiliation(s)
- Mo Chen
- School of Mechanical Engineering, Shenyang University of Technology, Shenyang, China
| | - Zhe Wei
- School of Mechanical Engineering, Shenyang University of Technology, Shenyang, China.
- Shenyang Innovative Design & Research Institute Co., Ltd., Shenyang, China.
| | - Li Li
- School of Mechanical Engineering, Shenyang University of Technology, Shenyang, China
| | - Kai Zhang
- School of Mechanical Engineering, Shenyang University of Technology, Shenyang, China
| |
|
13
|
Ahmad Amshi H, Prasad R, Sharma BK, Yusuf SI, Sani Z. How can machine learning predict cholera: insights from experiments and design science for action research. JOURNAL OF WATER AND HEALTH 2024; 22:21-35. [PMID: 38295070 PMCID: wh_2023_026 DOI: 10.2166/wh.2023.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]
Abstract
Cholera is a leading cause of mortality in Nigeria. The two most significant predictors of cholera are a lack of access to clean water and poor sanitary conditions. Other factors such as natural disasters, illiteracy, and internal conflicts that drive people to seek sanctuary in refugee camps may contribute to the spread of cholera in Nigeria. The aim of this research is to develop a cholera outbreak risk prediction (CORP) model using machine learning tools and data science. In this study, we developed a CORP model using design science perspectives and machine learning to detect cholera outbreaks in Nigeria. Nonnegative matrix factorization (NMF) was used for dimensionality reduction, and the synthetic minority oversampling technique (SMOTE) was used for data balancing. Outliers detected using density-based spatial clustering of applications with noise (DBSCAN) were removed, improving the overall performance of the model, and the extreme gradient boosting algorithm was used for prediction. The findings revealed that the CORP model achieved a best accuracy of 99.62%, a Matthews correlation coefficient of 0.976, and an area under the curve of 99.2%, an improvement over previous findings. The developed model can be helpful to healthcare providers in predicting possible cholera outbreaks.
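The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. As a minimal sketch of that standard formula (the counts below are hypothetical and not taken from the study; `matthews_corrcoef` is an illustrative helper name):

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, not results from the paper.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(mcc, 3))
```

Unlike accuracy, the coefficient uses all four cells, which is why it is often reported alongside accuracy for imbalanced outbreak data.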
Affiliation(s)
- Hauwa Ahmad Amshi
- African University of Science and Technology, Abuja, Nigeria E-mail:
| | - Rajesh Prasad
- Department of Computer Science and Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India
| | | | | | | |
|
14
|
Zhu JJ, Yang M, Ren ZJ. Machine Learning in Environmental Research: Common Pitfalls and Best Practices. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17671-17689. [PMID: 37384597 DOI: 10.1021/acs.est.3c00026] [Citation(s) in RCA: 102] [Impact Index Per Article: 51.0] [Indexed: 07/01/2023]
Abstract
Machine learning (ML) is increasingly used in environmental research to process large data sets and decipher complex relationships between system variables. However, due to the lack of familiarity and methodological rigor, inadequate ML studies may lead to spurious conclusions. In this study, we synthesized literature analysis with our own experience and provided a tutorial-like compilation of common pitfalls along with best practice guidelines for environmental ML research. We identified more than 30 key items and provided evidence-based data analysis based on 148 highly cited research articles to exhibit the misconceptions of terminologies, proper sample size and feature size, data enrichment and feature selection, randomness assessment, data leakage management, data splitting, method selection and comparison, model optimization and evaluation, and model explainability and causality. By analyzing good examples on supervised learning and reference modeling paradigms, we hope to help researchers adopt more rigorous data preprocessing and model development standards for more accurate, robust, and practicable model uses in environmental research and applications.
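One of the pitfalls listed above, data leakage, commonly arises when preprocessing statistics are computed on the full data set before it is split. The following is a minimal sketch of the leakage-free order of operations (split first, fit the scaler on the training portion only, then apply it to both portions). The helper names and the toy data are illustrative assumptions and do not come from the article.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return the mean and population standard deviation of the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return mean, std


def transform(values, mean, std):
    return [(v - mean) / std for v in values]


data = [float(v) for v in range(1, 21)]  # hypothetical single feature
train, test = train_test_split(data)
mean, std = fit_scaler(train)            # statistics come from the training split only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)  # test data reuse the training statistics
print(len(train_scaled), len(test_scaled))
```

Fitting the scaler on `data` instead of `train` would let information about the test samples influence the model inputs, which tends to inflate reported performance.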
Affiliation(s)
- Jun-Jie Zhu
- Department of Civil and Environmental Engineering and Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
| | - Meiqi Yang
- Department of Civil and Environmental Engineering and Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
| | - Zhiyong Jason Ren
- Department of Civil and Environmental Engineering and Andlinger Center for Energy and the Environment, Princeton University, Princeton, New Jersey 08544, United States
| |
|
15
|
Obukhov NV, Naish PLN, Solnyshkina IE, Siourdaki TG, Martynov IA. Real-time assessment of hypnotic depth, using an EEG-based brain-computer interface: a preliminary study. BMC Res Notes 2023; 16:288. [PMID: 37875937 PMCID: PMC10599062 DOI: 10.1186/s13104-023-06553-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 10/02/2023] [Indexed: 10/26/2023] Open
Abstract
OBJECTIVE Hypnosis can be an effective treatment for many conditions, and there have been attempts to develop instrumental approaches to continuously monitor hypnotic state level ("depth"). However, there is no method that addresses the individual variability of electrophysiological hypnotic correlates. We explore the possibility of using an EEG-based passive brain-computer interface (pBCI) for real-time, individualised estimation of the hypnosis deepening process. RESULTS The wakefulness and deep hypnosis intervals were manually defined and labelled in 27 electroencephalographic (EEG) recordings obtained from eight outpatients after hypnosis sessions. Spectral analysis showed that EEG correlates of deep hypnosis were relatively stable in each patient throughout the treatment but varied between patients. Data from each first session was used to train classification models to continuously assess deep hypnosis probability in subsequent sessions. Models trained using four frequency bands (1.5-45, 1.5-8, 1.5-14, and 4-15 Hz) showed accuracy mostly exceeding 85% in a 10-fold cross-validation. Real-time classification accuracy was also acceptable, so at least one of the four bands yielded results exceeding 74% in any session. The best results averaged across all sessions were obtained using 1.5-14 and 4-15 Hz, with an accuracy of 82%. The revealed issues are also discussed.
Affiliation(s)
- Nikita V Obukhov
- Research Department, The Association of Experts in the Field of Clinical Hypnosis, 40, Kamennoostrovsky Ave., 410, Saint Petersburg, 197022, Russian Federation.
- Department of Psychotherapy, Academician I.P. Pavlov First St. Petersburg State Medical University, 6-8, L. Tolstoy str, Saint Petersburg, 197022, Russian Federation.
| | - Peter L N Naish
- Department of Psychology, The Open University, Walton Hall, Milton Keynes, MK7 6AA, UK
| | - Irina E Solnyshkina
- Department of Psychotherapy, Academician I.P. Pavlov First St. Petersburg State Medical University, 6-8, L. Tolstoy str, Saint Petersburg, 197022, Russian Federation
| | - Tatiana G Siourdaki
- Research Department, The Association of Experts in the Field of Clinical Hypnosis, 40, Kamennoostrovsky Ave., 410, Saint Petersburg, 197022, Russian Federation
| | - Ilya A Martynov
- Research Department, The Association of Experts in the Field of Clinical Hypnosis, 40, Kamennoostrovsky Ave., 410, Saint Petersburg, 197022, Russian Federation
| |
|
16
|
Zhang L, Li C, Zhang R, Sun Q. Online semi-supervised learning for motor imagery EEG classification. Comput Biol Med 2023; 165:107405. [PMID: 37678137 DOI: 10.1016/j.compbiomed.2023.107405] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/13/2023] [Revised: 07/29/2023] [Accepted: 08/26/2023] [Indexed: 09/09/2023]
Abstract
OBJECTIVE Time-consuming data labeling in brain-computer interfaces (BCIs) raises many problems such as mental fatigue and is one key factor that hinders the real-world adoption of motor imagery (MI)-based BCIs. An alternative approach is to integrate readily available and informative unlabeled data online, although this approach has been less investigated. APPROACH We proposed an online semi-supervised learning scheme to improve the classification performance of MI-based BCIs. This scheme uses a regularized weighted online sequential extreme learning machine (RWOS-ELM) as the base classifier and updates its model parameters with incoming balanced data chunk-by-chunk. In the initial stage, we designed a technique that combines synthetic minority oversampling with the edited nearest neighbor rule for data augmentation to construct more discriminative initial classifiers. When used online, the incoming chunk of data is first pseudo-labeled by RWOS-ELM as well as an auxiliary classifier, and then balanced again by the above-mentioned technique. Initial classifiers are further updated based on these class-balanced data. MAIN RESULTS Offline experimental results on two publicly available MI datasets demonstrate the superiority of the proposed scheme over its counterparts. Further online experiments on six subjects show that their BCI performance gradually improved by learning from incoming unlabeled data. SIGNIFICANCE Our proposed online semi-supervised learning scheme has higher computation and memory usage efficiency, which is promising for online MI-based BCIs, especially in the case of insufficient labeled training data.
Affiliation(s)
- Li Zhang
- State Key Laboratory of Power Transmission Equipment & System Security and New Technology, School of Electrical Engineering, Chongqing University, Chongqing, 400044, People's Republic of China.
| | - Changsheng Li
- State Key Laboratory of Power Transmission Equipment & System Security and New Technology, School of Electrical Engineering, Chongqing University, Chongqing, 400044, People's Republic of China
| | - Run Zhang
- Marketing Service Center, State Grid Chongqing Electric Power Company, Yuzhong District, Chongqing, 400014, People's Republic of China
| | - Qiang Sun
- State Key Laboratory of Power Transmission Equipment & System Security and New Technology, School of Electrical Engineering, Chongqing University, Chongqing, 400044, People's Republic of China
| |
|
17
|
Eşsiz UE, Yüregir OH, Saraç E. Applying data mining techniques to predict vitamin D deficiency in diabetic patients. Health Informatics J 2023; 29:14604582231214864. [PMID: 37963409 DOI: 10.1177/14604582231214864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Vitamin D is among the vitamins necessary for both adults' and children's health. It plays a significant role in calcium absorption, the immune system, cell proliferation and differentiation, bone protection, skeletal health, rickets, muscle health, heart health, disease pathogenesis and severity, glucose metabolism, glucose intolerance, varying insulin secretion, and diabetes. Because the 25-hydroxyvitamin D (25OHD) test, which is used to measure vitamin D, is expensive and may not be covered by healthcare benefits in many countries, this study aims to predict vitamin D deficiency in diabetic patients. The prediction method is based on data mining techniques combined with feature selection using historical electronic health records. The results were compared with a filter-based feature selection algorithm, namely relief-F. Non-valuable features were eliminated effectively with the relief-F feature selection method without any performance loss in classification. The performance of the methods was evaluated using classification accuracy (ACC), sensitivity, specificity, F1-score, precision, kappa results, and receiver operating characteristic (ROC) curves. The analyses were conducted on a vitamin D dataset of diabetic patients, and the results show that the highest classification accuracy of 97.044% was obtained for the support vector machine (SVM) model using a radial kernel with 18 features.
Affiliation(s)
- Uğur Engin Eşsiz
- Department of Industrial Engineering, Çukurova University, Adana, Turkey
| | - Oya Hacire Yüregir
- Department of Industrial Engineering, Çukurova University, Adana, Turkey
| | - Esra Saraç
- Department of Computer Engineering, Adana Alparslan Türkeş Science and Technology University, Adana, Turkey
| |
|
18
|
Park S, Kim JH, Cha YK, Chung MJ, Woo JH, Park S. Application of Machine Learning Algorithm in Predicting Axillary Lymph Node Metastasis from Breast Cancer on Preoperative Chest CT. Diagnostics (Basel) 2023; 13:2953. [PMID: 37761320 PMCID: PMC10528867 DOI: 10.3390/diagnostics13182953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 09/05/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023] Open
Abstract
Axillary lymph node (ALN) status is one of the most critical prognostic factors in patients with breast cancer. However, ALN evaluation with contrast-enhanced CT (CECT) has been challenging. Machine learning (ML) is known to show excellent performance in image recognition tasks. The purpose of our study was to evaluate the performance of the ML algorithm for predicting ALN metastasis by combining preoperative CECT features of both ALN and primary tumor. This was a retrospective single-institutional study of a total of 266 patients with breast cancer who underwent preoperative chest CECT. Random forest (RF), extreme gradient boosting (XGBoost), and neural network (NN) algorithms were used. Statistical analysis and recursive feature elimination (RFE) were adopted as feature selection for ML. The best ML-based ALN prediction model for breast cancer was NN with RFE, which achieved an AUROC of 0.76 ± 0.11 and an accuracy of 0.74 ± 0.12. By comparing NN with RFE model performance with and without ALN features from CECT, NN with RFE model with ALN features showed better performance at all performance evaluations, which indicated the effect of ALN features. Through our study, we were able to demonstrate that the ML algorithm could effectively predict the final diagnosis of ALN metastases from CECT images of the primary tumor and ALN. This suggests that ML has the potential to differentiate between benign and malignant ALNs.
Affiliation(s)
- Soyoung Park
- Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul 06351, Republic of Korea; (S.P.); (S.P.)
| | - Jong Hee Kim
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea; (J.H.K.); (J.H.W.)
| | - Yoon Ki Cha
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea; (J.H.K.); (J.H.W.)
| | - Myung Jin Chung
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea; (J.H.K.); (J.H.W.)
| | - Jung Han Woo
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea; (J.H.K.); (J.H.W.)
| | - Subin Park
- Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul 06351, Republic of Korea; (S.P.); (S.P.)
| |
|
19
|
Habenicht R, Fehrmann E, Blohm P, Ebenbichler G, Fischer-Grote L, Kollmitzer J, Mair P, Kienbacher T. Machine Learning Based Linking of Patient Reported Outcome Measures to WHO International Classification of Functioning, Disability, and Health Activity/Participation Categories. J Clin Med 2023; 12:5609. [PMID: 37685676 PMCID: PMC10488436 DOI: 10.3390/jcm12175609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 08/06/2023] [Accepted: 08/23/2023] [Indexed: 09/10/2023] Open
Abstract
BACKGROUND In the primary and secondary medical health sector, patient reported outcome measures (PROMs) are widely used to assess a patient's disease-related functional health state. However, the World Health Organization (WHO), in its recently adopted resolution on "strengthening rehabilitation in all health systems", encourages that all health sectors, not only the rehabilitation sector, classify a patient's functioning and health state according to the International Classification of Functioning, Disability and Health (ICF). AIM This research sought to optimize machine learning (ML) methods that fully and automatically link information collected from PROMs in persons with unspecific chronic low back pain (cLBP) to limitations in activities and restrictions in participation that are listed in the WHO core set categories for LBP. The study also aimed to identify the minimal set of PROMs necessary for linking without compromising performance. METHODS A total of 806 patients with cLBP completed a comprehensive set of validated PROMs and were interviewed by clinical psychologists who assessed patients' performance in activity limitations and restrictions in participation according to the ICF brief core set for low back pain (LBP). The information collected was then utilized to further develop random forest (RF) methods that classified the presence or absence of a problem within each of the activity participation ICF categories of the ICF core set for LBP. Further analyses identified those PROM items relevant to the linking process and validated the respective linking performance that utilized a minimal subset of items. RESULTS Compared to a recently developed ML linking method, receiver operating characteristic curve (ROC-AUC) values for the novel RF methods showed overall improved performance, with AUC values ranging from 0.73 for the ICF category d850 to 0.81 for the ICF category d540. 
Variable importance measurements revealed that minimal subsets of either 24 or 15 important PROM variables (out of 80 items included in full set of PROMs) would show similar linking performance. CONCLUSIONS Findings suggest that our optimized ML based methods more accurately predict the presence or absence of limitations and restrictions listed in ICF core categories for cLBP. In addition, this accurate performance would not suffer if the list of PROM items was reduced to a minimum of 15 out of 80 items assessed.
Affiliation(s)
- Richard Habenicht
- Karl-Landsteiner-Institute of Outpatient Rehabilitation Research, 1230 Vienna, Austria; (R.H.); (P.B.); (G.E.); (L.F.-G.); (T.K.)
| | - Elisabeth Fehrmann
- Karl-Landsteiner-Institute of Outpatient Rehabilitation Research, 1230 Vienna, Austria; (R.H.); (P.B.); (G.E.); (L.F.-G.); (T.K.)
- Department of Psychology, Karl Landsteiner University of Health Sciences, 3500 Krems, Austria
| | - Peter Blohm
- Karl-Landsteiner-Institute of Outpatient Rehabilitation Research, 1230 Vienna, Austria; (R.H.); (P.B.); (G.E.); (L.F.-G.); (T.K.)
| | - Gerold Ebenbichler
- Karl-Landsteiner-Institute of Outpatient Rehabilitation Research, 1230 Vienna, Austria; (R.H.); (P.B.); (G.E.); (L.F.-G.); (T.K.)
- Department of Physical Medicine, Rehabilitation and Occupational Medicine, Medical University of Vienna, 1090 Vienna, Austria
| | - Linda Fischer-Grote
- Karl-Landsteiner-Institute of Outpatient Rehabilitation Research, 1230 Vienna, Austria; (R.H.); (P.B.); (G.E.); (L.F.-G.); (T.K.)
| | - Josef Kollmitzer
- Department of Biomedical Engineering, TGM College for Higher Vocational Education, 1200 Vienna, Austria;
| | - Patrick Mair
- Department of Psychology, Harvard University, Cambridge, MA 02138, USA;
| | - Thomas Kienbacher
- Karl-Landsteiner-Institute of Outpatient Rehabilitation Research, 1230 Vienna, Austria; (R.H.); (P.B.); (G.E.); (L.F.-G.); (T.K.)
| |
|
20
|
Choi SG, Oh M, Park DH, Lee B, Lee YH, Jee SH, Jeon JY. Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods. Sci Rep 2023; 13:13101. [PMID: 37567907 PMCID: PMC10421881 DOI: 10.1038/s41598-023-40170-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 08/06/2023] [Indexed: 08/13/2023] Open
Abstract
We compared the prediction performance of machine learning-based undiagnosed diabetes prediction models with that of traditional statistics-based prediction models. We used the 2014-2020 Korean National Health and Nutrition Examination Survey (KNHANES) (N = 32,827). The KNHANES 2014-2018 data were used as training and internal validation sets and the 2019-2020 data as external validation sets. The area under the receiver operating characteristic curve (AUC) was used to compare the prediction performance of the machine learning-based and the traditional statistics-based prediction models. Using sex, age, resting heart rate, and waist circumference as features, the machine learning-based model showed a higher AUC (0.788 vs. 0.740) than the traditional statistics-based prediction model. Using sex, age, waist circumference, family history of diabetes, hypertension, alcohol consumption, and smoking status as features, the machine learning-based prediction model showed a higher AUC (0.802 vs. 0.759) than the traditional statistics-based prediction model. The machine learning-based prediction model using features for maximum prediction performance showed a higher AUC (0.819 vs. 0.765) than the traditional statistics-based prediction model. Machine learning-based prediction models using anthropometric and lifestyle measurements may outperform the traditional statistics-based prediction models in predicting undiagnosed diabetes.
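The AUC used for the comparisons above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The snippet below is a minimal sketch of that pairwise definition; the labels and scores are hypothetical, and `roc_auc` is an illustrative helper rather than the software used in the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = undiagnosed diabetes) and model scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based computation would be preferable for survey-sized data.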
Affiliation(s)
- Seong Gyu Choi
- Department of Sports Industry Studies, Yonsei University, Seoul, Republic of Korea
| | - Minsuk Oh
- Department of Sports Industry Studies, Yonsei University, Seoul, Republic of Korea
- Frontier Research Institute of Convergence Sports Science, Yonsei University, Seoul, Republic of Korea
| | - Dong-Hyuk Park
- Department of Sports Industry Studies, Yonsei University, Seoul, Republic of Korea
| | | | - Yong-Ho Lee
- Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Sun Ha Jee
- Institute for Health Promotion, Graduate School of Public Health, Yonsei University, Seoul, Republic of Korea
| | - Justin Y Jeon
- Department of Sports Industry Studies, Yonsei University, Seoul, Republic of Korea.
- Frontier Research Institute of Convergence Sports Science, Yonsei University, Seoul, Republic of Korea.
- Exercise Medicine Center for Diabetes and Cancer Patients, ICONS, Seoul, Republic of Korea.
- Cancer Prevention Center Shinchon Severance, Yonsei University College of Medicine, Shinchon-Dong, Seodaemun-Gu, Seoul, 120-749, Republic of Korea.
| |
|
21
|
Moon JW, Yang E, Kim JH, Kwon OJ, Park M, Yi CA. Predicting Non-Small-Cell Lung Cancer Survival after Curative Surgery via Deep Learning of Diffusion MRI. Diagnostics (Basel) 2023; 13:2555. [PMID: 37568918 PMCID: PMC10417371 DOI: 10.3390/diagnostics13152555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 07/19/2023] [Accepted: 07/27/2023] [Indexed: 08/13/2023] Open
Abstract
BACKGROUND The objective of this study is to evaluate the predictive power of the survival model using deep learning of diffusion-weighted images (DWI) in patients with non-small-cell lung cancer (NSCLC). METHODS DWI at b-values of 0, 100, and 700 s/mm² (DWI0, DWI100, DWI700) were preoperatively obtained for 100 NSCLC patients who underwent curative surgery (57 men, 43 women; mean age, 62 years). The ADC0-100 (perfusion-sensitive ADC), ADC100-700 (perfusion-insensitive ADC), ADC0-100-700, and demographic features were collected as input data and 5-year survival was collected as output data. Our survival model adopted transfer learning from a pre-trained VGG-16 network, whereby the softmax layer was replaced with a binary classification layer for the prediction of 5-year survival. Three channels of input data were selected in combination out of DWIs and ADC images, and their accuracies and AUCs were compared for the best performance during 10-fold cross-validation. RESULTS Sixty-six patients survived, and 34 patients died. The predictive performance was the best in the following combination: DWI0-ADC0-100-ADC0-100-700 (accuracy: 92%; AUC: 0.904). This was followed by DWI0-DWI700-ADC0-100-700, DWI0-DWI100-DWI700, and DWI0-DWI0-DWI0 (accuracy: 91%, 81%, 76%; AUC: 0.889, 0.763, 0.711, respectively). Survival prediction models trained with ADC performed significantly better than the one trained with DWI only (p-values < 0.05). The survival prediction was improved when demographic features were added to the model with only DWIs, but the benefit of clinical information was not prominent when added to the best performing model using both DWI and ADC. CONCLUSIONS Deep learning may play a role in the survival prediction of lung cancer. The performance of learning can be enhanced by inputting precedented, proven functional parameters of the ADC instead of the original data of DWIs only.
Affiliation(s)
- Jung Won Moon
- Department of Radiology, Kangnam Sacred Heart Hospital, Hallym University School of Medicine, Seoul 07441, Republic of Korea;
| | - Ehwa Yang
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea;
| | - Jae-Hun Kim
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea;
| | - O Jung Kwon
- Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea;
| | - Minsu Park
- Department of Information and Statistics, Chungnam National University, Daejeon 34134, Republic of Korea;
| | - Chin A Yi
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Republic of Korea;
| |
|
22
|
Yi F, Yang H, Chen D, Qin Y, Han H, Cui J, Bai W, Ma Y, Zhang R, Yu H. XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease. BMC Med Inform Decis Mak 2023; 23:137. [PMID: 37491248 PMCID: PMC10369804 DOI: 10.1186/s12911-023-02238-9] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 10/09/2022] [Accepted: 07/13/2023] [Indexed: 07/27/2023] Open
Abstract
BACKGROUND Due to the class imbalance issue faced when Alzheimer's disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. METHODS We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. RESULTS Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. 
The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. CONCLUSIONS The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.
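The abstract states that the sample weight distribution was changed to handle the imbalance among NC, MCI, and AD, but it does not give the weighting rule. As a sketch under that caveat, one common choice is inverse class-frequency weighting, where each sample of class c receives the weight n / (k · n_c) for n samples and k classes. The helper name and the label counts below are hypothetical and are not taken from the ADNI or NACC data.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Weight each sample by n / (k * n_c), so every class has the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[label]) for label in labels]


# Hypothetical, deliberately imbalanced label list (assumption, not study data).
labels = ["NC"] * 2 + ["MCI"] * 6 + ["AD"] * 2
weights = balanced_sample_weights(labels)
print(weights[0], weights[2])
```

Weights of this kind are typically passed to the learner at training time so that errors on the rarer classes cost more; whether the authors used this exact rule is not stated in the abstract.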
Affiliation(s)
- Fuliang Yi
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Hui Yang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Durong Chen
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Yao Qin
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Hongjuan Han
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Jing Cui
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Wenlin Bai
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Yifei Ma
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Rong Zhang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Hongmei Yu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001 P.R. China
- Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
|
23
|
Chen Z, Hu B, Liu X, Becker B, Eickhoff SB, Miao K, Gu X, Tang Y, Dai X, Li C, Leonov A, Xiao Z, Feng Z, Chen J, Chuan-Peng H. Sampling inequalities affect generalization of neuroimaging-based diagnostic classifiers in psychiatry. BMC Med 2023; 21:241. [PMID: 37400814 DOI: 10.1186/s12916-023-02941-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Accepted: 06/13/2023] [Indexed: 07/05/2023] Open
Abstract
BACKGROUND The development of machine learning models for aiding in the diagnosis of mental disorders is recognized as a significant breakthrough in the field of psychiatry. However, the clinical application of such models remains a challenge, with poor generalizability being a major limitation.
METHODS Here, we conducted a pre-registered meta-research assessment of neuroimaging-based models in the psychiatric literature, quantitatively examining global and regional sampling issues over recent decades from a relatively underexplored perspective. A total of 476 studies (n = 118,137) were included in the current assessment. Based on these findings, we built a comprehensive 5-star rating system to quantitatively evaluate the quality of existing machine learning models for psychiatric diagnoses.
RESULTS A global sampling inequality in these models was revealed quantitatively (sampling Gini coefficient (G) = 0.81, p < .01), varying across different countries (regions) (e.g., China, G = 0.47; the USA, G = 0.58; Germany, G = 0.78; the UK, G = 0.87). Furthermore, the severity of this sampling inequality was significantly predicted by national economic levels (β = -2.75, p < .001, R²adj = 0.40; r = -.84, 95% CI: -.41 to -.97), and it plausibly predicted model performance, with higher sampling inequality associated with higher reported classification accuracy. Further analyses showed that lack of independent testing (84.24% of models, 95% CI: 81.0-87.5%), improper cross-validation (51.68% of models, 95% CI: 47.2-56.2%), and poor technical transparency (87.8% of models, 95% CI: 84.9-90.8%)/availability (80.88% of models, 95% CI: 77.3-84.4%) remain prevalent in current diagnostic classifiers despite improvements over time. Relating to these observations, model performance was lower in studies with independent cross-country sampling validations (all p < .001, BF10 > 15). In light of this, we proposed a purpose-built quantitative assessment checklist, which demonstrated that the overall ratings of these models increased by publication year but were negatively associated with model performance.
CONCLUSIONS Together, improving economic equality in sampling, and hence the quality of machine learning models, may be crucial to plausibly translating neuroimaging-based diagnostic classifiers into clinical practice.
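The abstract reports sampling Gini coefficients but does not say how they were computed. As a rough illustration, assuming the textbook Gini coefficient applied to participant counts per recruitment site (the function name and the toy counts are hypothetical, not data from the study), the calculation could look like this:

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal.

    Uses the sorted-rank formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,  i = 1..n
    which is the standard definition, assumed here for illustration.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * rank_weighted / (n * total) - (n + 1) / n


# Hypothetical participant counts per recruitment site.
print(gini([25, 25, 25, 25]))  # 0.0, every site contributes equally
print(gini([0, 0, 0, 100]))    # 0.75, one site supplies the whole sample
```

With n sites the maximum of this estimator is (n - 1) / n, so values from small numbers of sites should be compared with care.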
Affiliation(s)
- Zhiyi Chen
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China.
- Faculty of Psychology, Southwest University, Chongqing, China.
- Bowen Hu
- Faculty of Psychology, Southwest University, Chongqing, China
- Xuerong Liu
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Benjamin Becker
- The Center of Psychosomatic Medicine, Sichuan Provincial Center for Mental Health, Sichuan Provincial People's Hospital, Chengdu, China
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
- Simon B Eickhoff
- Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Kuan Miao
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Xingmei Gu
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Yancheng Tang
- School of Business and Management, Shanghai International Studies University, Shanghai, China
- Xin Dai
- Faculty of Psychology, Southwest University, Chongqing, China
- Chao Li
- Department of Radiology, The Third Affiliated Hospital, Sun Yat-Sen University, Guangdong, China
- Artemiy Leonov
- School of Psychology, Clark University, Worcester, MA, USA
- Zhibing Xiao
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
- Zhengzhi Feng
- Experimental Research Center for Medical and Psychological Science (ERC-MPS), School of Psychology, Third Military Medical University, Chongqing, China
- Ji Chen
- Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, China.
- Department of Psychiatry, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, Zhejiang, China.
- Hu Chuan-Peng
- School of Psychology, Nanjing Normal University, Nanjing, China
|
24
|
Bradshaw TJ, Huemann Z, Hu J, Rahmim A. A Guide to Cross-Validation for Artificial Intelligence in Medical Imaging. Radiol Artif Intell 2023; 5:e220232. [PMID: 37529208 PMCID: PMC10388213 DOI: 10.1148/ryai.220232] [Citation(s) in RCA: 44] [Impact Index Per Article: 22.0] [Received: 10/27/2022] [Revised: 05/02/2023] [Accepted: 05/10/2023] [Indexed: 08/03/2023]
Abstract
Artificial intelligence (AI) is being increasingly used to automate and improve technologies within the field of medical imaging. A critical step in the development of an AI algorithm is estimating its prediction error through cross-validation (CV). The use of CV can help prevent overoptimism in AI algorithms and can mitigate certain biases associated with hyperparameter tuning and algorithm selection. This article introduces the principles of CV and provides a practical guide on the use of CV for AI algorithm development in medical imaging. Different CV techniques are described, as well as their advantages and disadvantages under different scenarios. Common pitfalls in prediction error estimation and guidance on how to avoid them are also discussed. Keywords: Education, Research Design, Technical Aspects, Statistics, Supervised Learning, Convolutional Neural Network (CNN) Supplemental material is available for this article. © RSNA, 2023.
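To make the basic idea of cross-validation concrete, the following is a minimal sketch of plain k-fold index splitting in pure Python. It is an illustration of the general technique, not code from the article; the function name, the seed parameter, and the toy sizes are assumptions. Real medical-imaging work usually also needs patient-level grouping and stratification, which this sketch omits.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for plain k-fold CV.

    Indices are shuffled once, dealt into k near-equal folds, and each fold
    serves as the held-out test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical usage: 10 samples, 5 folds.
for train, test in k_fold_indices(10, 5):
    print(len(train), "train /", len(test), "test:", sorted(test))
```

Any preprocessing or feature selection that learns from data would need to be fitted inside each training fold only; fitting it on the full dataset first leaks test information into the estimate.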
|
25
|
Tozlu C, Card S, Jamison K, Gauthier SA, Kuceyeski A. Larger lesion volume in people with multiple sclerosis is associated with increased transition energies between brain states and decreased entropy of brain activity. Netw Neurosci 2023; 7:539-556. [PMID: 37397885 PMCID: PMC10312270 DOI: 10.1162/netn_a_00292] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/01/2022] [Accepted: 11/07/2022] [Indexed: 01/10/2024] Open
Abstract
Quantifying the relationship between the brain's functional activity patterns and its structural backbone is crucial when relating the severity of brain pathology to disability in multiple sclerosis (MS). Network control theory (NCT) characterizes the brain's energetic landscape using the structural connectome and patterns of brain activity over time. We applied NCT to investigate brain-state dynamics and energy landscapes in controls and people with MS (pwMS). We also computed entropy of brain activity and investigated its association with the dynamic landscape's transition energy and lesion volume. Brain states were identified by clustering regional brain activity vectors, and NCT was applied to compute the energy required to transition between these brain states. We found that entropy was negatively correlated with lesion volume and transition energy, and that larger transition energies were associated with pwMS with disability. This work supports the notion that shifts in the pattern of brain activity in pwMS without disability result in decreased transition energies compared to controls, but, as this shift evolves over the disease, transition energies increase beyond controls and disability occurs. Our results provide the first evidence in pwMS that larger lesion volumes result in greater transition energy between brain states and decreased entropy of brain activity.
Affiliation(s)
- Ceren Tozlu
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Sophie Card
- Horace Greeley High School, Chappaqua, NY, USA
- Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Susan A. Gauthier
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Judith Jaffe Multiple Sclerosis Center, Weill Cornell Medicine, New York, NY, USA
- Department of Neurology, Weill Cornell Medical College, New York, NY, USA
- Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
|
26
|
Lai DKH, Cheng ESW, Lim HJ, So BPH, Lam WK, Cheung DSK, Wong DWC, Cheung JCW. Computer-aided screening of aspiration risks in dysphagia with wearable technology: a Systematic Review and meta-analysis on test accuracy. Front Bioeng Biotechnol 2023; 11:1205009. [PMID: 37441197 PMCID: PMC10334490 DOI: 10.3389/fbioe.2023.1205009] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/13/2023] [Accepted: 06/20/2023] [Indexed: 07/15/2023] Open
Abstract
Aspiration caused by dysphagia is a prevalent problem that causes serious health consequences and even death. Traditional diagnostic instruments could induce pain, discomfort, nausea, and radiation exposure. The emergence of wearable technology with computer-aided screening might facilitate continuous or frequent assessments to prompt early and effective management. The objectives of this review are to summarize the systems used to identify aspiration risks in dysphagic individuals and to examine their accuracy. Two authors independently searched electronic databases, including CINAHL, Embase, IEEE Xplore® Digital Library, PubMed, Scopus, and Web of Science (PROSPERO reference number: CRD42023408960). The risk of bias and applicability were assessed using QUADAS-2. Nine articles applied accelerometers and/or acoustic devices to identify aspiration risks in patients with neurodegenerative problems (e.g., dementia, Alzheimer's disease), neurogenic problems (e.g., stroke, brain injury), in addition to some children with congenital abnormalities, using videofluoroscopic swallowing study (VFSS) or fiberoptic endoscopic evaluation of swallowing (FEES) as the reference standard. All studies employed a traditional machine learning approach with a feature extraction process. Support vector machine (SVM) was the most popular machine learning model used. A meta-analysis was conducted to evaluate the classification accuracy and identify risky swallows. Nevertheless, we decided not to draw conclusions from the meta-analysis findings (pooled diagnostic odds ratio: 21.5, 95% CI, 2.7-173.6) because studies had unique methodological characteristics and major differences in the set of parameters/thresholds, in addition to the substantial heterogeneity and variations, with sensitivity levels ranging from 21.7% to 90.0% between studies. Small sample sizes could be a critical problem in existing studies (median = 34.5, range 18-449), especially for machine learning models. Only two out of the nine studies had an optimized model with sensitivity over 90%. There is a need to enlarge the sample size for better generalizability and optimize signal processing, segmentation, feature extraction, classifiers, and their combinations to improve the assessment performance. Systematic Review Registration: (https://www.crd.york.ac.uk/prospero/), identifier (CRD42023408960).
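For readers unfamiliar with the diagnostic odds ratio (DOR) quoted above, the following sketch shows how it relates to a single study's 2x2 table. It uses the standard definition DOR = (TP x TN) / (FP x FN); the function name and the counts are hypothetical and are not taken from any reviewed study, and the pooling step of a meta-analysis is not shown.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Return (sensitivity, specificity, diagnostic odds ratio) from a 2x2 table.

    DOR = (TP * TN) / (FP * FN). It is undefined when FP or FN is zero;
    analyses often add a continuity correction in that case, which this
    sketch deliberately leaves out.
    """
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor


# Hypothetical counts: 20 risky swallows and 50 safe swallows.
sens, spec, dor = diagnostic_summary(tp=18, fp=5, fn=2, tn=45)
print(sens, spec, dor)  # 0.9 0.9 81.0
```

Because the DOR folds sensitivity and specificity into one number, two studies with very different operating thresholds can share a similar DOR, which is one reason heterogeneous studies are hard to pool.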
Affiliation(s)
- Derek Ka-Hei Lai
- Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Ethan Shiu-Wang Cheng
- Department of Electronic and Information Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Hyo-Jung Lim
- Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Bryan Pak-Hei So
- Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Wing-Kai Lam
- Sports Information and External Affairs Centre, Hong Kong Sports Institute Ltd, Hong Kong, China
- Daphne Sze Ki Cheung
- School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China
- Research Institute of Smart Ageing, The Hong Kong Polytechnic University, Hong Kong, China
- Duo Wai-Chi Wong
- Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- James Chung-Wai Cheung
- Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong, China
- Research Institute of Smart Ageing, The Hong Kong Polytechnic University, Hong Kong, China
|
27
|
Opwonya J, Ku B, Lee KH, Kim JI, Kim JU. Eye movement changes as an indicator of mild cognitive impairment. Front Neurosci 2023; 17:1171417. [PMID: 37397453 PMCID: PMC10307957 DOI: 10.3389/fnins.2023.1171417] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/22/2023] [Accepted: 05/23/2023] [Indexed: 07/04/2023] Open
Abstract
Background Early identification of patients at risk of dementia, alongside timely medical intervention, can prevent disease progression. Despite their potential clinical utility, the application of diagnostic tools, such as neuropsychological assessments and neuroimaging biomarkers, is hindered by their high cost and time-consuming administration, rendering them impractical for widespread implementation in the general population. We aimed to develop non-invasive and cost-effective classification models for predicting mild cognitive impairment (MCI) using eye movement (EM) data.
Methods We collected eye-tracking (ET) data from 594 subjects (428 cognitively normal controls and 166 patients with MCI) while they performed prosaccade/antisaccade and go/no-go tasks. Logistic regression (LR) was used to calculate the EM metrics' odds ratios (ORs). We then used machine learning models to construct classification models using EM metrics, demographic characteristics, and brief cognitive screening test scores. Model performance was evaluated based on the area under the receiver operating characteristic curve (AUROC).
Results LR models revealed that several EM metrics are significantly associated with increased odds of MCI, with odds ratios ranging from 1.213 to 1.621. The AUROC scores for models utilizing demographic information and either EM metrics or MMSE were 0.752 and 0.767, respectively. Combining all features, including demographic, MMSE, and EM, resulted in the best-performing model, which achieved an AUROC of 0.840.
Conclusion Changes in EM metrics linked with MCI are associated with attentional and executive function deficits. EM metrics combined with demographics and cognitive test scores enhance MCI prediction, making it a non-invasive, cost-effective method to identify early stages of cognitive decline.
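The AUROC used to evaluate these models has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal pure-Python sketch of that definition follows; the function name and the toy labels and scores are hypothetical, and the pairwise loop is meant for clarity rather than speed.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: iterable of 0/1 (1 = positive, e.g. MCI); scores: model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two controls (0) and two MCI cases (1).
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for the 0.752 to 0.840 range reported above.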
Affiliation(s)
- Julius Opwonya
- Digital Health Research Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- KM Convergence Science, University of Science and Technology, Daejeon, South Korea
- Boncho Ku
- Digital Health Research Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- Kun Ho Lee
- Gwangju Alzheimer’s Disease and Related Dementias (GARD) Cohort Research Center, Chosun University, Gwangju, South Korea
- Department of Biomedical Science, Chosun University, Gwangju, South Korea
- Dementia Research Group, Korea Brain Research Institute, Daegu, South Korea
- Joong Il Kim
- Digital Health Research Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- Jaeuk U. Kim
- Digital Health Research Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- KM Convergence Science, University of Science and Technology, Daejeon, South Korea
|
28
|
Hamamoto R, Takasawa K, Shinkai N, Machino H, Kouno N, Asada K, Komatsu M, Kaneko S. Analysis of super-enhancer using machine learning and its application to medical biology. Brief Bioinform 2023; 24:bbad107. [PMID: 36960780 PMCID: PMC10199775 DOI: 10.1093/bib/bbad107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 02/11/2023] [Accepted: 03/01/2023] [Indexed: 03/25/2023] Open
Abstract
The analysis of super-enhancers (SEs) has recently attracted attention in elucidating the molecular mechanisms of cancer and other diseases. SEs are genomic structures that strongly induce gene expression and have been reported to contribute to the overexpression of oncogenes. Because the analysis of SEs and integrated analysis with other data are performed using large amounts of genome-wide data, artificial intelligence technology, with machine learning at its core, has recently begun to be utilized. In promoting precision medicine, it is important to consider information from SEs in addition to genomic data; therefore, machine learning technology is expected to be introduced appropriately in terms of building a robust analysis platform with a high generalization performance. In this review, we explain the history and principles of SE, and the results of SE analysis using state-of-the-art machine learning and integrated analysis with other data are presented to provide a comprehensive understanding of the current status of SE analysis in the field of medical biology. Additionally, we compared the accuracy between existing machine learning methods on the benchmark dataset and attempted to explore the kind of data preprocessing and integration work needed to make the existing algorithms work on the benchmark dataset. Furthermore, we discuss the issues and future directions of current SE analysis.
Affiliation(s)
- Ryuji Hamamoto
- Division Chief in the Division of Medical AI Research and Development, National Cancer Center Research Institute; a Professor in the Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University and a Team Leader of the Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project
- Ken Takasawa
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project and an External Research Staff in the Medical AI Research and Development, National Cancer Center Research Institute
- Norio Shinkai
- Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University
- Hidenori Machino
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project and an External Research Staff in the Medical AI Research and Development, National Cancer Center Research Institute
- Nobuji Kouno
- Department of Surgery, Graduate School of Medicine, Kyoto University
- Ken Asada
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project and an External Research Staff of Medical AI Research and Development, National Cancer Center Research Institute
- Masaaki Komatsu
- Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project and an External Research Staff of Medical AI Research and Development, National Cancer Center Research Institute
- Syuzo Kaneko
- Division of Medical AI Research and Development, National Cancer Center Research Institute and a Visiting Scientist in the Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project
|
29
|
Bakheet S, Alsubai S, El-Nagar A, Alqahtani A. A Multi-Feature Fusion Framework for Automatic Skin Cancer Diagnostics. Diagnostics (Basel) 2023; 13:1474. [PMID: 37189574 DOI: 10.3390/diagnostics13081474] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/22/2023] [Revised: 04/07/2023] [Accepted: 04/17/2023] [Indexed: 05/17/2023] Open
Abstract
Malignant melanoma is the most invasive skin cancer and is currently regarded as one of the deadliest disorders; however, it can be cured more successfully if detected and treated early. Recently, CAD (computer-aided diagnosis) systems have emerged as a powerful alternative tool for the automatic detection and categorization of skin lesions, such as malignant melanoma or benign nevus, in given dermoscopy images. In this paper, we propose an integrated CAD framework for rapid and accurate melanoma detection in dermoscopy images. Initially, an input dermoscopy image is pre-processed using a median filter and bottom-hat filtering for noise reduction and artifact removal, thus enhancing the image quality. After this, each skin lesion is described by an effective skin lesion descriptor with high discrimination and descriptiveness capabilities, which is constructed by calculating the HOG (Histogram of Oriented Gradient) and LBP (Local Binary Patterns) and their extensions. After feature selection, the lesion descriptors are fed into three supervised machine learning classification models, namely SVM (Support Vector Machine), kNN (k-Nearest Neighbors), and GAB (Gentle AdaBoost), to diagnostically classify melanocytic skin lesions into one of two diagnostic categories, melanoma or nevus. Experimental results achieved using 10-fold cross-validation on the publicly available MED-NODE dermoscopy image dataset demonstrate that the proposed CAD framework performs competitively with or better than several state-of-the-art methods with stronger training settings in relation to various diagnostic metrics, such as accuracy (94%), specificity (92%), and sensitivity (100%).
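As a small illustration of one ingredient of the descriptor, the sketch below computes the basic 8-neighbour Local Binary Pattern (LBP) code for the centre pixel of a 3x3 grayscale patch. It shows only the textbook LBP operator, not the paper's extended descriptors or its HOG stage; the function name, the clockwise neighbour order starting at the top-left, and the toy patches are assumptions, and other implementations order the bits differently.

```python
def lbp_code(patch):
    """Basic LBP code (0..255) for the centre pixel of a 3x3 patch.

    Each neighbour contributes a 1 bit when its value is >= the centre value.
    Neighbours are visited clockwise from the top-left corner (an assumed
    convention); the first neighbour maps to the least significant bit.
    """
    center = patch[1][1]
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


# Hypothetical toy patches of grayscale intensities.
flat = [[7, 7, 7], [7, 7, 7], [7, 7, 7]]
corner = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
print(lbp_code(flat), lbp_code(corner))  # 255 1
```

A texture descriptor is then typically built as a histogram of these codes over a lesion region, which is what makes LBP robust to monotonic brightness changes.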
Affiliation(s)
- Samy Bakheet
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
- Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39106 Magdeburg, Germany
- Shtwai Alsubai
- College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al Kharj 11942, Saudi Arabia
- Aml El-Nagar
- Faculty of Computers and Artificial Intelligence, Sohag University, Sohag 82524, Egypt
- Abdullah Alqahtani
- College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al Kharj 11942, Saudi Arabia
|
30
|
Dhamala E, Yeo BTT, Holmes AJ. One Size Does Not Fit All: Methodological Considerations for Brain-Based Predictive Modeling in Psychiatry. Biol Psychiatry 2023; 93:717-728. [PMID: 36577634 DOI: 10.1016/j.biopsych.2022.09.024] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 05/13/2022] [Revised: 09/07/2022] [Accepted: 09/23/2022] [Indexed: 12/30/2022]
Abstract
Psychiatric illnesses are heterogeneous in nature. No illness manifests in the same way across individuals, and no two patients with a shared diagnosis exhibit identical symptom profiles. Over the last several decades, group-level analyses of in vivo neuroimaging data have led to fundamental advances in our understanding of the neurobiology of psychiatric illnesses. More recently, access to computational resources and large, publicly available datasets, alongside the rise of predictive modeling and precision medicine approaches, has facilitated the study of psychiatric illnesses at an individual level. Data-driven machine learning analyses can be applied to identify disease-relevant biological subtypes, predict individual symptom profiles, and recommend personalized therapeutic interventions. However, when developing these predictive models, methodological choices must be carefully considered to ensure accurate, robust, and interpretable results. Choices pertaining to algorithms, neuroimaging modalities and states, data transformation, phenotypes, parcellations, sample sizes, and the specific populations under study can all influence model performance. Here, we review applications of neuroimaging-based machine learning models to study psychiatric illnesses and discuss the effects of different methodological choices on model performance. An understanding of these effects is crucial for the proper implementation of predictive models in psychiatry and will facilitate more accurate diagnoses, prognoses, and therapeutics.
Affiliation(s)
- Elvisha Dhamala
- Department of Psychology, Yale University, New Haven, Connecticut; Kavli Institute for Neuroscience, Yale University, New Haven, Connecticut.
- B T Thomas Yeo
- Centre for Sleep & Cognition & Centre for Translational Magnetic Resonance Research, Yong Loo Lin School of Medicine, Singapore, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health & Institute for Digital Medicine, National University of Singapore, Singapore; Integrative Sciences and Engineering Programme, National University of Singapore, Singapore; Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts
- Avram J Holmes
- Department of Psychology, Yale University, New Haven, Connecticut; Kavli Institute for Neuroscience, Yale University, New Haven, Connecticut; Department of Psychiatry, Yale University, New Haven, Connecticut; Wu Tsai Institute, Yale University, New Haven, Connecticut.
|
31
|
Esteki B, Masoomi M, Moosazadeh M, Yoo C. Data-Driven Prediction of Janus/Core-Shell Morphology in Polymer Particles: A Machine-Learning Approach. LANGMUIR : THE ACS JOURNAL OF SURFACES AND COLLOIDS 2023; 39:4943-4958. [PMID: 36999232 DOI: 10.1021/acs.langmuir.2c03355] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
The majority of research on Janus particles prepared by solvent evaporation-induced phase separation technique uses models based on interfacial tension or free energy to predict Janus/core-shell morphology. Data-driven predictions, in contrast, utilize multiple samples to identify patterns and outliers. Using machine-learning algorithms and explainable artificial intelligence (XAI) analysis, we developed a model based on a 200-instance data set to predict particle morphology. As model features, simplified molecular input line entry system syntax identifies explanatory variables, including cohesive energy density, molar volume, the Flory-Huggins interaction parameter of polymers, and the solvent solubility parameter. Our most accurate ensemble classifiers predict morphology with an accuracy of 90%. In addition, we employ innovative XAI tools to interpret system behavior, suggesting phase-separated morphology to be most affected by solvent solubility, polymer cohesive energy difference, and blend composition. While polymers with cohesive energy densities above a certain threshold favor the core-shell structure, systems with weak intermolecular interactions favor the Janus structure. The correlation between molar volume and morphology suggests that increasing the size of polymer repeating units favors Janus particles. Additionally, the Janus structure is preferred when the Flory-Huggins interaction parameter exceeds 0.4. XAI analysis introduces feature values that generate the thermodynamically low driving force of phase separation, resulting in kinetically stable morphologies as opposed to thermodynamically stable ones. The Shapley plots of this study also reveal novel methods for creating Janus or core-shell particles based on solvent evaporation-induced phase separation by selecting feature values that strongly favor a given morphology.
Affiliation(s)
- Bahareh Esteki
- Department of Chemical Engineering, Polymer Group, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Mahmood Masoomi
- Department of Chemical Engineering, Polymer Group, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Mohammad Moosazadeh
- Integrated Engineering Major, Department of Environmental Science and Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
- ChangKyoo Yoo
- Integrated Engineering Major, Department of Environmental Science and Engineering, Kyung Hee University, Seocheon-dong 1, Giheung-gu, Yongin-Si, Gyeonggi-Do 446-701, South Korea
|
32
|
Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023; 155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
BACKGROUND: Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. METHODOLOGY: After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULT AND DISCUSSION: EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. CONCLUSION: We find that the adopted ML models were not adequately assessed. In addition, data imbalance is an important problem, and techniques to address this underlying issue are still needed. Future studies should address key limitations in studies, primarily identifying Lupus Nephritis, Suicide Attempts, perinatal self-harm and ICD-9 classification.
Affiliation(s)
- Elias Hossain
- School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh.
| | - Rajib Rana
- School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
| | - Niall Higgins
- School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia; School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia; Metro North Mental Health, Herston QLD 4029, Australia
| | - Jeffrey Soar
- School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
| | - Prabal Datta Barua
- School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
| | - Anthony R Pisani
- Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
| | - Kathryn Turner
- School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia
| |
|
33
|
Jayaramu V, Zulkafli Z, De Stercke S, Buytaert W, Rahmat F, Abdul Rahman RZ, Ishak AJ, Tahir W, Ab Rahman J, Mohd Fuzi NMH. Leptospirosis modelling using hydrometeorological indices and random forest machine learning. INTERNATIONAL JOURNAL OF BIOMETEOROLOGY 2023; 67:423-437. [PMID: 36719482 DOI: 10.1007/s00484-022-02422-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Revised: 12/21/2022] [Accepted: 12/26/2022] [Indexed: 06/18/2023]
Abstract
Leptospirosis is a zoonosis that has been linked to hydrometeorological variability. Hydrometeorological averages and extremes have been used before as drivers in the statistical prediction of disease. However, their importance and predictive capacity are still poorly understood. In this study, the use of a random forest classifier was explored to analyze the relative importance of hydrometeorological indices in developing the leptospirosis model and to evaluate the performance of models based on the type of indices used, using case data from three districts in Kelantan, Malaysia, that experience annual monsoonal rainfall and flooding. First, hydrometeorological data including rainfall, streamflow, water level, relative humidity, and temperature were transformed into 164 weekly average and extreme indices in accordance with the Expert Team on Climate Change Detection and Indices (ETCCDI). Then, weekly case occurrences were classified into binary classes "high" and "low" based on an average threshold. Seventeen models based on "average," "extreme," and "mixed" indices were trained by optimizing the feature subsets based on the model-computed mean decrease Gini (MDG) scores. The variable importance was assessed through cross-correlation analysis and the MDG score. The average and extreme models showed similar prediction accuracy ranges (61.5-76.1% and 72.3-77.0%) while the mixed models showed an improvement (71.7-82.6% prediction accuracy). An extreme model was the most sensitive while an average model was the most specific. The time lag associated with the driving indices agreed with the seasonality of the monsoon. The rainfall variable (extreme) was the most important in classifying the leptospirosis occurrence while streamflow was the least important despite showing higher correlations with leptospirosis.
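The binarisation step described in this abstract (weekly case counts split into "high" and "low" around an average threshold) can be illustrated with a minimal Python sketch. The function name `label_weeks`, the sample counts, and the choice of a strict "greater than the mean" rule are all assumptions made for illustration; the abstract does not state how weeks exactly at the threshold were handled.

```python
from statistics import mean

def label_weeks(weekly_cases):
    """Label a week 'high' if its case count exceeds the mean, else 'low'.

    Assumption: the threshold is the arithmetic mean of the series and the
    comparison is strict; the abstract only says "an average threshold".
    """
    threshold = mean(weekly_cases)
    return ["high" if count > threshold else "low" for count in weekly_cases]

# Hypothetical weekly case counts, for illustration only (not study data).
weekly_cases = [0, 2, 1, 7, 3, 0, 9, 1]
labels = label_weeks(weekly_cases)
print(labels)
```

Labels produced this way would then serve as the binary target for a classifier such as a random forest.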
Affiliation(s)
- Veianthan Jayaramu
- Department of Civil Engineering, Universiti Putra Malaysia, Serdang, Malaysia
| | - Zed Zulkafli
- Department of Civil Engineering, Universiti Putra Malaysia, Serdang, Malaysia.
| | - Simon De Stercke
- Department of Civil and Environmental Engineering, Imperial College London, London, UK
| | - Wouter Buytaert
- Department of Civil and Environmental Engineering, Imperial College London, London, UK
| | - Fariq Rahmat
- Department of Electrical and Electronic Engineering, Universiti Putra Malaysia, Serdang, Malaysia
| | | | - Asnor Juraiza Ishak
- Department of Electrical and Electronic Engineering, Universiti Putra Malaysia, Serdang, Malaysia
| | - Wardah Tahir
- Flood Control Research Group, Faculty of Civil Engineering, Universiti Teknologi Mara, Shah Alam, Malaysia
| | - Jamalludin Ab Rahman
- Department of Community Medicine, Kulliyyah of Medicine, International Islamic University Malaysia, Kuantan, Malaysia
| | | |
|
34
|
Makkar A, Santosh KC. SecureFed: federated learning empowered medical imaging technique to analyze lung abnormalities in chest X-rays. INT J MACH LEARN CYB 2023; 14:1-12. [PMID: 36817940 PMCID: PMC9928498 DOI: 10.1007/s13042-023-01789-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Accepted: 01/20/2023] [Indexed: 02/16/2023]
Abstract
Machine learning is an effective and accurate technique to diagnose COVID-19 infections using image data, and chest X-Ray (CXR) is no exception. Because of privacy concerns, machine learning scientists end up receiving less medical imaging data. Federated Learning (FL) is a privacy-preserving distributed machine learning paradigm that generates an unbiased global model that follows the local models (from clients) without exposing their personal data. In the case of heterogeneous data among clients, the vanilla (default) FL mechanism still relies on an insecure method for updating models. Therefore, we proposed SecureFed, a secure aggregation method that ensures fairness and robustness. In our experiments, we employed a COVID-19 CXR dataset (2100 positive cases) and compared SecureFed with existing FL frameworks such as FedAvg, FedMGDA+, and FedRAD. In our comparison, we primarily considered robustness (accuracy) and fairness (consistency). As SecureFed produced consistently better results, it is generic enough to be considered for multimodal data.
Affiliation(s)
- Aaisha Makkar
- College of Science and Engineering, University of Derby, Kedleston Rd, Derby, DE22 1GB UK
| | - KC Santosh
- Applied AI Research Lab, Department of Computer Science, University of South Dakota, 414 E Clark St, Vermillion, SD 57069 USA
| |
|
35
|
A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining. INFORMATION 2023. [DOI: 10.3390/info14010054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023] Open
Abstract
Educational data mining is capable of producing useful data-driven applications (e.g., early warning systems in schools or the prediction of students’ academic achievement) based on predictive models. However, the class imbalance problem in educational datasets could hamper the accuracy of predictive models as many of these models are designed on the assumption that the predicted class is balanced. Although previous studies proposed several methods to deal with the imbalanced class problem, most of them focused on the technical details of how to improve each technique, while only a few focused on the application aspect, especially for the application of data with different imbalance ratios. In this study, we compared several sampling techniques to handle the different ratios of the class imbalance problem (i.e., moderately or extremely imbalanced classifications) using the High School Longitudinal Study of 2009 dataset. For our comparison, we used random oversampling (ROS), random undersampling (RUS), and the combination of the synthetic minority oversampling technique for nominal and continuous (SMOTE-NC) and RUS as a hybrid resampling technique. We used the Random Forest as our classification algorithm to evaluate the results of each sampling technique. Our results show that random oversampling for moderately imbalanced data and hybrid resampling for extremely imbalanced data seem to work best. The implications for educational data mining applications and suggestions for future research are discussed.
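As a rough illustration of the two simplest techniques named in this abstract, the Python sketch below implements random oversampling (ROS) and random undersampling (RUS) with the standard library only. The function names, the toy data, and the choice to balance every class to the largest (ROS) or smallest (RUS) class size are assumptions for illustration; the study's actual implementation and its SMOTE-NC hybrid are not reproduced here.

```python
import random
from collections import Counter

def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(y, []).append(row)
    return groups

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for y, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class matches the smallest."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for y, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([y] * target)
    return out_rows, out_labels

# Toy data with a 4:1 imbalance, for illustration only.
rows = list(range(10))
labels = [0] * 8 + [1] * 2
_, ros_labels = random_oversample(rows, labels)
_, rus_labels = random_undersample(rows, labels)
print(Counter(ros_labels), Counter(rus_labels))
```

Oversampling keeps all original rows at the cost of duplicates, while undersampling discards majority-class rows, which is one reason the best choice can depend on the imbalance ratio.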
|
36
|
Automatic detection of Alzheimer’s disease progression: An efficient information fusion approach with heterogeneous ensemble classifiers. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
37
|
Haliduola HN, Bretz F, Mansmann U. Missing data imputation using utility-based regression and sampling approaches. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 226:107172. [PMID: 36260971 DOI: 10.1016/j.cmpb.2022.107172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Revised: 02/04/2022] [Accepted: 10/02/2022] [Indexed: 06/16/2023]
Abstract
Data are often missing not at random (MNAR) in scientific experiments. We treat the MNAR problem as an imbalanced learning task. Standard predictive error measures of regression (e.g., mean squared error) are not suitable for imbalanced learning problems, such as in clinical trials where extreme values tend to be MNAR. We investigate hybrid imbalanced learning approaches that combine utility-based regression (UBR) with synthetic minority oversampling technique for regression (SMOTER) in cross-sectional trial settings. UBR optimizes the product of the conditional probability density (estimated by quantile regression forests) and a utility function which takes the relevance of the target variable value and the prediction error into account. SMOTER oversamples the relevant rare cases. Simulations show that the proposed method provides plausible predictions and reduces the bias for realistic missing data scenarios when compared with standard approaches like random forests and multiple imputation (systematic bias is observed in those methods, i.e., a tendency to underestimate the mean and standard deviation given the presence of MNAR in the area of high values of the target variable). The proposed method is applied to a real dataset from an antidepressant clinical trial, and a similar pattern of systematic bias from commonly used methods, compared with the proposed method, is observed in the real data. Therefore, we encourage the integration of utility-based learning strategies for the handling of missing data in the analysis of clinical trials.
Affiliation(s)
- Halimu N Haliduola
- Institute for Medical Information Processing, Biometry and Epidemiology - IBE, LMU Munich, Munich, Germany
| | - Frank Bretz
- Novartis Pharma AG, Basel, Switzerland; Section for Medical Statistics, Medical University of Vienna, Vienna, Austria
| | - Ulrich Mansmann
- Institute for Medical Information Processing, Biometry and Epidemiology - IBE, LMU Munich, Munich, Germany.
| |
|
38
|
Islam MT, Mustafa HA. Multi-Layer Hybrid (MLH) balancing technique: A combined approach to remove data imbalance. DATA KNOWL ENG 2022. [DOI: 10.1016/j.datak.2022.102105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
39
|
Rapid label-free detection of cholangiocarcinoma from human serum using Raman spectroscopy. PLoS One 2022; 17:e0275362. [PMID: 36227878 PMCID: PMC9562168 DOI: 10.1371/journal.pone.0275362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Accepted: 09/15/2022] [Indexed: 11/25/2022] Open
Abstract
Cholangiocarcinoma (CCA) is highly prevalent in the northeastern region of Thailand. Current diagnostic methods for CCA are often expensive, time-consuming, and require medical professionals. Thus, there is a need for a simple and low-cost CCA screening method. This work developed a rapid label-free technique by Raman spectroscopy combined with the multivariate statistical methods of principal component analysis and linear discriminant analysis (PCA-LDA), aiming to analyze and classify between CCA (n = 30) and healthy (n = 30) serum specimens. The model's classification performance was validated using k-fold cross validation (k = 5). Serum levels of cholesterol (548, 700 cm⁻¹), tryptophan (878 cm⁻¹), and amide III (1248, 1265 cm⁻¹) were found to be statistically significantly higher in the CCA patients, whereas serum beta-carotene (1158, 1524 cm⁻¹) levels were significantly lower. The peak heights of these identified Raman marker bands were input into an LDA model, achieving a cross-validated diagnostic sensitivity and specificity of 71.33% and 90.00% in distinguishing the CCA from healthy specimens. The PCA-LDA technique provided a higher cross-validated sensitivity and specificity of 86.67% and 96.67%. To conclude, this work demonstrated the feasibility of using Raman spectroscopy combined with PCA-LDA as a helpful tool for cholangiocarcinoma serum-based screening.
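For readers less familiar with the two metrics reported here, the short Python sketch below shows how sensitivity and specificity follow from confusion-matrix counts. The counts used (26 of 30 CCA specimens and 29 of 30 healthy specimens classified correctly) are an assumption: they are simply the whole numbers that, with 30 specimens per class, correspond to 86.67% and 96.67%. The abstract does not report the confusion matrix itself.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased specimens that the model flags as diseased."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of healthy specimens that the model flags as healthy."""
    return true_neg / (true_neg + false_pos)

# Assumed counts, chosen only to be consistent with n = 30 per class.
sens = sensitivity(true_pos=26, false_neg=4)
spec = specificity(true_neg=29, false_pos=1)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```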
|
40
|
Wibowo P, Fatichah C. Pruning-based oversampling technique with smoothed bootstrap resampling for imbalanced clinical dataset of Covid-19. JOURNAL OF KING SAUD UNIVERSITY. COMPUTER AND INFORMATION SCIENCES 2022; 34:7830-7839. [PMID: 38620726 PMCID: PMC8482553 DOI: 10.1016/j.jksuci.2021.09.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/01/2021] [Revised: 07/28/2021] [Accepted: 09/25/2021] [Indexed: 11/25/2022]
Abstract
The Coronavirus Disease (COVID-19) was declared a pandemic by the World Health Organization (WHO), and it has not yet ended. As the infection rate of COVID-19 increases, a computational approach is needed to predict which patients are infected with COVID-19 in order to speed up diagnosis and minimize human error compared to conventional diagnoses. However, negative samples outnumber positive samples, and the resulting data imbalance affects classification performance and biases the model evaluation results. This study proposes a new oversampling technique, TRIM-SBR, to generate minority-class data for diagnosing patients infected with COVID-19. Developing an oversampling technique remains challenging because of the data's generalization issue. The proposed method is based on pruning by looking for specific minority areas while retaining data generalization, resulting in minority data seeds that serve as benchmarks in creating new synthesized data using bootstrap resampling techniques. Accuracy, Specificity, Sensitivity, F-measure, and AUC are used to evaluate classifier performance in data imbalance cases. The results show that the TRIM-SBR method provides the best performance compared to other oversampling techniques.
Affiliation(s)
- Prasetyo Wibowo
- Department of Informatics, Faculty of Intelligent Electrical and Informatics Technology Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| | - Chastine Fatichah
- Department of Informatics, Faculty of Intelligent Electrical and Informatics Technology Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| |
|
41
|
Kibria HB, Nahiduzzaman M, Goni MOF, Ahsan M, Haider J. An Ensemble Approach for the Prediction of Diabetes Mellitus Using a Soft Voting Classifier with an Explainable AI. SENSORS (BASEL, SWITZERLAND) 2022; 22:7268. [PMID: 36236367 PMCID: PMC9571784 DOI: 10.3390/s22197268] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2022] [Revised: 09/20/2022] [Accepted: 09/21/2022] [Indexed: 06/16/2023]
Abstract
Diabetes is a chronic disease that continues to be a primary and worldwide health concern since the health of the entire population has been affected by it. Over the years, many academics have attempted to develop a reliable diabetes prediction model using machine learning (ML) algorithms. However, these research investigations have had a minimal impact on clinical practice as the current studies focus mainly on improving the performance of complicated ML models while ignoring their explainability in clinical situations. Therefore, physicians find it difficult to understand these models and rarely trust them for clinical use. In this study, a carefully constructed, efficient, and interpretable diabetes detection method using an explainable AI has been proposed. The Pima Indian diabetes dataset was used, containing a total of 768 instances, of which 268 are diabetic and 500 are non-diabetic, with several diabetic attributes. Here, six machine learning algorithms (artificial neural network (ANN), random forest (RF), support vector machine (SVM), logistic regression (LR), AdaBoost, XGBoost) have been used along with an ensemble classifier to diagnose diabetes. For each machine learning model, global and local explanations have been produced using the Shapley additive explanations (SHAP), which are represented in different types of graphs to help physicians in understanding the model predictions. The balanced accuracy of the developed weighted ensemble model was 90% with an F1 score of 89% using five-fold cross-validation (CV). The median values were used for the imputation of the missing values and the synthetic minority oversampling technique combined with Tomek links (SMOTETomek) was used to balance the classes of the dataset. The proposed approach can improve the clinical understanding of a diabetes diagnosis and help in taking necessary action at the very early stages of the disease.
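The weighted soft-voting idea behind such an ensemble can be sketched in a few lines of Python: each base model outputs a probability for the positive class, and the ensemble takes a weighted average before thresholding. The function name `soft_vote`, the 0.5 decision threshold, the equal default weights, and the probabilities below are assumptions for illustration, not the authors' configuration.

```python
def soft_vote(prob_lists, weights=None, threshold=0.5):
    """Weighted average of per-model positive-class probabilities.

    prob_lists: one list per model, each holding one probability per sample.
    weights:    one non-negative weight per model (equal weights if None).
    Returns the averaged probabilities and the thresholded 0/1 predictions.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]
    predictions = [1 if p >= threshold else 0 for p in averaged]
    return averaged, predictions

# Hypothetical outputs of three base models on three patients.
model_probs = [
    [0.9, 0.4, 0.2],
    [0.6, 0.6, 0.1],
    [0.8, 0.3, 0.4],
]
averaged, predictions = soft_vote(model_probs)
print(averaged, predictions)
```

Passing `weights` that favour the stronger base models turns the plain average into a weighted ensemble.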
Affiliation(s)
- Hafsa Binte Kibria
- Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
| | - Md Nahiduzzaman
- Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
| | - Md. Omaer Faruq Goni
- Department of Electrical & Computer Engineering, Rajshahi University of Engineering & Technology, Rajshahi 6204, Bangladesh
| | - Mominul Ahsan
- Department of Computer Science, University of York, Deramore Lane, Heslington, York YO10 5GH, UK
| | - Julfikar Haider
- Department of Engineering, Manchester Metropolitan University, Manchester M1 5GD, UK
| |
|
42
|
Rautiainen H, Alam M, Blackwell PG, Skarin A. Identification of reindeer fine-scale foraging behaviour using tri-axial accelerometer data. MOVEMENT ECOLOGY 2022; 10:40. [PMID: 36127747 PMCID: PMC9490970 DOI: 10.1186/s40462-022-00339-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/04/2022] [Accepted: 09/10/2022] [Indexed: 06/15/2023]
Abstract
Animal behavioural responses to the environment ultimately affect their survival. Monitoring animal fine-scale behaviour may improve understanding of animal functional response to the environment and provide an important indicator of the welfare of both wild and domesticated species. In this study, we illustrate the application of collar-attached acceleration sensors for investigating reindeer fine-scale behaviour. Using data from 19 reindeer, we tested the supervised machine learning algorithms Random forests, Support vector machines, and hidden Markov models to classify reindeer behaviour into seven classes: grazing, browsing low from shrubs or browsing high from trees, inactivity, walking, trotting, and other behaviours. We implemented leave-one-subject-out cross-validation to assess generalizable results on new individuals. Our main results illustrated that hidden Markov models were able to classify collar-attached accelerometer data into all our pre-defined behaviours of reindeer with reasonable accuracy while Random forests and Support vector machines were biased towards dominant classes. Random forests using 5-s windows had the highest overall accuracy (85%), while hidden Markov models were able to best predict individual behaviours and handle rare behaviours such as trotting and browsing high. We conclude that hidden Markov models provide a useful tool to remotely monitor reindeer and potentially other large herbivore species behaviour. These methods will allow us to quantify fine-scale behavioural processes in relation to environmental events.
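Leave-one-subject-out cross-validation, as mentioned in this abstract, holds out every observation from one individual at a time so that the model is always tested on an animal it has never seen. The Python sketch below shows only the splitting logic; the function name and the toy subject IDs are assumptions for illustration.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids: one identifier per observation (e.g. the animal it came from).
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

# Hypothetical data: six accelerometer windows from three animals.
subject_ids = ["r1", "r1", "r2", "r3", "r3", "r3"]
splits = list(leave_one_subject_out(subject_ids))
for held_out, train, test in splits:
    print(held_out, train, test)
```

With 19 animals this scheme would produce 19 folds, each testing on a single animal's data.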
Affiliation(s)
- Heidi Rautiainen
- Department of Animal Nutrition and Management, Swedish University of Agricultural Sciences, Uppsala, Sweden.
| | - Moudud Alam
- School of Information and Engineering, Dalarna University, Falun, Sweden
| | - Paul G Blackwell
- School of Mathematics & Statistics, University of Sheffield, Sheffield, UK
| | - Anna Skarin
- Department of Animal Nutrition and Management, Swedish University of Agricultural Sciences, Uppsala, Sweden
| |
|
43
|
Predicting Breast Cancer from Risk Factors Using SVM and Extra-Trees-Based Feature Selection Method. COMPUTERS 2022. [DOI: 10.3390/computers11090136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Developing a prediction model from risk factors can provide an efficient method to recognize breast cancer. Machine learning (ML) algorithms have been applied to increase the efficiency of diagnosis at the early stage. This paper studies a support vector machine (SVM) combined with an extremely randomized trees classifier (extra-trees) to provide a diagnosis of breast cancer at the early stage based on risk factors. The extra-trees classifier was used to remove irrelevant features, while SVM was utilized to diagnose the breast cancer status. A breast cancer dataset consisting of 116 subjects was utilized by machine learning models to predict breast cancer, while stratified 10-fold cross-validation was employed for the model evaluation. Our proposed combined SVM and extra-trees model reached the highest accuracy, up to 80.23%, which was significantly better than the other ML models. The experimental results demonstrated that by applying extra-trees-based feature selection, the average ML prediction accuracy was improved by up to 7.29% compared with ML without the feature selection method. Our proposed model is expected to increase the efficiency of breast cancer diagnosis based on risk factors. In addition, we presented the proposed prediction model that could be employed for web-based breast cancer prediction. The proposed model is expected to improve diagnostic decision-support systems by predicting breast cancer disease accurately.
|
44
|
Ahmed J, Green II RC. Predicting severely imbalanced data disk drive failures with machine learning models. MACHINE LEARNING WITH APPLICATIONS 2022. [DOI: 10.1016/j.mlwa.2022.100361] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
|
45
|
Wade MW, Fisher M, Matich P. Comparison of two machine learning frameworks for predicting aggregatory behavior of sharks. J Appl Ecol 2022. [DOI: 10.1111/1365-2664.14273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Affiliation(s)
- Michael W. Wade
- Data Science Institute Vanderbilt University Nashville TN USA
| | - Mark Fisher
- Texas Parks and Wildlife Department, Coastal Fisheries Division, Rockport Marine Science Laboratory Rockport TX USA
| | | |
|
46
|
Nieto-del-Amor F, Prats-Boluda G, Garcia-Casado J, Diaz-Martinez A, Diago-Almela VJ, Monfort-Ortiz R, Hao D, Ye-Lin Y. Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data. SENSORS 2022; 22:s22145098. [PMID: 35890778 PMCID: PMC9319575 DOI: 10.3390/s22145098] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/13/2022] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 02/01/2023]
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models’ real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting the EHG could be used to predict preterm labor in clinics.
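The leakage problem this abstract describes comes from the order of operations: if synthetic or duplicated minority samples are created before the train/test split, copies of the same case can land on both sides. The Python sketch below illustrates the safer order (split first, then resample the training partition only). It uses simple duplication as a stand-in for SMOTE-style synthesis, and the function name, split fraction, and toy labels are assumptions for illustration rather than the authors' pipeline.

```python
import random
from collections import Counter

def split_then_oversample(labels, test_fraction=0.25, seed=0):
    """Split indices into train/test, then balance the training part only.

    Returns (resampled_train_indices, test_indices). Test indices are never
    duplicated into the training set, so no information leaks across the split.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Group training indices by class, then duplicate minority members
    # (a simple stand-in for synthetic oversampling such as SMOTE).
    groups = {}
    for i in train_idx:
        groups.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in groups.values())
    resampled = []
    for members in groups.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        resampled.extend(members + extra)
    return resampled, test_idx

# Toy labels with a 4:1 term/preterm imbalance, for illustration only.
labels = [0] * 16 + [1] * 4
train_idx, test_idx = split_then_oversample(labels)
print(Counter(labels[i] for i in train_idx), len(test_idx))
```

Reversing the order, that is, oversampling the whole dataset and splitting afterwards, is what tends to inflate reported performance.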
Affiliation(s)
- Félix Nieto-del-Amor
- Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain; (F.N.-d.-A.); (J.G.-C.); (A.D.-M.); (Y.Y.-L.)
| | - Gema Prats-Boluda
- Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain; (F.N.-d.-A.); (J.G.-C.); (A.D.-M.); (Y.Y.-L.)
| | - Javier Garcia-Casado
- Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain; (F.N.-d.-A.); (J.G.-C.); (A.D.-M.); (Y.Y.-L.)
| | - Alba Diaz-Martinez
- Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain; (F.N.-d.-A.); (J.G.-C.); (A.D.-M.); (Y.Y.-L.)
| | | | - Rogelio Monfort-Ortiz
- Servicio de Obstetricia, H.U.P. La Fe, 46026 Valencia, Spain; (V.J.D.-A.); (R.M.-O.)
| | - Dongmei Hao
- Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China;
| | - Yiyao Ye-Lin
- Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain; (F.N.-d.-A.); (J.G.-C.); (A.D.-M.); (Y.Y.-L.)
| |
|
47
|
Danilatou V, Nikolakakis S, Antonakaki D, Tzagkarakis C, Mavroidis D, Kostoulas T, Ioannidis S. Outcome Prediction in Critically-Ill Patients with Venous Thromboembolism and/or Cancer Using Machine Learning Algorithms: External Validation and Comparison with Scoring Systems. Int J Mol Sci 2022; 23:ijms23137132. [PMID: 35806137 PMCID: PMC9266386 DOI: 10.3390/ijms23137132] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Revised: 06/17/2022] [Accepted: 06/19/2022] [Indexed: 12/16/2022] Open
Abstract
Intensive care unit (ICU) patients with venous thromboembolism (VTE) and/or cancer suffer from high mortality rates. Mortality prediction in the ICU has been a major medical challenge for which several scoring systems exist, but they lack specificity. This study focuses on two target groups, namely patients with thrombosis or cancer. The main goal is to develop and validate interpretable machine learning (ML) models to predict early and late mortality, while exploiting all available data stored in the medical record. To this end, retrospective data from two freely accessible databases, MIMIC-III and eICU, were used. Well-established ML algorithms were implemented utilizing automated and purposely built ML frameworks for addressing class imbalance. Prediction of early mortality showed excellent performance in both disease categories, in terms of the area under the receiver operating characteristic curve (AUC–ROC): VTE-MIMIC-III 0.93, eICU 0.87, cancer-MIMIC-III 0.94. On the other hand, late mortality prediction showed lower performance, i.e., AUC–ROC: VTE 0.82, cancer 0.74–0.88. The predictive model of early mortality developed from 1651 VTE patients (MIMIC-III) ended up with a signature of 35 features and was externally validated in 2659 patients from the eICU dataset. Our model outperformed traditional scoring systems in predicting early as well as late mortality. Novel biomarkers, such as red cell distribution width, were identified.
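The AUC-ROC values quoted in this abstract can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The Python sketch below computes the metric directly from that pairwise definition; the function name and the toy scores are assumptions for illustration, and the quadratic loop is meant for clarity rather than efficiency.

```python
def auc_roc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical risk scores and outcomes (1 = died, 0 = survived).
scores = [0.9, 0.4, 0.7, 0.3, 0.6]
labels = [1, 1, 0, 0, 1]
print(round(auc_roc(scores, labels), 3))
```

Because the metric depends only on how cases are ranked, it is less distorted by class imbalance than plain accuracy, although it can still look optimistic when positives are very rare.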
Affiliation(s)
- Vasiliki Danilatou
- Sphynx Technology Solutions, 6300 Zug, Switzerland
- School of Medicine, European University of Cyprus, 2404 Nicosia, Cyprus
| | - Stylianos Nikolakakis
- School of Electrical and Computer Engineering, Technical University of Crete, 73100 Chania, Greece; (S.N.); (S.I.)
| | - Despoina Antonakaki
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
| | - Christos Tzagkarakis
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
| | - Dimitrios Mavroidis
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
| | - Theodoros Kostoulas
- Department of Information and Communication Systems Engineering, School of Engineering, University of the Aegean, 83200 Samos, Greece
| | - Sotirios Ioannidis
- School of Electrical and Computer Engineering, Technical University of Crete, 73100 Chania, Greece
- Institute of Computer Science (ICS)-Foundation for Research and Technology-Hellas (FORTH), 70013 Heraklion, Greece
| |
|
48
|
Teoh L, Ihalage AA, Harp S, Al-Khateeb ZF, Michael-Titus AT, Tremoleda JL, Hao Y. Deep learning for behaviour classification in a preclinical brain injury model. PLoS One 2022; 17:e0268962. [PMID: 35704595 PMCID: PMC9200342 DOI: 10.1371/journal.pone.0268962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 05/11/2022] [Indexed: 11/18/2022]
Abstract
The early detection of traumatic brain injuries can directly impact the prognosis and survival of patients. Preceding attempts to automate the detection and the assessment of the severity of traumatic brain injury continue to be based on clinical diagnostic methods, with limited tools for disease outcomes in large populations. Despite advances in machine and deep learning tools, current approaches still use simple trends of statistical analysis which lack generality. The effectiveness of deep learning to extract information from large subsets of data can be further emphasised through the use of more elaborate architectures. We therefore explore the use of a multiple input, convolutional neural network and long short-term memory (LSTM) integrated architecture in the context of traumatic injury detection through predicting the presence of brain injury in a murine preclinical model dataset. We investigated the effectiveness and validity of traumatic brain injury detection in the proposed model against various other machine learning algorithms such as the support vector machine, the random forest classifier and the feedforward neural network. Our dataset was acquired using a home cage automated (HCA) system to assess the individual behaviour of mice with traumatic brain injury or non-central nervous system (non-CNS) injured controls, whilst housed in their cages. Their distance travelled, body temperature, separation from other mice and movement were recorded every 15 minutes, for 72 hours weekly, for 5 weeks following intervention. The HCA behavioural data was used to train a deep learning model, which then predicts if the animals were subjected to a brain injury or just a sham intervention without brain damage. We also explored and evaluated different ways to handle the class imbalance present in the uninjured class of our training data. We then evaluated our models with leave-one-out cross-validation. Our proposed deep learning model achieved the best performance and showed promise in its capability to detect the presence of brain trauma in mice.
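Leave-one-out cross-validation, as mentioned in the abstract above, holds out each sample in turn, trains on the rest, and scores the prediction for the held-out sample. The sketch below illustrates the procedure with a simple nearest-centroid classifier standing in for the paper's deep learning model; the classifier, the function names, and the two-feature toy data are all assumptions for illustration.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, fit on the others, and average the hits."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical, well-separated toy features for six animals.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
ys = ["sham", "sham", "sham", "injured", "injured", "injured"]
loo_acc = leave_one_out_accuracy(xs, ys)
```

Because every sample serves as the test set exactly once, this scheme suits small preclinical datasets, at the cost of one model fit per sample.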
Affiliation(s)
- Lucas Teoh
- School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End, London, United Kingdom
| | - Achintha Avin Ihalage
- School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End, London, United Kingdom
| | - Srooley Harp
- Centre for Neuroscience, Surgery and Trauma, The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
| | - Zahra F. Al-Khateeb
- Centre for Neuroscience, Surgery and Trauma, The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
| | - Adina T. Michael-Titus
- Centre for Neuroscience, Surgery and Trauma, The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
| | - Jordi L. Tremoleda
- Centre for Neuroscience, Surgery and Trauma, The Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- * E-mail: (YH); (JLT)
| | - Yang Hao
- School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End, London, United Kingdom
- * E-mail: (YH); (JLT)
| |
|
49
|
Haliduola HN, Bretz F, Mansmann U. Missing data imputation in clinical trials using recurrent neural network facilitated by clustering and oversampling. Biom J 2022; 64:863-882. [PMID: 35266565 DOI: 10.1002/bimj.202000393] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/29/2020] [Revised: 11/09/2021] [Accepted: 02/06/2022] [Indexed: 11/07/2022]
Abstract
In clinical practice, the composition of missing data may be complex, for example, a mixture of missing at random (MAR) and missing not at random (MNAR) assumptions. Many methods under the assumption of MAR are available. Under the assumption of MNAR, likelihood-based methods require specification of the joint distribution of the data, and the missingness mechanism has been introduced as sensitivity analysis. These classic models heavily rely on the underlying assumption, and, in many realistic scenarios, they can produce unreliable estimates. In this paper, we develop a machine learning based missing data prediction framework with the aim of handling more realistic missing data scenarios. We use an imbalanced learning technique (i.e., oversampling of minority class) to handle the MNAR data. To implement oversampling for longitudinal continuous variables, we first perform clustering via k-means trajectories. We then use a recurrent neural network (RNN) to model the longitudinal data. Further, we apply bootstrap aggregating to improve the accuracy of prediction and also to consider the uncertainty of a single prediction. We evaluate the proposed method using simulated data. The prediction result is evaluated at the individual patient level and the overall population level. We demonstrate the powerful predictive capability of RNN for longitudinal data and its flexibility for nonlinear modeling. Overall, the proposed method provides an accurate individual prediction for both MAR and MNAR data and reduces the bias of missing data in treatment effect estimation when compared to standard methods and classic models. Finally, we implement the proposed method in a real dataset from an antidepressant clinical trial. In summary, this paper offers an opportunity to encourage the integration of machine learning strategies for handling of missing data in the analysis of randomized clinical trials.
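Oversampling of the minority class, named in the abstract above, can be illustrated in its simplest form: minority samples are drawn again with replacement until every class matches the largest one. This is a generic sketch of plain random oversampling, not the cluster-based procedure the paper describes; the function name, the fixed seed, and the toy data are assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for label, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == label]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(label)
    return out_x, out_y


# Hypothetical toy data: four majority samples and one minority sample.
samples = [[1.0], [1.1], [0.9], [1.2], [5.0]]
labels = ["majority", "majority", "majority", "majority", "minority"]
bal_x, bal_y = random_oversample(samples, labels)
balanced_counts = Counter(bal_y)
```

Plain duplication adds no new information and can encourage overfitting to the repeated samples, which is one reason more structured schemes are used for longitudinal data.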
Affiliation(s)
- Halimu N Haliduola
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), LMU Munich, Munich, Germany; Alvotech Germany GmbH, Jülich, Germany
| | - Frank Bretz
- Novartis Pharma AG, Basel, Switzerland; Section for Medical Statistics, Medical University of Vienna, Vienna, Austria
| | - Ulrich Mansmann
- Institute for Medical Information Processing, Biometry and Epidemiology (IBE), LMU Munich, Munich, Germany
| |
|
50
|
Dai Q, Liu JW, Liu Y. Multi-granularity relabeled under-sampling algorithm for imbalanced data. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109083] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/02/2022]
|