1
|
Demircioğlu A. Applying oversampling before cross-validation will lead to high bias in radiomics. Sci Rep 2024; 14:11563. [PMID: 38773233 PMCID: PMC11109211 DOI: 10.1038/s41598-024-62585-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/20/2024] [Indexed: 05/23/2024]
Abstract
Class imbalance is often unavoidable for radiomic data collected from clinical routine. It can create problems during classifier training since the majority class could dominate the minority class. Consequently, resampling methods like oversampling or undersampling are applied to the data to class-balance the data. However, the resampling must not be applied upfront to all data because it would lead to data leakage and, therefore, to erroneous results. This study aims to measure the extent of this bias. Five-fold cross-validation with 30 repeats was performed using a set of 15 radiomic datasets to train predictive models. The training involved two scenarios: first, the models were trained correctly by applying the resampling methods during the cross-validation. Second, the models were trained incorrectly by performing the resampling on all the data before cross-validation. The bias was defined empirically as the difference between the best-performing models in both scenarios in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, balanced accuracy, and the Brier score. In addition, a simulation study was performed on a randomly generated dataset for verification. The results demonstrated that incorrectly applying the oversampling methods to all data resulted in a large positive bias (up to 0.34 in AUC, 0.33 in sensitivity, 0.31 in specificity, and 0.37 in balanced accuracy). The bias depended on the data balance, and approximately an increase of 0.10 in the AUC was observed for each increase in imbalance. The models also showed a bias in calibration measured using the Brier score, which differed by up to -0.18 between the correctly and incorrectly trained models. The undersampling methods were not affected significantly by bias. These results emphasize that any resampling method should be applied correctly only to the training data to avoid data leakage and, subsequently, biased model performance and calibration.
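The two scenarios compared in this abstract can be illustrated with a minimal sketch. It is not the study's code: it uses plain random oversampling on a toy index set, and the helper names `random_oversample` and `k_folds` as well as the 10/90 class split are assumptions made for illustration. The sketch only counts how many sample indices end up in both the training and the test part of a fold.

```python
import random

def random_oversample(indices, labels, rng):
    """Duplicate minority-class indices at random until all classes have equal size."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    out = []
    for members in by_class.values():
        out.extend(members)
        out.extend(rng.choice(members) for _ in range(target - len(members)))
    return out

def k_folds(items, k, rng):
    """Shuffle items and yield (train, test) lists for k-fold cross-validation."""
    items = list(items)
    rng.shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]

rng = random.Random(0)
labels = [1] * 10 + [0] * 90      # hypothetical toy data: 10% minority class
samples = list(range(len(labels)))

# Incorrect order: oversample all data first, then split into folds.
leaky = 0
for train, test in k_folds(random_oversample(samples, labels, rng), 5, rng):
    leaky += len(set(train) & set(test))   # samples present on both sides

# Correct order: split first, then oversample only the training part.
clean = 0
for train, test in k_folds(samples, 5, rng):
    train = random_oversample(train, labels, rng)
    clean += len(set(train) & set(test))

print("shared samples, oversampling before CV:", leaky)
print("shared samples, oversampling inside CV:", clean)
```

In the incorrect order, copies of the same minority sample land on both sides of the split, so `leaky` is positive. In the correct order, `clean` stays at zero because only training indices are duplicated.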
Affiliation(s)
- Aydin Demircioğlu
  - Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.
|
2
|
Kapoor S, Cantrell EM, Peng K, Pham TH, Bail CA, Gundersen OE, Hofman JM, Hullman J, Lones MA, Malik MM, Nanayakkara P, Poldrack RA, Raji ID, Roberts M, Salganik MJ, Serra-Garcia M, Stewart BM, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. SCIENCE ADVANCES 2024; 10:eadk3452. [PMID: 38691601 PMCID: PMC11092361 DOI: 10.1126/sciadv.adk3452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/29/2024] [Indexed: 05/03/2024]
Abstract
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Affiliation(s)
- Sayash Kapoor
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Emily M. Cantrell
  - Department of Sociology, Princeton University, Princeton, NJ 08544, USA
  - School of Public and International Affairs, Princeton University, Princeton, NJ 08544, USA
- Kenny Peng
  - Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
- Thanh Hien Pham
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Christopher A. Bail
  - Department of Sociology, Duke University, Durham, NC 27708, USA
  - Department of Political Science, Duke University, Durham, NC 27708, USA
  - Sanford School of Public Policy, Duke University, Durham, NC 27708, USA
- Odd Erik Gundersen
  - Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  - Aneo AS, Trondheim, Norway
- Jessica Hullman
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Michael A. Lones
  - School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
- Momin M. Malik
  - Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
  - School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Institute in Critical Quantitative, Computational, & Mixed Methodologies, Johns Hopkins University, Baltimore, MD 21218, USA
- Priyanka Nanayakkara
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
  - Department of Communication Studies, Northwestern University, Evanston, IL 60208, USA
- Inioluwa Deborah Raji
  - Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael Roberts
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Matthew J. Salganik
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
  - Department of Sociology, Princeton University, Princeton, NJ 08544, USA
  - Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Marta Serra-Garcia
  - Rady School of Management, University of California, San Diego, La Jolla, CA 92093, USA
- Brandon M. Stewart
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
  - Department of Sociology, Princeton University, Princeton, NJ 08544, USA
  - Office of Population Research, Princeton University, Princeton, NJ 08544, USA
  - Department of Politics, Princeton University, Princeton, NJ 08544, USA
- Gilles Vandewiele
  - Department of Information Technology, Ghent University, Ghent, Belgium
- Arvind Narayanan
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
|
3
|
Azim Mim M, Majadi N, Mazumder P. A soft voting ensemble learning approach for credit card fraud detection. Heliyon 2024; 10:e25466. [PMID: 38333818 PMCID: PMC10850588 DOI: 10.1016/j.heliyon.2024.e25466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 12/27/2023] [Accepted: 01/27/2024] [Indexed: 02/10/2024]
Abstract
With the advancement of e-commerce and modern technological development, credit cards are widely used for both online and offline purchases, which has increased the number of daily fraudulent transactions. Many organizations and financial institutions worldwide lose billions of dollars annually because of credit card fraud. Due to the global distribution of both legitimate and fraudulent transactions, it is difficult to discern between the two. Furthermore, because only a small proportion of transactions are fraudulent, there is a problem of class imbalance. Hence, an effective fraud-detection methodology is required to sustain the reliability of the payment system. Machine learning has recently emerged as a viable substitute for identifying this type of fraud. However, ML approaches have difficulty identifying fraud with high prediction accuracy, while also decreasing misclassification costs due to the size of the imbalanced data. In this research, a soft voting ensemble learning approach for detecting credit card fraud on imbalanced data is proposed. To do this, the proposed approach is evaluated and compared with numerous sophisticated sampling techniques (i.e., oversampling, undersampling, and hybrid sampling) to overcome the class imbalance problem. We develop several credit card fraud classifiers, including ensemble classifiers, with and without sampling techniques. According to the experimental results, the proposed soft-voting approach outperforms individual classifiers. With a false negative rate (FNR) of 0.0306, it achieves a precision of 0.9870, recall of 0.9694, f1-score of 0.8764, and AUROC of 0.9936.
Affiliation(s)
- Mimusa Azim Mim
  - Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali-3814, Bangladesh
- Nazia Majadi
  - Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali-3814, Bangladesh
- Peal Mazumder
  - Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali-3814, Bangladesh
|
4
|
Habets PC, Thomas RM, Milaneschi Y, Jansen R, Pool R, Peyrot WJ, Penninx BWJH, Meijer OC, van Wingen GA, Vinkers CH. Multimodal Data Integration Advances Longitudinal Prediction of the Naturalistic Course of Depression and Reveals a Multimodal Signature of Remission During 2-Year Follow-up. Biol Psychiatry 2023; 94:948-958. [PMID: 37330166 DOI: 10.1016/j.biopsych.2023.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Revised: 05/11/2023] [Accepted: 05/30/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND: The ability to predict the disease course of individuals with major depressive disorder (MDD) is essential for optimal treatment planning. Here, we used a data-driven machine learning approach to assess the predictive value of different sets of biological data (whole-blood proteomics, lipid metabolomics, transcriptomics, genetics), both separately and added to clinical baseline variables, for the longitudinal prediction of 2-year remission status in MDD at the individual-subject level. METHODS: Prediction models were trained and cross-validated in a sample of 643 patients with current MDD (2-year remission n = 325) and subsequently tested for performance in 161 individuals with MDD (2-year remission n = 82). RESULTS: Proteomics data showed the best unimodal data predictions (area under the receiver operating characteristic curve = 0.68). Adding proteomic to clinical data at baseline significantly improved 2-year MDD remission predictions (area under the receiver operating characteristic curve = 0.63 vs. 0.78, p = .013), while the addition of other omics data to clinical data did not yield significantly improved model performance. Feature importance and enrichment analysis revealed that proteomic analytes were involved in inflammatory response and lipid metabolism, with fibrinogen levels showing the highest variable importance, followed by symptom severity. Machine learning models outperformed psychiatrists' ability to predict 2-year remission status (balanced accuracy = 71% vs. 55%). CONCLUSIONS: This study showed the added predictive value of combining proteomic data, but not other omics data, with clinical data for the prediction of 2-year remission status in MDD. Our results reveal a novel multimodal signature of 2-year MDD remission status that shows clinical potential for individual MDD disease course predictions from baseline measurements.
Affiliation(s)
- Philippe C Habets
  - Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Internal Medicine, section Endocrinology, Leiden University Medical Center, Leiden, the Netherlands.
- Rajat M Thomas
  - Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
- Yuri Milaneschi
  - Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
- Rick Jansen
  - Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
- Rene Pool
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
- Wouter J Peyrot
  - Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Complex Traits Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit, Amsterdam, the Netherlands
- Brenda W J H Penninx
  - Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
- Onno C Meijer
  - Department of Internal Medicine, section Endocrinology, Leiden University Medical Center, Leiden, the Netherlands
- Guido A van Wingen
  - Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
- Christiaan H Vinkers
  - Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
|
5
|
Lapp L, Roper M, Kavanagh K, Schraag S. Development and validation of a digital biomarker predicting acute kidney injury following cardiac surgery on an hourly basis. JTCVS OPEN 2023; 16:540-581. [PMID: 38204694 PMCID: PMC10775068 DOI: 10.1016/j.xjon.2023.09.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 09/01/2023] [Accepted: 09/06/2023] [Indexed: 01/12/2024]
Abstract
Objectives: To develop and validate a digital biomarker for predicting the onset of acute kidney injury (AKI) on an hourly basis up to 24 hours in advance in the intensive care unit after cardiac surgery. Methods: The study analyzed data from 6056 adult patients undergoing coronary artery bypass graft and/or valve surgery between April 1, 2012, and December 31, 2018 (development phase, training, and testing) and 3572 patients between January 1, 2019, and June 30, 2022 (validation phase). The study used 2 dynamic predictive modeling approaches, namely logistic regression and bootstrap aggregated regression trees machine (BARTm), to predict AKI. The mean area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive and negative predictive values across all lead times before the occurrence of AKI were reported. The clinical practicality was assessed using calibration. Results: Of all included patients, 8.45% and 16.66% had AKI in the development and validation phases, respectively. When applied to testing data, AKI was predicted with the mean AUC of 0.850 and 0.802 by BARTm and logistic regression, respectively. When applied to validation data, BARTm and LR resulted in a mean AUC of 0.844 and 0.786, respectively. Conclusions: This study demonstrated the successful prediction of AKI on an hourly basis up to 24 hours in advance. The digital biomarkers developed and validated in this study have the potential to assist clinicians in optimizing treatment and implementing preventive strategies for patients at risk of developing AKI after cardiac surgery in the intensive care unit.
Affiliation(s)
- Linda Lapp
  - Department of Computer and Information Sciences, Faculty of Science, University of Strathclyde, Glasgow, Scotland
- Marc Roper
  - Department of Computer and Information Sciences, Faculty of Science, University of Strathclyde, Glasgow, Scotland
- Kimberley Kavanagh
  - Department of Mathematics and Statistics, Faculty of Science, University of Strathclyde, Glasgow, Scotland
- Stefan Schraag
  - Department of Anaesthesia and Perioperative Medicine, Golden Jubilee National Hospital, Clydebank, United Kingdom
|
6
|
Shen J, Liu Y, Zhang M, Pumir A, Mu L, Li B, Xu J. Multi-channel electrohysterography enabled uterine contraction characterization and its effect in delivery assessment. Comput Biol Med 2023; 167:107697. [PMID: 37976821 DOI: 10.1016/j.compbiomed.2023.107697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 11/03/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
Uterine contractions are routinely monitored by tocodynamometer (TOCO) at a late stage of pregnancy to predict the onset of labor. However, TOCO reveals no information on the synchrony and coherence of contractions, which are important contributors to a successful delivery. Electrohysterography (EHG) is a recording of the electrical activities that trigger the local muscles to contract. The spatial-temporal information embedded in multiple-channel EHG signals makes them ideal for characterizing the synchrony and coherence of uterine contraction. To proceed, contractile time-windows are identified from TOCO signals and are then used to segment out the simultaneously recorded EHG signals of different channels. We construct sample entropy SamEn and a Concordance Correlation based feature ψ from these EHG segments to quantify the synchrony and coherence of contraction. To test the effectiveness of the proposed method, 122 EHG recordings in the Icelandic EHG database were divided into two groups according to the time difference between the gestational ages at recording and at delivery (TTD). Both SamEn and ψ show a clear difference between the two groups (p < 10⁻⁵) even when measurements were made 120 h before delivery. Receiver operating characteristic curve analysis of these two features gave AUC values of 0.834 and 0.726 for discriminating imminent labor defined with TTD ≤ 24 h. The SamEn was significantly smaller in women in the imminent labor group (0.1433) than in women in the pregnancy group (0.3774). Using an optimal cutoff value of SamEn to identify imminent labor gives sensitivity, specificity, and accuracy as high as 0.909, 0.712 and 0.743, respectively. These results demonstrate superiority compared with existing SOTA methods. This study is the first research work focusing on characterizing the synchrony property of contractions from the electrohysterography signals. Despite the very limited dataset used in the validation process, the promising results open a new direction to the use of electrohysterography in obstetrics.
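How a cutoff on a single feature turns into sensitivity, specificity, and accuracy can be sketched as follows. The abstract does not say how its "optimal" cutoff was chosen, so this sketch assumes Youden's J statistic as the criterion. The feature values, labels, and function names are hypothetical, and a case is called positive (imminent labor) when the feature is at or below the cutoff, matching the direction reported for SamEn.

```python
def confusion_rates(values, labels, cutoff):
    """Predict positive when value <= cutoff; return (sensitivity, specificity, accuracy)."""
    pairs = list(zip(values, labels))
    tp = sum(1 for v, y in pairs if y == 1 and v <= cutoff)
    fn = sum(1 for v, y in pairs if y == 1 and v > cutoff)
    tn = sum(1 for v, y in pairs if y == 0 and v > cutoff)
    fp = sum(1 for v, y in pairs if y == 0 and v <= cutoff)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)

def best_cutoff(values, labels):
    """Pick the observed value that maximizes Youden's J = sensitivity + specificity - 1."""
    def youden(c):
        sens, spec, _ = confusion_rates(values, labels, c)
        return sens + spec - 1
    return max(sorted(set(values)), key=youden)

# Hypothetical feature values; label 1 = imminent labor, 0 = pregnancy group.
values = [0.10, 0.15, 0.20, 0.45, 0.30, 0.35, 0.40, 0.50, 0.12, 0.60]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

cutoff = best_cutoff(values, labels)
sens, spec, acc = confusion_rates(values, labels, cutoff)
print(cutoff, sens, spec, acc)
```

A cutoff selected and evaluated on the same recordings tends to look better than it would on new data, so a held-out set is the safer place to report these three numbers.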
Affiliation(s)
- Junhua Shen
  - Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China; Zhejiang Provincial Clinical Research Center for Obstetrics and Gynecology, Hangzhou, China
- Yan Liu
  - College of Computer Science, Zhejiang University of Technology, Hangzhou, China
- Meiyu Zhang
  - College of Computer Science, Zhejiang University of Technology, Hangzhou, China
- Alain Pumir
  - Laboratoire de Physique, Ecole Normal Superieure de Lyon, Lyon, France
- Liangshan Mu
  - Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Baohua Li
  - Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
- Jinshan Xu
  - College of Computer Science, Zhejiang University of Technology, Hangzhou, China.
|
7
|
Jager F. An open dataset with electrohysterogram records of pregnancies ending in induced and cesarean section delivery. Sci Data 2023; 10:669. [PMID: 37783671 PMCID: PMC10545725 DOI: 10.1038/s41597-023-02581-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023]
Abstract
The existing non-invasive automated preterm birth prediction methods rely on the use of uterine electrohysterogram (EHG) records coming from spontaneous preterm and term deliveries, and are indifferent to term induced and cesarean section deliveries. In order to enhance current publicly available pool of term EHG records, we developed a new EHG dataset, Induced Cesarean EHG DataSet (ICEHG DS), containing 126 30-minute EHG records, recorded early (23rd week), and/or later (31st week) during pregnancy, of those pregnancies that were expected to end in spontaneous term delivery, but ended in induced or cesarean section delivery. The records were collected at the University Medical Center Ljubljana, Ljubljana, Slovenia. The dataset includes 38 and 43, early and later, induced; 11 and 8, early and later, cesarean; and 13 and 13, early and later, induced and cesarean EHG records. This dataset enables better understanding of the underlying physiological mechanisms involved during pregnancies ending in induced and cesarean deliveries, and provides a robust and more realistic assessment of the performance of automated preterm birth prediction methods.
Affiliation(s)
- Franc Jager
  - Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000, Ljubljana, Slovenia.
|
8
|
Kapoor S, Narayanan A. Leakage and the reproducibility crisis in machine-learning-based science. PATTERNS (NEW YORK, N.Y.) 2023; 4:100804. [PMID: 37720327 PMCID: PMC10499856 DOI: 10.1016/j.patter.2023.100804] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 03/03/2023] [Revised: 05/18/2023] [Accepted: 07/05/2023] [Indexed: 09/19/2023]
Abstract
Machine-learning (ML) methods have gained prominence in the quantitative sciences. However, there are many known methodological pitfalls, including data leakage, in ML-based science. We systematically investigate reproducibility issues in ML-based science. Through a survey of literature in fields that have adopted ML methods, we find 17 fields where leakage has been found, collectively affecting 294 papers and, in some cases, leading to wildly overoptimistic conclusions. Based on our survey, we introduce a detailed taxonomy of eight types of leakage, ranging from textbook errors to open research problems. We propose that researchers test for each type of leakage by filling out model info sheets, which we introduce. Finally, we conduct a reproducibility study of civil war prediction, where complex ML models are believed to vastly outperform traditional statistical models such as logistic regression (LR). When the errors are corrected, complex ML models do not perform substantively better than decades-old LR models.
Affiliation(s)
- Sayash Kapoor
  - Department of Computer Science and Center for Information Technology Policy, Princeton University, Princeton, NJ 08540, USA
- Arvind Narayanan
  - Department of Computer Science and Center for Information Technology Policy, Princeton University, Princeton, NJ 08540, USA
|
9
|
Di Napoli A, Tagliente E, Pasquini L, Cipriano E, Pietrantonio F, Ortis P, Curti S, Boellis A, Stefanini T, Bernardini A, Angeletti C, Ranieri SC, Franchi P, Voicu IP, Capotondi C, Napolitano A. 3D CT-Inclusive Deep-Learning Model to Predict Mortality, ICU Admittance, and Intubation in COVID-19 Patients. J Digit Imaging 2023; 36:603-616. [PMID: 36450922 PMCID: PMC9713092 DOI: 10.1007/s10278-022-00734-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/27/2022] [Revised: 10/08/2022] [Accepted: 10/30/2022] [Indexed: 12/02/2022]
Abstract
Chest CT is a useful initial exam in patients with coronavirus disease 2019 (COVID-19) for assessing lung damage. AI-powered predictive models could be useful to better allocate resources in the midst of the pandemic. Our aim was to build a deep-learning (DL) model for COVID-19 outcome prediction inclusive of 3D chest CT images acquired at hospital admission. This retrospective multicentric study included 1051 patients (mean age 69, SD = 15) who presented to the emergency department of three different institutions between 20th March 2020 and 20th January 2021 with COVID-19 confirmed by real-time reverse transcriptase polymerase chain reaction (RT-PCR). Chest CT at hospital admission were evaluated by a 3D residual neural network algorithm. Training, internal validation, and external validation groups included 608, 153, and 290 patients, respectively. Images, clinical, and laboratory data were fed into different customizations of a dense neural network to choose the best performing architecture for the prediction of mortality, intubation, and intensive care unit (ICU) admission. The AI model tested on CT and clinical features displayed accuracy, sensitivity, specificity, and ROC-AUC, respectively, of 91.7%, 90.5%, 92.4%, and 95% for the prediction of patient's mortality; 91.3%, 91.5%, 89.8%, and 95% for intubation; and 89.6%, 90.2%, 86.5%, and 94% for ICU admission (internal validation) in the testing cohort. The performance was lower in the validation cohort for mortality (71.7%, 55.6%, 74.8%, 72%), intubation (72.6%, 74.7%, 45.7%, 64%), and ICU admission (74.7%, 77%, 46%, 70%) prediction. The addition of the available laboratory data led to an increase in sensitivity for patient's mortality (66%) and specificity for intubation and ICU admission (50%, 52%, respectively), while the other metrics maintained similar performance results. We present a deep-learning model to predict mortality, ICU admittance, and intubation in COVID-19 patients. 
KEY POINTS:
- The 3D CT-based deep learning model predicted mortality, ICU admittance, and intubation in COVID-19 patients in the internal validation set with high accuracy, sensitivity, and specificity (> 90%).
- The model slightly increased prediction results when laboratory data were added to the analysis, despite data imbalance. However, the model accuracy dropped when CT images were not considered in the analysis, implying an important role of CT in predicting outcomes.
Affiliation(s)
- Alberto Di Napoli
  - Radiology Department, Castelli Hospital, 00040, Ariccia, Italy
  - NESMOS Department, Neuroradiology Unit, Sant'Andrea Hospital, Sapienza University, Via Grottarossa 1035, 00189, 00165, Rome, Italy
- Emanuela Tagliente
  - Medical Physics Department, Bambino Gesù Children's Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), 00165, Rome, Italy
- Luca Pasquini
  - NESMOS Department, Neuroradiology Unit, Sant'Andrea Hospital, Sapienza University, Via Grottarossa 1035, 00189, 00165, Rome, Italy.
  - Radiology Department, Neuroradiology Service, Memorial Sloan Kettering Cancer Center, New York, NY, 1275, USA.
- Enrica Cipriano
  - COVID Medicine Department, Castelli Hospital, 00040, Ariccia, Italy
- Piermaria Ortis
  - COVID Intensive Care Unit, Castelli Hospital, 00040, Ariccia, Italy
- Simona Curti
  - Emergency Department, Castelli Hospital, 00040, Ariccia, Italy
- Alessandro Boellis
  - Radiology Department, Sant'Andrea Civil Hospital, 19121, La Spezia, Italy
- Teseo Stefanini
  - Radiology Department, Sant'Andrea Civil Hospital, 19121, La Spezia, Italy
- Antonio Bernardini
  - Radiology Department, Giuseppe Mazzini Civil Hospital, 64100, Teramo, Italy
- Chiara Angeletti
  - Anestesiology, Intensive Care and Pain Medicine, Emergency Department, Giuseppe Mazzini Civil Hospital, 64100, Teramo, Italy
- Paola Franchi
  - Radiology Department, Giuseppe Mazzini Civil Hospital, 64100, Teramo, Italy
- Ioan Paul Voicu
  - Radiology Department, Giuseppe Mazzini Civil Hospital, 64100, Teramo, Italy
- Carlo Capotondi
  - Radiology Department, Castelli Hospital, 00040, Ariccia, Italy
- Antonio Napolitano
  - Medical Physics Department, Bambino Gesù Children's Hospital, Scientific Institute for Research, Hospitalization and Healthcare (IRCCS), 00165, Rome, Italy
|
10
|
Fischer A, Rietveld A, Teunissen P, Bakker P, Hoogendoorn M. End-to-end learning with interpretation on electrohysterography data to predict preterm birth. Comput Biol Med 2023; 158:106846. [PMID: 37019011 DOI: 10.1016/j.compbiomed.2023.106846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 03/03/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
Prediction of preterm birth is a difficult task for clinicians. By examining an electrohysterogram, electrical activity of the uterus that can lead to preterm birth can be detected. Since signals associated with uterine activity are difficult to interpret for clinicians without a background in signal processing, machine learning may be a viable solution. We are the first to employ Deep Learning models, a long-short term memory and temporal convolutional network model, on electrohysterography data using the Term-Preterm Electrohysterogram database. We show that end-to-end learning achieves an AUC score of 0.58, which is comparable to machine learning models that use handcrafted features. Moreover, we evaluate the effect of adding clinical data to the model and conclude that adding the available clinical data to electrohysterography data does not result in a gain in performance. Also, we propose an interpretability framework for time series classification that is well-suited to use in case of limited data, as opposed to existing methods that require large amounts of data. Clinicians with extensive work experience as gynaecologist used our framework to provide insights on how to link our results to clinical practice and stress that in order to decrease the number of false positives, a dataset with patients at high risk of preterm birth should be collected. All code is made publicly available.
|
11
|
Rajput D, Wang WJ, Chen CC. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics 2023; 24:48. [PMID: 36788550 PMCID: PMC9926644 DOI: 10.1186/s12859-023-05156-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/13/2022] [Accepted: 01/23/2023] [Indexed: 02/16/2023]
Abstract
BACKGROUND An appropriate sample size is essential for obtaining a precise and reliable outcome of a study. In machine learning (ML), studies with inadequate samples suffer from overfitting of data and have a lower probability of producing true effects, while increasing the sample size improves prediction accuracy but may not cause a significant change beyond a certain sample size. Existing statistical approaches using standardized mean difference, effect size, and statistical power for determining sample size are potentially biased due to miscalculations or lack of experimental details. This study aims to design criteria for evaluating sample size in ML studies. We examined the average and grand effect sizes and the performance of five ML methods using simulated datasets and three real datasets to derive the criteria for sample size. We systematically increased the sample size, starting from 16, by random sampling and examined the impact of sample size on classifiers' performance and both effect sizes. Tenfold cross-validation was used to quantify the accuracy. RESULTS The results demonstrate that the effect sizes and the classification accuracies increase while the variances in effect sizes shrink as samples are added when the datasets have good discriminative power between two classes. By contrast, indeterminate datasets had poor effect sizes and classification accuracies, which did not improve with increasing sample size in either simulated or real datasets. A good dataset exhibited a significant difference in average and grand effect sizes. We derived two criteria based on the above findings to assess a decided sample size by combining the effect size and the ML accuracy. The sample size is considered suitable when it has appropriate effect sizes (≥ 0.5) and ML accuracy (≥ 80%). Beyond an appropriate sample size, adding samples brings little benefit because it does not significantly change the effect size or accuracy, so stopping there gives a good cost-benefit ratio. CONCLUSION We believe that these practical criteria can be used as a reference for both authors and editors to evaluate whether the selected sample size is adequate for a study.
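The two thresholds quoted in this abstract can be written down as a small check. The sketch below is only an illustration: the abstract does not define its "average" and "grand" effect sizes, so Cohen's d with a pooled standard deviation is assumed here as a stand-in, and the function names `cohens_d` and `sample_size_is_suitable` are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation.

    Assumption: used here as a stand-in for the effect-size measures
    named in the abstract, which are not defined there.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def sample_size_is_suitable(effect_size, accuracy, min_effect=0.5, min_accuracy=0.80):
    """Apply the two thresholds quoted in the abstract (effect size and accuracy)."""
    return abs(effect_size) >= min_effect and accuracy >= min_accuracy


# Hypothetical feature values for two classes, for illustration only.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, sample_size_is_suitable(d, accuracy=0.85))
```

Both conditions must hold at once: a large effect size with low cross-validated accuracy, or the reverse, would not pass this check.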
Affiliation(s)
- Daniyal Rajput
  - Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC
  - Taiwan International Graduate Program in Interdisciplinary Neuroscience, National Central University and Academia Sinica, Taipei, Taiwan, ROC
- Wei-Jen Wang
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Chun-Chuan Chen
  - Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC
  - Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan, ROC
|
12
|
Goldsztejn U, Nehorai A. Predicting preterm births from electrohysterogram recordings via deep learning. PLoS One 2023; 18:e0285219. [PMID: 37167222 PMCID: PMC10174487 DOI: 10.1371/journal.pone.0285219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 04/18/2023] [Indexed: 05/13/2023] Open
Abstract
About one in ten babies is born preterm, i.e., before completing 37 weeks of gestation, which can result in permanent neurologic deficit and is a leading cause of child mortality. Although imminent preterm labor can be detected, predicting preterm births more than one week in advance remains elusive. Here, we develop a deep learning method to predict preterm births directly from electrohysterogram (EHG) measurements of pregnant mothers recorded at around 31 weeks of gestation. We developed a prediction model, which includes a recurrent neural network, to predict preterm births using short-time Fourier transforms of EHG recordings and clinical information from two public datasets. We predicted preterm births with an area under the receiver-operating characteristic curve (AUC) of 0.78 (95% confidence interval: 0.76-0.80). Moreover, we found that the spectral patterns of the measurements were more predictive than the temporal patterns, suggesting that preterm births can be predicted from short EHG recordings in an automated process. We show that preterm births can be predicted for pregnant mothers around their 31st week of gestation, prompting beneficial treatments to reduce the incidence of preterm births and improve their outcomes.
Affiliation(s)
- Uri Goldsztejn
  - Department of Biomedical Engineering, McKelvey School of Engineering, Washington University in St. Louis, St. Louis, MO, United States of America
- Arye Nehorai
  - Preston M. Green Department of Electrical and Systems Engineering, McKelvey School of Engineering, Washington University in St. Louis, St. Louis, MO, United States of America
|
13
|
Qin Y, Wu J, Xiao W, Wang K, Huang A, Liu B, Yu J, Li C, Yu F, Ren Z. Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type. International Journal of Environmental Research and Public Health 2022; 19:15027. [PMID: 36429751 PMCID: PMC9690067 DOI: 10.3390/ijerph192215027] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/06/2022] [Revised: 11/04/2022] [Accepted: 11/10/2022] [Indexed: 06/01/2023]
Abstract
The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are good diabetes prediction tools. The purpose of this study was to compare the efficacy of five different machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999-2020 NHANES database yielded data on 17,833 individuals based on demographic characteristics and lifestyle-related variables. To screen training data for machine models, the Akaike Information Criterion (AIC) forward propagation algorithm was utilized. For predicting diabetes, five machine-learning models (CATBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and receiver operating characteristic (ROC) curve. Among the five machine-learning models, the dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes patients. In terms of model performance, CATBoost ranks higher than RF, LR, XGBoost, and SVM. The best-performing machine-learning model among the five is CATBoost, which achieves an accuracy of 82.1% and an AUC of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
Affiliation(s)
- Yifan Qin
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jinlong Wu
  - College of Physical Education, Southwest University, Chongqing 400715, China
- Wen Xiao
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Kun Wang
  - Physical Education College, Yanching Institute of Technology, Langfang 065201, China
- Anbing Huang
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Bowen Liu
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jingxuan Yu
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Chuhao Li
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Fengyu Yu
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Zhanbing Ren
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
|
14
|
Gerussi A, Scaravaglio M, Cristoferi L, Verda D, Milani C, De Bernardi E, Ippolito D, Asselta R, Invernizzi P, Kather JN, Carbone M. Artificial intelligence for precision medicine in autoimmune liver disease. Front Immunol 2022; 13:966329. [PMID: 36439097 PMCID: PMC9691668 DOI: 10.3389/fimmu.2022.966329] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/10/2022] [Accepted: 10/13/2022] [Indexed: 09/10/2023] Open
Abstract
Autoimmune liver diseases (AiLDs) are rare autoimmune conditions of the liver and the biliary tree with unknown etiology and limited treatment options. AiLDs are inherently characterized by a high degree of complexity, which poses great challenges in understanding their etiopathogenesis, developing novel biomarkers and risk-stratification tools, and, eventually, generating new drugs. Artificial intelligence (AI) is considered one of the best candidates to support researchers and clinicians in making sense of biological complexity. In this review, we offer a primer on AI and machine learning for clinicians, and discuss recent available literature on its applications in medicine and more specifically how it can help to tackle major unmet needs in AiLDs.
Affiliation(s)
- Alessio Gerussi
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
- Miki Scaravaglio
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
- Laura Cristoferi
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
  - Bicocca Bioinformatics Biostatistics and Bioimaging Centre - B4, School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Chiara Milani
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
- Elisabetta De Bernardi
  - Department of Medicine and Surgery and Tecnomed Foundation, University of Milano-Bicocca, Monza, Italy
- Rosanna Asselta
  - Humanitas Clinical and Research Center, Rozzano, Milan, Italy
  - Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
- Pietro Invernizzi
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
- Jakob Nikolas Kather
  - Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
  - Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany
- Marco Carbone
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
|
15
|
Teji JS, Jain S, Gupta SK, Suri JS. NeoAI 1.0: Machine learning-based paradigm for prediction of neonatal and infant risk of death. Comput Biol Med 2022; 147:105639. [DOI: 10.1016/j.compbiomed.2022.105639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Revised: 05/01/2022] [Accepted: 05/01/2022] [Indexed: 11/29/2022]
|
16
|
Buchlak QD, Esmaili N, Bennett C, Wang YY, King J, Goldschlager T. Predictors of improvement in quality of life at 12-month follow-up in patients undergoing anterior endoscopic skull base surgery. PLoS One 2022; 17:e0272147. [PMID: 35895728 PMCID: PMC9328523 DOI: 10.1371/journal.pone.0272147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 07/13/2022] [Indexed: 11/18/2022] Open
Abstract
Background Patients with pituitary lesions experience decrements in quality of life (QoL) and treatment aims to arrest or improve QoL decline. Objective To detect associations with QoL in trans-nasal endoscopic skull base surgery patients and train supervised learning classifiers to predict QoL improvement at 12 months. Methods A supervised learning analysis of a prospective multi-institutional dataset (451 patients) was conducted. QoL was measured using the anterior skull base surgery questionnaire (ASBS). Factors associated with QoL at baseline and at 12-month follow-up were identified using multivariate logistic regression. Multiple supervised learning models were trained to predict postoperative QoL improvement with five-fold cross-validation. Results ASBS at 12-month follow-up was significantly higher (132.19, SD = 24.87) than preoperative ASBS (121.87, SD = 25.72, p < 0.05). High preoperative scores were significantly associated with institution, diabetes and lesions at the planum sphenoidale / tuberculum sella site. Patients with diabetes were five times less likely to report high preoperative QoL. Low preoperative QoL was significantly associated with female gender, a vision-related presentation, diabetes, secreting adenoma and the cavernous sinus site. Top quartile change in postoperative QoL at 12-month follow-up was negatively associated with baseline hypercholesterolemia, acromegaly and intraoperative CSF leak. Positive associations were detected for lesions at the sphenoid sinus site and deficient preoperative endocrine function. AdaBoost, logistic regression and neural network classifiers yielded the strongest predictive performance. Conclusion It was possible to predict postoperative positive change in QoL at 12-month follow-up using perioperative data. Further development and implementation of these models may facilitate improvements in informed consent, treatment decision-making and patient QoL.
Affiliation(s)
- Quinlan D. Buchlak (corresponding author)
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
  - Department of Neurosurgery, Monash Health, Melbourne, VIC, Australia
- Nazanin Esmaili
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, Australia
- Christine Bennett
  - School of Medicine, The University of Notre Dame Australia, Sydney, NSW, Australia
- Yi Yuen Wang
  - St Vincent’s Hospital, Melbourne, VIC, Australia
- James King
  - Royal Melbourne Hospital, Melbourne, VIC, Australia
- Tony Goldschlager
  - Department of Neurosurgery, Monash Health, Melbourne, VIC, Australia
  - Department of Surgery, Monash University, Melbourne, VIC, Australia
|
17
|
Nieto-del-Amor F, Prats-Boluda G, Garcia-Casado J, Diaz-Martinez A, Diago-Almela VJ, Monfort-Ortiz R, Hao D, Ye-Lin Y. Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data. Sensors 2022; 22:5098. [PMID: 35890778 PMCID: PMC9319575 DOI: 10.3390/s22145098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Revised: 07/01/2022] [Accepted: 07/05/2022] [Indexed: 02/01/2023]
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give rise to relatively low performance. Numerous studies obtained promising preterm labor prediction results using the synthetic minority oversampling technique. However, these studies generally overestimate mathematical models’ real generalization capacity by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem for predicting preterm labor by EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation dataset during feature selection by genetic algorithm, and analyzed the resampling effect on training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to 1:1 during feature selection, without subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting the EHG could be used to predict preterm labor in clinics.
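The leakage mechanism described in this abstract comes down to the order of two steps. The sketch below is a deliberately simplified, deterministic illustration and not the authors' pipeline: plain round-robin duplication of minority rows stands in for SMOTE-style synthesis, an interleaved holdout stands in for a real train/test split, and all names (`oversample_minority`, `holdout_split`, `shared_ids`) and the toy data are hypothetical.

```python
def oversample_minority(rows):
    """Balance two classes by repeating minority rows in round-robin order.

    Each row is a (sample_id, label) pair with label 0 or 1. Plain
    duplication is an assumed stand-in for SMOTE-style synthesis.
    """
    groups = {0: [r for r in rows if r[1] == 0],
              1: [r for r in rows if r[1] == 1]}
    minority, majority = sorted(groups, key=lambda k: len(groups[k]))
    if not groups[minority]:
        return list(rows)  # nothing to duplicate
    deficit = len(groups[majority]) - len(groups[minority])
    extras = [groups[minority][j % len(groups[minority])] for j in range(deficit)]
    return list(rows) + extras


def holdout_split(rows, every=4):
    """Deterministic split: every `every`-th row goes to the test partition."""
    test = [r for i, r in enumerate(rows) if i % every == 0]
    train = [r for i, r in enumerate(rows) if i % every != 0]
    return train, test


def shared_ids(train, test):
    """Sample ids that occur in both partitions (evidence of leakage)."""
    return {r[0] for r in train} & {r[0] for r in test}


# Hypothetical imbalanced data: 90 majority rows (label 0), 10 minority rows (label 1).
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]

# Leaky order: resample everything, then split.
leaky_train, leaky_test = holdout_split(oversample_minority(data))
leaky_shared = shared_ids(leaky_train, leaky_test)

# Leak-free order: split first, then resample the training partition only.
clean_train, clean_test = holdout_split(data)
clean_train = oversample_minority(clean_train)
clean_shared = shared_ids(clean_train, clean_test)

print("ids shared after leaky order:", sorted(leaky_shared))
print("ids shared after leak-free order:", sorted(clean_shared))
```

In the leaky order, copies of the same minority sample end up on both sides of the split, so the test partition is no longer independent of the training data. With synthetic interpolation instead of duplication the overlap is less visible, but test information still reaches the training partition in the same way.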
Affiliation(s)
- Félix Nieto-del-Amor
  - Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain
- Gema Prats-Boluda (corresponding author)
  - Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain
- Javier Garcia-Casado
  - Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain
- Alba Diaz-Martinez
  - Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain
- Rogelio Monfort-Ortiz
  - Servicio de Obstetricia, H.U.P. La Fe, 46026 Valencia, Spain
- Dongmei Hao
  - Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
- Yiyao Ye-Lin
  - Centro de Investigación e Innovación en Bioingeniería, Universitat Politècnica de València, 46022 Valencia, Spain
|
18
|
Lou H, Liu H, Chen Z, Zhen Z, Dong B, Xu J. Bio-process inspired characterization of pregnancy evolution using entropy and its application in preterm birth detection. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103587] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
|
19
|
Alharbi F, Ouarbya L, Ward JA. Comparing Sampling Strategies for Tackling Imbalanced Data in Human Activity Recognition. Sensors 2022; 22:1373. [PMID: 35214275 PMCID: PMC8963022 DOI: 10.3390/s22041373] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/19/2021] [Revised: 01/22/2022] [Accepted: 01/27/2022] [Indexed: 12/04/2022]
Abstract
Human activity recognition (HAR) using wearable sensors is an increasingly active research topic in machine learning, aided in part by the ready availability of detailed motion capture data from smartphones, fitness trackers, and smartwatches. The goal of HAR is to use such devices to assist users in their daily lives in application areas such as healthcare, physical therapy, and fitness. One of the main challenges for HAR, particularly when using supervised learning methods, is obtaining balanced data for algorithm optimisation and testing. As people perform some activities more than others (e.g., walk more than run), HAR datasets are typically imbalanced. The lack of dataset representation from minority classes hinders the ability of HAR classifiers to sufficiently capture new instances of those activities. We introduce three novel hybrid sampling strategies to generate more diverse synthetic samples to overcome the class imbalance problem. The first strategy, which we call the distance-based method (DBM), combines Synthetic Minority Oversampling Techniques (SMOTE) with Random_SMOTE, both of which are built around the k-nearest neighbors (KNN). The second technique, referred to as the noise detection-based method (NDBM), combines SMOTE Tomek links (SMOTE_Tomeklinks) and the modified synthetic minority oversampling technique (MSMOTE). The third approach, which we call the cluster-based method (CBM), combines Cluster-Based Synthetic Oversampling (CBSO) and Proximity Weighted Synthetic Oversampling Technique (ProWSyn). We compare the performance of the proposed hybrid methods to the individual constituent methods and baseline using accelerometer data from three commonly used benchmark datasets. We show that DBM, NDBM, and CBM reduce the impact of class imbalance and enhance F1 scores by 9-20 percentage points compared to their constituent sampling methods. CBM performs significantly better than the others under a Friedman test; however, DBM has lower computational requirements.
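Several of the methods named in this abstract build on the basic SMOTE step, which places a synthetic point on the line segment between a minority sample and one of its k nearest minority neighbors. The sketch below shows only that basic interpolation step, not the DBM, NDBM, or CBM hybrids; the function names and the toy points are hypothetical, and at least two minority points are assumed.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda o: math.dist(point, o))[:k]


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [p for i, p in enumerate(minority) if i != base_idx]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical 2-D minority points, for illustration only.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_samples(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies, which is one reason variants that add more diversity have been proposed.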
Affiliation(s)
- Fayez Alharbi (corresponding author)
  - Computer Sciences and Information Technology College, Majmaah University, Al Majmaah 15341, Saudi Arabia
  - Department of Computing, Goldsmiths, University of London, London SE14 6NW, UK
- Lahcen Ouarbya
  - Department of Computing, Goldsmiths, University of London, London SE14 6NW, UK
- Jamie A Ward
  - Department of Computing, Goldsmiths, University of London, London SE14 6NW, UK
|
20
|
Xu J, Wang M, Zhang J, Chen Z, Huang W, Shen G, Zhang M. Network theory based EHG signal analysis and its application in preterm prediction. IEEE J Biomed Health Inform 2022; 26:2876-2887. [PMID: 34986107 DOI: 10.1109/jbhi.2022.3140427] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/05/2022]
Abstract
OBJECTIVE Preterm birth is the leading cause of neonatal morbidity and mortality. Early identification of high-risk patients followed by medical interventions is essential to the prevention of preterm birth. Based on the relationship between uterine contraction and the fundamental electrical activities of muscles, we extracted effective features from EHG signals recorded from pregnant women, and used them to train classifiers with the purpose of providing high precision in classifying term and preterm pregnancies. METHODS To characterize changes from irregularity to coherence of the uterine activity during the whole pregnancy, network representations of the original electrohysterogram (EHG) signals are established by applying the Horizontal Visibility Graph (HVG) algorithm, from which we extract network degree density and distribution, clustering coefficient and assortativity coefficient. Given concerns about interference from different noise sources embedded in the EHG signal, we apply the Short-Time Fourier Transform (STFT) to expand the original signal in the time-frequency domain. This allows a network representation and the extraction of related features on each frequency component. Feature selection algorithms are then used to filter out unrelated frequency components. We further apply the proposed feature extraction method to EHG signals available in the Term-Preterm EHG database (TPEHG), and use them to train classifiers. We adopt the Partition-Synthesis scheme, which splits the original imbalanced dataset into two sets and synthesizes artificial samples separately within each subset, to solve the problem of dataset imbalance. RESULTS The optimally selected network-based features not only contribute to the identification of the essential frequency components of uterine activities related to preterm birth, but also improve performance in classifying term/preterm pregnancies, i.e., the SVM (Support Vector Machine) classifier trained with the available samples in the TPEHG gives sensitivity, specificity, overall accuracy, and AUC values as high as 0.89, 0.93, 0.91, and 0.97, respectively.
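The Horizontal Visibility Graph mentioned in this abstract turns a time series into a network: each sample is a node, and two samples are linked when every sample strictly between them is lower than both. The sketch below is a minimal illustration of that standard construction and of the degree sequence derived from it; it is not the authors' implementation, the function names are hypothetical, and the clustering and assortativity features named in the abstract are not covered.

```python
def horizontal_visibility_edges(series):
    """Edges (i, j), i < j, of the horizontal visibility graph of `series`.

    Samples i and j are linked when every sample strictly between them is
    lower than both series[i] and series[j].
    """
    edges = []
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # nothing further to the right can see sample i
    return edges


def degree_sequence(n_nodes, edges):
    """Number of links attached to each node."""
    degrees = [0] * n_nodes
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


# Hypothetical short signal, for illustration only.
signal = [3, 1, 2, 4]
edges = horizontal_visibility_edges(signal)
degrees = degree_sequence(len(signal), edges)
print(edges)    # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
print(degrees)  # [3, 2, 3, 2]
```

Neighboring samples are always linked, so every node has degree at least one; how far the degree distribution spreads beyond that depends on how regular the signal is, which is what makes it usable as a feature.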
|
21
|
Kovács G, Fazekas A. A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers. Med Image Anal 2021; 75:102300. [PMID: 34814057 DOI: 10.1016/j.media.2021.102300] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/21/2021] [Revised: 09/20/2021] [Accepted: 11/04/2021] [Indexed: 12/18/2022]
Abstract
In the last 15 years, the segmentation of vessels in retinal images has become an intensively researched problem in medical imaging, with hundreds of algorithms published. One of the de facto benchmarking data sets of vessel segmentation techniques is the DRIVE data set. Since DRIVE contains a predefined split of training and test images, the published performance results of the various segmentation techniques should provide a reliable ranking of the algorithms. Including more than 100 papers in the study, we performed a detailed numerical analysis of the coherence of the published performance scores. We found inconsistencies in the reported scores related to the use of the field of view (FoV), which has a significant impact on the performance scores. We attempted to eliminate the biases using numerical techniques to provide a more realistic picture of the state of the art. Based on the results, we have formulated several findings, most notably: despite the well-defined test set of DRIVE, most rankings in published papers are based on non-comparable figures; in contrast to the near-perfect accuracy scores reported in the literature, the highest accuracy score achieved to date is 0.9582 in the FoV region, which is 1% higher than that of human annotators. The methods we have developed for identifying and eliminating the evaluation biases can be easily applied to other domains where similar problems may arise.
Affiliation(s)
- György Kovács
  - Analytical Minds Ltd., Árpád street 5, Beregsurány 4933, Hungary
- Attila Fazekas
  - University of Debrecen, Faculty of Informatics, P.O.BOX 400, Debrecen 4002, Hungary
|
22
|
Russo S, Batista A, Esgalhado F, Palma dos Reis CR, Serrano F, Vassilenko V, Ortigueira M. Alvarez waves in pregnancy: a comprehensive review. Biophys Rev 2021; 13:563-574. [PMID: 34471439 PMCID: PMC8355272 DOI: 10.1007/s12551-021-00818-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/23/2021] [Accepted: 06/29/2021] [Indexed: 10/20/2022] Open
Abstract
Alvarez waves are local rhythmic contractions of the myometrium with high frequency and low intensity. They can be detected using internal or external tocography and electrohysterography. Some researchers correlate these small contractions with the initiation of labor, since they have been described as a pattern representing the uterine response to prostaglandin production. Other authors either do not validate a causality relation between Alvarez waves and labor or suggest that they have low predictive value for preterm labor. Alvarez waves' research has become a multidisciplinary subject with inputs ranging from medical science, biomedical engineering, and related areas. A comprehensive review is herein conducted to summarize the state of the art regarding Alvarez waves and their role in the initiation of labor, namely in preterm birth. The results show that a large number of studies have analyzed and characterized Alvarez waves without necessarily digging into their relationship with labor. Publications were categorized in three groups: (A) reports about morphology and characterization of Alvarez waves; (B) publications reporting a positive causality relation between Alvarez waves and labor; and (C) publications reporting an absence of causality regarding the previous hypothesis. Studies in group B outnumbered those in group C. A critical analysis is presented.
Affiliation(s)
- Sara Russo
  - Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
- Arnaldo Batista
  - Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
  - UNINOVA, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
- Filipa Esgalhado
  - Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
  - NMT, S.A., Parque Tecnológico de Cantanhede, Núcleo 04, Lote 3, 3060-197 Cantanhede, Portugal
- Catarina R. Palma dos Reis
  - Maternidade Alfredo da Costa, Rua Viriato 1, 1050-170 Lisboa, Portugal
  - Nova Medical School / Faculty of Medical Sciences, Universidade Nova de Lisboa, 1169-056 Lisboa, Portugal
- Fátima Serrano
  - Maternidade Alfredo da Costa, Rua Viriato 1, 1050-170 Lisboa, Portugal
  - Nova Medical School / Faculty of Medical Sciences, Universidade Nova de Lisboa, 1169-056 Lisboa, Portugal
- Valentina Vassilenko
  - Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
  - NMT, S.A., Parque Tecnológico de Cantanhede, Núcleo 04, Lote 3, 3060-197 Cantanhede, Portugal
- Manuel Ortigueira
  - Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
  - UNINOVA, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Campus de Caparica, 2829-516 Caparica, Portugal
|
23
|
Vandewiele G, Ongenae F, Dehaene I. Commentary: Automated detection of preterm condition using uterine electromyography based topological features. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
24
|
Class-Imbalanced Voice Pathology Detection and Classification Using Fuzzy Cluster Oversampling Method. Applied Sciences-Basel 2021. [DOI: 10.3390/app11083450] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022]
Abstract
The Massachusetts Eye and Ear Infirmary (MEEI) database is an international-standard training database for voice pathology detection (VPD) systems. However, there is a class-imbalanced distribution in normal and pathological voice samples and different types of pathological voice samples in the MEEI database. This study aimed to develop a VPD system that uses the fuzzy clustering synthetic minority oversampling technique algorithm (FC-SMOTE) to automatically detect and classify four types of pathological voices in a multi-class imbalanced database. The proposed FC-SMOTE algorithm processes the initial class-imbalanced dataset. A set of machine learning models was evaluated and validated using the resulting class-balanced dataset as an input. The effectiveness of the VPD system with FC-SMOTE was further verified by an external validation set and another pathological voice database (Saarbruecken Voice Database (SVD)). The experimental results show that, in the multi-classification of pathological voice for the class-imbalanced dataset, the method we propose can significantly improve the diagnostic accuracy. Meanwhile, FC-SMOTE outperforms the traditional imbalanced data oversampling algorithms, and it is preferred for imbalanced voice diagnosis in practical applications.
|
25
|
Riaño D, Wilk S, Teije AT. Preface: AIME 2019. Artif Intell Med 2021; 115:102058. [PMID: 34001318 DOI: 10.1016/j.artmed.2021.102058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2021] [Accepted: 03/22/2021] [Indexed: 10/21/2022]
Affiliation(s)
- David Riaño: Universitat Rovira i Virgili, Tarragona, Spain
- Szymon Wilk: Poznań University of Technology, Poznań, Poland
|
26
|
Belouali A, Gupta S, Sourirajan V, Yu J, Allen N, Alaoui A, Dutton MA, Reinhard MJ. Acoustic and language analysis of speech for suicidal ideation among US veterans. BioData Min 2021; 14:11. [PMID: 33531048 PMCID: PMC7856815 DOI: 10.1186/s13040-021-00245-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/18/2020] [Accepted: 01/20/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Screening for suicidal ideation in high-risk groups such as U.S. veterans is crucial for early detection and suicide prevention. Currently, screening is based on clinical interviews or self-report measures. Both approaches rely on subjects to disclose their suicidal thoughts. Innovative approaches are necessary to develop objective and clinically applicable assessments. Speech has been investigated as an objective marker to understand various mental states including suicidal ideation. In this work, we developed a machine learning and natural language processing classifier based on speech markers to screen for suicidal ideation in US veterans.
METHODOLOGY: Veterans submitted 588 narrative audio recordings via a mobile app in a real-life setting. In addition, participants completed self-report psychiatric scales and questionnaires. Recordings were analyzed to extract voice characteristics including prosodic, phonation, and glottal. The audios were also transcribed to extract textual features for linguistic analysis. We evaluated the acoustic and linguistic features using both statistical significance and ensemble feature selection. We also examined the performance of different machine learning algorithms on multiple combinations of features to classify suicidal and non-suicidal audios.
RESULTS: A combined set of 15 acoustic and linguistic features of speech were identified by the ensemble feature selection. Random Forest classifier, using the selected set of features, correctly identified suicidal ideation in veterans with 86% sensitivity, 70% specificity, and an area under the receiver operating characteristic curve (AUC) of 80%.
CONCLUSIONS: Speech analysis of audios collected from veterans in everyday life settings using smartphones offers a promising approach for suicidal ideation detection. A machine learning classifier may eventually help clinicians identify and monitor high-risk veterans.
Affiliation(s)
- Anas Belouali: Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Samir Gupta: Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Vaibhav Sourirajan: Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Jiawei Yu: Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Nathaniel Allen: War Related Illness and Injury Study Center, Veterans Affairs Medical Center, Washington, DC, USA
- Adil Alaoui: Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA
- Mary Ann Dutton: Department of Psychiatry, Georgetown University Medical Center, Washington, DC, USA
- Matthew J Reinhard: War Related Illness and Injury Study Center, Veterans Affairs Medical Center, Washington, DC, USA; Department of Psychiatry, Georgetown University Medical Center, Washington, DC, USA
|