51
Ortiz-Toro C, García-Pedrero A, Lillo-Saavedra M, Gonzalo-Martín C. Automatic detection of pneumonia in chest X-ray images using textural features. Comput Biol Med 2022; 145:105466. [PMID: 35585732 PMCID: PMC8966154 DOI: 10.1016/j.compbiomed.2022.105466] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/03/2021] [Revised: 03/25/2022] [Accepted: 03/26/2022] [Indexed: 12/16/2022]
Abstract
Fast and accurate diagnosis is critical for the triage and management of pneumonia, particularly in the current scenario of a COVID-19 pandemic, where this pathology is a major symptom of the infection. With the objective of providing tools for that purpose, this study assesses the potential of three textural image characterisation methods (radiomics, fractal dimension and the recently developed superpixel-based histon) as biomarkers for training Artificial Intelligence (AI) models to detect pneumonia in chest X-ray images. Models generated from three different AI algorithms have been studied: K-Nearest Neighbors, Support Vector Machine and Random Forest. Two open-access image datasets were used in this study. In the first one, a dataset composed of paediatric chest X-rays, the best performing generated models achieved an 83.3% accuracy with 89% sensitivity for radiomics, 89.9% accuracy with 93.6% sensitivity for fractal dimension and 91.3% accuracy with 90.5% sensitivity for superpixel-based histon. Second, a dataset derived from an image repository developed primarily as a tool for studying COVID-19 was used. For this dataset, the best performing generated models resulted in a 95.3% accuracy with 99.2% sensitivity for radiomics, 99% accuracy with 100% sensitivity for fractal dimension and 99% accuracy with 98.6% sensitivity for superpixel-based histon. The results confirm the validity of the tested methods as reliable and easy-to-implement automatic diagnostic tools for pneumonia.
Affiliation(s)
- César Ortiz-Toro
  - Department of Computer Architecture and Technology, Universidad Politécnica de Madrid, 28660, Boadilla del Monte, Spain
- Angel García-Pedrero
  - Department of Computer Architecture and Technology, Universidad Politécnica de Madrid, 28660, Boadilla del Monte, Spain
  - Center for Biomedical Technology, Campus de Montegancedo, Universidad Politécnica de Madrid, 28233, Pozuelo de Alarcón, Spain
- Mario Lillo-Saavedra
  - Facultad de Ingeniería Agrícola, Universidad de Concepción, Chillán, 3812120, Chile
- Consuelo Gonzalo-Martín
  - Department of Computer Architecture and Technology, Universidad Politécnica de Madrid, 28660, Boadilla del Monte, Spain
  - Center for Biomedical Technology, Campus de Montegancedo, Universidad Politécnica de Madrid, 28233, Pozuelo de Alarcón, Spain
  - Corresponding author. Department of Computer Architecture and Technology, Universidad Politécnica de Madrid, 28660, Boadilla del Monte, Spain
52
Turco S, Tiyarattanachai T, Ebrahimkheil K, Eisenbrey J, Kamaya A, Mischi M, Lyshchik A, Kaffas AE. Interpretable Machine Learning for Characterization of Focal Liver Lesions by Contrast-Enhanced Ultrasound. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 2022; 69:1670-1681. [PMID: 35320099 PMCID: PMC9188683 DOI: 10.1109/tuffc.2022.3161719] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 05/06/2023]
Abstract
This work proposes an interpretable radiomics approach to differentiate between malignant and benign focal liver lesions (FLLs) on contrast-enhanced ultrasound (CEUS). Although CEUS has shown promise for differential FLLs diagnosis, current clinical assessment is performed only by qualitative analysis of the contrast enhancement patterns. Quantitative analysis is often hampered by the unavoidable presence of motion artifacts and by the complex, spatiotemporal nature of liver contrast enhancement, consisting of multiple, overlapping vascular phases. To fully exploit the wealth of information in CEUS, while coping with these challenges, here we propose combining features extracted by the temporal and spatiotemporal analysis in the arterial phase enhancement with spatial features extracted by texture analysis at different time points. Using the extracted features as input, several machine learning classifiers are optimized to achieve semiautomatic FLLs characterization, for which there is no need for motion compensation and the only manual input required is the location of a suspicious lesion. Clinical validation on 87 FLLs from 72 patients at risk for hepatocellular carcinoma (HCC) showed promising performance, achieving a balanced accuracy of 0.84 in the distinction between benign and malignant lesions. Analysis of feature relevance demonstrates that a combination of spatiotemporal and texture features is needed to achieve the best performance. Interpretation of the most relevant features suggests that aspects related to microvascular perfusion and the microvascular architecture, together with the spatial enhancement characteristics at wash-in and peak enhancement, are important to aid the accurate characterization of FLLs.
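The balanced accuracy quoted in this abstract is the mean of sensitivity and specificity, which keeps a majority class from dominating the score. A minimal sketch follows; the function name and the counts are illustrative assumptions, not values from the paper.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, fn, tn, fp = 40, 10, 18, 2
plain_accuracy = (tp + tn) / (tp + fn + tn + fp)
print(f"accuracy          = {plain_accuracy:.3f}")
print(f"balanced accuracy = {balanced_accuracy(tp, fn, tn, fp):.3f}")
```

With imbalanced classes the two numbers diverge, which is presumably why the balanced variant is the one reported.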
53
Santos MS, Abreu PH, Japkowicz N, Fernández A, Soares C, Wilk S, Santos J. On the joint-effect of class imbalance and overlap: a critical review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10150-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
54
Rana A, Singh H, Mavuduru R, Pattanaik S, Rana PS. Quantifying prognosis severity of COVID-19 patients from deep learning based analysis of CT chest images. Multimedia Tools and Applications 2022; 81:18129-18153. [PMID: 35282403 PMCID: PMC8901869 DOI: 10.1007/s11042-022-12214-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/17/2021] [Revised: 01/04/2022] [Accepted: 01/10/2022] [Indexed: 05/28/2023]
Abstract
The COVID-19 pandemic has affected all the countries in the world with its droplet spread mode. The colossal number of cases has strained all healthcare systems due to the serious nature of infections, especially for people with comorbidities. A very high specificity Reverse Transcriptase-Polymerase Chain Reaction (RT-PCR) test is the principal technique in use for diagnosing COVID-19 patients. CT scans have also helped medical professionals in patient severity estimation and progression tracking of the COVID-19 virus. In this study, we present our own extensible COVID-19 viral infection tracking prognosis technique. It uses an annotated dataset of CT chest scan slice images created with the help of medical professionals. The annotated dataset contains bounding box coordinates of different features for COVID-19 detection, such as ground glass opacities, crazy paving pattern, consolidations and lesions. We qualitatively identify the severity of the patient for later prognosis stages in our study to assist medical staff with patient prioritization. First, we detected COVID-19 positive patients with a pre-trained Siamese Neural Network (SNN), which obtained 87.6% accuracy, 87.1% F1-score and 95.1% AUC. These metrics were achieved after removal of 40% quantitatively highly similar images from the COVID-CT dataset. This reduced dataset was further medically annotated with COVID-19 features for bounding box detection. After this, we assigned severity scores to detected COVID-19 features and calculated the cumulative severity score for COVID-19 patients. For qualitative patient prioritization with prognosis clinical assistance information, we finally converted this score into a multi-class classification problem, which obtained a 47% weighted-average F1-score.
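The weighted-average F1-score mentioned at the end of this abstract is conventionally the per-class F1 averaged with weights proportional to each class's number of true samples. The sketch below assumes that conventional definition; the severity labels and predictions are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true: list, y_pred: list) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)


# Hypothetical severity classes and predictions, for illustration only.
y_true = ["mild", "mild", "mild", "moderate", "moderate", "severe"]
y_pred = ["mild", "mild", "moderate", "moderate", "severe", "severe"]
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

Classes that appear only in the predictions carry zero weight under this definition, since they have no true samples.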
Affiliation(s)
- Ashish Rana
  - Department of Computer Science and Engineering, TIET, Patiala, Punjab, India
- Harpreet Singh
  - Department of Computer Science and Engineering, TIET, Patiala, Punjab, India
- Smita Pattanaik
  - Department of Urology and Pharmacology, PGIMER, Chandigarh, India
55
Prescott L. SARS-CoV-2 3CLpro whole human proteome cleavage prediction and enrichment/depletion analysis. Comput Biol Chem 2022; 98:107671. [PMID: 35429835 PMCID: PMC8958254 DOI: 10.1016/j.compbiolchem.2022.107671] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/09/2021] [Revised: 03/21/2022] [Accepted: 03/25/2022] [Indexed: 12/12/2022]
Abstract
A novel coronavirus (SARS-CoV-2) has devastated the globe as a pandemic that has killed millions of people. Widespread vaccination is still uncertain, so many scientific efforts have been directed toward discovering antiviral treatments. Many drugs are being investigated to inhibit the coronavirus main protease, 3CLpro, from cleaving its viral polyprotein, but few publications have addressed this protease’s interactions with the host proteome or their probable contribution to virulence. Too few host protein cleavages have been experimentally verified to fully understand 3CLpro’s global effects on relevant cellular pathways and tissues. Here, I set out to determine this protease’s targets and corresponding potential drug targets. Using a neural network trained on cleavages from 392 coronavirus proteomes with a Matthews correlation coefficient of 0.985, I predict that a large proportion of the human proteome is vulnerable to 3CLpro, with 4898 out of approximately 20,000 human proteins containing at least one putative cleavage site. These cleavages are nonrandomly distributed and are enriched in the epithelium along the respiratory tract, brain, testis, plasma, and immune tissues and depleted in olfactory and gustatory receptors despite the prevalence of anosmia and ageusia in COVID-19 patients. Affected cellular pathways include cytoskeleton/motor/cell adhesion proteins, nuclear condensation and other epigenetics, host transcription and RNAi, ribosomal stoichiometry and nascent-chain detection and degradation, ubiquitination, pattern recognition receptors, coagulation, lipoproteins, redox, and apoptosis. This whole proteome cleavage prediction demonstrates the importance of 3CLpro in expected and nontrivial pathways affecting virulence, leads me to propose more than a dozen potential therapeutic targets against coronaviruses, and should therefore be applied to all viral proteases and subsequently experimentally verified.
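The Matthews correlation coefficient (MCC) cited above summarises a binary confusion matrix in a single value between -1 and 1, and it stays informative when cleavage sites are rare relative to non-sites. A minimal sketch of the standard formula is given here; the counts are hypothetical and do not come from the paper.

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only.
print(f"MCC = {matthews_corrcoef(tp=90, fp=10, tn=85, fn=15):.3f}")
```

A perfect classifier gives 1, a fully inverted one gives -1, and chance-level prediction gives a value near 0.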
56
Comparison of Machine Learning Methods Using Spectralis OCT for Diagnosis and Disability Progression Prognosis in Multiple Sclerosis. Ann Biomed Eng 2022; 50:507-528. [PMID: 35220529 PMCID: PMC9001622 DOI: 10.1007/s10439-022-02930-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 10/06/2021] [Accepted: 02/10/2022] [Indexed: 12/28/2022]
Abstract
Machine learning approaches in diagnosis and prognosis of multiple sclerosis (MS) were analysed using retinal nerve fiber layer (RNFL) thickness, measured by optical coherence tomography (OCT). A cross-sectional study (72 MS patients and 30 healthy controls) was used for diagnosis. These 72 MS patients were involved in a 10-year longitudinal follow-up study for prognostic purposes. Structural measurements of RNFL thickness were performed using different Spectralis OCT protocols: fast macular thickness protocol to measure macular RNFL, and fast RNFL thickness protocol and fast RNFL-N thickness protocol to measure peripapillary RNFL. Binary classifiers such as multiple linear regression (MLR), support vector machines (SVM), decision tree (DT), k-nearest neighbours (k-NN), Naïve Bayes (NB), ensemble classifier (EC) and long short-term memory (LSTM) recurrent neural network were tested. For MS diagnosis, the best acquisition protocol was fast macular thickness protocol using k-NN (accuracy: 95.8%; sensitivity: 94.4%; specificity: 97.2%; precision: 97.1%; AUC: 0.958). For MS prognosis, our model with a 3-year follow up to predict disability progression 8 years later was the best predictive model. DT performed best for fast macular thickness protocol (accuracy: 91.3%; sensitivity: 90.0%; specificity: 92.5%; precision: 92.3%; AUC: 0.913) and SVM for fast RNFL-N thickness protocol (accuracy: 91.3%; sensitivity: 87.5%; specificity: 95.0%; precision: 94.6%; AUC: 0.913). This work concludes that measurements of RNFL thickness obtained with Spectralis OCT have a good ability to diagnose MS and to predict disability progression in MS patients. This machine learning approach would help clinicians to have valuable information.
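The four threshold-based figures reported per classifier (accuracy, sensitivity, specificity, precision) all derive from the same binary confusion matrix. The sketch below shows those relationships under the usual definitions; the class name and counts are hypothetical and do not reproduce the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = patient)."""
    tp: int
    fn: int
    tn: int
    fp: int

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Fraction of true patients that were flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of true controls that were cleared.
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        # Fraction of flagged cases that were true patients.
        return self.tp / (self.tp + self.fp)


# Hypothetical counts for illustration only.
example = ConfusionCounts(tp=18, fn=2, tn=19, fp=1)
for name in ("accuracy", "sensitivity", "specificity", "precision"):
    print(f"{name:12s} {getattr(example, name):.3f}")
```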
57
Mao YJ, Lim HJ, Ni M, Yan WH, Wong DWC, Cheung JCW. Breast Tumour Classification Using Ultrasound Elastography with Machine Learning: A Systematic Scoping Review. Cancers (Basel) 2022; 14:367. [PMID: 35053531 PMCID: PMC8773731 DOI: 10.3390/cancers14020367] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 12/20/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022] Open
Abstract
Ultrasound elastography can quantify stiffness distribution of tissue lesions and complements conventional B-mode ultrasound for breast cancer screening. Recently, the development of computer-aided diagnosis has improved the reliability of the system, whilst the inception of machine learning, such as deep learning, has further extended its power by facilitating automated segmentation and tumour classification. The objective of this review was to summarize application of the machine learning model to ultrasound elastography systems for breast tumour classification. Review databases included PubMed, Web of Science, CINAHL, and EMBASE. Thirteen (n = 13) articles were eligible for review. Shear-wave elastography was investigated in six articles, whereas seven studies focused on strain elastography (5 freehand and 2 Acoustic Radiation Force). Traditional computer vision workflow was common in strain elastography with separated image segmentation, feature extraction, and classifier functions using different algorithm-based methods, neural networks or support vector machines (SVM). Shear-wave elastography often adopts the deep learning model, convolutional neural network (CNN), that integrates functional tasks. All of the reviewed articles achieved sensitivity ≥ 80%, while only half of them attained acceptable specificity ≥ 95%. Deep learning models did not necessarily perform better than traditional computer vision workflow. Nevertheless, there were inconsistencies and insufficiencies in reporting and calculation, such as the testing dataset, cross-validation, and methods to avoid overfitting. Most of the studies did not report loss or hyperparameters. Future studies may consider using the deep network with an attention layer to locate the targeted object automatically and online training to facilitate efficient re-training for sequential data.
Affiliation(s)
- Ye-Jiao Mao
  - Department of Bioengineering, Imperial College, London SW7 2AZ, UK
- Hyo-Jung Lim
  - Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
- Ming Ni
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  - Department of Orthopaedics, Pudong New Area People’s Hospital Affiliated to Shanghai University of Medicine and Health Science, Shanghai 201299, China
- Wai-Hin Yan
  - Department of Economics, The Chinese University of Hong Kong, Hong Kong 999077, China
- Duo Wai-Chi Wong
  - Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
- James Chung-Wai Cheung
  - Department of Biomedical Engineering, Faculty of Engineering, The Hong Kong Polytechnic University, Hong Kong 999077, China
  - Research Institute of Smart Ageing, The Hong Kong Polytechnic University, Hong Kong 999077, China
58
Jiang Z, Li J, Kong N, Kim JH, Kim BS, Lee MJ, Park YM, Lee SY, Hong SJ, Sul JH. Accurate diagnosis of atopic dermatitis by combining transcriptome and microbiota data with supervised machine learning. Sci Rep 2022; 12:290. [PMID: 34997172 PMCID: PMC8741793 DOI: 10.1038/s41598-021-04373-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/07/2021] [Accepted: 12/14/2021] [Indexed: 11/16/2022] Open
Abstract
Atopic dermatitis (AD) is a common skin disease in childhood whose diagnosis requires expertise in dermatology. Recent studies have indicated that host genes–microbial interactions in the gut contribute to human diseases including AD. We sought to develop an accurate and automated pipeline for AD diagnosis based on transcriptome and microbiota data. Using these data of 161 subjects including AD patients and healthy controls, we trained a machine learning classifier to predict the risk of AD. We found that the classifier could accurately differentiate subjects with AD and healthy individuals based on the omics data with an average F1-score of 0.84. With this classifier, we also identified a set of 35 genes and 50 microbiota features that are predictive for AD. Among the selected features, we discovered at least three genes and three microorganisms directly or indirectly associated with AD. Although further replications in other cohorts are needed, our findings suggest that these genes and microbiota features may provide novel biological insights and may be developed into useful biomarkers of AD prediction.
Affiliation(s)
- Ziyuan Jiang
  - Department of Automation, Tsinghua University, Beijing, 100084, China
- Jiajin Li
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Nahyun Kong
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Daejeon, 34141, Republic of Korea
- Jeong-Hyun Kim
  - Department of Medicine, University of Ulsan College of Medicine, Seoul, 05505, Republic of Korea
- Bong-Soo Kim
  - Department of Life Science, Multidisciplinary Genome Institute, Hallym University, Chuncheon, 24252, Republic of Korea
- Min-Jung Lee
  - Department of Life Science, Multidisciplinary Genome Institute, Hallym University, Chuncheon, 24252, Republic of Korea
- Yoon Mee Park
  - Department of Medicine, University of Ulsan College of Medicine, Seoul, 05505, Republic of Korea
- So-Yeon Lee
  - Department of Pediatrics, Asan Medical Center, Childhood Asthma Atopy Center, Humidifier Disinfectant Health Center, University of Ulsan College of Medicine, Seoul, 05505, Republic of Korea
- Soo-Jong Hong
  - Department of Pediatrics, Asan Medical Center, Childhood Asthma Atopy Center, Humidifier Disinfectant Health Center, University of Ulsan College of Medicine, Seoul, 05505, Republic of Korea
- Jae Hoon Sul
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA, 90095, USA
59
A new approach to impact case study analytics. Data & Policy 2022. [DOI: 10.1017/dap.2022.21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022] Open
Abstract
The 2014 Research Excellence Framework (REF) assessed the quality of university research in the UK. 20% of the assessment was allocated according to peer review of the impact of research, reflecting the growing importance of impact in UK government policy. Beyond academia, impact is defined as a change or benefit to the economy, society, culture, public policy or services, health, the environment, or quality of life. Each institution submitted a set of four-page impact case studies. These are predominantly free-form descriptions and evidences of the impact of study. Numerous analyses of these case studies have been conducted, but they have utilised either qualitative methods or primary forms of text searching. These approaches have limitations, including the time required to manually analyse the data and the frequently inferior quality of the answers provided by applying computational analysis to unstructured, context-less free text data. This paper describes a new system to address these problems. At its core is a structured, queryable representation of the case study data. We describe the ontology design used to structure the information and how semantic web related technologies are used to store and query the data. Experiments show that this gives two significant advantages over existing techniques: improved accuracy in question answering and the capability to answer a broader range of questions, by integrating data from external sources. Then we investigate whether machine learning can predict each case study’s grade using this structured representation. The results provide accurate predictions for computer science impact case studies.
60
Koh DH, Song WS, Kim EY. Multi-step structure-activity relationship screening efficiently predicts diverse PPARγ antagonists. Chemosphere 2022; 286:131540. [PMID: 34346341 DOI: 10.1016/j.chemosphere.2021.131540] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/08/2021] [Revised: 07/08/2021] [Accepted: 07/10/2021] [Indexed: 06/13/2023]
Abstract
In discovering potential antagonists of peroxisome proliferator-activated receptor gamma (PPARγ), the structure-activity relationship (SAR) is a useful in silico method. However, it is difficult for conventional SAR approaches to predict the activities of antagonists owing to the large structural diversity of antagonistic compounds. This study provides evidence that multi-step SAR screening is applicable for predicting PPARγ antagonists by combining different complementary methodologies. We constructed three models: read-across-like SAR, docking-simulation-interpreting SAR, and deep-learning-based SAR. To provide user-customized prediction results, our multi-step SAR screening model combined the three SAR models in a stepwise manner, which subdivided them according to potential levels of the PPARγ antagonist. The read-across-like SAR, which considered specific antagonist scaffolds, revealed the highest positive predictive value (PPV). The docking-simulation-interpreting SAR, which considered the molecular surface features, revealed high statistics for the PPV and the true-positive rate (TPR). The deep-learning-based SAR showed the highest TPR at the last classification step. This multi-step SAR screening covered the antagonists of high reliability provided by a read-across-like SAR, as well as the antagonists of diverse scaffolds provided by docking-simulation-interpreting SAR and deep-learning-based SAR. Therefore, multi-step SAR screening could be a useful tool for predicting PPARγ antagonists.
Affiliation(s)
- Dong-Hee Koh
  - Department of Life and Nanopharmaceutical Science, South Korea
- Woo-Seon Song
  - Department of Life and Nanopharmaceutical Science, South Korea
- Eun-Young Kim
  - Department of Life and Nanopharmaceutical Science, South Korea
  - Department of Biology, Kyung Hee University, Hoegi-Dong, Dongdaemun-Gu, Seoul, 130-701, South Korea
61
Nguyen ND, Huang J, Wang D. A deep manifold-regularized learning model for improving phenotype prediction from multi-modal data. Nature Computational Science 2022; 2:38-46. [PMID: 35480297 PMCID: PMC9038085 DOI: 10.1038/s43588-021-00185-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/10/2021] [Accepted: 12/13/2021] [Indexed: 11/09/2022]
Abstract
The phenotypes of complex biological systems are fundamentally driven by various multi-scale mechanisms. Multi-modal data, such as single cell multi-omics data, enables a deeper understanding of underlying complex mechanisms across scales for phenotypes. We developed an interpretable regularized learning model, deepManReg, to predict phenotypes from multi-modal data. First, deepManReg employs deep neural networks to learn cross-modal manifolds and then to align multi-modal features onto a common latent space. Second, deepManReg uses cross-modal manifolds as a feature graph to regularize the classifiers for improving phenotype predictions and also for prioritizing the multi-modal features and cross-modal interactions for the phenotypes. We applied deepManReg to (1) an image dataset of handwritten digits with multi-features and (2) single cell multi-modal data (Patch-seq data) including transcriptomics and electrophysiology for neuronal cells in the mouse brain. We show that deepManReg improved phenotype prediction in both datasets, and also prioritized genes and electrophysiological features for the phenotypes of neuronal cells.
Affiliation(s)
- Nam D. Nguyen
  - Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
  - Present address: Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jiawei Huang
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Present address: Carl H. Lindner College of Business, University of Cincinnati, Cincinnati, OH, 45223, USA
- Daifeng Wang
  - Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA
62
Zhang S, Li H, Qiu T. An Innovative Graph Neural Network Model for Detailed Effluent Prediction in Steam Cracking. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c03728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Shuyuan Zhang
  - Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  - Beijing Key Laboratory of Industrial Big Data Systems and Applications, Tsinghua University, Beijing 100084, China
- Haoran Li
  - Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  - Beijing Key Laboratory of Industrial Big Data Systems and Applications, Tsinghua University, Beijing 100084, China
- Tong Qiu
  - Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
  - Beijing Key Laboratory of Industrial Big Data Systems and Applications, Tsinghua University, Beijing 100084, China
63
Tozlu C, Jamison K, Gauthier SA, Kuceyeski A. Dynamic Functional Connectivity Better Predicts Disability Than Structural and Static Functional Connectivity in People With Multiple Sclerosis. Front Neurosci 2021; 15:763966. [PMID: 34966255 PMCID: PMC8710545 DOI: 10.3389/fnins.2021.763966] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/24/2021] [Accepted: 11/17/2021] [Indexed: 12/30/2022] Open
Abstract
Background: Advanced imaging techniques such as diffusion and functional MRI can be used to identify pathology-related changes to the brain's structural and functional connectivity (SC and FC) networks and to map these changes to disability and compensatory mechanisms in people with multiple sclerosis (pwMS). No study to date has compared which connectivity type (SC, static or dynamic FC) better distinguishes healthy controls (HC) from pwMS and/or classifies pwMS by disability status. Aims: We aim to compare the performance of SC, static FC, and dynamic FC (dFC) in classifying (a) HC vs. pwMS and (b) pwMS who have no disability vs. with disability. The secondary objective of the study is to identify which brain regions' connectome measures contribute most to the classification tasks. Materials and Methods: One hundred pwMS and 19 HC were included. The Expanded Disability Status Scale (EDSS) was used to assess disability, where 67 pwMS who had EDSS < 2 were considered as not having disability. Diffusion and resting-state functional MRI were used to compute the SC and FC matrices, respectively. Logistic regression with ridge regularization was performed, where the models included demographics/clinical information and either pairwise entries or regional summaries from one of the following matrices: SC, FC, and dFC. The performance of the models was assessed using the area under the receiver operating characteristic curve (AUC). Results: In classifying HC vs. pwMS, the regional SC model significantly outperformed others with a median AUC of 0.89 (p < 0.05). In classifying pwMS by disability status, the regional dFC and dFC metrics models significantly outperformed others with a median AUC of 0.65 and 0.61 (p < 0.05). Regional SC in the dorsal attention, subcortical and cerebellar networks were the most important variables in the HC vs. pwMS classification task. Increased regional dFC in dorsal attention and visual networks and decreased regional dFC in frontoparietal and cerebellar networks in certain dFC states were associated with being in the group of pwMS with evidence of disability. Discussion: Damage to SCs is a hallmark of MS and, unsurprisingly, the most accurate connectomic measure in classifying patients and controls. On the other hand, dynamic FC metrics were most important for determining disability level in pwMS, and could represent functional compensation in response to white matter pathology in pwMS.
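The AUC used to compare the models above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the function name and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted probabilities for illustration only.
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3]
print(f"AUC = {roc_auc(patients, controls):.3f}")
```

This quadratic-time form is meant for reading rather than speed; a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.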
Affiliation(s)
- Ceren Tozlu
  - Department of Radiology, Weill Cornell Medicine, New York, NY, United States
- Keith Jamison
  - Department of Radiology, Weill Cornell Medicine, New York, NY, United States
- Susan A. Gauthier
  - Department of Radiology, Weill Cornell Medicine, New York, NY, United States
  - Judith Jaffe Multiple Sclerosis Center, Weill Cornell Medicine, New York, NY, United States
  - Department of Neurology, Weill Cornell Medical College, New York, NY, United States
- Amy Kuceyeski
  - Department of Radiology, Weill Cornell Medicine, New York, NY, United States
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, United States
  - *Correspondence: Amy Kuceyeski
64
Masías VH, Crespo R FA, Navarro R P, Masood R, Krämer NC, Hoppe HU. On spatial variation in the detectability and density of social media user protest supporters. Telematics and Informatics 2021. [DOI: 10.1016/j.tele.2021.101730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
65
|
Predicting Final User Satisfaction Using Momentary UX Data and Machine Learning Techniques. JOURNAL OF THEORETICAL AND APPLIED ELECTRONIC COMMERCE RESEARCH 2021. [DOI: 10.3390/jtaer16070171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
User experience (UX) evaluation investigates how people feel about using products or services and is considered an important factor in the design process. However, there is no comprehensive UX evaluation method for time-continuous situations during the use of products or services. Because user experience changes over time, it is difficult to discern the relationship between momentary UX and episodic or cumulative UX, which is related to final user satisfaction. This research aimed to predict final user satisfaction by using momentary UX data and machine learning techniques. The participants were 50 and 25 university students who were asked to evaluate a service (Experiment I) or a product (Experiment II), respectively, during usage by answering a satisfaction survey. Responses were used to draw a customized UX curve. Participants were also asked to complete a final satisfaction questionnaire about the product or service. Momentary UX data and participant satisfaction scores were used to build machine learning models, and the experimental results were compared with those obtained using seven built machine learning models. This study shows that participants’ momentary UX can be understood using a support vector machine (SVM) with a polynomial kernel and that momentary UX can be used to make more accurate predictions about final user satisfaction regarding product and service usage.
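The abstract above names a support vector machine with a polynomial kernel. The following is a minimal sketch of the polynomial kernel function itself, in its common form (gamma * <x, y> + coef0) ** degree. The parameter defaults and the rating vectors are assumptions for illustration and are not reported in the abstract.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical momentary satisfaction ratings for two participants.
a = [3.0, 4.0, 5.0]
b = [2.0, 4.0, 4.0]
similarity = polynomial_kernel(a, b)
print(similarity)
```

An SVM uses such a kernel value as a similarity measure between two samples, so that a non-linear decision boundary can be learned without computing the expanded feature space explicitly.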
|
66
|
Orsini F, Gecchele G, Rossi R, Gastaldi M. A conflict-based approach for real-time road safety analysis: Comparative evaluation with crash-based models. ACCIDENT; ANALYSIS AND PREVENTION 2021; 161:106382. [PMID: 34479121 DOI: 10.1016/j.aap.2021.106382] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/12/2020] [Revised: 05/14/2021] [Accepted: 08/22/2021] [Indexed: 06/13/2023]
Abstract
An innovative approach for real-time road safety analysis is presented in this work. Unlike traditional real-time crash prediction models (RTCPMs), in which crash data are used in the training phase, a real-time conflict prediction model (RTConfPM) is proposed. This model can be trained using surrogate measures of safety, and can therefore be applied even in situations in which highly spatial/temporal-accurate crash data are unavailable or unreliable. The application of an RTConfPM consists of using a set of input variables recorded during a given time interval, to predict whether there will be an increased risk of unsafe situations in the following interval. This paper presents an RTConfPM to predict rear-end crashes, using time-to-collision values recorded with radar sensors on multiple motorway cross-sections to define unsafe situations, and traffic conditions recorded on the same sections as input to the model. The RTConfPM is compared to a traditional RTCPM, trained with a dataset of crashes located on the same motorway, and using the same traffic data as input. In both approaches, variable selection is performed with Pearson's correlation test and random forest; synthetic minority oversampling technique (SMOTE) is used to balance the classes in the training dataset, support vector machine (SVM) is used as classifier, and Monte Carlo cross-validation is adopted for robustness. The two approaches are evaluated considering accuracy, recall, specificity/false alarm rate, and area under the curve (AUC). As shown by the results of this paper, the conflict-based approach appears promising, and is able to predict the occurrence of unsafe situations within 5 min with more than 93% accuracy, recall and specificity, significantly outperforming the RTCPM.
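The abstract above evaluates both approaches by accuracy, recall, and specificity/false alarm rate. A minimal sketch of these metrics computed from binary predictions follows. The label vectors are hypothetical, and the false alarm rate is taken here as 1 - specificity, which is the usual definition but is an assumption about the paper's usage.

```python
def safety_metrics(y_true, y_pred):
    """Accuracy, recall, specificity and false alarm rate for binary labels (1 = unsafe)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one example of each class")
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn),
        "specificity": specificity,
        "false_alarm_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Hypothetical intervals: 1 = unsafe situation observed or predicted, 0 = safe.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = safety_metrics(y_true, y_pred)
print(metrics)
```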
Affiliation(s)
- Federico Orsini
- Department of Civil Environmental and Architectural Engineering, University of Padua, Via Marzolo 9, 35131 Padua, Italy.
| | - Gregorio Gecchele
- Atraki s.r.l., via Diaz 4, 37015 S. Ambrogio di Valpolicella (Verona), Italy
| | - Riccardo Rossi
- Department of Civil Environmental and Architectural Engineering, University of Padua, Via Marzolo 9, 35131 Padua, Italy
| | - Massimiliano Gastaldi
- Department of Civil Environmental and Architectural Engineering, University of Padua, Via Marzolo 9, 35131 Padua, Italy; Department of General Psychology, University of Padua, Via Venezia 8, 35131 Padua, Italy
|
67
|
Tozlu C, Jamison K, Nguyen T, Zinger N, Kaunzner U, Pandya S, Wang Y, Gauthier S, Kuceyeski A. Structural disconnectivity from paramagnetic rim lesions is related to disability in multiple sclerosis. Brain Behav 2021; 11:e2353. [PMID: 34498432 PMCID: PMC8553317 DOI: 10.1002/brb3.2353] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/20/2021] [Revised: 07/28/2021] [Accepted: 08/19/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND In people with multiple sclerosis (pwMS), lesions with a hyperintense rim (rim+) on Quantitative Susceptibility Mapping (QSM) have been shown to have greater myelin damage compared to rim- lesions, but their association with disability has not yet been investigated. Furthermore, how QSM rim+ and rim- lesions differentially impact disability through their disruptions to structural connectivity has not been explored. We test the hypothesis that structural disconnectivity due to rim+ lesions is more predictive of disability compared to structural disconnectivity due to rim- lesions. METHODS Ninety-six pwMS were included in our study. Individuals with Expanded Disability Status Scale (EDSS) <2 were considered to have lower disability (n = 59). For each gray matter region, a Change in Connectivity (ChaCo) score, that is, the percent of connecting streamlines also passing through a rim- or rim+ lesion, was computed. Adaptive Boosting was used to classify the pwMS into lower versus greater disability groups based on ChaCo scores from rim+ and rim- lesions. Classification performance was assessed using the area under ROC curve (AUC). RESULTS The model based on ChaCo from rim+ lesions outperformed the model based on ChaCo from rim- lesions (AUC = 0.67 vs 0.63, p-value < .05). The left thalamus and left cerebellum were the most important regions in classifying pwMS into disability categories. CONCLUSION rim+ lesions may be more influential on disability through their disruptions to the structural connectome than rim- lesions. This study provides a deeper understanding of how rim+ lesion location/size and resulting disruption to the structural connectome can contribute to MS-related disability.
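The abstract above defines the ChaCo score as the percent of a region's connecting streamlines that also pass through a lesion. A toy sketch of that definition follows. The streamline records, the region names, and the tuple layout are hypothetical; the study's actual pipeline is not described in enough detail here to reproduce.

```python
from collections import defaultdict


def chaco_scores(streamlines):
    """Percent of each region's connecting streamlines that pass through a lesion.

    Each streamline is a (region_a, region_b, hits_lesion) tuple.
    """
    total = defaultdict(int)
    damaged = defaultdict(int)
    for region_a, region_b, hits_lesion in streamlines:
        for region in {region_a, region_b}:
            total[region] += 1
            if hits_lesion:
                damaged[region] += 1
    return {region: 100.0 * damaged[region] / total[region] for region in total}


# Hypothetical streamlines between three gray matter regions.
streamlines = [
    ("thalamus_L", "cerebellum_L", True),
    ("thalamus_L", "precuneus_L", False),
    ("thalamus_L", "cerebellum_L", False),
    ("cerebellum_L", "precuneus_L", False),
    ("thalamus_L", "precuneus_L", True),
]
scores = chaco_scores(streamlines)
print(scores)
```

Computing the scores once with a rim+ lesion mask and once with a rim- mask would give the two feature sets that the abstract compares.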
Affiliation(s)
- Ceren Tozlu
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA
| | - Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA
| | - Thanh Nguyen
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA
| | - Nicole Zinger
- Department of Neurology, Weill Cornell Medicine, New York, New York, USA
| | - Ulrike Kaunzner
- Department of Neurology, Weill Cornell Medicine, New York, New York, USA
| | - Sneha Pandya
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA
| | - Yi Wang
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA
| | - Susan Gauthier
- Department of Neurology, Weill Cornell Medicine, New York, New York, USA
| | - Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, New York, USA; Brain and Mind Research Institute, Weill Cornell Medicine, New York, New York, USA
|
68
|
Ojaghi A, Casteleiro Costa P, Caruso C, Lam WA, Robles FE. Label-free automated neutropenia detection and grading using deep-ultraviolet microscopy. BIOMEDICAL OPTICS EXPRESS 2021; 12:6115-6128. [PMID: 34745725 PMCID: PMC8547990 DOI: 10.1364/boe.434465] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/22/2021] [Revised: 08/30/2021] [Accepted: 08/31/2021] [Indexed: 05/20/2023]
Abstract
Neutropenia is a condition identified by an abnormally low number of neutrophils in the bloodstream and signifies an increased risk of severe infection. Cancer patients are particularly susceptible to this condition, which can be disruptive to their treatment and even life-threatening in severe cases. Thus, it is critical to routinely monitor neutrophil counts in cancer patients. However, the standard of care to assess neutropenia, the complete blood count (CBC), requires expensive and complex equipment, as well as cumbersome procedures, which precludes easy or timely access to critical hematological information, namely neutrophil counts. Here we present a simple, low-cost, fast, and robust technique to detect and grade neutropenia based on label-free multi-spectral deep-UV microscopy. Results show that the developed framework for automated segmentation and classification of live, unstained blood cells in a smear accurately differentiates patients with moderate and severe neutropenia from healthy samples in minutes. This work has significant implications towards the development of a low-cost and easy-to-use point-of-care device for tracking neutrophil counts, which can not only improve the quality of life and treatment-outcomes of many patients but can also be lifesaving.
Affiliation(s)
- Ashkan Ojaghi
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- These authors contributed equally
| | - Paloma Casteleiro Costa
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- These authors contributed equally
| | - Christina Caruso
- Aflac Cancer and Blood Disorders Center of Children's Healthcare of Atlanta and Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Wilbur A. Lam
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Aflac Cancer and Blood Disorders Center of Children's Healthcare of Atlanta and Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
| | - Francisco E. Robles
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
|
69
|
ÇAĞLAYAN M, BAHTİYAR Ş. Money Laundering Detection with Node2Vec. GAZI UNIVERSITY JOURNAL OF SCIENCE 2021. [DOI: 10.35378/gujs.854725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
70
|
Tozlu C, Jamison K, Gu Z, Gauthier SA, Kuceyeski A. Estimated connectivity networks outperform observed connectivity networks when classifying people with multiple sclerosis into disability groups. Neuroimage Clin 2021; 32:102827. [PMID: 34601310 PMCID: PMC8488753 DOI: 10.1016/j.nicl.2021.102827] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/23/2021] [Revised: 09/09/2021] [Accepted: 09/11/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Multiple Sclerosis (MS) is a neurodegenerative and neuroinflammatory disease that causes lesions which disrupt the brain's anatomical and physiological connectivity networks, resulting in cognitive, visual and/or motor disabilities. Advanced imaging techniques like diffusion and functional MRI allow measurement of the brain's structural connectivity (SC) and functional connectivity (FC) networks, and can enable a better understanding of how their disruptions cause disability in people with MS (pwMS). However, advanced MRI techniques are used mainly for research purposes as they are expensive, time-consuming and require high-level expertise to acquire and process. As an alternative, the Network Modification (NeMo) Tool can be used to estimate SC and FC using lesion masks derived from pwMS and a reference set of controls' connectivity networks. OBJECTIVE Here, we test the hypothesis that estimated SC and FC (eSC and eFC) from the NeMo Tool, based only on an individual's lesion masks, can be used to classify pwMS into disability categories just as well as SC and FC extracted from advanced MRI directly in pwMS. We also aim to find the connections most important for differentiating between no disability vs evidence of disability groups. MATERIALS AND METHODS One hundred pwMS (age: 45.5 ± 11.4 years, 66% female, disease duration: 12.97 ± 8.07 years) were included in this study. The Expanded Disability Status Scale (EDSS) was used to assess disability; 67 pwMS had no disability (EDSS < 2). Observed SC and FC were extracted from diffusion and functional MRI directly in pwMS, respectively. The NeMo Tool was used to estimate the remaining structural connectome (eSC), by removing streamlines in a reference set of tractograms that intersected the lesion mask. The NeMo Tool's eSC was then used as input to a deep neural network to estimate the corresponding FC (eFC). Logistic regression with ridge regularization was used to classify pwMS into disability categories (no disability vs evidence of disability), based on demographics/clinical information (sex, age, race, disease duration, clinical phenotype, and spinal lesion burden) and either pairwise entries or regional summaries from one of the following matrices: SC, FC, eSC, and eFC. The area under the ROC curve (AUC) was used to assess the classification performance. Both univariate statistics and parameter coefficients from the classification models were used to identify features important to differentiating between the groups. RESULTS The regional eSC and eFC models outperformed their observed FC and SC counterparts (p-value < 0.05), while the pairwise eSC and SC performed similarly (p = 0.10). Regional eSC and eFC models had higher AUC (0.66-0.68) than the pairwise models (0.60-0.65), with regional eFC having highest classification accuracy across all models. Ridge regression coefficients for the regional eFC and regional observed FC models were significantly correlated (Pearson's r = 0.52, p-value < 10e-7). Decreased estimated SC node strength in default mode and ventral attention networks and increased eFC node strength in visual networks were associated with evidence of disability. DISCUSSION Here, for the first time, we use clinically acquired lesion masks to estimate both structural and functional connectomes in patient populations to better understand brain lesion-dysfunction mapping in pwMS. Models based on the NeMo Tool's estimates of SC and FC better classified pwMS by disability level than SC and FC observed directly in the individual using advanced MRI. This work provides a viable alternative to performing high-cost, advanced MRI in patient populations, bringing the connectome one step closer to the clinic.
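The abstract above contrasts two feature sets drawn from a connectivity matrix: pairwise entries and regional summaries. A minimal sketch of both follows, assuming that the regional summary is node strength (the sum of a region's connection weights). The abstract mentions node strength in its results, but this reading of "regional summaries" is an assumption, and the matrix is hypothetical.

```python
def pairwise_entries(matrix):
    """Upper-triangle entries of a symmetric connectivity matrix, one per region pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def node_strength(matrix):
    """Sum of connection weights for each region, ignoring the diagonal."""
    return [
        sum(weight for j, weight in enumerate(row) if j != i)
        for i, row in enumerate(matrix)
    ]


# Hypothetical symmetric connectivity matrix for three regions.
connectivity = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.1],
    [0.2, 0.1, 0.0],
]
pairwise = pairwise_entries(connectivity)
regional = node_strength(connectivity)
print(pairwise, regional)
```

For n regions the pairwise set has n(n-1)/2 features while the regional set has only n, which is one reason a regularized model may behave differently on the two.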
Affiliation(s)
- Ceren Tozlu
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
| | - Keith Jamison
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA
| | - Zijin Gu
- Electrical and Computer Engineering Department, Cornell University, Ithaca 14850, USA
| | - Susan A Gauthier
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA; Department of Neurology, Weill Cornell Medicine, New York, NY, USA; Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
| | - Amy Kuceyeski
- Department of Radiology, Weill Cornell Medicine, New York, NY, USA; Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA.
|
71
|
Chen B, Ju X, Xiao B, Ding W, Zheng Y, de Albuquerque VHC. Locally GAN-generated face detection based on an improved Xception. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.05.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/21/2022]
|
72
|
Zuzarte I, Sternad D, Paydarfar D. Predicting apneic events in preterm infants using cardio-respiratory and movement features. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 209:106321. [PMID: 34380078 PMCID: PMC8898595 DOI: 10.1016/j.cmpb.2021.106321] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/12/2021] [Accepted: 07/25/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE Preterm neonates are prone to episodes of apnea, bradycardia and hypoxia (ABH) that can lead to neurological morbidities or even death. There is broad interest in developing methods for real-time prediction of ABH events to inform interventions that prevent or reduce their incidence and severity. Using advances in machine learning methods, this study develops an algorithm to predict ABH events. METHODS Following previous studies showing that respiratory instabilities are closely associated with bouts of movement, we present a modeling framework that can predict ABH events using both movement and cardio-respiratory features derived from routine clinical recordings. In 10 preterm infants, movement onsets and durations were estimated with a wavelet-based algorithm that quantified artifactual distortions of the photoplethysmogram signal. For prediction, cardio-respiratory features were created from time-delayed correlations of inter-beat and inter-breath intervals with past values; movement features were derived from time-delayed correlations with inter-breath intervals. Gaussian Mixture Models and Logistic Regression were used to develop predictive models of apneic events. Performance of the models was evaluated with ROC curves. RESULTS Performance of the prediction framework (mean AUC) was 0.77 ± 0.04 for 66 ABH events on training data from 7 infants. When grouped by the severity of the associated bradycardia during the ABH event, the framework was able to predict 83% and 75% of the most severe episodes in the 7-infant training set and 3-infant test set, respectively. Notably, inclusion of movement features significantly improved the predictions compared with modeling with only cardio-respiratory signals. CONCLUSIONS Our findings suggest that recordings of movement provide important information for predicting ABH events in preterm infants, and can inform preemptive interventions designed to reduce the incidence and severity of ABH events.
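The abstract above builds features from time-delayed correlations of interval series with past values. A minimal sketch of a lagged Pearson correlation follows. The interval values, the lag range, and the function names are hypothetical; the study's windowing and exact feature construction are not given in the abstract.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant segment: correlation is undefined, reported as 0 here
    return cov / sqrt(var_x * var_y)


def lagged_correlation(current, past, lag):
    """Correlate current[t] with past[t - lag] for a non-negative lag in samples."""
    if len(current) != len(past):
        raise ValueError("series must have the same length")
    if not 0 <= lag < len(current) - 1:
        raise ValueError("lag must leave at least two overlapping samples")
    if lag == 0:
        return pearson(current, past)
    return pearson(current[lag:], past[:-lag])


# Hypothetical inter-breath intervals in seconds (illustrative values only).
inter_breath = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
# A second series that repeats the first one sample later.
delayed_copy = [0.0] + inter_breath[:-1]

features = [lagged_correlation(inter_breath, inter_breath, lag) for lag in range(4)]
shift_match = lagged_correlation(delayed_copy, inter_breath, 1)
print(features, shift_match)
```

Passing the same series twice gives an autocorrelation feature; passing a movement-derived series as `past` would give a cross-correlation feature of the kind the abstract describes.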
Affiliation(s)
- Ian Zuzarte
- Department of Bioengineering, Northeastern University, Boston, MA 02115, United States
| | - Dagmar Sternad
- Departments of Biology, Electrical and Computer Engineering & Physics, Northeastern University, Boston, MA 02115, United States
| | - David Paydarfar
- Department of Neurology, Dell Medical School, Austin, TX 78712, United States; Oden Institute for Computational Sciences and Engineering, The University of Texas at Austin, Austin, TX 78712, United States.
|
73
|
De Souza FSH, Hojo-Souza NS, Dos Santos EB, Da Silva CM, Guidoni DL. Predicting the Disease Outcome in COVID-19 Positive Patients Through Machine Learning: A Retrospective Cohort Study With Brazilian Data. Front Artif Intell 2021; 4:579931. [PMID: 34514377 PMCID: PMC8427867 DOI: 10.3389/frai.2021.579931] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/18/2020] [Accepted: 08/02/2021] [Indexed: 01/08/2023] Open
Abstract
The first officially registered case of COVID-19 in Brazil was on February 26, 2020. Since then, the situation has worsened with more than 672,000 confirmed cases and at least 36,000 reported deaths by June 2020. Accurate diagnosis of patients with COVID-19 is extremely important to offer adequate treatment, and avoid overloading the healthcare system. Characteristics of patients such as age, comorbidities and varied clinical symptoms can help in classifying the level of infection severity, predict the disease outcome and the need for hospitalization. Here, we present a study to predict a poor prognosis in positive COVID-19 patients and possible outcomes using machine learning. The study dataset comprises information of 8,443 patients concerning closed cases due to cure or death. Our experimental results show the disease outcome can be predicted with a Receiver Operating Characteristic AUC of 0.92, Sensitivity of 0.88 and Specificity of 0.82 for the best prediction model. This is a preliminary retrospective study which can be improved with the inclusion of further data. Conclusion: Machine learning techniques fed with demographic and clinical data along with comorbidities of the patients can assist in the prognostic prediction and physician decision-making, allowing a faster response and contributing to the non-overload of healthcare systems.
Affiliation(s)
- Daniel Ludovico Guidoni
- Department of Computer Science, Federal University of São João Del-Rei, São João Del-Rei, Brazil
|
74
|
Pasetto L, Callegaro S, Corbelli A, Fiordaliso F, Ferrara D, Brunelli L, Sestito G, Pastorelli R, Bianchi E, Cretich M, Chiari M, Potrich C, Moglia C, Corbo M, Sorarù G, Lunetta C, Calvo A, Chiò A, Mora G, Pennuto M, Quattrone A, Rinaldi F, D'Agostino VG, Basso M, Bonetto V. Decoding distinctive features of plasma extracellular vesicles in amyotrophic lateral sclerosis. Mol Neurodegener 2021; 16:52. [PMID: 34376243 PMCID: PMC8353748 DOI: 10.1186/s13024-021-00470-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/17/2020] [Accepted: 07/05/2021] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Amyotrophic lateral sclerosis (ALS) is a multifactorial, multisystem motor neuron disease for which currently there is no effective treatment. There is an urgent need to identify biomarkers to tackle the disease's complexity and help in early diagnosis, prognosis, and therapy. Extracellular vesicles (EVs) are nanostructures released by any cell type into body fluids. Their biophysical and biochemical characteristics vary with the parent cell's physiological and pathological state and make them an attractive source of multidimensional data for patient classification and stratification. METHODS We analyzed plasma-derived EVs of ALS patients (n = 106) and controls (n = 96), and SOD1G93A and TDP-43Q331K mouse models of ALS. We purified plasma EVs by nickel-based isolation, characterized their EV size distribution and morphology respectively by nanotracking analysis and transmission electron microscopy, and analyzed EV markers and protein cargos by Western blot and proteomics. We used machine learning techniques to predict diagnosis and prognosis. RESULTS Our procedure resulted in high-yield isolation of intact and polydisperse plasma EVs, with minimal lipoprotein contamination. EVs in the plasma of ALS patients and the two mouse models of ALS had a distinctive size distribution and lower HSP90 levels compared to the controls. In terms of disease progression, the levels of cyclophilin A with the EV size distribution distinguished fast and slow disease progressors, a possibly new means for patient stratification. Immuno-electron microscopy also suggested that phosphorylated TDP-43 is not an intravesicular cargo of plasma-derived EVs. CONCLUSIONS Our analysis unmasked features in plasma EVs of ALS patients with potential straightforward clinical application. We conceived an innovative mathematical model based on machine learning which, by integrating EV size distribution data with protein cargoes, gave very high prediction rates for disease diagnosis and prognosis.
Affiliation(s)
- Laura Pasetto
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Stefano Callegaro
- Department of Mathematics "Tullio Levi-Civita", University of Padova, Padova, Italy
| | - Fabio Fiordaliso
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Deborah Ferrara
- Department of Cellular, Computational and Integrative Biology - CIBIO, University of Trento, Trento, Italy
| | - Laura Brunelli
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Giovanna Sestito
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Elisa Bianchi
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
| | - Marina Cretich
- Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie Chimiche "Giulio Natta" (SCITEC-CNR), Milan, Italy
| | - Marcella Chiari
- Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie Chimiche "Giulio Natta" (SCITEC-CNR), Milan, Italy
| | - Cristina Potrich
- Centre for Materials and Microsystems, Fondazione Bruno Kessler, Trento, Italy; Istituto di Biofisica, Consiglio Nazionale delle Ricerche, Trento, Italy
| | - Cristina Moglia
- 'Rita Levi Montalcini' Department of Neuroscience, Università degli Studi di Torino, Torino, Italy
| | - Massimo Corbo
- Department of Neurorehabilitation Sciences, Casa Cura Policlinico (CCP), Milan, Italy
| | - Gianni Sorarù
- Department of Neuroscience, University of Padova, 35122, Padova, Italy
| | - Christian Lunetta
- NEuroMuscular Omnicentre (NEMO), Serena Onlus Foundation, Milan, Italy
| | - Andrea Calvo
- 'Rita Levi Montalcini' Department of Neuroscience, Università degli Studi di Torino, Torino, Italy
| | - Adriano Chiò
- 'Rita Levi Montalcini' Department of Neuroscience, Università degli Studi di Torino, Torino, Italy
| | - Gabriele Mora
- Department of Neurorehabilitation, ICS Maugeri IRCCS, Milan, Italy
| | - Maria Pennuto
- Department of Biomedical Sciences (DBS), University of Padova, 35131, Padova, Italy; Veneto Institute of Molecular Medicine (VIMM), 35129, Padova, Italy
| | - Alessandro Quattrone
- Department of Cellular, Computational and Integrative Biology - CIBIO, University of Trento, Trento, Italy
| | - Francesco Rinaldi
- Department of Mathematics "Tullio Levi-Civita", University of Padova, Padova, Italy
| | - Vito Giuseppe D'Agostino
- Department of Cellular, Computational and Integrative Biology - CIBIO, University of Trento, Trento, Italy
| | - Manuela Basso
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy; Department of Cellular, Computational and Integrative Biology - CIBIO, University of Trento, Trento, Italy
| | - Valentina Bonetto
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy.
|
75
|
Theft Prediction Model Based on Spatial Clustering to Reflect Spatial Characteristics of Adjacent Lands. SUSTAINABILITY 2021. [DOI: 10.3390/su13147715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/22/2023]
Abstract
Previous studies have shown that when a crime occurs, the risk of crime in adjacent areas increases. To reflect this, previous grid-based crime prediction studies combined all the cells surrounding the event location to be predicted for use in model training. However, the actual land is continuous rather than a set of independent cells as in a geographic information system. Because the patterns that occur according to the detailed method of crime vary, it is necessary to reflect the spatial characteristics of the adjacent land in crime prediction. In this study, cells with similar spatial characteristics were classified using the Max-p region model (a spatial clustering technique), and the performance was compared to the existing method using random forest (a tree-based machine learning model). According to the results, the F1 score of the model using spatial clustering increased by approximately 2%. Accordingly, there are differences in the physical environmental factors influenced by the detailed method of crime. The findings reveal that crime involving the same offender is likely to occur around the area of the original crime, indicating that a repeated crime is likely in areas with similar spatial features to the area where the crime occurred.
|
76
|
K. P. MN, P. T. Alzheimer's classification using dynamic ensemble of classifiers selection algorithms: A performance analysis. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
|
77
|
Wang H, Liu X. Undersampling bankruptcy prediction: Taiwan bankruptcy data. PLoS One 2021; 16:e0254030. [PMID: 34197533 PMCID: PMC8248686 DOI: 10.1371/journal.pone.0254030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/06/2021] [Accepted: 06/17/2021] [Indexed: 12/01/2022] Open
Abstract
Machine learning models have increasingly been used in bankruptcy prediction. However, the observed historical data of bankrupt companies are often affected by data imbalance, which causes incorrect predictions and can result in substantial economic losses. Many studies have raised the imbalance problem in insolvency data, but little attention has been paid to the effect of undersampling techniques. Therefore, a framework is used to spot-check algorithms quickly and identify which combination of undersampling method and classification model performs best. The results show that Naive Bayes (NB) after Edited Nearest Neighbors (ENN) has the best performance, with an F2-measure of 0.423. In addition, by changing the undersampling rate of the cluster centroid-based method, we find that the performance of Linear Discriminant Analysis (LDA) and Naive Bayes (NB) is affected by the undersampling rate. Neither declines uniformly, and LDA performs better when the undersampling rate is 30%. This study accordingly provides another perspective and a guide for future design.
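The abstract above reports performance as an F2-measure. A minimal sketch of the general F-beta score follows, where beta = 2 weights recall more heavily than precision. The precision and recall values are hypothetical and are not taken from the study.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical classifier that finds every bankrupt firm but with many false alarms.
precision, recall = 0.5, 1.0
f2 = f_beta(precision, recall)            # recall-weighted
f1 = f_beta(precision, recall, beta=1.0)  # balanced, for comparison
print(f"F2 = {f2:.3f}, F1 = {f1:.3f}")
```

Here F2 exceeds F1 because the example has high recall and low precision, which illustrates why F2 suits settings where missing a bankruptcy is costlier than a false alarm.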
Affiliation(s)
- Haoming Wang
- School of Economics, Jinan University, Guangzhou, Guangdong, China
| | - Xiangdong Liu
- School of Economics, Jinan University, Guangzhou, Guangdong, China
|
78
|
A Study of Fall Detection in Assisted Living: Identifying and Improving the Optimal Machine Learning Method. JOURNAL OF SENSOR AND ACTUATOR NETWORKS 2021. [DOI: 10.3390/jsan10030039] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Indexed: 11/17/2022]
Abstract
This paper makes four scientific contributions to the field of fall detection in the elderly to contribute to their assisted living in the future of Internet of Things (IoT)-based pervasive living environments, such as smart homes. First, it presents and discusses a comprehensive comparative study, where 19 different machine learning methods were used to develop fall detection systems, to deduce the optimal machine learning method for the development of such systems. This study was conducted on two different datasets, and the results show that out of all the machine learning methods, the k-NN classifier is best suited for the development of fall detection systems in terms of performance accuracy. Second, it presents a framework that overcomes the limitations of binary classifier-based fall detection systems by being able to detect falls and fall-like motions. Third, to increase the trust and reliance on fall detection systems, it introduces a novel methodology based on the usage of k-folds cross-validation and the AdaBoost algorithm that improves the performance accuracy of the k-NN classifier-based fall detection system to the extent that it outperforms all similar works in this field. This approach achieved performance accuracies of 99.87% and 99.66%, respectively, when evaluated on the two datasets. Finally, the proposed approach is also highly accurate in detecting the activity of standing up from a lying position to infer whether a fall was followed by a long lie, which can cause minor to major health-related concerns. The above contributions address multiple research challenges in the field of fall detection, that we identified after conducting a comprehensive review of related works, which is also presented in this paper.
|
79
|
ODBOT: Outlier detection-based oversampling technique for imbalanced datasets learning. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06198-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
|
80
|
Gnip P, Vokorokos L, Drotár P. Selective oversampling approach for strongly imbalanced data. PeerJ Comput Sci 2021; 7:e604. [PMID: 34239981 PMCID: PMC8237317 DOI: 10.7717/peerj-cs.604] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/05/2021] [Accepted: 05/31/2021] [Indexed: 06/03/2023]
Abstract
Challenges posed by imbalanced data are encountered in many real-world applications. One of the possible approaches to improve the classifier performance on imbalanced data is oversampling. In this paper, we propose the new selective oversampling approach (SOA) that first isolates the most representative samples from minority classes by using an outlier detection technique and then utilizes these samples for synthetic oversampling. We show that the proposed approach improves the performance of two state-of-the-art oversampling methods, namely, the synthetic minority oversampling technique and adaptive synthetic sampling. The prediction performance is evaluated on four synthetic datasets and four real-world datasets, and the proposed SOA methods always achieved the same or better performance than other considered existing oversampling methods.
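The two-step idea in this abstract (filter the minority class with an outlier detector, then oversample synthetically from what remains) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the centroid-distance filter, the `keep_fraction` parameter, and the interpolation between random pairs are simplifications chosen here, not the outlier detector or the SMOTE/ADASYN variants evaluated in the paper.

```python
import math
import random


def select_representative(minority, keep_fraction=0.8):
    """Keep the minority samples closest to the class centroid.

    Stand-in outlier filter (an assumption): the paper relies on a
    dedicated outlier detection technique, not this distance rule.
    """
    dim = len(minority[0])
    centroid = [sum(p[i] for p in minority) / len(minority) for i in range(dim)]
    ranked = sorted(minority, key=lambda p: math.dist(p, centroid))
    n_keep = max(2, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def interpolate_oversample(samples, n_new, rng):
    """Create synthetic points on segments between random sample pairs.

    Real SMOTE interpolates towards k-nearest neighbours; random pairs
    are used here only to keep the sketch short.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


rng = random.Random(0)
minority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
minority.append([25.0, 25.0])  # an obvious outlier

core = select_representative(minority)          # outlier is filtered out
new_points = interpolate_oversample(core, n_new=30, rng=rng)
```

Because the synthetic points are built only from `core`, the outlier at `[25.0, 25.0]` cannot pull new samples away from the bulk of the minority class, which is the intuition behind selecting representative samples first.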
Affiliation(s)
- Peter Gnip: Department of Computers and Informatics, Technical University of Košice, Slovak Republic
- Liberios Vokorokos: Department of Computers and Informatics, Technical University of Košice, Slovak Republic
- Peter Drotár: Department of Computers and Informatics, Technical University of Košice, Slovak Republic
|
81
|
Subudhi S, Verma A, Patel AB, Hardin CC, Khandekar MJ, Lee H, McEvoy D, Stylianopoulos T, Munn LL, Dutta S, Jain RK. Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19. NPJ Digit Med 2021; 4:87. [PMID: 34021235 PMCID: PMC8140139 DOI: 10.1038/s41746-021-00456-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 10/27/2020] [Accepted: 04/16/2021] [Indexed: 02/06/2023] Open
Abstract
As predicting the trajectory of COVID-19 is challenging, machine learning models could assist physicians in identifying high-risk individuals. This study compares the performance of 18 machine learning algorithms for predicting ICU admission and mortality among COVID-19 patients. Using COVID-19 patient data from the Mass General Brigham (MGB) Healthcare database, we developed and internally validated models using patients presenting to the Emergency Department (ED) between March-April 2020 (n = 3597) and further validated them using temporally distinct individuals who presented to the ED between May-August 2020 (n = 1711). We show that ensemble-based models perform better than other model types at predicting both 5-day ICU admission and 28-day mortality from COVID-19. CRP, LDH, and O2 saturation were important for ICU admission models whereas eGFR <60 ml/min/1.73 m², and neutrophil and lymphocyte percentages were the most important variables for predicting mortality. Implementing such models could help in clinical decision-making for future infectious disease outbreaks including COVID-19.
Affiliation(s)
- Sonu Subudhi: Department of Medicine/Gastroenterology Division, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Ashish Verma: Department of Medicine/Renal Division, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Ankit B Patel: Department of Medicine/Renal Division, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- C Corey Hardin: Department of Pulmonary and Critical Care Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Melin J Khandekar: Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Hang Lee: Biostatistics Center, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Dustin McEvoy: Mass General Brigham Digital Health eCare, Somerville, MA, USA
- Triantafyllos Stylianopoulos: Cancer Biophysics Laboratory, Department of Mechanical and Manufacturing Engineering, University of Cyprus, Nicosia, Cyprus
- Lance L Munn: Edwin L. Steele Laboratories, Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Sayon Dutta: Mass General Brigham Digital Health eCare, Somerville, MA, USA; Department of Emergency Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Rakesh K Jain: Edwin L. Steele Laboratories, Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
|
82
|
Liu J, Wong ZSY, So HY, Tsui KL. Evaluating resampling methods and structured features to improve fall incident report identification by the severity level. J Am Med Inform Assoc 2021; 28:1756-1764. [PMID: 34010385 DOI: 10.1093/jamia/ocab048] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Revised: 02/24/2021] [Accepted: 04/27/2021] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE This study aims to improve the classification of the fall incident severity level by considering data imbalance issues and structured features through machine learning. MATERIALS AND METHODS We present an incident report classification (IRC) framework to classify the in-hospital fall incident severity level by addressing the imbalanced class problem and incorporating structured attributes. After text preprocessing, bag-of-words features, structured text features, and structured clinical features were extracted from the reports. Next, resampling techniques were incorporated into the training process. Machine learning algorithms were used to build classification models. IRC systems were trained, validated, and tested using a repeated and randomly stratified shuffle-split cross-validation method. Finally, we evaluated the system performance using the F1-measure, precision, and recall over 15 stratified test sets. RESULTS The experimental results demonstrated that the classification system setting considering both data imbalance issues and structured features outperformed the other system settings (with a mean macro-averaged F1-measure of 0.733). Considering the structured features and resampling techniques, this classification system setting significantly improved the mean F1-measure for the rare class by 30.88% (P value < .001) and the mean macro-averaged F1-measure by 8.26% from the baseline system setting (P value < .001). In general, the classification system employing the random forest algorithm and random oversampling method outperformed the others. CONCLUSIONS Structured features provide essential information for categorizing the fall incident severity level. Resampling methods help rebalance the class distribution of the original incident report data, which improves the performance of machine learning models. The IRC framework presented in this study effectively automates the identification of fall incident reports by the severity level.
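The abstract reports macro-averaged F1 and names random oversampling as the best-performing resampler. The sketch below shows what those two terms mean in code; the function names and the toy labels (`"low"`, `"severe"`) are hypothetical and do not come from the study's data or implementation.

```python
import random
from collections import Counter


def random_oversample(rows, labels, rng):
    """Duplicate random samples of each smaller class until every class
    has as many samples as the largest one."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: 8 "low" and 2 "severe" reports (hypothetical labels).
y_true = ["low"] * 8 + ["severe"] * 2
y_pred = ["low"] * 10                      # a classifier that ignores the rare class
score = macro_f1(y_true, y_pred)           # accuracy is 0.8, macro F1 is much lower

rng = random.Random(0)
bal_rows, bal_labels = random_oversample(list(range(10)), y_true, rng)
```

A majority-only predictor scores 0.8 accuracy here but a macro F1 of about 0.44, which is why the metric is a common choice when a rare class matters. Oversampling would normally be applied to the training portion only.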
Affiliation(s)
- Jiaxing Liu: School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, China; School of Data Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Zoie S Y Wong: Graduate School of Public Health, St. Luke's International University, Tokyo, Japan
- H Y So: Alice Ho Miu Ling Nethersole Hospital, New Territories, Hong Kong SAR, China
- Kwok Leung Tsui: School of Data Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
|
83
|
Data-Driven Modeling for Multiphysics Parametrized Problems-Application to Induction Hardening Process. METALS 2021. [DOI: 10.3390/met11050738] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Data-driven modeling provides an efficient approach to compute approximate solutions for complex multiphysics parametrized problems such as induction hardening (IH) process. Basically, some physical quantities of interest (QoI) related to the IH process will be evaluated under real-time constraint, without any explicit knowledge of the physical behavior of the system. Hence, computationally expensive finite element models will be replaced by a parametric solution, called metamodel. Two data-driven models for temporal evolution of temperature and austenite phase transformation, during induction heating, were first developed by using the proper orthogonal decomposition based reduced-order model followed by a nonlinear regression method for temperature field and a classification combined with regression for austenite evolution. Then, data-driven and hybrid models were created to predict hardness, after quenching. It is shown that the results of artificial intelligence models are promising and provide good approximations in the low-data limit case.
|
84
|
Leonard F, Gilligan J, Barrett MJ. Predicting Admissions From a Paediatric Emergency Department - Protocol for Developing and Validating a Low-Dimensional Machine Learning Prediction Model. Front Big Data 2021; 4:643558. [PMID: 33937750 PMCID: PMC8085432 DOI: 10.3389/fdata.2021.643558] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/18/2020] [Accepted: 03/22/2021] [Indexed: 12/02/2022] Open
Abstract
Introduction: Patients boarding in the Emergency Department can contribute to overcrowding, leading to longer waiting times and patients leaving without being seen or completing their treatment. The early identification of potential admissions could act as an additional decision support tool to alert clinicians that a patient needs to be reviewed for admission and would also be of benefit to bed managers in advance bed planning for the patient. We aim to create a low-dimensional model predicting admissions early from the paediatric Emergency Department. Methods and Analysis: The methodology Cross Industry Standard Process for Data Mining (CRISP-DM) will be followed. The dataset will comprise 2 years of data, ~76,000 records. Potential predictors were identified from previous research, comprising demographics, registration details, triage assessment, hospital usage and past medical history. Fifteen models will be developed, comprising 3 machine learning algorithms (logistic regression, naïve Bayes and gradient boosting machine) and 5 sampling methods, 4 of which are aimed at addressing class imbalance (undersampling, oversampling, and synthetic oversampling techniques). The variables of importance will then be identified from the optimal model (selected based on the highest area under the curve) and used to develop an additional low-dimensional model for deployment. Discussion: A low-dimensional model comprised of routinely collected data, captured up to post triage assessment, would benefit many hospitals without data-rich platforms for the development of models with a high number of predictors. Novel to the planned study is the use of data from the Republic of Ireland and the application of sampling techniques aimed at improving model performance impacted by an imbalance between admissions and discharges in the outcome variable.
Affiliation(s)
- Fiona Leonard: Business Intelligence Unit, Children's Health Ireland at Crumlin, Dublin, Ireland
- John Gilligan: School of Computer Science, Technological University Dublin, Dublin, Ireland
- Michael J Barrett: Department of Emergency Medicine, Children's Health Ireland at Crumlin, Dublin, Ireland; School of Medicine, University College Dublin, Dublin, Ireland
|
85
|
|
86
|
SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features. APPLIED SYSTEM INNOVATION 2021. [DOI: 10.3390/asi4010018] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/17/2022]
Abstract
Real-world datasets are heavily skewed where some classes are significantly outnumbered by the other classes. In these situations, machine learning algorithms fail to achieve substantial efficacy while predicting these underrepresented instances. To solve this problem, many variations of synthetic minority oversampling methods (SMOTE) have been proposed to balance datasets which deal with continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based oversampling technique to balance the data. In this paper, we present a novel minority oversampling method, SMOTE-ENC (SMOTE—Encoded Nominal and Continuous), in which nominal features are encoded as numeric values and the difference between two such numeric values reflects the amount of change of association with the minority class. Our experiments show that classification models using the SMOTE-ENC method offer better prediction than models using SMOTE-NC when the dataset has a substantial number of nominal features and also when there is some association between the categorical features and the target class. Additionally, our proposed method addressed one of the major limitations of the SMOTE-NC algorithm. SMOTE-NC can be applied only on mixed datasets that have features consisting of both continuous and nominal features and cannot function if all the features of the dataset are nominal. Our novel method has been generalized to be applied to both mixed datasets and nominal-only datasets.
|
87
|
Kokkotis C, Moustakidis S, Baltzopoulos V, Giakas G, Tsaopoulos D. Identifying Robust Risk Factors for Knee Osteoarthritis Progression: An Evolutionary Machine Learning Approach. Healthcare (Basel) 2021; 9:260. [PMID: 33804560 PMCID: PMC8000487 DOI: 10.3390/healthcare9030260] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/03/2021] [Revised: 02/22/2021] [Accepted: 02/23/2021] [Indexed: 12/27/2022] Open
Abstract
Knee osteoarthritis (KOA) is a multifactorial disease which is responsible for more than 80% of the osteoarthritis disease's total burden. KOA is heterogeneous in terms of rates of progression with several different phenotypes and a large number of risk factors, which often interact with each other. A number of modifiable and non-modifiable systemic and mechanical parameters along with comorbidities as well as pain-related factors contribute to the development of KOA. Although models exist to predict the onset of the disease or discriminate between asymptomatic and OA patients, there are just a few studies in the recent literature that focused on the identification of risk factors associated with KOA progression. This paper contributes to the identification of risk factors for KOA progression via a robust feature selection (FS) methodology that overcomes two crucial challenges: (i) the observed high dimensionality and heterogeneity of the available data that are obtained from the Osteoarthritis Initiative (OAI) database and (ii) a severe class imbalance problem posed by the fact that the KOA progressors class is significantly smaller than the non-progressors' class. The proposed feature selection methodology relies on a combination of evolutionary algorithms and machine learning (ML) models, leading to the selection of a relatively small feature subset of 35 risk factors that generalizes well on the whole dataset (mean accuracy of 71.25%). We investigated the effectiveness of the proposed approach in a comparative analysis with well-known FS techniques with respect to metrics related to both prediction accuracy and generalization capability. The impact of the selected risk factors on the prediction output was further investigated using SHapley Additive exPlanations (SHAP). The proposed FS methodology may contribute to the development of new, efficient risk stratification strategies and identification of risk phenotypes of each KOA patient to enable appropriate interventions.
Affiliation(s)
- Christos Kokkotis: Institute for Bio-Economy & Agri-Technology, Center for Research and Technology Hellas, 60361 Volos, Greece; Department of Physical Education & Sport Science, University of Thessaly, 38221 Trikala, Greece
- Vasilios Baltzopoulos: Research Institute for Sport and Exercises Sciences, Liverpool John Moores University, Liverpool L3 3AF, UK
- Giannis Giakas: Department of Physical Education & Sport Science, University of Thessaly, 38221 Trikala, Greece
- Dimitrios Tsaopoulos: Institute for Bio-Economy & Agri-Technology, Center for Research and Technology Hellas, 60361 Volos, Greece
|
88
|
Cousyn L, Navarro V, Chavez M. Preictal state detection using prodromal symptoms: A machine learning approach. Epilepsia 2021; 62:e42-e47. [PMID: 33465245 DOI: 10.1111/epi.16804] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/12/2020] [Revised: 12/13/2020] [Accepted: 12/13/2020] [Indexed: 12/01/2022]
Abstract
A reliable identification of a high-risk state for upcoming seizures may allow for preemptive treatment and improve the quality of patients' lives. We evaluated the ability of prodromal symptoms to predict preictal states using a machine learning (ML) approach. Twenty-four patients with drug-resistant epilepsy were admitted for continuous video-electroencephalographic monitoring and filled out a daily four-point questionnaire on prodromal symptoms. Data were then classified into (1) a preictal group for questionnaires completed in a 24-h period prior to at least one seizure (n1 = 58) and (2) an interictal group for questionnaires completed in a 24-h period without seizures (n2 = 190). Our prediction model was based on a support vector machine classifier and compared to a Fisher's linear classifier. The combination of all the prodromal symptoms yielded a good prediction performance (area under the curve [AUC] = .72, 95% confidence interval [CI] = .61-.81). This performance was significantly enhanced by selecting a subset of the most relevant symptoms (AUC = .80, 95% CI = .69-.88). In comparison, the linear classifier systematically failed (AUCs < .6). Our findings indicate that the ML analysis of prodromal symptoms is a promising approach to identifying preictal states prior to seizures. This could pave the way for development of clinical strategies in seizure prevention and even a noninvasive alarm system.
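Since the abstract compares classifiers by area under the curve (AUC), a short sketch of how AUC can be computed from classifier scores may help. It uses the pairwise (Mann-Whitney) formulation: the fraction of (positive, negative) pairs in which the positive example scores higher, with ties counted as half. The score values below are made up for illustration and are unrelated to the study's questionnaires.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random negative."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for preictal (positive) and interictal (negative) days.
preictal = [0.9, 0.8, 0.4]
interictal = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(preictal, interictal)   # 11 of 12 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking, which is the reference point for the values reported above.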
Affiliation(s)
- Louis Cousyn: Department of Neurology, Epilepsy Unit, Pitié-Salpêtrière Hospital, Public Hospital Network of Paris, Paris, France; Paris Brain Institute, ICM (INSERM-U1127, CNRS-UMR7225), Paris, France; Center of Reference for Rare Epilepsies, Pitié-Salpêtrière Hospital, Paris, France; Sorbonne University, Paris, France
- Vincent Navarro: Department of Neurology, Epilepsy Unit, Pitié-Salpêtrière Hospital, Public Hospital Network of Paris, Paris, France; Paris Brain Institute, ICM (INSERM-U1127, CNRS-UMR7225), Paris, France; Center of Reference for Rare Epilepsies, Pitié-Salpêtrière Hospital, Paris, France; Sorbonne University, Paris, France
- Mario Chavez: Paris Brain Institute, ICM (INSERM-U1127, CNRS-UMR7225), Paris, France
|
89
|
Robust Detection of COVID-19 in Cough Sounds: Using Recurrence Dynamics and Variable Markov Model. SN COMPUTER SCIENCE 2021; 2:34. [PMID: 33458700 PMCID: PMC7802616 DOI: 10.1007/s42979-020-00422-6] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 07/03/2020] [Accepted: 12/08/2020] [Indexed: 01/31/2023]
Abstract
COVID-19, otherwise known as the coronavirus, has precipitated the world into a pandemic that has infected, as of the time of writing, more than 10 million persons worldwide and caused the death of more than 500,000 persons. Early symptoms of the virus include trouble breathing, fever and fatigue and over 60% of people experience a dry cough. Due to the devastating impact of COVID-19 and the tragic loss of lives, it is of the utmost urgency to develop methods for the early detection of the disease that may help limit its spread as well as aid in the development of targeted solutions. Coughs and other vocal sounds contain pulmonary health information that can be used for diagnostic purposes, and recent studies in chaotic dynamics have shown that nonlinear phenomena exist in vocal signals. The present work investigates the use of symbolic recurrence quantification measures with MFCC features for the automatic detection of COVID-19 in cough sounds of healthy and sick individuals. Our performance evaluation reveals that our symbolic dynamics measures capture the complex dynamics in the vocal sounds and are highly effective at discriminating sick and healthy coughs. We apply our method to sustained vowel 'ah' recordings, and show that our model is robust for the detection of the disease in sustained vowel utterances as well. Furthermore, we introduce a robust novel method of informative undersampling using information rate to deal with the imbalance in our dataset, due to the unavailability of an equal number of sick and healthy recordings. The proposed model achieves a mean classification performance of 97% and 99%, and a mean F1-score of 91% and 89% after optimization, for coughs and sustained vowels, respectively.
|
90
|
Abstract
One of the significant challenges in machine learning is the classification of imbalanced data. In many situations, standard classifiers cannot learn how to distinguish minority class examples from the others. Since many real problems are unbalanced, this problem has become very relevant and deeply studied today. This paper presents a new preprocessing method based on Delaunay tessellation and the preprocessing algorithm SMOTE (Synthetic Minority Over-sampling Technique), which we call DTO-SMOTE (Delaunay Tessellation Oversampling SMOTE). DTO-SMOTE constructs a mesh of simplices (in this paper, we use tetrahedrons) for creating synthetic examples. We compare results with five preprocessing algorithms (GEOMETRIC-SMOTE, SVM-SMOTE, SMOTE-BORDERLINE-1, SMOTE-BORDERLINE-2, and SMOTE), eight classification algorithms, and 61 binary-class data sets. For some classifiers, DTO-SMOTE has higher performance than others in terms of Area Under the ROC curve (AUC), Geometric Mean (GEO), and Generalized Index of Balanced Accuracy (IBA).
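The core geometric step described here, creating a synthetic example inside a simplex of the mesh, can be illustrated with a small sketch. The abstract does not say how the point inside each simplex is chosen, so the uniform sampling below (a random convex combination of the vertices with Dirichlet(1, ..., 1) weights) is an assumption, and building the Delaunay mesh itself is left out.

```python
import random


def sample_in_simplex(vertices, rng):
    """Return a random convex combination of the simplex vertices.

    Normalised exponential draws give Dirichlet(1, ..., 1) weights,
    i.e. a point distributed uniformly over the simplex.
    """
    raw = [rng.expovariate(1.0) for _ in vertices]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(vertices[0])
    point = [sum(w * v[i] for w, v in zip(weights, vertices)) for i in range(dim)]
    return point, weights


rng = random.Random(42)
# One tetrahedron of a hypothetical mesh (four vertices in 3-D feature space).
tetrahedron = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
synthetic, weights = sample_in_simplex(tetrahedron, rng)
```

Because the weights are non-negative and sum to one, the synthetic example always lies inside the tetrahedron, unlike plain SMOTE, which places new points on a line segment between two samples.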
|
91
|
Vandewiele G, Dehaene I, Kovács G, Sterckx L, Janssens O, Ongenae F, De Backere F, De Turck F, Roelens K, Decruyenaere J, Van Hoecke S, Demeester T. Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling. Artif Intell Med 2020; 111:101987. [PMID: 33461687 DOI: 10.1016/j.artmed.2020.101987] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/15/2020] [Revised: 09/09/2020] [Accepted: 11/12/2020] [Indexed: 01/10/2023]
Abstract
Information extracted from electrohysterography recordings could potentially prove to be an interesting additional source of information to estimate the risk of preterm birth. Recently, a large number of studies have reported near-perfect results to distinguish between recordings of patients that will deliver term or preterm using a public resource, called the Term/Preterm Electrohysterogram database. However, we argue that these results are overly optimistic due to a methodological flaw. In this work, we focus on one specific type of methodological flaw: applying over-sampling before partitioning the data into mutually exclusive training and testing sets. We show how this causes the results to be biased using two artificial datasets and reproduce results of studies in which this flaw was identified. Moreover, we evaluate the actual impact of over-sampling on predictive performance, when applied prior to data partitioning, using the same methodologies of related studies, to provide a realistic view of these methodologies' generalization capabilities. We make our research reproducible by providing all the code under an open license.
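The flaw this abstract describes is easy to reproduce with sample identifiers alone. The sketch below is not the authors' code; it uses simple random duplication as the over-sampler and hypothetical helper names, and shows that over-sampling before the split puts copies of the same minority sample on both sides, while splitting first keeps the test set untouched.

```python
import random


def oversample_minority(indices, labels, rng):
    """Randomly duplicate minority-class indices until both classes are balanced."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:                      # nothing to duplicate
        return list(indices)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return indices + extra


def split(items, test_fraction, rng):
    """Shuffle a copy of the list and split it into (train, test)."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(1)
labels = [1] * 10 + [0] * 90              # 10 minority and 90 majority samples
all_ids = list(range(len(labels)))

# Flawed order: over-sample first, then split.
leaky_train, leaky_test = split(oversample_minority(all_ids, labels, rng), 0.3, rng)
leaked = set(leaky_train) & set(leaky_test)      # samples present on both sides

# Sound order: split first, then over-sample the training part only.
train, test = split(all_ids, 0.3, rng)
train = oversample_minority(train, labels, rng)
clean_overlap = set(train) & set(test)           # stays empty
```

With synthetic over-samplers such as SMOTE the leaked points are interpolations rather than exact copies, but the effect is similar: the test set contains information derived from training samples, so reported scores are inflated.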
Affiliation(s)
- Gilles Vandewiele: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Isabelle Dehaene: Department of Gynaecology and Obstetrics, Ghent University Hospital, Corneel Heymanslaan 10, Ghent, Belgium
- György Kovács: Analytical Minds Ltd Arpad street 5, Beregsurany, Hungary
- Lucas Sterckx: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Olivier Janssens: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Femke Ongenae: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Femke De Backere: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Filip De Turck: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Kristien Roelens: Department of Gynaecology and Obstetrics, Ghent University Hospital, Corneel Heymanslaan 10, Ghent, Belgium
- Johan Decruyenaere: Department of Intensive Care Medicine, Ghent University Hospital, Corneel Heymanslaan 10, Ghent, Belgium
- Sofie Van Hoecke: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
- Thomas Demeester: IDLab, Ghent University - imec, Technologiepark-Zwijnaarde 126, Ghent, Belgium
|
92
|
Fassina L, Faragli A, Lo Muzio FP, Kelle S, Campana C, Pieske B, Edelmann F, Alogna A. A Random Shuffle Method to Expand a Narrow Dataset and Overcome the Associated Challenges in a Clinical Study: A Heart Failure Cohort Example. Front Cardiovasc Med 2020; 7:599923. [PMID: 33330661 PMCID: PMC7714902 DOI: 10.3389/fcvm.2020.599923] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/28/2020] [Accepted: 10/19/2020] [Indexed: 11/13/2022] Open
Abstract
Heart failure (HF) affects at least 26 million people worldwide, so predicting adverse events in HF patients represents a major target of clinical data science. However, achieving large sample sizes sometimes represents a challenge due to difficulties in patient recruiting and long follow-up times, increasing the problem of missing data. To overcome the issue of a narrow dataset cardinality (in a clinical dataset, the cardinality is the number of patients in that dataset), population-enhancing algorithms are therefore crucial. The aim of this study was to design a random shuffle method to enhance the cardinality of an HF dataset in a statistically legitimate way, without the need for specific hypotheses and regression models. The cardinality enhancement was validated against an established random repeated-measures method with regard to the correctness in predicting clinical conditions and endpoints. In particular, machine learning and regression models were employed to highlight the benefits of the enhanced datasets. The proposed random shuffle method was able to enhance the HF dataset cardinality (711 patients before dataset preprocessing) circa 10 times and circa 21 times when followed by a random repeated-measures approach. We believe that the random shuffle method could be used in the cardiovascular field and in other data science problems when missing data and the narrow dataset cardinality represent an issue.
Affiliation(s)
- Lorenzo Fassina: Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Alessandro Faragli: Department of Internal Medicine and Cardiology, Deutsches Herzzentrum Berlin, Berlin, Germany; Department of Internal Medicine and Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Francesco Paolo Lo Muzio: Department of Surgery, Dentistry, Paediatrics and Gynaecology, University of Verona, Verona, Italy; Department of Medicine and Surgery, University of Parma, Parma, Italy
- Sebastian Kelle: Department of Internal Medicine and Cardiology, Deutsches Herzzentrum Berlin, Berlin, Germany; Department of Internal Medicine and Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Carlo Campana: Department of Cardiology, Sant'Anna Hospital, ASST-Lariana, Como, Italy
- Burkert Pieske: Department of Internal Medicine and Cardiology, Deutsches Herzzentrum Berlin, Berlin, Germany; Department of Internal Medicine and Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Frank Edelmann: Department of Internal Medicine and Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
- Alessio Alogna: Department of Internal Medicine and Cardiology, Charité - Universitätsmedizin Berlin, Berlin, Germany; Berlin Institute of Health (BIH), Berlin, Germany; DZHK (German Centre for Cardiovascular Research), Partner Site Berlin, Berlin, Germany
|
93
|
Veeraragavan S, Gopalai AA, Gouwanda D, Ahmad SA. Parkinson's Disease Diagnosis and Severity Assessment Using Ground Reaction Forces and Neural Networks. Front Physiol 2020; 11:587057. [PMID: 33240106 PMCID: PMC7680965 DOI: 10.3389/fphys.2020.587057] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/20/2020] [Accepted: 10/09/2020] [Indexed: 11/23/2022] Open
Abstract
Gait analysis plays a key role in the diagnosis of Parkinson’s Disease (PD), as patients generally exhibit abnormal gait patterns compared to healthy controls. Current diagnosis and severity assessment procedures entail manual visual examinations of motor tasks, speech, and handwriting, among numerous other tests, which can vary between clinicians based on their expertise and visual observation of gait tasks. Automating gait differentiation procedure can serve as a useful tool in early diagnosis and severity assessment of PD and limits the data collection to solely walking gait. In this research, a holistic, non-intrusive method is proposed to diagnose and assess PD severity in its early and moderate stages by using only Vertical Ground Reaction Force (VGRF). From the VGRF data, gait features are extracted and selected to use as training features for the Artificial Neural Network (ANN) model to diagnose PD using cross validation. If the diagnosis is positive, another ANN model will predict their Hoehn and Yahr (H&Y) score to assess their PD severity using the same VGRF data. PD Diagnosis is achieved with a high accuracy of 97.4% using simple network architecture. Additionally, the results indicate a better performance compared to other complex machine learning models that have been researched previously. Severity Assessment is also performed on the H&Y scale with 87.1% accuracy. The results of this study show that it is plausible to use only VGRF data in diagnosing and assessing early stage Parkinson’s Disease, helping patients manage the symptoms earlier and giving them a better quality of life.
Affiliation(s)
- Srivardhini Veeraragavan
- Advanced Engineering Platform, School of Engineering, Monash University Malaysia, Subang Jaya, Malaysia
| | - Alpha Agape Gopalai
- Advanced Engineering Platform, School of Engineering, Monash University Malaysia, Subang Jaya, Malaysia
| | - Darwin Gouwanda
- Advanced Engineering Platform, School of Engineering, Monash University Malaysia, Subang Jaya, Malaysia
| | - Siti Anom Ahmad
- Malaysian Research Institute on Ageing, Universiti Putra Malaysia, Selangor, Malaysia
| |
|
94
|
Hahn S, Mackey S, Cousijn J, Foxe JJ, Heinz A, Hester R, Hutchinson K, Kiefer F, Korucuoglu O, Lett T, Li CSR, London E, Lorenzetti V, Maartje L, Momenan R, Orr C, Paulus M, Schmaal L, Sinha R, Sjoerds Z, Stein DJ, Stein E, van Holst RJ, Veltman D, Walter H, Wiers RW, Yucel M, Thompson PM, Conrod P, Allgaier N, Garavan H. Predicting alcohol dependence from multi-site brain structural measures. Hum Brain Mapp 2020; 43:555-565. [PMID: 33064342 PMCID: PMC8675424 DOI: 10.1002/hbm.25248] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/12/2020] [Revised: 09/21/2020] [Accepted: 10/06/2020] [Indexed: 12/16/2022]
Abstract
To identify neuroimaging biomarkers of alcohol dependence (AD) from structural magnetic resonance imaging, it may be useful to develop classification models that are explicitly generalizable to unseen sites and populations. This problem was explored in a mega‐analysis of previously published datasets from 2,034 AD and comparison participants spanning 27 sites curated by the ENIGMA Addiction Working Group. Data were grouped into a training set used for internal validation including 1,652 participants (692 AD, 24 sites), and a test set used for external validation with 382 participants (146 AD, 3 sites). An exploratory data analysis was first conducted, followed by evolutionary‐search‐based feature selection to identify site‐generalizable, high‐performing subsets of brain measurements. Exploratory data analysis revealed that inclusion of case‐ and control‐only sites led to the inadvertent learning of site‐effects. Cross‐validation methods that do not properly account for site can drastically overestimate results. Evolutionary‐based feature selection leveraging leave‐one‐site‐out cross‐validation, to combat unintentional learning, identified cortical thickness in the left superior frontal gyrus and right lateral orbitofrontal cortex, cortical surface area in the right transverse temporal gyrus, and left putamen volume as final features. Ridge regression restricted to these features yielded a test‐set area under the receiver operating characteristic curve of 0.768. These findings evaluate strategies for handling multi‐site data with varied underlying class distributions and identify potential biomarkers for individuals with current AD.
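Leave-one-site-out cross-validation, as named in the abstract, holds out every participant from one site per fold so that a model is never scored on a site it was trained on. A minimal sketch of the split logic is shown below; the function name and the toy site labels are assumptions for illustration.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx), one fold per distinct site."""
    for held_out in sorted(set(sites)):
        train = [i for i, s in enumerate(sites) if s != held_out]
        test = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train, test

# Toy example (assumed): six participants scanned at three sites.
sites = ["A", "A", "B", "B", "B", "C"]
folds = list(leave_one_site_out(sites))
```

scikit-learn's `LeaveOneGroupOut` splitter offers the same behaviour when the site label is passed as the `groups` argument; whether the authors used that implementation is not stated in the abstract.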
Affiliation(s)
- Sage Hahn
- Department of Psychiatry, University of Vermont College of Medicine, Burlington, Vermont, USA
| | - Scott Mackey
- Department of Psychiatry, University of Vermont College of Medicine, Burlington, Vermont, USA
| | - Janna Cousijn
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
| | - John J Foxe
- Department of Neuroscience & The Ernest J. Del Monte Institute for Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, USA
| | - Andreas Heinz
- Department of Psychiatry and Psychotherapy, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Robert Hester
- Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Australia
| | - Kent Hutchinson
- Department of Psychology and Neuroscience, University of Colorado, Boulder, Colorado, USA
| | - Falk Kiefer
- Department of Addictive Behaviour and Addiction Medicine, Central Institute of Mental Health, Heidelberg University, Mannheim, Germany
| | - Ozlem Korucuoglu
- Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri, USA
| | - Tristram Lett
- Department of Psychiatry and Psychotherapy, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Chiang-Shan R Li
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
| | - Edythe London
- David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California, USA
| | - Valentina Lorenzetti
- Monash Institute of Cognitive and Clinical Neurosciences & School of Psychological Sciences, Monash University, Melbourne, Australia.,School of Psychology, Faculty of Health Sciences, Australian Catholic University, Melbourne, Australia.,Department of Psychological Sciences, the University of Liverpool, Liverpool, UK
| | - Luijten Maartje
- Behavioural Science Institute, Radboud University, Nijmegen, the Netherlands
| | - Reza Momenan
- Clinical NeuroImaging Research Core, Division of Intramural Clinical and Biological Research, National Institute on Alcohol Abuse and Alcoholism, Bethesda, Maryland, USA
| | - Catherine Orr
- Department of Psychiatry, University of Vermont College of Medicine, Burlington, Vermont, USA
| | - Martin Paulus
- VA San Diego Healthcare System and Department of Psychiatry, University of California San Diego, La Jolla, California, USA.,Laureate Institute for Brain Research, Tulsa, Oklahoma, USA
| | - Lianne Schmaal
- Orygen, The National Centre of Excellence in Youth Mental Health, Parkville, Australia.,Centre for Youth Mental Health, The University of Melbourne, Melbourne, Australia
| | - Rajita Sinha
- Department of Psychiatry, Yale University School of Medicine, New Haven, Connecticut, USA
| | - Zsuzsika Sjoerds
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.,Institute of Psychology, Cognitive Psychology Unit & Leiden Institute for Brain and Cognition, Leiden University, Leiden, the Netherlands
| | - Dan J Stein
- SA MRC Unit on Risk & Resilience in Mental Disorders, Department of Psychiatry & Neuroscience Institute, University of Cape Town, Cape Town, South Africa
| | - Elliot Stein
- Neuroimaging Research Branch, Intramural Research Program, National Institute on Drug Abuse, Baltimore, Maryland, USA
| | - Ruth J van Holst
- Department of Psychiatry, Amsterdam UMC, Location AMC, University of Amsterdam, Amsterdam, the Netherlands
| | - Dick Veltman
- Department of Psychiatry, VU University Medical Center, Amsterdam, the Netherlands
| | - Henrik Walter
- Department of Psychiatry and Psychotherapy, Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Reinout W Wiers
- Department of Psychology, University of Amsterdam, Amsterdam, the Netherlands
| | - Murat Yucel
- David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California, USA.,Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne and Melbourne Health, Melbourne, Australia
| | - Paul M Thompson
- Imaging Genetics Center, Stevens Institute for Neuroimaging & Informatics, Keck School of Medicine, University of Southern California, California, USA
| | - Patricia Conrod
- Department of Psychiatry, Université de Montreal, CHU Ste Justine Hospital, Montreal, Quebec, Canada
| | - Nicholas Allgaier
- Department of Psychiatry, University of Vermont College of Medicine, Burlington, Vermont, USA
| | - Hugh Garavan
- Department of Psychiatry, University of Vermont College of Medicine, Burlington, Vermont, USA
| |
|
95
|
A Hybrid Data Balancing Method for Classification of Imbalanced Training Data within Google Earth Engine: Case Studies from Mountainous Regions. REMOTE SENSING 2020. [DOI: 10.3390/rs12203301] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/16/2022]
Abstract
The distribution of Land Cover (LC) classes in mountainous areas is mostly imbalanced, with a few majority LC classes dominating the minority classes. Although standard Machine Learning (ML) classifiers can achieve high accuracies for majority classes, they largely fail to provide reasonable accuracies for minority classes. This is mainly due to the class imbalance problem. In this study, a hybrid data balancing method, called the Partial Random Over-Sampling and Random Under-Sampling (PROSRUS), was proposed to resolve the class imbalance issue. Unlike most data balancing techniques, which seek to fully balance datasets, PROSRUS uses a partial balancing approach with hundreds of fractions for majority and minority classes to balance datasets. For this, time-series of Landsat-8 and SRTM topographic data along with various spectral indices were used over three mountainous sites within the Google Earth Engine (GEE) cloud platform. It was observed that PROSRUS had better performance than several other balancing methods and increased the accuracy of minority classes without a reduction in overall classification accuracy. Furthermore, adopting complementary information, particularly topographic data, considerably increased the accuracy of minority classes in mountainous areas. Finally, the obtained results from PROSRUS indicated that every imbalanced dataset requires its own specific fraction(s) for addressing the class imbalance problem, because different datasets have different characteristics.
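The abstract does not define how the PROSRUS fractions are applied. As a rough sketch of partial balancing in general, assuming both fractions are expressed relative to the size of the largest class, minority classes could be randomly over-sampled up to a lower bound and majority classes randomly under-sampled down to an upper bound. The function name, the fraction semantics, and the class names are assumptions, not the published algorithm.

```python
import random
from collections import Counter, defaultdict

def partial_balance(labels, over_frac, under_frac, seed=0):
    """Return sample indices after partial over- and under-sampling.

    Classes smaller than over_frac * largest are duplicated at random up to
    that size; classes larger than under_frac * largest are cut down to it.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    largest = max(len(idx) for idx in by_class.values())
    low, high = round(over_frac * largest), round(under_frac * largest)
    if low > high:
        raise ValueError("over_frac must not exceed under_frac")
    chosen = []
    for idx in by_class.values():
        if len(idx) < low:        # minority class: sample with replacement
            idx = idx + rng.choices(idx, k=low - len(idx))
        elif len(idx) > high:     # majority class: sample without replacement
            idx = rng.sample(idx, high)
        chosen.extend(idx)
    return chosen

# Toy label set (assumed class names and counts).
labels = ["forest"] * 100 + ["shrub"] * 30 + ["water"] * 5
balanced = partial_balance(labels, over_frac=0.2, under_frac=0.6)
counts = Counter(labels[i] for i in balanced)
```

Under these assumptions, a search over many `(over_frac, under_frac)` pairs would correspond to the "hundreds of fractions" the abstract mentions, with the best pair chosen per dataset.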
|
96
|
Alshakhs F, Alharthi H, Aslam N, Khan IU, Elasheri M. Predicting Postoperative Length of Stay for Isolated Coronary Artery Bypass Graft Patients Using Machine Learning. Int J Gen Med 2020; 13:751-762. [PMID: 33061545 PMCID: PMC7537993 DOI: 10.2147/ijgm.s250334] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/18/2020] [Accepted: 08/10/2020] [Indexed: 11/23/2022]
Abstract
Purpose: Predictive analytics (PA) is a new trending approach in the field of healthcare that uses machine learning to build a prediction model using supervised learning algorithms. Isolated coronary artery bypass grafting (iCABG), an open-heart surgery, is commonly performed in the treatment of coronary heart disease. Aim: The aim of this study was to develop and evaluate a model to predict postoperative length of stay (PLoS) for iCABG patients using supervised machine learning techniques, and to identify the features with the highest contribution to the model. Methods: This is a retrospective study that uses historic data of adult patients who underwent iCABG. After initial data pre-processing, data imputation using the kNN method was applied. The study used five prediction models based on the Naïve Bayes, Decision Tree, Random Forest, Logistic Regression and k Nearest Neighbor algorithms. Data imbalance was managed using the following widely used methods: oversampling, undersampling, "Both", and random over-sampling examples (ROSE). Feature selection was conducted using the Boruta method. Two techniques were applied to examine the performance of the models: a 70%/30% split and cross-validation. Models were evaluated by comparing their performance using AUC and other metrics. Results: In the final dataset, six distinct features and 621 instances were used to develop the models. A total of 20 models were developed using R statistical software. The model generated using Random Forest with the "Both" resampling method and the cross-validation technique was deemed the best fit (AUC=0.81; F1 score=0.82; and recall=0.82). Attributes found to be highly predictive of PLoS were pulmonary artery systolic, age, height, EuroScore II, intra-aortic balloon pump used, and complications during operation. Conclusion: This study demonstrates the significance and effectiveness of building a model that predicts PLoS for iCABG patients using patient specifications and pre-/intra-operative measures.
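The study reports AUC, F1 score, and recall and was carried out in R. As a language-neutral illustration of how these three metrics relate to predictions, the sketch below computes them in Python, assuming binary labels (1 = positive class) and a 0.5 decision threshold. The toy labels and scores are made up for the example and are unrelated to the study's data.

```python
def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall_f1(y_true, y_pred):
    """Recall and F1 score for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return recall, 2 * precision * recall / (precision + recall)

# Toy values (assumed), thresholded at 0.5 to obtain hard predictions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
model_auc = auc(y_true, scores)
model_recall, model_f1 = recall_f1(y_true, y_pred)
```

AUC is computed from the raw scores and is threshold-free, whereas recall and F1 depend on the chosen cut-off, which is one reason a model comparison typically reports both kinds of metric.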
Affiliation(s)
- Fatima Alshakhs
- Department of Health Information Management & Technology, College of Public Health, Imam Abdulrahman Bin Faisal University, Dammam 34221-4237, Saudi Arabia
| | - Hana Alharthi
- Department of Health Information Management & Technology, College of Public Health, Imam Abdulrahman Bin Faisal University, Dammam 34221-4237, Saudi Arabia
| | - Nida Aslam
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 34221-4237, Saudi Arabia
| | - Irfan Ullah Khan
- Department of Computer Science, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam 34221-4237, Saudi Arabia
| | - Mohamed Elasheri
- Department of Cardiac Surgery, Saud Albabtain Cardiac Centre, Dammam 32245, Saudi Arabia
| |
|
97
|
A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting. MATHEMATICS 2020. [DOI: 10.3390/math8091590] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]
Abstract
Detecting self-care problems is one of the important and challenging issues for occupational therapists, since it requires a complex and time-consuming process. Machine learning algorithms have recently been applied to overcome this issue. In this study, we propose a self-care prediction model called GA-XGBoost, which combines genetic algorithms (GAs) with extreme gradient boosting (XGBoost) for predicting self-care problems of children with disability. Selecting the feature subset affects the model performance; thus, we utilize GA to search for the optimum feature subsets and so improve the model’s performance. To validate the effectiveness of GA-XGBoost, we present six experiments: comparing GA-XGBoost with other machine learning models and previous study results, a statistical significance test, impact analysis of feature selection and comparison with other feature selection methods, and sensitivity analysis of GA parameters. During the experiments, we use accuracy, precision, recall, and f1-score to measure the performance of the prediction models. The results show that GA-XGBoost obtains better performance than other prediction models and the previous study results. In addition, we design and develop a web-based self-care prediction tool to help therapists diagnose the self-care problems of children with disabilities. Therefore, appropriate treatment/therapy could be performed for each child to improve their therapeutic outcome.
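The abstract gives no details of the GA configuration. The following is a generic sketch of GA-based feature-subset search, assuming a boolean mask encoding, tournament selection, single-point crossover, bit-flip mutation, and elitism. In GA-XGBoost the fitness would presumably come from the XGBoost model's validation performance on the selected features; here a toy fitness function stands in for it, and all names and parameter values are hypothetical.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.05, seed=0):
    """Search for a boolean feature mask that maximises `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two random individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]                      # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)    # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]              # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Toy fitness (assumed): reward three "informative" features and penalise
# subset size. A real run would score a classifier trained on the mask.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

best_mask = ga_select(n_features=10, fitness=toy_fitness)
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases; the size penalty in the toy fitness mirrors the usual preference for smaller feature subsets at equal accuracy.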
|
98
|
Santos MS, Abreu PH, Wilk S, Santos J. How distance metrics influence missing data imputation with k-nearest neighbours. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2020.05.032] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
|
99
|
Automated Classification of Significant Prostate Cancer on MRI: A Systematic Review on the Performance of Machine Learning Applications. Cancers (Basel) 2020; 12:cancers12061606. [PMID: 32560558 PMCID: PMC7352160 DOI: 10.3390/cancers12061606] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/16/2020] [Revised: 06/13/2020] [Accepted: 06/14/2020] [Indexed: 11/16/2022]
Abstract
Significant prostate carcinoma (sPCa) classification based on MRI using radiomics or deep learning approaches has gained much interest, due to the potential application in assisting in clinical decision-making. OBJECTIVE: To systematically review the literature (i) to determine which algorithms are most frequently used for sPCa classification, (ii) to investigate whether there exists a relation between the performance and the method or the MRI sequences used, (iii) to assess what study design factors affect the performance on sPCa classification, and (iv) to research whether performance had been evaluated in a clinical setting. METHODS: The databases Embase and Ovid MEDLINE were searched for studies describing machine learning or deep learning classification methods discriminating between significant and nonsignificant PCa on multiparametric MRI that performed a valid validation procedure. Quality was assessed by the modified radiomics quality score. We computed the median area under the receiver operating characteristic curve (AUC) across all methods, together with the interquartile range. RESULTS: From 2846 potentially relevant publications, 27 were included. The most frequent algorithms used in the literature for PCa classification are logistic regression (22%) and convolutional neural networks (CNNs) (22%). The median AUC was 0.79 (interquartile range: 0.77-0.87). No significant effect of number of included patients, image sequences, or reference standard on the reported performance was found. Three studies described an external validation and none of the papers described a validation in a prospective clinical trial. CONCLUSIONS: To unlock the promising potential of machine and deep learning approaches, validation studies and clinical prospective studies should be performed with an established protocol to assess the added value in decision-making.
|
100
|
Drug design by machine-trained elastic networks: predicting Ser/Thr-protein kinase inhibitors' activities. Mol Divers 2020; 25:899-909. [PMID: 32222890 DOI: 10.1007/s11030-020-10074-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/21/2020] [Accepted: 03/11/2020] [Indexed: 12/23/2022]
Abstract
An elastic network model (ENM) represents a molecule as a matrix of pairwise atomic interactions. Rich in coded information, ENMs are hereby proposed as a novel tool for predicting the activity of a series of molecules with widely different chemical structures but a common biological activity. The new approach is developed and tested using a set of 183 inhibitors of the serine/threonine-protein kinase Plk3, an enzyme implicated in the regulation of the cell cycle and tumorigenesis. The elastic network (EN) predictive model is found to exhibit high accuracy and speed compared to descriptor-based machine-trained modeling. EN modeling appears to be a highly promising new tool for the high demands of industrial applications such as drug and material design.
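The abstract does not specify which elastic network formulation was used. One common form, shown here purely as an assumed illustration, is the Kirchhoff (connectivity) matrix of a Gaussian network model: off-diagonal entries are -1 for atom pairs closer than a distance cutoff, and each diagonal entry counts that atom's neighbours. The function name, the coordinates, and the 2.0 cutoff are hypothetical.

```python
import math

def kirchhoff_matrix(coords, cutoff):
    """Connectivity matrix: -1 for atom pairs within `cutoff`, degree on the diagonal."""
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1   # the pair is linked by a spring
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma

# Toy molecule (assumed): three atoms in a chain plus one distant atom.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
gamma = kirchhoff_matrix(coords, cutoff=2.0)
```

A matrix of this kind, or quantities derived from it, could serve as the input representation for a learned activity model in place of hand-crafted molecular descriptors.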
|