1
Autism spectrum disorder detection with kNN imputer and machine learning classifiers via questionnaire mode of screening. Health Inf Sci Syst 2024; 12:18. [PMID: 38464462 PMCID: PMC10917726 DOI: 10.1007/s13755-024-00277-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 01/17/2024] [Indexed: 03/12/2024] Open
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder. ASD cannot be fully cured, but early-stage diagnosis followed by therapies and rehabilitation helps an autistic person to live a quality life. Clinical diagnosis of ASD symptoms via questionnaires and screening tests such as Autism Spectrum Quotient-10 (AQ-10) and Quantitative Check-list for Autism in Toddlers (Q-chat) is expensive, inaccessible, and time-consuming. Machine learning (ML) techniques can help predict ASD easily at the initial stage of diagnosis. The main aim of this work is to classify ASD and typically developed (TD) class data using ML classifiers. We used different ASD data sets covering all age groups (toddlers, children, adolescents, and adults) to classify ASD and TD cases. We implemented one-hot encoding to translate categorical data into numerical data during preprocessing. We then used a kNN imputer to handle missing values and MinMaxScaler feature transformation to normalize the data. ASD and TD class data are classified using support vector machine, k-nearest neighbor (KNN), random forest (RF), and artificial neural network classifiers. RF gives the best performance, with an accuracy of 100% across different training and testing data splits for all four types of data sets, and shows no over-fitting issue. We have also compared our results with already published work, including recent methods such as Deep Neural Network (DNN) and Convolutional Neural Network (CNN). Even compared with complex architectures such as DNN and CNN, our proposed methods provide the best results with low-complexity models, whereas existing methods have shown accuracy up to 98% with log-loss up to 15%. Our proposed methodology demonstrates improved generalization for real-time ASD detection during clinical trials.
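The preprocessing described in this abstract (scaling features and filling missing values from nearest neighbors) can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function names, the choice to scale before imputing, the use of `None` for missing values, and the distance over shared features are all assumptions made for illustration.

```python
import math

def min_max_scale(rows):
    """Scale each column to [0, 1], ignoring missing values (None)."""
    n_cols = len(rows[0])
    scaled = [list(r) for r in rows]
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        lo, hi = min(observed), max(observed)
        span = (hi - lo) or 1.0  # constant columns would otherwise divide by zero
        for r in scaled:
            if r[j] is not None:
                r[j] = (r[j] - lo) / span
    return scaled

def knn_impute(rows, k=2):
    """Fill each missing value with the column mean over the k nearest rows.

    Assumes at least one other row has the column observed.
    """
    def distance(a, b):
        # Root mean squared difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled
```

A call such as `knn_impute(min_max_scale(data), k=2)` returns a complete table that a classifier can consume.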
2
Automatic classification of seizure and seizure-free EEG signals based on phase space reconstruction features. J Biol Phys 2024; 50:181-196. [PMID: 38466526 PMCID: PMC11106053 DOI: 10.1007/s10867-024-09654-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Accepted: 02/16/2024] [Indexed: 03/13/2024] Open
Abstract
Epilepsy is a type of brain disorder triggered by an abrupt electrical imbalance of neuronal networks. An electroencephalogram (EEG) is a diagnostic tool to capture the underlying brain mechanisms and detect seizure onset in epileptic patients. To detect seizures, neurologists need to manually monitor EEG recordings for long periods, which is challenging and susceptible to errors depending on expertise and experience. Therefore, automatic identification of seizure and seizure-free EEG signals becomes essential. This study introduces a method based on the features extracted from the phase space reconstruction for classifying seizure and seizure-free EEG signals. The computed features are derived from the elliptical area and interquartile range of the Euclidean distance by varying percentage values of data points ranging from 50 to 100%. We consider two public datasets and evaluate these features in each EEG epoch that includes the healthy, interictal, preictal, and ictal stages of epileptic subjects, utilizing the K-nearest neighbor classifier for classification. Results show that the features have higher values during the seizure than the seizure-free EEG signals and healthy subjects. Furthermore, the proposed features can effectively discriminate seizure EEG signals from the seizure-free and normal subjects with 100% accuracy, sensitivity, and specificity in both datasets. Likewise, the classification between the preictal stage and seizure EEG signals attains 98% accuracy. Overall, the reconstructed phase space features significantly enhance the accuracy of detecting epileptic EEG signals compared with existing methods. This advancement holds great potential in assisting neurologists in swiftly and accurately diagnosing epileptic seizures from EEG signals.
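One of the two feature families named in this abstract, the interquartile range of Euclidean distances in a reconstructed phase space, can be sketched as follows. This is an illustrative reading only: the two-dimensional embedding, the delay of one sample, and measuring distance from the origin are assumptions, and the elliptical-area feature is not shown.

```python
import math
import statistics

def phase_space(signal, delay=1):
    """Two-dimensional time-delay embedding: pairs (x[t], x[t + delay])."""
    return [(signal[t], signal[t + delay]) for t in range(len(signal) - delay)]

def distance_iqr(points):
    """Interquartile range of each point's Euclidean distance from the origin."""
    distances = [math.hypot(x, y) for x, y in points]
    q1, _, q3 = statistics.quantiles(distances, n=4)
    return q3 - q1
```

Under these assumptions, a signal with larger amplitude swings yields a larger feature value, which matches the abstract's observation that the features are higher during seizures.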
3
Vehicle Position Detection Based on Machine Learning Algorithms in Dynamic Wireless Charging. Sensors (Basel, Switzerland) 2024; 24:2346. [PMID: 38610560 PMCID: PMC11013965 DOI: 10.3390/s24072346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 03/19/2024] [Accepted: 04/05/2024] [Indexed: 04/14/2024]
Abstract
Dynamic wireless charging (DWC) has emerged as a viable approach to mitigate range anxiety by ensuring continuous and uninterrupted charging for electric vehicles in motion. DWC systems rely on the length of the transmitter, which can be categorized into long-track transmitters and segmented coil arrays. The segmented coil array, favored for its heightened efficiency and reduced electromagnetic interference, stands out as the preferred option. However, in such DWC systems, the need arises to detect the vehicle's position, specifically to activate the transmitter coils aligned with the receiver pad and de-energize uncoupled transmitter coils. This paper introduces various machine learning algorithms for precise vehicle position determination, accommodating diverse ground clearances of electric vehicles and various speeds. Through testing eight different machine learning algorithms and comparing the results, the random forest algorithm emerged as superior, displaying the lowest error in predicting the actual position.
4
High-Dimensional Feature Selection for Automatic Classification of Coronary Stenosis Using an Evolutionary Algorithm. Diagnostics (Basel) 2024; 14:268. [PMID: 38337787 PMCID: PMC10855604 DOI: 10.3390/diagnostics14030268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 01/11/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] Open
Abstract
In this paper, a novel strategy to perform high-dimensional feature selection using an evolutionary algorithm for the automatic classification of coronary stenosis is introduced. The method involves a feature extraction stage to form a bank of 473 features considering different types such as intensity, texture and shape. The feature selection task is carried out on a high-dimensional feature bank, where the search space is denoted by O(2^n) and n = 473. The proposed evolutionary search strategy was compared in terms of the Jaccard coefficient and classification accuracy with different state-of-the-art methods. The highest feature selection rate, along with the best classification performance, was obtained with a subset of four features, representing a 99% discrimination rate. In the last stage, the feature subset was used as input to train a support vector machine using an independent testing set. The classification of coronary stenosis cases is a binary classification problem with positive and negative classes. The highest classification performance was obtained with the four-feature subset in terms of accuracy (0.86) and Jaccard coefficient (0.75) metrics. In addition, a second dataset containing 2788 instances was formed from a public image database, obtaining an accuracy of 0.89 and a Jaccard coefficient of 0.80. Finally, based on the performance achieved, the four-feature subset can be suitable for use in a clinical decision support system.
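To make the two reported metrics and the size of the search space concrete, here is a small sketch. It assumes the Jaccard coefficient is computed in its usual binary-classification form, TP / (TP + FP + FN); the abstract does not state the exact definition used, and the function names are illustrative.

```python
def jaccard_coefficient(y_true, y_pred, positive=1):
    """Jaccard coefficient for the positive class: TP / (TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fp + fn)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Number of candidate feature subsets for a bank of n = 473 features.
n_features = 473
n_subsets = 2 ** n_features
```

Because `n_subsets` is far too large to enumerate, an exhaustive search is infeasible, which is why a heuristic such as an evolutionary algorithm is used.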
5
PCA-WRKNN-assisted label-free SERS serum analysis platform enabling non-invasive diagnosis of Alzheimer's disease. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2023; 302:123088. [PMID: 37392535 DOI: 10.1016/j.saa.2023.123088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 06/26/2023] [Accepted: 06/27/2023] [Indexed: 07/03/2023]
Abstract
Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative brain disorder with significant economic and societal impacts, yet early AD diagnosis remains a considerable challenge. Here, a robust and convenient surface-enhanced Raman scattering (SERS) analysis platform was fabricated on a microarray chip to dissect the variation in serum composition for AD diagnosis, eliminating the invasive cerebrospinal fluid (CSF)-based and costly instrument-dependent diagnostic methods. An AuNOs array prepared by self-assembly at a liquid-liquid interface enabled the acquisition of SERS spectra with excellent reproducibility. Moreover, a finite-difference time-domain (FDTD) simulation suggested significant plasmon hybridization generated by AuNOs aggregation, resulting in SERS spectra with a high signal-to-noise ratio. We established an AD mouse model with Aβ1-40 induction and recorded the serum SERS spectra at different stages. A multivariate analysis method combining principal component analysis (PCA) with weighted representation-based k-nearest neighbor (WRKNN) was applied for feature extraction to improve the classification performance, with an accuracy of over 95%, an AUC of over 90%, a sensitivity of over 80%, and a specificity of over 96.7%. The results of this study demonstrate the potential of SERS as a diagnostic screening method, following further validation and optimization, which may open up exciting new opportunities for future biomedical applications.
6
Application of Near-Infrared Spectroscopy and Fuzzy Improved Null Linear Discriminant Analysis for Rapid Discrimination of Milk Brands. Foods 2023; 12:3929. [PMID: 37959047 PMCID: PMC10649686 DOI: 10.3390/foods12213929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
The quality of milk is tightly linked to its brand. A famous brand of milk always has good quality. Therefore, this study seeks to design a new fuzzy feature extraction method, called fuzzy improved null linear discriminant analysis (FiNLDA), to cluster the spectra of collected milk for identifying milk brands. To elevate the classification accuracy, FiNLDA was applied to process the near-infrared (NIR) spectra of milk acquired by the portable near-infrared spectrometer. The principal component analysis and Savitzky-Golay (SG) filtering algorithm were employed to lower dimensionality and eliminate noise in this system, respectively. Thereafter, improved null linear discriminant analysis (iNLDA) and FiNLDA were applied to attain the discriminant information of the NIR spectra. At last, the K-nearest neighbor classifier was utilized for assessing the performance of the identification system. The results indicated that the maximum classification accuracies of LDA, iNLDA and FiNLDA were 74.7%, 88% and 94.67%, respectively. Accordingly, the portable NIR spectrometer in combination with FiNLDA can classify milk brands correctly and effectively.
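The final classification step mentioned here, a K-nearest neighbor classifier applied to the extracted discriminant features, follows a simple rule that can be sketched in a few lines. The feature vectors and labels in the usage note are hypothetical, and the function name and the choice of Euclidean distance are assumptions.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with two hypothetical brands whose feature vectors cluster around (0, 0) and (5, 5), a query near (0.5, 0.5) is assigned to the first brand. The better the feature extraction separates the clusters, the more reliable this vote becomes.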
7
Heterogeneous road traffic noise modeling at mid-block sections of mid-sized city in India. Environmental Monitoring and Assessment 2023; 195:1349. [PMID: 37861796 DOI: 10.1007/s10661-023-11924-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 09/30/2023] [Indexed: 10/21/2023]
Abstract
This study attempted to develop computer-based software for monitoring traffic noise under heterogeneous traffic conditions during the morning peak (MP), off peak (OP), and evening peak (EP) periods at mid-block sections of a mid-sized city in India. A traffic noise dataset of 776 observations (LAeq, 1hr) was collected from 23 locations in Gorakhpur, a mid-sized city in the state of Uttar Pradesh, India. The K-nearest neighbor (K-NN) algorithm was adopted for traffic noise prediction modeling. Moreover, principal component analysis (PCA) was used for dimensionality reduction and to overcome the problem of multi-collinearity. The developed model exhibits R² values of 0.81, 0.78, and 0.77 in the MP, OP, and EP, respectively, for Leq, and values of 0.86, 0.80, and 0.84 for L10. The proposed model can predict more than 94% of observations within an accuracy of ±3%. Ultimately, a user-friendly noise level calculator named "Traffic Noise Prediction Calculator for Heterogeneous Traffic (TNPC-H)" was developed for the benefit of field engineers and policy planners.
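The two evaluation figures quoted in this abstract can be computed as in the sketch below. It assumes that "within an accuracy of ±3%" means a relative error of at most 3% of the observed value, which the abstract does not spell out; the function names and the sample noise levels in the usage note are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def share_within(observed, predicted, tolerance=0.03):
    """Fraction of predictions whose relative error is at most `tolerance`."""
    hits = sum(abs(p - o) <= tolerance * abs(o) for o, p in zip(observed, predicted))
    return hits / len(observed)
```

With hypothetical observed levels `[60, 70, 80, 90]` dB and predictions `[61, 69, 80, 95]` dB, three of the four predictions fall inside the ±3% band.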
8
Machine learning algorithms accurately identify free-living marine nematode species. PeerJ 2023; 11:e16216. [PMID: 37842061 PMCID: PMC10569207 DOI: 10.7717/peerj.16216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 09/11/2023] [Indexed: 10/17/2023] Open
Abstract
Background Identifying species, particularly small metazoans, remains a daunting challenge and the phylum Nematoda is no exception. Typically, nematode species are differentiated based on morphometry and the presence or absence of certain characters. However, recent advances in artificial intelligence, particularly machine learning (ML) algorithms, offer promising solutions for automating species identification, mostly in taxonomically complex groups. By training ML models with extensive datasets of accurately identified specimens, the models can learn to recognize patterns in nematodes' morphological and morphometric features. This enables them to make precise identifications of newly encountered individuals. Implementing ML algorithms can improve the speed and accuracy of species identification and allow researchers to efficiently process vast amounts of data. Furthermore, it empowers non-taxonomists to make reliable identifications. The objective of this study is to evaluate the performance of ML algorithms in identifying species of free-living marine nematodes, focusing on two well-known genera: Acantholaimus Allgén, 1933 and Sabatieria Rouville, 1903. Methods A total of 40 species of Acantholaimus and 60 species of Sabatieria were considered. The measurements and identifications were obtained from the original publications of species for both genera; this compilation included information regarding the presence or absence of specific characters, as well as morphometric data. To assess the performance of the species identification, four ML algorithms were employed: Random Forest (RF), Stochastic Gradient Boosting (SGBoost), Support Vector Machine (SVM) with both linear and radial kernels, and K-nearest neighbor (KNN).
Results For both genera, the random forest (RF) algorithm demonstrated the highest accuracy in correctly classifying specimens into their respective species, achieving an accuracy rate of 93% for Acantholaimus and 100% for Sabatieria; only a single individual of Acantholaimus in the test data was misclassified. Conclusion These results highlight the overall effectiveness of ML algorithms in species identification. Moreover, they demonstrate that the identification of marine nematodes can be automated, optimizing biodiversity and ecological studies and making species identification more accessible, efficient, and scalable. Ultimately, this will contribute to our understanding and conservation of biodiversity.
9
Detection of Android Malware in the Internet of Things through the K-Nearest Neighbor Algorithm. Sensors (Basel, Switzerland) 2023; 23:7256. [PMID: 37631793 PMCID: PMC10460029 DOI: 10.3390/s23167256] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/20/2023] [Revised: 08/14/2023] [Accepted: 08/16/2023] [Indexed: 08/27/2023]
Abstract
Predicting attacks on Android devices affected by malware using machine learning for recommender systems-based IoT can be a challenging task. However, it is possible to use various machine-learning techniques to achieve this goal. An internet-based framework is used to predict and recommend Android malware on IoT devices. As the prevalence of Android devices grows, new malware is created on a regular basis, posing a threat to the central system's security and the privacy of the users. The suggested system uses static analysis to predict the malware in Android apps used by consumer devices. The trained system is used to predict and recommend malicious devices so that they can be blocked from transmitting data to the cloud server. By taking into account various machine-learning methods, feature selection is performed and the K-Nearest Neighbor (KNN) machine-learning model is proposed. Testing was carried out on more than 10,000 Android applications to check malicious nodes and recommend that the cloud server block them. The developed model considered four machine-learning algorithms in parallel, i.e., naive Bayes, decision tree, support vector machine, and the K-Nearest Neighbor approach, with static analysis as a feature subset selection algorithm, and it achieved the highest prediction rate of 93% in predicting malware in real-world applications of consumer devices while minimizing the utilization of energy. The experimental results show that KNN achieves 93%, 95%, 90%, and 92% accuracy, precision, recall and F1 measures, respectively.
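The four reported measures are linked by standard formulas, and the F1 value can be checked directly from the stated precision and recall. The sketch below shows the definitions; the function names are illustrative, and the confusion-matrix counts in the usage note are hypothetical, not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }
```

Plugging in the reported precision of 0.95 and recall of 0.90 gives an F1 of about 0.924, consistent with the 92% stated in the abstract. A hypothetical call such as `metrics_from_counts(tp=90, fp=5, fn=10, tn=95)` shows how the same quantities arise from raw counts.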
10
Adaptive sentiment analysis using multioutput classification: a performance comparison. PeerJ Comput Sci 2023; 9:e1378. [PMID: 37346589 PMCID: PMC10280487 DOI: 10.7717/peerj-cs.1378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Accepted: 04/13/2023] [Indexed: 06/23/2023]
Abstract
The primary objective of this research is to create a multi-output classification model for sentiment analysis through the combination of 10 algorithms: BernoulliNB, Decision Tree, K-nearest neighbor, Logistic Regression, LinearSVC, Bagging, Stacking, Random Forest, AdaBoost, and ExtraTrees. In doing so, we aim to identify the optimal algorithm performance and role within the model. The data utilized in this study is derived from customer reviews of cryptocurrencies in Indonesia. Our results indicate that LinearSVC and Stacking exhibit a high accuracy (90%) compared to the other eight algorithms. The resulting multi-output model demonstrates an average accuracy of 88%, which can be considered satisfactory. This research endeavors to innovate in adaptive sentiment analysis classification by developing a multi-output model that utilizes a combination of 10 classification algorithms.
11
Machine learning boosts three-dimensional bioprinting. Int J Bioprint 2023; 9:739. [PMID: 37323488 PMCID: PMC10261168 DOI: 10.18063/ijb.739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 03/06/2023] [Indexed: 06/17/2023] Open
Abstract
Three-dimensional (3D) bioprinting is a computer-controlled technology that combines biological factors and bioinks to print an accurate 3D structure in a layer-by-layer fashion. 3D bioprinting is a new tissue engineering technology based on rapid prototyping and additive manufacturing technology, combined with various disciplines. In addition to the problems in the in vitro culture process, the bioprinting procedure is also afflicted with a few issues: (1) difficulty in finding the appropriate bioink to match the printing parameters to reduce cell damage and mortality; and (2) difficulty in improving the printing accuracy in the printing process. Data-driven machine learning algorithms with powerful predictive capabilities have natural advantages in behavior prediction and new model exploration. Combining machine learning algorithms with 3D bioprinting helps to find more efficient bioinks, determine printing parameters, and detect defects in the printing process. This paper introduces several machine learning algorithms in detail, summarizes the role of machine learning in additive manufacturing applications, and reviews the research progress of the combination of 3D bioprinting and machine learning in recent years, especially the improvement of bioink generation, the optimization of printing parameters, and the detection of printing defects.
12
Machine learning classification reveals robust morphometric biomarker of glial and neuronal arbors. J Neurosci Res 2023; 101:112-129. [PMID: 36196621 PMCID: PMC9828050 DOI: 10.1002/jnr.25131] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/18/2022] [Revised: 09/06/2022] [Accepted: 09/20/2022] [Indexed: 01/12/2023]
Abstract
Neurons and glia are the two main cell classes in the nervous systems of most animals. Although functionally distinct, neurons and glia are both characterized by multiple branching arbors stemming from the cell bodies. Glial processes are generally known to form smaller trees than neuronal dendrites. However, the full extent of morphological differences between neurons and glia in multiple species and brain regions has not yet been characterized, nor is it known whether these cells can be reliably distinguished based on geometric features alone. Here, we show that multiple supervised learning algorithms deployed on a large database of morphological reconstructions can systematically classify neuronal and glial arbors with nearly perfect accuracy and precision. Moreover, we report multiple morphometric properties, both size related and size independent, that differ substantially between these cell types. In particular, we newly identify an individual morphometric measurement, Average Branch Euclidean Length, that can robustly separate neurons from glia across multiple animal models, a broad diversity of experimental conditions, and anatomical areas, with the notable exception of the cerebellum. We discuss the practical utility and physiological interpretation of this discovery.
13
A hybrid CNN-KNN approach for identification of COVID-19 with 5-fold cross validation. Sensors International 2023; 4:100229. [PMID: 36742993 PMCID: PMC9886434 DOI: 10.1016/j.sintl.2023.100229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 01/24/2023] [Accepted: 01/28/2023] [Indexed: 02/01/2023] Open
Abstract
The novel coronavirus is the new member of the SARS family, which can cause mild to severe infection in the lungs and other vital organs like the heart, kidney and liver. For detecting COVID-19 from images, a traditional ANN can be employed. This method begins by extracting the features and then feeding the features into a suitable classifier. The classification rate is not very high, as feature extraction depends on the experimenters' expertise. To address this drawback, a hybrid CNN-KNN-based model with 5-fold cross-validation is proposed to classify CT scans of patients as COVID-19 or non-COVID-19. First, pre-processing steps such as contrast enhancement, median filtering, data augmentation, and image resizing are performed. Second, the entire dataset is divided into five equal sections or folds for training and testing. The 5-fold cross-validation is used to assess generalization and to guard against overfitting of the network. The proposed CNN model consists of four convolutional layers, four max-pooling layers, and two fully connected layers combined with 23 layers. The CNN architecture is used as a feature extractor in this case. The features are taken from the CNN model's fourth convolutional layer and, finally, the features are classified using K Nearest Neighbor rather than softmax for better accuracy. The proposed method is evaluated on an augmented dataset of 4085 CT scan images. The average accuracy, precision, recall and F1 score of the proposed method after performing 5-fold cross-validation are 98.26%, 99.42%, 97.2% and 98.19%, respectively. The proposed method's accuracy is comparable with the existing works described further, where state-of-the-art and custom CNN models were used. Hence, the proposed method can diagnose COVID-19 patients with higher efficiency.
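The 5-fold partitioning described in this abstract can be sketched as an index split. This is a generic illustration rather than the authors' procedure: the function name is hypothetical, the folds are contiguous, and in practice the indices would normally be shuffled first.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, test_indices) for each fold, using contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    # The first `extra` folds get one additional sample when the split is uneven.
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

For the 4085 images mentioned above, each of the five folds holds 817 images for testing and the remaining 3268 for training. Each image is used for testing exactly once, and the reported metrics are averages over the five rounds.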
14
Air Quality Index prediction using an effective hybrid deep learning model. Environmental Pollution (Barking, Essex : 1987) 2022; 315:120404. [PMID: 36240962 DOI: 10.1016/j.envpol.2022.120404] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 09/27/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]
Abstract
Environmentalism has become an intrinsic part of everyday life. One of the greatest challenges to the environment's long-term existence is air pollution. Delhi, the capital of India, has experienced declining air quality for several years. The poor air quality has a significant impact on the lives of individuals. Air Quality Index (AQI) prediction can help people take precautions for their health before moving to a polluted area. In this study, a variety of data forecasting approaches are evaluated for predicting the AQI value for particulate matter (PM2.5) in a particular area of Delhi, and several error metrics such as R-squared (R²), mean absolute error (MAE), and root mean square error (RMSE) are reported. In the proposed approach, two deep learning models, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), are combined to predict the AQI. Several stand-alone machine learning (ML) and deep learning (DL) models, namely LSTM, linear regression (LR), GRU, K-nearest neighbor (KNN) and support vector machine (SVM), are also trained on the same dataset to compare their performance with that of the proposed hybrid (LSTM-GRU) model. The proposed hybrid model performs best, with an MAE of 36.11 and an R² of 0.84.
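Two of the error metrics named in this abstract, MAE and RMSE, have short standard definitions, sketched here with illustrative function names. The AQI values in the usage note are hypothetical and serve only to show how the metrics behave.

```python
import math

def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def root_mean_square_error(observed, predicted):
    """Square root of the average squared difference."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))
```

With hypothetical observed AQI values `[100, 150, 200]` and predictions `[110, 140, 230]`, the MAE is about 16.7 and the RMSE about 19.1. RMSE is never smaller than MAE because squaring gives large errors more weight.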
15
Improving automatic GO annotation with semantic similarity. BMC Bioinformatics 2022; 23:433. [PMID: 36510133 PMCID: PMC9743508 DOI: 10.1186/s12859-022-04958-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges in manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins. Thus, it is a time-consuming and expensive process. Therefore, designing computational tools to perform automatic annotation leveraging the high quality manual annotations that already exist in UniProtKB/SwissProt is an important research problem. RESULTS In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) method for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged in a hierarchical manner according to semantic parent-child relations. Therefore, we propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We produce experimental results comparing the GrAPFI-GO method with and without considering common neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
CONCLUSION Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO exposes an original efficient and reusable procedure, to exploit the semantic relations among the GO terms in order to improve the automatic annotation of protein functions.
16
Short-Term Demand Forecasting of Urban Online Car-Hailing Based on the K-Nearest Neighbor Model. Sensors (Basel, Switzerland) 2022; 22:9456. [PMID: 36502158 PMCID: PMC9736254 DOI: 10.3390/s22239456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 11/27/2022] [Accepted: 11/30/2022] [Indexed: 06/17/2023]
Abstract
Accurately forecasting the demand for urban online car-hailing is of great significance for improving operation efficiency and reducing traffic congestion and energy consumption. This paper takes 265 days of order data from the Hefei urban online car-hailing platform from 2019 to 2021 as an example, and divides each day into 48 time units (30 min per unit) to form a data set. Taking the minimum average absolute error as the optimization objective, the historical data sets are classified, and the values of the state vector T and the parameter K of the K-nearest neighbor model are optimized, which solves the problem of prediction error caused by fixed values of T or K in the traditional model. The results show that the forecasting accuracy of the K-nearest neighbor model can reach 93.62%, which is much higher than that of the exponential smoothing model (81.65%) and the KNN1 model (84.02%) and is similar to that of the LSTM model (91.04%), meaning that it can adapt to the urban online car-hailing system and has potential application value.
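The idea of forecasting with a K-nearest neighbor model over a state vector of length T, and of choosing T and K by minimizing the mean absolute error, can be sketched as follows. This is a simplified illustration under assumptions: all function names, the Euclidean distance between windows, the plain averaging of neighbor outcomes, and the hold-out evaluation are choices made here, not details taken from the paper.

```python
import math

def knn_forecast(history, T=3, K=2):
    """Predict the next value as the mean of what followed the K most similar past windows."""
    state = history[-T:]  # the most recent T observations
    candidates = []
    for start in range(len(history) - T):
        window = history[start:start + T]
        following = history[start + T]
        candidates.append((math.dist(window, state), following))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:K]
    return sum(v for _, v in nearest) / len(nearest)

def mean_abs_error(history, T, K, n_test=4):
    """Average absolute one-step error over the last n_test points."""
    errors = []
    for cut in range(len(history) - n_test, len(history)):
        errors.append(abs(knn_forecast(history[:cut], T, K) - history[cut]))
    return sum(errors) / len(errors)

def select_T_K(history, T_values, K_values):
    """Pick the (T, K) pair with the smallest mean absolute error."""
    pairs = [(T, K) for T in T_values for K in K_values]
    return min(pairs, key=lambda tk: mean_abs_error(history, *tk))
```

On a perfectly periodic toy series such as `[1, 2, 3, 4] * 3`, a window of `[2, 3, 4]` was always followed by 1, so the forecast is 1. The selection step then prefers settings that reproduce the held-out points exactly.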
17
Automatic Modulation Recognition Based on the Optimized Linear Combination of Higher-Order Cumulants. Sensors (Basel, Switzerland) 2022; 22:s22197488. [PMID: 36236583 PMCID: PMC9571176 DOI: 10.3390/s22197488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Revised: 08/25/2022] [Accepted: 08/25/2022] [Indexed: 05/14/2023]
Abstract
Automatic modulation recognition (AMR) is used in various domains, from general-purpose communication to many military applications, thanks to the growing popularity of the Internet of Things (IoT) and related communication technologies. In this research article, we propose an innovative idea of combining the classical mathematical technique of computing linear combinations (LCs) of cumulants with a genetic algorithm (GA) to create super-cumulants. These super-cumulants are further used to classify five digital modulation schemes on fading channels using the K-nearest neighbor (KNN) classifier. Our proposed classifier significantly improves the percentage recognition accuracy at lower SNRs when using smaller sample sizes. A comparison with existing techniques demonstrates the superiority of our proposed classifier.
|
18
|
Removal of bacterial indicators in on-site two-stage multi-soil-layering plant under arid climate (Morocco): prediction of total coliform content using K-nearest neighbor algorithm. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:75716-75729. [PMID: 35661304 DOI: 10.1007/s11356-022-21194-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/16/2021] [Accepted: 05/26/2022] [Indexed: 06/15/2023]
Abstract
This study aims to evaluate and monitor the efficacy of a full-scale two-stage multi-soil-layering (TS-MSL) plant in removing fecal contamination from domestic wastewater. The TS-MSL plant under investigation consisted of two units in series, one with a vertical flow regime (VF-MSL) and the other with a horizontal flow regime (HF-MSL). Furthermore, this study attempts to see whether a linear model (LM) and a K-nearest neighbor (KNN) model can be used to predict total coliform (TC) removal in the TS-MSL system. For 24 months, the TS-MSL system was monitored, with bimonthly measurements recorded at the inlet and outlet of each compartment. The results show removal of 85% of COD, 67% of TP, 27% of TN, and 3 log units of coliforms with good system stability. Thus, the effluent meets the Moroccan water quality code for reuse in the irrigation of green spaces. In addition, as compared to LM, the KNN model (R² = 0.988) may be considered an effective method for predicting TC removal in the TS-MSL system. Finally, sensitivity analysis has shown that TC and dissolved oxygen level in the influent were the most influential parameters for predicting TC removal in the TS-MSL system.
|
19
|
Modeling method and miniaturized wavelength strategy for near-infrared spectroscopic discriminant analysis of soy sauce brand identification. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2022; 277:121291. [PMID: 35490665 DOI: 10.1016/j.saa.2022.121291] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/22/2021] [Revised: 04/13/2022] [Accepted: 04/18/2022] [Indexed: 06/14/2023]
Abstract
The identification of soy sauce brands can help prevent adulteration and fraud, which is meaningful for food safety screening. Using visible and near-infrared (Vis-NIR) spectroscopy combined with k-nearest neighbor (kNN), four-category discriminant models of soy sauce brands were established. Soy sauce of three brands (identification) and of ten other brands (interference) was collected, giving a total of four categories of samples. Spectral datasets for two measurement modes (1 mm, 10 mm) were obtained. Based on moving-window (MW) waveband screening and wavelength step-by-step phase-out (WSP), the MW-WSP-kNN algorithm was proposed and applied to wavelength optimization for the four-category discriminant analysis. Using a calibration-prediction-validation experiment design, various high-accuracy models with a small number of wavelengths located in the NIR region were determined. In the independent validation, for the 1 mm measurement mode, the selected thirty-five dual-wavelength models and one three-wavelength model were located in the NIR combination and overtone frequency regions, respectively, and all achieved a 100% total recognition accuracy rate (RARTotal); for the 10 mm measurement mode, the selected seven three-wavelength models located in the NIR overtone frequency region all reached more than 96.8% RARTotal, and the optimal RARTotal was 97.8%. The results showed the feasibility of applying NIR spectroscopy with a small number of wavelengths to multi-category discrimination of soy sauce brands, with the advantages of being rapid, simple, and miniaturized. The proposed models with a small number of wavelengths provide a valuable reference for the design of small dedicated spectrometers with different measurement modes. The integrated optimization method and wavelength selection strategy here are also expected to be applicable to other fields.
|
20
|
Modeling the Prognostic Impact of Circulating Tumor Cells Enumeration in Metastatic Breast Cancer for Clinical Trial Design Simulation. Oncologist 2022; 27:e561-e570. [PMID: 35278078 PMCID: PMC9255982 DOI: 10.1093/oncolo/oyac045] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/20/2021] [Accepted: 12/31/2021] [Indexed: 11/15/2022] Open
Abstract
Despite the strong prognostic stratification of circulating tumor cells (CTCs) enumeration in metastatic breast cancer (MBC), current clinical trials usually do not include a baseline CTCs in their design. This study aimed to generate a classifier for CTCs prognostic simulation in existing datasets for hypothesis generation in patients with MBC. A K-nearest neighbor machine learning algorithm was trained on a pooled dataset comprising 2436 individual MBC patients from the European Pooled Analysis Consortium and the MD Anderson Cancer Center to identify patients likely to have CTCs ≥ 5/7 mL blood (Stage IV-aggressive vs Stage IV-indolent). The model had a 65.1% accuracy and its prognostic impact resulted in a hazard ratio (HR) of 1.89 (Simulated-aggressive vs Simulated-indolent; P < .001), similar to patients with actual CTCs enumeration (HR 2.76; P < .001). The classifier's performance was then tested on an independent retrospective database comprising 446 consecutive hormone receptor (HR)-positive HER2-negative MBC patients. The model further stratified clinical subgroups usually considered prognostically homogeneous such as patients with bone-only or liver metastases. Bone-only disease classified as Simulated-aggressive had a significantly worse overall survival (OS; P < .0001), while patients with liver metastases classified as Simulated-indolent had a significantly better prognosis (P < .0001). Consistent results were observed for patients who had undergone CTCs enumeration in the pooled population. The differential prognostic impact of endocrine- (ET) and chemotherapy (CT) was explored across the simulated subgroups. No significant differences were observed between ET and CT in the overall population, both in terms of progression-free survival (PFS) and OS. In contrast, a statistically significant difference, favoring CT over ET, was observed among Simulated-aggressive patients (HR: 0.62; P = .030 and HR: 0.60; P = .037, respectively, for PFS and OS).
|
21
|
Deep Metric Learning-Based Strawberry Disease Detection With Unknowns. FRONTIERS IN PLANT SCIENCE 2022; 13:891785. [PMID: 35860535 PMCID: PMC9289608 DOI: 10.3389/fpls.2022.891785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 06/06/2022] [Indexed: 06/15/2023]
Abstract
There has been substantial research that has achieved significant advancements in plant disease detection based on deep object detection models. However, with unknown diseases, it is difficult to find a practical solution for plant disease detection. This study proposes a simple but effective strawberry disease detection scheme with unknown diseases that can provide applicable performance in the real field. In the proposed scheme, the known strawberry diseases are detected with deep metric learning (DML)-based classifiers along with the unknown diseases that have certain symptoms. The pipeline of our proposed scheme consists of two stages: the first is object detection with known disease classes, while the second is a DML-based post-filtering stage. The second stage has two different types of classifiers: one is a softmax classifier used only for known diseases, and the other is a K-nearest neighbor (K-NN) classifier for both known and unknown diseases. In the training of the first stage and the DML-based softmax classifier, we only use the known samples of the strawberry disease. Then, we include the known (a priori) and the known unknown training samples to construct the K-NN classifier. The final decisions regarding known diseases are made from the combined results of the two classifiers, while unknowns are detected from the K-NN classifier. The experimental results show that the DML-based post-filter is effective at improving the performance of known disease detection in terms of mAP. Furthermore, the separate DML-based K-NN classifier provides high recall and precision for known and unknown diseases and achieves 97.8% accuracy, meaning it could be exploited as a Region of Interest (ROI) classifier. For the real field data, the proposed scheme achieves a high mAP of 93.7% to detect known classes of strawberry disease, and it also achieves reasonable results for unknowns. This implies that the proposed scheme can be applied to identify disease-like symptoms caused by real known and unknown diseases or disorders for any kind of plant.
|
22
|
Gravity-Matching Algorithm Based on K-Nearest Neighbor. SENSORS 2022; 22:s22124454. [PMID: 35746235 PMCID: PMC9228196 DOI: 10.3390/s22124454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 05/28/2022] [Accepted: 06/10/2022] [Indexed: 12/04/2022]
Abstract
The gravity-aided inertial navigation system is a technique using geophysical information, which has broad application prospects, and the gravity-map-matching algorithm is one of its key technologies. A novel gravity-matching algorithm based on the K-nearest neighbor (KNN) method is proposed in this paper to enhance the anti-noise capability of the gravity-matching algorithm, improve the accuracy of gravity-aided navigation, and reduce the application threshold of the matching algorithm. This algorithm selects K sample labels by the Euclidean distance between sample datum and measurement, and then creatively determines the weight of each label from its spatial position using the weighted average of labels and the constraint conditions of sailing speed to obtain the continuous navigation results by gravity matching. Post-processing simulation experiments are designed to demonstrate the efficiency. The experimental results show that the algorithm reduces the INS positioning error effectively, and the position error in both longitude and latitude directions is less than 800 m. The computing time can meet the requirements of real-time navigation, and the average running time of the KNN algorithm at each matching point is 5.87 s. This algorithm shows better stability and anti-noise capability in the continuous matching process.
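The two steps named in the abstract, selecting K samples by Euclidean distance and combining their position labels by a weighted average, can be sketched as below. This is only an illustration under stated assumptions: the function name `knn_match` is hypothetical, and inverse-distance weights are swapped in for the paper's weighting, which is derived from spatial position and sailing-speed constraints that the abstract does not specify.

```python
import math

def knn_match(samples, measurement, K, eps=1e-9):
    """Estimate a position as a weighted average of the K nearest samples' labels.

    samples: list of (feature_vector, (lon, lat)) pairs from a reference map.
    measurement: feature vector observed at the current matching point.
    """
    # Step 1: pick the K samples closest to the measurement in feature space.
    ranked = sorted(samples, key=lambda s: math.dist(s[0], measurement))[:K]
    # Step 2 (assumption): inverse-distance weights instead of the paper's scheme.
    weights = [1.0 / (math.dist(features, measurement) + eps) for features, _ in ranked]
    total = sum(weights)
    lon = sum(w * pos[0] for w, (_, pos) in zip(weights, ranked)) / total
    lat = sum(w * pos[1] for w, (_, pos) in zip(weights, ranked)) / total
    return lon, lat
```

With two equally distant neighbors the estimate falls midway between their labels, which is the behavior that gives a continuous rather than grid-snapped position.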
|
23
|
The Braking-Pressure and Driving-Direction Determination System (BDDS) Using Road Roughness and Passenger Conditions of Surrounding Vehicles. SENSORS (BASEL, SWITZERLAND) 2022; 22:4414. [PMID: 35746196 PMCID: PMC9230584 DOI: 10.3390/s22124414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Revised: 06/01/2022] [Accepted: 06/07/2022] [Indexed: 06/15/2023]
Abstract
A fully autonomous vehicle must ensure not only fully autonomous driving but also the safety and comfort of its passengers. However, currently available self-driving technology focuses only on perfect driving and does not guarantee the safety and comfort of passengers. This paper proposes a braking-pressure and driving-direction determination system (BDDS), which computes the brake pressure and steering angle optimized for passenger safety by utilizing more diverse information than existing autonomous vehicles. The BDDS proposed in this paper consists of two modules. The road roughness classification module (RRCM) classifies the roughness of the road by using the pressure data applied to the suspension and the K-NN algorithm and computes the optimal brake pressure. The passenger recognition and sharing module (PRSM) identifies the current occupant status of the vehicle by using a body pressure sensor and CNN, shares the information with surrounding vehicles, and computes the optimal steering angle using passenger information and road information. As a result of the simulations described in this paper, the parameters of the AI models were optimized. In addition, the RRCM was about 7% more accurate than the K-means clustering algorithm, and the PRSM was about 9% more accurate than the existing seat recognition system.
|
24
|
Design of Electronic Nose Detection System for Apple Quality Grading Based on Computational Fluid Dynamics Simulation and K-Nearest Neighbor Support Vector Machine. SENSORS 2022; 22:s22082997. [PMID: 35458982 PMCID: PMC9025600 DOI: 10.3390/s22082997] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/17/2022] [Revised: 04/09/2022] [Accepted: 04/12/2022] [Indexed: 12/04/2022]
Abstract
Apples are one of the most widely planted fruits in the world, with an extremely high annual production. Several issues should be addressed to avoid damaging samples during the quality grading process of apples (e.g., the long detection period and the inability to detect the internal quality of apples). In this study, an electronic nose (e-nose) detection system for apple quality grading based on the K-nearest neighbor support vector machine (KNN-SVM) was designed, and the nasal cavity structure of the e-nose was optimized by computational fluid dynamics (CFD) simulation. A KNN-SVM classifier was also proposed to overcome the shortcomings of traditional SVMs. The performance of the developed device was experimentally verified in the following steps. The apples were divided into three groups according to their external and internal quality. The e-nose data were pre-processed before feature extraction, and then Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were used to reduce the dimension of the datasets. The recognition accuracy of the PCA–KNN-SVM classifier was 96.45%, and the LDA–KNN-SVM classifier achieved 97.78%. Compared with other commonly used classifiers (traditional KNN, SVM, Decision Tree, and Random Forest), KNN-SVM is more efficient in terms of training time and accuracy of classification. Generally, the apple grading system can be used to evaluate the quality of apples during storage.
|
25
|
Wavelet analysis reveals differential lower limb muscle activity patterns long after anterior cruciate ligament reconstruction. J Biomech 2022; 133:110957. [PMID: 35114581 PMCID: PMC8893161 DOI: 10.1016/j.jbiomech.2022.110957] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/22/2021] [Revised: 01/05/2022] [Accepted: 01/06/2022] [Indexed: 10/19/2022]
Abstract
The purpose of this study was to test whether differences in muscle activity patterns between anterior cruciate ligament-reconstructed patients (ACLR) and healthy controls could be detected 10 to 15 years post-surgery using a machine learning classification approach. Eleven ACLR subjects and 12 healthy controls were recruited from an ongoing prospective randomized clinical trial. Surface electromyography (EMG) signals were recorded from gastrocnemius medialis and lateralis, tibialis anterior, vastus medialis, rectus femoris, biceps femoris, and semitendinosus muscles. Muscle activity was analyzed using wavelet analysis and examined within four sub-phases of the hop test, as well as an average of the task as a whole. K-nearest neighbor machine learning combined with a leave-one-out validation was used to classify the muscle activity patterns as either ACLR or Control. When muscle activity was averaged across the whole hop task, activity patterns for all muscles except the tibialis anterior were identified as being different between the study cohorts. ACLR patients demonstrated continuous muscle activities that spanned take-off, airborne, and landing hop phases versus healthy controls who displayed timed and regulated islets of muscle activities specific to each hop phase. The most striking features were 25-50% greater relative quadriceps intensity and approximately 66% diminished biceps femoris intensity in ACLR patients. The current findings are in contrast to previous work using conventional co-contraction and muscle activation onset EMG measures of the same dataset, underscoring the sensitivity and potential of the wavelet approach coupled with machine learning to reveal meaningful adaptation strategies in this at-risk population.
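K-nearest neighbor classification combined with leave-one-out validation, as described above, can be sketched in a few lines. This is a generic illustration, not the study's pipeline: the function names, the Euclidean distance, the majority vote, and the plain feature vectors (standing in for wavelet-derived muscle activity patterns) are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Return the majority label among the k training items nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each subject in turn, classify it from all the others, and score."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every subject except the held-out one
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)
```

Leave-one-out is a natural fit for small cohorts such as the 23 subjects here, because every subject is used for testing exactly once while the remaining subjects form the training set.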
|
26
|
Skin lesion classification system using a K-nearest neighbor algorithm. Vis Comput Ind Biomed Art 2022; 5:7. [PMID: 35229199 PMCID: PMC8885942 DOI: 10.1186/s42492-022-00103-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/22/2021] [Accepted: 01/23/2022] [Indexed: 11/10/2022] Open
Abstract
One of the most critical steps in medical health is the proper diagnosis of the disease. Dermatology is one of the most volatile and challenging fields in terms of diagnosis. Dermatologists often require further testing, review of the patient's history, and other data to ensure a proper diagnosis. Therefore, finding a method that can guarantee a proper trusted diagnosis quickly is essential. Several approaches have been developed over the years to facilitate the diagnosis based on machine learning. However, the developed systems lack certain properties, such as high accuracy. This study proposes a system developed in MATLAB that can identify skin lesions and classify them as normal or benign. The classification process is effectuated by implementing the K-nearest neighbor (KNN) approach to differentiate between normal skin and malignant skin lesions that imply pathology. KNN is used because it is time efficient and promises highly accurate results. The accuracy of the system reached 98% in classifying skin lesions.
|
27
|
Segmentation of Organs and Tumor within Brain Magnetic Resonance Images Using K-Nearest Neighbor Classification. J Med Phys 2022; 47:40-49. [PMID: 35548028 PMCID: PMC9084578 DOI: 10.4103/jmp.jmp_87_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Revised: 10/24/2021] [Accepted: 12/11/2021] [Indexed: 11/29/2022] Open
Abstract
PURPOSE To fully exploit the benefits of magnetic resonance imaging (MRI) for radiotherapy, it is desirable to develop segmentation methods to delineate patients' MRI images fast and accurately. The purpose of this work is to develop a semi-automatic method to segment organs and tumor within the brain on standard T1- and T2-weighted MRI images. METHODS AND MATERIALS Twelve brain cancer patients were retrospectively included in this study, and a simple rigid registration was used to align all the images to the same spatial coordinates. Regions of interest were created for organs and tumor segmentations. The K-nearest neighbor (KNN) classification algorithm was used to characterize the knowledge of previous segmentations using 15 image features (T1 and T2 image intensity, 4 Gabor filtered images, 6 image gradients, and 3 Cartesian coordinates), and the trained models were used to predict organ and tumor contours. Dice similarity coefficient (DSC), normalized surface dice, sensitivity, specificity, and Hausdorff distance were used to evaluate the performance of segmentations. RESULTS Our semi-automatic segmentations matched with the ground truths closely. The mean DSC value was between 0.49 (optical chiasm) and 0.89 (right eye) for organ segmentations and was 0.87 for tumor segmentation. Overall performance of our method is comparable or superior to the previous work, and the accuracy of our semi-automatic segmentation is generally better for large volume objects. CONCLUSION The proposed KNN method can accurately segment organs and tumor using standard brain MRI images, provides fast and accurate image processing and planning tools, and paves the way for clinical implementation of MRI-guided radiotherapy and adaptive radiotherapy.
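The Dice similarity coefficient used to score the segmentations is defined as twice the overlap divided by the sum of the two region sizes, DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming each segmentation is given as a collection of voxel indices; the function name `dice` and the convention of returning 1.0 for two empty regions are illustrative choices, not taken from the paper.

```python
def dice(predicted, truth):
    """Dice similarity coefficient between two sets of voxel indices."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # assumption: two empty segmentations count as a perfect match
    overlap = len(predicted & truth)
    return 2 * overlap / (len(predicted) + len(truth))
```

Because the denominator is the sum of the region sizes, a fixed number of misclassified voxels costs a small structure far more than a large one, which is consistent with the abstract's remark that accuracy is generally better for large-volume objects.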
|
28
|
Mammography Image-Based Diagnosis of Breast Cancer Using Machine Learning: A Pilot Study. SENSORS 2021; 22:s22010203. [PMID: 35009746 PMCID: PMC8749541 DOI: 10.3390/s22010203] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/06/2021] [Revised: 12/22/2021] [Accepted: 12/24/2021] [Indexed: 02/08/2023]
Abstract
A tumor is an abnormal tissue classified as either benign or malignant. A breast tumor is one of the most common tumors in women. Radiologists use mammograms to identify a breast tumor and classify it, which is a time-consuming process and prone to error due to the complexity of the tumor. In this study, we applied machine learning-based techniques to assist the radiologist in reading mammogram images and classifying the tumor in a very reasonable time interval. We extracted several features from the region of interest in the mammogram, which the radiologist manually annotated. These features are incorporated into a classification engine to train and build the proposed structure classification models. We used a dataset that was not previously seen by the model to evaluate the accuracy of the proposed system following the standard model evaluation schemes. Accordingly, this study found that various factors could affect the performance, which we avoided after experimenting with all the possible options. This study finally recommends using the optimized Support Vector Machine or Naïve Bayes, which produced 100% accuracy after integrating the feature selection and hyper-parameter optimization schemes.
|
29
|
COVID-19 anomaly detection and classification method based on supervised machine learning of chest X-ray images. RESULTS IN PHYSICS 2021; 31:105045. [PMID: 34840938 PMCID: PMC8607738 DOI: 10.1016/j.rinp.2021.105045] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/28/2020] [Revised: 11/19/2021] [Accepted: 11/19/2021] [Indexed: 05/03/2023]
Abstract
The term COVID-19 is an abbreviation of coronavirus disease 2019, which is considered a global pandemic that threatens the lives of millions of people. Early detection of the disease offers ample opportunity for recovery and for preventing its spread. This paper proposes a method for classification and early detection of COVID-19 through image processing using X-ray images. A set of procedures is applied, including preprocessing (image noise removal, image thresholding, and morphological operation), Region of Interest (ROI) detection and segmentation, feature extraction (Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), and Haralick texture features), and classification (K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)). The combinations of the feature extraction operators and classifiers result in six models, namely LBP-KNN, HOG-KNN, Haralick-KNN, LBP-SVM, HOG-SVM, and Haralick-SVM. The six models are tested on 5,000 images using 5-fold cross-validation. The evaluation results show high diagnosis accuracy, from 89.2% up to 98.66%. The LBP-KNN model outperforms the other models, achieving an average accuracy of 98.66%, a sensitivity of 97.76%, a specificity of 100%, and a precision of 100%. The proposed method for early detection and classification of COVID-19 through image processing using X-ray images is shown to be usable, as it provides an end-to-end structure without the need for manual feature extraction and manual selection methods.
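The four figures reported for the best model (accuracy, sensitivity, specificity, precision) all derive from the same confusion-matrix counts. The sketch below shows the standard definitions; the function name `binary_metrics`, the label encoding, and the choice to return 0.0 for an empty denominator are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, and precision for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(numerator, denominator):
        # Assumption: report 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # positive predictive value
    }
```

Specificity and precision both reach 100% exactly when there are no false positives, which is why the two values coincide in the reported LBP-KNN result while sensitivity, which depends on false negatives, is lower.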
|
30
|
A Novel Multi-Feature Fusion Method in Merging Information of Heterogenous-View Data for Oil Painting Image Feature Extraction and Recognition. Front Neurorobot 2021; 15:709043. [PMID: 34322005 PMCID: PMC8313240 DOI: 10.3389/fnbot.2021.709043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/13/2021] [Accepted: 06/17/2021] [Indexed: 12/04/2022] Open
Abstract
The art of oil painting reflects on society in the form of vision, while technology constantly explores and provides powerful possibilities to transform society, including a revolution in the way art is created and even in the way of thinking. The progress of science and technology often brings great changes to the creation of art, and also often changes people's ways of appreciation and ideas. Oil painting image feature extraction and recognition is an important field in computer vision, which is widely used in video surveillance, human-computer interaction, sign language recognition, and medical and health care. In the past few decades, feature extraction and recognition have focused on the multi-feature fusion method. However, the captured oil painting image is sensitive to light changes and background noise, which limits the robustness of feature extraction and recognition. Oil painting feature extraction is the basis of feature classification. Feature classification based on a single feature is easily affected by inaccurate detection of the object area, object angle, scale change, noise interference and other factors, resulting in reduced classification accuracy. Therefore, we propose a novel multi-feature fusion method in merging information of heterogenous-view data for oil painting image feature extraction and recognition in this paper. It fuses the width-to-height ratio feature, the rotation-invariant uniform local binary pattern feature and the SIFT feature. Meanwhile, we adopt a modified Faster R-CNN to extract the semantic feature of oil painting. Then the feature is classified based on the support vector machine and K-nearest neighbor method. The experimental results show that the feature extraction method based on multi-feature fusion can significantly improve the average classification accuracy of oil painting and has high recognition efficiency.
|
31
|
Identification of People with Diabetes Treatment through Lipids Profile Using Machine Learning Algorithms. Healthcare (Basel) 2021; 9:422. [PMID: 33917300 PMCID: PMC8067355 DOI: 10.3390/healthcare9040422] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/31/2020] [Revised: 03/02/2021] [Accepted: 03/08/2021] [Indexed: 11/16/2022] Open
Abstract
Diabetes incidence is a growing problem: according to the World Health Organization and the International Diabetes Federation, the number of people with this disease is increasing very fast all over the world. Diabetes treatment is important to prevent the development of several complications, and lipid profile monitoring is also important. For that reason, the aim of this work is to implement machine learning algorithms that are able to distinguish cases, which correspond to patients diagnosed with diabetes who receive diabetes treatment, from controls, which refer to subjects who do not receive diabetes treatment although some of them have diabetes, based on lipid profile levels. Logistic regression, K-nearest neighbor, decision trees and random forest were implemented, and all of them were evaluated with accuracy, sensitivity, specificity and AUC-ROC curve metrics. The artificial neural network obtained an accuracy of 0.685 and an AUC value of 0.750; logistic regression achieved an accuracy of 0.729 and an AUC value of 0.795; K-nearest neighbor obtained an accuracy of 0.669 and an AUC value of 0.709; the decision tree reached an accuracy of 0.691 and an AUC value of 0.683; and random forest achieved an accuracy of 0.704 and an AUC value of 0.776. The performance of all models was statistically significant, but the best-performing model for this problem was logistic regression.
|
32
|
Using Soft Sensors as a Basis of an Innovative Architecture for Operation Planning and Quality Evaluation in Agricultural Sprayers. SENSORS 2021; 21:s21041269. [PMID: 33578915 PMCID: PMC7916728 DOI: 10.3390/s21041269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/05/2020] [Revised: 01/27/2021] [Accepted: 02/06/2021] [Indexed: 11/25/2022]
Abstract
One of the major problems facing humanity in the coming decades is the production of food on a large scale. The production of large quantities of food must be conducted in a sustainable and responsible manner for nature and humans. In this sense, the appropriate application of agricultural pesticides plays a fundamental role since pesticide application in a qualified manner reduces human and environmental risks as well as the costs of food production. Evaluation of the quality of application using sprayers is an important issue, and several quality descriptors related to the average diameter and distribution of droplets are used. This paper describes the construction of a data-driven soft sensor using the parametric principal component regression (PCR) method based on principal component analysis (PCA), which works in two configurations: with the input being the operating conditions of the agricultural boom sprayers and its outputs being the prediction of the quality descriptors of spraying, and vice versa. The soft sensor provides, in one configuration, estimates of the quality of pesticide application at a certain time and, in the other, estimates of the appropriate sprayer-operating conditions, which can be used for control and optimization of the processes in pesticide application. Full cone nozzles are used to illustrate a practical application as well as to validate the usefulness of the soft sensor designed with the PCR method. The selection of historical data, exploration, and filtering of data, and the structure and validation of the soft sensor are presented. For comparison purposes, the results with the well-known nonparametric k-Nearest Neighbor (k−NN) regression method are presented. The results of this research reveal the usefulness of soft sensors in the application of agricultural pesticides and as a knowledge base to assist in agricultural decision-making.
|
33
|
Computational Method for Classification of Avian Influenza A Virus Using DNA Sequence Information and Physicochemical Properties. Front Genet 2021; 12:599321. [PMID: 33584824 PMCID: PMC7877484 DOI: 10.3389/fgene.2021.599321] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 01/04/2021] [Indexed: 11/30/2022] Open
Abstract
Accurate and fast characterization of the subtype sequences of Avian influenza A virus (AIAV) hemagglutinin (HA) and neuraminidase (NA) depends on expanding diagnostic services and is embedded in molecular epidemiological studies. A new approach for classifying the AIAV sequences of the HA and NA genes into subtypes using DNA sequence data and physicochemical properties is proposed. This method simply requires unaligned, full-length, or partial sequences of HA or NA DNA as input. It allows for quick and highly accurate assignments of HA sequences to subtypes H1–H16 and NA sequences to subtypes N1–N9. For feature extraction, k-gram, discrete wavelet transformation, and multivariate mutual information were used, and different classifiers were trained for prediction. Four different classifiers, Naïve Bayes, Support Vector Machine (SVM), K nearest neighbor (KNN), and Decision Tree, were compared using our feature selection method. This comparison is based on the 30% dataset separated from the original dataset for testing purposes. Among the four classifiers, Decision Tree was the best, and Precision, Recall, F1 score, and Accuracy were 0.9514, 0.9535, 0.9524, and 0.9571, respectively. Decision Tree had considerable improvements over the other three classifiers using our method. Results show that the proposed feature selection method, when trained with a Decision Tree classifier, gives the best results for accurate prediction of the AIAV subtype.
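The k-gram feature extraction step mentioned above can be illustrated with a minimal sketch. It assumes a k-gram is a contiguous substring of length k over the alphabet A, C, G, T and that features are normalized counts; the function name `kgram_features` and the toy sequence are hypothetical, and the abstract does not state the value of k or the normalization actually used.

```python
from collections import Counter
from itertools import product


def kgram_features(sequence, k=3, alphabet="ACGT"):
    """Return normalized k-gram frequencies in a fixed, alphabet-ordered layout."""
    seq = sequence.upper()
    # Count every contiguous substring of length k (works on unaligned input).
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kgrams = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-grams containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[g] for g in kgrams)
    return [counts[g] / total if total else 0.0 for g in kgrams]


# Toy sequence (hypothetical, not from the study).
features = kgram_features("ATGCGATG", k=2)
```

Because the vector has a fixed length of 4^k regardless of sequence length, full-length and partial sequences map to the same feature space, which is what lets a standard classifier consume them.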
|
34
|
Simplifying Diagnosis of Fetal Alcohol Syndrome Using Machine Learning Methods. Front Pediatr 2021; 9:707566. [PMID: 35127583 PMCID: PMC8814594 DOI: 10.3389/fped.2021.707566] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 12/16/2021] [Indexed: 01/22/2023] Open
Abstract
INTRODUCTION Fetal alcohol spectrum disorder (FASD) is a complex and heterogeneous disorder caused by gestational exposure to alcohol. Patients with fetal alcohol syndrome (FAS, the most severe form of FASD) show abnormal facial features. The aim of our study was to use 3D metric facial data of patients with FAS to identify machine learning methods that could improve and objectify the diagnostic process. MATERIAL AND METHODS Facial 3D scans of 30 children with FAS and 30 controls were analyzed. Skeletal, facial, dental and orthodontic parameters collected in previous studies were evaluated for their value in machine learning-based diagnosis. Three machine learning methods, decision trees, support vector machine and k-nearest neighbors, were tested with respect to their accuracy and clinical practicability. RESULTS All three machine learning methods showed a high accuracy of 89.5%. The three predictors with the highest scores were midfacial length, palpebral fissure length of the right eye and nose breadth at sulcus nasi. CONCLUSIONS With the parameters right palpebral fissure length, midfacial length and nose breadth at sulcus nasi, machine learning was an efficient method for the objective and reliable detection of patients with FAS within our patient group. Of the three tested methods, decision trees would be the most helpful and the easiest to apply in everyday clinical and private practice.
|
35
|
A Sentiment Analysis Approach to Predict an Individual's Awareness of the Precautionary Procedures to Prevent COVID-19 Outbreaks in Saudi Arabia. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 18:ijerph18010218. [PMID: 33396713 PMCID: PMC7795573 DOI: 10.3390/ijerph18010218] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 11/24/2020] [Revised: 12/25/2020] [Accepted: 12/27/2020] [Indexed: 11/16/2022]
Abstract
In March 2020, the World Health Organization (WHO) declared the outbreak of Coronavirus disease 2019 (COVID-19) a pandemic, which affected all countries worldwide. During the outbreak, public sentiment analyses contributed valuable information toward making appropriate public health responses. This study aims to develop a model that predicts an individual's awareness of the precautionary procedures in five main regions in Saudi Arabia. A dataset of Arabic COVID-19-related tweets posted during the curfew period was collected. The dataset was processed with several machine learning predictive models, Support Vector Machine (SVM), K-nearest neighbors (KNN), and Naïve Bayes (NB), along with the N-gram feature extraction technique. The results show that the SVM classifier with bigrams in Term Frequency-Inverse Document Frequency (TF-IDF) outperformed the other models, with an accuracy of 85%. The awareness prediction results showed that the south region had the highest level of awareness of COVID-19 containment measures, whereas the middle region had the lowest. The proposed model can help the medical sector and decision-makers decide on the appropriate procedures for each region based on its attitudes towards the pandemic.
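As a rough illustration of bigram features weighted by TF-IDF, the sketch below uses only the standard library. It assumes whitespace tokenization, term frequency as count divided by document length, and an unsmoothed IDF of log(N/df); libraries differ in their smoothing, and the study's exact variant is not given. The English toy documents and the function names are hypothetical stand-ins for the Arabic tweets.

```python
import math
from collections import Counter


def bigrams(text):
    """Split on whitespace and join each pair of adjacent tokens."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_bigrams(documents):
    """Return one {bigram: weight} dict per document (unsmoothed TF-IDF)."""
    docs = [bigrams(d) for d in documents]
    n_docs = len(docs)
    # Document frequency: in how many documents each bigram occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Toy corpus (hypothetical).
docs = ["stay home stay safe", "wash hands stay safe", "stay home"]
weights = tfidf_bigrams(docs)
```

Bigrams that occur in fewer documents receive larger weights, so "home stay" outweighs "stay home" in the first toy document; the resulting sparse vectors are what a classifier such as an SVM would be trained on.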
|
36
|
Prediction of heart disease and classifiers' sensitivity analysis. BMC Bioinformatics 2020; 21:278. [PMID: 32615980 PMCID: PMC7331233 DOI: 10.1186/s12859-020-03626-y] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 03/05/2020] [Accepted: 06/22/2020] [Indexed: 12/20/2022] Open
Abstract
Background Heart disease (HD) is one of the most common diseases nowadays, and early diagnosis is a crucial task for many health care providers to protect their patients from it and to save lives. In this paper, a comparative analysis of different classifiers was performed on the Heart Disease dataset in order to correctly classify and/or predict HD cases with minimal attributes. The set contains 76 attributes, including the class attribute, for 1025 patients collected from Cleveland, Hungary, Switzerland, and Long Beach, but in this paper only a subset of 14 attributes is used, and each attribute has a given set value. K-Nearest Neighbor (K-NN), Naive Bayes, Decision tree J48, JRip, SVM, Adaboost, Stochastic Gradient Descent (SGD) and Decision Table (DT) classifiers were used to show how well the selected classification algorithms classify and/or predict the HD cases. Results Using different classification algorithms on the HD dataset gave very promising results in terms of classification accuracy for the K-NN (K = 1), Decision tree J48 and JRip classifiers, with accuracies of 99.7073, 98.0488 and 97.2683%, respectively. A feature selection method was applied to the HD dataset using Classifier Subset Evaluator, and the results show that classification accuracy for the K-NN (K = 1) and Decision Table classifiers improved to 100 and 93.8537%, respectively, when using a combination of at most 4 selected attributes instead of 13 attributes for the prediction of HD cases. Conclusion Different classifiers were used and compared to classify the HD dataset, and we conclude that a reliable feature selection method benefits HD prediction by using a minimal number of attributes instead of all available ones.
|
37
|
GrAPFI: predicting enzymatic function of proteins from domain similarity graphs. BMC Bioinformatics 2020; 21:168. [PMID: 32349654 PMCID: PMC7191693 DOI: 10.1186/s12859-020-3460-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/13/2019] [Accepted: 03/19/2020] [Indexed: 01/20/2023] Open
Abstract
An amendment to this paper has been published and can be accessed via the original article.
|
38
|
Protein kinase inhibitors' classification using K-Nearest neighbor algorithm. Comput Biol Chem 2020; 86:107269. [PMID: 32413830 DOI: 10.1016/j.compbiolchem.2020.107269] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/09/2019] [Revised: 03/15/2020] [Accepted: 04/20/2020] [Indexed: 10/24/2022]
Abstract
Protein kinases are enzymes that regulate the biological activity of proteins by phosphorylating specific amino acids, with ATP as the phosphate source. For that reason, inhibiting protein kinases with an active small molecule plays a significant role in cancer treatment. To this end, computational drug design, especially QSAR modeling, is one of the most economical approaches for reducing time and cost. In this work, a hybrid QSAR model is used to distinguish active inhibitors from inactive ones. A genetic algorithm and the K-Nearest Neighbor method were used for dimensionality reduction and classification, respectively. Finally, to evaluate the proposed model's performance, support vector machine and Naïve Bayesian algorithms were also examined. The proposed model demonstrated significant superiority over other QSAR models.
|
39
|
MSCHLMDA: Multi-Similarity Based Combinative Hypergraph Learning for Predicting MiRNA-Disease Association. Front Genet 2020; 11:354. [PMID: 32351545 PMCID: PMC7174776 DOI: 10.3389/fgene.2020.00354] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/22/2019] [Accepted: 03/23/2020] [Indexed: 12/17/2022] Open
Abstract
Accumulating biological and clinical evidence has confirmed the important associations between microRNAs (miRNAs) and a variety of human diseases. Predicting disease-related miRNAs is beneficial for understanding the molecular mechanisms of pathological conditions at the miRNA level, and for facilitating the discovery of new biomarkers for the prevention, diagnosis and treatment of complex human diseases. However, the challenge for researchers is to establish methods that can effectively combine different datasets and make reliable predictions. In this work, we propose the method of Multi-Similarity based Combinative Hypergraph Learning for Predicting MiRNA-disease Association (MSCHLMDA). To establish this method, complex features were extracted by two measures for each miRNA-disease pair. Then, the K-nearest neighbor (KNN) and K-means algorithms were used to construct two different hypergraphs. Finally, results from combinative hypergraph learning were used for predicting miRNA-disease associations. To evaluate the prediction performance of our method, leave-one-out cross validation and 5-fold cross validation were implemented, showing that our method had significantly improved prediction performance compared to previously used methods. Moreover, three case studies on different complex human diseases were performed, which further demonstrated the predictive performance of MSCHLMDA. It is anticipated that MSCHLMDA will become an excellent complement to the biomedical research field in the future.
|
40
|
A Hybrid Method for the Diagnosis and Classifying Parkinson's Patients based on Time-frequency Domain Properties and K-nearest Neighbor. JOURNAL OF MEDICAL SIGNALS & SENSORS 2020; 10:60-66. [PMID: 32166079 PMCID: PMC7038745 DOI: 10.4103/jmss.jmss_61_18] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/06/2018] [Revised: 07/13/2019] [Accepted: 09/07/2019] [Indexed: 11/12/2022]
Abstract
Tremor of the hands and arms is the main symptom of Parkinson's disease (PD). However, impairment of the vocal cords leads to speech difficulties and defects, which are another reliable symptom of the disease. This article presents a diagnostic model for PD based on processing the voice signal with a time–frequency transform (wavelet transform, WT) and Mel-frequency cepstral coefficients (MFCC). The proposed processing transforms the vocal signal with a WT-based method, extracts the MFCC, and finally categorizes sick and healthy subjects using the K-nearest neighbor (KNN) classifier. The analysis uses a database that contains 18 healthy subjects and twenty patients. The Daubechies mother wavelet is used to compress the vocal signal and extract the MFCC. For the diagnosis of PD, the KNN classifier gives 89% accuracy when 52% of the database is used as training data; when this percentage is increased from 52% to 73%, accuracy reaches 98.68%, which is higher than that of the support-vector machine classifier. The KNN is conclusive in the determination of PD. Moreover, the larger the training set, the more accurate the results.
|
41
|
Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review. BIG DATA 2019; 7:221-248. [PMID: 31411491 DOI: 10.1089/big.2018.0175] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Indexed: 02/05/2023]
Abstract
The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested examples and the training examples. This raises a major question: which of the large number of available distance and similarity measures should be used for the KNN classifier? This review attempts to answer this question by evaluating the performance (measured by accuracy, precision, and recall) of the KNN using a large number of distance measures, tested on a number of real-world data sets, with and without adding different levels of noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed nonconvex distance performed the best on most data sets compared with the other tested distances. In addition, the performance of the KNN with this top-performing distance degraded only ∼20% when the noise level reached 90%; this is true for most of the distances used as well. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
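To make the dependence on the distance measure concrete, here is a small sketch of a KNN classifier with a pluggable distance function. The three distances shown (Euclidean, Manhattan, Chebyshev) are standard textbook choices picked for illustration, not the measures ranked in the review, and the two training points are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): the nearest neighbor of the origin depends on the metric.
train_x = [(3.0, 3.0), (0.0, 4.0)]
train_y = ["a", "b"]
query = (0.0, 0.0)
by_euclidean = knn_predict(train_x, train_y, query, k=1, distance=euclidean)
by_chebyshev = knn_predict(train_x, train_y, query, k=1, distance=chebyshev)
```

Under the Euclidean distance the point (0, 4) is closer to the origin (4 versus about 4.24), while under the Chebyshev distance (3, 3) is closer (3 versus 4), so the predicted label flips with the metric alone.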
|
42
|
Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies. Metabolomics 2018; 14:128. [PMID: 30830398 PMCID: PMC6153696 DOI: 10.1007/s11306-018-1420-2] [Citation(s) in RCA: 108] [Impact Index Per Article: 18.0] [Received: 04/11/2018] [Accepted: 08/24/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values and strategies to handle these data has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD); or it can be random as, for instance, a consequence of sample preparation. METHODS We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and the ability of the method to increase statistical power while preserving the strength of established metabolic quantitative trait loci. RESULTS Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable. CONCLUSION Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation is performed on observations with variable pre-selection since it showed robust results in all evaluation schemes.
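A minimal sketch of KNN imputation on observations (rows) follows. It assumes missing entries are marked with `None`, that distances are root-mean-square differences over the features both rows have observed, and that a missing value is filled with the mean of the k nearest donors. The variable pre-selection step recommended in the abstract is not reproduced here, and the function names and toy matrix are hypothetical.

```python
import math


def _distance(a, b):
    """Root-mean-square difference over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest rows."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which feature j was actually observed.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy matrix (hypothetical): samples in rows, metabolites in columns.
rows = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [10.0, 10.0, 50.0],
]
filled = knn_impute(rows, k=2)
```

In the toy matrix the two rows most similar to the first row supply the missing value (the mean of 3.0 and 5.0), while the distant fourth row is ignored; distances are always computed from the original, un-imputed data.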
|
43
|
Automatic Seizure Detection Based on Morphological Features Using One-Dimensional Local Binary Pattern on Long-Term EEG. Clin EEG Neurosci 2018; 49:351-362. [PMID: 29214865 DOI: 10.1177/1550059417744890] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/16/2022]
Abstract
Epileptic neurological disorder of the brain is widely diagnosed using the electroencephalography (EEG) technique. EEG signals are nonstationary in nature and show abnormal neural activity during the ictal period. Seizures can be identified by analyzing and obtaining features of EEG signal that can detect these abnormal activities. The present work proposes a novel morphological feature extraction technique based on the local binary pattern (LBP) operator. LBP provides a unique decimal value to a sample point by weighing the binary outcomes after thresholding the neighboring samples with the present sample point. These LBP values assist in capturing the rising and falling edges of the EEG signal, thus providing a morphologically featured discriminating pattern for epilepsy detection. In the present work, the variability in the LBP values is measured by calculating the sum of absolute difference of the consecutive LBP values. Interquartile range is calculated over the preprocessed EEG signal to provide dispersion measure in the signal. For classification purpose, K-nearest neighbor classifier is used, and the performance is evaluated on 896.9 hours of data from CHB-MIT continuous EEG database. Mean accuracy of 99.7% and mean specificity of 99.8% is obtained with average false detection rate of 0.47/h and sensitivity of 99.2% for 136 seizures.
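The one-dimensional LBP operator and the two derived features described above can be sketched as follows. The window size, the bit ordering, and the use of `>=` for thresholding are assumptions made for illustration, since the abstract does not specify them; the function names and the toy signal are likewise hypothetical.

```python
import statistics


def lbp_1d(signal, half_window=4):
    """Encode each interior sample by thresholding its neighbors against it."""
    codes = []
    for i in range(half_window, len(signal) - half_window):
        neighbors = signal[i - half_window:i] + signal[i + 1:i + 1 + half_window]
        code = 0
        for bit, value in enumerate(neighbors):
            # A neighbor at or above the center sample contributes weight 2**bit.
            if value >= signal[i]:
                code += 1 << bit
        codes.append(code)
    return codes


def lbp_variability(codes):
    """Sum of absolute differences between consecutive LBP values."""
    return sum(abs(b - a) for a, b in zip(codes, codes[1:]))


def interquartile_range(signal):
    """Dispersion of the signal itself, computed as Q3 minus Q1."""
    q1, _, q3 = statistics.quantiles(signal, n=4)
    return q3 - q1


# Toy signal (hypothetical): one rising and one falling edge.
signal = [1, 2, 3, 4, 5, 4, 3, 2, 1]
codes = lbp_1d(signal, half_window=1)
```

On this toy signal the code stays constant along the rising edge, changes at the peak, and settles to a different constant along the falling edge, which is how the operator captures edge morphology; the variability and interquartile range would then form the feature vector passed to a classifier.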
|
44
|
An Adaptive Weighted KNN Positioning Method Based on Omnidirectional Fingerprint Database and Twice Affinity Propagation Clustering. SENSORS 2018; 18:s18082502. [PMID: 30071642 PMCID: PMC6111553 DOI: 10.3390/s18082502] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/08/2018] [Revised: 07/09/2018] [Accepted: 07/27/2018] [Indexed: 11/16/2022]
Abstract
The human body has a great influence on Wi-Fi signal power, and a fixed K value leads to localization errors for the K-nearest neighbor (KNN) algorithm. To address these problems, we present an adaptive weighted KNN positioning method based on an omnidirectional fingerprint database (ODFD) and twice affinity propagation clustering. First, an ODFD is proposed to alleviate the body's sheltering effect on the signal; it includes the position, orientation and sequence of mean received signal strength (RSS) at each reference point (RP). Second, the affinity propagation clustering (APC) algorithm is introduced in the offline stage, based on the fusion of signal-domain distance and position-domain distance. Finally, an adaptive weighted KNN algorithm based on APC is proposed for estimating the user's position during the online stage. K initial RPs are obtained by KNN and then clustered by the APC algorithm based on their position-domain distances. The most probable sub-cluster is retained by comparing the number of RPs and the signal-domain distance between the sub-cluster center and the online RSS readings. The user's position is then estimated as the weighted average of the coordinates in the remaining sub-cluster. We have implemented the proposed method, obtaining a mean error of 2.2 m and a root mean square error of 1.5 m. Experimental results show that our proposed method outperforms traditional fingerprinting methods.
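The final weighted-average step can be illustrated with a plain weighted KNN position estimate. This sketch assumes inverse signal-domain distance as the weight and a fixed K; it deliberately omits the orientation handling, the APC clustering and the adaptive choice of K that the abstract describes. The function name, the small `eps` guard and the toy fingerprints are hypothetical.

```python
import math


def wknn_position(fingerprints, online_rss, k=3, eps=1e-9):
    """Estimate (x, y) from the k reference points closest in signal space.

    fingerprints: list of ((x, y), rss_vector) pairs, one per reference point.
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], online_rss))[:k]
    # Closer fingerprints get larger weights; eps avoids division by zero.
    weights = [1.0 / (math.dist(rss, online_rss) + eps) for _, rss in ranked]
    total = sum(weights)
    x = sum(w * pos[0] for w, (pos, _) in zip(weights, ranked)) / total
    y = sum(w * pos[1] for w, (pos, _) in zip(weights, ranked)) / total
    return x, y


# Toy fingerprint database (hypothetical): positions in metres, RSS in dBm.
fingerprints = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((2.0, 0.0), [-50.0, -60.0]),
    ((10.0, 10.0), [-80.0, -30.0]),
]
estimate = wknn_position(fingerprints, [-45.0, -65.0], k=2)
```

Here the online reading lies midway between the first two fingerprints in signal space, so the estimate falls midway between their positions and the far reference point is excluded by K = 2.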
|
45
|
[Prediction of protein subcellular locations by ensemble of improved K-nearest neighbor]. SHENG WU GONG CHENG XUE BAO = CHINESE JOURNAL OF BIOTECHNOLOGY 2017; 33:683-691. [PMID: 28920401 DOI: 10.13345/j.cjb.160389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Adaboost algorithm with improved K-nearest neighbor classifiers is proposed to predict protein subcellular locations. Improved K-nearest neighbor classifier uses three sequence feature vectors including amino acid composition, dipeptide and pseudo amino acid composition of protein sequence. K-nearest neighbor uses Blast in classification stage. The overall success rates by the jackknife test on two data sets of CH317 and Gram1253 are 92.4% and 93.1%. Adaboost algorithm with the novel K-nearest neighbor improved by Blast is an effective method for predicting subcellular locations of proteins.
|
46
|
Automated diagnosis of congestive heart failure using dual tree complex wavelet transform and statistical features extracted from 2s of ECG signals. Comput Biol Med 2017; 83:48-58. [PMID: 28231511 DOI: 10.1016/j.compbiomed.2017.01.019] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 12/16/2016] [Revised: 01/15/2017] [Accepted: 01/28/2017] [Indexed: 01/24/2023]
Abstract
Identification of alarming features in the electrocardiogram (ECG) signal is extremely significant for the prediction of congestive heart failure (CHF). ECG signal analysis carried out using computer-aided techniques can speed up the diagnosis process and aid in the proper management of CHF patients. Therefore, in this work, a dual tree complex wavelet transform (DTCWT)-based methodology is proposed for automated discrimination of ECG signals exhibiting CHF from normal ones. In the experiment, we performed a DTCWT on ECG segments of 2 s duration up to six levels to obtain the coefficients. From these DTCWT coefficients, statistical features are extracted and ranked using Bhattacharyya, entropy, minimum redundancy maximum relevance (mRMR), receiver-operating characteristics (ROC), Wilcoxon, t-test and reliefF methods. Ranked features are fed to k-nearest neighbor (KNN) and decision tree (DT) classifiers for automated differentiation of CHF and normal ECG signals. We achieved 99.86% accuracy, 99.78% sensitivity and 99.94% specificity in the identification of CHF-affected ECG signals using 45 features. The proposed method is able to detect CHF patients accurately using only 2 s of ECG signal, thereby providing sufficient time for clinicians to further investigate the severity of CHF and its treatment.
|
47
|
A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 131:191-206. [PMID: 27265059 DOI: 10.1016/j.cmpb.2016.04.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/09/2015] [Revised: 03/18/2016] [Accepted: 04/06/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND In the age of the information superhighway, big data play a significant role in information processing, extraction, retrieval and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes inadequate for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that exploits the large storage and processing capability of distributed or parallel processing platforms is essential. METHOD In this analysis, a contemporary distributed clustering methodology for imbalanced data reduction using the k-nearest neighbor (K-NN) classification approach is introduced. The pivotal objective of this work is to represent real training data sets with a reduced number of elements or instances. These reduced data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. RESULTS To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalanced data sets from large-scale data sets without loss of accuracy. CONCLUSIONS The results show that the MapReduce-based K-NN classifier provided accurate results for big DNA data.
|
48
|
Using K-Nearest Neighbor Classification to Diagnose Abnormal Lung Sounds. SENSORS 2015; 15:13132-58. [PMID: 26053756 PMCID: PMC4507578 DOI: 10.3390/s150613132] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 02/12/2015] [Revised: 05/28/2015] [Accepted: 05/28/2015] [Indexed: 11/17/2022]
Abstract
A reported 30% of people worldwide have abnormal lung sounds, including crackles, rhonchi, and wheezes. To date, the traditional stethoscope remains the most popular tool used by physicians to diagnose such abnormal lung sounds; however, many problems arise with the use of a stethoscope, including the effects of environmental noise, the inability to record and store lung sounds for follow-up or tracking, and the physician's subjective diagnostic experience. This study developed a digital stethoscope to help physicians overcome these problems when diagnosing abnormal lung sounds. In this digital system, mel-frequency cepstral coefficients (MFCCs) were used to extract the features of lung sounds, and then the K-means algorithm was used for feature clustering to reduce the amount of data for computation. Finally, the K-nearest neighbor method was used to classify the lung sounds. The proposed system can also be used for home care: if abnormal lung sound frames make up more than 30% of the whole test signal, the system can automatically warn the user to visit a physician for diagnosis. We also used bend sensors together with an amplification circuit, Bluetooth, and a microcontroller to implement a respiration detector. The respiratory signal extracted by the bend sensors can be transmitted to the computer via Bluetooth to calculate the respiratory cycle for real-time assessment. If an abnormal status is detected, the device will warn the user automatically. Experimental results indicated that the error in respiratory cycles between measured and actual values was only 6.8%, illustrating the potential of our detector for home care applications.
|
49
|
Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2014; 113:792-808. [PMID: 24472367 DOI: 10.1016/j.cmpb.2014.01.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 06/06/2013] [Revised: 12/29/2013] [Accepted: 01/03/2014] [Indexed: 06/03/2023]
Abstract
This study proposes a novel prediction approach for human breast and colon cancers using different feature spaces. The proposed scheme consists of two stages: the preprocessor and the predictor. In the preprocessor stage, the mega-trend diffusion (MTD) technique is employed to increase the samples of the minority class, thereby balancing the dataset. In the predictor stage, the machine-learning approaches K-nearest neighbor (KNN) and support vector machines (SVM) are used to develop hybrid MTD-SVM and MTD-KNN prediction models. The MTD-SVM model provided the best values of accuracy, G-mean and Matthews correlation coefficient of 96.71%, 96.70% and 71.98% for the cancer/non-cancer dataset, breast/non-breast cancer dataset and colon/non-colon cancer dataset, respectively. We found that the hybrid MTD-SVM is the best with respect to prediction performance and computational cost. The MTD-KNN model achieved moderately better prediction than the hybrid MTD-NB (Naïve Bayes), but at the expense of higher computing cost. The MTD-KNN model is faster than MTD-RF (random forest), but its prediction is not better than that of MTD-RF. To the best of our knowledge, the reported results are the best so far for these datasets. The proposed scheme indicates that the developed models can be used as a tool for the prediction of cancer. This scheme may be useful for the study of any sequential information, such as protein sequences or nucleic acid sequences.
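G-mean and the Matthews correlation coefficient are both computed from the binary confusion matrix and are commonly preferred over plain accuracy on imbalanced data. The sketch below uses their standard definitions; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def gmean_and_mcc(tp, fp, tn, fn):
    """Return (G-mean, Matthews correlation coefficient) for a binary classifier."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # G-mean is the geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return g_mean, mcc


# Hypothetical counts: 10 positives and 90 negatives.
perfect = gmean_and_mcc(tp=10, fp=0, tn=90, fn=0)
majority_only = gmean_and_mcc(tp=0, fp=0, tn=90, fn=10)
```

A classifier that always predicts the majority class would score 90% accuracy on these counts, yet both G-mean and MCC drop to zero, which is why these metrics are informative once the minority class matters.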
|
50
|
An intelligent procedure for watermelon ripeness detection based on vibration signals. Journal of Food Science and Technology 2013; 52:1075-81. [PMID: 25694721 DOI: 10.1007/s13197-013-1068-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Revised: 04/20/2013] [Accepted: 06/11/2013] [Indexed: 11/24/2022]
Abstract
In this paper, an efficient procedure for ripeness detection of watermelon is presented. A nondestructive method based on vibration response was used to determine the internal quality of watermelon. The responses of samples to vibration excitation were optically recorded by a Laser Doppler (LD) vibrometer. Vibration data were collected from watermelons of two qualities, namely ripe and unripe. Vibration signals were transformed from the time domain to the frequency domain by fast Fourier transform (FFT). Twenty-nine features were extracted from the FFT amplitude and phase angle of the vibration signals. K-nearest neighbor (KNN) analysis was applied as the classifier in the decision-making stage. The experimental results showed that using the FFT amplitude of the vibration signals gave the maximum classification accuracy. This method allowed identification with 95.0% accuracy. Hence, the proposed method can reliably detect watermelon ripeness.
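The time-to-frequency step can be sketched with a direct discrete Fourier transform, which yields the same amplitude and phase values an FFT would, only more slowly (O(n²)). The 29 features used in the study are not listed in the abstract, so none are reproduced here; the function names and the synthetic test signal are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def amplitude_and_phase(signal):
    """Amplitude and phase angle for the non-negative frequency bins."""
    spectrum = dft(signal)
    half = spectrum[: len(signal) // 2 + 1]
    return [abs(c) for c in half], [cmath.phase(c) for c in half]


# Synthetic vibration signal (hypothetical): 4 cycles of a sine over 32 samples.
n = 32
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
amplitudes, phases = amplitude_and_phase(signal)
peak_bin = amplitudes.index(max(amplitudes))
```

The amplitude spectrum of the synthetic signal peaks at bin 4 with height n/2, matching the four cycles in the window; summary statistics of such amplitude and phase spectra are the kind of quantity that could serve as classifier inputs.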
|