1
Bischof G, Januschewski E, Juadjur A. Authentication of Laying Hen Housing Systems Based on Egg Yolk Using 1H NMR Spectroscopy and Machine Learning. Foods 2024; 13:1098. [PMID: 38611402 PMCID: PMC11011716 DOI: 10.3390/foods13071098] [Received: 03/08/2024] [Revised: 03/29/2024] [Accepted: 04/01/2024] [Indexed: 04/14/2024] Open
Abstract
(1) Background: The authenticity of eggs in relation to the housing system of laying hens is susceptible to food fraud due to the potential for egg mislabeling. (2) Methods: A total of 4188 egg yolks, obtained from four different breeds of laying hens housed in colony cage, barn, free-range, and organic systems, were analyzed using 1H NMR spectroscopy. The data of the resulting 1H NMR spectra were used for different machine learning methods to build classification models for the four housing systems. (3) Results: The comparison of the seven computed models showed that the support vector machine (SVM) model gave the best results with a cross-validation accuracy of 98.5%. The test of classification models with eggs from supermarkets showed that only a maximum of 62.8% of samples were classified according to the housing system labeled on the eggs. (4) Conclusion: The classification models developed in this study included the largest sample size compared to the literature. The SVM model is most suitable for evaluating 1H NMR data in terms of the hen housing system. The test with supermarket samples showed that more authentic samples to analyze influencing factors such as breed, feeding, and housing changes are required.
Affiliation(s)
- Greta Bischof
- Chemical Analytics, German Institute of Food Technologies (DIL e.V.), Prof.-v.-Klitzing-Str. 7, 49610 Quakenbrück, Germany
2
Li S, Tsui PH, Wu W, Wu S, Zhou Z. Ultrasound k-nearest neighbor entropy imaging: Theory, algorithm, and applications. Ultrasonics 2024; 138:107256. [PMID: 38325231 DOI: 10.1016/j.ultras.2024.107256] [Received: 07/19/2023] [Revised: 01/25/2024] [Accepted: 01/26/2024] [Indexed: 02/09/2024]
Abstract
Ultrasound information entropy is a flexible approach for analyzing ultrasound backscattering. Shannon entropy imaging based on probability distribution histograms (PDHs) has been implemented as a promising method for tissue characterization and diagnosis. However, the bin number affects the stability of entropy estimation. In this study, we introduced the k-nearest neighbor (KNN) algorithm to estimate entropy values and proposed ultrasound KNN entropy imaging. The proposed KNN estimator leverages the Euclidean distance between data samples rather than the histogram bins used by conventional PDH estimators. We also proposed cumulative relative entropy (CRE) imaging to analyze time-series radiofrequency signals and applied it to monitor thermal lesions induced by microwave ablation (MWA). Computer simulation phantom experiments were conducted to validate and compare the performance of the proposed KNN entropy imaging, the conventional PDH entropy imaging, and Nakagami-m parametric imaging in detecting variations in scatterer density and visualizing inclusions. Clinical data of breast lesions were analyzed, and ex vivo porcine liver MWA experiments were conducted, to validate the performance of KNN entropy imaging in classifying benign and malignant breast tumors and in monitoring thermal lesions, respectively. Compared with PDH, the entropy estimation based on KNN was less affected by the tuning parameters. KNN entropy imaging was more sensitive to changes in scatterer density and offered better visualization capability than typical Shannon entropy (TSE) and Nakagami-m parametric imaging. Among the imaging methods compared, KNN-based Shannon entropy (KSE) imaging achieved the highest accuracy in classifying benign and malignant breast tumors, and KNN-based CRE imaging had the larger lesion-to-normal contrast when monitoring the ablated areas during MWA at different powers and treatment durations. Ultrasound KNN entropy imaging is a potential quantitative ultrasound approach for tissue characterization.
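To make the idea of a histogram-free entropy estimate concrete, the following is a minimal sketch of the Kozachenko-Leonenko estimator, one common KNN-based entropy estimator, for one-dimensional samples. The abstract does not state which KNN estimator the authors used, so the choice of estimator, the helper names `digamma_int` and `knn_entropy_1d`, and the Gaussian test data are all assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma function psi(n) for a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def knn_entropy_1d(samples, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) for 1-D samples.

    Uses the distance from each sample to its k-th nearest neighbor
    instead of histogram bins. Assumes no duplicate sample values.
    """
    n = len(samples)
    log_eps_sum = 0.0
    for i, x in enumerate(samples):
        dists = sorted(abs(x - y) for j, y in enumerate(samples) if j != i)
        log_eps_sum += math.log(dists[k - 1])
    # log(2) is the log-volume of the 1-D unit ball [-1, 1].
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_eps_sum / n


# Hypothetical test data: standard normal samples, whose differential
# entropy is known in closed form as 0.5 * log(2 * pi * e).
rng = random.Random(0)
gauss_samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
estimate = knn_entropy_1d(gauss_samples, k=3)
exact = 0.5 * math.log(2.0 * math.pi * math.e)
print(f"KNN estimate: {estimate:.3f} nats, closed form: {exact:.3f} nats")
```

The only tuning parameter here is k, whereas a histogram estimator needs a bin count and bin edges; this is the kind of parameter sensitivity the abstract contrasts.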
Affiliation(s)
- Sinan Li
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing, China
- Po-Hsiang Tsui
- Department of Medical Imaging and Radiological Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan; Institute for Radiological Research, Chang Gung University, Taoyuan, Taiwan; Division of Pediatric Gastroenterology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
- Weiwei Wu
- College of Biomedical Engineering, Capital Medical University, Beijing, China
- Shuicai Wu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing, China
- Zhuhuang Zhou
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing, China
3
Ma X, Han X, Zhang L. An Improved k-Nearest Neighbor Algorithm for Recognition and Classification of Thyroid Nodules. J Ultrasound Med 2024. [PMID: 38400537 DOI: 10.1002/jum.16429] [Received: 07/22/2023] [Revised: 01/17/2024] [Accepted: 01/28/2024] [Indexed: 02/25/2024]
Abstract
OBJECTIVES: To complete the task of automatic recognition and classification of thyroid nodules and to solve the problem of high classification error rates when the samples are imbalanced. METHODS: An improved k-nearest neighbor (KNN) algorithm is proposed, and a method for automatic thyroid nodule classification based on the improved KNN algorithm is established. In the improved KNN algorithm, we consider not only the number of class labels for the various classes of data among the k nearest neighbors, but also the corresponding weights, and we use the Minkowski distance measure instead of the Euclidean distance measure. RESULTS: A total of 508 ultrasound images of thyroid nodules, including 415 benign nodules and 93 malignant nodules, were used in the paper. Experimental results show that the improved KNN has 0.872549 accuracy, 0.867347 precision, 1 recall, and 0.928962 F1-score. We also considered the influence of different distance weights, the value of k, and different distance measures on the classification results. CONCLUSIONS: A comparison shows that our method performs better than the traditional KNN and other classical machine learning methods.
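The combination described above (neighbor votes scaled by weights, with a Minkowski distance) can be sketched as follows. This is not the authors' algorithm, whose exact weighting scheme is not given in the abstract; the inverse-distance vote, the `class_weight` dictionary, the exponent `p=3`, and the toy two-feature data are assumptions chosen only to illustrate the idea on an imbalanced sample.

```python
from collections import defaultdict


def minkowski(a, b, p=3.0):
    """Minkowski distance of order p (p=2 gives the Euclidean distance)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def weighted_knn_predict(train_X, train_y, query, k=3, p=3.0, class_weight=None):
    """Predict a label from the k nearest neighbors.

    Each neighbor votes with weight class_weight[label] / distance, so rare
    classes can be up-weighted and closer neighbors count more.
    """
    class_weight = class_weight or {}
    neighbors = sorted(
        (minkowski(x, query, p), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += class_weight.get(label, 1.0) / (dist + 1e-9)
    return max(votes, key=votes.get)


# Hypothetical imbalanced toy data: four benign points, one malignant point.
train_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (1.0, 1.0)]
train_y = ["benign", "benign", "benign", "benign", "malignant"]
query = (0.7, 0.7)

plain = weighted_knn_predict(train_X, train_y, query, k=3)
reweighted = weighted_knn_predict(
    train_X, train_y, query, k=3, class_weight={"malignant": 2.0}
)
print(plain, reweighted)
```

In this toy data the single malignant neighbor is the closest one but is outvoted by two benign neighbors until the minority class is up-weighted, which is the failure mode on imbalanced samples that the abstract targets.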
Affiliation(s)
- Xuesi Ma
- School of Mathematics and Information Science, Henan Polytechnic University, Jiaozuo, China
- Xiang Han
- School of Mathematics and Information Science, Henan Polytechnic University, Jiaozuo, China
- Lina Zhang
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
4
Shakeel CS, Khan SJ. Machine learning (ML) techniques as effective methods for evaluating hair and skin assessments: A systematic review. Proc Inst Mech Eng H 2024; 238:132-148. [PMID: 38156410 DOI: 10.1177/09544119231216290] [Indexed: 12/30/2023]
Abstract
Machine Learning (ML) techniques provide the ability to effectively evaluate and analyze human skin and hair assessments. The aim of this study is to systematically review the effectiveness of applying ML methods and Artificial Intelligence (AI) techniques to evaluate hair and skin assessments. PubMed, Web of Science, IEEE Xplore, and Science Direct were searched to retrieve research publications between 1 January 2010 and 31 March 2020 using appropriate keywords such as "hair and skin analysis." Following screening, 20 peer-reviewed publications were selected for inclusion in this systematic review. The analysis demonstrated that the prevalent ML methods comprised Support Vector Machine (SVM), k-nearest Neighbor, and Artificial Neural Networks (ANN). ANNs were observed to yield the highest accuracy of 95%, followed by SVM at 90%. These techniques were most commonly applied for drafting framework assessments such as that of Melanoma. Values of parameters such as Sensitivity, Specificity, and Area under the Curve (AUC) were extracted from the studies, and relevant inferences were made by comparing them. ANNs were observed to yield the highest sensitivity of 82.30% as well as a specificity of 96.90%. This systematic review therefore summarizes how ML techniques have been employed for the analysis and evaluation of hair and skin assessments.
Affiliation(s)
- Saad Jawaid Khan
- Department of Biomedical Engineering, Ziauddin University (ZUFESTM), Karachi, Pakistan
5
Plain B, Pielage H, Kramer SE, Richter M, Saunders GH, Versfeld NJ, Zekveld AA, Bhuiyan TA. Combining Cardiovascular and Pupil Features Using k-Nearest Neighbor Classifiers to Assess Task Demand, Social Context, and Sentence Accuracy During Listening. Trends Hear 2024; 28:23312165241232551. [PMID: 38549351 PMCID: PMC10981225 DOI: 10.1177/23312165241232551] [Received: 03/05/2023] [Revised: 01/04/2024] [Accepted: 01/25/2024] [Indexed: 04/01/2024] Open
Abstract
In daily life, both acoustic factors and social context can affect listening effort investment. In laboratory settings, information about listening effort has been deduced from pupil and cardiovascular responses independently. The extent to which these measures can jointly predict listening-related factors is unknown. Here we combined pupil and cardiovascular features to predict acoustic and contextual aspects of speech perception. Data were collected from 29 adults (mean = 64.6 years, SD = 9.2) with hearing loss. Participants performed a speech perception task at two individualized signal-to-noise ratios (corresponding to 50% and 80% of sentences correct) and in two social contexts (the presence and absence of two observers). Seven features were extracted per trial: baseline pupil size, peak pupil dilation, mean pupil dilation, interbeat interval, blood volume pulse amplitude, pre-ejection period and pulse arrival time. These features were used to train k-nearest neighbor classifiers to predict task demand, social context and sentence accuracy. The k-fold cross validation on the group-level data revealed above-chance classification accuracies: task demand, 64.4%; social context, 78.3%; and sentence accuracy, 55.1%. However, classification accuracies diminished when the classifiers were trained and tested on data from different participants. Individually trained classifiers (one per participant) performed better than group-level classifiers: 71.7% (SD = 10.2) for task demand, 88.0% (SD = 7.5) for social context, and 60.0% (SD = 13.1) for sentence accuracy. We demonstrated that classifiers trained on group-level physiological data to predict aspects of speech perception generalized poorly to novel participants. Individually calibrated classifiers hold more promise for future applications.
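The gap between pooled k-fold accuracy and accuracy on unseen participants comes down to how trials are split. A minimal sketch of a leave-one-participant-out split is shown below; the function name `leave_one_group_out` and the participant labels are hypothetical, and the abstract does not say which grouped validation scheme the authors used.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per group.

    All trials from the held-out participant go to the test set, so the
    classifier is never trained on data from the person it is tested on.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    for held_out, test_idx in by_group.items():
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical trial-to-participant mapping: two trials per participant.
groups = ["p1", "p1", "p2", "p2", "p3", "p3"]
folds = list(leave_one_group_out(groups))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

A plain k-fold split over pooled trials would instead let trials from the same participant appear on both sides of the split, which can inflate accuracy relative to the novel-participant case.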
Affiliation(s)
- Bethany Plain
- Otolaryngology Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands
- Eriksholm Research Centre, Snekkersten, Denmark
- Hidde Pielage
- Otolaryngology Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands
- Eriksholm Research Centre, Snekkersten, Denmark
- Sophia E. Kramer
- Otolaryngology Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands
- Michael Richter
- School of Psychology, Liverpool John Moores University, Liverpool, UK
- Gabrielle H. Saunders
- Manchester Centre for Audiology and Deafness (ManCAD), University of Manchester, Manchester, UK
- Niek J. Versfeld
- Otolaryngology Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands
- Adriana A. Zekveld
- Otolaryngology Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands
6
Khosravani P, Baghernejad M, Moosavi AA, Rezaei M. Digital mapping and spatial modeling of some soil physical and mechanical properties in a semi-arid region of Iran. Environ Monit Assess 2023; 195:1367. [PMID: 37875717 DOI: 10.1007/s10661-023-11980-6] [Received: 06/26/2023] [Accepted: 10/11/2023] [Indexed: 10/26/2023]
Abstract
The soil's physical and mechanical (SPM) properties have significant impacts on soil processes, such as water flow, nutrient movement, aeration, microbial activity, erosion, and root growth. To digitally map some SPM properties at four global standard depths, three machine learning algorithms (MLA), namely random forest, Cubist, and k-nearest neighbor, were employed. A total of 200 observation points were designed for a field survey across the Marvdasht Plain in Fars Province, Iran. After sampling from topsoil (0 to 30 cm) and subsoil (30 to 60 cm) depths, the samples were transferred to the laboratory to determine the mean weight diameter (MWD) and geometric mean diameter (GMD) of aggregates. In addition, shear strength (SS) and penetration resistance (PR) were measured directly during the field survey. In parallel, 79 environmental factors were prepared from topographic and remote sensing data. Four soil variables were also included in the modeling process, as they were co-located with SPM properties based on expert opinion. To select the most influential covariates, the variance inflation factor (VIF) and Boruta methods were employed. Two covariate dataset scenarios were used to assess the impact of soil and environmental factors on the modeling of SPM properties: SPM and environmental covariates (scenario 1), and SPM, environmental covariates, and soil variables (scenario 2). From all covariates, nine soil and environmental factors were selected for modeling the SPM properties, of which four were soil variables, three were related to remote sensing, and two had topographic sources. The results indicated that scenario 2 outperformed scenario 1 at all standard depths. The findings suggested that clay and SOM are key factors in predicting SPM, highlighting the importance of considering soil variables in addition to environmental covariates for enhancing the accuracy of machine learning prediction. The k-nearest neighbor algorithm was found to be highly effective in predicting SPM, while the random forest algorithm yielded the highest R² value (0.92) for penetration resistance at the 15-30 cm depth. Overall, the approach used in this research has the potential to be extended beyond the Marvdasht Plain of Fars Province, Iran, to other regions worldwide with comparable soil-forming factors. Moreover, this study provides a valuable framework for the digital mapping of SPM properties, serving as a guide for future studies seeking to predict SPM properties. Globally, the output of this research is significant for soil management and conservation efforts and can facilitate the development of sustainable agricultural practices.
Affiliation(s)
- Pegah Khosravani
- Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran
- Majid Baghernejad
- Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran
- Ali Akbar Moosavi
- Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran
- Meisam Rezaei
- Soil and Water Research Institute (SWRI), Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
7
Alkhammash EH, Assiri SA, Nemenqani DM, Althaqafi RMM, Hadjouni M, Saeed F, Elshewey AM. Application of Machine Learning to Predict COVID-19 Spread via an Optimized BPSO Model. Biomimetics (Basel) 2023; 8:457. [PMID: 37887588 PMCID: PMC10604133 DOI: 10.3390/biomimetics8060457] [Received: 08/26/2023] [Revised: 09/21/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023] Open
Abstract
During the coronavirus disease (COVID-19) pandemic, statistics showed that the number of affected cases differed from one country to another and also from one city to another. Therefore, in this paper, we provide an enhanced model for predicting COVID-19 samples in different regions of Saudi Arabia (high-altitude and sea-level areas). The model is developed in several stages and was successfully trained and tested using two datasets collected from Taif city (high-altitude area) and Jeddah city (sea-level area) in Saudi Arabia. Binary particle swarm optimization (BPSO) is used in this study for feature selection with three different machine learning models, i.e., the random forest model, gradient boosting model, and naive Bayes model. Several evaluation metrics, including accuracy, training score, testing score, F-measure, recall, precision, and receiver operating characteristic (ROC) curve, were calculated to verify the performance of the three machine learning models on these datasets. The experimental results demonstrated that the gradient boosting model gives better results than the random forest and naive Bayes models, with an accuracy of 94.6% on the Taif city dataset. For the Jeddah city dataset, the results demonstrated that the random forest model outperforms the gradient boosting and naive Bayes models, with an accuracy of 95.5%. In terms of accuracy, the enhanced model achieved better results on the Jeddah city dataset than on the Taif city dataset.
Affiliation(s)
- Eman H. Alkhammash
- Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Sara Ahmad Assiri
- Otolaryngology-Head and Neck Surgery Department, King Faisal Hospital, P.O. Box 11099, Taif 21944, Saudi Arabia
- Dalal M. Nemenqani
- College of Medicine, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Raad M. M. Althaqafi
- College of Medicine, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Myriam Hadjouni
- Department of Computer Sciences, College of Computer and Information Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Faisal Saeed
- DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Ahmed M. Elshewey
- Faculty of Computers and Information, Computer Science Department, Suez University, Suez 43533, Egypt
8
Houssein EH, Samee NA, Mahmoud NF, Hussain K. Dynamic Coati Optimization Algorithm for Biomedical Classification Tasks. Comput Biol Med 2023; 164:107237. [PMID: 37467535 DOI: 10.1016/j.compbiomed.2023.107237] [Received: 05/12/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Medical datasets are primarily made up of numerous pointless and redundant elements in a collection of patient records. None of these characteristics are necessary for a medical decision-making process. Conversely, a large amount of data leads to increased dimensionality and decreased classifier performance in machine learning. Numerous approaches have recently been proposed to address this issue, and the results indicate that feature selection can be a successful remedy. To meet the various needs of input patterns, medical diagnostic tasks typically involve learning a suitable categorization model. The classification performance of the k-Nearest Neighbors (kNN) classifier is typically decreased by an abundance of irrelevant features in the input variables. To simplify the kNN classifier, essential attributes of the input variables have been searched using the feature selection approach. This paper presents a dynamic form of the Coati Optimization Algorithm (DCOA) as a feature selection technique in which each iteration of the optimization process involves the introduction of a different feature. We enhance the exploration and exploitation capability of DCOA by employing dynamic opposing candidate solutions. A notable feature of DCOA is that, unlike most popular metaheuristic algorithms, it does not require any preparatory parameter fine-tuning. The CEC'22 test suite and nine medical datasets with various dimension sizes were used to evaluate the performance of the original COA and the proposed dynamic version. The statistical results were validated using the Bonferroni-Dunn test and Kendall's W test and showed the superiority of DCOA over seven well-known metaheuristic algorithms, with an overall accuracy of 89.7%, a feature selection of 24%, a sensitivity of 93.35%, a specificity of 96.81%, and a precision of 93.90%.
Affiliation(s)
- Essam H Houssein
- Faculty of Computers and Information, Minia University, Minia, Egypt
- Nagwan Abdel Samee
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Noha F Mahmoud
- Rehabilitation Sciences Department, Health and Rehabilitation Sciences College, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Kashif Hussain
- Department of Science and Engineering, Solent University, East Park Terrace, Southampton, SO14 0YN, United Kingdom
9
Shi S, Xu Y, Xu X, Mo X, Ding J. A Preprocessing Manifold Learning Strategy Based on t-Distributed Stochastic Neighbor Embedding. Entropy (Basel) 2023; 25:1065. [PMID: 37510011 PMCID: PMC10378244 DOI: 10.3390/e25071065] [Received: 04/24/2023] [Revised: 07/01/2023] [Accepted: 07/05/2023] [Indexed: 07/30/2023]
Abstract
In machine learning and data analysis, dimensionality reduction and high-dimensional data visualization can be accomplished by manifold learning using a t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm. We significantly improve this manifold learning scheme by introducing a preprocessing strategy for the t-SNE algorithm. In our preprocessing, we exploit Laplacian eigenmaps to reduce the high-dimensional data first, which can aggregate each data cluster and reduce the Kullback-Leibler divergence (KLD) remarkably. Moreover, the k-nearest-neighbor (KNN) algorithm is also involved in our preprocessing to enhance the visualization performance and reduce the computation and space complexity. We compare the performance of our strategy with that of the standard t-SNE on the MNIST dataset. The experiment results show that our strategy exhibits a stronger ability to separate different clusters as well as keep data of the same kind much closer to each other. Moreover, the KLD can be reduced by about 30% at the cost of increasing the complexity in terms of runtime by only 1-2%.
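The Kullback-Leibler divergence reported above is the quantity t-SNE minimizes between the neighbor-similarity distribution in the original space and the one in the embedding. As a reminder of what the figure measures, here is a minimal sketch for two discrete distributions; the function name `kl_divergence` and the example probabilities are illustrative assumptions, not values from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical similarity distributions over the same four point pairs.
p_high_dim = [0.4, 0.3, 0.2, 0.1]
q_embedding = [0.25, 0.25, 0.25, 0.25]

kld = kl_divergence(p_high_dim, q_embedding)
kld_reverse = kl_divergence(q_embedding, p_high_dim)
print(f"KL(P||Q) = {kld:.4f} nats, KL(Q||P) = {kld_reverse:.4f} nats")
```

The divergence is zero only when the two distributions match and is not symmetric in its arguments, so a lower value indicates an embedding whose neighbor similarities track the original data more closely.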
Affiliation(s)
- Sha Shi
- State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
- Yefei Xu
- State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
- Xiaoyang Xu
- State Key Laboratory of Integrated Services Network, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
- Xiaofan Mo
- National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, Beijing 100101, China
- Jun Ding
- Institute of Information Sensing, Xidian University, 2 South TaiBai Road, Xi'an 710071, China
10
Lin M, Wen K, Zhu X, Zhao H, Sun X. Graph Autoencoder with Preserving Node Attribute Similarity. Entropy (Basel) 2023; 25:e25040567. [PMID: 37190356 PMCID: PMC10138145 DOI: 10.3390/e25040567] [Received: 02/12/2023] [Revised: 03/17/2023] [Accepted: 03/24/2023] [Indexed: 05/17/2023]
Abstract
The graph autoencoder (GAE) is a powerful graph representation learning tool in an unsupervised learning manner for graph data. However, most existing GAE-based methods typically focus on preserving the graph topological structure by reconstructing the adjacency matrix while ignoring the preservation of the attribute information of nodes. Thus, the node attributes cannot be fully learned and the ability of the GAE to learn higher-quality representations is weakened. To address the issue, this paper proposes a novel GAE model that preserves node attribute similarity. The structural graph and the attribute neighbor graph, which is constructed based on the attribute similarity between nodes, are integrated as the encoder input using an effective fusion strategy. In the encoder, the attributes of the nodes can be aggregated both in their structural neighborhood and by their attribute similarity in their attribute neighborhood. This allows performing the fusion of the structural and node attribute information in the node representation by sharing the same encoder. In the decoder module, the adjacency matrix and the attribute similarity matrix of the nodes are reconstructed using dual decoders. The cross-entropy loss of the reconstructed adjacency matrix and the mean-squared error loss of the reconstructed node attribute similarity matrix are used to update the model parameters and ensure that the node representation preserves the original structural and node attribute similarity information. Extensive experiments on three citation networks show that the proposed method outperforms state-of-the-art algorithms in link prediction and node clustering tasks.
Affiliation(s)
- Mugang Lin
- College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China
- Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang 421002, China
- Kunhui Wen
- College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China
- Xuanying Zhu
- College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China
- Huihuang Zhao
- College of Computer Science and Technology, Hengyang Normal University, Hengyang 421002, China
- Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang 421002, China
- Xianfang Sun
- School of Computer Science and Informatics, Cardiff University, Cardiff CF24 4AG, UK
11
Shachaf LI, Roberts E, Cahan P, Xiao J. Gene regulation network inference using k-nearest neighbor-based mutual information estimation: revisiting an old DREAM. BMC Bioinformatics 2023; 24:84. [PMID: 36879188 PMCID: PMC9990267 DOI: 10.1186/s12859-022-05047-5] [Received: 12/14/2021] [Accepted: 11/08/2022] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND: A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups have worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline, as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength, and underlying distributions, and often requires laborious and, at times, ad hoc optimization. RESULTS: In this work, we first show that estimating the MI of a bi- and tri-variate Gaussian distribution using k-nearest neighbor (kNN) MI estimation results in significant error reduction compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov-Stögbauer-Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking, we show that a new inference algorithm, CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods. CONCLUSIONS: Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction, which combines CMIA and the KSG-MI estimator, achieves an improvement of 20-35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.
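For readers unfamiliar with the KSG estimator named above, the following is a minimal sketch of its first variant for two scalar variables, checked against the closed-form MI of a bivariate Gaussian, MI = -0.5 * log(1 - rho^2). It is not the authors' implementation: the brute-force neighbor search, the helper names, the sample size, and the correlation value are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma function psi(n) for a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def ksg_mutual_information(xs, ys, k=4):
    """KSG (variant 1) estimate of I(X; Y) in nats, brute-force O(n^2)."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        # Max-norm distance from point i to every other point in joint space.
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j]))
            for j in range(n)
            if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest joint neighbor
        # Count marginal neighbors strictly closer than eps.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


# Hypothetical test data: bivariate Gaussian with correlation rho.
rng = random.Random(1)
rho = 0.8
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]

mi_estimate = ksg_mutual_information(xs, ys, k=4)
mi_exact = -0.5 * math.log(1.0 - rho ** 2)
print(f"KSG estimate: {mi_estimate:.3f} nats, closed form: {mi_exact:.3f} nats")
```

Unlike fixed binning, the neighborhood size adapts to the local sample density, which is the property the abstract credits for the error reduction. A practical implementation would replace the brute-force search with a spatial index.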
Affiliation(s)
- Lior I Shachaf
- Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- Elijah Roberts
- Department of Biophysics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA
- 10x Genomics, 6230 Stoneridge Mall Road, Pleasanton, CA, 94588-3260, USA
- Patrick Cahan
- Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Institute for Cell Engineering, Johns Hopkins School of Medicine, 733 N. Broadway, Baltimore, MD, 21205, USA
- Jie Xiao
- Department of Biophysics and Biophysical Chemistry, Johns Hopkins School of Medicine, 725 N. Wolfe Street, WBSB 708, Baltimore, MD, 21205, USA
12
Suliman A, Mowla MR, Alivar A, Carlson C, Prakash P, Natarajan B, Warren S, Thompson DE. Effects of Ballistocardiogram Peak Detection Jitters on the Quality of Heart Rate Variability Features: A Simulation-Based Case Study in the Context of Sleep Staging. Sensors (Basel) 2023; 23:2693. [PMID: 36904896 PMCID: PMC10007206 DOI: 10.3390/s23052693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 02/23/2023] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
Heart rate variability (HRV) features support several clinical applications, including sleep staging, and ballistocardiograms (BCGs) can be used to unobtrusively estimate these features. Electrocardiography is the traditional clinical standard for HRV estimation, but BCGs and electrocardiograms (ECGs) yield different estimates for heartbeat intervals (HBIs), leading to differences in calculated HRV parameters. This study examines the viability of using BCG-based HRV features for sleep staging by quantifying the impact of these timing differences on the resulting parameters of interest. We introduced a range of synthetic time offsets to simulate the differences between BCG- and ECG-based heartbeat intervals, and the resulting HRV features are used to perform sleep staging. Subsequently, we draw a relationship between the mean absolute error in HBIs and the resulting sleep-staging performances. We also extend our previous work in heartbeat interval identification algorithms to demonstrate that our simulated timing jitters are close representatives of errors between heartbeat interval measurements. This work indicates that BCG-based sleep staging can produce accuracies comparable to ECG-based techniques such that at an HBI error range of up to 60 ms, the sleep-scoring error could increase from 17% to 25% based on one of the scenarios we examined.
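The simulation idea in this abstract (perturb beat times, then measure how far the heartbeat intervals and an HRV feature drift) can be sketched as follows. This is an illustrative sketch only: the uniform offset distribution, the synthetic beat train, and the use of RMSSD as the example HRV feature are assumptions made here, since the abstract does not specify them.

```python
import math
import random


def heartbeat_intervals(beat_times_ms):
    """Differences between consecutive beat times (HBIs)."""
    return [b - a for a, b in zip(beat_times_ms, beat_times_ms[1:])]


def add_jitter(beat_times_ms, max_offset_ms, rng):
    """Shift every beat by an independent uniform offset in [-max, +max] ms."""
    return [t + rng.uniform(-max_offset_ms, max_offset_ms) for t in beat_times_ms]


def mean_absolute_error(reference, estimate):
    """Mean absolute difference between two equally long interval series."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def rmssd(intervals):
    """Root mean square of successive interval differences, a common HRV feature."""
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


rng = random.Random(42)
# Hypothetical reference beat times: roughly 850 ms apart with mild variability.
beats, t = [], 0.0
for _ in range(300):
    beats.append(t)
    t += 850 + rng.gauss(0, 40)
reference_hbi = heartbeat_intervals(beats)

for max_offset in (0, 10, 30, 60):
    jittered_hbi = heartbeat_intervals(add_jitter(beats, max_offset, rng))
    print(
        "max offset (ms):", max_offset,
        "| HBI MAE (ms):", round(mean_absolute_error(reference_hbi, jittered_hbi), 1),
        "| RMSSD (ms):", round(rmssd(jittered_hbi), 1),
    )
```

Sweeping the offset range in this way gives one curve of HBI error against feature distortion; in the study, the distorted features would then be fed to the sleep-staging classifier.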
13
Stój A, Czernecki T, Domagała D. Authentication of Polish Red Wines Produced from Zweigelt and Rondo Grape Varieties Based on Volatile Compounds Analysis in Combination with Machine Learning Algorithms: Hotrienol as a Marker of the Zweigelt Variety. Molecules 2023; 28:molecules28041961. [PMID: 36838950 PMCID: PMC9967794 DOI: 10.3390/molecules28041961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Revised: 02/10/2023] [Accepted: 02/16/2023] [Indexed: 02/22/2023]
Abstract
The aim of this study was to determine volatile compounds in red wines of Zweigelt and Rondo varieties using HS-SPME/GC-MS and to find a marker and/or a classification model for the assessment of varietal authenticity. The wines were produced by using five commercial yeast strains and two types of malolactic fermentation. Sixty-seven volatile compounds were tentatively identified in the test wines; they represented several classes: 9 acids, 24 alcohols, 2 aldehydes, 19 esters, 2 furan compounds, 2 ketones, 1 sulfur compound and 8 terpenes. 3,7-dimethyl-1,5,7-octatrien-3-ol (hotrienol) was found to be a variety marker for Zweigelt wines, since it was detected in all the Zweigelt wines, but was not present in the Rondo wines at all. The relative concentrations of volatiles were used as an input data set, divided into two subsets (training and testing), to the support vector machine (SVM) and k-nearest neighbor (kNN) algorithms. Both machine learning methods yielded models with the highest possible classification accuracy (100%) when the relative concentrations of all the test compounds or alcohols alone were used as input data. An evaluation of the importance value of subsets consisting of six volatile compounds with the highest potential to distinguish between the Zweigelt and Rondo varieties revealed that SVM and kNN yielded the best classification models (F-score of 1, accuracy of 100%) when 3-ethyl-4-methylpentan-1-ol or 3,7-dimethyl-1,5,7-octatrien-3-ol (hotrienol) or subsets containing one or both of them were used. Moreover, the best SVM model (F-score of 1) was built with a subset containing 2-phenylethyl acetate and 3-(methylsulfanyl)propan-1-ol.
Affiliation(s)
- Anna Stój
  - Department of Biotechnology, Microbiology and Human Nutrition, Faculty of Food Science and Biotechnology, University of Life Sciences, 8 Skromna Street, 20-704 Lublin, Poland
  - Correspondence: (A.S.); (D.D.)
- Tomasz Czernecki
  - Department of Biotechnology, Microbiology and Human Nutrition, Faculty of Food Science and Biotechnology, University of Life Sciences, 8 Skromna Street, 20-704 Lublin, Poland
- Dorota Domagała
  - Department of Applied Mathematics and Computer Science, Faculty of Production Engineering, University of Life Sciences in Lublin, 28 Głęboka Street, 20-612 Lublin, Poland
  - Correspondence: (A.S.); (D.D.)
14
Jalti F, Hajji B, Acri A, Calì M. An Advanced Rider-Cornering-Assistance System for PTW Vehicles Developed Using ML KNN Method. Sensors (Basel) 2023; 23:1540. [PMID: 36772580 PMCID: PMC9920225 DOI: 10.3390/s23031540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 01/25/2023] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
The dynamic behavior of a Powered Two-Wheeler (PTW) is much more complicated than that of a car because of the strong coupling between the longitudinal and lateral dynamics produced by the large roll angles. This makes the analysis of the dynamics, and therefore the design and synthesis of the controller, particularly complex and difficult. In relation to assistance in dangerous situations, several recent manuscripts have suggested devices that limit cornering velocity by proposing restrictive models. However, these models can be rejected by users of PTW vehicles, as they significantly limit vehicle performance. In the present work, the authors developed an Advanced Rider-cornering Assistance System (ARAS) based on the skills learned by riders running across curvilinear trajectories using Artificial Intelligence (AI) and Neural Network (NN) techniques. New algorithms that allow the value of velocity to be estimated with a prediction accuracy of up to 99.06% were developed using the K-Nearest Neighbor (KNN) Machine Learning (ML) technique.
Affiliation(s)
- Fakhreddine Jalti
  - Laboratory of Renewable Energy, Embedded System and Information Processing, National School of Applied Sciences, Mohammed First University, Oujda 60000, Morocco
- Bekkay Hajji
  - Laboratory of Renewable Energy, Embedded System and Information Processing, National School of Applied Sciences, Mohammed First University, Oujda 60000, Morocco
- Alberto Acri
  - Department of Engineering, University of Messina, 98158 Messina, Italy
- Michele Calì
  - Electric, Electronics and Computer Engineering Department, University of Catania, 95125 Catania, Italy
15
Trang NTH, Long KQ, An PL, Dang TN. Development of an Artificial Intelligence-Based Breast Cancer Detection Model by Combining Mammograms and Medical Health Records. Diagnostics (Basel) 2023; 13. [PMID: 36766450 DOI: 10.3390/diagnostics13030346] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/12/2022] [Revised: 01/10/2023] [Accepted: 01/13/2023] [Indexed: 01/19/2023]
Abstract
BACKGROUND Artificial intelligence (AI)-based computational models that analyze breast cancer have been developed for decades. The present study investigated the accuracy and efficiency of combining mammography images and clinical records for breast cancer detection using machine learning and deep learning classifiers. METHODS The study used 731 images from 357 women who underwent at least one mammogram and had clinical records for at least six months before mammography. The model was trained on mammograms and clinical variables to discriminate benign and malignant lesions. Multiple pre-trained deep CNN models, including X-ception, VGG16, ResNet-v2, ResNet50, and CNN3, were employed to detect cancer in mammograms. Machine learning models were constructed using k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), Artificial Neural Network (ANN), and gradient boosting machine (GBM) on the clinical dataset. RESULTS The combined model achieved an accuracy of 84.5%, with a specificity of 78.1% at a sensitivity of 89.7% and an AUC of 0.88. When trained on mammography image data alone, the model achieved a slightly lower score than the combined model (accuracy, 72.5% vs. 84.5%, respectively). CONCLUSIONS A breast cancer-detection model combining machine learning and deep learning models was developed in this study with satisfactory results, and this model has potential clinical applications.
16
Fuadah YN, Pramudito MA, Lim KM. An Optimal Approach for Heart Sound Classification Using Grid Search in Hyperparameter Optimization of Machine Learning. Bioengineering (Basel) 2022; 10. [PMID: 36671616 DOI: 10.3390/bioengineering10010045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 12/13/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
Heart-sound auscultation is one of the most widely used approaches for detecting cardiovascular disorders. Diagnosing abnormalities of heart sound using a stethoscope depends on the physician's skill and judgment. Several studies have shown promising results in automatically detecting cardiovascular disorders based on heart-sound signals. However, accuracy needs to improve, as automated heart-sound classification aids in the early detection and prevention of the dangerous effects of cardiovascular problems. In this study, an optimal heart-sound classification method based on machine learning technologies for cardiovascular disease prediction is proposed. It consists of three steps: pre-processing that sets the 5 s duration of the PhysioNet Challenge 2016 and 2022 datasets, feature extraction using Mel frequency cepstrum coefficients (MFCC), and classification using grid search for hyperparameter tuning of several classifier algorithms including k-nearest neighbor (K-NN), random forest (RF), artificial neural network (ANN), and support vector machine (SVM). Five-fold cross-validation was used to evaluate the performance of the proposed method. The best model obtained classification accuracies of 95.78% and 76.31% on PhysioNet Challenge 2016 and 2022, respectively. The findings demonstrate that the suggested approach obtained excellent classification results using PhysioNet Challenge 2016 and showed promising results using PhysioNet Challenge 2022. Therefore, the proposed method has the potential to serve as an additional tool that assists medical practitioners in diagnosing heart-sound abnormalities.
17
AlMazrua H, AlShamlan H. A New Algorithm for Cancer Biomarker Gene Detection Using Harris Hawks Optimization. Sensors (Basel) 2022; 22:s22197273. [PMID: 36236372 PMCID: PMC9572901 DOI: 10.3390/s22197273] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/28/2022] [Revised: 09/01/2022] [Accepted: 09/09/2022] [Indexed: 05/29/2023]
Abstract
This paper presents two novel swarm intelligence algorithms for gene selection, HHO-SVM and HHO-KNN. Both of these algorithms are based on Harris Hawks Optimization (HHO), one in conjunction with support vector machines (SVM) and the other in conjunction with k-nearest neighbors (k-NN). In both algorithms, the goal is to determine a small gene subset that can be used to classify samples with a high degree of accuracy. The proposed algorithms are divided into two phases. To obtain an accurate gene set and to deal with the challenge of high-dimensional data, the redundancy analysis and relevance calculation are conducted in the first phase. To solve the gene selection problem, the second phase applies SVM and k-NN with leave-one-out cross-validation. A performance evaluation was performed on six microarray data sets using the two proposed algorithms. A comparison of the two proposed algorithms with several known algorithms indicates that both of them perform quite well in terms of classification accuracy and the number of selected genes.
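The second phase described in this abstract scores a candidate gene subset by classifying samples with k-NN under leave-one-out cross-validation. A minimal sketch of that scoring step is shown below; it omits the Harris Hawks search itself, and the function names, the Euclidean distance, and the toy data in the usage note are illustrative assumptions rather than details from the paper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority label among the k training rows closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, gene_subset, k=3):
    """Leave-one-out accuracy of k-NN restricted to the columns in gene_subset."""
    reduced = [[row[g] for g in gene_subset] for row in X]
    correct = 0
    for i in range(len(reduced)):
        # Hold out sample i and train on everything else.
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, reduced[i], k) == y[i]:
            correct += 1
    return correct / len(reduced)


# Hypothetical expression matrix: column 0 separates the classes, column 1 does not.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0], [0.3, 3.0],
     [10.0, 5.1], [10.2, 1.1], [10.1, 9.1], [10.3, 3.1]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
print("informative gene:", loocv_accuracy(X, y, [0], k=3))
print("noise gene      :", loocv_accuracy(X, y, [1], k=1))
```

An optimizer such as HHO would call a function like `loocv_accuracy` as (part of) its fitness, typically traded off against the subset size so that small, accurate gene sets are preferred.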
18
Meno L, Escuredo O, Abuley IK, Seijo MC. Importance of Meteorological Parameters and Airborne Conidia to Predict Risk of Alternaria on a Potato Crop Ambient Using Machine Learning Algorithms. Sensors (Basel) 2022; 22:s22187063. [PMID: 36146412 PMCID: PMC9500921 DOI: 10.3390/s22187063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 09/09/2022] [Accepted: 09/15/2022] [Indexed: 05/14/2023]
Abstract
Secondary infections of early blight during potato crop season are conditioned by aerial inoculum. However, although aerobiological studies have focused on understanding the key factors that influence the spore concentration in the air, less work has been carried out to predict when critical concentrations of conidia occur. Therefore, the goals of this study were to understand the key weather variables that affect the hourly and daily conidia dispersal of Alternaria solani and A. alternata in a potato field, and to use these weather factors in different machine learning (ML) algorithms to predict the daily conidia levels. This study showed that the number of conidia per hour is influenced by the weather conditions that characterize the hour, but not by the hour of the day. Specifically, relative humidity and solar radiation were the most relevant weather parameters influencing the conidia concentration in the air, and together in a linear model they explained 98% of the hourly variation of this concentration. Moreover, the dew point temperature three days before was the weather variable with the strongest effect on conidia per day. An improved prediction of Alternaria conidia level was achieved via ML algorithms when the conidia levels of previous days were considered in the analysis. Among the ML algorithms applied, the CART model, with an accuracy of 86%, was the best at predicting daily conidia level.
Affiliation(s)
- Laura Meno
  - Department of Vegetal Biology and Soil Sciences, Faculty of Sciences, University of Vigo, As Lagoas, 32004 Ourense, Spain
- Olga Escuredo
  - Department of Vegetal Biology and Soil Sciences, Faculty of Sciences, University of Vigo, As Lagoas, 32004 Ourense, Spain
- Isaac Kwesi Abuley
  - Department of Agroecology, Flakkebjerg Research Center, Aarhus University, Forsøgsvej 1, 4200 Slagelse, Denmark
- María Carmen Seijo
  - Department of Vegetal Biology and Soil Sciences, Faculty of Sciences, University of Vigo, As Lagoas, 32004 Ourense, Spain
19
Bao G, Lin M, Sang X, Hou Y, Liu Y, Wu Y. Classification of Dysphonic Voices in Parkinson's Disease with Semi-Supervised Competitive Learning Algorithm. Biosensors (Basel) 2022; 12:502. [PMID: 35884305 PMCID: PMC9312485 DOI: 10.3390/bios12070502] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2022] [Revised: 07/04/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
This article proposes a novel semi-supervised competitive learning (SSCL) algorithm for vocal pattern classification in Parkinson’s disease (PD). The acoustic parameters of voice records were grouped into the families of jitter, shimmer, harmonic-to-noise, frequency, and nonlinear measures. The linear correlations were computed within each acoustic parameter family. According to the correlation matrix results, the jitter, shimmer, and harmonic-to-noise parameters were highly correlated in terms of Pearson’s correlation coefficients. Then, the principal component analysis (PCA) technique was implemented to eliminate the redundant dimensions of the acoustic parameters for each family. The Mann-Whitney-Wilcoxon hypothesis test was used to evaluate the significance of the difference in the PCA-projected features between the healthy subjects and PD patients. Eight dominant PCA-projected features were selected based on the eigenvalue threshold criterion and the statistical significance level (p < 0.05) of the hypothesis test. The SSCL algorithm proposed in this paper included the procedures of competitive prototype seed selection, K-means optimization, and nearest neighbor classification. The pattern classification experimental results showed that the proposed SSCL method provides excellent diagnostic performance in terms of accuracy (0.838), recall (0.825), specificity (0.85), precision (0.846), F-score (0.835), Matthews correlation coefficient (0.675), area under the receiver operating characteristic curve (0.939), and Kappa coefficient (0.675), which were consistently better than the results of conventional KNN or SVM classifiers.
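For readers unfamiliar with how the threshold-based metrics listed in this abstract relate to one another, the sketch below computes them from a binary confusion matrix. The counts used (33 true positives, 7 false negatives, 34 true negatives, 6 false positives) are hypothetical values chosen here for illustration; the abstract does not report a confusion matrix. They happen to give accuracy 0.8375, recall 0.825, specificity 0.85, precision about 0.846, F-score about 0.835, and MCC and Kappa about 0.675, which is consistent with the rounded figures quoted above. The AUC cannot be derived from a single confusion matrix and is not covered.

```python
import math


def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)              # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: agreement beyond what the marginals alone would produce.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
        "mcc": mcc,
        "kappa": kappa,
    }


# Hypothetical counts, not taken from the paper.
for name, value in diagnostic_metrics(tp=33, fn=7, tn=34, fp=6).items():
    print(f"{name:12s} {value:.4f}")
```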
20
Sharifonnasabi F, Jhanjhi NZ, John J, Obeidy P, Band SS, Alinejad-Rokny H, Baz M. Hybrid HCNN-KNN Model Enhances Age Estimation Accuracy in Orthopantomography. Front Public Health 2022; 10:879418. [PMID: 35712286 PMCID: PMC9197238 DOI: 10.3389/fpubh.2022.879418] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/19/2022] [Accepted: 04/22/2022] [Indexed: 11/17/2022]
Abstract
Age estimation from dental radiographs (orthopantomography, OPG) is a medical imaging technique that physicians and pathologists use for disease identification and legal matters, for example, estimating post-mortem interval, detecting child abuse or drug trafficking, and identifying an unknown body. Recent developments in automated image processing models have improved the limited precision of age estimation to an approximate range of +/- 1 year. While this estimate is often accepted as an accurate measurement, age estimation should be as precise as possible in the most serious matters, such as homicide. Current age estimation techniques are highly dependent on manual and time-consuming image processing. Age estimation is often a time-sensitive matter in which the image processing time is vital. Recent developments in machine learning-based data processing methods have decreased the image processing time; however, the accuracy of these techniques remains to be further improved. We proposed an ensemble method of image classifiers to enhance the accuracy of age estimation using OPGs from 1 year to a couple of months (1-3-6). This hybrid model is based on convolutional neural networks (CNN) and K nearest neighbors (KNN). The hybrid (HCNN-KNN) model was used to investigate 1,922 panoramic dental radiographs of patients aged 15 to 23. These OPGs were obtained from various teaching institutes and private dental clinics in Malaysia. To minimize the chance of overfitting in our model, we used the principal component analysis (PCA) algorithm and eliminated the features with high correlation. To further enhance the performance of our hybrid model, we performed systematic image pre-processing. We applied a series of classifications to train our model. We have successfully demonstrated that combining these innovative approaches has improved the classification and segmentation and thus the age-estimation outcome of the model. Our findings suggest that our model, for the first time to the best of our knowledge, successfully estimated age in classification studies of 1-year, 6-month, 3-month, and 1-month cases with accuracies of 99.98, 99.96, 99.87, and 98.78, respectively.
Affiliation(s)
- Fatemeh Sharifonnasabi
  - Department of Computer Science & Engineering, School of Computing & IT (SoCIT), Taylor's University, Subang Jaya, Malaysia
- Noor Zaman Jhanjhi
  - Department of Computer Science & Engineering, School of Computing & IT (SoCIT), Taylor's University, Subang Jaya, Malaysia
- Jacob John
  - Department of Restorative Dentistry, Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
- Peyman Obeidy
  - Charles Perkins Centre, Faculty of Medicine and Health, University of Sydney, Darlington, NSW, Australia
- Shahab S Band
  - Future Technology Research Centre, College of Future, National Yunlin University of Science and Technology, Yunlin, Taiwan
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, University of New South Wales (UNSW) Sydney, Kensington, NSW, Australia
  - UNSW Data Science Hub, The University of New South Wales, UNSW Sydney, Kensington, NSW, Australia
  - Health Data Analytics Program, AI-enabled Processes (AIP) Research Centre, Macquarie University, Macquarie Park, NSW, Australia
- Mohammed Baz
  - Department of Computer Engineering, College of Computer and Information Technology, Taif University, Taif, Saudi Arabia
21
Rattanasak A, Uthansakul P, Uthansakul M, Jumphoo T, Phapatanaburi K, Sindhupakorn B, Rooppakhun S. Real-Time Gait Phase Detection Using Wearable Sensors for Transtibial Prosthesis Based on a kNN Algorithm. Sensors (Basel) 2022; 22:4242. [PMID: 35684863 DOI: 10.3390/s22114242] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/03/2022] [Revised: 05/24/2022] [Accepted: 05/31/2022] [Indexed: 02/01/2023]
Abstract
Those with disabilities who have lost their legs must use a prosthesis to walk. However, traditional prostheses have the disadvantage of being unable to move and support the human gait because there are no mechanisms or algorithms to control them. This makes it difficult for the wearer to walk. To overcome this problem, we developed an insole device with a wearable sensor for real-time gait phase detection based on the kNN (k-nearest neighbor) algorithm for prosthetic control. The kNN algorithm is used with the raw data obtained from the pressure sensors in the insole to predict seven walking phases, i.e., stand, heel strike, foot flat, midstance, heel off, toe-off, and swing. As a result, the predictive decision in each gait cycle to control the ankle movement of the transtibial prosthesis improves with each walk. The results in this study can provide 81.43% accuracy for gait phase detection, and can control the transtibial prosthetic effectively at the maximum walking speed of 6 km/h. Moreover, this insole device is small, lightweight and unaffected by the physical factors of the wearer.
22
Mishra S, Shaw K, Mishra D, Patil S, Kotecha K, Kumar S, Bajaj S. Improving the Accuracy of Ensemble Machine Learning Classification Models Using a Novel Bit-Fusion Algorithm for Healthcare AI Systems. Front Public Health 2022; 10:858282. [PMID: 35602150 PMCID: PMC9114677 DOI: 10.3389/fpubh.2022.858282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/19/2022] [Accepted: 03/15/2022] [Indexed: 12/11/2022]
Abstract
Healthcare AI systems exclusively employ classification models for disease detection. However, with the recent research advances in this arena, it has been observed that single classification models achieve limited accuracy in some cases. Fusing the outputs of multiple classifiers into a single classification framework has been instrumental in achieving greater accuracy and performing automated big data analysis. The article proposes a bit fusion ensemble algorithm that minimizes the classification error rate and has been tested on various datasets. Five diverse base classifiers, k-nearest neighbor (KNN), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), Decision Tree (D.T.), and Naïve Bayesian Classifier (N.B.), are used in the implementation model. The bit fusion algorithm works on the individual input from the classifiers. Decision vectors of the base classifiers are weighted and transformed into binary bits by comparison with high-reliability threshold parameters. The output of each base classifier is considered as a soft class vector (CV). These vectors are weighted, transformed, and compared with a high threshold value, initialized to δ = 0.9, for reliability. Binary patterns are extracted, and the model is trained and tested again. The standard fusion approach and the proposed bit fusion algorithm have been compared by average error rate. The error rates observed for the bit fusion algorithm were 5.97, 12.6, 4.64, 0, 0, and 27.28 for Leukemia, Breast cancer, Lung Cancer, Hepatitis, Lymphoma, and Embryonal Tumors, respectively. The model was also trained and tested on datasets from the UCI, UEA, and UCR repositories, which likewise showed a reduction in error rates.
Affiliation(s)
- Sashikala Mishra
  - Symbiosis Institute of Technology, Symbiosis International University, Pune, India
- Kailash Shaw
  - Symbiosis Institute of Technology, Symbiosis International University, Pune, India
- Debahuti Mishra
  - Department of Computer Science and Engineering, Siksha O Anusandhan Deemed to be University, Bhubaneshwar, India
- Shruti Patil
  - Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
  - Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Satish Kumar
  - Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Simi Bajaj
  - School of Computer Data and Mathematical Sciences, University of Western Sydney, Sydney, NSW, Australia
23
Swana EF, Doorsamy W, Bokoro P. Tomek Link and SMOTE Approaches for Machine Fault Classification with an Imbalanced Dataset. Sensors (Basel) 2022; 22:3246. [PMID: 35590937 PMCID: PMC9099503 DOI: 10.3390/s22093246] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/01/2022] [Revised: 04/14/2022] [Accepted: 04/20/2022] [Indexed: 06/15/2023]
Abstract
Data-driven methods have prominently featured in the progressive research and development of modern condition monitoring systems for electrical machines. These methods have the advantage of simplicity when it comes to the implementation of effective fault detection and diagnostic systems. Despite their many advantages, the practical implementation of data-driven approaches still faces challenges such as data imbalance. The lack of sufficient and reliable labeled fault data from machines in the field often poses a challenge in developing accurate supervised learning-based condition monitoring systems. This research investigates the use of a Naïve Bayes classifier, support vector machine, and k-nearest neighbors together with the synthetic minority oversampling technique (SMOTE), Tomek link, and the combination of these two resampling techniques for fault classification with simulated and experimental imbalanced data. A comparative analysis of these techniques is conducted for different imbalanced data cases to determine their suitability for condition monitoring on a wound-rotor induction generator. The precision, recall, and F1-score metrics are applied for performance evaluation. The results indicate that the technique combining the synthetic minority oversampling technique with the Tomek link provides the best performance across all tested classifiers. The k-nearest neighbors classifier, together with this combined resampling technique, yielded the most accurate classification results. This research is of interest to researchers and practitioners working in the area of condition monitoring in electrical machines, and the findings and presented approach of the comparative analysis will assist with the selection of the most suitable technique for handling imbalanced fault data. This is especially important in the practice of condition monitoring on electrical rotating machines, where fault data are very limited.
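The two resampling ideas named in this abstract can be sketched in a few lines. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours; a Tomek link is a pair of opposite-class points that are each other's nearest neighbour. The sketch below is a brute-force illustration with hypothetical function names; removing both members of each link after oversampling is one common convention and is an assumption here, not a detail taken from the paper.

```python
import math
import random


def nearest_index(points, i):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class points that are mutual nearest neighbours."""
    links = []
    for i in range(len(X)):
        j = nearest_index(X, i)
        if i < j and y[i] != y[j] and nearest_index(X, j) == i:
            links.append((i, j))
    return links


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(minority[i], minority[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from point i to point j
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def smote_then_tomek(X, y, minority_label, n_new, k, rng):
    """Oversample the minority class, then drop both members of every Tomek link."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    X_res = list(X) + smote(minority, n_new, k, rng)
    y_res = list(y) + [minority_label] * n_new
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Hypothetical 2-D data: label 0 = healthy (majority), label 1 = fault (minority).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [5, 5], [5.2, 5], [1.1, 0.1]]
y = [0, 0, 0, 0, 0, 1, 1, 1]
print("Tomek links:", tomek_links(X, y))
X_bal, y_bal = smote_then_tomek(X, y, minority_label=1, n_new=2, k=2, rng=random.Random(0))
print("class counts after resampling:", y_bal.count(0), y_bal.count(1))
```

The oversampling step addresses the class imbalance, while the link removal cleans up borderline points where the classes overlap, which may help explain why the combination is reported to work well with a distance-based classifier such as k-nearest neighbors.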
Affiliation(s)
- Elsie Fezeka Swana
  - Department of Electrical and Electronics Engineering Technology, Doornfontein Campus, University of Johannesburg, Johannesburg 2028, South Africa; (E.F.S.); (P.B.)
- Wesley Doorsamy
  - Institute for Intelligent Systems, Auckland Park Campus, University of Johannesburg, Johannesburg 2006, South Africa
- Pitshou Bokoro
  - Department of Electrical and Electronics Engineering Technology, Doornfontein Campus, University of Johannesburg, Johannesburg 2028, South Africa; (E.F.S.); (P.B.)
24
Palacín J, Rubies E, Clotet E, Martínez D. Classification of Two Volatiles Using an eNose Composed by an Array of 16 Single-Type Miniature Micro-Machined Metal-Oxide Gas Sensors. Sensors (Basel) 2022; 22:s22031120. [PMID: 35161866 PMCID: PMC8838111 DOI: 10.3390/s22031120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/23/2021] [Revised: 01/19/2022] [Accepted: 01/27/2022] [Indexed: 05/26/2023]
Abstract
The artificial replication of an olfactory system is currently an open problem. The development of a portable and low-cost artificial olfactory system, also called electronic nose or eNose, is usually based on the use of an array of different gas sensor types, sensitive to different target gases. Low-cost Metal-Oxide semiconductor (MOX) gas sensors are widely used in such arrays. MOX sensors are based on a thin layer of silicon oxide with embedded heaters that can operate at different temperature set points, and they usually have the disadvantages of different volatile sensitivity in each individual sensor unit and different cross-sensitivity to different volatiles (unspecificity). This paper presents an eNose composed of an array of 16 low-cost BME680 digital miniature sensors, each embedding a miniature MOX gas sensor proposed to unspecifically evaluate air quality. In this paper, the inherent variability and unspecificity that must be expected from the 16 embedded MOX gas sensors, combined with signal processing, are exploited to classify two target volatiles: ethanol and acetone. The proposed eNose reads the resistance of the sensing layer of the 16 embedded MOX gas sensors, applies PCA for dimensionality reduction, and uses k-NN for classification. The validation results have shown an instantaneous classification success higher than 94% two days after the calibration and higher than 70% two weeks after, so the majority classification of a sequence of measures has always been successful in laboratory conditions. These first validation results and the low power consumption of the eNose (0.9 W) enable its future improvement and its use in portable and battery-operated applications.
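The step from "instantaneous success above 70%" to "majority classification always successful" can be illustrated with a short calculation. The sketch below assumes that successive classifications are independent and each correct with the same probability p, which is a simplification made here (sensor drift would in practice correlate the errors) and is not stated in the abstract. Under that assumption, the probability that a strict majority of n readings is correct is a binomial tail sum. The function names and the sequence length of 11 are illustrative.

```python
import math
from collections import Counter


def majority_label(labels):
    """Most frequent label in a sequence of instantaneous classifications."""
    return Counter(labels).most_common(1)[0][0]


def majority_success_probability(p, n):
    """P(more than half of n independent classifications are correct), n odd."""
    return sum(
        math.comb(n, c) * p ** c * (1 - p) ** (n - c)
        for c in range(n // 2 + 1, n + 1)
    )


# Hypothetical sequence of instantaneous k-NN outputs for one exposure.
readings = ["ethanol", "acetone", "ethanol", "ethanol", "acetone", "ethanol", "ethanol"]
print("majority decision:", majority_label(readings))

for p in (0.70, 0.94):
    print("p =", p, "-> P(majority of 11 correct) =",
          round(majority_success_probability(p, 11), 4))
```

Under the independence assumption, even a 70% instantaneous success rate yields a majority decision that is correct far more often than any single reading, and the margin grows with the sequence length.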
|
25
|
Xie K, Liu K, Alvi HAK, Chen Y, Wang S, Yuan X. KNNCNV: A K-Nearest Neighbor Based Method for Detection of Copy Number Variations Using NGS Data. Front Cell Dev Biol 2022; 9:796249. [PMID: 35004691 PMCID: PMC8728060 DOI: 10.3389/fcell.2021.796249] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/16/2021] [Accepted: 11/23/2021] [Indexed: 11/19/2022] Open
Abstract
Copy number variation (CNV) is a well-known type of genomic mutation that is associated with the development of human cancers. Detecting CNVs in the human genome is a crucial step in the pipeline from mutation analysis to cancer diagnosis and treatment. Next-generation sequencing (NGS) data provide an unprecedented opportunity for CNV detection at base-level resolution, and many methods have been developed for CNV detection using NGS data. However, due to the intrinsic complexity of CNV structures and of NGS data itself, accurate detection of CNVs still faces many challenges. In this paper, we present an alternative method, called KNNCNV (K-Nearest Neighbor based CNV detection), for the detection of CNVs using NGS data. Compared to current methods, KNNCNV has several distinctive features: 1) it assigns an outlier score to each genome segment based solely on its first k nearest-neighbor distances, which is not only easy to extend to other data types but also improves the power of discovering CNVs, especially local CNVs that are likely to be masked by their surrounding regions; 2) it employs the variational Bayesian Gaussian mixture model (VBGMM) to transform these scores into a series of binary labels without a user-defined threshold. To evaluate the performance of KNNCNV, we conduct both simulation and real sequencing data experiments and make comparisons with peer methods. The experimental results show that KNNCNV achieves better performance than the other methods in terms of F1-score.
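To make the first feature concrete, here is a minimal sketch of a k-nearest-neighbor distance score on one-dimensional data. It is an illustration under stated assumptions, not the KNNCNV implementation: the score is taken to be the mean distance to the k nearest neighbours, the read-depth values are invented, and the VBGMM step that turns scores into binary labels is not reproduced.

```python
def knn_outlier_scores(values, k=3):
    """Score each value by the mean distance to its k nearest neighbours (assumed definition)."""
    scores = []
    for i, v in enumerate(values):
        # Distances from this value to every other value, smallest first
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical normalised read depths per genome segment; index 4 mimics a gain.
depths = [1.00, 0.98, 1.03, 1.01, 1.95, 0.99, 1.02, 0.97]
scores = knn_outlier_scores(depths, k=3)
suspect = max(range(len(depths)), key=scores.__getitem__)
print(suspect, [round(s, 3) for s in scores])
```

Because the score depends only on local neighbour distances, a segment that differs from its few closest values stands out even when global statistics would mask it.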
Affiliation(s)
- Kun Xie
  - School of Computer Science and Technology, Xidian University, Xi'an, China
  - Hangzhou Institute of Technology, Xidian University, Hangzhou, China
- Kang Liu
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Haque A K Alvi
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Yuehui Chen
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Shuzhen Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan
  - School of Computer Science and Technology, Xidian University, Xi'an, China
  - Hangzhou Institute of Technology, Xidian University, Hangzhou, China
|
26
|
Muangprathub J, Sriwichian A, Wanichsombat A, Kajornkasirat S, Nillaor P, Boonjing V. A Novel Elderly Tracking System Using Machine Learning to Classify Signals from Mobile and Wearable Sensors. Int J Environ Res Public Health 2021; 18:12652. [PMID: 34886377 PMCID: PMC8656729 DOI: 10.3390/ijerph182312652] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/20/2021] [Revised: 11/18/2021] [Accepted: 11/27/2021] [Indexed: 11/16/2022]
Abstract
A health or activity monitoring system is the most promising approach to assisting the elderly in their daily lives. The increase in the elderly population has increased the demand for health services so that the existing monitoring system is no longer able to meet the needs of sufficient care for the elderly. This paper proposes the development of an elderly tracking system using the integration of multiple technologies combined with machine learning to obtain a new elderly tracking system that covers aspects of activity tracking, geolocation, and personal information in an indoor and an outdoor environment. It also includes information and results from the collaboration of local agencies during the planning and development of the system. The results from testing devices and systems in a case study show that the k-nearest neighbor (k-NN) model with k = 5 was the most effective in classifying the nine activities of the elderly, with 96.40% accuracy. The developed system can monitor the elderly in real-time and can provide alerts. Furthermore, the system can display information of the elderly in a spatial format, and the elderly can use a messaging device to request help in an emergency. Our system supports elderly care with data collection, tracking and monitoring, and notification, as well as by providing supporting information to agencies relevant in elderly care.
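For readers unfamiliar with the classifier named in this abstract, the following is a minimal k-NN sketch with k = 5. The two-dimensional features, the activity labels, and the function name are hypothetical and are not data or code from the study.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=5):
    """train: list of (feature_vector, label). Returns the majority label of the k nearest points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D sensor features for two activities.
train = [
    ((0.10, 0.20), "sitting"), ((0.20, 0.10), "sitting"), ((0.15, 0.25), "sitting"),
    ((1.10, 0.90), "walking"), ((1.00, 1.20), "walking"),
    ((0.90, 1.00), "walking"), ((1.20, 1.10), "walking"),
]
print(knn_predict(train, (0.95, 1.05)))
```

With k = 5 the query above draws four "walking" votes and one "sitting" vote, so a single stray neighbour does not change the outcome.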
Affiliation(s)
- Jirapond Muangprathub
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
  - Integrated High-Value of Oleochemical (IHVO) Research Center, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Anirut Sriwichian
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Apirat Wanichsombat
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Siriwan Kajornkasirat
  - Faculty of Science and Industrial Technology, Surat Thani Campus, Prince of Songkla University, Surat Thani 84000, Thailand
- Pichetwut Nillaor
  - Faculty of Commerce and Management, Trang Campus, Prince of Songkla University, Trang 92000, Thailand
- Veera Boonjing
  - Department of Computer Engineering, School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
|
27
|
Lu L, Wang W. Fault Diagnosis of Permanent Magnet DC Motors Based on Multi-Segment Feature Extraction. Sensors (Basel) 2021; 21:7505. [PMID: 34833579 DOI: 10.3390/s21227505] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/14/2021] [Revised: 10/16/2021] [Accepted: 11/08/2021] [Indexed: 11/16/2022]
Abstract
For permanent magnet DC motors (PMDCMs), the amplitude of the current signals gradually decreases after the motor starts. Only using the signal features of current in a single segment is not conducive to fault diagnosis for PMDCMs. In this work, multi-segment feature extraction is presented for improving the effect of fault diagnosis of PMDCMs. Additionally, a support vector machine (SVM), a classification and regression tree (CART), and the k-nearest neighbor algorithm (k-NN) are utilized for the construction of fault diagnosis models. The time domain features extracted from several successive segments of current signals make up a feature vector, which is adopted for fault diagnosis of PMDCMs. Experimental results show that multi-segment features have a better diagnostic effect than single-segment features; the average accuracy of fault diagnosis improves by 19.88%. This paper lays the foundation of fault diagnosis for PMDCMs through multi-segment feature extraction and provides a novel method for feature extraction.
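The idea of building one feature vector from several successive segments can be sketched as follows. The choice of features (mean, RMS, peak), the number of segments, and the sample values are assumptions made for illustration; the abstract does not list the features the authors used.

```python
from math import sqrt


def segment_features(segment):
    """Time-domain features of one segment: mean, RMS, and peak absolute value."""
    n = len(segment)
    mean = sum(segment) / n
    rms = sqrt(sum(x * x for x in segment) / n)
    peak = max(abs(x) for x in segment)
    return [mean, rms, peak]


def multi_segment_vector(signal, n_segments):
    """Split the signal into equal successive segments and concatenate their features."""
    size = len(signal) // n_segments
    vector = []
    for s in range(n_segments):
        vector.extend(segment_features(signal[s * size:(s + 1) * size]))
    return vector


# Hypothetical start-up current whose amplitude decays over time (12 samples).
current = [4.0, 3.6, 3.1, 2.7, 2.2, 2.0, 1.8, 1.7, 1.6, 1.5, 1.5, 1.5]
features = multi_segment_vector(current, n_segments=3)
print(features)
```

Because the amplitude changes after start-up, each segment contributes different values, which a single-segment feature set would not capture.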
|
28
|
Moldovanu S, Damian Michis FA, Biswas KC, Culea-Florescu A, Moraru L. Skin Lesion Classification Based on Surface Fractal Dimensions and Statistical Color Cluster Features Using an Ensemble of Machine Learning Techniques. Cancers (Basel) 2021; 13:5256. [PMID: 34771421 DOI: 10.3390/cancers13215256] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/03/2021] [Revised: 10/18/2021] [Accepted: 10/18/2021] [Indexed: 01/23/2023] Open
Abstract
Simple Summary: This study aimed to investigate the efficacy of novel skin surface fractal dimension features as an auxiliary diagnostic method for melanoma recognition. We therefore examined the skin lesion classification accuracy of the kNN-CV algorithm and of the proposed radial basis function neural network model. We found an increased classification accuracy when fractal analysis is added to the classical color distribution analysis. Our results indicate that a reliable classifier offers more opportunities to detect cancerous skin lesions in a timely manner.
Abstract: (1) Background: An approach for skin cancer recognition and classification by implementation of a novel combination of features and two classifiers, as an auxiliary diagnostic method, is proposed. (2) Methods: The predictions are made by a k-nearest neighbor algorithm with 5-fold cross validation and by a neural network model to assist dermatologists in the diagnosis of cancerous skin lesions. As a main contribution, this work proposes a descriptor that combines skin surface fractal dimension and relevant color area features for skin lesion classification purposes. The surface fractal dimension is computed using a 2D generalization of Higuchi’s method. A clustering method allows for the selection of the relevant color distribution in skin lesion images by determining the average percentage of color areas within the nevi and melanoma lesion areas. In a classification stage, the Higuchi fractal dimensions (HFDs) and the color features are classified, separately, using a kNN-CV algorithm. In addition, these features are prototypes for a radial basis function neural network (RBFNN) classifier. The efficiency of our algorithms was verified using images belonging to the 7-Point, Med-Node, and PH2 databases. (3) Results: Experimental results show that the accuracy of the proposed RBFNN model in skin cancer classification is 95.42% for 7-Point, 94.71% for Med-Node, and 94.88% for PH2, which are all significantly better than that of the kNN algorithm. (4) Conclusions: 2D Higuchi’s surface fractal features have not previously been used for skin lesion classification. We used fractal features further correlated with color features to create an RBFNN classifier that provides high classification accuracy.
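As background to the fractal features mentioned here, the sketch below implements the classic one-dimensional Higuchi estimate. It is not the 2D generalization used in the paper, and the value of `kmax` and the function name are assumptions chosen for illustration.

```python
from math import log


def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension of a 1-D sequence: slope of log L(k) against log(1/k)."""
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                       # k sub-series with offsets 0..k-1
            steps = (n - 1 - m) // k             # number of increments in this sub-series
            if steps == 0:
                continue
            total = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, steps + 1))
            lengths.append(total * (n - 1) / (steps * k) / k)  # normalised curve length
        log_inv_k.append(log(1.0 / k))
        log_len.append(log(sum(lengths) / len(lengths)))
    # Least-squares slope of log_len against log_inv_k
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_len) / len(log_len)
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_len))
    den = sum((a - mx) ** 2 for a in log_inv_k)
    return num / den


# A straight line is a smooth curve, so its estimated dimension should be close to 1.
line = [float(i) for i in range(200)]
print(higuchi_fd(line))
```

The sketch assumes the sequence is not constant at any scale, since a zero curve length would make the logarithm undefined; rougher signals yield estimates closer to 2.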
|
29
|
Long F, Zhao S, Wei X, Ng SC, Ni X, Chi A, Fang P, Zeng W, Wei B. Positive and Negative Emotion Classification Based on Multi-channel. Front Behav Neurosci 2021; 15:720451. [PMID: 34512288 PMCID: PMC8428531 DOI: 10.3389/fnbeh.2021.720451] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2021] [Accepted: 07/29/2021] [Indexed: 11/13/2022] Open
Abstract
In this study, the EEG features of different emotions were extracted from multi-channel and forehead-channel recordings. The EEG signals of 26 subjects were collected using the emotional video evoked method. The results show that the energy ratio and differential entropy of the frequency bands can be used to classify positive and negative emotions effectively, and that the best effect is achieved with an SVM classifier. When only the forehead signals are used, the highest classification accuracy reaches 66%. When the data of all channels are used, the highest accuracy of the model reaches 82%. The best model of this study is obtained after channel selection, with an accuracy of more than 86%.
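The two feature types named in this abstract can be sketched as follows. The differential entropy formula below assumes the band-filtered signal is approximately Gaussian, a common simplification that the abstract itself does not state; the band names and energy values are hypothetical.

```python
from math import e, log, pi


def differential_entropy(samples):
    """Differential entropy under a Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return 0.5 * log(2 * pi * e * var)


def energy_ratio(band_energies):
    """Share of the total energy carried by each frequency band."""
    total = sum(band_energies.values())
    return {band: energy / total for band, energy in band_energies.items()}


# Hypothetical band energies for one channel.
print(energy_ratio({"alpha": 2.0, "beta": 6.0}))
print(differential_entropy([-1.0, 1.0]))
```

Both quantities are computed per channel and per band, so a multi-channel recording yields one feature per channel and band combination.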
Affiliation(s)
- Fangfang Long
  - Department of Psychology, Nanjing University, Nanjing, China
- Shanguang Zhao
  - Centre for Sport and Exercise Sciences, University of Malaya, Kuala Lumpur, Malaysia
- Xin Wei
  - Institute of Social Psychology, School of Humanities and Social Sciences, Xi'an Jiaotong University, Xi'an, China
  - Key & Core Technology Innovation Institute of the Greater Bay Area, Guangdong, China
- Siew-Cheok Ng
  - Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Xiaoli Ni
  - Institute of Social Psychology, School of Humanities and Social Sciences, Xi'an Jiaotong University, Xi'an, China
- Aiping Chi
  - School of Sports, Shaanxi Normal University, Xi'an, China
- Peng Fang
  - Department of the Psychology of Military Medicine, Air Force Medical University, Xi'an, China
- Weigang Zeng
  - Key & Core Technology Innovation Institute of the Greater Bay Area, Guangdong, China
- Bokun Wei
  - Xi'an Middle School of Shaanxi Province, Xi'an, China
|
30
|
Lee CY, Ruan LM, Lee ZJ, Huang JQ, Yao J, Ning ZY, Tu JF. Study on the university students' satisfaction of the wisdom tree massive open online course platform based on parameter optimization intelligent algorithm. Sci Prog 2021; 104:368504211054256. [PMID: 34851210 PMCID: PMC10358608 DOI: 10.1177/00368504211054256] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
INTRODUCTION: Curriculum learning through the wisdom tree massive open online course platform not only removes the limitations of specialty, school and region and eliminates the limitations of time and space in traditional teaching, but also effectively addresses the problem of educational equity.
OBJECTIVES: This paper proposes an intelligent algorithm combining decision tree, support vector machine, and simulated annealing to obtain the best classification accuracy and decision rules for university students' satisfaction with the wisdom tree massive open online course platform.
METHODS: This study takes university students in the information management department in Fuzhou city as the survey object and adopts the electronic questionnaire survey method. A total of 1136 formal questionnaires were returned, and 1028 valid questionnaires were obtained after data cleaning and deletion of invalid questionnaires (an effective rate of 90.49%). The reliability and validity of the questionnaire were tested with IBM SPSS-20.0 software, and six explanatory variables (function, achievement, exercise, quality, richness, and interaction) were obtained by principal component analysis. The questionnaire data were then converted to CSV (comma separated values) format for analysis. The proposed algorithm is compared with decision tree, random forest, k-nearest neighbor, and support vector machine to verify its performance.
RESULTS: The experimental results show that the training set classification accuracies of decision tree, random forest, k-nearest neighbor, support vector machine alone, and the proposed algorithm (simulated annealing + support vector machine) are 92.21%, 96.10%, 95.67%, 97.29%, and 99.58%, respectively.
CONCLUSION: The proposed algorithm, simulated annealing + support vector machine, does increase the classification accuracy. At the same time, the 11 decision rules generated by simulated annealing + decision tree can provide useful information for decision makers.
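The role of simulated annealing as a parameter optimizer can be illustrated with a generic sketch. This is not the authors' algorithm: the cost function is a toy stand-in for the validation error of a classifier as a function of one hyperparameter, and the step size, cooling schedule, and seed are arbitrary assumptions.

```python
import math
import random


def simulated_annealing(cost, start, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Minimise cost(x) over a single real parameter; returns (best_x, best_cost)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        candidate = current + rng.gauss(0.0, 0.5)      # random neighbouring parameter value
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with a temperature-dependent probability
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling                                 # geometric cooling
    return best, best_cost


# Toy cost with its minimum at x = 3 (hypothetical, for illustration only).
best_x, best_cost = simulated_annealing(lambda x: (x - 3.0) ** 2, start=0.0)
print(best_x, best_cost)
```

In a setting like the one described, `cost` would wrap a cross-validated classifier evaluation, so each step is far more expensive than in this toy example.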
Affiliation(s)
- Chou-Yuan Lee
  - School of Big Data, Fuzhou University of International Studies and Trade, China
- Ling-Ming Ruan
  - School of Big Data, Fuzhou University of International Studies and Trade, China
- Zne-Jung Lee
  - School of Intelligent Construction, Fuzhou University of International Studies and Trade, China
- Jian-Qiong Huang
  - School of Big Data, Fuzhou University of International Studies and Trade, China
- Jie Yao
  - School of Big Data, Fuzhou University of International Studies and Trade, China
- Zheng-Yuan Ning
  - School of Big Data, Fuzhou University of International Studies and Trade, China
- Jih-Fu Tu
  - Department of Industrial Engineering and Management, St. John's University
|
31
|
Kim J, Lee K, Rupasinghe R, Rezaei S, Martínez-López B, Liu X. Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene. Front Vet Sci 2021; 8:683134. [PMID: 34368274 PMCID: PMC8345883 DOI: 10.3389/fvets.2021.683134] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Accepted: 06/23/2021] [Indexed: 11/13/2022] Open
Abstract
Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. Our study used amino acid sequences of the ORF5 gene in 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label field PRRSV strains as one of four clades: Lineage 5 or three clades in Lineage 1. We measured the accuracy and time consumption of classification using the four ML approaches for different sizes of gene sequences. We found that all four ML algorithms classify a large number of field strains in a very short time (<2.5 s) with very high accuracy (>0.99 area under the receiver operating characteristic curve). Furthermore, the random forest approach detects a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings provide insight for developing a rapid and accurate classification model using genetic information, which also enables us to handle large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
Affiliation(s)
- Jeonghoon Kim
  - Department of Mathematics, University of California, Davis, Davis, CA, United States
- Kyuyoung Lee
  - Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
- Ruwini Rupasinghe
  - Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
- Shahbaz Rezaei
  - Department of Computer Science, University of California, Davis, Davis, CA, United States
- Beatriz Martínez-López
  - Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
- Xin Liu
  - Department of Computer Science, University of California, Davis, Davis, CA, United States
|
32
|
Onginjo JO, Zhou DM, Berhanu TF, Belihu SWG. Analyzing the impact of social capital on US based Kickstarter projects outcome. Heliyon 2021; 7:e07425. [PMID: 34377848 PMCID: PMC8327500 DOI: 10.1016/j.heliyon.2021.e07425] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2020] [Revised: 01/20/2021] [Accepted: 06/24/2021] [Indexed: 11/30/2022] Open
Abstract
This paper analyses the ripple effects caused by the intertwined and complex relationship between the relational and structural dimensions of social capital on the outcomes of US-based Kickstarter projects. These effects are measured using real-time data collected from Kickstarter.com for 1157 projects, organised by the number of backers, the amount of time taken to fund the projects, and the converted amount pledged towards the projects, and classified by project category and geographical location. This research applies qualitative and quantitative statistical analysis methods as well as data mining techniques: k-Nearest Neighbour, Naive Bayes and Decision Tree algorithms. The results confirm that relational social capital, i.e. the number of backers involved in the projects, has a significantly strong and positive impact on the converted amount pledged towards a project and on the project outcome. This paper also offers a feasible decision-making model that entrepreneurs can use in the future to determine which type of project category to host and the project outcome.
Affiliation(s)
- Joseph Ochieng Onginjo
  - University of Electronic Science and Technology of China, Qingshuihe Campus, School of Management and Economics, West Hi-Tech Zone, No.2006, Xiyuan Avenue, Pixian, 611731 Chengdu, Sichuan, PR China
  - Corresponding author.
- Dong Mei Zhou
  - University of Electronic Science and Technology of China, Qingshuihe Campus, School of Management and Economics, West Hi-Tech Zone, No.2006, Xiyuan Avenue, Pixian, 611731 Chengdu, Sichuan, PR China
- Tesema Fiseha Berhanu
  - Zhejiang Lab, Building 10, China Artificial Intelligence Town, 1818 Wenyi West Road, Yuhang District, ICP No. 18016057, Hangzhou, Zhejiang, PR China
- Sime Welde Gebrile Belihu
  - Fayetteville State University, Department of Marketing, Management and Entrepreneurship, 1200 Murchison Rd., Fayetteville, 28301 NC, USA
|
33
|
Kongsompong S, E-Kobon T, Chumnanpuen P. K-Nearest Neighbor and Random Forest-Based Prediction of Putative Tyrosinase Inhibitory Peptides of Abalone Haliotis diversicolor. Molecules 2021; 26:3671. [PMID: 34208619 DOI: 10.3390/molecules26123671] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/17/2021] [Revised: 06/06/2021] [Accepted: 06/15/2021] [Indexed: 12/28/2022] Open
Abstract
Skin pigment disorders are common cosmetic and medical problems. Many known compounds inhibit the key melanin-producing enzyme, tyrosinase, but their use is limited due to side effects. Naturally derived peptides also display tyrosinase inhibition. Abalone is a good source of peptides, and abalone proteins have been used widely in pharmaceutical and cosmetic products, but not for melanin inhibition. This study aimed to predict putative tyrosinase inhibitory peptides (TIPs) from abalone, Haliotis diversicolor, using k-nearest neighbor (kNN) and random forest (RF) algorithms. The kNN and RF predictors were trained and tested against 133 peptides with known anti-tyrosinase properties, with 97% and 99% accuracy, respectively. The kNN predictor suggested 1075 putative TIPs and the RF predictor suggested six. Two helical peptides were predicted by both methods and showed possible interaction with the predicted structure of mushroom tyrosinase, similar to that of the known TIPs. These two peptides had arginine and aromatic amino acids, which were common to the known TIPs, suggesting non-competitive inhibition of the tyrosinase. Therefore, the first version of the TIP predictors could suggest a reasonable number of TIP candidates for further experiments. More experimental data will be important for improving the performance of these predictors, and they can be extended to discover more TIPs from other organisms. The confirmation of TIPs in abalone will be a new commercial opportunity for abalone farmers and industry.
|
34
|
Abou-Abbas L, van Noordt S, Desjardins JA, Cichonski M, Elsabbagh M. Use of Empirical Mode Decomposition in ERP Analysis to Classify Familial Risk and Diagnostic Outcomes for Autism Spectrum Disorder. Brain Sci 2021; 11:409. [PMID: 33804986 PMCID: PMC8063929 DOI: 10.3390/brainsci11040409] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/16/2021] [Revised: 03/17/2021] [Accepted: 03/18/2021] [Indexed: 12/01/2022] Open
Abstract
Event-related potentials (ERPs) activated by faces and gaze processing are found in individuals with autism spectrum disorder (ASD) in the early stages of their development and may serve as a putative biomarker to supplement behavioral diagnosis. We present a novel approach to the classification of visual ERPs collected from 6-month-old infants using intrinsic mode functions (IMFs) derived from empirical mode decomposition (EMD). Selected features were used as inputs to two machine learning methods (support vector machines and k-nearest neighbors (k-NN)) using nested cross validation. Different runs were executed for the modelling and classification of the participants in the control and high-risk (HR) groups and the classification of diagnosis outcome within the high-risk group: HR-ASD and HR-noASD. The highest accuracy in the classification of familial risk was 88.44%, achieved using a support vector machine (SVM). A maximum accuracy of 74.00% for classifying infants at risk who go on to develop ASD vs. those who do not was achieved through k-NN. IMF-based extracted features were highly effective in classifying infants by risk status, but less effective by diagnostic outcome. Advanced signal analysis of ERPs integrated with machine learning may be considered a first step toward the development of an early biomarker for ASD.
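Nested cross validation, as mentioned in this abstract, separates hyperparameter tuning (inner loop) from performance estimation (outer loop). The sketch below shows only that structure; the fold construction, the candidate values, and the `score` callback are hypothetical and do not reflect the authors' configuration.

```python
def k_fold(indices, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv(n_samples, outer_k, inner_k, candidates, score):
    """score(train_idx, test_idx, param) -> float. Returns one score per outer fold."""
    outer_scores = []
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):

        def inner_mean(param):
            # Tune using the outer training part only, so the outer test fold stays unseen
            parts = list(k_fold(outer_train, inner_k))
            return sum(score(tr, te, param) for tr, te in parts) / len(parts)

        best_param = max(candidates, key=inner_mean)
        outer_scores.append(score(outer_train, outer_test, best_param))
    return outer_scores
```

A caller would supply a `score` function that trains a model (for example an SVM or k-NN) on `train_idx` with the given parameter and returns its accuracy on `test_idx`.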
Affiliation(s)
- Lina Abou-Abbas
  - Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
  - Corresponding author
- Stefon van Noordt
  - Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
- James A. Desjardins
  - Cognitive and Affective Neuroscience Lab, Brock University, St. Catharines, ON L2S 3A1, Canada
- Mike Cichonski
  - Cognitive and Affective Neuroscience Lab, Brock University, St. Catharines, ON L2S 3A1, Canada
- Mayada Elsabbagh
  - Montreal Neurological Institute, McGill University, Montreal, QC H3A 2B4, Canada
|
35
|
Shokrekhodaei M, Cistola DP, Roberts RC, Quinones S. Non-Invasive Glucose Monitoring Using Optical Sensor and Machine Learning Techniques for Diabetes Applications. IEEE Access 2021; 9:73029-73045. [PMID: 34336539 PMCID: PMC8321391 DOI: 10.1109/access.2021.3079182] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 05/09/2023]
Abstract
Diabetes is a major public health challenge affecting more than 451 million people. Physiological and experimental factors influence the accuracy of non-invasive glucose monitoring, and these need to be overcome before replacing the finger prick method. Also, the suitable employment of machine learning techniques can significantly improve the accuracy of glucose predictions. One aim of this study is to use light sources with multiple wavelengths to enhance the sensitivity and selectivity of glucose detection in an aqueous solution. Multiple wavelength measurements have the potential to compensate for errors associated with inter- and intra-individual differences in blood and tissue components. In this study, the transmission measurements of a custom built optical sensor are examined using 18 different wavelengths between 410 and 940 nm. Results show a high correlation value (0.98) between glucose concentration and transmission intensity for four wavelengths (485, 645, 860 and 940 nm). Five machine learning methods are investigated for glucose predictions. When regression methods are used, 9% of glucose predictions fall outside the correct range (normal, hypoglycemic or hyperglycemic). The prediction accuracy is improved by applying classification methods on sets of data arranged into 21 classes. Data within each class corresponds to a discrete 10 mg/dL glucose range. Classification based models outperform regression, and among them, the support vector machine is the most successful with F1-score of 99%. Additionally, Clarke error grid shows that 99.75% of glucose readings fall within the clinically acceptable zones. This is an important step towards critical diagnosis during an emergency patient situation.
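Recasting the regression problem as classification over 21 classes of 10 mg/dL each amounts to binning the target value. The sketch below shows one way to do that; the lower bound of 40 mg/dL is a hypothetical choice, since the abstract does not state where the first class begins, and the function names are illustrative.

```python
def glucose_class(mg_dl, low=40.0, width=10.0, n_classes=21):
    """Map a concentration to a class index 0..n_classes-1; out-of-range values are clipped."""
    index = int((mg_dl - low) // width)
    return max(0, min(n_classes - 1, index))


def class_range(index, low=40.0, width=10.0):
    """Return the (inclusive lower, exclusive upper) concentration bounds of a class."""
    return (low + index * width, low + (index + 1) * width)


print(glucose_class(95.0), class_range(glucose_class(95.0)))
```

A prediction is then correct whenever it lands in the right 10 mg/dL bin, which is a coarser but more forgiving target than an exact regression value.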
Affiliation(s)
- Maryamsadat Shokrekhodaei
  - Electrical and Computer Engineering Department, The University of Texas at El Paso, El Paso, TX 79968 USA
- David P. Cistola
  - Center of Emphasis in Diabetes & Metabolism, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center El Paso, El Paso, TX 79905, USA
- Robert C. Roberts
  - Electrical and Computer Engineering Department, The University of Texas at El Paso, El Paso, TX 79968 USA
- Stella Quinones
  - Metallurgical, Materials and Biomedical Engineering Department, The University of Texas at El Paso, El Paso, TX 79968 USA
|
36
|
Fogliatto FS, Anzanello MJ, Soares F, Brust-Renck PG. Decision Support for Breast Cancer Detection: Classification Improvement Through Feature Selection. Cancer Control 2020; 26:1073274819876598. [PMID: 31538497 PMCID: PMC6755645 DOI: 10.1177/1073274819876598] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022] Open
Abstract
Several statistical approaches have been developed to support medical personnel in early breast cancer detection. This article presents a method for feature selection aimed at classifying cases into categories based on patients' breast tissue measures and protein microarray data. The effectiveness of this feature selection strategy was evaluated against the commonly used Wisconsin Breast Cancer Database (WBCD; several patients and fewer features) and a new protein microarray data set (several features and fewer patients). Features were ranked according to a feature importance index that combines parameters emerging from the unsupervised method of principal component analysis and the supervised method of Bhattacharyya distance. Observations of a training set were iteratively categorized into malignant and benign cases through 3 classification techniques: k-Nearest Neighbor, linear discriminant analysis, and probabilistic neural network. After each classification, the feature with the smallest importance index was removed, and a new categorization was carried out until only one feature was left. The subset yielding maximum accuracy was used to classify observations in the testing set. Our method yielded an average of 99.17% accurate classifications in the testing set while retaining an average of 4.61 out of 9 features in the WBCD, which is comparable to the best results reported in the literature on that data set, with the advantage of relying on simple and widely available multivariate techniques. When applied to the microarray data, the method yielded an average accuracy of 98.30% while retaining an average of 2.17% of the original features. Our results can aid health-care professionals during early diagnosis of breast cancer.
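The iterative removal procedure described in this abstract can be sketched as a simple backward elimination loop. The importance values and the `evaluate` callback below are placeholders: in the article the index combines principal component analysis and Bhattacharyya distance, which is not reproduced here, and preferring the smaller subset on ties is an assumption of this sketch.

```python
def backward_elimination(features, importance, evaluate):
    """features: list of names; importance: dict name -> index value;
    evaluate(subset) -> accuracy. Returns (best_subset, best_accuracy)."""
    remaining = sorted(features, key=lambda f: importance[f], reverse=True)
    best_subset, best_acc = list(remaining), evaluate(remaining)
    while len(remaining) > 1:
        remaining = remaining[:-1]          # drop the feature with the smallest importance
        acc = evaluate(remaining)
        if acc >= best_acc:                 # assumed tie rule: prefer the smaller subset
            best_subset, best_acc = list(remaining), acc
    return best_subset, best_acc


# Hypothetical importance values and a toy accuracy that depends only on subset size.
importance = {"a": 0.9, "b": 0.5, "c": 0.1}
toy_accuracy = {3: 0.90, 2: 0.95, 1: 0.80}
subset, acc = backward_elimination(["c", "a", "b"], importance, lambda s: toy_accuracy[len(s)])
print(subset, acc)
```

In practice `evaluate` would train one of the three classifiers on the training set restricted to the given features and return its accuracy.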
Affiliation(s)
- Flavio S Fogliatto
- Industrial Engineering Department, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Michel J Anzanello
- Industrial Engineering Department, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Felipe Soares
- Industrial Engineering Department, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Priscila G Brust-Renck
- Industrial Engineering Department, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
|
37
|
Casale A, Dettman J. Composite Machine Learning Algorithm for Material Sourcing. J Forensic Sci 2020; 65:1458-1464. [PMID: 32343397 DOI: 10.1111/1556-4029.14436] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/06/2020] [Revised: 03/24/2020] [Accepted: 03/26/2020] [Indexed: 11/28/2022]
Abstract
This study developed a composite machine learning algorithm for attribution of materials of forensic interest (like ammonium nitrate) to original sources. k-nearest neighbor and random forest models were used for source elimination and classification, respectively, in a two-step, composite algorithm based on particle color, size/shape, and trace element concentration features. Novel approaches for simulation to supplement within-source reference features based on empirically measured multi-lot analyses, an improved hold-one-lot-out method for cross-validation, an assessment of the likelihood of the presence of a reference sample, fusion of the source probabilities from the respective classification models, and the calculation of metrics for assessing ensemble sourcing performance are described. Excellent sourcing predictions were obtained; the sourcing algorithm identified the correct source as the top choice 89% of the time, and the correct source was identified to be an average of 2.7 times more likely than the most likely incorrect source.
Affiliation(s)
- Amanda Casale
- MIT Lincoln Laboratory, 244 Wood Street, Lexington, Massachusetts, 02421
- Josh Dettman
- MIT Lincoln Laboratory, 244 Wood Street, Lexington, Massachusetts, 02421
|
38
|
Alturki FA, AlSharabi K, Abdurraqeeb AM, Aljalal M. EEG Signal Analysis for Diagnosing Neurological Disorders Using Discrete Wavelet Transform and Intelligent Techniques. Sensors (Basel) 2020; 20:s20092505. [PMID: 32354161 PMCID: PMC7361958 DOI: 10.3390/s20092505] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 03/16/2020] [Revised: 04/21/2020] [Accepted: 04/25/2020] [Indexed: 11/25/2022]
Abstract
Analysis of electroencephalogram (EEG) signals is essential because it is an efficient method to diagnose neurological brain disorders. In this work, a single system is developed to diagnose one or two neurological diseases at the same time (two-class mode and three-class mode). For this purpose, different EEG feature-extraction and classification techniques are investigated to aid in the accurate diagnosis of two neurological brain disorders: epilepsy and autism spectrum disorder (ASD). Two different modes of EEG signals, single-channel and multi-channel, are analyzed for epilepsy and ASD. The independent components analysis (ICA) technique is used to remove artifacts from the EEG dataset. Then, the EEG dataset is segmented and filtered to remove noise and interference using an elliptic band-pass filter. Next, a discrete wavelet transform (DWT) is used to decompose the filtered signal into its sub-bands: delta, theta, alpha, beta, and gamma. Subsequently, five statistical methods are used to extract features from the EEG sub-bands: the logarithmic band power (LBP), standard deviation, variance, kurtosis, and Shannon entropy (SE). Further, the features are fed into four different classifiers, linear discriminant analysis (LDA), support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural networks (ANNs), to classify the features corresponding to their classes. The combination of DWT with SE and LBP produces the highest accuracy among all the classifiers. The overall classification accuracy approaches 99.9% using SVM and 97% using ANN for the three-class single-channel and multi-channel modes, respectively.
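The five per-sub-band statistics named in this abstract can be illustrated with a short sketch. The exact definitions used in the paper are not given here, so the formulas below are assumptions: population variance, non-excess kurtosis, log of mean power for LBP, and Shannon entropy over the normalized energy distribution. The function name is hypothetical, and the input is taken to be one sub-band's coefficients from any DWT routine.

```python
import math

def subband_features(coeffs):
    """Compute five summary statistics for one sub-band (definitions are assumptions)."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    variance = sum((c - mean) ** 2 for c in coeffs) / n  # population variance
    std = math.sqrt(variance)
    fourth_moment = sum((c - mean) ** 4 for c in coeffs) / n
    # Non-excess kurtosis (a Gaussian would give about 3).
    kurtosis = fourth_moment / variance ** 2 if variance > 0 else float("nan")
    energy = sum(c * c for c in coeffs)
    # Logarithmic band power: natural log of the mean squared coefficient.
    lbp = math.log(energy / n) if energy > 0 else float("-inf")
    # Shannon entropy (bits) of the normalized energy distribution.
    probs = [c * c / energy for c in coeffs] if energy > 0 else []
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return {
        "lbp": lbp,
        "std": std,
        "variance": variance,
        "kurtosis": kurtosis,
        "shannon_entropy": entropy,
    }
```

Concatenating these dictionaries across the five sub-bands would give one feature vector per segment, which is the kind of input the listed classifiers take.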
|
39
|
Abstract
In developing countries like Pakistan, cleft surgery is expensive for families, and the child also experiences much pain. In this article, we propose a machine learning-based solution to help avoid cleft while the child is still in the mother's womb. The possibility of cleft lip and palate in embryos can be predicted before birth by using the proposed solution. We collected samples from 1000 pregnant women at three different hospitals in Lahore, Punjab. A questionnaire was designed to obtain a variety of data, such as gender, parenting, family history of cleft, the order of birth, the number of children, midwives' counseling, miscarriage history, parent smoking, and physician visits. Different cleaning, scaling, and feature selection methods were applied to the collected data. After selecting the best features from the cleft data, various machine learning algorithms were used, including random forest, k-nearest neighbor, decision tree, support vector machine, and multilayer perceptron. In our implementation, the multilayer perceptron is a deep neural network, which yields excellent results for the cleft dataset compared to the other methods. We achieved 92.6% accuracy on test data with the multilayer perceptron model. These promising predictions could help in addressing cleft early for children at risk.
|
40
|
Zhang G, Wang P, Chen H, Zhang L. Wireless Indoor Localization Using Convolutional Neural Network and Gaussian Process Regression. Sensors (Basel) 2019; 19:s19112508. [PMID: 31159314 PMCID: PMC6603619 DOI: 10.3390/s19112508] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/08/2019] [Revised: 05/21/2019] [Accepted: 05/28/2019] [Indexed: 11/16/2022]
Abstract
This paper presents a localization model employing convolutional neural network (CNN) and Gaussian process regression (GPR) based on Wi-Fi received signal strength indication (RSSI) fingerprinting data. In the proposed scheme, the CNN model is trained on a training dataset. The trained model adapts to complex scenes with multipath effects or many access points (APs). More specifically, the pre-processing algorithm makes the RSSI vector, which is formed from a considerable number of RSSI values from different APs, readable by the CNN algorithm. The trained CNN model improves the positioning performance by taking a series of RSSI vectors into account and extracting local features. In this design, the performance is further improved by applying the GPR algorithm to adjust the coordinates of target points and offset the over-fitting problem of the CNN. The hybrid model is evaluated on a public database that was collected from a library of Jaume I University in Spain. The results show that the hybrid model outperforms the model using k-nearest neighbor (KNN) by 61.8%. While the CNN model improves the performance by 45.8%, the GPR algorithm further enhances the localization accuracy. In addition, the paper experiments with three kernel functions, all of which are shown to have positive effects on GPR.
Affiliation(s)
- Guolong Zhang
- School of Automation and Electronic Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Ping Wang
- Space Star Technology Co., Ltd., Beijing 100086, China.
- Haibing Chen
- School of Automation and Electronic Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Lan Zhang
- School of Automation and Electronic Engineering, University of Science and Technology Beijing, Beijing 100083, China.
|
41
|
Wang GZ, Li J, Hu YT, Li Y, Du ZY. Fault Identification of Chemical Processes Based on k-NN Variable Contribution and CNN Data Reconstruction Methods. Sensors (Basel) 2019; 19:E929. [PMID: 30813310 DOI: 10.3390/s19040929] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/25/2018] [Revised: 02/20/2019] [Accepted: 02/21/2019] [Indexed: 11/18/2022]
Abstract
Data-driven fault detection and identification methods are important in large-scale chemical processes. However, some traditional methods often fail to show superior performance owing to the self-limitations and the characteristics of process data, such as nonlinearity, non-Gaussian distribution, and multi-operating mode. To cope with these issues, the k-NN (k-Nearest Neighbor) fault detection method and extensions have been developed in recent years. Nevertheless, these methods are primarily used for fault detection, and few papers can be found that examine fault identification. In this paper, in order to extract effective fault information, the relationship between various faults and abnormal variables is studied, and an accurate “fault–symptom” table is presented. Then, a novel fault identification method based on k-NN variable contribution and CNN data reconstruction theories is proposed. When there is an abnormality, a variable contribution plot method based on k-NN is used to calculate the contribution index of each variable, and the feasibility of this method is verified by contribution decomposition theory, which includes a feasibility analysis of a single abnormal variable and multiple abnormal variables. Furthermore, to identify all the faulty variables, a CNN (Center-based Nearest Neighbor) data reconstruction method is proposed; the variables that have the larger contribution indices can be reconstructed using the CNN reconstruction method in turn. The proposed search strategy can guarantee that all faulty variables are found in each sample. The reliability and validity of the proposed method are verified by a numerical example and the Continuous Stirred Tank Reactor system.
|
42
|
Rashidi HH, Tran NK, Betts EV, Howell LP, Green R. Artificial Intelligence and Machine Learning in Pathology: The Present Landscape of Supervised Methods. Acad Pathol 2019; 6:2374289519873088. [PMID: 31523704 PMCID: PMC6727099 DOI: 10.1177/2374289519873088] [Citation(s) in RCA: 134] [Impact Index Per Article: 26.8] [Received: 04/28/2019] [Revised: 07/15/2019] [Accepted: 07/26/2019] [Indexed: 12/28/2022] Open
Abstract
Increased interest in the opportunities provided by artificial intelligence and machine learning has spawned a new field of health-care research. The new tools under development are targeting many aspects of medical practice, including changes to the practice of pathology and laboratory medicine. Optimal design in these powerful tools requires cross-disciplinary literacy, including basic knowledge and understanding of critical concepts that have traditionally been unfamiliar to pathologists and laboratorians. This review provides definitions and basic knowledge of machine learning categories (supervised, unsupervised, and reinforcement learning), introduces the underlying concept of the bias-variance trade-off as an important foundation in supervised machine learning, and discusses approaches to the supervised machine learning study design along with an overview and description of common supervised machine learning algorithms (linear regression, logistic regression, Naive Bayes, k-nearest neighbor, support vector machine, random forest, convolutional neural networks).
Affiliation(s)
- Hooman H. Rashidi
- Department of Pathology and Laboratory Medicine, University of California Davis, School of Medicine, Davis, CA, USA
- Nam K. Tran
- Department of Pathology and Laboratory Medicine, University of California Davis, School of Medicine, Davis, CA, USA
- Elham Vali Betts
- Department of Pathology and Laboratory Medicine, University of California Davis, School of Medicine, Davis, CA, USA
- Lydia P. Howell
- Department of Pathology and Laboratory Medicine, University of California Davis, School of Medicine, Davis, CA, USA
- Ralph Green
- Department of Pathology and Laboratory Medicine, University of California Davis, School of Medicine, Davis, CA, USA
|
43
|
Ciszek R, Ndode-Ekane XE, Gomez CS, Casillas-Espinosa PM, Ali I, Smith G, Puhakka N, Lapinlampi N, Andrade P, Kamnaksh A, Immonen R, Paananen T, Hudson MR, Brady RD, Shultz SR, O'Brien TJ, Staba RJ, Tohka J, Pitkänen A. Informatics tools to assess the success of procedural harmonization in preclinical multicenter biomarker discovery study on post-traumatic epileptogenesis. Epilepsy Res 2018; 150:17-26. [PMID: 30605864 DOI: 10.1016/j.eplepsyres.2018.12.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/02/2018] [Revised: 12/11/2018] [Accepted: 12/26/2018] [Indexed: 12/28/2022]
Abstract
The Epilepsy Bioinformatics Study for Antiepileptogenic Therapy (EpiBioS4Rx) is a National Institute of Neurological Disorders and Stroke-funded Centers-Without-Walls international multidisciplinary study aimed at preventing epileptogenesis. The preclinical biomarker discovery in EpiBioS4Rx applies a multicenter study design to allow the number of animals that are required for adequate statistical power for the analysis to be studied in an efficient manner. Further, the use of multiple centers mimics the clinical trial situation, and therefore potentially increases the chance of successful clinical translation of the outcomes of the study. Its successful implementation requires harmonization of procedures and data analyses between the three contributing centers in Finland, Australia, and USA. The objective of the present analysis was to develop metrics for analysis of the success of harmonization of procedures to guide further data analyses and plan future multicenter preclinical studies. The interim analysis is based on data from 212 rats with lateral fluid-percussion injury or sham-operation included in the biomarker discovery by April 30, 2018. The details of protocols, including production of injury, post-injury follow-up, blood sampling, electroencephalogram recording, and magnetic resonance imaging, have been presented in the accompanying manuscripts in this Supplement. Implementation of protocols in EpiBioS4Rx project participant centers was visualized in 2D using t-distributed stochastic neighborhood embedding (t-SNE). The protocols applied to each rat were presented as feature vectors of procedure-related variables (e.g., impact pressure, anesthesia time). The total number of protocol features linked to each rat was 112. Missing data were accounted for in the visualization by utilizing imputation and adding the number of missing values as a third dimension to the 2D t-SNE plot, resulting in a 3D overview of protocol data.
Intraclass correlation coefficient (ICC) using Euclidean distances and area under receiver operating characteristic curve (AUC) of k-nearest neighbor classifier (KNN) were utilized to quantify the degree of clustering by center. Both subsets of data with incomplete protocol vectors omitted and missing protocol data imputed were assessed. Our data show that a visible clustering by center was observed in all t-SNE plots, except for day 7 neuroscores. Both ICC and AUC indicated clustering by center in all protocol variable subsets, excluding unimputed day 7 neuroscores (ICC 0.04 and AUC 0.6). ICC for the imputed set of all protocol-related variables was 0.1 and KNN AUC 0.92. In conclusion, both ICC and AUC indicated differences in protocol between EpiBioS4Rx participating centers, which need to be taken into account in data analysis. Importantly, the majority of observed differences are recoverable as they relate to insufficient updates in record keeping. While the AUC score of KNN is a more sensitive measure for protocol harmonization than ICC for data that display complex splintered clustering, ICC and AUC provide complementary measures to assess the degree of procedural harmonization. This experience should be helpful for other groups planning such multicenter post-traumatic epileptogenesis studies in the future.
Affiliation(s)
- Robert Ciszek
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland.
- Cesar Santana Gomez
- Department of Neurology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Pablo M Casillas-Espinosa
- The Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia; Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Victoria, 3052, Australia
- Idrish Ali
- The Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia; Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Victoria, 3052, Australia
- Gregory Smith
- Department of Neurology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Noora Puhakka
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Niina Lapinlampi
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Pedro Andrade
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Alaa Kamnaksh
- Department of Anatomy, Physiology and Genetics, Uniformed Services University, MD, USA
- Riikka Immonen
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Tomi Paananen
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Matthew R Hudson
- The Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia; Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Victoria, 3052, Australia
- Rhys D Brady
- The Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia; Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Victoria, 3052, Australia
- Sandy R Shultz
- The Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
- Terence J O'Brien
- Department of Neurology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA; Department of Medicine, The Royal Melbourne Hospital, The University of Melbourne, Victoria, 3052, Australia; Department of Neurology, The Alfred Hospital, Commercial Road, Melbourne, Victoria, 3004, Australia; Department of Neurology, The Royal Melbourne Hospital, Grattan Street, Parkville, Victoria, 3050, Australia
- Richard J Staba
- Department of Neurology, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Jussi Tohka
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Asla Pitkänen
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
|
44
|
Liu X, Ping Y, Wang D, Yao R, Wan H. [Comparison of decoding performance between spike and local field potential signals during goal-directed decision-making task of pigeons]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2018; 35:786-793. [PMID: 30370720 PMCID: PMC9935269 DOI: 10.7507/1001-5515.201712038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2017] [Indexed: 11/03/2022]
Abstract
Spike and local field potential (LFP) signals are two of the most important candidate signals for neural decoding. At present there are numerous studies on their decoding performance in mammals, but the decoding performance in birds is still not clear. We analyzed the decoding performance of both signals recorded from the nidopallium caudolaterale area in six pigeons during a goal-directed decision-making task, using a decoding algorithm that combines leave-one-out and k-nearest neighbor (LOO-kNN). The influence of the parameters, including the number of channels, the position and size of the decoding window, and the nearest neighbor k value, on the decoding performance was also studied. The results show that both signals can effectively decode the movement intention of pigeons during this task, but the decoding performance of the LFP signal is higher than that of the spike signal and is less affected by the number of channels. The best decoding window is in the second half of the goal-directed decision-making process, and the optimal decoding window size of the LFP signal (0.3 s) is shorter than that of the spike signal (1 s). For the LOO-kNN algorithm, the accuracy decreases as the k value increases: the smaller the k value, the higher the decoding accuracy. The results of this study will help to parse the neural information processing mechanism of the brain and also have reference value for brain-computer interfaces.
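A leave-one-out kNN evaluation of the kind named in this abstract might look like the sketch below. It is only an illustration: the feature vectors, squared Euclidean distance, majority voting, and the function name are all assumptions, since the abstract does not specify how trials were encoded or compared.

```python
from collections import Counter

def loo_knn_accuracy(samples, labels, k):
    """Leave-one-out accuracy of a k-nearest-neighbor majority vote.

    Each trial is held out in turn and classified from the k closest
    remaining trials (squared Euclidean distance, an assumption).
    """
    correct = 0
    for i, x in enumerate(samples):
        # Distance from the held-out trial to every other trial.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, other)), j)
            for j, other in enumerate(samples)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(samples)
```

Calling the function for a range of k values on the same trials gives an accuracy-versus-k curve, which is how the dependence on k reported above could be examined.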
Affiliation(s)
- Xinyu Liu
- School of Information Engineering, Huanghuai University, Zhumadian, Henan 463000, P.R.China; School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, P.R.China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou University, Zhengzhou 450001,
- Yanna Ping
- School of Information Engineering, Huanghuai University, Zhumadian, Henan 463000, P.R.China
- Dongyun Wang
- School of Information Engineering, Huanghuai University, Zhumadian, Henan 463000, P.R.China
- Ruxian Yao
- School of Information Engineering, Huanghuai University, Zhumadian, Henan 463000, P.R.China
- Hong Wan
- School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, P.R.China; Henan Key Laboratory of Brain Science and Brain-Computer Interface Technology, Zhengzhou University, Zhengzhou 450001, P.R.China
|
45
|
Desailly E, Galarraga OC, Khouri N. Improving multilevel surgery planning and predicting post-operative outcome in cerebral palsy. Comput Methods Biomech Biomed Engin 2017; 20:59-60. [PMID: 29088668 DOI: 10.1080/10255842.2017.1382860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- E Desailly
- Fondation Ellen Poidatz (St Fargeau-Ponthierry, France)
- N Khouri
- Fondation Ellen Poidatz (St Fargeau-Ponthierry, France); Hôpital Necker (Paris, France)
|
46
|
Zhou T, Liang Y, Liu G, Tan S, Chen Z. [Study on aided diagnosis for cardiovascular diseases based on Relief algorithm]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2017; 34:535-542. [PMID: 29745549 DOI: 10.7507/1001-5515.201609070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The study was intended to introduce a novel method for aided diagnosis of cardiovascular diseases based on photoplethysmography (PPG). For this purpose, 40 volunteers were recruited, and their physiological and pathological information was collected, including blood pressure and simultaneous fingertip PPG data, by using a sphygmomanometer and a smart fingertip sensor. From the PPG signal and its first and second derivatives, 52 features were defined and acquired. The Relief feature selection algorithm was used to calculate the contribution of each feature to cardiovascular diseases. The 10 core features with the greatest contribution were then selected as an optimal feature subset. Finally, the efficiency of the Relief feature selection algorithm was demonstrated by applying k-nearest neighbor (kNN) and support vector machine (SVM) classifiers to the selected features. The prediction accuracy of the kNN and SVM models reached 66.67% and 83.33%, respectively, indicating that: ① age was the foremost feature for aided diagnosis of cardiovascular diseases; ② the optimal feature subset provided an important evaluation of cardiovascular health status. The obtained results showed that the application of the Relief feature selection algorithm provided high accuracy in aided diagnosis of cardiovascular diseases.
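The Relief weighting used in this abstract can be sketched for a two-class problem as follows. This is a simplified variant under stated assumptions rather than the paper's code: every sample is visited once instead of drawing a random subset, feature differences are absolute and scaled by each feature's range, and the function name is hypothetical.

```python
def relief_weights(X, y):
    """Relief-style feature weights for a two-class data set.

    For each sample, a feature gains weight when it differs on the nearest
    sample of the other class (miss) and loses weight when it differs on the
    nearest sample of the same class (hit).
    """
    n, d = len(X), len(X[0])
    # Feature ranges for normalization; constant features fall back to 1.0.
    spans = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights
```

Sorting features by these weights and keeping the top entries mirrors the selection of a core subset described above; the subset size is a choice made outside this function.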
Affiliation(s)
- Tanqi Zhou
- School of Electronic Engineer and Automatic, Guilin University of Electronic Technology, Guilin, Guangxi 541004, P.R.China
- Yongbo Liang
- School of Life and Environmental Sciences, Guilin University of Electronic Technology, Guilin, Guangxi 541004, P.R.China
- Guiyong Liu
- Guilin People's Hospital, Guilin, Guangxi 541002, P.R.China
- Shaozhen Tan
- Guilin People's Hospital, Guilin, Guangxi 541002, P.R.China
- Zhencheng Chen
- School of Life and Environmental Sciences, Guilin University of Electronic Technology, Guilin, Guangxi 541004,
|
47
|
Khanfar MA, Taha MO. Unsupervised pharmacophore modeling combined with QSAR analyses revealed novel low micromolar SIRT2 inhibitors. J Mol Recognit 2017; 30. [PMID: 28299833 DOI: 10.1002/jmr.2623] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/07/2016] [Revised: 01/18/2017] [Accepted: 02/13/2017] [Indexed: 11/10/2022]
Abstract
Sirtuin 2 (SIRT2) is a histone deacetylase that has an important role in neuronal development. SIRT2 is a clinically validated target for neurodegenerative diseases and some cancers. In this study, exhaustive unsupervised pharmacophore modeling was combined with quantitative structure-activity relationship (QSAR) analysis to explore the structural requirements for potent SIRT2 inhibitors using 146 known SIRT2 ligands. A computational workflow that combines a genetic function algorithm with k-nearest neighbor or multiple linear regression was implemented to build self-consistent and predictive QSAR models based on combinations of pharmacophores and physicochemical descriptors. Successful pharmacophores were complemented with exclusion spheres to optimize their receiver operating characteristic curve profiles. Optimal QSAR models and their associated pharmacophore hypotheses were experimentally validated by identification and in vitro evaluation of several new promising SIRT2 inhibitory leads retrieved from the National Cancer Institute structural database. The most potent hit showed an IC50 value of 5.4 μM. The chemical structures of active hits were validated by proton nuclear magnetic resonance and mass spectroscopy.
Affiliation(s)
- Mohammad A Khanfar
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
- Mutasem O Taha
- Drug Discovery Unit, Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan
|
48
|
Kashefpoor M, Rabbani H, Barekatain M. Automatic Diagnosis of Mild Cognitive Impairment Using Electroencephalogram Spectral Features. J Med Signals Sens 2016; 6:25-32. [PMID: 27014609 PMCID: PMC4786960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Alzheimer's disease (AD) is one of the most expensive and fatal diseases in the elderly population. Up to now, no cure has been found for AD, so early-stage diagnosis is the only way to control it. Mild cognitive impairment (MCI) is usually the early stage of AD and is defined as a decrease in mental abilities such as cognition, memory, and speech that is not severe enough to interfere with daily activities. MCI is rather hard to diagnose and is usually taken for a normal consequence of aging. This study proposes an accurate, mobile, and inexpensive diagnostic approach based on the electroencephalogram (EEG) signal. EEG signals were recorded using 19 electrodes positioned according to the 10-20 International system in the resting eyes-closed state from 16 normal and 11 MCI participants. Nineteen spectral features are computed for each channel and examined using a correlation-based algorithm to select the most discriminative features. Selected features are classified using a combination of a neurofuzzy system and a k-nearest neighbor classifier. Final results reach 88.89%, 100%, and 83.33% for accuracy, sensitivity, and specificity, respectively, which shows the potential of the proposed method to be used as an MCI diagnostic tool, especially for screening a large population.
Affiliation(s)
- Masoud Kashefpoor
- Department of Biomedical Engineering, Faculty of Advanced Medical Technologies, Isfahan University of Medical Sciences, Isfahan, Iran; Student Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
- Hossein Rabbani
- Department of Biomedical Engineering, Faculty of Advanced Medical Technologies, Isfahan University of Medical Sciences, Isfahan, Iran; Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran; Address for correspondence: Department of Advanced Medical Technologies, Medical Image and Signal Processing Research Center, Isfahan University of Medical Sciences, Isfahan, Iran. E-mail:
- Majid Barekatain
- Psychosomatic Research Center, Department of Psychiatry, Medical School, Isfahan University of Medical Sciences, Isfahan, Iran
|
49
|
Sharma MC. Identification of 3-Nitro-2,4,6-trihydroxybenzamide Derivatives as Photosynthetic Electron Transport Inhibitors by QSAR and Pharmacophore Studies. Interdiscip Sci 2015; 8:109-21. [PMID: 26245276 DOI: 10.1007/s12539-015-0019-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/03/2013] [Revised: 01/21/2014] [Accepted: 02/07/2014] [Indexed: 11/30/2022]
Abstract
In the present investigation, quantitative structure-activity relationship (QSAR) analysis was performed on a data set of structurally diverse compounds in order to investigate the role of their structural features in their activity as photosynthetic electron transport inhibitors. The best 2D-QSAR model was selected, having correlation coefficient r² = 0.8544 and cross-validated squared correlation coefficient q² = 0.7139 with external predictive ability of pred_r² = 0.7753. The results obtained in this study indicate that the presence of hydroxy and nitro groups, expressed by the SsOHcount and SddsN (nitro) count, is the most relevant molecular property determining photosynthetic inhibitory efficiency. Molecular field analysis was used to construct the best k-nearest neighbor (kNN-MFA)-based 3D-QSAR model using the SA-PLS method, showing good correlative and predictive capabilities in terms of [Formula: see text] and [Formula: see text]. The pharmacophore model includes three features, viz. a hydrogen bond donor, a hydrogen bond acceptor, and one aromatic feature. The developed model was found to be predictive and can be used to design potent photosynthetic electron transport inhibitors prior to their synthesis for further lead modification.
Affiliation(s)
- Mukesh C Sharma
- Drug Research Laboratory, School of Pharmacy, Devi Ahilya University, Takshila Campus, Khandwa Road, Indore, M.P., 452 001, India.
|
50
|
Sharma MC. A Structure-Activity Relationship Study of Naphthoquinone Derivatives as Antitubercular Agents Using Molecular Modeling Techniques. Interdiscip Sci 2015; 7:346-56. [PMID: 26159131 DOI: 10.1007/s12539-015-0011-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/28/2013] [Revised: 11/15/2013] [Accepted: 11/22/2013] [Indexed: 11/25/2022]
Abstract
Tuberculosis (TB) is one of the major causes of death worldwide. Mycobacterium tuberculosis, the leading causative agent of TB, is responsible for the morbidity and mortality of a large population worldwide. In view of the above, and as part of our effort to develop new and potent anti-TB agents, a series of substituted naphthoquinone derivatives was subjected to molecular modeling using various feature selection methods. The best statistically significant 2D-QSAR model, having correlation coefficient [Formula: see text] and cross-validated squared correlation coefficient [Formula: see text] with external predictive ability of [Formula: see text], was developed by SA-PLS, and a group-based QSAR model having [Formula: see text] and [Formula: see text] with [Formula: see text] was also developed by SA-PLS. Further analysis using a three-dimensional QSAR technique identified a suitable model, obtained by the SA-partial least squares method, for antitubercular activity prediction. k-nearest neighbor molecular field analysis was used to construct the best 3D-QSAR model using the SA-PLS method, showing good correlative and predictive capabilities in terms of [Formula: see text] and [Formula: see text]. The pharmacophore analysis results obtained from this study show that the aromatic/hydrophobic and naphthoquinone moiety sites should lie at almost the same distance from the aliphatic and acceptor groups for significant antitubercular activity. The information rendered by the QSAR models may lead to a better understanding of the structural requirements of antitubercular activity and can also help in the design of novel potent antitubercular agents.
Affiliation(s)
- Mukesh C Sharma
- Drug Research Laboratory, School of Pharmacy, Devi Ahilya University, Takshila Campus, Khandwa Road, Indore, 452 001, India.
|