Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 25 (from Reference Citation Analysis)
Article PDFs: 9
Cited by > 0: 18
Searched Name: naive Bayes

Number	Citation Analysis
1	Identifying Intraoperative Spinal Cord Injury Location from Somatosensory Evoked Potentials' Time-Frequency Components. Bioengineering (Basel) 2023;10:707. [PMID: 37370638 DOI: 10.3390/bioengineering10060707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023]
Abstract: Excessive distraction in corrective spine surgery can lead to iatrogenic distraction spinal cord injury. Diagnosis of the location of the spinal cord injury helps in early removal of the injury source. The time-frequency components of the somatosensory evoked potential have been reported to provide information on the location of spinal cord injury, but most studies have focused on contusion injuries of the cervical spine. In this study, we established 19 rat models of distraction spinal cord injury at different levels and collected the somatosensory evoked potentials of the hindlimb and extracted their time-frequency components. Subsequently, we used k-medoid clustering and naive Bayes to classify spinal cord injury at the C5 and C6 level, as well as spinal cord injury at the cervical, thoracic, and lumbar spine, respectively. The results showed that there was a significant delay in the latency of the time-frequency components distributed between 15 and 30 ms and 50 and 150 Hz in all spinal cord injury groups. The overall classification accuracy was 88.28% and 84.87%. The results demonstrate that the k-medoid clustering and naive Bayes methods are capable of extracting the time-frequency component information depending on the spinal cord injury location and suggest that the somatosensory evoked potential has the potential to diagnose the location of a spinal cord injury.
Key Words: machine learning; naive Bayes; somatosensory evoked potentials; spinal cord injury; time-frequency components
Grants: 2021YFF0501600 (National Key Research and Development Program of China); 2022A703-3 (Zhanjiang Competitive allocation of special funds for scientific and technological development)
2	Multi-Layered Non-Local Bayes Model for Lung Cancer Early Diagnosis Prediction with the Internet of Medical Things. Bioengineering (Basel) 2023;10:138. [PMID: 36829633 PMCID: PMC9952033 DOI: 10.3390/bioengineering10020138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 01/07/2023] [Accepted: 01/11/2023] [Indexed: 01/22/2023]
Abstract: The Internet of Things (IoT) has been influential in predicting major diseases in current practice. The deep learning (DL) technique is vital in monitoring and controlling the functioning of the healthcare system and ensuring an effective decision-making process. In this study, we aimed to develop a framework implementing the IoT and DL to identify lung cancer. The accurate and efficient prediction of disease is a challenging task. The proposed model deploys a DL process with a multi-layered non-local Bayes (NL Bayes) model to manage the process of early diagnosis. The Internet of Medical Things (IoMT) could be useful in determining factors that could enable the effective sorting of quality values through the use of sensors and image processing techniques. We studied the proposed model by analyzing its results with regard to specific attributes such as accuracy, quality, and system process efficiency. In this study, we aimed to overcome problems in the existing process through the practical results of a computational comparison process. The proposed model provided a low error rate (2%, 5%) and an increase in the number of instance values. The experimental results led us to conclude that the proposed model can make predictions based on images with high sensitivity and better precision values compared to other specific results. The proposed model achieved the expected accuracy (81%, 95%), the expected specificity (80%, 98%), and the expected sensitivity (80%, 99%). This model is adequate for real-time health monitoring systems in the prediction of lung cancer and can enable effective decision-making with the use of DL techniques.
Key Words: Internet of Things; deep learning; diagnosis prediction; lung cancer; machine learning; medical things; naive Bayes
3	Probabilistic Fusion for Pedestrian Detection from Thermal and Colour Images. SENSORS (BASEL, SWITZERLAND) 2022;22:8637. [PMID: 36433238 PMCID: PMC9698565 DOI: 10.3390/s22228637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 10/26/2022] [Accepted: 11/07/2022] [Indexed: 06/16/2023]
Abstract: Pedestrian detection is an important research domain due to its relevance for autonomous and assisted driving, as well as its applications in security and industrial automation. Often, more than one type of sensor is used to cover a broader range of operating conditions than a single-sensor system would allow. However, it remains difficult to make pedestrian detection systems perform well in highly dynamic environments, often requiring extensive retraining of the algorithms for specific conditions to reach satisfactory accuracy, which, in turn, requires large, annotated datasets captured in these conditions. In this paper, we propose a probabilistic decision-level sensor fusion method based on naive Bayes to improve the efficiency of the system by combining the output of available pedestrian detectors for colour and thermal images without retraining. The results in this paper, obtained through long-term experiments, demonstrate the efficacy of our technique, its ability to work with non-registered images, and its adaptability to cope with situations when one of the sensors fails. The results also show that our proposed technique improves the overall accuracy of the system and could be very useful in several applications.
Key Words: decision-level fusion; naive Bayes; probabilistic fusion; sensor fusion
MeSH Headings: Humans; Pedestrians; Bayes Theorem; Color; Automobile Driving; Algorithms
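The decision-level fusion idea in entry 3 can be illustrated with a small sketch. It is not the authors' implementation: the function name `fuse_detections`, the per-sensor hit and false-alarm rates, and the prior are all hypothetical values chosen for illustration. It only shows how naive Bayes combines independent detector outputs, and how an unavailable sensor can simply be skipped.

```python
def fuse_detections(prior, likelihoods, observations):
    """Posterior P(pedestrian | detector outputs) under the naive Bayes
    assumption that detectors are independent given the true class.

    likelihoods[name]  = (P(fires | pedestrian), P(fires | no pedestrian))
    observations[name] = True, False, or None if the sensor is unavailable
    All numbers passed in are illustrative assumptions, not values from the paper.
    """
    p_yes, p_no = prior, 1.0 - prior
    for name, fired in observations.items():
        if fired is None:
            # A failed sensor contributes no evidence, so it is skipped.
            continue
        hit_rate, false_alarm_rate = likelihoods[name]
        p_yes *= hit_rate if fired else 1.0 - hit_rate
        p_no *= false_alarm_rate if fired else 1.0 - false_alarm_rate
    return p_yes / (p_yes + p_no)


# Hypothetical detector characteristics for the two image types.
rates = {"colour": (0.9, 0.1), "thermal": (0.8, 0.2)}
both = fuse_detections(0.5, rates, {"colour": True, "thermal": True})
colour_only = fuse_detections(0.5, rates, {"colour": True, "thermal": None})
```

With these made-up rates, agreement between the two detectors gives a higher posterior than the colour detector alone, which is the behaviour a fusion scheme of this kind is meant to provide.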
4	User Experience Estimation in Multi-Service Scenario of Cellular Network. SENSORS (BASEL, SWITZERLAND) 2021;22:89. [PMID: 35009633 PMCID: PMC8747518 DOI: 10.3390/s22010089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 12/15/2021] [Accepted: 12/22/2021] [Indexed: 06/14/2023]
Abstract: The estimation of user experience in a wireless network has always been a research hotspot, especially for the realization of network automation. In order to solve the problem of user experience estimation in wireless networks, we propose a two-step optimization method for the selection of the kernel function and bandwidth in a naive Bayesian classifier based on kernel density estimation. This optimization method can effectively improve the accuracy of estimation. At present, research on user experience estimation in wireless networks does not include an in-depth analysis of the reasons for the decline of user experience. We established a scheme integrating user experience prediction and network fault diagnosis. Key performance indicator (KPI) data collected from an actual network were divided into five categories, which were used to estimate user experience. The results of these five estimates were counted through the voting mechanism, and the final estimation results could be obtained. At the same time, this voting mechanism can also indicate which KPIs lead to the reduction of user experience. In addition, this paper also puts forward the evaluation standard of the multi-service perception capability of cell-level wireless networks. We summarize the user experience estimation for three main services in a cell to obtain a cell-level user experience evaluation. The results showed that the proposed method can accurately estimate user experience and diagnose abnormal values in a timely manner. This method can improve the efficiency of network management.
Key Words: cellular network; kernel density estimation; naive Bayes; network automation; users experience
MeSH Headings: Algorithms; Bayes Theorem; Computer Communication Networks; Spatial Analysis; Wireless Technology
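A naive Bayes classifier based on kernel density estimation, as named in entry 4, can be sketched as below. This is a minimal illustration under stated assumptions: a Gaussian kernel and a hand-picked bandwidth are used, whereas the paper describes a two-step optimization of both, which is not reproduced here. The function names and the toy KPI values are hypothetical.

```python
import math


def kde(x, points, bandwidth):
    """Gaussian-kernel density estimate at x from 1-D training points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def classify(x, training, bandwidth=1.0):
    """Return the label with the highest log posterior.

    training maps each class label to a list of feature vectors.
    Each feature's class-conditional density is a separate 1-D KDE,
    multiplied together under the naive independence assumption.
    """
    total = sum(len(rows) for rows in training.values())
    scores = {}
    for label, rows in training.items():
        score = math.log(len(rows) / total)  # log prior from class frequency
        for j, value in enumerate(x):
            density = kde(value, [row[j] for row in rows], bandwidth)
            score += math.log(max(density, 1e-300))  # guard against log(0)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy KPI vectors (made up): [throughput, delay] per observation.
training = {
    "good": [[10.0, 1.0], [12.0, 2.0], [11.0, 1.5]],
    "poor": [[2.0, 8.0], [3.0, 9.0], [1.0, 7.0]],
}
label = classify([11.0, 1.2], training)
```

The bandwidth controls how smooth each estimated density is, which is why its selection matters for classification accuracy.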
5	Mental Stress Classification Based on a Support Vector Machine and Naive Bayes Using Electrocardiogram Signals. SENSORS (BASEL, SWITZERLAND) 2021;21:7916. [PMID: 34883920 PMCID: PMC8659646 DOI: 10.3390/s21237916] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/26/2021] [Revised: 11/19/2021] [Accepted: 11/25/2021] [Indexed: 02/07/2023]
Abstract: Examining mental health is crucial for preventing mental illnesses such as depression. This study presents a method for classifying electrocardiogram (ECG) data into four emotional states according to the stress levels using one-against-all and naive Bayes algorithms of a support vector machine. The stress classification criteria were determined by calculating the average values of the R-S peak, R-R interval, and Q-T interval of the ECG data to improve the stress classification accuracy. For the performance evaluation of the stress classification model, confusion matrix, receiver operating characteristic (ROC) curve, and minimum classification error were used. The average accuracy of the stress classification was 97.6%. The proposed model improved the accuracy by 8.7% compared to the previous stress classification algorithm. Quantifying the stress signals experienced by people can facilitate a more effective management of their mental state.
Key Words: electrocardiogram; naive Bayes; support vector machine
MeSH Headings: Algorithms; Bayes Theorem; Electrocardiography; Humans; ROC Curve; Support Vector Machine
6	Realizing an Effective COVID-19 Diagnosis System Based on Machine Learning and IoT in Smart Hospital Environment. IEEE INTERNET OF THINGS JOURNAL 2021;8:15919-15928. [PMID: 35782183 PMCID: PMC8769008 DOI: 10.1109/jiot.2021.3050775] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 10/30/2020] [Revised: 12/05/2020] [Accepted: 01/06/2021] [Indexed: 05/18/2023]
Abstract: The aim of this study is to propose a model based on machine learning (ML) and Internet of Things (IoT) to diagnose patients with COVID-19 in smart hospitals. To this end, the role of ML models and IoT-related technologies in the smart hospital environment is emphasized. The accuracy rate of diagnosis (classification) based on laboratory findings can be improved via light ML models. Three ML models, namely, naive Bayes (NB), Random Forest (RF), and support vector machine (SVM), were trained and tested on the basis of laboratory datasets. Three main methodological scenarios of COVID-19 diagnoses, such as diagnoses based on original and normalized datasets and those based on feature selection, were presented. Compared with benchmark studies, our proposed SVM model obtained the most substantial diagnosis performance (up to 95%). The proposed model based on ML and IoT can serve as a clinical decision support system. Furthermore, the outcomes could reduce the workload for doctors, tackle the issue of patient overcrowding, and reduce mortality rate during the COVID-19 pandemic.
Key Words: COVID-19; Internet of Things (IoT); laboratory findings; machine learning (ML); naive Bayes; random forest (RF); smart hospital environment; support vector machine
7	Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Mol Ecol Resour 2021;21:2190-2203. [PMID: 33905615 DOI: 10.1111/1755-0998.13407] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/14/2020] [Revised: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract: The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging as researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities by using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables were held constant, and using a set of COI fragments. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was impacted by both species and specimen counts (query compositional heterogeneity) which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate the reference databases including completeness; and (iii) use QIIME, BLAST or LCA methods, in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.
Key Words: BLAST; benchmarking; compositional heterogeneity; machine learning; mock community; naive Bayes; species identification
8	Machine learning approaches to constructing predictive models of vitamin D deficiency in a hypertensive population: a comparative study. Inform Health Soc Care 2021;46:355-369. [PMID: 33792475 DOI: 10.1080/17538157.2021.1896524] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract: Objective: Given the association between vitamin D deficiency and risk for cardiovascular disease, we used machine learning approaches to establish a model to predict the probability of deficiency. Determination of serum levels of 25-hydroxy vitamin D (25(OH)D) provided the best assessment of vitamin D status, but such tests are not always widely available or feasible. Thus, our study established predictive models with high sensitivity to identify patients either unlikely to have vitamin D deficiency or who should undergo 25(OH)D testing. Methods: We collected data from 1002 hypertensive patients from a Spanish university hospital. The elastic net regularization approach was applied to reduce the dimensionality of the dataset. The issue of determining vitamin D status was addressed as a classification problem; thus, the following classifiers were applied: logistic regression, support vector machine (SVM), random forest, naive Bayes, and Extreme Gradient Boost methods. Classification accuracy, sensitivity, specificity, and predictive values were computed to assess the performance of each method. Results: The SVM-based method with radial kernel performed better than the other algorithms in terms of sensitivity (98%), negative predictive value (71%), and classification accuracy (73%). Conclusion: The combination of a feature-selection method such as elastic net regularization and a classification approach produced well-fitted models. The SVM approach yielded better predictions than the other algorithms. This combination approach allowed us to develop a predictive model with high sensitivity but low specificity, to identify the population that could benefit from laboratory determination of serum levels of 25(OH)D.
Key Words: Vitamin D deficiency; cardiovascular risk assessment; logistic regression; naive Bayes; predictive model; random forest; support vector machine
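The performance measures listed in entry 8 (accuracy, sensitivity, specificity, negative predictive value) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are made up for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn: deficient patients predicted deficient / predicted not deficient
    fp/tn: non-deficient patients predicted deficient / predicted not deficient
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of true cases detected
        "specificity": tn / (tn + fp),            # share of non-cases cleared
        "npv": tn / (tn + fn),                    # reliability of a negative result
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Illustrative counts only: a screening-style model that rarely misses a case
# but flags many non-cases, i.e. high sensitivity with low specificity.
metrics = diagnostic_metrics(tp=98, fn=2, fp=50, tn=50)
```

A model tuned this way suits a screening role: a negative prediction is seldom a missed case, while positives still need laboratory confirmation.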
9	Estimating the relative probability of direct transmission between infectious disease patients. Int J Epidemiol 2021;49:764-775. [PMID: 32211747 DOI: 10.1093/ije/dyaa031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/23/2020] [Accepted: 02/07/2020] [Indexed: 11/14/2022]
Abstract: BACKGROUND: Estimating infectious disease parameters such as the serial interval (time between symptom onset in primary and secondary cases) and reproductive number (average number of secondary cases produced by a primary case) is important in understanding infectious disease dynamics. Many estimation methods require linking cases by direct transmission, a difficult task for most diseases. METHODS: Using a subset of cases with detailed genetic and/or contact investigation data to develop a training set of probable transmission events, we build a model to estimate the relative transmission probability for all case-pairs from demographic, spatial and clinical data. Our method is based on naive Bayes, a machine learning classification algorithm which uses the observed frequencies in the training dataset to estimate the probability that a pair is linked given a set of covariates. RESULTS: In simulations, we find that the probabilities estimated using genetic distance between cases to define training transmission events are able to distinguish between truly linked and unlinked pairs with high accuracy (area under the receiver operating curve value of 95%). Additionally, only a subset of the cases, 10-50% depending on sample size, need to have detailed genetic data for our method to perform well. We show how these probabilities can be used to estimate the average effective reproductive number and apply our method to a tuberculosis outbreak in Hamburg, Germany. CONCLUSIONS: Our method is a novel way to infer transmission dynamics in any dataset when only a subset of cases has rich contact investigation and/or genetic data.
Key Words: Tuberculosis; machine learning; naive Bayes; reproductive number
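The frequency-based estimation described in entry 9 can be sketched with a small categorical naive Bayes. This is an illustration only, not the authors' method or software: the covariate names, the toy training pairs, and the use of Laplace smoothing (`alpha`) are assumptions introduced here.

```python
from collections import Counter, defaultdict


def train(pairs, labels, alpha=1.0):
    """Tabulate how often each covariate value occurs among linked (True)
    and unlinked (False) training pairs. alpha is Laplace smoothing,
    an assumption added so unseen values do not give zero probability."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, covariate) -> value counts
    levels = defaultdict(set)            # covariate -> values seen in training
    for pair, label in zip(pairs, labels):
        for covariate, value in pair.items():
            value_counts[(label, covariate)][value] += 1
            levels[covariate].add(value)
    return class_counts, value_counts, levels, alpha


def prob_linked(model, pair):
    """P(linked | covariates), treating covariates as independent given the class."""
    class_counts, value_counts, levels, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior from observed frequency
        for covariate, value in pair.items():
            count = value_counts[(label, covariate)][value]
            score *= (count + alpha) / (n + alpha * len(levels[covariate]))
        scores[label] = score
    return scores[True] / sum(scores.values())


# Hypothetical training pairs with two made-up covariates.
pairs = [
    {"same_city": "yes", "onset_gap": "short"},
    {"same_city": "yes", "onset_gap": "short"},
    {"same_city": "no", "onset_gap": "long"},
    {"same_city": "yes", "onset_gap": "long"},
]
labels = [True, True, False, False]
model = train(pairs, labels)
p = prob_linked(model, {"same_city": "yes", "onset_gap": "short"})
```

The output is a relative probability for each case-pair, so pairs can be ranked even when only a subset of cases supplied the training labels.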
10	Automatic Detection of a Student's Affective States for Intelligent Teaching Systems. Brain Sci 2021;11:331. [PMID: 33808032 PMCID: PMC7998267 DOI: 10.3390/brainsci11030331] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Revised: 03/01/2021] [Accepted: 03/03/2021] [Indexed: 11/16/2022]
Abstract: AutoTutor is an automated computer tutor that simulates human tutors and holds conversations with students in natural language. Using data collected from AutoTutor, the following determinations were sought: Can we automatically classify affect states from intelligent teaching systems to aid in the detection of a learner’s emotional state? Using frequency patterns of AutoTutor feedback and assigned user emotion in a series of pairs, can the next pair of feedback/emotion series be predicted? Through a priori data mining approaches, we found dominant frequent item sets that predict the next set of responses. Thirty-four participants provided 200 turns between the student and the AutoTutor. Two series of attributes and emotions were concatenated into one row to create a record of previous and next set of emotions. Feature extraction techniques, such as multilayer-perceptron and naive Bayes, were performed on the dataset to perform classification for affective state labeling. The emotions ‘Flow’ and ‘Frustration’ had the highest classification of all the other emotions when measured against other emotions and their respective attributes. The most common frequent item sets were ‘Flow’ and ‘Confusion’.
Key Words: a priori; affective states; antecedent/consequent; human computer interaction; intelligent tutoring systems; multi-layer perceptron; naive Bayes
11	SICD: Novel Single-Access-Point Indoor Localization Based on CSI-MIMO with Dimensionality Reduction. SENSORS 2021;21:1325. [PMID: 33668436 PMCID: PMC7918435 DOI: 10.3390/s21041325] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/16/2020] [Revised: 02/05/2021] [Accepted: 02/09/2021] [Indexed: 11/16/2022]
Abstract: With the rise of location-based services and the rapidly growing requirements related to their applications, indoor localization based on channel state information–multiple-input multiple-output (CSI-MIMO) has become an important research topic. However, indoor localization based on CSI-MIMO has some disadvantages, including noise and high data dimensions. To overcome the above drawbacks, we proposed a novel method of indoor localization based on CSI-MIMO, named SICD. For SICD, a novel localization fingerprint was first designed which can reflect the time–frequency and space–frequency characteristics of CSI-MIMO under a single access point (AP). To reduce the redundancy in the data of CSI-MIMO amplitude, we developed a data dimensionality reduction algorithm. Moreover, by leveraging a log-normal distribution, we calculated the conditional probability of the naive Bayes classifier, which was used to predict the moving object’s location. Compared with other state-of-the-art methods, the results of the experiment confirm that the SICD effectively improves localization accuracy.
Key Words: MIMO; channel state information; dimensionality reduction; indoor localization; naive Bayes
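Entry 11 mentions a naive Bayes classifier whose conditional probabilities come from a log-normal distribution. A generic version of that idea is sketched below; it is not the SICD algorithm. The fingerprint design and dimensionality reduction are omitted, and the function names, toy amplitude values, and variance floor are assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Per location and per feature, estimate the mean and variance of log(x),
    which are the parameters of a log-normal distribution."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append([math.log(v) for v in x])  # features must be positive
    model = {}
    for y, rows in grouped.items():
        params = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            params.append((mu, max(var, 1e-6)))  # variance floor (an assumption)
        model[y] = (math.log(len(rows) / len(samples)), params)
    return model


def log_lognormal_pdf(x, mu, var):
    """Log density of a log-normal distribution with log-scale mean mu and variance var."""
    return (-math.log(x) - 0.5 * math.log(2 * math.pi * var)
            - (math.log(x) - mu) ** 2 / (2 * var))


def predict(model, x):
    """Location with the highest log posterior under feature independence."""
    def score(y):
        log_prior, params = model[y]
        return log_prior + sum(
            log_lognormal_pdf(v, mu, var) for v, (mu, var) in zip(x, params))
    return max(model, key=score)


# Toy amplitude features (made up) recorded at two hypothetical locations.
samples = [[1.0, 10.0], [1.2, 9.0], [0.9, 11.0], [5.0, 2.0], [6.0, 2.5], [5.5, 1.8]]
labels = ["A", "A", "A", "B", "B", "B"]
model = fit(samples, labels)
location = predict(model, [1.1, 10.5])
```

Working with log densities avoids numerical underflow when many features are multiplied together.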
12	A Machine Learning Approach for Tracing Tumor Original Sites With Gene Expression Profiles. Front Bioeng Biotechnol 2020;8:607126. [PMID: 33330438 PMCID: PMC7732438 DOI: 10.3389/fbioe.2020.607126] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/16/2020] [Accepted: 10/26/2020] [Indexed: 11/23/2022]
Abstract: Some carcinomas show that one or more metastatic sites appear with unknown origins. The identification of primary or metastatic tumor tissues is crucial for physicians to develop precise treatment plans for patients. With unknown primary origin sites, it is challenging to design specific plans for patients. Usually, those patients receive broad-spectrum chemotherapy, yet still have a poor prognosis. Machine learning has been widely used and has already achieved significant advantages in clinical practice. In this study, we classify and predict a large number of tumor samples with uncertain origins by applying the random forest and Naive Bayesian algorithms. We use the precision, recall, and other measurements to evaluate the performance of our approach. The results showed that the prediction accuracy of this method was 90.4 for 7,713 samples. The accuracy was 80% for 20 metastatic tumor samples. In addition, 10-fold cross-validation is used to evaluate the accuracy of classification, which reaches 91%.
Key Words: machine learning; naive Bayes; random forest; the ability of tissue tracing; uncertain origins
13	Smart Helmet 5.0 for Industrial Internet of Things Using Artificial Intelligence. SENSORS 2020;20:6241. [PMID: 33139608 PMCID: PMC7663590 DOI: 10.3390/s20216241] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/10/2020] [Revised: 10/26/2020] [Accepted: 10/27/2020] [Indexed: 11/17/2022]
Abstract: Information and communication technologies (ICTs) have contributed to advances in Occupational Health and Safety, improving the security of workers. The use of Personal Protective Equipment (PPE) based on ICTs reduces the risk of accidents in the workplace, thanks to the capacity of the equipment to make decisions on the basis of environmental factors. Paradigms such as the Industrial Internet of Things (IIoT) and Artificial Intelligence (AI) make it possible to generate PPE models feasibly and create devices with more advanced characteristics such as monitoring, sensing the environment and risk detection, among others. The working environment is monitored continuously by these models and they notify the employees and their supervisors of any anomalies and threats. This paper presents a smart helmet prototype that monitors the conditions in the workers’ environment and performs a near real-time evaluation of risks. The data collected by sensors is sent to an AI-driven platform for analysis. The training dataset consisted of 11,755 samples and 12 different scenarios. As part of this research, a comparative study of the state-of-the-art models of supervised learning is carried out. Moreover, the use of a Deep Convolutional Neural Network (ConvNet/CNN) is proposed for the detection of possible occupational risks. The data are processed to make them suitable for the CNN and the results are compared against a Static Neural Network (NN), Naive Bayes Classifier (NB) and Support Vector Machine (SVM), where the CNN had an accuracy of 92.05% in cross-validation.
Key Words: OHS; PPE; convolutional neural network; deep learning; microcontroller; naive Bayes; risk detection; support vector machine
14	A Probabilistic Classification Tool for Genetic Subtypes of Diffuse Large B Cell Lymphoma with Therapeutic Implications. Cancer Cell 2020;37:551-568.e14. [PMID: 32289277 PMCID: PMC8459709 DOI: 10.1016/j.ccell.2020.03.015] [Citation(s) in RCA: 511] [Impact Index Per Article: 127.8] [Received: 10/19/2019] [Revised: 01/03/2020] [Accepted: 03/16/2020] [Indexed: 12/22/2022]
Abstract: The development of precision medicine approaches for diffuse large B cell lymphoma (DLBCL) is confounded by its pronounced genetic, phenotypic, and clinical heterogeneity. Recent multiplatform genomic studies revealed the existence of genetic subtypes of DLBCL using clustering methodologies. Here, we describe an algorithm that determines the probability that a patient's lymphoma belongs to one of seven genetic subtypes based on its genetic features. This classification reveals genetic similarities between these DLBCL subtypes and various indolent and extranodal lymphoma types, suggesting a shared pathogenesis. These genetic subtypes also have distinct gene expression profiles, immune microenvironments, and outcomes following immunochemotherapy. Functional analysis of genetic subtype models highlights distinct vulnerabilities to targeted therapy, supporting the use of this classification in precision medicine trials.
Key Words: A53; BN2; DLBCL; EZB; LymphGen; MCD; N1; ST2; genomic classification; lymphoma; naive Bayes
MeSH Headings: Animals; Apoptosis; Biomarkers, Tumor/genetics; Cell Proliferation; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genetic Heterogeneity; Humans; Lymphoma, Large B-Cell, Diffuse/classification; Lymphoma, Large B-Cell, Diffuse/drug therapy; Lymphoma, Large B-Cell, Diffuse/genetics; Lymphoma, Large B-Cell, Diffuse/pathology; Mice; Mice, Inbred NOD; Mice, SCID; Molecular Targeted Therapy; Precision Medicine; Tumor Cells, Cultured; Tumor Microenvironment; Xenograft Model Antitumor Assays
Grants: P01 CA229100 (NCI NIH HHS); Z01 BC011006 (Intramural NIH HHS)
15	A genetic programming-based approach to identify potential inhibitors of serine protease of Mycobacterium tuberculosis. Future Med Chem 2020;12:147-159. [PMID: 32031024 DOI: 10.4155/fmc-2018-0560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
Abstract: Aim: We applied genetic programming approaches to understand the impact of descriptors on inhibitory effects of serine protease inhibitors of Mycobacterium tuberculosis (Mtb) and the discovery of new inhibitors as drug candidates. Materials & methods: The experimental dataset of serine protease inhibitors of Mtb descriptors was optimized by genetic algorithm (GA) along with the correlation-based feature selection (CFS) in order to develop predictive models using machine-learning algorithms. The best model was deployed on a library of 918 phytochemical compounds to screen potential serine protease inhibitors of Mtb. Quality and performance of the predictive models were evaluated using various standard statistical parameters. Result: The best random forest model with CFS-GA screened 126 anti-tubercular agents out of 918 phytochemical compounds. Also, the genetic programming symbolic classification method optimized the descriptors and developed an equation for mathematical models. Conclusion: The use of CFS-GA with random forest enhanced classification accuracy and predicted new serine protease inhibitors of Mtb, which can be used for better drug development against tuberculosis.
Key Words: J48; correlation-based feature selection; feature selection; genetic algorithm; genetic programming; machine learning; naive Bayes; random forest; support vector machines; tuberculosis
16	[Resting-state electroencephalogram classification of patients with schizophrenia or depression]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2019;36:916-923. [PMID: 31875364 DOI: 10.7507/1001-5515.201812041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract: The clinical manifestations of patients with schizophrenia and patients with depression not only have a certain similarity, but also change with the patient's mood, and thus lead to misdiagnosis in clinical diagnosis. Electroencephalogram (EEG) analysis provides an important reference and objective basis for accurate differentiation and diagnosis between patients with schizophrenia and patients with depression. In order to solve the problem of misdiagnosis between patients with schizophrenia and patients with depression, and to improve the accuracy of the classification and diagnosis of these two diseases, in this study we extracted the resting-state EEG features from 100 patients with depression and 100 patients with schizophrenia, including information entropy, sample entropy and approximate entropy, statistical properties feature and relative power spectral density (rPSD) of each EEG rhythm (δ, θ, α, β). Then feature vectors were formed to classify these two types of patients using the support vector machine (SVM) and the naive Bayes (NB) classifier. Experimental results indicate that: ① The rPSD feature vector P performs the best in classification, achieving an average accuracy of 84.2% and a highest accuracy of 86.3%; ② The accuracy of SVM is obviously better than that of NB; ③ For the rPSD of each rhythm, the β rhythm performs the best with the highest accuracy of 76%; ④ Electrodes with large feature weight are mainly concentrated in the frontal lobe and parietal lobe. The results of this study indicate that the rPSD feature vector P in conjunction with SVM can effectively distinguish depression and schizophrenia, and can also play an auxiliary role in the relevant clinical diagnosis.
Key Words: depression; electroencephalogram; feature extraction; naive Bayes; schizophrenia; support vector machine
MeSH Headings: Bayes Theorem; Depression; Electroencephalography; Humans; Schizophrenia; Signal Processing, Computer-Assisted; Support Vector Machine
17	Finding Diagnostically Useful Patterns in Quantitative Phenotypic Data. Am J Hum Genet 2019;105:933-946. [PMID: 31607427 PMCID: PMC6848993 DOI: 10.1016/j.ajhg.2019.09.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/24/2019] [Accepted: 09/13/2019] [Indexed: 12/11/2022] Abstract: Trio-based whole-exome sequence (WES) data have established confident genetic diagnoses in ∼40% of previously undiagnosed individuals recruited to the Deciphering Developmental Disorders (DDD) study. Here we aim to use the breadth of phenotypic information recorded in DDD to augment diagnosis and disease variant discovery in probands. Median Euclidean distances (mEuD) were employed as a simple measure of similarity of quantitative phenotypic data within sets of ≥10 individuals with plausibly causative de novo mutations (DNM) in 28 different developmental disorder genes. 13/28 (46.4%) showed significant similarity for growth or developmental milestone metrics, 10/28 (35.7%) showed similarity in HPO term usage, and 12/28 (43%) showed no phenotypic similarity. Pairwise comparisons of individuals with high-impact inherited variants to the 32 individuals with causative DNM in ANKRD11 using only growth z-scores highlighted 5 likely causative inherited variants and two unrecognized DNM resulting in an 18% diagnostic uplift for this gene. Using an independent approach, naive Bayes classification of growth and developmental data produced reasonably discriminative models for the 24 DNM genes with sufficiently complete data. An unsupervised naive Bayes classification of 6,993 probands with WES data and sufficient phenotypic information defined 23 in silico syndromes (ISSs) and was used to test a “phenotype first” approach to the discovery of causative genotypes using WES variants strictly filtered on allele frequency, mutation consequence, and evidence of constraint in humans. This highlighted heterozygous de novo nonsynonymous variants in SPTBN2 as causative in three DDD probands. Key Words: developmental disease; genotype; naive Bayes; phenotype; tSNE.
18	Comparing regression, naive Bayes, and random forest methods in the prediction of individual survival to second lactation in Holstein cattle. J Dairy Sci 2019;102:9409-9421. [PMID: 31447154 DOI: 10.3168/jds.2019-16295] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 01/14/2019] [Accepted: 06/17/2019] [Indexed: 11/19/2022] Abstract: In this study, we compared multiple logistic regression, a linear method, to naive Bayes and random forest, 2 nonlinear machine-learning methods. We used all 3 methods to predict individual survival to second lactation in dairy heifers. The data set used for prediction contained 6,847 heifers born between January 2012 and June 2013, and had known survival outcomes. Each animal had 50 genomic estimated breeding values available at birth and up to 65 phenotypic variables that accumulated over time. Survival was predicted at 5 moments in life: at birth, at 18 mo, at first calving, at 6 wk after first calving, and at 200 d after first calving. The data sets were randomly split into 70% training and 30% testing sets to evaluate model performance for 20-fold validation. The methods were compared for accuracy, sensitivity, specificity, area under the curve (AUC) value, contrasts between groups for the prediction outcomes, and increase in surviving animals in a practical scenario. At birth and 18 mo, all methods had overlapping performance; no method significantly outperformed the other. At first calving, 6 wk after first calving, and 200 d after first calving, random forest and naive Bayes had overlapping performance, and both machine-learning methods outperformed multiple logistic regression. Overall, naive Bayes had the highest average AUC at all decision points up to 200 d after first calving. Random forest had the highest AUC at 200 d after first calving. All methods obtained similar increases in survival in the practical scenario. Despite this, the methods appeared to predict the survival of individual heifers differently. All methods improved over time, but the changes in mean model outcomes for surviving and non-surviving animals differed by method. Furthermore, the correlations of individual predictions between methods ranged from r = 0.417 to r = 0.700; the lowest correlations were at first calving for all methods. In short, all 3 methods were able to predict survival at a population level, because all methods improved survival in a practical scenario. However, predictions for individual animals differed considerably between methods. Key Words: machine learning; naive Bayes; phenotypic prediction; random forest; regression.
19	Learning-based classification of valence emotion from electroencephalography. Int J Neurosci 2019;129:1085-1093. [PMID: 31215829 DOI: 10.1080/00207454.2019.1634070] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022] Abstract: The neuroimaging research field has been revolutionized with the development of human cognitive functions without the use of brain pathways. To assist such systems, electroencephalography (EEG) based measures play an important role. In this study, the publicly available database of emotion analysis using physiological signals has been used to identify human emotions such as valence (positive/negative) from the recorded EEG signals. With the identification of such emotion, the feeling of goodness or badness that an individual experiences in a situation can be identified from his/her brain signals. Different machine learning classifiers, namely random forest, decision trees, K-nearest neighbor, support vector machines, naive Bayes and neural network, have been used to identify and evaluate such emotions. Previous work by other authors on the same dataset using various quantitative approaches is compared with the approaches used in this study, which yield higher accuracy rates with the random forest and decision tree. The effectiveness of each classifier in terms of statistical measures such as accuracy, F-score, etc. has been evaluated. The random forest classifier, with an accuracy of 98%, closely followed by the decision tree at 94%, were the most effective classifiers in classifying the valence emotions of the EEG data for 6 subjects. Key Words: K-nearest neighbor; electroencephalography; brain–computer interface; database of emotion analysis using physiological signals; decision trees; naive Bayes; neural network; random forest; support vector machines.
20	Evaluation of Machine Learning Algorithms for Surface Water Extraction in a Landsat 8 Scene of Nepal. Sensors 2019;19:2769. [PMID: 31226778 PMCID: PMC6631528 DOI: 10.3390/s19122769] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/31/2019] [Revised: 06/12/2019] [Accepted: 06/17/2019] [Indexed: 11/16/2022] Abstract: With over 6000 rivers and 5358 lakes, surface water is one of the most important resources in Nepal. However, the quantity and quality of Nepal's rivers and lakes are decreasing due to human activities and climate change. Despite the advancement of remote sensing technology and the availability of open access data and tools, monitoring and surface water extraction work has not been carried out in Nepal. Single or multiple water index methods have been applied in the extraction of surface water with satisfactory results. Extending our previous study, the authors evaluated six different machine learning algorithms: Naive Bayes (NB), recursive partitioning and regression trees (RPART), neural networks (NNET), support vector machines (SVM), random forest (RF), and gradient boosted machines (GBM) to extract surface water in Nepal. With three secondary bands, slope, NDVI and NDWI, the algorithms were evaluated for performance with the addition of extra information. As a result, all the applied machine learning algorithms, except NB and RPART, showed good performance. RF showed overall accuracy (OA) and kappa coefficient (Kappa) of 1 for all the multiband data with the reference dataset, followed by GBM, NNET, and SVM in metrics. The performances were better in the hilly regions and flat lands, but poorer in the Himalayas with ice, snow and shadows, and the addition of slope and NDWI showed improvement in the results. Adding a single secondary band is better than adding multiple bands in most algorithms except NNET. From current and previous studies, it is recommended to separate any study area with and without snow or low and high elevation, then apply machine learning algorithms in original Landsat data or with the addition of slopes or NDWI for better performance. Key Words: Landsat; Nepal; gradient boosted machines; machine learning; naive Bayes; neural networks; random forest; recursive partitioning and regression trees; support vector machines; surface water mapping.
21	High-dimensional prediction of binary outcomes in the presence of between-study heterogeneity. Stat Methods Med Res 2018;28:2848-2867. [PMID: 30051767 DOI: 10.1177/0962280218787544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Abstract: Many prediction methods have been proposed in the literature, but most of them ignore heterogeneity between populations. Either only data from a single study or population is available for model building and evaluation, or when data from multiple studies make up the training dataset, studies are pooled before model building. As a result, prediction models might perform less than expected when applied to new subjects from new study populations. We propose a linear method for building prediction models with high-dimensional data from multiple studies. Our method explicitly addresses between-population variability and tends to select predictors that are predictive in most of the study populations. We employ empirical Bayes estimators and hence avoid selection bias during the variable selection process. Simulation results demonstrate that the new method works better than other linear prediction methods that ignore the between-study variability. Our method is developed for classification into two groups. Key Words: Empirical Bayes; heterogeneity; high-dimensional data; multiple studies; naive Bayes.
22	Text Classification for Organizational Researchers: A Tutorial. ORGANIZATIONAL RESEARCH METHODS 2018;21:766-799. [PMID: 29881249 PMCID: PMC5975702 DOI: 10.1177/1094428117719322] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 11/16/2022] Abstract: Organizations are increasingly interested in classifying texts or parts thereof into categories, as this enables more effective use of their information. Manual procedures for text classification work well for up to a few hundred documents. However, when the number of documents is larger, manual procedures become laborious, time-consuming, and potentially unreliable. Techniques from text mining facilitate the automatic assignment of text strings to categories, making classification expedient, fast, and reliable, which creates potential for its application in organizational research. The purpose of this article is to familiarize organizational researchers with text mining techniques from machine learning and statistics. We describe the text classification process in several roughly sequential steps, namely training data preparation, preprocessing, transformation, application of classification techniques, and validation, and provide concrete recommendations at each step. To help researchers develop their own text classifiers, the R code associated with each step is presented in a tutorial. The tutorial draws from our own work on job vacancy mining. We end the article by discussing how researchers can validate a text classification model and the associated output. Key Words: naive Bayes; random forest; support vector machines; text classification; text mining.
23	Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier. Healthc Technol Lett 2015;2:101-7. [PMID: 26609414 DOI: 10.1049/htl.2015.0018] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/03/2015] [Revised: 06/22/2015] [Accepted: 06/22/2015] [Indexed: 11/19/2022] Abstract: In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%. Key Words: ACC signals; SVM classifiers; acceleration measurement; acceleration signals; accelerometers; biomedical measurement; body sensor networks; cumulant extraction; decision trees; fall event classification algorithm; feature extraction; fifth-order cumulants; fourth-order cumulants; hierarchical decision tree classifier; human activity classification; lowest false alarm rate; medical signal processing; multilayer perceptron; naive Bayes; optimal detection; second-order cumulants; signal classification; single waist-mounted triaxial accelerometer; support vector machines; supports vector machine; third-order cumulants; time-domain features; triaxial accelerometer-based fall event detection.
24	A classifier system for predicting RNA secondary structure. Int J Bioinform Res Appl 2014;10:307-20. [PMID: 24794072 DOI: 10.1504/ijbra.2014.060764] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022] Abstract: Finding the secondary structures of ribonucleic acid sequences is a very important task. The secondary structure helps determine their functionalities, which in turn plays a role in protein production. Manual laboratory methods use X-ray diffraction to predict secondary structures but it is complex, slow and expensive. Therefore, different computational approaches are used to predict RNA secondary structure in order to reduce the time and cost associated with the manual process. We propose a system called IsRNA to predict a single element, internal loop, of the RNA secondary structure. IsRNA experiments with different classifiers such as SVM, KNN, Naive Bayes and Simple K means to find the most accurate classifier. We present a thorough experimental evaluation of 24 features, classified into five groups, to determine the most relevant feature groups. The system is evaluated using Rfam sequences and achieves an overall sensitivity, selectivity, and accuracy of 96.1%, 98%, and 96.1%, respectively. Key Words: KNN; RNA secondary structure; RNA sequences; SVM; bioinformatics; classifiers; internal loop; k–nearest neighbour; naive Bayes; protein production; ribonucleic acid; simple K–means; support vector machine.
25	Morphology cluster and prediction of growth of human brain pyramidal neurons. Neural Regen Res 2012;7:36-40. [PMID: 25806056 PMCID: PMC4354113 DOI: 10.3969/j.issn.1673-5374.2012.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2011] [Accepted: 11/12/2011] [Indexed: 11/12/2022] Abstract: Predicting neuron growth is valuable to understand the morphology of neurons, thus it is helpful in the research of neuron classification. This study sought to propose a new method of predicting the growth of human neurons using 1,907 sets of data in human brain pyramidal neurons obtained from the website of NeuroMorpho.Org. First, we analyzed neurons in a morphology field and used an expectation-maximization algorithm to specify the neurons into six clusters. Second, a naive Bayes classifier was used to verify the accuracy of the expectation-maximization algorithm. Experiment results proved that the cluster groups here were efficient and feasible. Finally, a new method to rank the six expectation-maximization algorithm clustered classes was used in predicting the growth of human pyramidal neurons. Key Words: 10-fold cross validation; expectation-maximization; morphological cluster; naive Bayes; neural regeneration; neurons.