Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 50 (from Reference Citation Analysis)
Article PDFs: 13
Cited by > 0: 39
Searched Name: Naïve Bayes

Barry KA, Manzali Y, Flouchi R, Balouki Y, Chelhi K, Elfar M. Exploring the use of association rules in random forest for predicting heart disease. Comput Methods Biomech Biomed Engin 2024;27:338-346. [PMID: 36877167 DOI: 10.1080/10255842.2023.2185477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 02/07/2023] [Accepted: 02/16/2023] [Indexed: 03/07/2023]

Monteverde-Suárez D, González-Flores P, Santos-Solórzano R, García-Minjares M, Zavala-Sierra I, de la Luz VL, Sánchez-Mendiola M. Predicting students' academic progress and related attributes in first-year medical students: an analysis with artificial neural networks and Naïve Bayes. BMC Med Educ 2024;24:74. [PMID: 38243257 PMCID: PMC10799512 DOI: 10.1186/s12909-023-04918-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 11/30/2023] [Indexed: 01/21/2024]

Abstract

BACKGROUND

Dropout and poor academic performance are persistent problems in medical schools in emerging economies. Identifying at-risk students early and knowing the factors that contribute to their success would be useful for designing educational interventions. Educational Data Mining (EDM) methods can identify students at risk of poor academic progress and dropping out. The main goal of this study was to use machine learning models, Artificial Neural Networks (ANN) and Naïve Bayes (NB), to identify first year medical students that succeed academically, using sociodemographic data and academic history.

METHODS

Data from seven cohorts (2011 to 2017) of students admitted to the National Autonomous University of Mexico (UNAM) Faculty of Medicine in Mexico City were analysed, covering 7,976 students in total. Information from admission diagnostic exam results, academic history, sociodemographic characteristics and family environment was used. The main dataset included 48 variables. The study followed the general knowledge discovery process: pre-processing, data analysis, and validation. Artificial Neural Networks (ANN) and Naïve Bayes (NB) models were used for data mining analysis.

RESULTS

ANN models had slightly better performance in accuracy, sensitivity, and specificity. Both models had better sensitivity when classifying regular students and better specificity when classifying irregular students. Of the 25 variables with highest predictive value in the Naïve Bayes model, percentage of correct answers in the diagnostic exam was the best variable.

CONCLUSIONS

Both ANN and Naïve Bayes methods can be useful for predicting medical students' academic achievement in an undergraduate program, based on information about their prior knowledge and socio-demographic factors. Although ANN offered slightly superior results, Naïve Bayes made it possible to obtain an in-depth analysis of how the different variables influenced the model. The use of educational data mining and machine learning classification techniques has potential in medical education.
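The interpretability point made in this abstract can be illustrated with a minimal sketch of a categorical Naïve Bayes classifier. This is not the authors' model: the two features, their values, the labels and the Laplace smoothing parameter `alpha` are all assumptions chosen for illustration. The per-feature log-likelihood ratio returned by `feature_contributions` shows how each variable pushes a prediction towards one class or the other.

```python
import math
from collections import Counter

def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    classes = sorted(set(labels))
    n_features = len(rows[0])
    # value_counts[c][j][v] = number of class-c rows whose feature j equals v
    value_counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[y][j][v] += 1
            values[j].add(v)
    return {"classes": classes, "class_counts": Counter(labels),
            "value_counts": value_counts, "values": values,
            "alpha": alpha, "n": len(labels)}

def log_likelihood(model, c, j, v):
    """Smoothed log P(feature j = v | class c)."""
    a = model["alpha"]
    num = model["value_counts"][c][j][v] + a
    den = model["class_counts"][c] + a * len(model["values"][j])
    return math.log(num / den)

def predict(model, row):
    """Return the class with the highest log prior plus summed log-likelihoods."""
    def score(c):
        prior = math.log(model["class_counts"][c] / model["n"])
        return prior + sum(log_likelihood(model, c, j, v)
                           for j, v in enumerate(row))
    return max(model["classes"], key=score)

def feature_contributions(model, row, pos, neg):
    """Per-feature log-likelihood ratio: > 0 favours `pos`, < 0 favours `neg`."""
    return [log_likelihood(model, pos, j, v) - log_likelihood(model, neg, j, v)
            for j, v in enumerate(row)]

# Hypothetical toy data: (diagnostic exam band, prior grade band).
rows = [("high", "high"), ("high", "low"), ("high", "high"),
        ("low", "low"), ("low", "high"), ("low", "low")]
labels = ["regular", "regular", "regular",
          "irregular", "irregular", "irregular"]
model = train_nb(rows, labels)
student = ("high", "low")
prediction = predict(model, student)
contributions = feature_contributions(model, student, "regular", "irregular")
```

In this toy example the exam band contributes a positive log-ratio and the grade band a negative one, which is the kind of variable-by-variable reading that a Naïve Bayes model allows.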

Kochetkova T, Hanke MS, Indermaur M, Groetsch A, Remund S, Neuenschwander B, Michler J, Siebenrock KA, Zysset P, Schwiedrzik J. Composition and micromechanical properties of the femoral neck compact bone in relation to patient age, sex and hip fracture occurrence. Bone 2023;177:116920. [PMID: 37769956 DOI: 10.1016/j.bone.2023.116920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 09/22/2023] [Accepted: 09/25/2023] [Indexed: 10/03/2023]

Abstract

Current clinical methods of bone health assessment depend to a great extent on bone mineral density (BMD) measurements. However, these methods only act as a proxy for bone strength and are often only carried out after the fracture occurs. Besides BMD, composition and tissue-level mechanical properties are expected to affect the whole bone's strength and toughness. While the elastic properties of the bone extracellular matrix (ECM) have been extensively investigated over the past two decades, there is still limited knowledge of the yield properties and their relationship to composition and architecture. In the present study, morphological, compositional and micropillar compression bone data was collected from patients who underwent hip arthroplasty. Femoral neck samples from 42 patients were collected together with anonymous clinical information about age, sex and primary diagnosis (coxarthrosis or hip fracture). The femoral neck cortex from the inferomedial region was analyzed in a site-matched manner using a combination of micromechanical testing (nanoindentation, micropillar compression) together with micro-CT and quantitative polarized Raman spectroscopy for both morphological and compositional characterization. Mechanical properties, as well as the sample-level mineral density, were constant over age. Only compositional properties demonstrate weak dependence on patient age: decreasing mineral to matrix ratio (p = 0.02, R2 = 0.13, 2.6 % per decade) and increasing amide I sub-peak ratio I∼1660/I∼1683 (p = 0.04, R2 = 0.11, 1.5 % per decade). The patient's sex and diagnosis did not seem to influence investigated bone properties. A clear zonal dependence between interstitial and osteonal cortical zones was observed for compositional and elastic bone properties (p < 0.0001). Site-matched microscale analysis confirmed that all investigated mechanical properties except yield strain demonstrate a positive correlation with the mineral fraction of bone. 
The output database is the first to integrate the experimentally assessed microscale yield properties, local tissue composition and morphology with the available patient clinical information. The final dataset was used for in-silico bone fracture risk prediction through principal component analysis and the Naïve Bayes classification algorithm. The analysis showed that the mineral to matrix ratio, indentation hardness and micropillar yield stress are the most relevant parameters for bone fracture risk prediction at 70 % model accuracy (0.71 AUC). Due to the low number of samples, further studies with a higher number of patients (N > 200) are anticipated in order to build a universal fracture prediction algorithm. The proposed classification algorithm, together with the output dataset of bone tissue properties, can be used for future comparison of existing methods to evaluate bone quality, as well as to form a better understanding of the mechanisms through which bone tissue is affected by aging or disease.

Zhang Q, Zhao HM, Yang K, Chen J, Yang RQ, Wang C. Construction of an Analysis Model of mRNA Markers in Menstrual Blood Based on Naïve Bayes and Multivariate Logistic Regression Methods. Fa Yi Xue Za Zhi 2023;39:447-451. [PMID: 38006263 DOI: 10.12116/j.issn.1004-5619.2021.511207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2023]

Varga G, Stoicu-Tivadar L, Nicola S. Comparison of Data Classification Results in Serious Gaming for Rehabilitation of Rheumatoid Arthritis. Stud Health Technol Inform 2023;309:63-67. [PMID: 37869807 DOI: 10.3233/shti230740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2023]

Costantini G, Cesarini V, Brenna E. High-Level CNN and Machine Learning Methods for Speaker Recognition. Sensors (Basel) 2023;23:3461. [PMID: 37050521 PMCID: PMC10098737 DOI: 10.3390/s23073461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 03/20/2023] [Accepted: 03/22/2023] [Indexed: 06/19/2023]

Barman U, Pathak C, Mazumder NK. Comparative assessment of Pest damage identification of coconut plant using damage texture and color analysis. Multimed Tools Appl 2023;82:1-23. [PMID: 36712953 PMCID: PMC9874181 DOI: 10.1007/s11042-023-14369-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Revised: 06/27/2022] [Accepted: 01/02/2023] [Indexed: 06/18/2023]

Albataineh Z, Aldrweesh F, Alzubaidi MA. COVID-19 CT-images diagnosis and severity assessment using machine learning algorithm. Cluster Comput 2023:1-16. [PMID: 36712413 PMCID: PMC9871425 DOI: 10.1007/s10586-023-03972-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 11/20/2022] [Accepted: 11/26/2022] [Indexed: 06/18/2023]

Sebro R, la Garza-Ramos CD. Utilizing machine learning for opportunistic screening for low BMD using CT scans of the cervical spine. J Neuroradiol 2022;50:293-301. [PMID: 36030924 DOI: 10.1016/j.neurad.2022.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/22/2022] [Revised: 08/22/2022] [Accepted: 08/24/2022] [Indexed: 11/28/2022]

Abstract

BACKGROUND

Computed Tomography (CT) scans of the cervical spine are often performed to evaluate patients for trauma and degenerative changes of the cervical spine. We hypothesized that the CT attenuation of the cervical vertebrae can be used to identify patients who should be screened for osteoporosis.

METHODS

Retrospective study of 253 patients (177 training/validation and 76 test) with unenhanced CT scans of the cervical spine and DXA studies within 12 months of each other. Volumetric segmentation of C1-T1, clivus, and first ribs was performed to obtain the CT attenuation of each bone. The correlations of the CT attenuations between the bones and with DXA measurements were evaluated. Univariate receiver operator characteristic (ROC) analyses, and multivariate classifiers (Random Forest (RF), XGBoost, Naïve Bayes (NB), and Support Vector Machines (SVM)) analyzing the CT attenuation of all bones, were utilized to predict patients with osteopenia/osteoporosis and femoral neck bone mineral density (BMD) T-scores <-1.

RESULTS

There were positive correlations between the CT attenuation of each bone, and with the DXA measurements. A CT attenuation threshold of 305.2 Hounsfield Units (HU) at C3 had the highest accuracy =0.763 (AUC=0.814) to detect femoral neck BMD T-scores ≤-1 and a CT attenuation threshold of 323.6 HU at C3 had the highest accuracy=0.774 (AUC=0.843) to detect osteopenia/osteoporosis. The SVM classifier (AUC=0.756) had higher AUC than the RF (AUC=0.692, P=0.224), XGBoost (AUC=0.736; P=0.814), NB (AUC=0.622, P=0.133) and CT threshold of 305.2 HU at C3 (AUC=0.704, P=0.531) classifiers to identify patients with femoral neck BMD T-scores <-1. The SVM classifier (accuracy=0.816) was more accurate than using the CT threshold of 305.2 HU at C3 (accuracy=0.671) (McNemar's χ₁²=7.55, P=0.006).

CONCLUSION

Opportunistic screening for low BMD can be done using cervical spine CT scans. A SVM classifier was more accurate than using the CT threshold of 305.2 HU at C3.
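The McNemar comparison reported in this abstract can be sketched as follows. The abstract gives only the statistic and its p-value, so the discordant counts `b` and `c` used in the example are hypothetical, and whether the authors applied a continuity correction is not stated. The p-value helper relies on the identity that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math

def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def mcnemar(b, c, continuity=False):
    """McNemar statistic and p-value from the discordant counts of paired predictions.

    b: cases classifier A got right and classifier B got wrong.
    c: cases classifier B got right and classifier A got wrong.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    diff = abs(b - c) - (1.0 if continuity else 0.0)
    diff = max(diff, 0.0)  # the correction must not push the difference below 0
    stat = diff ** 2 / (b + c)
    return stat, chi2_df1_pvalue(stat)

# Consistency check of the reported pair (statistic 7.55, P=0.006).
p_reported = chi2_df1_pvalue(7.55)

# Hypothetical discordant counts, for illustration only.
stat, p = mcnemar(15, 5)
```

Evaluating `chi2_df1_pvalue(7.55)` gives a value close to 0.006, which agrees with the p-value quoted in the abstract.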

Rabie AH, Mansour NA, Saleh AI, Takieldeen AE. Expecting individuals' body reaction to Covid-19 based on statistical Naïve Bayes technique. Pattern Recognit 2022;128:108693. [PMID: 35400761 PMCID: PMC8983097 DOI: 10.1016/j.patcog.2022.108693] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 02/01/2022] [Accepted: 04/03/2022] [Indexed: 06/14/2023]

Abstract

Covid-19 is a strange, unpredictable, mutated virus. It has baffled many scientists, as no firm rule has yet been found to predict the effect the virus will have on people who become infected. Recently, many studies have addressed the diagnosis of Covid-19; however, none of them attempts to predict, before infection actually takes place, how the virus would affect a person's body. Predicting how severely people would be affected if infected allows drastic precautions to be taken for those likely to suffer serious complications, while allowing some freedom for those not expected to be badly affected. This paper introduces the Covid-19 Prudential Expectation Strategy (CPES), a new strategy for predicting how a person's body would respond to infection with Covid-19. CPES consists of three phases: the Outlier Rejection Phase (ORP), the Feature Selection Phase (FSP), and the Classification Phase (CP). To enhance classification accuracy in CP, CPES employs two proposed techniques: the Hybrid Outlier Rejection (HOR) method for outlier rejection in ORP and the Improved Binary Genetic Algorithm (IBGA) method for feature selection in FSP. In ORP, HOR rejects outliers in the training data using a hybrid method that combines standard deviation and the Binary Gray Wolf Optimization (BGWO) method. In FSP, IBGA is a hybrid method that selects the most useful features for the prediction process. IBGA includes the Fisher Score (F_Score) as a filter method to select features quickly and BGA as a wrapper method to select features accurately, using the average accuracy of several classification models as a fitness function so that the selected feature subset remains effective with any classifier. 
In CP, CPES classifies people based on their bodies' reaction to Covid-19 infection using a proposed Statistical Naïve Bayes (SNB) classifier, applied after the previous two phases. CPES has been compared against recent related strategies in terms of accuracy, error, recall, precision, and run-time using a Covid-19 dataset [1]. This dataset contains routine blood tests collected from people before and after their infection with Covid-19 through a Web-based form created by us. In the experimental results, CPES outperforms the competing methods, with values of 0.87, 0.13, 0.84, and 0.79 for accuracy, error, precision, and recall, respectively.
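The four reported measures are related through the confusion matrix of a binary classifier, which the following minimal sketch makes explicit. The counts in the example are hypothetical and are not taken from the paper; the function assumes at least one predicted positive and one actual positive so that no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, error, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,       # error is the complement of accuracy
        "precision": tp / (tp + fp),   # share of predicted positives that are correct
        "recall": tp / (tp + fn),      # share of actual positives that are found
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```

Because error is defined as one minus accuracy, the reported accuracy of 0.87 and error of 0.13 are two views of the same quantity.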

Abu El-Magd SA, Maged A, Farhat HI. Hybrid-based Bayesian algorithm and hydrologic indices for flash flood vulnerability assessment in coastal regions: machine learning, risk prediction, and environmental impact. Environ Sci Pollut Res Int 2022;29:57345-57356. [PMID: 35352224 PMCID: PMC9395492 DOI: 10.1007/s11356-022-19903-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/23/2021] [Accepted: 03/21/2022] [Indexed: 05/29/2023]

Kalezhi J, Chibuluma M, Chembe C, Chama V, Lungo F, Kunda D. Modelling Covid-19 infections in Zambia using data mining techniques. Results Eng 2022;13:100363. [PMID: 35317385 PMCID: PMC8813672 DOI: 10.1016/j.rineng.2022.100363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Revised: 01/08/2022] [Accepted: 02/01/2022] [Indexed: 06/14/2023]

Tiwari D, Bhati BS, Al‐Turjman F, Nagpal B. Pandemic coronavirus disease (Covid-19): World effects analysis and prediction using machine-learning techniques. Expert Syst 2022;39:e12714. [PMID: 34177035 PMCID: PMC8209956 DOI: 10.1111/exsy.12714] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/05/2021] [Accepted: 04/26/2021] [Indexed: 05/09/2023]

Abstract

Pandemic novel Coronavirus (Covid-19) is an infectious disease that spreads primarily through droplets of nasal discharge when sneezing and saliva when coughing, and was first reported in Wuhan, China in December 2019. Covid-19 became a global pandemic with a harmful impact on the world. Many predictive models of Covid-19 are being proposed by academic researchers around the world to support key decisions and enforce appropriate control measures. Owing to uncertainty and the lack of accurate Covid-19 records, standard techniques fail to correctly predict the global effects of the epidemic. To address this issue, we present an Artificial Intelligence (AI)-based meta-analysis to predict the trend of the Covid-19 epidemic worldwide. Three machine learning algorithms, namely Naïve Bayes, Support Vector Machine (SVM) and Linear Regression, were applied to a real time-series dataset holding the global record of confirmed, recovered, death and active cases of the Covid-19 outbreak. Statistical analysis was also conducted to present various facts regarding observed Covid-19 symptoms, a list of the Top-20 Coronavirus-affected countries and the number of coactive cases worldwide. Among the three machine learning techniques investigated, Naïve Bayes produced promising results in predicting future Covid-19 trends, with lower Mean Absolute Error (MAE) and Mean Squared Error (MSE). The lower MAE and MSE values indicate the effectiveness of the Naïve Bayes regression technique, although the global footprint of this pandemic is still uncertain. This study demonstrates the various trends and future growth of the global pandemic to support a proactive response from citizens and governments. This paper sets an initial benchmark demonstrating the capability of machine learning for outbreak prediction.
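The two error measures named in this abstract can be computed as in the sketch below. The case counts and forecasts are hypothetical values chosen for illustration, not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average of (actual - predicted)^2; penalises large misses more than MAE."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily case counts and forecasts, for illustration only.
observed = [100, 120, 150]
forecast = [110, 115, 140]
mae = mean_absolute_error(observed, forecast)
mse = mean_squared_error(observed, forecast)
```

A model with lower MAE and MSE on held-out data tracks the observed series more closely, which is the sense in which the abstract compares the three techniques.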

Abd-Elsalam SM, Ezz MM, Gamalel-Din S, Esmat G, Elakel W, ElHefnawi M. Derivation of "Egyptian varices prediction (EVP) index": A novel noninvasive index for diagnosing esophageal varices in HCV Patients. J Adv Res 2022;35:87-97. [PMID: 35024195 PMCID: PMC8721354 DOI: 10.1016/j.jare.2021.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Revised: 02/06/2021] [Accepted: 02/17/2021] [Indexed: 02/07/2023]

Abstract

• Esophageal Varices are a complication of chronic liver disease that causes deaths globally due to hemorrhage.
• Predicting the presence of Esophageal Varices is essential to prevent bleeding in patients.
• Currently the only diagnostic method for Esophageal Varices is upper gastrointestinal endoscopy, which has many disadvantages.
• Ten variables are the most significant for diagnosing varices: PLT, Stiffness, PC, liver texture, spleen, HCV-RNA, Albumin, gender, Total bilirubin, and PV diameter.
• We evaluated the effectiveness of several noninvasive markers for predicting varices.
• We introduced a novel EVP index with acceptable performance for diagnosing varices; compared with existing indices, it could reduce upper endoscopy procedures by nearly 46.5%.

Introduction

Esophageal Varices (EVs) is one of the major dangerous complications of liver fibrosis. Upper Gastrointestinal (UGI) Endoscopy is necessary for its diagnosis. Repeated examinations for EVs screening severely burden endoscopic units in terms of cost and other side implications; moreover, the lack of public health resources in rural areas and primary hospitals should be considered, particularly in developing countries. So, an accurate noninvasive marker for EV is highly needed for liver disease patients.

Objectives

This study sought to evaluate several indices to determine how adequate they are in predicting EVs, and to build a novel, accurate prediction index.

Methods

Five thousand and thirteen patients were enrolled. Laboratory tests, abdominal ultrasonography, liver stiffness measurement using Fibro-scan, and UGI endoscopy were performed. Ten common indices were calculated: Fib-4 score, AST-to-platelet ratio index, Fibrosis index, AST/ALT ratio, Varices Prediction Rule, Baveno VI, APRI-Fib4 Combo, King score, “Model for End-Stage Liver Disease”, and Lok Score. The significant predictors for EVs were identified using the “P-value Correlation-based Filter Selection” method, and a novel Egyptian Varices Prediction (EVP) index was developed using binary logistic regression. The diagnostic performance was evaluated using several parameters and the Area Under Curve (AUC).

Results

The EVP index had a correlation of 0.5 with EVs; at a cutoff point of 0.423 it achieved higher performance (AUC 0.788, accuracy 73.3%, and sensitivity 78%) than the other indices.

Conclusion

The EVP index was a good noninvasive predictor. It had acceptable performance for diagnosing EVs and required only routine laboratory tests and imaging data. It can provide a tool for classifying or prioritizing patients for selective pre-emptive endoscopy according to degree of severity. It will also enable clinicians to concentrate on one marker instead of a wide set of parameters.

Alshammari MM, Almuhanna A, Alhiyafi J. Mammography Image-Based Diagnosis of Breast Cancer Using Machine Learning: A Pilot Study. Sensors (Basel) 2021;22:203. [PMID: 35009746 DOI: 10.3390/s22010203] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/06/2021] [Revised: 12/22/2021] [Accepted: 12/24/2021] [Indexed: 02/08/2023]

Das S, Amin SA, Jha T. Insight into the structural requirement of aryl sulphonamide based gelatinases (MMP-2 and MMP-9) inhibitors - Part I: 2D-QSAR, 3D-QSAR topomer CoMFA and Naïve Bayes studies - First report of 3D-QSAR Topomer CoMFA analysis for MMP-9 inhibitors and jointly inhibitors of gelatinases together. SAR QSAR Environ Res 2021;32:655-687. [PMID: 34355614 DOI: 10.1080/1062936x.2021.1955414] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/01/2021] [Accepted: 07/11/2021] [Indexed: 06/13/2023]

Jha AN, Chatterjee N, Tiwari G. A performance analysis of prediction techniques for impacting vehicles in hit-and-run road accidents. Accid Anal Prev 2021;157:106164. [PMID: 33957476 DOI: 10.1016/j.aap.2021.106164] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/07/2019] [Revised: 01/12/2021] [Accepted: 04/27/2021] [Indexed: 06/12/2023]

Abstract

Road accidents are a globally recognised challenge. They are one of the significant causes of deaths and injuries, besides other direct and indirect losses. Countries and international organizations have designed technologies, systems, and policies to prevent accidents. However, hit-and-run accidents remain one of the most dangerous types of road accidents, as information about the vehicle responsible for the accident remains unknown. Therefore, any mechanism that can provide information about the impacting vehicle in hit-and-run accidents will be useful in planning and executing preventive measures. Since several models exist to predict the unknown impacting vehicle, it becomes important to find which is the most accurate of those available. This research applies a process-based approach that identifies the most accurate model out of six supervised learning classification models, viz. Logistic Regression, Linear Discriminant Analysis, Naïve Bayes, Classification and Regression Trees, k-Nearest Neighbor and Support Vector Machine. These models are implemented using five-fold and ten-fold cross validation on road accident data collected from five mid-sized Indian cities: Agra, Amritsar, Bhopal, Ludhiana, and Vizag (Vishakhapatnam). This study investigates the possible input factors that may affect the performance of the applied models. Based on the results of the experiment conducted in this study, Support Vector Machine was found to have the greatest potential to predict the unknown impacting vehicle type in hit-and-run accidents for all the cities except Amritsar, for which Classification and Regression Trees had the highest accuracy. Naïve Bayes performed very poorly for the five cities. These recommendations will help in predicting unknown impacting vehicles in hit-and-run accidents. The outcome is useful for transportation authorities and policymakers to implement effective road safety measures for the safety of road users.
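The five-fold and ten-fold cross-validation mentioned in this abstract can be sketched as below. This is a generic illustration rather than the authors' pipeline: the function names, the seed and the majority-class baseline are assumptions, and any of the six classifiers could be supplied through the `fit` and `predict` callables.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n items."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # shuffle once so folds are not ordered
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cross_val_accuracy(rows, labels, k, fit, predict, seed=0):
    """Mean held-out accuracy of a classifier supplied as fit/predict callables."""
    scores = []
    for train, test in k_fold_indices(len(rows), k, seed):
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# Hypothetical baseline: always predict the most common training label.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model

splits = list(k_fold_indices(10, 5))
```

Each record is used for testing exactly once, so the averaged accuracy depends less on a single train/test split than a one-off hold-out estimate would.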

Savic N, Bovio N, Gilbert F, Paz J, Guseva Canu I. Procode: A Machine-Learning Tool to Support (Re-)coding of Free-Texts of Occupations and Industries. Ann Work Expo Health 2021;66:113-118. [PMID: 34145882 DOI: 10.1093/annweh/wxab037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/08/2021] [Revised: 03/30/2021] [Accepted: 05/07/2021] [Indexed: 11/13/2022]

Chatterjee A, Roy S, Das S. A Bi-fold Approach to Detect and Classify COVID-19 X-Ray Images and Symptom Auditor. SN Comput Sci 2021;2:304. [PMID: 34075356 PMCID: PMC8160081 DOI: 10.1007/s42979-021-00701-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2021] [Accepted: 05/12/2021] [Indexed: 11/17/2022]

Bosc N, Felix E, Arcila R, Mendez D, Saunders MR, Green DVS, Ochoada J, Shelat AA, Martin EJ, Iyer P, Engkvist O, Verras A, Duffy J, Burrows J, Gardner JMF, Leach AR. MAIP: a web service for predicting blood-stage malaria inhibitors. J Cheminform 2021;13:13. [PMID: 33618772 PMCID: PMC7898753 DOI: 10.1186/s13321-021-00487-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/13/2020] [Accepted: 01/20/2021] [Indexed: 12/17/2022]

Affiliation(s)

Nicolas Bosc European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom.
Eloy Felix European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom
Ricardo Arcila European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom
David Mendez European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom
Martin R Saunders Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Hertfordshire, SG1 2NY, Stevenage, UK
Darren V S Green Department of Molecular Design, Data and Computational Sciences, GlaxoSmithKline, Gunnels Wood Road, Hertfordshire, SG1 2NY, Stevenage, UK
Jason Ochoada Department of Chemical Biology and Therapeutics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Tennessee, 38105, Memphis, USA
Anang A Shelat Department of Chemical Biology and Therapeutics, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Tennessee, 38105, Memphis, USA
Eric J Martin Novartis Institute for Biomedical Research, 5300 Chiron Way, California, 94608- 2916, Emeryville, USA
Preeti Iyer Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
Ola Engkvist Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
Andreas Verras Schrodinger Inc, 120 West 45th Street, 10036-4041, New York, NY, USA
James Duffy Medicines for Malaria Ventures Discovery, 1215, Geneva, Switzerland
Jeremy Burrows Medicines for Malaria Ventures Discovery, 1215, Geneva, Switzerland
J Mark F Gardner AMG Consultants Ltd, Discovery Park House, Discovery Park, Ramsgate Road, CT13 9ND, Sandwich, Kent, UK
Andrew R Leach European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridge, United Kingdom.

Jain N, Jhunthra S, Garg H, Gupta V, Mohan S, Ahmadian A, Salahshour S, Ferrara M. Prediction modelling of COVID using machine learning methods from B-cell dataset. Results Phys 2021;21:103813. [PMID: 33495725 PMCID: PMC7816944 DOI: 10.1016/j.rinp.2021.103813] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/09/2020] [Revised: 12/25/2020] [Accepted: 12/30/2020] [Indexed: 05/03/2023]

Humayun F, Khan F, Fawad N, Shamas S, Fazal S, Khan A, Ali A, Farhan A, Wei DQ. Computational Method for Classification of Avian Influenza A Virus Using DNA Sequence Information and Physicochemical Properties. Front Genet 2021;12:599321. [PMID: 33584824 PMCID: PMC7877484 DOI: 10.3389/fgene.2021.599321] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 01/04/2021] [Indexed: 11/30/2022]

Iqbal N, Sang J, Chen J, Xia X. Measuring Software Maintainability with Naïve Bayes Classifier. Entropy (Basel) 2021;23:e23020136. [PMID: 33499278 PMCID: PMC7910974 DOI: 10.3390/e23020136] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/16/2020] [Revised: 01/17/2021] [Accepted: 01/19/2021] [Indexed: 11/16/2022]

Lakretz Y, Ossmy O, Friedmann N, Mukamel R, Fried I. Single-cell activity in human STG during perception of phonemes is organized according to manner of articulation. Neuroimage 2021;226:117499. [PMID: 33186717 DOI: 10.1016/j.neuroimage.2020.117499] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/18/2020] [Revised: 09/29/2020] [Accepted: 10/21/2020] [Indexed: 11/23/2022]

Tavazzi E, Daberdaku S, Vasta R, Calvo A, Chiò A, Di Camillo B. Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach. BMC Med Inform Decis Mak 2020;20:174. [PMID: 32819346 PMCID: PMC7439551 DOI: 10.1186/s12911-020-01166-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/27/2020] [Accepted: 06/24/2020] [Indexed: 11/12/2022]

Abstract

Background

Clinical registers constitute an invaluable resource in the medical data-driven decision making context. Accurate machine learning and data mining approaches on these data can lead to faster diagnosis, definition of tailored interventions, and improved outcome prediction. A typical issue when implementing such approaches is the almost unavoidable presence of missing values in the collected data. In this work, we propose an imputation algorithm based on a mutual information-weighted k-nearest neighbours approach, able to handle the simultaneous presence of missing information in different types of variables. We developed and validated the method on a clinical register, constituted by the information collected over subsequent screening visits of a cohort of patients affected by amyotrophic lateral sclerosis.

Methods

For each subject with missing data to be imputed, we create a feature vector constituted by the information collected over his/her first three months of visits. This vector is used as the sample in a k-nearest neighbours procedure in order to select, among the other patients, those with the most similar temporal evolution of the disease. An ad hoc similarity metric was implemented for the sample comparison, capable of handling the different nature of the data and the presence of multiple missing values, and of including the cross-information among features captured by the mutual information statistic.

Results

We validated the proposed imputation method on an independent test set, where it performed better than three state-of-the-art competitors. We further assessed the validity of our algorithm by comparing the performance of a survival classifier built on the data imputed with our method versus the one built on the data imputed with the best-performing competitor.

Conclusions

Imputation of missing data is a crucial, and often mandatory, step when working with real-world datasets. The algorithm proposed in this work could effectively impute an amyotrophic lateral sclerosis clinical dataset, by handling the temporal and the mixed-type nature of the data and by exploiting the cross-information among features. We also showed how the imputation quality can affect a machine learning task.
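
The mutual-information-weighted nearest-neighbour idea described in the Methods of this abstract can be illustrated with a small Python sketch. This is not the authors' algorithm: it assumes purely categorical features coded as integers, ignores the temporal dimension, and uses a simple weighted mismatch rate as the distance. The names `mutual_information`, `weighted_distance` and `impute` are hypothetical helpers introduced only for illustration.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence

Row = Sequence[Optional[int]]  # categorical codes; None marks a missing value


def mutual_information(xs: Row, ys: Row) -> float:
    """Mutual information (in nats) of two discrete variables, using jointly observed pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # sum over observed cells of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def weighted_distance(a: Row, b: Row, weights: Sequence[float]) -> float:
    """Weighted mismatch rate over features observed in both rows; infinite if none overlap."""
    num = den = 0.0
    for va, vb, w in zip(a, b, weights):
        if va is None or vb is None or w == 0:
            continue  # skip missing values and uninformative features
        num += w * (va != vb)
        den += w
    return num / den if den > 0 else math.inf


def impute(data: List[Row], row: int, col: int, k: int = 3) -> Optional[int]:
    """Return the most common value of column `col` among the k nearest donor rows."""
    columns = list(zip(*data))
    # Weight each feature by its mutual information with the column being imputed.
    weights = [0.0 if j == col else mutual_information(columns[j], columns[col])
               for j in range(len(columns))]
    donors = [(weighted_distance(data[row], other, weights), i)
              for i, other in enumerate(data)
              if i != row and other[col] is not None]
    donors = sorted(d for d in donors if d[0] < math.inf)
    nearest = [data[i][col] for _, i in donors[:k]]
    return Counter(nearest).most_common(1)[0][0] if nearest else None


# Toy data (made up): feature 0 determines the last column, feature 1 is noise.
toy = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, None]]
print(impute(toy, row=4, col=2, k=2))  # -> 0
```

In the toy data the noise feature receives zero weight, so only the informative feature decides which rows count as neighbours.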

Yang C, Yang J, Liu Y, Geng X. Cancer Risk Analysis Based on Improved Probabilistic Neural Network. Front Comput Neurosci 2020;14:58. [PMID: 32792930 PMCID: PMC7385247 DOI: 10.3389/fncom.2020.00058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/17/2020] [Accepted: 05/22/2020] [Indexed: 01/06/2023]

Saccà V, Sarica A, Novellino F, Barone S, Tallarico T, Filippelli E, Granata A, Chiriaco C, Bruno Bossio R, Valentino P, Quattrone A. Evaluation of machine learning algorithms performance for the prediction of early multiple sclerosis from resting-state FMRI connectivity data. Brain Imaging Behav 2020;13:1103-1114. [PMID: 29992392 DOI: 10.1007/s11682-018-9926-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 02/06/2023]

Maniruzzaman M, Rahman MJ, Ahammed B, Abedin MM. Classification and prediction of diabetes disease using machine learning paradigm. Health Inf Sci Syst 2020;8:7. [PMID: 31949894 DOI: 10.1007/s13755-019-0095-z] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 08/21/2019] [Accepted: 12/21/2019] [Indexed: 12/19/2022]

Chen W, Tsangaratos P, Ilia I, Duan Z, Chen X. Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods. Sci Total Environ 2019;684:31-49. [PMID: 31150874 DOI: 10.1016/j.scitotenv.2019.05.312] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/01/2019] [Revised: 05/08/2019] [Accepted: 05/20/2019] [Indexed: 06/09/2023]

Abstract

Water scarcity in many regions of the world has become an unpleasant reality. Groundwater appears to be one of the main natural resources capable of reversing this situation. Uncovering the spatial patterns of groundwater occurrence is a crucial factor that could assist in carrying out successful water resources management projects. The main objective of the current study was to provide a novel methodological approach that used a Genetic Algorithm (GA) to perform feature selection and data mining methods to generate a groundwater spring potential map. Three data mining methods, Naïve Bayes (NB), Support Vector Machine (SVM) and Random Forest (RF), were used to construct a groundwater spring potential map that had over 0.81 probability of occurrence for Wuqi County, Shaanxi Province, China. Groundwater spring locations and sixteen related variables were analyzed, namely: lithology, soil cover, land use cover, normalized difference vegetation index (NDVI), elevation, slope angle, aspect, planform curvature, profile curvature, curvature, stream power index (SPI), stream transport index (STI), topographic wetness index (TWI), mean annual rainfall, distance from river network and distance from road network. The frequency ratio method was used to weight the variables, whereas a multi-collinearity analysis was performed to identify the relation between the parameters and to decide about their usage. The optimal set of parameters, determined by the GA, reduced the number of parameters to twelve by removing planform curvature, profile curvature, curvature and STI. The Receiver Operating Characteristic curve and the area under the curve (AUROC) were estimated so as to evaluate the predictive power of each model. The results indicated that the optimized models were superior in accuracy to the original models. The optimized RF model produced the best results (0.9572), followed by the optimized SVM (0.9529) and the optimized NB (0.8235).
Overall, the current study highlights the necessity of applying feature selection techniques in groundwater spring assessments and shows that data mining methods may be a highly powerful investigation approach for groundwater spring potential mapping.
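
The AUROC used in this abstract to compare models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The Python sketch below computes it from that pairwise definition. The function name `auroc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = spring present, 0 = spring absent.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # -> 0.75
```

The double loop is quadratic in the number of samples, which is fine for a sketch; rank-based formulas give the same value more efficiently on large datasets.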

Dada EG, Bassi JS, Chiroma H, Abdulhamid SM, Adetunmbi AO, Ajibuwa OE. Machine learning for email spam filtering: review, approaches and open research problems. Heliyon 2019;5:e01802. [PMID: 31211254 PMCID: PMC6562150 DOI: 10.1016/j.heliyon.2019.e01802] [Citation(s) in RCA: 135] [Impact Index Per Article: 27.0] [Received: 09/03/2018] [Revised: 02/25/2019] [Accepted: 05/20/2019] [Indexed: 11/18/2022]

He Q, Shahabi H, Shirzadi A, Li S, Chen W, Wang N, Chai H, Bian H, Ma J, Chen Y, Wang X, Chapi K, Ahmad BB. Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci Total Environ 2019;663:1-15. [PMID: 30708212 DOI: 10.1016/j.scitotenv.2019.01.329] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 10/05/2018] [Revised: 01/06/2019] [Accepted: 01/25/2019] [Indexed: 06/09/2023]

Zorn KM, Lane TR, Russo DP, Clark AM, Makarov V, Ekins S. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol Pharm 2019;16:1620-1632. [PMID: 30779585 DOI: 10.1021/acs.molpharmaceut.8b01297] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 02/06/2023]

Purushothaman G, Vikas R. Identification of a feature selection based pattern recognition scheme for finger movement recognition from multichannel EMG signals. Australas Phys Eng Sci Med 2018;41:549-559. [PMID: 29744809 DOI: 10.1007/s13246-018-0646-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 07/26/2017] [Accepted: 05/01/2018] [Indexed: 11/26/2022]

Abstract

This paper focuses on identifying an effective pattern recognition scheme that uses the least number of time domain features to recognize various finger movements from surface electromyogram (EMG) signals for dexterous control of a prosthetic hand. Eight-channel EMG recorded from 8 able-bodied subjects for 15 individual and combined finger activities has been considered in this work. An attempt has been made to recognize a number of classes with the least number of features. Therefore, EMG signals are pre-processed using the dual tree complex wavelet transform to improve the discriminating capability of features, and time domain features such as zero crossing, slope sign change, mean absolute value, and waveform length are extracted from the pre-processed data. The performance of the extracted features is studied with different classifiers such as linear discriminant analysis (LDA), naive Bayes (NB) classifier, quadratic support vector machine (SVM) and cubic support vector machine, with and without feature selection algorithms. Feature selection has been studied using particle swarm optimization (PSO) and ant colony optimization (ACO) with different numbers of features to identify the effect of features. The results demonstrated that the naive Bayes classifier with ant colony optimization shows an average classification accuracy of 88.89% with a response time of 0.058025 ms for recognizing the 15 different finger movements with 16 features, a significant difference in accuracy compared to the SVM classifier with feature selection at a significance level of 0.05. There is no significant difference in the accuracy, specificity and sensitivity of an SVM classifier with and without feature selection, but its processing time is significantly greater than that of the LDA and NB classifiers. The PSO and ACO results revealed that slope sign changes contribute to recognizing the activity. In PSO, mean absolute value was found to be effective compared to waveform length, in contrast to ACO.
Further, zero crossings were found to be ineffective in classifying finger movements in both methods.
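
The four time domain features named in this abstract have common textbook definitions, sketched below in Python for a single analysis window. The exact thresholds and windowing used by the authors are not given here, so the `threshold` parameter and the sample window are assumptions for illustration only.

```python
from typing import Sequence


def mean_absolute_value(x: Sequence[float]) -> float:
    """Average of the absolute sample values in the window."""
    return sum(abs(v) for v in x) / len(x)


def waveform_length(x: Sequence[float]) -> float:
    """Cumulative absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def zero_crossings(x: Sequence[float], threshold: float = 0.0) -> int:
    """Count sign changes between consecutive samples whose step exceeds the threshold."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0 and abs(a - b) > threshold)


def slope_sign_changes(x: Sequence[float], threshold: float = 0.0) -> int:
    """Count interior samples that are local peaks or troughs larger than the threshold."""
    count = 0
    for prev, cur, nxt in zip(x, x[1:], x[2:]):
        d1, d2 = cur - prev, cur - nxt
        if d1 * d2 > 0 and (abs(d1) > threshold or abs(d2) > threshold):
            count += 1
    return count


# Made-up window of five samples from one channel.
window = [1.0, -2.0, 3.0, -1.0, 0.5]
print(mean_absolute_value(window))  # -> 1.5
print(waveform_length(window))      # -> 13.5
print(zero_crossings(window))       # -> 4
print(slope_sign_changes(window))   # -> 3
```

A non-zero threshold is often used in practice to keep low-amplitude noise from inflating the zero crossing and slope sign change counts.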

Periwal V, Scaria V. Machine Learning Approaches Toward Building Predictive Models for Small Molecule Modulators of miRNA and Its Utility in Virtual Screening of Molecular Databases. Methods Mol Biol 2017;1517:155-68. [PMID: 27924481 DOI: 10.1007/978-1-4939-6563-2_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]

Pal LR, Kundu K, Yin Y, Moult J. CAGI4 Crohn's exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease. Hum Mutat 2017;38:1225-1234. [PMID: 28512778 PMCID: PMC5576730 DOI: 10.1002/humu.23256] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/22/2016] [Revised: 05/09/2017] [Accepted: 05/10/2017] [Indexed: 12/18/2022]

Koutsoukas A, Monaghan KJ, Li X, Huan J. Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data. J Cheminform 2017;9:42. [PMID: 29086090 PMCID: PMC5489441 DOI: 10.1186/s13321-017-0226-y] [Citation(s) in RCA: 107] [Impact Index Per Article: 15.3] [Received: 09/30/2016] [Accepted: 05/27/2017] [Indexed: 01/03/2023]

Abstract

Background

In recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was twofold. First, a large number of hyper-parameter configurations were explored to investigate how they affect the performance of DNNs and could act as starting points when tuning DNNs. Second, their performance was compared to popular methods widely employed in the field of cheminformatics, namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines. Moreover, the robustness of machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVidia GPU units were utilized to carry out this study, allowing a large number of DNN configurations to be explored.

Results

We show that feed-forward deep neural networks are capable of achieving strong classification performance and outperform shallow methods across diverse activity classes when optimized. Hyper-parameters that were found to play a critical role are the activation function, dropout regularization, number of hidden layers and number of neurons. When compared to the other methods, tuned DNNs were found to perform statistically better, with p value <0.01 based on the Wilcoxon statistical test. On average, DNNs achieved an MCC 0.149 higher than NB, 0.092 higher than kNN, 0.052 higher than SVM with linear kernel, 0.021 higher than RF and 0.009 higher than SVM with radial basis function kernel. When exploring robustness to noise, non-linear methods were found to perform well at low levels of noise (lower than or equal to 20%). At higher levels of noise (higher than 30%), however, the Naïve Bayes method was found to perform well and, at the highest noise level of 50%, even to outperform more sophisticated methods across several datasets.
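
The comparison in this Results paragraph is expressed in MCC (Matthews correlation coefficient) units. As a reminder of what that metric measures, here is a minimal Python sketch computing it from binary confusion-matrix counts. The function name and the example counts are illustrative assumptions and do not come from the study.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    Returns 0.0 when any marginal total is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Made-up counts for a classifier evaluated on 100 compounds.
print(round(matthews_corrcoef(tp=45, tn=40, fp=10, fn=5), 3))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it less flattering to classifiers that simply predict the majority class.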

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.

Trainor PJ, DeFilippis AP, Rai SN. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics. Metabolites 2017. [PMID: 28635678 PMCID: PMC5488001 DOI: 10.3390/metabo7020030] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 12/22/2022]

Abstract

Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. 
We report that in the most realistic simulation studies that incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA and k-NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k-NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend of better performance by the SVM and Random Forest classifiers was observed.

Jarecki JB, Meder B, Nelson JD. Naïve and Robust: Class-Conditional Independence in Human Classification Learning. Cogn Sci 2017;42:4-42. [PMID: 28574602 DOI: 10.1111/cogs.12496] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/06/2016] [Revised: 09/19/2016] [Accepted: 11/18/2017] [Indexed: 11/30/2022]

Zeng F, Yang D, Xing X, Qi S. Evaluation of Bayesian approaches to identify DDT source contributions to soils in Southeast China. Chemosphere 2017;176:32-38. [PMID: 28254712 DOI: 10.1016/j.chemosphere.2017.02.049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/06/2016] [Revised: 02/06/2017] [Accepted: 02/08/2017] [Indexed: 06/06/2023]

Basu N, Bandyopadhyay SK. 2D Source area prediction based on physical characteristics of a regular, passive blood drip stain. Forensic Sci Int 2016;266:39-53. [PMID: 27295073 DOI: 10.1016/j.forsciint.2016.04.024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/16/2015] [Revised: 03/21/2016] [Accepted: 04/18/2016] [Indexed: 10/21/2022]

Yao ZJ, Dong J, Che YJ, Zhu MF, Wen M, Wang NN, Wang S, Lu AP, Cao DS. TargetNet: a web service for predicting potential drug-target interaction profiling via multi-target SAR models. J Comput Aided Mol Des 2016;30:413-24. [PMID: 27167132 DOI: 10.1007/s10822-016-9915-2] [Citation(s) in RCA: 185] [Impact Index Per Article: 23.1] [Received: 01/19/2016] [Accepted: 05/06/2016] [Indexed: 02/01/2023]

Yin X, Hadjiloucas S, Zhang Y. Classification of THz pulse signals using two-dimensional cross-correlation feature extraction and non-linear classifiers. Comput Methods Programs Biomed 2016;127:64-82. [PMID: 27000290 DOI: 10.1016/j.cmpb.2016.01.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/19/2015] [Revised: 01/20/2016] [Accepted: 01/21/2016] [Indexed: 05/14/2023]

Abstract

This work provides a performance comparison of four different machine learning classifiers: multinomial logistic regression with ridge estimators (MLR), k-nearest neighbours (KNN), support vector machine (SVM) and naïve Bayes (NB), as applied to terahertz (THz) transient time domain sequences associated with pixelated images of different powder samples. Although the six substances considered have similar optical properties, their complex insertion loss at the THz part of the spectrum is significantly different because of differences in their frequency dependent THz extinction coefficient as well as in their refractive index and scattering properties. As scattering can be unquantifiable in many spectroscopic experiments, classification solely on differences in complex insertion loss can be inconclusive. The problem is addressed using two-dimensional (2-D) cross-correlations between background and sample interferograms; these ensure good noise suppression of the datasets and provide a range of statistical features that are subsequently used as inputs to the above classifiers. A cross-validation procedure is adopted to assess the performance of the classifiers. First, the measurements related to samples with thicknesses of 2 mm were classified, then samples at thicknesses of 4 mm and, after that, 3 mm were classified, and the success rate and consistency of each classifier were recorded. In addition, mixtures having thicknesses of 2 and 4 mm as well as mixtures of 2, 3 and 4 mm were presented simultaneously to all classifiers. This approach provided further cross-validation of the classification consistency of each algorithm. The results confirm the superiority in classification accuracy and robustness of the MLR (least accuracy 88.24%) and KNN (least accuracy 90.19%) algorithms, which consistently outperformed the SVM (least accuracy 74.51%) and NB (least accuracy 56.86%) classifiers for the same number of feature vectors across all studies.
The work establishes a general methodology for assessing the performance of other hyperspectral dataset classifiers on the basis of 2-D cross-correlations in far-infrared spectroscopy or other parts of the electromagnetic spectrum. It also advances the wider proliferation of automated THz imaging systems across new application areas e.g., biomedical imaging, industrial processing and quality control where interpretation of hyperspectral images is still under development.

Bertke SJ, Meyers AR, Wurzelbacher SJ, Measure A, Lampl MP, Robins D. Comparison of methods for auto-coding causation of injury narratives. Accid Anal Prev 2016;88:117-123. [PMID: 26745274 PMCID: PMC4915551 DOI: 10.1016/j.aap.2015.12.006] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 08/27/2015] [Revised: 11/13/2015] [Accepted: 12/07/2015] [Indexed: 05/30/2023]

Carvajal G, Roser DJ, Sisson SA, Keegan A, Khan SJ. Modelling pathogen log10 reduction values achieved by activated sludge treatment using naïve and semi naïve Bayes network models. Water Res 2015;85:304-315. [PMID: 26342914 DOI: 10.1016/j.watres.2015.08.035] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 06/09/2015] [Revised: 08/03/2015] [Accepted: 08/19/2015] [Indexed: 06/05/2023]

Abstract

Risk management for wastewater treatment and reuse has led to growing interest in understanding and optimising pathogen reduction during biological treatment processes. However, modelling pathogen reduction is often limited by poor characterization of the relationships between variables and incomplete knowledge of removal mechanisms. The aim of this paper was to assess the applicability of Bayesian belief network models to represent associations between pathogen reduction and operating conditions and monitoring parameters, and to predict activated sludge (AS) performance. Naïve Bayes and semi-naïve Bayes networks were constructed from an activated sludge dataset including operating and monitoring parameters, and removal efficiencies for two pathogens (native Giardia lamblia and seeded Cryptosporidium parvum) and five native microbial indicators (F-RNA bacteriophage, Clostridium perfringens, Escherichia coli, coliforms and enterococci). First, we defined the Bayesian network structures for the two pathogen log10 reduction value (LRV) class nodes, discretized into two states (< and ≥ 1 LRV), using two different learning algorithms. Eight metrics, such as Prediction Accuracy (PA) and Area Under the receiver operating Curve (AUC), provided a comparison of model prediction performance, certainty and goodness of fit. This comparison was used to select the optimum models. The optimum Tree Augmented naïve models predicted removal efficiency with high AUC when all system parameters were used simultaneously (AUCs for C. parvum and G. lamblia LRVs of 0.95 and 0.87 respectively). However, metrics for individual system parameters showed only the C. parvum model was reliable. By contrast, individual parameters for G. lamblia LRV prediction typically obtained low AUC scores (AUC < 0.81). Useful predictors for C. parvum LRV included solids retention time, turbidity and total coliform LRV.
The methodology developed appears applicable for predicting pathogen removal efficiency in water treatment systems generally.
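
The log10 reduction value and its two-state discretization (< and ≥ 1 LRV) mentioned in this abstract can be written out in a few lines of Python. The sketch below uses the standard definition, LRV = log10(influent concentration / effluent concentration). The function names and the example concentrations are assumptions for illustration, not values from the study.

```python
import math


def log10_reduction_value(influent: float, effluent: float) -> float:
    """LRV = log10(C_in / C_out) for concentrations in the same units."""
    if influent <= 0 or effluent <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(influent / effluent)


def lrv_class(lrv: float, threshold: float = 1.0) -> str:
    """Discretize an LRV into the two class states used for the class node."""
    return ">=1 LRV" if lrv >= threshold else "<1 LRV"


# Made-up concentrations (organisms per litre) before and after treatment.
for c_in, c_out in [(1000.0, 10.0), (50.0, 10.0)]:
    lrv = log10_reduction_value(c_in, c_out)
    print(f"{lrv:.2f} {lrv_class(lrv)}")
```

An LRV of 1 corresponds to a tenfold (90%) reduction, so the threshold separates samples with at least 90% removal from the rest.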

Marucci-Wellman HR, Lehto MR, Corns HL. A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms. Accid Anal Prev 2015;84:165-176. [PMID: 26412196 DOI: 10.1016/j.aap.2015.06.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/29/2015] [Accepted: 06/30/2015] [Indexed: 06/05/2023]

Mussa HY, Marcus D, Mitchell JBO, Glen RC. Verifying the fully "Laplacianised" posterior Naïve Bayesian approach and more. J Cheminform 2015;7:27. [PMID: 26075027 PMCID: PMC4464057 DOI: 10.1186/s13321-015-0075-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/26/2015] [Accepted: 05/12/2015] [Indexed: 11/10/2022]

Abstract

BACKGROUND

In a recent paper, Mussa, Mitchell and Glen (MMG) have mathematically demonstrated that the "Laplacian Corrected Modified Naïve Bayes" (LCMNB) algorithm can be viewed as a variant of the so-called Standard Naïve Bayes (SNB) scheme, whereby the role played by absence of compound features in classifying/assigning the compound to its appropriate class is ignored. MMG have also proffered guidelines regarding the conditions under which this omission may hold. Utilising three data sets, the present paper examines the validity of these guidelines in practice. The paper also extends MMG's work and introduces a new version of the SNB classifier: "Tapered Naïve Bayes" (TNB). TNB does not discard the role of absence of a feature out of hand, nor does it fully consider its role. Hence, TNB encapsulates both SNB and LCMNB.

RESULTS

LCMNB, SNB and TNB performed differently on classifying 4,658, 5,031 and 1,149 ligands (all chosen from the ChEMBL Database) distributed over 31 enzymes, 23 membrane receptors, and one ion-channel, four transporters and one transcription factor as their target proteins. When the number of features utilised was equal to or smaller than the "optimal" number of features for a given data set, SNB classifiers systematically gave better classification results than those yielded by LCMNB classifiers. The opposite was true when the number of features employed was markedly larger than the "optimal" number of features for this data set. Nonetheless, these LCMNB performances were worse than the classification performance achieved by SNB when the "optimal" number of features for the data set was utilised. TNB classifiers systematically outperformed both SNB and LCMNB classifiers.

CONCLUSIONS

The classification results obtained in this study concur with the mathematically based guidelines given in MMG's paper; that is, ignoring the role of absence of a feature out of hand does not necessarily improve classification performance of the SNB approach; if anything, it could make the performance of the SNB method worse. The results obtained also lend support to the rationale on which the TNB algorithm rests: handled judiciously, taking into account absence of features can enhance (not impair) the discriminatory classification power of the SNB approach.
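
To make the role of feature absence concrete, the Python sketch below scores a binary fingerprint with a Bernoulli naïve Bayes model using add-one smoothing. It is not the LCMNB or TNB formulation from the paper: the `absence_weight` parameter is a hypothetical knob introduced here, where 1.0 keeps the full contribution of absent features, 0.0 ignores them, and values in between only loosely mimic the idea of tapering.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def fit_bernoulli_nb(
    X: Sequence[Sequence[int]], y: Sequence[Hashable]
) -> Tuple[Dict[Hashable, float], Dict[Hashable, List[float]]]:
    """Estimate class priors and add-one smoothed P(feature present | class)."""
    n_features = len(X[0])
    priors: Dict[Hashable, float] = {}
    probs: Dict[Hashable, List[float]] = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        probs[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                    for j in range(n_features)]
    return priors, probs


def log_score(x: Sequence[int], prior: float, p: Sequence[float],
              absence_weight: float = 1.0) -> float:
    """Log posterior score (up to a constant); absent features are scaled by absence_weight."""
    score = math.log(prior)
    for present, pj in zip(x, p):
        if present:
            score += math.log(pj)
        else:
            score += absence_weight * math.log(1.0 - pj)  # hypothetical taper
    return score


def predict(x, priors, probs, absence_weight: float = 1.0):
    """Return the class with the highest log score."""
    return max(priors, key=lambda c: log_score(x, priors[c], probs[c], absence_weight))


# Made-up fingerprints: rows are compounds, columns are substructure features.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = ["A", "A", "B", "B"]
priors, probs = fit_bernoulli_nb(X, y)
print(predict([1, 0, 0], priors, probs, absence_weight=1.0))  # -> A
print(predict([1, 0, 0], priors, probs, absence_weight=0.0))  # -> A
```

On sparse fingerprints most features are absent, so the absence terms can dominate the score, which is one reason the treatment of absent features matters for this family of classifiers.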

Fakhraei S, Soltanian-Zadeh H, Fotouhi F. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection. Expert Syst Appl 2014;14:6945-6958. [PMID: 25177107 PMCID: PMC4144463 DOI: 10.1016/j.eswa.2014.05.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 05/20/2023]

Huang H, Tosun AB, Guo J, Chen C, Wang W, Ozolek JA, Rohde GK. Cancer diagnosis by nuclear morphometry using spatial information. Pattern Recognit Lett 2014;42:115-121. [PMID: 24910485 DOI: 10.1016/j.patrec.2014.02.008] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]

Majid A, Ali S, Iqbal M, Kausar N. Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. Comput Methods Programs Biomed 2014;113:792-808. [PMID: 24472367 DOI: 10.1016/j.cmpb.2014.01.001] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 06/06/2013] [Revised: 12/29/2013] [Accepted: 01/03/2014] [Indexed: 06/03/2023]

Bashyam V, Morioka C, El-Saden S, Bui AAT, Taira RK. Identifying relevant medical reports from an assorted report collection using the multinomial naïve Bayes classifier and the UMLS. Indian J Med Inform 2007;2:2. [PMID: 36284749 PMCID: PMC9592058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]