1
Kumari A, Akhtar M, Shah R, Tanveer M. Support matrix machine: A review. Neural Netw 2025; 181:106767. [PMID: 39488110 DOI: 10.1016/j.neunet.2024.106767] [Received: 10/16/2023] [Revised: 07/31/2024] [Accepted: 09/26/2024]
Abstract
Support vector machine (SVM) is one of the most studied paradigms in machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of real-world data exists in matrix format and is fed to SVM by reshaping the matrices into vectors. This reshaping disrupts the spatial correlations inherent in matrix data. Converting matrices into vectors also produces high-dimensional inputs, which adds significant computational complexity. To overcome these issues in classifying matrix input data, the support matrix machine (SMM) was proposed; it is one of the emerging methods tailored to matrix input data. SMM preserves the structural information of matrix data through the spectral elastic net penalty, which combines the nuclear norm and the Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model and can serve as a thorough summary for both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class-imbalance, and multi-class classification models. We also analyze the applications of SMM and conclude by outlining potential future research avenues that may motivate researchers to advance the SMM algorithm.
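A minimal NumPy sketch of the quantities named above follows. It assumes the usual SMM formulation: the spectral elastic net penalty 0.5·||W||_F² + τ·||W||_* on the regression matrix W, and the decision function tr(WᵀX) + b, which scores a matrix input X without reshaping it. The weight `tau` and all function names are illustrative. Singular value thresholding is included because it is the proximal operator of the nuclear norm, a step that ADMM-type SMM solvers typically rely on. This is not the review's own code.

```python
import numpy as np


def spectral_elastic_net(W: np.ndarray, tau: float) -> float:
    """0.5 * ||W||_F^2 + tau * ||W||_*, where ||W||_* is the sum of singular values."""
    singular_values = np.linalg.svd(W, compute_uv=False)
    return float(0.5 * np.sum(W ** 2) + tau * np.sum(singular_values))


def singular_value_threshold(M: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau * ||.||_*: shrink every singular value by tau (floor at 0)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def smm_decision(W: np.ndarray, X: np.ndarray, b: float) -> float:
    """Score tr(W^T X) + b for a matrix input X; its sign gives the predicted class."""
    return float(np.trace(W.T @ X) + b)


if __name__ == "__main__":
    W = np.diag([3.0, 1.0])
    print(spectral_elastic_net(W, tau=0.5))      # 0.5 * (9 + 1) + 0.5 * (3 + 1), about 7.0
    print(singular_value_threshold(W, tau=2.0))  # singular values (3, 1) shrink to (1, 0)
    print(smm_decision(W, np.eye(2), b=0.5))     # tr(W) + 0.5 = 4.5
```

The nuclear-norm term pushes W toward low rank, which is how SMM exploits correlations between rows and columns of X. The Frobenius term keeps the problem strongly convex.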
Affiliations
- Anuradha Kumari: Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- Mushir Akhtar: Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- Rupal Shah: Department of Electrical Engineering, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- M Tanveer: Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
2
Yasin P, Yimit Y, Cai X, Aimaiti A, Sheng W, Mamat M, Nijiati M. Machine learning-enabled prediction of prolonged length of stay in hospital after surgery for tuberculosis spondylitis patients with unbalanced data: a novel approach using explainable artificial intelligence (XAI). Eur J Med Res 2024; 29:383. [PMID: 39054495 PMCID: PMC11270948 DOI: 10.1186/s40001-024-01988-0] [Received: 07/01/2023] [Accepted: 07/18/2024]
Abstract
BACKGROUND Tuberculosis spondylitis (TS), commonly known as Pott's disease, is a severe type of skeletal tuberculosis that typically requires surgical treatment. However, this treatment has increased healthcare costs because of prolonged postoperative length of stay (PLOS). Identifying risk factors associated with extended PLOS is therefore necessary. In this research, we aimed to develop an interpretable machine learning model that predicts extended PLOS and provides insights for treatment, and we implemented it as a web-based application. METHODS We obtained patient data from the spine surgery department at our hospital. Extended PLOS was defined as a postoperative hospitalization duration equal to or exceeding the 75th percentile following spine surgery. To identify relevant variables, we employed several approaches, including the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE) based on support vector machine classification (SVC), correlation analysis, and permutation importance. Several models were implemented, and some of them were ensembled using soft voting. Models were constructed using grid search with nested cross-validation. The performance of each algorithm was assessed with metrics including the area under the receiver operating characteristic curve (AUC) and the Brier score. Model interpretation used Shapley additive explanations (SHAP), the Gini impurity index, permutation importance, and local interpretable model-agnostic explanations (LIME). Finally, to facilitate practical application of the model, a web-based interface was developed and deployed.
RESULTS The study included a cohort of 580 patients, and 11 features were selected: CRP, transfusions, infusion volume, blood loss, X-ray bone bridge, X-ray osteophyte, CT vertebral destruction, CT paravertebral abscess, MRI paravertebral abscess, MRI epidural abscess, and postoperative drainage. Most classifiers performed well. The XGBoost model achieved a higher AUC (0.86) and a lower Brier score (0.126) than the other models and was chosen as the optimal model. Calibration and decision curve analysis (DCA) plots further indicated that XGBoost performed well. Under tenfold cross-validation, the XGBoost model achieved a mean AUC of 0.85 ± 0.09. SHAP and LIME were used to display each variable's contribution to the predicted value. Stacked bar plots showed that infusion volume was the primary contributor according to Gini importance, permutation feature importance (PFI), and LIME. CONCLUSIONS Our methods not only effectively predicted extended PLOS but also identified risk factors that can inform future treatment. The XGBoost model developed in this study is accessible through the deployed web application and can aid clinical research.
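The outcome definition and the two headline metrics above can be sketched in plain Python. The sketch assumes "inclusive" (linear) percentile interpolation for the 75th-percentile cut-off, which the abstract does not specify. It computes AUC through its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative. The function names are hypothetical, and the study's actual pipeline (LASSO/RFE selection, nested cross-validation, XGBoost) is not reproduced.

```python
import statistics
from typing import List, Sequence


def label_extended_stay(los_days: Sequence[float]) -> List[int]:
    """Label 1 when a stay is >= the cohort's 75th percentile, else 0.

    Assumption: 'inclusive' (linear) interpolation; the study does not say
    which percentile definition it used.
    """
    threshold = statistics.quantiles(los_days, n=4, method="inclusive")[2]
    return [1 if d >= threshold else 0 for d in los_days]


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(label_extended_stay([3, 5, 7, 9, 11]))             # [0, 0, 0, 1, 1]
    print(roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.3, 0.4]))      # 0.75
    print(brier_score([1, 0, 1, 0], [0.9, 0.2, 0.3, 0.4]))  # about 0.175
```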
Affiliations
- Parhat Yasin: Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Urumqi, 830000, Xinjiang, People's Republic of China; Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Yasen Yimit: Department of Radiology, The First People's Hospital of Kashi Prefecture, Kashi, 844000, Xinjiang, People's Republic of China
- Xiaoyu Cai: Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Abasi Aimaiti: Department of Anesthesiology, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Weibin Sheng: Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Mardan Mamat: Department of Spine Surgery, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830054, Xinjiang, People's Republic of China
- Mayidili Nijiati: Department of Radiology, The Fourth Affiliated Hospital of Xinjiang Medical University (Xinjiang Hospital of Traditional Chinese Medicine), Urumqi, 830002, Xinjiang, People's Republic of China; Xinjiang Key Laboratory of Artificial Intelligence Assisted Imaging Diagnosis, Kashi, 844000, Xinjiang, People's Republic of China
3
Deng F, Zhao L, Yu N, Lin Y, Zhang L. Union With Recursive Feature Elimination: A Feature Selection Framework to Improve the Classification Performance of Multicategory Causes of Death in Colorectal Cancer. Lab Invest 2024; 104:100320. [PMID: 38158124 DOI: 10.1016/j.labinv.2023.100320] [Received: 03/26/2023] [Revised: 12/05/2023] [Accepted: 12/20/2023]
Abstract
Despite the use of machine learning tools, it is challenging to model cause-specific deaths in colorectal cancer (CRC) patients properly and to choose appropriate treatments. Here, we propose a feature selection framework, union with recursive feature elimination (U-RFE), that uses The Cancer Genome Atlas (TCGA) dataset to select the union feature sets crucial to CRC progression-specific mortality. Based on the union feature sets, we compared the performance of 5 classification algorithms, namely logistic regression (LR), support vector machines (SVM), random forest (RF), eXtreme gradient boosting (XGBoost), and Stacking, to identify the best model for classifying 4 categories of death. In the first stage of U-RFE, LR, SVM, and RF were used as base estimators to obtain subsets containing the same number of features but not exactly the same features. Union analysis of the subsets was then performed to determine the final union feature set, effectively combining the advantages of the different algorithms. We found that the U-RFE framework improved the performance of various models. Stacking outperformed LR, SVM, RF, and XGBoost in most scenarios. When the target feature number of RFE was set to 50 and the union feature set contained 298 deterministic features, the Stacking model achieved an F1_weighted of 0.851, a Recall_weighted of 0.864, a Precision_weighted of 0.854, an Accuracy of 0.864, and a Matthews correlation coefficient of 0.717. The performance of the minority categories also improved significantly. Therefore, this RFE-based feature selection approach improves the classification of CRC deaths using clinical and omics data, as well as classification on other data with high feature redundancy and imbalance.
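The union step of U-RFE can be sketched as follows. The sketch assumes that each base estimator is represented by a hypothetical importance function that re-scores the current feature subset; a real run would refit LR, SVM, or RF at every elimination round. RFE is run once per estimator with the same target size, and the selected subsets are merged by set union, so the final set can be larger than the target size. The static scores and gene names are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set

# An importance function re-scores the *current* feature subset; in a real RFE
# this would refit LR, SVM, or RF and read coefficients or importances.
ImportanceFn = Callable[[List[str]], Dict[str, float]]


def rfe(features: Sequence[str], importance_fn: ImportanceFn,
        n_select: int, step: int = 1) -> Set[str]:
    """Recursive feature elimination: drop the `step` weakest features per round."""
    remaining = list(features)
    while len(remaining) > n_select:
        scores = importance_fn(remaining)  # re-score after every elimination
        n_drop = min(step, len(remaining) - n_select)
        weakest = set(sorted(remaining, key=lambda f: scores[f])[:n_drop])
        remaining = [f for f in remaining if f not in weakest]
    return set(remaining)


def union_rfe(features: Sequence[str], importance_fns: Sequence[ImportanceFn],
              n_select: int) -> Set[str]:
    """Run RFE once per base estimator and return the union of the selected subsets."""
    selected: Set[str] = set()
    for fn in importance_fns:
        selected |= rfe(features, fn, n_select)
    return selected


# Hypothetical static scores standing in for two refitted base estimators.
LR_LIKE = {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.2, "g5": 0.05}
RF_LIKE = {"g1": 0.7, "g2": 0.1, "g3": 0.6, "g4": 0.05, "g5": 0.2}
FEATURES = ["g1", "g2", "g3", "g4", "g5"]
FNS = [lambda fs, s=LR_LIKE: {f: s[f] for f in fs},
       lambda fs, s=RF_LIKE: {f: s[f] for f in fs}]

if __name__ == "__main__":
    print(sorted(union_rfe(FEATURES, FNS, n_select=2)))  # ['g1', 'g2', 'g3']
```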
Affiliations
- Fei Deng: School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Lin Zhao: School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Ning Yu: School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Yuxiang Lin: School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Lanjing Zhang: Department of Biological Sciences, Rutgers University, Newark, New Jersey; Department of Pathology, Princeton Medical Center, Plainsboro, New Jersey; Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey; Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, New Jersey
4
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Received: 12/03/2023] [Accepted: 01/15/2024]
Abstract
Breast cancer remains a major public health challenge worldwide. Identifying accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study uses an integrative machine learning approach to analyze breast cancer gene expression data for improved biomarker and drug target discovery. Gene expression datasets obtained from the GEO database were merged after preprocessing. In the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes (DEGs), while a separate gene expression dataset revealed 350 DEGs. Additionally, the BGWO_SA_Ens algorithm, which integrates binary grey wolf optimization and simulated annealing with an ensemble classifier, was applied to the gene expression datasets to identify predictive genes, including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens selected 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of the DEGs and the BGWO_SA_Ens-selected genes yielded 35 superior genes that were consistently significant across methods. Enrichment analyses showed that these superior genes are involved in key pathways such as AMPK, adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between the superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated the genes' biological roles, interactions, and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
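The simulated annealing ingredient of BGWO_SA_Ens can be illustrated on binary feature masks. This is a minimal sketch of plain simulated annealing with single-bit flips and the Metropolis acceptance rule. It is not the authors' hybrid with binary grey wolf optimization, nor their ensemble fitness. The toy fitness function, gene names, and geometric cooling schedule are all hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]


def anneal_feature_mask(n_features: int, fitness: Callable[[Mask], float],
                        n_iter: int = 500, t_start: float = 1.0,
                        t_end: float = 0.01, seed: int = 0) -> Tuple[Mask, float]:
    """Simulated annealing over binary feature masks (1 = feature kept)."""
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    for k in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(n_iter - 1, 1))
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = 1 - candidate[i]  # flip one feature in or out
        cand_fit = fitness(candidate)
        delta = cand_fit - current_fit
        # Metropolis rule: accept improvements; accept worse moves with prob exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_fit = candidate, cand_fit
            if current_fit > best_fit:
                best, best_fit = current[:], current_fit
    return best, best_fit


# Hypothetical fitness: reward informative genes, lightly penalise subset size.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


if __name__ == "__main__":
    best, score = anneal_feature_mask(8, toy_fitness)
    print(best, round(score, 3))
    # Intersection with another gene list, analogous to DEGs vs selected genes.
    selected = {f"gene{i}" for i, bit in enumerate(best) if bit}
    print(sorted(selected & {"gene0", "gene5", "gene7"}))
```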
Affiliations
- Morteza Rakhshaninejad: Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Mohammad Fathian: Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Reza Shirkoohi: Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
- Farnaz Barzinpour: Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Amir H Gandomi: Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia; University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
5
Sheikh K, Sayeed S, Asif A, Siddiqui MF, Rafeeq MM, Sahu A, Ahmad S. Consequential Innovations in Nature-Inspired Intelligent Computing Techniques for Biomarkers and Potential Therapeutics Identification. Studies in Computational Intelligence 2023:247-274. [DOI: 10.1007/978-981-19-6379-7_13]
6
Hybrid Approach to Identifying Druglikeness Leading Compounds against COVID-19 3CL Protease. Pharmaceuticals (Basel) 2022; 15:ph15111333. [DOI: 10.3390/ph15111333] [Received: 09/24/2022] [Revised: 10/21/2022] [Accepted: 10/25/2022]
Abstract
SARS-CoV-2 is a positive-sense, single-stranded RNA virus that had caused more than 6.3 million deaths as of June 2022. Moreover, by disrupting global supply chains through lockdowns, the virus has indirectly caused devastating damage to the global economy. It is vital to design and develop drugs against this virus and its variants. In this paper, we developed a hybrid in silico framework to repurpose existing therapeutic agents by finding drug-like bioactive molecules that could treat COVID-19. In the first step, 133 drug-like bioactive molecules targeting the SARS coronavirus 3CL protease were retrieved from the ChEMBL database. Based on the standard IC50 value, the dataset was divided into three classes: active, inactive, and intermediate. Our comparative analysis showed that the proposed Extra Tree Regressor (ETR)-based QSAR model predicted the bioactivity of chemical compounds better than Gradient Boosting-, XGBoost-, Support Vector-, Decision Tree-, and Random Forest-based regressor models. ADMET analysis was then carried out. It identified thirteen bioactive molecules as highly suitable drug candidates for the SARS-CoV-2 3CL protease, with the ChEMBL IDs 187460, 190743, 222234, 222628, 222735, 222769, 222840, 222893, 225515, 358279, 363535, 365134, and 426898. In the next step, the efficacy of these molecules was computed in terms of binding affinity using molecular docking, and six bioactive molecules were shortlisted, with the ChEMBL IDs 187460, 222769, 225515, 358279, 363535, and 365134. These molecules may be suitable drug candidates for SARS-CoV-2. Pharmacologists and drug manufacturers could investigate these six molecules further and adopt these promising compounds in their downstream drug development stages.
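The three-class split by standard IC50 can be sketched as below. The cut-offs (active at or below 1,000 nM, inactive at or above 10,000 nM) are an assumption borrowed from a common ChEMBL convention, since the abstract does not state them. The pIC50 transform is included because regression QSAR models are often trained on it rather than on raw IC50.

```python
import math

# Assumed cut-offs in nM (a common ChEMBL convention); the abstract only says
# the split into three classes was based on the standard IC50 value.
ACTIVE_MAX_NM = 1_000.0
INACTIVE_MIN_NM = 10_000.0


def bioactivity_class(ic50_nm: float) -> str:
    """Map a standard IC50 (nM) to 'active', 'intermediate', or 'inactive'."""
    if ic50_nm <= ACTIVE_MAX_NM:
        return "active"
    if ic50_nm >= INACTIVE_MIN_NM:
        return "inactive"
    return "intermediate"


def pic50(ic50_nm: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 M, so 1000 nM maps to 6.0."""
    return -math.log10(ic50_nm * 1e-9)


if __name__ == "__main__":
    for value in (250.0, 5_000.0, 20_000.0):
        print(value, bioactivity_class(value), round(pic50(value), 2))
```

On the pIC50 scale, higher values mean more potent compounds. This is a convenient direction for regression targets such as the ETR model's.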
7
Liu X, Guo L, Wang H, Guo J, Yang S, Duan L. Research on imbalance machine learning methods for MR T1WI soft tissue sarcoma data. BMC Med Imaging 2022; 22:149. [PMID: 36028803 PMCID: PMC9417078 DOI: 10.1186/s12880-022-00876-5] [Received: 05/19/2022] [Accepted: 08/08/2022]
Abstract
BACKGROUND Soft tissue sarcoma is a rare and highly heterogeneous tumor in clinical practice. Pathological grading of soft tissue sarcoma is a key factor in patient prognosis and treatment planning, yet the clinical data for soft tissue sarcoma are imbalanced. In this paper, we propose an effective way to find the optimal imbalanced machine learning model for classifying soft tissue sarcoma data. METHODS A large number of features are first extracted from T1-weighted (T1WI) MR images using radiomics methods. We then explore feature selection, sampling, and classification methods, build 17 imbalanced machine learning models on these features, and perform extensive experiments to classify the imbalanced soft tissue sarcoma data. We also use a second dataset-splitting method, which can improve classification performance and helps verify the validity of the models. RESULTS The experimental results show that combining recursive feature elimination (RFE), SMOTETomek (STT) sampling, and the extremely randomized trees (ERT) classifier performs best among the methods compared. The accuracy of RFE+STT+ERT is 81.57%, which is close to the accuracy of biopsy, and rises to 95.69% with the second dataset-splitting method. CONCLUSION Accurate, noninvasive preoperative prediction of the pathological grade of soft tissue sarcoma is essential. Our proposed machine learning method (RFE+STT+ERT) contributes to solving the imbalanced data classification problem and can support the development of personalized treatment plans for soft tissue sarcoma patients.
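The SMOTETomek (STT) step can be sketched in pure Python. SMOTE creates synthetic minority points on segments between a minority point and one of its k nearest minority neighbours. Tomek links, which are pairs of opposite-class points that are each other's nearest neighbour, are then removed to clean the class boundary. This illustrative version removes both members of each link and uses brute-force distances. It is not the authors' implementation; libraries such as imbalanced-learn provide a production `SMOTETomek`.

```python
import math
import random
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int = 2,
          seed: int = 0) -> List[Point]:
    """SMOTE-style interpolation; needs at least two minority points."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself).
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def tomek_links(X: Sequence[Point], y: Sequence[int]) -> Set[Tuple[int, int]]:
    """Index pairs (i, j) of opposite-class points that are mutual nearest neighbours."""
    def nearest(i: int) -> int:
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]}


def smote_tomek(X: Sequence[Point], y: Sequence[int], minority_label: int,
                n_new: int, k: int = 2, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Oversample the minority class, then drop both members of every Tomek link."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    new_points = smote(minority, n_new, k, seed)
    X_all = list(X) + new_points
    y_all = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X_all, y_all) for i in pair}
    keep = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in keep], [y_all[i] for i in keep]
```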
Affiliations
- Xuanxuan Liu: College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Li Guo: College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Hexiang Wang: Department of Radiology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Jia Guo: Department of Radiology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Shifeng Yang: Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- Lisha Duan: Department of Radiology, The Third Hospital of Hebei Medical University, Shijiazhuang, China
8
Lin LS, Hu SC, Lin YS, Li DC, Siao LR. A new approach to generating virtual samples to enhance classification accuracy with small data: a case of bladder cancer. Math Biosci Eng 2022; 19:6204-6233. [PMID: 35603398 DOI: 10.3934/mbe.2022290]
Abstract
In the medical field, researchers often cannot obtain, within a short period, the number of samples needed to build a stable data-driven forecasting model for classifying a new disease. To address this small-data learning problem, many studies have shown that generating virtual samples to augment the training data is effective, because it helps improve forecasting models built on small datasets. One of the most popular methods in these studies is the mega-trend-diffusion (MTD) technique, which is widely used in various fields. The effectiveness of MTD depends on the degree of data diffusion, which is strongly affected by extreme values. In addition, MTD only considers data fitted with a unimodal triangular membership function, whereas real-world data may come from multiple distributions. This paper therefore proposes a distance-based mega-trend-diffusion (DB-MTD) technique that estimates the degree of data diffusion appropriately while reducing the impact of extreme values. The proposed method assumes that the data are fitted by triangular and trapezoidal membership functions when generating virtual samples. In addition, a possibility evaluation mechanism is proposed to measure the applicability of the virtual samples. In our experiments, two bladder cancer datasets are used to verify the effectiveness of DB-MTD. The results show that the proposed method outperforms other virtual sample generation (VSG) techniques in classification and regression tasks on small bladder cancer datasets.
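For reference, the classic single-attribute MTD that DB-MTD builds on can be sketched as follows. The sketch assumes the commonly stated form of MTD:

- the center is (min + max) / 2;
- skewness weights come from the counts on each side of the center;
- the bounds are widened using the sample variance;
- virtual samples are accepted with probability equal to their triangular membership value.

The distance-based diffusion estimate, trapezoidal functions, and possibility evaluation of DB-MTD are not described in enough detail here to reproduce.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def mtd_bounds(values: Sequence[float]) -> Tuple[float, float, float]:
    """Classic MTD diffusion bounds (lower, center, upper) for one attribute.

    Needs at least two distinct values so both sides of the center are non-empty.
    """
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2
    n_low = sum(1 for v in values if v < center)
    n_high = sum(1 for v in values if v > center)
    var = statistics.variance(values)  # sample variance s_x^2
    skew_low = n_low / (n_low + n_high)
    skew_high = n_high / (n_low + n_high)
    spread = -2.0 * math.log(1e-20)  # constant from the MTD formulation
    lower = center - skew_low * math.sqrt(spread * var / n_low)
    upper = center + skew_high * math.sqrt(spread * var / n_high)
    return min(lower, lo), center, max(upper, hi)


def triangular_membership(x: float, lower: float, center: float, upper: float) -> float:
    """Unimodal triangular membership: 1 at the center, 0 at and beyond the bounds."""
    if lower <= x <= center:
        return (x - lower) / (center - lower)
    if center < x <= upper:
        return (upper - x) / (upper - center)
    return 0.0


def mtd_virtual_samples(values: Sequence[float], n: int, seed: int = 0) -> List[float]:
    """Draw uniformly in [lower, upper] and keep a draw with probability equal to its membership."""
    lower, center, upper = mtd_bounds(values)
    rng = random.Random(seed)
    samples: List[float] = []
    while len(samples) < n:
        x = rng.uniform(lower, upper)
        if rng.random() <= triangular_membership(x, lower, center, upper):
            samples.append(x)
    return samples
```

The skewness weights shift the diffusion toward the side of the center that holds more observations. A single extreme value can still stretch both bounds through the variance, which is the sensitivity the abstract attributes to MTD.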
Affiliations
- Liang-Sian Lin: Department of Information Management, National Taipei University of Nursing and Health Sciences, Ming-te Road, Taipei 112303, Taiwan
- Susan C Hu: Department of Public Health, College of Medicine, National Cheng Kung University, University Road, Tainan 70101, Taiwan
- Yao-San Lin: Singapore Centre for Chinese Language, Nanyang Technological University, Ghim Moh Road, Singapore 279623, Singapore
- Der-Chiang Li: Department of Industrial and Information Management, National Cheng Kung University, University Road, Tainan 70101, Taiwan
- Liang-Ren Siao: Department of Industrial and Information Management, National Cheng Kung University, University Road, Tainan 70101, Taiwan
9
Xiao Y, Wu J, Lin Z. Cancer diagnosis using generative adversarial networks based on deep learning from imbalanced data. Comput Biol Med 2021; 135:104540. [PMID: 34153791 DOI: 10.1016/j.compbiomed.2021.104540] [Received: 12/22/2020] [Revised: 05/14/2021] [Accepted: 05/26/2021]
Abstract
BACKGROUND AND OBJECTIVE Cancer is a serious global disease because of its high mortality, and the key to effective treatment is accurate diagnosis. However, owing to sampling difficulty and limited sample sizes in clinical practice, data imbalance is a common problem in cancer diagnosis, while most conventional classification methods assume a balanced class distribution. Addressing the imbalanced learning problem to improve the predictive performance of cancer diagnosis is therefore important. METHODS In this study, we dissect the data imbalance prevalent in cancer gene expression data and present an improved deep learning-based Wasserstein generative adversarial network (WGAN) model. The model provides a reliable indicator of training progress and deeply explores the characteristics of the data. The WGAN generates new minority-class samples and thus solves the imbalance problem at the data level. RESULTS We analyze three publicly available RNA-seq datasets covering three kinds of cancer using the proposed WGAN and compare the results with those of two commonly adopted sampling methods. The results show that, by addressing the data imbalance problem, the balanced class distribution and expanded sample size increase prediction accuracy on all three datasets. CONCLUSIONS The proposed WGAN method is therefore superior in solving the imbalanced learning problem of gene expression data and provides significantly better prediction performance in cancer diagnosis.
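A WGAN's critic is trained to estimate the Wasserstein-1 (earth mover's) distance between real and generated data. This is why its loss can serve as the training-progress indicator mentioned above. The sketch below does not implement a WGAN. It computes the exact empirical Wasserstein-1 distance for two equal-size one-dimensional samples, which could serve as a hypothetical per-gene check that generated minority samples resemble real ones. The expression values are made up.

```python
from typing import Sequence


def wasserstein_1d(real: Sequence[float], generated: Sequence[float]) -> float:
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    With equal sizes, optimal transport pairs the i-th smallest value of one
    sample with the i-th smallest of the other, so W1 is the mean absolute
    difference of the sorted samples.
    """
    if len(real) != len(generated):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(real), sorted(generated))
    return sum(abs(a - b) for a, b in pairs) / len(real)


if __name__ == "__main__":
    real_expr = [2.1, 3.4, 2.8, 5.0]  # hypothetical expression values of one gene
    fake_expr = [2.0, 3.9, 2.5, 4.6]
    print(wasserstein_1d(real_expr, fake_expr))
```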
Affiliations
- Yawen Xiao: Department of Automation, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jun Wu: The Center for Bioinformatics and Computational Biology, East China Normal University, Shanghai, 200241, China
- Zongli Lin: Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904-4743, USA
10
Ramesh Dhanaseelan F, Jeya Sutha M. Detection of Breast Cancer Based on Fuzzy Frequent Itemsets Mining. Ing Rech Biomed 2021. [DOI: 10.1016/j.irbm.2020.05.002]
11
A multiple combined method for rebalancing medical data with class imbalances. Comput Biol Med 2021; 134:104527. [PMID: 34091384 DOI: 10.1016/j.compbiomed.2021.104527] [Received: 03/02/2021] [Revised: 05/24/2021] [Accepted: 05/25/2021]
Abstract
Most classification algorithms assume that classes are balanced. However, datasets with class imbalances are everywhere. The classes in real medical datasets are imbalanced. This severely affects identification models and can sacrifice the classification accuracy of the minority class, even though that class is the most influential and representative. Medical decisions can have irreversible consequences: the tolerance for misjudgment is relatively low, and errors may cause irreparable harm to patients. Therefore, this study proposes a multiple combined method to rebalance medical data with class imbalances. The combined methods include (1) resampling methods (the synthetic minority oversampling technique [SMOTE] and undersampling [US]), (2) particle swarm optimization (PSO), and (3) MetaCost. This study conducted two experiments on nine medical datasets to verify the proposed method and compare it with the listed methods. A decision tree is used to generate decision rules so that the results are easy to understand. The results show that:
(1) the proposed method with ensemble learning can improve the area under the receiver operating characteristic curve (AUC), recall, precision, and F1 metrics;
(2) MetaCost can increase sensitivity;
(3) SMOTE can effectively enhance AUC;
(4) US can improve sensitivity and F1 and reduce misclassification costs on data with a high class imbalance ratio; and
(5) PSO-based attribute selection can increase sensitivity and reduce data dimensionality.
Finally, we suggest that decisions on datasets with an imbalance ratio greater than 9 should use the US results; when the imbalance ratio is below 9, decision-makers can consider the results of SMOTE and US together to identify the best decision.
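The closing recommendation depends on the imbalance ratio, defined as the majority-class count divided by the minority-class count. A minimal sketch, with hypothetical function names, computes the ratio, performs plain random undersampling (US), and encodes the stated rule. The abstract does not say how a ratio of exactly 9 should be handled, so treating it like the below-9 case is an assumption.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(X: Sequence[T], y: Sequence[Hashable],
                       seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    target = min(len(idx) for idx in by_class.values())
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, target))
    return [X[i] for i in keep], [y[i] for i in keep]


def suggested_resampling(labels: Sequence[Hashable]) -> str:
    """The abstract's rule of thumb; a ratio of exactly 9 falls in the second branch here."""
    return "US" if imbalance_ratio(labels) > 9 else "SMOTE and US"


if __name__ == "__main__":
    y = [0] * 20 + [1] * 2
    X = list(range(len(y)))
    print(imbalance_ratio(y), suggested_resampling(y))
    print(random_undersample(X, y)[1])
```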
12
Haque F, Bin Ibne Reaz M, Chowdhury MEH, Srivastava G, Hamid Md Ali S, Bakar AAA, Bhuiyan MAS. Performance Analysis of Conventional Machine Learning Algorithms for Diabetic Sensorimotor Polyneuropathy Severity Classification. Diagnostics (Basel) 2021; 11:diagnostics11050801. [PMID: 33925190 PMCID: PMC8146253 DOI: 10.3390/diagnostics11050801] [Received: 03/05/2021] [Revised: 04/23/2021] [Accepted: 04/27/2021]
Abstract
Background: Diabetic sensorimotor polyneuropathy (DSPN), a major form of diabetic neuropathy, is a complication that arises in patients with long-term diabetes. Machine learning (ML) in disease diagnosis is a common and well-established field of research. However, its application to DSPN diagnosis using composite scoring techniques such as the Michigan Neuropathy Screening Instrument (MNSI) is very limited in the existing literature. Method: In this study, MNSI data were collected from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials. Two datasets with different MNSI variable combinations, chosen from the results of the eXtreme Gradient Boosting feature ranking technique, were used to analyze the performance of eight conventional ML algorithms. Results: The random forest (RF) classifier outperformed the other ML models on both datasets. However, all ML models showed almost perfect reliability based on Kappa statistics when all six MNSI variables were used as inputs. They also showed a high correlation between the predicted output and the actual class of the EDIC patients. Conclusions: This study suggests that an RF-based classifier using all MNSI variables can help predict DSPN severity, which could improve medical care for diabetic patients.
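"Almost perfect reliability based on Kappa statistics" usually refers to the Landis and Koch scale, on which a kappa between 0.81 and 1.00 counts as almost perfect agreement. A minimal unweighted Cohen's kappa is sketched below. The study does not state whether it used an unweighted or a weighted kappa for the ordinal severity classes, so the unweighted form is an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(a == b for a, b in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the two marginal frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    truth = ["mild", "mild", "moderate", "severe"]
    pred = ["mild", "moderate", "moderate", "severe"]
    print(round(cohens_kappa(truth, pred), 3))  # 0.636
```

In this toy example, raw agreement is 0.75 but kappa is only about 0.64. The gap is the chance correction that makes kappa stricter than accuracy.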
Affiliations
- Fahmida Haque: Department of Electrical, Electronic and System Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
- Mamun Bin Ibne Reaz: Department of Electrical, Electronic and System Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
- Geetika Srivastava: Department of Physics and Electronics, Dr. Ram Manohar Lohia Avadh University, Ayodhya 224001, India
- Sawal Hamid Md Ali: Department of Electrical, Electronic and System Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
- Ahmad Ashrif A. Bakar: Department of Electrical, Electronic and System Engineering, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia
- Mohammad Arif Sobhan Bhuiyan: Department of Electrical and Electronic Engineering, Xiamen University Malaysia, Bandar Sunsuria, Sepang 43900, Malaysia
13
Du X. Research on time series characteristics of sports training effect based on support vector machine. J Intell Fuzzy Syst 2021. [DOI: 10.3233/jifs-189573]
Abstract
Athletes move quickly and are also affected by the complex surrounding environment, so sports video needs to be processed as a sequence to improve processing efficiency. From a machine learning perspective, this paper designs a CNN-based spatial feature extractor to extract time series features of sports. Using the support vector machine as the basis of the model, it then constructs feature extraction models based on support vector machines and random forests for different situations. Test data are collected from a sports database, and swimming is used as an example to analyze model performance. Finally, the validity of the model is verified through comparison with other methods. The results indicate that the proposed method is effective to a certain extent and can serve as a theoretical reference for subsequent related research.
Affiliation(s)
- Xiaobing Du
- Department of P.E., Hanshan Normal University, Chaozhou, Guangdong, China
14
ImbTreeEntropy and ImbTreeAUC: Novel R Packages for Decision Tree Learning on the Imbalanced Datasets. ELECTRONICS 2021. [DOI: 10.3390/electronics10060657] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
This paper presents two R packages, ImbTreeEntropy and ImbTreeAUC, for handling imbalanced data problems. ImbTreeEntropy applies generalized entropy functions, such as Rényi, Tsallis, Sharma–Mittal, Sharma–Taneja and Kapur, to measure the impurity of a node. ImbTreeAUC provides non-standard measures for choosing an optimal split point for an attribute (as well as the optimal attribute for splitting) by employing local, semi-global and global AUC (Area Under the ROC Curve) measures. Both packages are applicable to binary and multiclass problems, and they support cost-sensitive learning, by defining a misclassification cost matrix, as well as weighted-sensitive learning. The packages accept all types of attributes, including continuous, ordered and nominal; for multiclass problems, nominal attributes are simplified to reduce computational overhead. Both packages can optimize the thresholds at which posterior probabilities determine the final class labels, so that misclassification costs are minimized. Overfitting can be managed either during the growing phase or afterwards through post-pruning. The packages are mainly implemented in R; however, some computationally demanding functions are written in plain C++. Parallel processing is also supported to speed up learning.
15
Altıntop ÇG, Latifoğlu F, Akın AK, İleri R, Yazar MA. Analysis of Consciousness Level Using Galvanic Skin Response during Therapeutic Effect. J Med Syst 2020; 45:1. [PMID: 33236166 DOI: 10.1007/s10916-020-01677-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/12/2020] [Accepted: 11/17/2020] [Indexed: 11/25/2022]
Abstract
The neurological status of patients in Intensive Care Units (ICUs) is determined using the Glasgow Coma Scale (GCS). Patients in a coma are thought to be unaware of what is happening around them. However, many studies show that the family plays an important role in a patient's recovery and is a great emotional resource. In this study, as a new approach, Galvanic Skin Response (GSR) signals were analyzed from 31 patients with low consciousness levels (GCS between 3 and 8) to determine the relationship between consciousness level and GSR signals. The effect of family and nurses on unconscious patients was investigated using GSR signals recorded with a newly proposed protocol. The signals were recorded while the nurse and the family talked to and touched the patient. According to the numerical results, levels of consciousness can be separated using GSR signals. Family and nurses were also found to have statistically significant effects on the patient. Patients with GCS 3, 4, and 5 were considered to have a low level of consciousness, while patients with GCS 6, 7, and 8 were considered to have a high level of consciousness. According to our results, GSR amplitudes were lower for low GCS (3, 4, 5) than for high GCS (7, 8). It was concluded that these patients were aware of the therapeutic effect although they were unconscious. In the classification stage of this study, the class imbalance problem, which is common in medical diagnosis, was addressed using the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN) and random oversampling methods. The level of consciousness was classified with 92.7% success using various decision tree algorithms, and random forest provided the highest accuracy of all the methods. The results show that analysis of GSR signals recorded in different stages gives very successful GCS score classification performance compared with previous studies in the literature.
Affiliation(s)
- Fatma Latifoğlu
- Department of Biomedical Engineering, Erciyes University, Kayseri, Turkey
- Aynur Karayol Akın
- Department of Anesthesiology and Reanimation, Erciyes University, Kayseri, Turkey
- Ramis İleri
- Department of Biomedical Engineering, Erciyes University, Kayseri, Turkey
- Mehmet Akif Yazar
- Department of Anesthesiology and Reanimation, Konya Training and Research Hospital, Konya, Turkey
16
Dubin D, Xiaoxia W. Human-computer system design of entrepreneurship education based on artificial intelligence and image feature retrieval. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-189067] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
The key to deep learning is extracting abstract, deep and nonlinear target features, and the algorithm plays a crucial role in this. In this paper, the authors analyze the design of an intelligent system for entrepreneurship education classrooms based on artificial intelligence and image feature retrieval. Pyramid pooling is used to transform a feature map of any size into a fixed-size feature vector, which is then passed to the fully connected layer for classification and regression. Experimental results show that the algorithm accelerates the convergence of the whole network and improves detection speed. Entrepreneurship classes are intended not only to help college students find stable careers, but also to help them develop their own potential, cultivate entrepreneurial awareness, and improve their entrepreneurial quality and ability. Entrepreneurship education should not stop at the design of subject courses but should be integrated with internet entrepreneurship practice. On this basis, we provide new countermeasures and suggestions for improving the quality and ability of college students in entrepreneurial activities.
Affiliation(s)
- Dong Dubin
- College of Agriculture and Food Science, Zhejiang A&F University, Hangzhou, Zhejiang, China
- Wang Xiaoxia
- Dean’s office, Zhejiang A&F University, Hangzhou, Zhejiang, China
17
Amaral JLM, Sancho AG, Faria ACD, Lopes AJ, Melo PL. Differential diagnosis of asthma and restrictive respiratory diseases by combining forced oscillation measurements, machine learning and neuro-fuzzy classifiers. Med Biol Eng Comput 2020; 58:2455-2473. [PMID: 32776208 DOI: 10.1007/s11517-020-02240-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/24/2020] [Accepted: 07/26/2020] [Indexed: 01/30/2023]
Abstract
To design machine learning classifiers to facilitate the clinical use and increase the accuracy of the forced oscillation technique (FOT) in the differential diagnosis of patients with asthma and restrictive respiratory diseases. FOT and spirometric exams were performed in 97 individuals, including controls (n = 20), asthmatic patients (n = 38), and restrictive (n = 39) patients. The first experiment of this study showed that the best FOT parameter was the resonance frequency, providing moderate accuracy (AUC = 0.87). In the second experiment, a neuro-fuzzy classifier and different supervised machine learning techniques were investigated, including k-nearest neighbors, random forests, AdaBoost with decision trees, and support vector machines with a radial basis kernel. All classifiers achieved high accuracy (AUC ≥ 0.9) in the differentiation between patient groups. In the third and fourth experiments, the use of different feature selection techniques allowed us to achieve high accuracy with only three FOT parameters. In addition, the neuro-fuzzy classifier also provided rules to explain the classification. Neuro-fuzzy and machine learning classifiers can aid in the differential diagnosis of patients with asthma and restrictive respiratory diseases. They can assist clinicians as a support system providing accurate diagnostic options.
Affiliation(s)
- Jorge L M Amaral
- Department of Electronics and Telecommunications Engineering, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Alexandre G Sancho
- Biomedical Instrumentation Laboratory, Institute of Biology Roberto Alcantara Gomes and Laboratory of Clinical and Experimental Research in Vascular Biology, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Alvaro C D Faria
- Biomedical Instrumentation Laboratory, Institute of Biology Roberto Alcantara Gomes and Laboratory of Clinical and Experimental Research in Vascular Biology, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Agnaldo J Lopes
- Pulmonary Function Laboratory, Pedro Ernesto University Hospital, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Pedro L Melo
- Biomedical Instrumentation Laboratory, Institute of Biology Roberto Alcantara Gomes and Laboratory of Clinical and Experimental Research in Vascular Biology, State University of Rio de Janeiro, Rio de Janeiro, Brazil
18
Sajjadnia Z, Khayami R, Moosavi MR. Preprocessing Breast Cancer Data to Improve the Data Quality, Diagnosis Procedure, and Medical Care Services. Cancer Inform 2020; 19:1176935120917955. [PMID: 32528221 PMCID: PMC7262833 DOI: 10.1177/1176935120917955] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
In recent years, due to an increase in the incidence of different cancers, various data sources have become available in this field. Consequently, many researchers have become interested in discovering useful knowledge from the available data to help doctors make faster decisions and to reduce the negative consequences of such diseases. Data mining includes a set of techniques that are useful for discovering knowledge from data: detecting hidden patterns and finding unknown relations. However, these techniques face several challenges with real-world data. In particular, dealing with inconsistencies, errors, noise, and missing values requires appropriate preprocessing and data preparation procedures. In this article, we investigate the impact of preprocessing on providing high-quality data for classification techniques. A wide range of preprocessing and data preparation methods are studied, and a set of preprocessing steps was leveraged to obtain appropriate classification results. The preprocessing is performed on a real-world breast cancer dataset from the Reza Radiation Oncology Center in Mashhad, which has various features and a high percentage of null values, and the results are reported in this article. To evaluate the impact of the preprocessing steps on the results of classification algorithms, this case study was divided into the following 3 experiments: (1) breast cancer recurrence prediction without data preprocessing; (2) breast cancer recurrence prediction with error removal; and (3) breast cancer recurrence prediction with error removal and filling of null values. Then, in each experiment, dimensionality reduction techniques are used to select a suitable subset of features for the problem at hand. Breast cancer recurrence prediction models are constructed using 3 widely used classification algorithms, namely, naïve Bayes, k-nearest neighbor, and sequential minimal optimization. The experiments are evaluated in terms of accuracy, sensitivity, F-measure, precision, and G-mean. Our results show that recurrence prediction is significantly improved after data preprocessing, especially in terms of sensitivity, F-measure, precision, and G-mean.
Affiliation(s)
- Zeinab Sajjadnia
- Department of Computer and IT Engineering, Shiraz University of Technology, Shiraz, Iran
- Raof Khayami
- Department of Computer and IT Engineering, Shiraz University of Technology, Shiraz, Iran
- Mohammad Reza Moosavi
- Department of Computer Science and Engineering and IT, Shiraz University, Shiraz, Iran
19
A design of information granule-based under-sampling method in imbalanced data classification. Soft comput 2020. [DOI: 10.1007/s00500-020-05023-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 10/24/2022]
20
do Amaral JLM, de Melo PL. Clinical decision support systems to improve the diagnosis and management of respiratory diseases. ARTIFICIAL INTELLIGENCE IN PRECISION HEALTH 2020:359-391. [DOI: 10.1016/b978-0-12-817133-2.00015-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/04/2025]
21
Wang F, Tian YC, Zhang X, Hu F. Detecting Disorders of Consciousness in Brain Injuries From EEG Connectivity Through Machine Learning. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2020. [DOI: 10.1109/tetci.2020.3032662] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
22
Zhen Z, Yanqing Y. Lean production and technological innovation in manufacturing industry based on SVM algorithms and data mining technology. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-179217] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Affiliation(s)
- Zhen Zhen
- School of Business, Nanjing Normal University, Nanjing, Jiangsu, China
- Yao Yanqing
- CDP Group Limited, Shanghai (Global Headquarter), China
23
Yuan X. Emotional tendency of online legal course review texts based on SVM algorithm and network data acquisition. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-179207] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Xiaoyi Yuan
- School of Accounting & Finance, Xi’an Peihua University, Xi’an, China
24
Blanco V, Japón A, Puerto J. Optimal arrangements of hyperplanes for SVM-based multiclass classification. ADV DATA ANAL CLASSI 2019. [DOI: 10.1007/s11634-019-00367-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 10/26/2022]
25
Abdar M, Wijayaningrum VN, Hussain S, Alizadehsani R, Plawiak P, Acharya UR, Makarenkov V. IAPSO-AIRS: A novel improved machine learning-based system for wart disease treatment. J Med Syst 2019; 43:220. [DOI: 10.1007/s10916-019-1343-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 03/21/2019] [Accepted: 05/13/2019] [Indexed: 12/14/2022]
26
A Comprehensive Medical Decision–Support Framework Based on a Heterogeneous Ensemble Classifier for Diabetes Prediction. ELECTRONICS 2019. [DOI: 10.3390/electronics8060635] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
Early diagnosis of diabetes mellitus (DM) is critical to prevent its serious complications. An ensemble of classifiers is an effective way to enhance classification performance and can be used to diagnose complex diseases such as DM. This paper proposes an ensemble framework to diagnose DM by optimally employing multiple classifiers based on bagging and random subspace techniques. The proposed framework combines seven of the most suitable and heterogeneous data mining techniques, each with its own set of suitable features: k-nearest neighbors, naïve Bayes, decision tree, support vector machine, fuzzy decision tree, artificial neural network, and logistic regression. The framework is designed by selecting, for every sub-dataset, the most suitable feature set and the most accurate classifier. It was evaluated using a real dataset collected from the electronic health records of Mansura University Hospitals (Mansura, Egypt). The resulting framework achieved 90% accuracy, 90.2% recall, and 94.9% precision. We evaluated and compared the proposed framework with many other classification algorithms, and an analysis of the results indicated that the proposed ensemble framework significantly outperforms all other classifiers. It is a successful step towards constructing a personalized decision support system, which could help physicians in daily clinical practice.
27
Fotouhi S, Asadi S, Kattan MW. A comprehensive data level analysis for cancer diagnosis on imbalanced data. J Biomed Inform 2019; 90:103089. [DOI: 10.1016/j.jbi.2018.12.003] [Citation(s) in RCA: 118] [Impact Index Per Article: 19.7] [Received: 10/28/2017] [Revised: 11/02/2018] [Accepted: 12/21/2018] [Indexed: 11/15/2022]
28
Kanimozhi U, Ganapathy S, Manjula D, Kannan A. An Intelligent Risk Prediction System for Breast Cancer Using Fuzzy Temporal Rules. NATIONAL ACADEMY SCIENCE LETTERS 2018. [DOI: 10.1007/s40009-018-0732-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 12/01/2022]
29
A methodology for customizing clinical tests for esophageal cancer based on patient preferences. Artif Intell Med 2018; 95:16-26. [PMID: 30279042 DOI: 10.1016/j.artmed.2018.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/07/2016] [Revised: 05/02/2018] [Accepted: 08/02/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND Clinical tests for diagnosing a disease can be expensive, uncomfortable, and time-consuming, and can have side effects, e.g. the barium swallow test for esophageal cancer. Although the absence of esophageal cancer can be predicted with nearly 100% certainty using just demographics, lifestyle, medical history information, and a few basic clinical tests, our objective is to devise a general methodology for customizing tests according to user preferences so as to avoid expensive or uncomfortable tests. METHOD We propose using classifiers trained on electronic medical records (EMR) to select tests. The key idea is to design classifiers with 100% false normal rates, possibly at the cost of more false abnormals. We find kernel logistic regression (LR) to be the most suitable for the task. We propose an algorithm for finding the best probability threshold for kernel LR, based on tuning test set accuracy with the help of a validation data set. Using the proposed algorithm, we describe schemes for selecting tests, which appear as features in the automatic classification algorithm, according to users' preferences on cost and discomfort; i.e., the proposed method is able to detect almost all true patients in the population even with user-preferred clinical tests. RESULT We test our methodology with EMRs collected for more than 3000 patients as part of a project carried out by a reputed hospital in Mumbai, India. We found that kernel SVM and kernel LR with a polynomial kernel of degree 3 yield an accuracy of 99.18% and a sensitivity of 100% using only demographic, lifestyle, patient history, and basic clinical tests. We demonstrate our test selection algorithm using two case studies, one using the cost of clinical tests and the other using "discomfort" values for clinical tests. We compute the test sets corresponding to the lowest false abnormals for each criterion described above, using exhaustive enumeration of 12 and 15 clinical tests, respectively. The sets turn out to be different, substantiating our claim that one can customize test sets based on user preferences.
30
Rattanamaneerusmee A, Thirapanmethee K, Nakamura Y, Bongcheewin B, Chomnawang MT. Chemopreventive and biological activities of Helicteres isora L. fruit extracts. Res Pharm Sci 2018; 13:484-492. [PMID: 30607146 PMCID: PMC6288992 DOI: 10.4103/1735-5362.245960] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Helicteres isora L. (H. isora) has been used in traditional medicine in Asia. This study aimed to determine the biological activities of H. isora fruit extracts. The chemopreventive effect was examined with a cell proliferation assay and by the differentiation-inducing effect. The anti-inflammatory activity of the extracts was studied on the levels of nitric oxide (NO), tumor necrosis factor alpha (TNF-α), prostaglandin E2 (PGE-2) production, and cyclooxygenase-2 (COX-2). The cell proliferation assay revealed that H. isora extracts and their major compound, rosmarinic acid, showed no cytotoxicity in THP-1 and RCM-1 cells. Methylthioacetic acid from Cucumis melo var. conomon, used as a positive control, and 80% ethanol extracts demonstrated significant induction of cell differentiation. The hexane extract of H. isora lowered the levels of TNF-α, PGE-2, and NO in THP-1 cells, with 51.61 ± 0.79%, 69.68 ± 0.017%, and 69.93 ± 9.41% inhibition, respectively. The highest inhibitory effect on COX-2 was obtained from the dichloromethane extract. Dexamethasone inhibited the secretion of TNF-α by 95.82 ± 0.50%, while celecoxib inhibited COX-2 and PGE-2 by 100% and 99.86%, respectively. The ethanol extract showed the best antioxidant activity in the DPPH and FRAP assays, with an IC50 of 5.43 ± 1.01 μg/mL and a FRAP value of 22.83 ± 0.13 mmol FeSO4/g sample, respectively, while the positive control, trolox, showed antioxidant activity with IC50 and FRAP values of 4.08 ± 0.85 μg/mL and 10.84 ± 0.04 mmol FeSO4/g sample, respectively. Taken together, H. isora possesses chemopreventive and antioxidant activity. Further studies on the in vivo activities of this plant are suggested.
Affiliation(s)
- Krit Thirapanmethee
- Department of Microbiology, Faculty of Pharmacy, Mahidol University, Bangkok, Thailand
- Yasushi Nakamura
- Division of Applied Life Sciences, Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Shimogamo-Hangi, Sakyo, Kyoto 606-8522, Japan; Horticultural Division, Kyoto Prefectural Agriculture, Forestry and Fisheries Technology Center, Amarube, Kameoka, Kyoto 621-0806, Japan
- Bhanubong Bongcheewin
- Department of Pharmaceutical Botany, Faculty of Pharmacy, Mahidol University, Bangkok, Thailand
31
The Naïve Associative Classifier (NAC): A novel, simple, transparent, and accurate classification model evaluated on financial data. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.03.085] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 11/22/2022]
32
A Two-Layer Computational Model for Discrimination of Enhancer and Their Types Using Hybrid Features Pace of Pseudo K-Tuple Nucleotide Composition. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2017. [DOI: 10.1007/s13369-017-2818-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
33
Ma L, Fan S. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests. BMC Bioinformatics 2017; 18:169. [PMID: 28292263 PMCID: PMC5351181 DOI: 10.1186/s12859-017-1578-z] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 08/25/2016] [Accepted: 03/03/2017] [Indexed: 01/04/2023]
Abstract
Background The random forests algorithm is a classifier with prominent universality, a wide application range, and robustness against overfitting. However, random forests still have some drawbacks. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. Results We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) effectively enhances SMOTE, compared with the classification results on the original data and those obtained using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm is proposed for feature selection and parameter optimization, which uses the minimum out-of-bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms (the hybrid genetic-random forests algorithm, the hybrid particle swarm-random forests algorithm and the hybrid fish swarm-random forests algorithm) can achieve the minimum OOB error and show the best generalization ability. Conclusion The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, this feasible and effective algorithm produces better classification results. Moreover, the F-value, G-mean, AUC and OOB scores of the hybrid algorithms demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
Affiliation(s)
- Li Ma
- School of Information Science and Technology, Jinan University, Guangzhou, 510632, China
- Suohai Fan
- School of Information Science and Technology, Jinan University, Guangzhou, 510632, China
34
Khozeimeh F, Alizadehsani R, Roshanzamir M, Khosravi A, Layegh P, Nahavandi S. An expert system for selecting wart treatment method. Comput Biol Med 2017; 81:167-175. [DOI: 10.1016/j.compbiomed.2017.01.001] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 08/10/2016] [Revised: 12/31/2016] [Accepted: 01/03/2017] [Indexed: 01/15/2023]
35
Tan MS, Tan JW, Chang SW, Yap HJ, Abdul Kareem S, Zain RB. A genetic programming approach to oral cancer prognosis. PeerJ 2016; 4:e2482. [PMID: 27688975 PMCID: PMC5036111 DOI: 10.7717/peerj.2482] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 06/11/2016] [Accepted: 08/24/2016] [Indexed: 11/20/2022]
Abstract
Background The potential of genetic programming (GP) has been demonstrated in various fields in recent years. In the biomedical field, much GP research focuses on the recognition of cancerous cells and on gene expression profiling data. This research aims to study the performance of GP in survival prediction on a small oral cancer prognosis dataset, which is the first such study in the field of oral cancer prognosis. Method GP is applied to an oral cancer dataset that contains 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subsets automatically selected by GP were noted, and their influence on the results of GP was recorded. In addition, the performance of GP is compared with that of the Support Vector Machine (SVM) and logistic regression (LR) to verify the predictive capabilities of GP. Result GP performed best (average accuracy of 83.87% and average AUROC of 0.8341) when the selected features were smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. In addition, based on the comparison results, we found that GP outperformed SVM and LR in oral cancer prognosis. Discussion Some of the features in the dataset were found to be statistically correlated, because the accuracy of the GP prediction drops when any one of the features in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, which chooses features that are highly correlated with the prognosis of oral cancer. This makes GP an ideal prediction model for cancer clinical and genomic data that can be used to aid physicians in the decision-making stage of diagnosis or prognosis.
Affiliation(s)
- Mei Sze Tan
- Bioinformatics Program, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Jing Wei Tan
- Bioinformatics Program, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Siow-Wee Chang
- Bioinformatics Program, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Hwa Jen Yap
- Department of Mechanical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Sameem Abdul Kareem
- Department of Artificial Intelligence, Faculty of Computer Science & Information Technology, University of Malaya, Kuala Lumpur, Malaysia
- Rosnah Binti Zain
- Oral Cancer Research & Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia
36
Li YH, Xu JY, Tao L, Li XF, Li S, Zeng X, Chen SY, Zhang P, Qin C, Zhang C, Chen Z, Zhu F, Chen YZ. SVM-Prot 2016: A Web-Server for Machine Learning Prediction of Protein Functional Families from Sequence Irrespective of Similarity. PLoS One 2016; 11:e0155290. [PMID: 27525735 PMCID: PMC4985167 DOI: 10.1371/journal.pone.0155290] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 03/21/2016] [Accepted: 04/27/2016] [Indexed: 12/20/2022]
Abstract
Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown, so improved functional prediction methods are needed. Our SVM-Prot web-server employs a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complements similarity-based and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins with different functions. Since its publication in 2003, we have made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for classifying multiple proteins. Moreover, 2 more machine learning approaches, k-nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
Affiliation(s)
- Ying Hong Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Jing Yu Xu
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Lin Tao
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Xiao Feng Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Shuang Li
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Xian Zeng
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Shang Ying Chen
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Peng Zhang
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Chu Qin
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Cheng Zhang
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
- Zhe Chen
- Zhejiang Key Laboratory of Gastro-intestinal Pathophysiology, Zhejiang Hospital of Traditional Chinese Medicine, Zhejiang Chinese Medical University, Hangzhou, P. R. China
- Feng Zhu
- Innovative Drug Research and Bioinformatics Group, Innovative Drug Research Centre and School of Pharmaceutical Sciences, Chongqing University, Chongqing, 401331, China
- Yu Zong Chen
- Bioinformatics and Drug Discovery group, Department of Pharmacy, National University of Singapore, Singapore, 117543, Singapore
37
Ali S, Majid A, Javed SG, Sattar M. Can-CSC-GBE: Developing Cost-sensitive Classifier with Gentleboost Ensemble for breast cancer classification using protein amino acids and imbalanced data. Comput Biol Med 2016; 73:38-46. [DOI: 10.1016/j.compbiomed.2016.04.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 10/22/2015] [Revised: 03/31/2016] [Accepted: 04/02/2016] [Indexed: 01/10/2023]
38
Kazemi M, Moghimbeigi A, Kiani J, Mahjub H, Faradmal J. Diabetic peripheral neuropathy class prediction by multicategory support vector machine model: a cross-sectional study. Epidemiol Health 2016; 38:e2016011. [PMID: 27032459 PMCID: PMC5063819 DOI: 10.4178/epih.e2016011] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 01/21/2016] [Accepted: 03/24/2016] [Indexed: 11/24/2022]
Abstract
OBJECTIVES Diabetes is increasing in prevalence worldwide, approaching epidemic levels. Diabetic neuropathy, one of the most common complications of diabetes mellitus, is a serious condition that can lead to amputation. This study used a multicategory support vector machine (MSVM) to predict diabetic peripheral neuropathy severity, classified into four categories, from patients’ demographic characteristics and clinical features. METHODS Data were collected at the Diabetes Center of Hamadan in Iran. Six hundred patients were enrolled by convenience sampling. After informed consent was obtained, a questionnaire collecting general information and a neuropathy disability score (NDS) questionnaire were administered. The NDS was used to classify disease severity. We used MSVM with both one-against-all and one-against-one strategies and three kernel functions (radial basis function (RBF), linear, and polynomial) to predict the disease class on an imbalanced dataset. The synthetic minority oversampling technique (SMOTE) was used to improve model performance. Mean accuracy was used to compare the models. RESULTS For predicting diabetic neuropathy, a classifier built from a balanced dataset with the RBF kernel and a one-against-one strategy predicted the class to which a patient belonged with about 76% accuracy. CONCLUSIONS In terms of overall classification accuracy, the MSVM model based on a balanced dataset can be useful for predicting the severity of diabetic neuropathy and should be investigated further for predicting other diseases.
Affiliation(s)
- Maryam Kazemi
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Abbas Moghimbeigi
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran; Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Javad Kiani
- Department of Endocrinology, College of Medical Sciences, Hamedan University of Medical Sciences, Hamedan, Iran; Department of Internal Medicine, College of Medical Sciences, Hamedan University of Medical Sciences, Hamedan, Iran
- Hossein Mahjub
- Research Center for Health Sciences and Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Javad Faradmal
- Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran; Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
39
Bashir S, Qamar U, Khan FH. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework. J Biomed Inform 2015; 59:185-200. [PMID: 26703093 DOI: 10.1016/j.jbi.2015.12.001] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 08/05/2015] [Revised: 11/01/2015] [Accepted: 12/06/2015] [Indexed: 11/30/2022]
Abstract
Accuracy plays a vital role in the medical field because it concerns the lives of individuals. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results: one classifier may outperform others on a specific dataset, while another may perform better on a different dataset. Ensembles of classifiers have been shown to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called "HM-BagMoov", overcomes the performance bottlenecks of conventional classifiers by using an ensemble of seven heterogeneous classifiers. The framework is evaluated on five heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The results show that the ensemble framework achieved the highest accuracy, sensitivity and F-measure compared with the individual classifiers for all the diseases. In addition, the ensemble framework achieved the highest accuracy compared with state-of-the-art techniques. An application named "IntelliHealth" has also been developed based on the proposed model, and it may be used by hospitals and doctors for diagnostic advice.
Affiliation(s)
- Saba Bashir
- Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan.
- Usman Qamar
- Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan.
- Farhan Hassan Khan
- Computer Engineering Department, College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan.
40
Makond B, Wang KJ, Wang KM. Probabilistic modeling of short survivability in patients with brain metastasis from lung cancer. Comput Methods Programs Biomed 2015; 119:142-162. [PMID: 25804445 DOI: 10.1016/j.cmpb.2015.02.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/03/2014] [Revised: 02/07/2015] [Accepted: 02/10/2015] [Indexed: 06/04/2023]
Abstract
Predicting substantially short survivability in patients is extremely risky. In this study, we proposed a probabilistic model using a Bayesian network (BN) to predict the short survivability of patients with brain metastasis from lung cancer. A nationwide cancer patient database from 1996 to 2010 in Taiwan was used. The cohort consisted of 438 patients with brain metastasis from lung cancer. We used the synthetic minority over-sampling technique (SMOTE) to address the class imbalance inherent in the problem. The proposed BN was compared with three competitive models: naive Bayes (NB), logistic regression (LR), and support vector machine (SVM). Statistical analysis showed that the performances of BN, LR, NB, and SVM were statistically indistinguishable on all indices, with low sensitivity, when the models were applied to an imbalanced data set. Results also showed that SMOTE can improve the sensitivity of all four models while keeping accuracy and specificity high. Further, the proposed BN is more effective than NB, LR, and SVM in several respects. It is transparent and shows the relations among factors affecting brain metastasis from lung cancer. It allows decision makers to estimate probabilities despite incomplete evidence and information. Its sensitivity is also the highest among all the standard machine learning methods compared.
Affiliation(s)
- Bunjira Makond
- Faculty of Commerce and Management, Prince of Songkla University, Trang, Thailand.
- Kung-Jeng Wang
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC.
- Kung-Min Wang
- Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan, ROC.
41
Ali S, Majid A. Can–Evo–Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences. J Biomed Inform 2015; 54:256-69. [DOI: 10.1016/j.jbi.2015.01.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 08/14/2014] [Revised: 12/09/2014] [Accepted: 01/12/2015] [Indexed: 01/10/2023]
42
Wang KJ, Adrian AM, Chen KH, Wang KM. A hybrid classifier combining Borderline-SMOTE with AIRS algorithm for estimating brain metastasis from lung cancer: a case study in Taiwan. Comput Methods Programs Biomed 2015; 119:63-76. [PMID: 25823851 DOI: 10.1016/j.cmpb.2015.03.003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/21/2014] [Revised: 03/03/2015] [Accepted: 03/06/2015] [Indexed: 06/04/2023]
Abstract
Classifying imbalanced data in medical informatics is challenging. Motivated by this issue, this study develops a classifier approach denoted BSMAIRS. The approach combines the borderline synthetic minority oversampling technique (BSM) and the artificial immune recognition system (AIRS), which acts as a global optimization searcher, with the nearest neighbor algorithm used as a local classifier. Eight electronic medical datasets from the University of California, Irvine (UCI) Machine Learning Repository were used to evaluate the effectiveness and performance of the proposed BSMAIRS. Comparisons with several well-known classifiers were conducted based on accuracy, sensitivity, specificity, and G-mean. The statistical results indicate that BSMAIRS is an efficient method for handling imbalanced class problems. To further confirm its performance, BSMAIRS was applied to real imbalanced medical data on lung cancer metastasis to the brain, collected from the National Health Insurance Research Database, Taiwan. This application can serve as a supplementary tool for doctors in the early diagnosis of brain metastasis from lung cancer.
Affiliation(s)
- Kung-Jeng Wang
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC.
- Angelia Melani Adrian
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC; Department of Informatics Engineering, De La Salle University, Manado 95231, Indonesia.
- Kun-Huang Chen
- Department of Industrial Management, National Taiwan University of Science and Technology, Taipei 106, Taiwan, ROC.
- Kung-Min Wang
- Department of Surgery, Shin-Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan, ROC.
43
HBC-Evo: predicting human breast cancer by exploiting amino acid sequence-based feature spaces and evolutionary ensemble system. Amino Acids 2014; 47:217-21. [DOI: 10.1007/s00726-014-1871-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/15/2014] [Accepted: 11/04/2014] [Indexed: 10/24/2022]
44
Hayat M, Iqbal N. Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. Comput Methods Programs Biomed 2014; 116:184-192. [PMID: 24997484 DOI: 10.1016/j.cmpb.2014.06.007] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 02/13/2014] [Revised: 06/09/2014] [Accepted: 06/13/2014] [Indexed: 06/03/2023]
Abstract
Proteins control all biological functions in living species. Protein structures fall into four major classes: all-α, all-β, α+β, and α/β. Each class performs different functions according to its nature. Owing to the explosive growth of protein sequences in databanks, identifying protein structure classes with conventional methods is difficult in terms of cost and time. Given the importance of protein structure classes, it is highly desirable to develop a computational model that discriminates protein structure classes with high accuracy. For this purpose, we propose an in silico method that incorporates Pseudo Average Chemical Shift and Support Vector Machine. Two feature extraction schemes, Pseudo Amino Acid Composition and Pseudo Average Chemical Shift, are used to extract valuable information from protein sequences. The performance of the proposed model is assessed on four benchmark datasets (25PDB, 1189, 640 and 399) using the jackknife test. The success rates of the proposed model are 84.2%, 85.0%, 86.4%, and 89.2%, respectively, on the four datasets. The empirical results indicate that the proposed model performs promisingly compared with existing models in the literature and might be useful for future research.
Affiliation(s)
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
- Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
45