Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ning Q, Zhao X, Ma Z. A Novel Method for Identification of Glutarylation Sites Combining Borderline-SMOTE With Tomek Links Technique in Imbalanced Data. IEEE/ACM Trans Comput Biol Bioinform 2022;19:2632-2641. [PMID: 34236968 DOI: 10.1109/tcbb.2021.3095482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]

Cited by Other Article(s)

Jiang J, Zhang C, Ke L, Hayes N, Zhu Y, Qiu H, Zhang B, Zhou T, Wei GW. A review of machine learning methods for imbalanced data challenges in chemistry. Chem Sci 2025:d5sc00270b. [PMID: 40271022 PMCID: PMC12013631 DOI: 10.1039/d5sc00270b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Accepted: 04/06/2025] [Indexed: 04/25/2025] Open

Li L, Zhang X. Addressing data imbalance in collision risk prediction with active generative oversampling. Sci Rep 2025;15:9133. [PMID: 40097620 PMCID: PMC11914271 DOI: 10.1038/s41598-025-93851-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2024] [Accepted: 03/10/2025] [Indexed: 03/19/2025] Open

Ahmed F, Sharma A, Shatabda S, Dehzangi I. DeepPhoPred: Accurate Deep Learning Model to Predict Microbial Phosphorylation. Proteins 2025;93:465-481. [PMID: 39239684 DOI: 10.1002/prot.26734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 06/27/2024] [Accepted: 07/15/2024] [Indexed: 09/07/2024]

Saez JA, Vera JF. Compact Class-Conditional Attribute Category Clustering: Amino Acid Grouping for Enhanced HIV-1 Protease Cleavage Classification. IEEE/ACM Trans Comput Biol Bioinform 2024;21:2167-2178. [PMID: 39178086 DOI: 10.1109/tcbb.2024.3448617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]

Ishfaq M, Shah SZA, Ahmad I, Rahman Z. Multinomial classification of NLRP3 inhibitory compounds based on large scale machine learning approaches. Mol Divers 2024;28:1849-1868. [PMID: 37418166 DOI: 10.1007/s11030-023-10690-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 07/03/2023] [Indexed: 07/08/2023]

Gu ZF, Hao YD, Wang TY, Cai PL, Zhang Y, Deng KJ, Lin H, Lv H. Prediction of blood-brain barrier penetrating peptides based on data augmentation with Augur. BMC Biol 2024;22:86. [PMID: 38637801 PMCID: PMC11027412 DOI: 10.1186/s12915-024-01883-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/05/2024] [Indexed: 04/20/2024] Open

Abstract

BACKGROUND

The blood-brain barrier serves as a critical interface between the bloodstream and brain tissue, mainly composed of pericytes, neurons, endothelial cells, and tightly connected basal membranes. It plays a pivotal role in safeguarding the brain from harmful substances, thus protecting the integrity of the nervous system and preserving overall brain homeostasis. However, this remarkable selective transmission also poses a formidable challenge in the treatment of central nervous system diseases, hindering the delivery of large-molecule drugs into the brain. In response to this challenge, many researchers have devoted themselves to developing drug delivery systems capable of breaching the blood-brain barrier. Among these, blood-brain barrier penetrating peptides have emerged as promising candidates. These peptides have the advantages of high biosafety, ease of synthesis, and exceptional penetration efficiency, making them an effective drug delivery solution. While previous studies have developed a few prediction models for blood-brain barrier penetrating peptides, their performance has often been hampered by the limited amount of positive data.

RESULTS

In this study, we present Augur, a novel prediction model using borderline-SMOTE-based data augmentation and machine learning. We extract highly interpretable physicochemical properties of blood-brain barrier penetrating peptides while solving the issues of small sample size and imbalance between positive and negative samples. Experimental results demonstrate the superior prediction performance of Augur, with an AUC value of 0.932 on the training set and 0.931 on the independent test set.

CONCLUSIONS

This newly developed Augur model demonstrates superior performance in predicting blood-brain barrier penetrating peptides, offering valuable insights for drug development targeting neurological disorders. This breakthrough may enhance the efficiency of peptide-based drug discovery and pave the way for innovative treatment strategies for central nervous system diseases.
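The abstract above names borderline-SMOTE as its augmentation step without giving details. As a rough illustration of the general technique only (not the Augur implementation), the sketch below flags minority points whose neighbourhood is at least half, but not entirely, majority-class, and interpolates new points between each flagged point and its nearest minority neighbours. The function names, the parameters `m`, `k`, and `n_new`, and the toy data are all assumptions made for this example.

```python
import math
import random


def nearest(i, X, pool, k):
    """Indices of the k points in `pool` closest to X[i], excluding i itself."""
    return sorted((j for j in pool if j != i),
                  key=lambda j: math.dist(X[i], X[j]))[:k]


def borderline_smote(X, y, minority=1, m=3, k=2, n_new=1, seed=0):
    """Return (synthetic_points, danger_indices) for the minority class.

    Hypothetical helper: m is the neighbourhood size used to find borderline
    points, k the number of minority neighbours used for interpolation.
    """
    rng = random.Random(seed)
    everyone = range(len(X))
    minority_idx = [i for i in everyone if y[i] == minority]

    # Step 1: a minority point is in "danger" when at least half, but not
    # all, of its m nearest neighbours belong to another class.
    danger = []
    for i in minority_idx:
        neigh = nearest(i, X, everyone, m)
        n_major = sum(y[j] != minority for j in neigh)
        if len(neigh) / 2 <= n_major < len(neigh):
            danger.append(i)

    # Step 2: interpolate between each danger point and a randomly chosen
    # one of its k nearest minority neighbours.
    synthetic = []
    for i in danger:
        mates = nearest(i, X, minority_idx, k)
        if not mates:
            continue
        for _ in range(n_new):
            j = rng.choice(mates)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic, danger


# Toy data (made up): five minority points, two of them next to the majority.
X = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [3.0, 0.0], [3.2, 0.0],
     [3.5, 0.0], [4.0, 0.0], [4.5, 0.0], [5.0, 0.0]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0]
synthetic, danger = borderline_smote(X, y, n_new=2)
print("danger indices:", danger)
print("synthetic points:", synthetic)
```

Only the two minority points near the class boundary generate new samples; the three points deep inside the minority region are left alone, which is the main difference from plain SMOTE.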

Affiliation(s)

Zhi-Feng Gu: The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
Yu-Duo Hao: The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
Tian-Yu Wang: The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
Pei-Ling Cai: School of Basic Medical Sciences, Chengdu University, Chengdu, 610106, PR China
Yang Zhang: Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, PR China
Ke-Jun Deng: The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
Hao Lin: The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China
Hao Lv: The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731, PR China

Arif M, Fang G, Fida H, Musleh S, Yu DJ, Alam T. iMRSAPred: Improved Prediction of Anti-MRSA Peptides Using Physicochemical and Pairwise Contact-Energy Properties of Amino Acids. ACS Omega 2024;9:2874-2883. [PMID: 38250405 PMCID: PMC10795061 DOI: 10.1021/acsomega.3c08303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 12/06/2023] [Accepted: 12/13/2023] [Indexed: 01/23/2024]

Liu X, Zhu B, Dai XW, Xu ZA, Li R, Qian Y, Lu YP, Zhang W, Liu Y, Zheng J. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier. BMC Genomics 2023;24:765. [PMID: 38082413 PMCID: PMC10712101 DOI: 10.1186/s12864-023-09834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open

A machine learning and explainable artificial intelligence triage-prediction system for COVID-19. Decision Analytics Journal 2023;7:100246. [PMCID: PMC10163946 DOI: 10.1016/j.dajour.2023.100246] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/05/2023] [Revised: 04/21/2023] [Accepted: 05/02/2023] [Indexed: 06/02/2024]

Abstract

COVID-19, a respiratory disease caused by the SARS-CoV-2 contagion, has severely disrupted healthcare infrastructure. Various countries have developed COVID-19 vaccines that have, to a certain extent, effectively prevented the severe symptoms caused by the virus. However, a small fraction of patients still die. Advances in artificial intelligence have revolutionized healthcare diagnosis and prognosis. In this study, we predict the severity of COVID-19 using heterogeneous machine learning and deep learning algorithms that consider clinical markers, vital signs, and other critical factors. This study extensively reviews various classifier architectures to predict COVID-19 severity. We built and evaluated multiple pipelines combining five state-of-the-art data-balancing techniques (Synthetic Minority Oversampling Technique (SMOTE), Adaptive Synthetic, Borderline SMOTE, SMOTE with Tomek links, and SMOTE with Edited Nearest Neighbor (ENN)) with twelve heterogeneous classifiers: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, XGBoost, ExtraTrees, AdaBoost, LightGBM, CatBoost, and a 1-D Convolutional Neural Network. The best-performing pipeline consists of a Random Forest trained on Borderline SMOTE balanced data, which produced the highest recall of 83%. We deployed Explainable Artificial Intelligence tools such as Shapley Additive Explanations, Local Interpretable Model-agnostic Explanations, ELI5, QLattice, Anchor, and Feature Importance to demystify complex tree-based ensemble models. These tools provide valuable insights into the significance of critical features in the severity prediction of a COVID-19 patient. Changes in respiratory rate, blood pressure, lactate, and calcium values were observed to be the primary contributors to increased severity in a COVID-19 patient. This architecture aims to be an explainable decision-support triaging system for medical professionals in countries lacking advanced medical technology and infrastructure, with the goal of reducing fatalities.
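Among the balancing techniques listed above, "SMOTE with Tomek links" adds a cleaning step after oversampling. The sketch below illustrates only that cleaning step, under the usual definition: a Tomek link is a pair of points from different classes that are each other's nearest neighbour. It is a minimal plain-Python illustration with made-up data, not the study's pipeline; the helper names are hypothetical, and dropping only the majority member of each link is one common choice (some variants drop both points).

```python
import math


def nearest_neighbor(i, X):
    """Index of the point closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Return pairs (i, j), i < j, that are mutual nearest neighbours
    with different class labels."""
    links = []
    for i in range(len(X)):
        j = nearest_neighbor(i, X)
        if i < j and y[i] != y[j] and nearest_neighbor(j, X) == i:
            links.append((i, j))
    return links


def drop_majority_in_links(X, y, majority=0):
    """Remove the majority-class member of every Tomek link."""
    doomed = {i for pair in tomek_links(X, y) for i in pair if y[i] == majority}
    keep = [i for i in range(len(X)) if i not in doomed]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy one-feature data (made up): label 1 is the minority class.
X = [[0.0], [1.0], [1.4], [3.0], [3.5]]
y = [1, 1, 0, 0, 0]
links = tomek_links(X, y)
X_clean, y_clean = drop_majority_in_links(X, y)
print("Tomek links:", links)
print("after cleaning:", X_clean, y_clean)
```

Here the minority point at 1.0 and the majority point at 1.4 form the only link, so cleaning removes the majority point that sits closest to the class boundary.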

Class-biased sarcasm detection using BiLSTM variational autoencoder-based synthetic oversampling. Soft Comput 2023. [DOI: 10.1007/s00500-023-07956-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]

Tarimo CS, Bhuyan SS, Li Q, Ren W, Mahande MJ, Wu J. Combining Resampling Strategies and Ensemble Machine Learning Methods to Enhance Prediction of Neonates with a Low Apgar Score After Induction of Labor in Northern Tanzania. Risk Manag Healthc Policy 2021;14:3711-3720. [PMID: 34522147 PMCID: PMC8434924 DOI: 10.2147/rmhp.s331077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Accepted: 08/26/2021] [Indexed: 11/23/2022] Open

Abstract

Objective

The goal of this study was to establish the most efficient boosting method in predicting neonatal low Apgar scores following labor induction intervention and to assess whether resampling strategies would improve the predictive performance of the selected boosting algorithms.

Methods

A total of 7716 singleton births delivered from 2000 to 2015 were analyzed. Cesarean deliveries following labor induction, deliveries with abnormal presentation, and deliveries with missing Apgar score or delivery mode information were excluded. We examined the effect of resampling approaches (data preprocessing) on predicting low Apgar scores, specifically the synthetic minority oversampling technique (SMOTE), borderline-SMOTE, and the random undersampling (RUS) technique. Sensitivity, specificity, precision, area under the receiver operating characteristic curve (AUROC), F-score, positive predictive value (PPV), negative predictive value (NPV), and accuracy were used to evaluate the discriminative ability of the three boosting-based ensemble methods. The ensemble learning models tested were adaptive boosting (AdaBoost), gradient boosting (GB), and extreme gradient boosting (XGBoost).

Results

The prevalence of low (<7) Apgar scores was 9.5% (n = 733). The prediction models performed similarly in their baseline mode. Following the application of resampling techniques, borderline-SMOTE significantly improved the predictive performance of all the boosting-based ensemble methods under observation in terms of sensitivity, F1-score, AUROC and PPV.

Conclusion

Policymakers, healthcare informaticians and neonatologists should consider implementing data preprocessing strategies when predicting a neonatal outcome with imbalanced data to enhance efficiency. The process may be more effective when the borderline-SMOTE technique is deployed on the selected ensemble classifiers. However, future research may focus on testing additional resampling techniques, performing feature engineering and variable selection, and further optimizing the ensemble learning hyperparameters.
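To make the evaluation measures named in this abstract concrete, the sketch below computes them from a confusion matrix. It then applies them to a hypothetical classifier that predicts "not low" for every birth, using the abstract's counts (7716 births, 733 with a low Apgar score). That all-negative baseline is an assumption added for illustration and is not a result from the study; the function name and dictionary keys are likewise made up here.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),        # recall, true-positive rate
        "specificity": ratio(tn, tn + fp),        # true-negative rate
        "ppv": ratio(tp, tp + fp),                # precision is the same quantity
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical baseline: predict "not low" for all 7716 births, of which
# 733 actually had a low Apgar score (counts taken from the abstract).
total, positives = 7716, 733
baseline = confusion_metrics(tp=0, fp=0, tn=total - positives, fn=positives)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

With a prevalence of 9.5%, this baseline reaches about 90.5% accuracy while detecting no low-Apgar cases at all (sensitivity 0), which is why the abstract reports sensitivity, F1-score, AUROC, and PPV rather than accuracy alone.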
