1
Naseem A, Khan YD. An intelligent model for prediction of abiotic stress-responsive microRNAs in plants using statistical moments based features and ensemble approaches. Methods 2024:S1046-2023(24)00124-5. [PMID: 38768931 DOI: 10.1016/j.ymeth.2024.05.008] [Received: 04/04/2024] [Revised: 04/30/2024] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
This study proposes an intelligent model for predicting abiotic stress-responsive microRNAs in plants. MicroRNAs (miRNAs) are short RNA molecules that regulate stress-related genes. Experimental methods are costly and time-consuming compared with in silico prediction. To address this gap, the study seeks to develop an efficient computational model for plant stress response prediction. Two benchmark datasets, one of miRNAs and one of pre-miRNAs, were acquired. Four ensemble approaches (bagging, boosting, stacking, and blending) were employed, with Random Forest (RF), Extra Trees (ET), AdaBoost (ADB), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM) as classifiers. Stacking and blending used all of these classifiers as base learners and Logistic Regression (LR) as the meta-classifier. Four types of testing were used: independent set, self-consistency, cross-validation with 5 and 10 folds, and jackknife. The evaluation metrics were accuracy, specificity, sensitivity, Matthews correlation coefficient (MCC), and AUC. In independent set testing, the proposed methodology outperformed the existing state-of-the-art study on both datasets. The SVM-based approach achieved an accuracy of 0.659 on the miRNA dataset, which is better than the previous study. The ET classifier achieved an accuracy of 0.67 on the pre-miRNA dataset, surpassing the existing benchmark study. The proposed method can be used in future research to predict abiotic stress response in plants.
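The threshold-based metrics named in this abstract (accuracy, sensitivity, specificity, and MCC) can all be derived from a binary confusion matrix. The following is a minimal sketch rather than the authors' code: the function name `binary_metrics` and the toy labels are hypothetical, and the positive class is assumed to be encoded as 1.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy example with hypothetical labels (not data from the study).
metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when classes are unevenly sized.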
Affiliation(s)
- Ansar Naseem
  - Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
2
Chen T, Kabir MF. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data. PLoS One 2024; 19:e0302947. [PMID: 38728288 PMCID: PMC11086842 DOI: 10.1371/journal.pone.0302947] [Received: 10/15/2023] [Accepted: 04/15/2024] [Indexed: 05/12/2024]
Abstract
In recent years, researchers have demonstrated the effectiveness and speed of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially ones that use complex high-dimensional data such as RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and use it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, xgboost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Since our datasets are imbalanced, we evaluated the performance of all models using metrics designed for imbalanced data, such as geometric mean, Matthews correlation coefficient, F-measure, and area under the receiver operating characteristic curve. Our approach showed comparable performance while relying on fewer features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of the data binarilization technique in improving the performance and explainability of RNA sequencing-based cancer prediction models.
Affiliation(s)
- Tianjie Chen
  - Department of Computer Science, Pennsylvania State University Harrisburg, Middletown, Pennsylvania, United States of America
- Md Faisal Kabir
  - Department of Computer Science, Pennsylvania State University Harrisburg, Middletown, Pennsylvania, United States of America
3
Akbar S, Zou Q, Raza A, Alarfaj FK. iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks. Artif Intell Med 2024; 151:102860. [PMID: 38552379 DOI: 10.1016/j.artmed.2024.102860] [Received: 10/27/2023] [Revised: 02/21/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
Globally, fungal infections have become a major health concern in humans. Fungal diseases generally occur when an invading fungus appears on a specific portion of the body and becomes hard for the human immune system to resist. The recent emergence of COVID-19 has sharply increased various nosocomial fungal infections. The existing wet-laboratory-based medications are expensive, time-consuming, and may have adverse side effects on normal cells. In the last decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without affecting healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction scheme called iAFPs-Mv-BiTCN to predict antifungal peptides correctly. The training peptides are encoded using word embedding methods such as skip-gram and attention mechanism-based bidirectional encoder representation using transformer. Additionally, transform-based evolutionary features are generated using the pseudo position-specific scoring matrix with discrete wavelet transform (PsePSSM-DWT). The fused vector of word embedding and evolutionary descriptors is formed to compensate for the limitations of single encoding methods. A Shapley Additive exPlanations (SHAP) based global interpolation approach is applied to reduce training costs by choosing the optimal feature set. The selected feature set is trained using a bi-directional temporal convolutional network (BiTCN). The proposed iAFPs-Mv-BiTCN model achieved a predictive accuracy of 98.15% and an AUC of 0.99 using training samples. In the case of the independent samples, our model obtained an accuracy of 94.11% and an AUC of 0.98. Our iAFPs-Mv-BiTCN model outperformed existing models with ~4% and ~5% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed iAFPs-Mv-BiTCN model make it a valuable tool for scientists, and it may play a beneficial role in pharmaceutical design and academic research.
Affiliation(s)
- Shahid Akbar
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Ali Raza
  - Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, KP 25124, Pakistan
- Fawaz Khaled Alarfaj
  - Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa 31982, Saudi Arabia
4
Lv Y, Feng G, Yang L, Wu X, Wang C, Ye A, Wang S, Xu C, Shi H. Differential whole-genome doubling based signatures for improvement on clinical outcomes and drug response in patients with breast cancer. Heliyon 2024; 10:e28586. [PMID: 38576569 PMCID: PMC10990872 DOI: 10.1016/j.heliyon.2024.e28586] [Received: 08/12/2023] [Revised: 03/20/2024] [Accepted: 03/20/2024] [Indexed: 04/06/2024]
Abstract
Whole-genome doubling (WGD), a hallmark of human cancer, is pervasive in breast cancer patients. However, the molecular mechanism underlying the full impact of WGD on survival and treatment response in breast cancer remains unclear. To address this, we performed a comprehensive and systematic analysis of WGD, aiming to identify distinct genetic alterations linked to WGD and to highlight its value for improving clinical outcomes and treatment response in breast cancer. A linear regression model along with weighted gene co-expression network analysis (WGCNA) was applied to The Cancer Genome Atlas (TCGA) dataset to identify critical genes related to WGD. Cox regression models with random selection were then used to optimize the most useful prognostic markers in the TCGA dataset. The clinical implication of the risk model was further assessed through prognostic impact evaluation, tumor stratification, functional analysis, genomic feature difference analysis, drug response analysis, and validation on multiple independent datasets. Our findings revealed a high aneuploidy burden, chromosomal instability (CIN), copy number variation (CNV), and mutation burden in breast tumors exhibiting WGD events. Moreover, 247 key genes associated with WGD were identified from the distinct genomic patterns in the TCGA dataset. A risk model consisting of 22 genes was optimized from the key genes. High-risk breast cancer patients were more prone to WGD and exhibited greater genomic diversity compared to low-risk patients. Some oncogenic signaling pathways were enriched in the high-risk group, while primary immune deficiency pathways were enriched in the low-risk group. We also identified a risk gene, ANLN (anillin), which displayed a strong positive correlation with two crucial WGD genes, KIF18A and CCNE2. Tumors with high expression of ANLN were more prone to WGD events and displayed worse clinical survival outcomes. Furthermore, the expression levels of these risk genes were significantly associated with the sensitivities of BRCA cell lines to multiple drugs, providing valuable insights for targeted therapies. These findings will help to further improve clinical outcomes and contribute to drug development in breast cancer.
Affiliation(s)
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Xiaoliang Wu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Chengyi Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Aokun Ye
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Shuyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Chaohan Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
- Hongbo Shi
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, 150081, China
5
Liang X, Zhao H, Wang J. MA-PEP: A novel anticancer peptide prediction framework with multimodal feature fusion based on attention mechanism. Protein Sci 2024; 33:e4966. [PMID: 38532681 DOI: 10.1002/pro.4966] [Received: 09/30/2023] [Revised: 01/30/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024]
Abstract
AntiCancer Peptides (ACPs) have emerged as promising therapeutic agents for cancer treatment. The time-consuming and costly nature of wet-lab discriminatory methods has spurred the development of various machine learning and deep learning-based ACP classification methods. Nonetheless, current methods encountered challenges in efficiently integrating features from various peptide modalities, thereby limiting a more comprehensive understanding of ACPs and further restricting the improvement of prediction model performance. In this study, we introduce a novel ACP prediction method, MA-PEP, which leverages multiple attention mechanisms for feature enhancement and fusion to improve ACP prediction. By integrating the enhanced molecular-level chemical features and sequence information of peptides, MA-PEP demonstrates superior prediction performance across several benchmark datasets, highlighting its efficacy in ACP prediction. Moreover, the visual analysis and case studies further demonstrate MA-PEP's reliable feature extraction capability and its promise in the realm of ACP exploration. The code and datasets for MA-PEP are available at https://github.com/liangxiaodata/MA-PEP.
Affiliation(s)
- Xiao Liang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Haochen Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
6
Balaji PD, Selvam S, Sohn H, Madhavan T. MLASM: Machine learning based prediction of anticancer small molecules. Mol Divers 2024:10.1007/s11030-024-10823-x. [PMID: 38554168 DOI: 10.1007/s11030-024-10823-x] [Received: 01/03/2024] [Accepted: 02/10/2024] [Indexed: 04/01/2024]
Abstract
Cancer is the second leading cause of death globally, so the development of effective anticancer treatments is crucial in the field of medicine. Anticancer peptides (ACPs) have shown promising therapeutic potential in cancer treatment compared to traditional methods. However, the process of identifying ACPs through experimental means is often time-intensive and expensive. To overcome this issue, we employed a machine learning-based approach for the first time to develop an anticancer model using small molecules. Anticancer small molecules (ACSMs) are compounds that have been developed to target and inhibit cancer cells. In this study, we used 10,000 compounds to develop machine learning models using five algorithms: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), K-nearest neighbors (KNN), Decision Tree (DT), and Extreme Gradient Boosting (XGB). The developed models were evaluated using the test set, and the top three models were identified (RF, LightGBM, and XGB). Furthermore, to validate the predictive performance of our models, we performed external validation using FDA-approved anticancer compounds/drugs. Following this analysis, we found that our LightGBM model correctly predicted 9 compounds as active. However, RF and XGB exhibited some limitations, predicting 8 and 7 compounds as active out of 10, respectively. These results demonstrate that, when compared to RF and XGB, the LightGBM model showcases robust prediction capabilities, achieving a superior accuracy of 79% with an AUC of 0.88. These findings provide promising insights into the potential of our approach for predicting anticancer small molecules, highlighting the role of machine learning in advancing cancer treatment research.
Affiliation(s)
- Priya Dharshini Balaji
  - Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
- Subathra Selvam
  - Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
- Honglae Sohn
  - Department of Chemistry, Department of Carbon Materials, Chosun University, Gwangju, South Korea
- Thirumurthy Madhavan
  - Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
7
Lee B, Shin D. Contrastive learning for enhancing feature extraction in anticancer peptides. Brief Bioinform 2024; 25:bbae220. [PMID: 38725157 PMCID: PMC11082072 DOI: 10.1093/bib/bbae220] [Received: 01/08/2024] [Revised: 03/28/2024] [Accepted: 04/21/2024] [Indexed: 05/13/2024]
Abstract
Cancer, recognized as a primary cause of death worldwide, has profound health implications and incurs a substantial social burden. Numerous efforts have been made to develop cancer treatments, among which anticancer peptides (ACPs) are garnering recognition for their potential applications. While ACP screening is time-consuming and costly, in silico prediction tools provide a way to overcome these challenges. Herein, we present a deep learning model designed to screen ACPs using peptide sequences only. A contrastive learning technique was applied to enhance model performance, yielding better results than a model trained solely on binary classification loss. Furthermore, two independent encoders were employed as a replacement for data augmentation, a technique commonly used in contrastive learning. Our model achieved superior performance on five of six benchmark datasets against previous state-of-the-art models. As prediction tools advance, the potential in peptide-based cancer therapeutics increases, promising a brighter future for oncology research and patient care.
Affiliation(s)
- Byungjo Lee
  - Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
- Dongkwan Shin
  - Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
  - Department of Cancer Biomedical Science, National Cancer Center Graduate School of Cancer Science and Policy, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
8
Li X, Wu M, Wu M, Liu J, Song L, Wang J, Zhou J, Li S, Yang H, Zhang J, Cui X, Liu Z, Zeng F. A radiomics and genomics-derived model for predicting metastasis and prognosis in colorectal cancer. Carcinogenesis 2024; 45:170-180. [PMID: 38195111 DOI: 10.1093/carcin/bgad098] [Received: 07/27/2023] [Revised: 12/08/2023] [Accepted: 01/08/2024] [Indexed: 01/11/2024]
Abstract
Approximately 50% of colorectal cancer (CRC) patients develop metastasis with poor prognosis; therefore, it is necessary to predict metastasis effectively in clinical treatment. In this study, we aimed to establish a machine-learning model for predicting metastasis in CRC patients by considering radiomics and transcriptomics simultaneously. Here, 1023 patients with CRC from three centers were collected and divided into five cohorts (Dazhou Central Hospital n = 517, Nanchong Central Hospital n = 120, and The Cancer Genome Atlas (TCGA) n = 386). A total of 854 radiomics features were extracted from tumor lesions on CT images, and 217 differentially expressed genes were obtained from non-metastasis and metastasis tumor tissues using RNA sequencing. Based on radiotranscriptomic (RT) analysis, a novel RT model was developed and verified through genetic algorithms (GA). Interleukin (IL)-26, a biomarker in the RT model, was verified for its biological function in CRC metastasis. Furthermore, 15 radiomics variables that were highly correlated with the IL26 expression level were screened through stepwise regression. Finally, a radiomics model (RA) was established by combining GA and stepwise regression analysis with radiomics features. The RA model exhibited favorable discriminatory ability and accuracy for metastasis prediction in two independent verification cohorts. We designed multicenter, multi-scale cohorts to construct and verify novel combined radiomics and genomics models for predicting metastasis in CRC. Overall, the RT model and the RA model might help clinicians in directing personalized diagnosis and therapeutic regimen selection for patients with CRC.
Affiliation(s)
- Xue Li
  - Department of Clinical Research Center, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Meng Wu
  - Department of Ultrasound, Zhongnan Hospital of Wuhan University, Wuhan, Hubei 430071, China
- Min Wu
  - Department of Radiology, Huaxi MR Research Center (HMRRC), West China Hospital of Sichuan University, Chengdu 610041, China
- Jie Liu
  - Department of General Surgery, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Li Song
  - Department of Clinical laboratory, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Jiasi Wang
  - Department of Clinical laboratory, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Jun Zhou
  - Department of Clinical Research Center, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Shilin Li
  - Department of Clinical Research Center, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Hang Yang
  - Department of Clinical Research Center, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Jun Zhang
  - Department of General Surgery, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
- Xinwu Cui
  - Department of Medical Ultrasound, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, 1095 Jiefang Road, Wuhan 430030, China
- Zhenyu Liu
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Beijing 100190, China
  - University of Chinese Academy of Sciences, Beijing 100080, China
- Fanxin Zeng
  - Department of Clinical Research Center, Dazhou Central Hospital, Dazhou, Sichuan 635000, China
9
Wu H, Liu X, Fang Y, Yang Y, Huang Y, Pan X, Shen HB. Decoding protein binding landscape on circular RNAs with base-resolution transformer models. Comput Biol Med 2024; 171:108175. [PMID: 38402841 DOI: 10.1016/j.compbiomed.2024.108175] [Received: 12/02/2023] [Revised: 01/16/2024] [Accepted: 02/18/2024] [Indexed: 02/27/2024]
Abstract
Circular RNAs (circRNAs), a class of endogenous RNA with a covalent loop structure, can regulate gene expression by serving as sponges for microRNAs and RNA-binding proteins (RBPs). To date, most computational methods for predicting RBP binding sites on circRNAs focus on circRNA fragments instead of full circRNAs. These methods detect whether a circRNA fragment contains binding sites, but cannot determine where the binding sites are or how many binding sites are on the circRNA transcript. We report a hybrid deep learning-based tool, CircSite, to predict RBP binding sites at single-nucleotide resolution and detect key contributing nucleotides on circRNA transcripts. CircSite takes advantage of convolutional neural networks (CNNs) and Transformer for learning local and global representations of circRNAs binding to RBPs, respectively. We construct 37 datasets of circRNAs interacting with proteins for benchmarking, and the experimental results show that CircSite offers accurate predictions of RBP binding nucleotides and detects key subsequences aligning well with known binding motifs. CircSite is an easy-to-use online webserver for predicting RBP binding sites on circRNA transcripts and is freely available at http://www.csbio.sjtu.edu.cn/bioinf/CircSite/.
Affiliation(s)
- Hehe Wu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Xiaojian Liu
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yi Fang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Yang Yang
  - Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yan Huang
  - State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics Chinese Academy of Sciences, 500 Yutian Road, Shanghai, 200083, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
10
Karakaya O, Kilimci ZH. An efficient consolidation of word embedding and deep learning techniques for classifying anticancer peptides: FastText+BiLSTM. PeerJ Comput Sci 2024; 10:e1831. [PMID: 38435607 PMCID: PMC10909209 DOI: 10.7717/peerj-cs.1831] [Received: 09/19/2023] [Accepted: 12/31/2023] [Indexed: 03/05/2024]
Abstract
Anticancer peptides (ACPs) are a group of peptides that exhibit antineoplastic properties. The use of ACPs in cancer prevention can present a viable substitute for conventional cancer therapeutics, as they possess a higher degree of selectivity and safety. Recent scientific advancements have generated interest in peptide-based therapies, which offer the advantage of efficiently treating intended cells without negatively impacting normal cells. However, as the number of peptide sequences continues to increase rapidly, developing a reliable and precise prediction model becomes a challenging task. In this work, our motivation is to advance an efficient model for categorizing anticancer peptides by combining word embedding and deep learning models. First, the Word2Vec, GloVe, FastText, and One-Hot-Encoding approaches are evaluated as embedding techniques for representing peptide sequences. Then, the outputs of the embedding models are fed into the deep learning approaches CNN, LSTM, and BiLSTM. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on datasets widely used in the literature, ACPs250 and Independent. Experimental results show that the proposed model enhances classification accuracy when compared to state-of-the-art studies. The proposed combination, FastText+BiLSTM, achieves 92.50% accuracy on the ACPs250 dataset and 96.15% accuracy on the Independent dataset, thereby setting a new state of the art.
Affiliation(s)
- Onur Karakaya
  - Research and Development Inc., Turkcell Technology, İstanbul, Turkey
- Zeynep Hilal Kilimci
  - Department of Information Systems Engineering, Kocaeli University, Kocaeli, Turkey
11
Suleman MT, Alturise F, Alkhalifah T, Khan YD. m1A-Ensem: accurate identification of 1-methyladenosine sites through ensemble models. BioData Min 2024; 17:4. [PMID: 38360720 PMCID: PMC10868122 DOI: 10.1186/s13040-023-00353-x] [Received: 06/30/2023] [Accepted: 12/31/2023] [Indexed: 02/17/2024]
Abstract
BACKGROUND: 1-methyladenosine (m1A) is a variant of methyladenosine that holds a methyl substituent in the 1st position and has a prominent role in RNA stability and human metabolites. OBJECTIVE: Traditional approaches, such as mass spectrometry and site-directed mutagenesis, have proved to be time-consuming and complicated. METHODOLOGY: The present research focused on the identification of m1A sites within RNA sequences using novel feature development mechanisms. The obtained features were used to train ensemble models, including blending, boosting, and bagging. Independent testing and k-fold cross-validation were then performed on the trained ensemble models. RESULTS: The proposed model outperformed the preexisting predictors and revealed optimized scores based on major accuracy metrics. CONCLUSION: For research purposes, a user-friendly webserver of the proposed model can be accessed through https://taseersuleman-m1a-ensem1.streamlit.app/ .
Affiliation(s)
- Muhammad Taseer Suleman
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
12
Nopour R. Screening ovarian cancer by using risk factors: machine learning assists. Biomed Eng Online 2024; 23:18. [PMID: 38347611 PMCID: PMC10863117 DOI: 10.1186/s12938-024-01219-x] [Received: 09/12/2023] [Accepted: 02/06/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND AND AIM: Ovarian cancer (OC) is a prevalent and aggressive malignancy that poses a significant public health challenge. The lack of preventive strategies for OC increases morbidity, mortality, and other negative consequences. Screening for OC through risk prediction could be a powerful preventive strategy, but it has not received much attention. This study therefore aimed to use machine learning approaches as predictive assistance solutions to screen high-risk groups for OC and achieve practical preventive purposes. MATERIALS AND METHODS: As this study is data-driven and retrospective in nature, we used data on 1516 women with suspected OC from one centralized database belonging to six clinical settings in Sari City from 2015 to 2019. Six machine learning (ML) algorithms, including XG-Boost, Random Forest (RF), J-48, support vector machine (SVM), K-nearest neighbor (KNN), and artificial neural network (ANN), were used to construct prediction models for OC. To choose the best model for predicting OC, we compared the prediction models using the area under the receiver operating characteristic curve (AU-ROC). RESULTS: Current experimental results revealed that XG-Boost, with AU-ROC = 0.93 (95% CI = [0.91-0.95]), was the best-performing model for predicting OC. CONCLUSIONS: ML approaches possess significant predictive efficiency and interoperability to achieve powerful preventive strategies by screening high-risk groups for OC.
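The AU-ROC used here for model comparison equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; it is an illustration only, the function name `roc_auc` and the toy scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 class labels (1 = positive).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example with hypothetical scores (not data from the study).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # prints 0.75
```

Because the value depends only on how cases are ranked, it does not change with the classification threshold, which makes it convenient for comparing models that output scores on different scales.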
Affiliation(s)
- Raoof Nopour
- Department of Health Information Management, Student Research Committee, School of Health Management and Information Sciences Branch, Iran University of Medical Sciences, Tehran, Iran.
|
13
|
Pal J, Ghosh S, Maji B, Bhattacharya DK. Use of 2D FFT and DTW in Protein Sequence Comparison. Protein J 2024; 43:1-11. [PMID: 37848727 DOI: 10.1007/s10930-023-10160-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Protein sequence comparison remains challenging for researchers because of the computational complexity introduced by 20 amino acids, compared with only four nucleotides in genome sequences. Further, protein sequences of different species have different lengths, which poses additional challenges for developing methods, especially alignment-free methods, to compare them. In this work, an efficient technique to compare protein sequences is developed using a graphical representation. First, the 20 amino acids are classified into four groups based on polar class, narrowing the representational range from 20 to 4. Then a unit vector technique based on a two-quadrant Cartesian system is proposed to provide a new two-dimensional graphical representation of the protein sequence. Two approaches are proposed to cope with the varying lengths of protein sequences from various species: one uses Dynamic Time Warping (DTW), while the other uses a two-dimensional Fast Fourier Transform (2D FFT). Next, the effectiveness of these two techniques is analyzed using two evaluation criteria: quantitative measures based on symmetric distance (SD), and computational speed. An analysis is performed on five data sets of 9 ND4, 9 ND5, 9 ND6, 12 Baculovirus, and 24 TF proteins under the two methods. The FFT-based method produces the same results as DTW but in less computational time. The results of the proposed method agree with the known biological reference. Further, the present method produces better clustering than the existing ones.
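To make the DTW step above concrete, here is a minimal sketch of classic dynamic time warping between two point sequences of different lengths. It assumes each sequence is already a list of 2D points (for example, the output of some graphical representation) and uses Euclidean distance as the local cost; the authors' actual encoding, cost function, and symmetric distance are not reproduced here, and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping cost between point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy curves: the second repeats one point, so warping aligns them exactly.
short = [(0, 0), (1, 1)]
longer = [(0, 0), (0, 0), (1, 1)]
print(dtw_distance(short, longer))
```

Because the warping path may repeat points, sequences of unequal length can be compared without padding or truncation, which is the property the abstract relies on.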
Affiliation(s)
- Jayanta Pal
- Department of ECE, National Institute of Technology, Durgapur, India.
- Department of CSE, Narula Institute of Technology, Kolkata, India.
- Soumen Ghosh
- Department of ECE, National Institute of Technology, Durgapur, India
- Bansibadan Maji
- Department of ECE, National Institute of Technology, Durgapur, India
|
14
|
Hassan E, Abd El-Hafeez T, Shams MY. Optimizing classification of diseases through language model analysis of symptoms. Sci Rep 2024; 14:1507. [PMID: 38233458 PMCID: PMC10794698 DOI: 10.1038/s41598-024-51615-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 01/07/2024] [Indexed: 01/19/2024] Open
Abstract
This paper investigated the use of language models and deep learning techniques for automating disease prediction from symptoms. Specifically, we explored the use of two Medical Concept Normalization-Bidirectional Encoder Representations from Transformers (MCN-BERT) models and a Bidirectional Long Short-Term Memory (BiLSTM) model, each optimized with a different hyperparameter optimization method, to predict diseases from symptom descriptions. We utilized two distinct datasets, called Dataset-1 and Dataset-2. Dataset-1 consists of 1,200 data points, with each point representing a unique combination of disease labels and symptom descriptions, while Dataset-2 is designed to identify Adverse Drug Reactions (ADRs) from Twitter data and comprises 23,516 rows categorized as ADR (1) or Non-ADR (0) tweets. The results indicate that the MCN-BERT model optimized with AdamP achieved 99.58% accuracy for Dataset-1 and 96.15% accuracy for Dataset-2. The MCN-BERT model optimized with AdamW performed well with 98.33% accuracy for Dataset-1 and 95.15% for Dataset-2, while the BiLSTM model optimized with Hyperopt achieved 97.08% accuracy for Dataset-1 and 94.15% for Dataset-2. Our findings suggest that language models and deep learning techniques have promise for supporting earlier detection and more prompt treatment of diseases, as well as expanding remote diagnostic capabilities. The MCN-BERT and BiLSTM models demonstrated robust performance in accurately predicting diseases from symptoms, indicating the potential for further related research.
Affiliation(s)
- Esraa Hassan
- Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
- Tarek Abd El-Hafeez
- Department of Computer Science, Faculty of Science, Minia University, Minia, 61519, Egypt.
- Computer Science Unit, Deraya University, Minia University, Minia, 61765, Egypt.
- Mahmoud Y Shams
- Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
|
15
|
Aruwa CE, Sabiu S. Adipose tissue inflammation linked to obesity: A review of current understanding, therapies and relevance of phyto-therapeutics. Heliyon 2024; 10:e23114. [PMID: 38163110 PMCID: PMC10755291 DOI: 10.1016/j.heliyon.2023.e23114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/25/2023] [Accepted: 11/27/2023] [Indexed: 01/03/2024] Open
Abstract
Obesity is a current global challenge affecting all ages and is characterized by the up-regulated secretion of bioactive factors/pathways which result in adipose tissue inflammation (ATI). Current obesity therapies are mainly focused on lifestyle (diet/nutrition) changes. This is because many chemosynthetic anti-obesogenic medications cause adverse effects like diarrhoea, dyspepsia, and faecal incontinence, among others. As such, it is necessary to appraise the efficacies and mechanisms of action of safer, natural alternatives like plant-sourced compounds, extracts [extractable phenol (EP) and macromolecular antioxidant (MA) extracts], and anti-inflammatory peptides, among others, with a view to providing a unique approach to obesity care. These natural alternatives may constitute potent therapies for ATI linked to obesity. The potential of MA compounds (analysed for the first time in this review) and extracts in ATI and obesity management is elucidated upon, while also highlighting research gaps and future prospects. Furthermore, immune cells, signalling pathways, genes, and adipocyte cytokines play key roles in ATI responses and are targeted in certain therapies. As a result, this review gives an in-depth appraisal of ATI linked to obesity, its causes, mechanisms, and effects of past, present, and future therapies for reversal and alleviation of ATI. Achieving a significant decrease in morbidity and mortality rates attributed to ATI linked to obesity and related comorbidities is possible as research improves our understanding over time.
Affiliation(s)
- Christiana Eleojo Aruwa
- Department of Biotechnology and Food Science, Durban University of Technology, PO Box 1334, Durban, 4000, South Africa
- Saheed Sabiu
- Department of Biotechnology and Food Science, Durban University of Technology, PO Box 1334, Durban, 4000, South Africa
|
16
|
Yaqoob A, Verma NK, Aziz RM. Optimizing Gene Selection and Cancer Classification with Hybrid Sine Cosine and Cuckoo Search Algorithm. J Med Syst 2024; 48:10. [PMID: 38193948 DOI: 10.1007/s10916-023-02031-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/28/2023] [Indexed: 01/10/2024]
Abstract
Gene expression datasets offer a wide range of information about various biological processes. However, it is difficult to find the important genes in high-dimensional biological data because of redundant and unimportant ones. Numerous Feature Selection (FS) techniques have been created to overcome this obstacle. Improving the efficacy and precision of FS methodologies is crucial for identifying significant genes in complex biological data. In this work, we present a novel approach to gene selection called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with the well-known Support Vector Machine (SVM) classifier. Using a dataset on breast cancer, the hybrid gene selection algorithm's performance is carefully assessed and compared to other feature selection methods. To improve the quality of the feature set, we use minimum Redundancy Maximum Relevance (mRMR) as a filtering strategy in the first step. The hybrid SCACSA method is then used to enhance and optimize the gene selection procedure. Lastly, we classify the dataset according to the chosen genes by using the SVM classifier. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as an invaluable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.
Affiliation(s)
- Abrar Yaqoob
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India.
- Navneet Kumar Verma
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
- Rabia Musheer Aziz
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
|
17
|
Zhang L, Xiao K, Kong L. A computational method for small molecule-RNA binding sites identification by utilizing position specificity and complex network information. Biosystems 2024; 235:105094. [PMID: 38056591 DOI: 10.1016/j.biosystems.2023.105094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Revised: 11/23/2023] [Accepted: 11/24/2023] [Indexed: 12/08/2023]
Abstract
Several computational methods have been proposed for identifying small molecule-RNA binding sites, because such identification plays a significant role in research on biological function. However, it is still challenging to design an accurate model, especially in terms of MCC. We designed a feature extraction technique based on two aspects: position specificity and complex network information. Specifically, a complex network was employed to express the spatial topological structure and sequence position information to improve prediction performance. The features fusing position specificity and complex network information were then input into a random forest classifier for model construction. AUCs of 88.22%, 77.92% and 81.46% were obtained on three independent datasets (RB19, CS71, RB78). The best MCC values were obtained on the three datasets, which were 8.19%, 0.59% and 4.35% higher than those of the state-of-the-art prediction methods, respectively. This performance shows that our method is a powerful tool for identifying RNA binding sites, helping in the design of RNA-targeting small molecule drugs. The data and source code are available at https://github.com/Kangxiaoneuq/PCN_RNAsite.
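Since the abstract above singles out MCC as the hard metric, a small sketch of the standard Matthews correlation coefficient for a binary confusion matrix may help. The function name `mcc` and the example counts are illustrative assumptions; returning 0.0 when the denominator is zero is a common convention, not something stated in the source.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: few binding sites among many non-binding residues.
# Accuracy is (20 + 900) / 1000 = 0.92, yet MCC is noticeably lower.
print(round(mcc(tp=20, tn=900, fp=50, fn=30), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred for imbalanced problems such as binding-site prediction.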
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, 066000, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, 066000, PR China.
- Kang Xiao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, 066000, PR China.
- Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, 066000, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, 066000, PR China.
|
18
|
La Paglia L, Vazzana M, Mauro M, Urso A, Arizza V, Vizzini A. Bioactive Molecules from the Innate Immunity of Ascidians and Innovative Methods of Drug Discovery: A Computational Approach Based on Artificial Intelligence. Mar Drugs 2023; 22:6. [PMID: 38276644 PMCID: PMC10817596 DOI: 10.3390/md22010006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/12/2023] [Accepted: 12/17/2023] [Indexed: 01/27/2024] Open
Abstract
The study of bioactive molecules of marine origin has created an important bridge between biological knowledge and its applications in biotechnology and biomedicine. Current studies in different research fields, such as biomedicine, aim to discover marine molecules characterized by biological activities that can be used to produce potential drugs for human use. In recent decades, increasing attention has been paid to a particular group of marine invertebrates, the Ascidians, as they are a source of bioactive products. We describe omics data and computational methods relevant to identifying the mechanisms and processes of innate immunity underlying the biosynthesis of bioactive molecules, focusing on innovative computational approaches based on Artificial Intelligence. Since there is increasing attention on finding new solutions for a sustainable supply of bioactive compounds, we propose that a possible improvement in the biodiscovery pipeline might also come from the study and utilization of marine invertebrates' innate immunity.
Affiliation(s)
- Laura La Paglia
- Istituto di Calcolo e Reti ad Alte Prestazioni–Consiglio Nazionale delle Ricerche, Via Ugo La Malfa 153, 90146 Palermo, Italy; (L.L.P.); (A.U.)
- Mirella Vazzana
- Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
- Manuela Mauro
- Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
- Alfonso Urso
- Istituto di Calcolo e Reti ad Alte Prestazioni–Consiglio Nazionale delle Ricerche, Via Ugo La Malfa 153, 90146 Palermo, Italy; (L.L.P.); (A.U.)
- Vincenzo Arizza
- Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
- Aiti Vizzini
- Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
|
19
|
Sutradhar A, Al Rafi M, Shamrat FMJM, Ghosh P, Das S, Islam MA, Ahmed K, Zhou X, Azad AKM, Alyami SA, Moni MA. BOO-ST and CBCEC: two novel hybrid machine learning methods aim to reduce the mortality of heart failure patients. Sci Rep 2023; 13:22874. [PMID: 38129433 PMCID: PMC10739972 DOI: 10.1038/s41598-023-48486-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Heart failure (HF) is a leading cause of mortality worldwide. Machine learning (ML) approaches have shown potential as an early detection tool for improving patient outcomes. Enhancing the effectiveness and clinical applicability of the ML model necessitates training an efficient classifier with a diverse set of high-quality datasets. Hence, we proposed two novel hybrid ML methods, (a) one consisting of Boosting, SMOTE, and Tomek links (BOO-ST), and (b) one combining the best-performing conventional classifier with ensemble classifiers (CBCEC), to serve as an efficient early warning system for HF mortality. BOO-ST was introduced to tackle the challenge of class imbalance, while CBCEC was responsible for training on the processed and selected features derived from the Feature Importance (FI) and Information Gain (IG) feature selection techniques. We also conducted an explicit and intuitive analysis to explore the impact of characteristics potentially correlated with fatal HF cases. The experimental results demonstrated that the proposed CBCEC classifier achieves an accuracy of 93.67% in early forecasting of HF mortality. Therefore, our proposed methods (BOO-ST and CBCEC) can play a crucial role in reducing the death rate of HF and reducing stress in the healthcare sector.
Affiliation(s)
- Ananda Sutradhar
- Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- Mustahsin Al Rafi
- Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- F M Javed Mehedi Shamrat
- Department of Computer System and Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia
- Pronab Ghosh
- Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Subrata Das
- Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Md Anaytul Islam
- Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Kawsar Ahmed
- Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
- Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh
- Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Xujuan Zhou
- School of Business, University of Southern Queensland, Toowoomba, Australia
- A K M Azad
- Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
- Salem A Alyami
- Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
- Mohammad Ali Moni
- Centre for AI & Digital Health Technology, Artificial Intelligence & Cyber Future Institute, Charles Sturt University, Bathurst, NSW, 2795, Australia.
|
20
|
Raza A, Uddin J, Almuhaimeed A, Akbar S, Zou Q, Ahmad A. AIPs-SnTCN: Predicting Anti-Inflammatory Peptides Using fastText and Transformer Encoder-Based Hybrid Word Embedding with Self-Normalized Temporal Convolutional Networks. J Chem Inf Model 2023; 63:6537-6554. [PMID: 37905969 DOI: 10.1021/acs.jcim.3c01563] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 11/02/2023]
Abstract
Inflammation is a biologically resistant response to harmful stimuli, such as infection, damaged cells, toxic chemicals, or tissue injuries. Its purpose is to eradicate pathogenic micro-organisms or irritants and facilitate tissue repair. Prolonged inflammation can result in chronic inflammatory diseases. However, wet-laboratory-based treatments are costly and time-consuming and may have adverse side effects on normal cells. In the past decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without affecting healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction model called AIPs-SnTCN to predict anti-inflammatory peptides accurately. The peptide samples are encoded using word embedding techniques such as skip-gram and attention-based bidirectional encoder representation using a transformer (BERT). The conjoint triad feature (CTF) also collects structure-based cluster profile features. The fused vector of word embedding and sequential features is formed to compensate for the limitations of single encoding methods. Support vector machine-based recursive feature elimination (SVM-RFE) is applied to choose the ranking-based optimal space. The optimized feature space is trained by using an improved self-normalized temporal convolutional network (SnTCN). The AIPs-SnTCN model achieved a predictive accuracy of 95.86% and an AUC of 0.97 by using training samples. In the case of the alternate training data set, our model obtained an accuracy of 92.04% and an AUC of 0.96. The proposed AIPs-SnTCN model outperformed existing models with an ∼19% higher accuracy and an ∼14% higher AUC value. The reliability and efficacy of our AIPs-SnTCN model make it a valuable tool for scientists and may play a beneficial role in pharmaceutical design and research academia.
Affiliation(s)
- Ali Raza
- Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, Khyber Pakhtunkhwa 25124, Pakistan
- Department of Computer Science, MY University, Islamabad 45750, Pakistan
- Jamal Uddin
- Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, Khyber Pakhtunkhwa 25124, Pakistan
- Abdullah Almuhaimeed
- Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Khyber Pakhtunkhwa 23200, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Ashfaq Ahmad
- Department of Computer Science, MY University, Islamabad 45750, Pakistan
|
21
|
Sun M, Hu H, Pang W, Zhou Y. ACP-BC: A Model for Accurate Identification of Anticancer Peptides Based on Fusion Features of Bidirectional Long Short-Term Memory and Chemically Derived Information. Int J Mol Sci 2023; 24:15447. [PMID: 37895128 PMCID: PMC10607064 DOI: 10.3390/ijms242015447] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/12/2023] [Revised: 09/10/2023] [Accepted: 10/20/2023] [Indexed: 10/29/2023] Open
Abstract
Anticancer peptides (ACPs) have been proven to possess potent anticancer activities. Although computational methods have emerged for rapid ACP identification, their accuracy still needs improvement. In this study, we propose a model called ACP-BC, a three-channel end-to-end model that utilizes various combinations of data augmentation techniques. In the first channel, features are extracted from the raw sequence using a bidirectional long short-term memory network. In the second channel, the entire sequence is converted into a chemical molecular formula, which is further simplified using Simplified Molecular Input Line Entry System notation to obtain deep abstract features through a bidirectional encoder representation transformer (BERT). In the third channel, we manually selected four effective features: dipeptide composition, binary profile feature, k-mer sparse matrix, and pseudo amino acid composition. Notably, the application of chemical BERT in predicting ACPs is novel and successfully integrated into our model. To validate the performance of our model, we selected two benchmark datasets, ACPs740 and ACPs240. ACP-BC achieved prediction accuracies of 87% and 90% on these two datasets, respectively, representing improvements of 1.3% and 7% compared to existing state-of-the-art methods on these datasets. Systematic comparative experiments have therefore shown that ACP-BC can effectively identify anticancer peptides.
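Of the four hand-selected features named above, dipeptide composition is the simplest to illustrate. The sketch below computes a 400-dimensional vector of relative dipeptide frequencies for a peptide. The alphabet ordering, the decision to skip dipeptides containing non-standard residues, and the name `dipeptide_composition` are assumptions made for illustration, not details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, alphabetical by one-letter code
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def dipeptide_composition(sequence):
    """Return 400 relative frequencies of overlapping residue pairs in `sequence`."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    seq = sequence.upper()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumed choice: ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


# Hypothetical peptide "AAAC" contains the pairs AA, AA, AC.
vector = dipeptide_composition("AAAC")
print(len(vector), vector[DIPEPTIDES.index("AA")], vector[DIPEPTIDES.index("AC")])
```

The resulting fixed-length vector is independent of peptide length, which is what lets it be concatenated with learned features from the other channels.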
Affiliation(s)
- Mingwei Sun
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (M.S.); (H.H.)
- Haoyuan Hu
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (M.S.); (H.H.)
- Wei Pang
- School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK;
- You Zhou
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (M.S.); (H.H.)
- College of Software, Jilin University, Changchun 130012, China
|
22
|
Kumar A, Rana PS. A deep learning based ensemble approach for protein allergen classification. PeerJ Comput Sci 2023; 9:e1622. [PMID: 37869456 PMCID: PMC10588724 DOI: 10.7717/peerj-cs.1622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 09/07/2023] [Indexed: 10/24/2023]
Abstract
In recent years, the increased population has led to an increase in the demand for various industrially processed edibles and other consumable products. These industries regularly alter the proteins found in raw materials to generate more commercially viable end-products in order to keep up with consumer demand. These modifications result in a substance that may cause allergic reactions in consumers, thereby creating a protein allergen. The detection of such proteins in various substances is essential for the prevention, diagnosis and treatment of allergic conditions. Bioinformatics and computational methods can be used to analyze the information contained in amino-acid sequences to detect possible allergens. The article presents a deep learning based ensemble approach to identify protein allergens using Extra Tree, Deep Belief Network (DBN), and CatBoost models. The proposed ensemble model achieves higher detection accuracy by combining the prediction results of the three models using majority voting. The evaluation of the proposed model was carried out on the benchmark protein allergen dataset, and the performance analysis revealed that the proposed model outperforms the other state-of-the-art literature techniques with a protein allergen detection accuracy of 89.16%.
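The majority-voting step described above can be sketched in a few lines. The example assumes each of the three models has already produced one hard label per sample; the function name `majority_vote` and the toy predictions are hypothetical and do not come from the paper.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model label lists into one list by taking the most common label per sample."""
    if not model_predictions:
        raise ValueError("at least one prediction list is required")
    n = len(model_predictions[0])
    if any(len(p) != n for p in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*model_predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical hard predictions (1 = allergen, 0 = non-allergen) from three models.
extra_trees = [1, 0, 1, 0]
dbn = [1, 1, 0, 0]
catboost = [0, 0, 1, 0]
print(majority_vote(extra_trees, dbn, catboost))
```

With an odd number of binary classifiers there are no ties, which is one practical reason to combine exactly three models this way.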
Affiliation(s)
- Arun Kumar
- Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
- Prashant Singh Rana
- Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab, India
|
23
|
Singh V, Singh SK. A separable temporal convolutional networks based deep learning technique for discovering antiviral medicines. Sci Rep 2023; 13:13722. [PMID: 37608092 PMCID: PMC10444765 DOI: 10.1038/s41598-023-40922-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Accepted: 08/18/2023] [Indexed: 08/24/2023] Open
Abstract
An alarming number of fatalities caused by the COVID-19 pandemic has forced the scientific community to accelerate the process of therapeutic drug discovery. In this regard, the collaboration between biomedical scientists and experts in artificial intelligence (AI) has led to a number of in silico tools being developed for the initial screening of therapeutic molecules. All living organisms produce antiviral peptides (AVPs) as a part of their first line of defense against invading viruses. The Deep-AVPiden model proposed in this paper and its corresponding web app, deployed at https://deep-avpiden.anvil.app, are an effort toward discovering novel AVPs in proteomes of living organisms. Apart from Deep-AVPiden, a computationally efficient model called Deep-AVPiden (DS) has also been developed using the same underlying network but with point-wise separable convolutions. The Deep-AVPiden and Deep-AVPiden (DS) models show an accuracy of 90% and 88%, respectively, and both have a precision of 90%. The proposed models were also statistically compared using the Student's t-test. When compared with the state-of-the-art classifiers, the proposed models were found to perform much better. To test the proposed model, we identified some AVPs in the natural defense proteins of plants, mammals, and fishes and found them to have appreciable sequence similarity with some experimentally validated antimicrobial peptides. These AVPs can be chemically synthesized and tested for their antiviral activity.
Affiliation(s)
- Vishakha Singh
- Department of Computer Science and Engineering, Indian Institute of Technology (BHU) Varanasi, Varanasi, Uttar Pradesh, 221005, India.
- Sanjay Kumar Singh
- Department of Computer Science and Engineering, Indian Institute of Technology (BHU) Varanasi, Varanasi, Uttar Pradesh, 221005, India.
|
24
|
Ali F, Kumar H, Alghamdi W, Kateb FA, Alarfaj FK. Recent Advances in Machine Learning-Based Models for Prediction of Antiviral Peptides. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING: STATE OF THE ART REVIEWS 2023; 30:1-12. [PMID: 37359746 PMCID: PMC10148704 DOI: 10.1007/s11831-023-09933-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
Viruses have killed and infected millions of people across the world. They cause several chronic diseases like COVID-19, HIV, and hepatitis. To cope with such diseases and virus infections, antiviral peptides (AVPs) have been applied in the design of drugs. Given their significant role in the pharmaceutical industry and other research fields, identification of AVPs is indispensable. To this end, experimental and computational methods have been proposed to identify AVPs. However, more accurate predictors for AVP identification are highly desirable. This work presents a thorough study and reports the available predictors of AVPs. We explain the datasets used, feature representation approaches, classification algorithms, and performance evaluation parameters. The limitations of the existing studies and the best methods are emphasized, and the pros and cons of the applied classifiers are provided. The future insights outline efficient feature encoding approaches, the best feature optimization schemes, and effective classification techniques that can improve the performance of novel methods for accurate prediction of AVPs.
Affiliation(s)
- Farman Ali: Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Khyber Pakhtunkhwa, Pakistan
- Harish Kumar: Department of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia
- Wajdi Alghamdi: Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
- Faris A. Kateb: Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
- Fawaz Khaled Alarfaj: Department of Management Information Systems, King Faisal University, Hufof, Saudi Arabia
25
Domain Contrast Network for cross-muscle ALS disease identification with EMG signal. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
26
Deng H, Ding M, Wang Y, Li W, Liu G, Tang Y. ACP-MLC: A two-level prediction engine for identification of anticancer peptides and multi-label classification of their functional types. Comput Biol Med 2023; 158:106844. [PMID: 37058760 DOI: 10.1016/j.compbiomed.2023.106844] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/07/2023] [Revised: 03/09/2023] [Accepted: 03/30/2023] [Indexed: 04/07/2023]
Abstract
Anticancer peptides (ACPs), a series of short bioactive peptides, are promising candidates in fighting cancer due to their high activity, low toxicity, and low likelihood of causing drug resistance. The accurate identification of ACPs and classification of their functional types is of great importance for investigating their mechanisms of action and developing peptide-based anticancer therapies. Here, we provide a computational tool, called ACP-MLC, to address binary classification and multi-label classification of ACPs for a given peptide sequence. Briefly, ACP-MLC is a two-level prediction engine, in which the 1st-level model predicts whether a query sequence is an ACP or not using the random forest algorithm, and the 2nd-level model predicts which tissue types the sequence might target using the binary relevance algorithm. Developed and evaluated on high-quality datasets, ACP-MLC yielded an area under the receiver operating characteristic curve (AUC) of 0.888 on the independent test set for the 1st-level prediction, and obtained a Hamming loss of 0.157, a subset accuracy of 0.577, a macro F1-score of 0.802, and a micro F1-score of 0.826 on the independent test set for the 2nd-level prediction. A systematic comparison demonstrated that ACP-MLC outperformed existing binary classifiers and other multi-label learning classifiers for ACP prediction. Finally, we interpreted the important features of ACP-MLC by the SHAP method. User-friendly software and the datasets are available at https://github.com/Nicole-DH/ACP-MLC. We believe that ACP-MLC would be a powerful tool in ACP discovery.
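The two multi-label metrics quoted for the 2nd-level prediction (Hamming loss and subset accuracy) can be illustrated with a minimal sketch. The label matrices below are hypothetical toy data, not values from ACP-MLC, and the function names are chosen here for illustration only.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


def subset_accuracy(y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of samples whose entire label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Hypothetical example: 3 peptides, 4 candidate tissue types each.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]]

print(hamming_loss(y_true, y_pred))     # one wrong slot out of twelve
print(subset_accuracy(y_true, y_pred))  # two of three rows match exactly
```

Hamming loss counts every label slot separately, so it is forgiving of partially correct predictions, whereas subset accuracy only credits a sample when all of its labels are right. That is why a low Hamming loss can coexist with a moderate subset accuracy.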
27
Zhang Z, Zhu H, Wang X, Lin S, Ruan C, Wang Q. A novel basement membrane-related gene signature for prognosis of lung adenocarcinomas. Comput Biol Med 2023; 154:106597. [PMID: 36708655 DOI: 10.1016/j.compbiomed.2023.106597] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/16/2022] [Revised: 12/01/2022] [Accepted: 01/22/2023] [Indexed: 01/25/2023]
Abstract
BACKGROUND Lung adenocarcinoma (LUAD) remains a global health concern with its poor prognosis and high mortality. Whether tumor cells invade through the basement membrane (BM) is the key factor to determine the prognosis of LUAD. This study aimed to identify the BM-related gene signatures to improve the overall prognosis of LUAD. MATERIALS & METHODS A series of bioinformatics analyses were conducted based on TCGA and GEO datasets. Unsupervised consistent cluster analysis was performed, and 500 LUAD patients were assigned to two different groups according to expressions of 222 BM-related genes. The differentially expressed genes (DEGs) between the two clusters were identified, and Lasso regression, ROC curve, univariate and multivariate Cox regression analyses and enrichment analysis were conducted. In addition, the ssGSEA, CIBERSORT and ESTIMATE algorithms were employed to understand the relationship between the tumor microenvironment (TME) and risk scores. Moreover, single cell clustering and trajectory analyses were performed to further understand the significance of BM-related genes. Finally, qRT-PCR was used to verify the prognosis model. RESULTS A total of 31 prognostic BM-related genes were determined for LUAD, and a novel 17-mRNA prognostic model named BMscore was successfully established to predict the overall survival of LUAD patients. The high BMscore group indicated worse prognosis. Seventeen DEGs were enriched mainly in metabolism, ECM-receptor interaction and immune response. In addition, the high-risk group showed higher TMB and lower immune score. The low-risk group had a better immunotherapeutic response where immune escape was less likely. The BMscore model was verified in our patient cohort. Furthermore, NELL2 was mainly expressed in clusters of T cells, and was identified to play a critical role in T-cell differentiation. CONCLUSIONS A novel BMscore model was successfully established and might be effective for providing guidance to LUAD therapy.
Affiliation(s)
- Zhenxing Zhang: Department of Thoracic and Maxillofacial Surgery (B7X), Taizhou Central Hospital (Taizhou University Hospital), Taizhou, Zhejiang Province, China
- Haoran Zhu: Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi Province, China
- Xiaojun Wang: Department of Thoracic Surgery, Taizhou Central Hospital (Taizhou University Hospital), Taizhou, Zhejiang Province, China
- Shanan Lin: Department of Thoracic Surgery, Taizhou Central Hospital (Taizhou University Hospital), Taizhou, Zhejiang Province, China
- Chenjin Ruan: Department of Thoracic Surgery, Taizhou Central Hospital (Taizhou University Hospital), Taizhou, Zhejiang Province, China
- Qiang Wang: Department of Thoracic Surgery, Taizhou Central Hospital (Taizhou University Hospital), Taizhou, Zhejiang Province, China
28
Wang C, Zou Q. Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE. BMC Biol 2023; 21:12. [PMID: 36694239 PMCID: PMC9875434 DOI: 10.1186/s12915-023-01510-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/08/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Protein solubility is a precondition for efficient heterologous protein expression at the basis of most industrial applications and for functional interpretation in basic research. However, recurrent formation of inclusion bodies is still an inevitable roadblock in protein science and industry, where only nearly a quarter of proteins can be successfully expressed in soluble form. Despite numerous solubility prediction models having been developed over time, their performance remains unsatisfactory in the context of the current strong increase in available protein sequences. Hence, it is imperative to develop novel and highly accurate predictors that enable the prioritization of highly soluble proteins to reduce the cost of actual experimental work. RESULTS In this study, we developed a novel tool, DeepSoluE, which predicts protein solubility using a long-short-term memory (LSTM) network with hybrid features composed of physicochemical patterns and distributed representation of amino acids. Comparison results showed that the proposed model achieved more accurate and balanced performance than existing tools. Furthermore, we explored specific features that have a dominant impact on the model performance as well as their interaction effects. CONCLUSIONS DeepSoluE is suitable for the prediction of protein solubility in E. coli; it serves as a bioinformatics tool for prescreening of potentially soluble targets to reduce the cost of wet-experimental studies. The publicly available webserver is freely accessible at http://lab.malab.cn/~wangchao/softs/DeepSoluE/ .
Affiliation(s)
- Chao Wang: School of Software Engineering, Chengdu University of Information Technology, Chengdu, China (GRID: grid.411307.0; ISNI: 0000 0004 1790 5236)
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China (GRID: grid.54549.39; ISNI: 0000 0004 0369 4060)
29
Ghaly G, Tallima H, Dabbish E, Badr ElDin N, Abd El-Rahman MK, Ibrahim MAA, Shoeib T. Anti-Cancer Peptides: Status and Future Prospects. Molecules 2023; 28:molecules28031148. [PMID: 36770815 PMCID: PMC9920184 DOI: 10.3390/molecules28031148] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/27/2022] [Revised: 12/26/2022] [Accepted: 01/19/2023] [Indexed: 01/26/2023] Open
Abstract
The dramatic rise in cancer incidence, alongside treatment deficiencies, has elevated cancer to the second-leading cause of death globally. The increasing morbidity and mortality of this disease can be traced back to a number of causes, including treatment-related side effects, drug resistance, inadequate curative treatment and tumor relapse. Recently, anti-cancer bioactive peptides (ACPs) have emerged as a potential therapeutic choice within the pharmaceutical arsenal due to their high penetration, specificity and fewer side effects. In this contribution, we present a general overview of the literature concerning the conformational structures, modes of action and membrane interaction mechanisms of ACPs, as well as provide recent examples of their successful employment as targeting ligands in cancer treatment. The use of ACPs as a diagnostic tool is summarized, and their advantages in these applications are highlighted. This review expounds on the main approaches for peptide synthesis along with their reconstruction and modification needed to enhance their therapeutic effect. Computational approaches that could predict therapeutic efficacy and suggest ACP candidates for experimental studies are discussed. Future research prospects in this rapidly expanding area are also offered.
Affiliation(s)
- Gehane Ghaly: Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Hatem Tallima: Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Eslam Dabbish: Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
- Norhan Badr ElDin: Analytical Chemistry Department, Faculty of Pharmacy, Cairo University, Kasr-El Aini Street, Cairo 11562, Egypt
- Mohamed K. Abd El-Rahman: Analytical Chemistry Department, Faculty of Pharmacy, Cairo University, Kasr-El Aini Street, Cairo 11562, Egypt; Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA
- Mahmoud A. A. Ibrahim: Computational Chemistry Laboratory, Chemistry Department, Faculty of Science, Minia University, Minia 61519, Egypt; School of Health Sciences, University of Kwa-Zulu-Natal, Westville, Durban 4000, South Africa
- Tamer Shoeib (corresponding author): Department of Chemistry, The American University in Cairo, New Cairo 11835, Egypt
30
Kordi M, Borzouyi Z, Chitsaz S, Asmaei MH, Salami R, Tabarzad M. Antimicrobial peptides with anticancer activity: Today status, trends and their computational design. Arch Biochem Biophys 2023; 733:109484. [PMID: 36473507 DOI: 10.1016/j.abb.2022.109484] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/25/2022] [Revised: 11/29/2022] [Accepted: 11/30/2022] [Indexed: 12/12/2022]
Abstract
Some antimicrobial peptides have been shown to inhibit the proliferation of cancer cell lines. Various strategies for treating cancers with active peptides have been pursued. According to the reports, anticancer peptides are important therapeutic peptides, which can act through two distinct pathways: they either just create pores in the cell membrane, or they have a vital intracellular target. In this review, publications up to Sep. 2021 were extracted from Scopus and PubMed using "antimicrobial peptide" and "anticancer peptide" as keywords. In the second step, publications related to "computational design" were extracted. Among these publications, those with similar scopes were classified and selected based on mechanisms of action and application. In this review, the most recent advances in the field of antimicrobial peptides with anti-cancer activities have been summarized. Freely available web servers such as AntiCP, ACPP, iACP, iACP-GAEnsC, and ACPred are discussed here. In conclusion, despite some limitations of ACPs such as production cost and challenges, short half-life, and toxicity to normal cells, the beneficial properties of AMPs make some of them good therapeutic agents for cancer therapy. In designing novel ACPs, computational methods play a substantial role and are increasingly used today.
Affiliation(s)
- Masoumeh Kordi: Department of Plant Science and Biotechnology, School of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Zeynab Borzouyi: Department of Agriculture, School of Agriculture and Plant Breeding, Islamic Azad University, Sabzevar, Iran
- Saideh Chitsaz: Department of Microbiology, Islamic Azad University, Karaj, Iran
- Robab Salami: Department of Plant Science and Biotechnology, School of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran
- Maryam Tabarzad: Protein Technology Research Center, Shahid Beheshti University of Medical Science, Iran
31
Guo X, Tiwari P, Zou Q, Ding Y. Subspace projection-based weighted echo state networks for predicting therapeutic peptides. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
32
Li J, Cao F, Gao Q, Liang K, Tang Y. Improving diagnosis accuracy of non-small cell lung carcinoma on noisy data by adaptive group lasso regularized multinomial regression. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
33
Ruiz de Miras J, Ibáñez-Molina A, Soriano M, Iglesias-Parro S. Schizophrenia classification using machine learning on resting state EEG signal. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
34
Liang Y, Ma X. iACP-GE: accurate identification of anticancer peptides by using gradient boosting decision tree and extra tree. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2023; 34:1-19. [PMID: 36562289 DOI: 10.1080/1062936x.2022.2160011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/24/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]
Abstract
Cancer is one of the main diseases threatening human life, accounting for millions of deaths around the world each year. Traditional physical and chemical methods for cancer treatment are extremely time-consuming, lab-intensive, expensive, inefficient and difficult to apply in a high-throughput way. Hence, it is an urgent task to develop automated computational methods to enable fast and accurate identification of anticancer peptides (ACPs). In this paper, we develop a novel model named iACP-GE to identify ACPs. Multiple features are extracted by using binary encoding, enhanced grouped amino acid composition and BLOSUM62 encoding based on the N5C5 sequence, as well as detrended forward moving-average auto-cross correlation analysis based on physicochemical properties of the 20 natural amino acids. Thus, 835 features are obtained for each sample. To avoid information redundancy, a gradient boosting decision tree is adopted as the feature selection strategy. Then, the optimal feature subset is input to the extra tree classifier. The accuracies on the ACP740 and ACP240 datasets with 5-fold cross-validation were 90.54% and 91.25%, respectively. Experimental results indicate that iACP-GE significantly outperforms several existing models on the ACP740 and ACP240 datasets and can be used as an effective tool for the identification of ACPs. The datasets and source codes for iACP-GE are available at https://github.com/yunyunliang88/iACP-GE.
Affiliation(s)
- Y Liang: School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
- X Ma: School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
35
Wu X, Zeng W, Lin F. GCNCPR-ACPs: a novel graph convolution network method for ACPs prediction. BMC Bioinformatics 2022; 23:560. [PMID: 36564705 PMCID: PMC9789540 DOI: 10.1186/s12859-022-04771-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 05/31/2022] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Anticancer peptides (ACPs) inhibit and kill tumor cells. Research on ACPs is of great significance for the development of new drugs, and the prediction of ACPs and non-ACPs is a new hotspot. RESULTS We propose a new machine learning-based method named GCNCPR-ACPs (a Graph Convolutional Neural Network Method based on collapse pooling and residual network to predict the ACPs), which automatically and accurately predicts ACPs using residual graph convolution networks, differentiable graph pooling, and features extracted from peptide sequence information. The GCNCPR-ACPs method can effectively capture different levels of node attributes for amino acid node representation learning. It uses node2vec and one-hot embedding methods to extract initial amino acid features for ACP prediction. CONCLUSIONS Experimental results of ten-fold cross-validation and independent validation based on different metrics showed that GCNCPR-ACPs significantly outperformed state-of-the-art methods. Specifically, the Matthews Correlation Coefficient (MCC) and AUC of our predictor were 69.5% and 90%, respectively, which were 4.3% and 2% higher than those of the other predictors in ten-fold cross-validation. In the independent test, the MCC and specificity (SP) were 69.6% and 93.9%, respectively, which were 37.6% and 5.5% higher than those of the other predictors. The overall results showed that the GCNCPR-ACPs method proposed in the current paper can effectively predict ACPs.
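For readers unfamiliar with the two metrics reported above, the following sketch computes the Matthews correlation coefficient and specificity from confusion-matrix counts using their standard definitions. The counts are hypothetical and are not taken from the GCNCPR-ACPs experiments.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention assumed here: return 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def specificity(tn: int, fp: int) -> float:
    """Specificity (SP) = TN / (TN + FP), the true-negative rate."""
    return tn / (tn + fp)


# Hypothetical confusion-matrix counts for a binary ACP / non-ACP classifier.
tp, tn, fp, fn = 80, 90, 10, 20
print(round(matthews_corrcoef(tp, tn, fp, fn), 3))
print(specificity(tn, fp))
```

MCC ranges from -1 to 1, so a value reported as 69.5% corresponds to 0.695 on that scale. Unlike accuracy, it uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance.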
Affiliation(s)
- Xiujin Wu: School of Informatics, Xiamen University, Xiamen, Fujian, China (GRID: grid.12955.3a; ISNI: 0000 0001 2264 7233)
- Wenhua Zeng: School of Informatics, Xiamen University, Xiamen, Fujian, China (GRID: grid.12955.3a; ISNI: 0000 0001 2264 7233)
- Fan Lin: School of Informatics, Xiamen University, Xiamen, Fujian, China (GRID: grid.12955.3a; ISNI: 0000 0001 2264 7233); Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA (GRID: grid.2515.3; ISNI: 0000 0004 0378 8438)
36
ACPred-BMF: bidirectional LSTM with multiple feature representations for explainable anticancer peptide prediction. Sci Rep 2022; 12:21915. [PMID: 36535969 PMCID: PMC9763336 DOI: 10.1038/s41598-022-24404-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/30/2022] [Accepted: 11/15/2022] [Indexed: 12/24/2022] Open
Abstract
Cancer has become a major factor threatening human life and health. Because traditional treatment methods such as chemotherapy and radiotherapy are not highly specific and often cause severe side effects and toxicity, new treatment methods are urgently needed. Anticancer peptide drugs have low toxicity and stronger efficacy and specificity, and have emerged as a new type of cancer treatment drug. However, experimental identification of anticancer peptides is time-consuming, expensive, and difficult to perform in a high-throughput manner. Computational identification of anticancer peptides can make up for the shortcomings of experimental identification. In this study, a deep learning-based predictor named ACPred-BMF is proposed for the prediction of anticancer peptides. This method uses the quantitative and qualitative properties of amino acids and the binary profile feature to represent the peptide sequences numerically. The Bidirectional LSTM network architecture is used in the model, and the attention mechanism is also considered. To alleviate the black-box problem of deep learning model prediction, we visualized the automatically extracted features and used the Shapley additive explanations algorithm to determine the importance of features to further understand the anticancer peptide mechanism. The results show that our method is one of the state-of-the-art anticancer peptide predictors. A web server implementing ACPred-BMF can be accessed at: http://mialab.ruc.edu.cn/ACPredBMFServer/ .
37
DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2987407. [PMID: 36211019 PMCID: PMC9534628 DOI: 10.1155/2022/2987407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 08/19/2022] [Accepted: 09/09/2022] [Indexed: 11/17/2022]
Abstract
DNA-binding proteins (DBPs) have crucial biotic activities including DNA replication, recombination, and transcription. DBPs are closely associated with chronic diseases and are used in the manufacturing of antibiotics and steroids. A series of predictors have been established to identify DBPs. However, researchers are still working to further enhance the identification of DBPs. This research designed a novel predictor to identify DBPs more accurately. The features from the sequences are transformed by F-PSSM (filtered position-specific scoring matrix), PSSM-DPC (position-specific scoring matrix-dipeptide composition), and R-PSSM (reduced position-specific scoring matrix). To eliminate the noisy attributes, we applied DWT (discrete wavelet transform) to F-PSSM, PSSM-DPC, and R-PSSM and introduced three novel descriptors, namely, F-PSSM-DWT, PSSM-DPC-DWT, and R-PSSM-DWT. The four models were then trained using LiXGB (Light eXtreme gradient boosting), XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and Adaboost. LiXGB with R-PSSM-DWT attained 6.55% higher accuracy on the training dataset and 5.93% higher accuracy on the testing dataset than the best existing predictors. The results reveal the excellent performance of our novel predictor over past studies. DBP-iDWT would be fruitful for establishing more effective therapeutic strategies for fatal disease treatment.
38
Akbar S, Hayat M, Tahir M, Khan S, Alarfaj FK. cACP-DeepGram: Classification of anticancer peptides via deep neural network and skip-gram-based word embedding model. Artif Intell Med 2022; 131:102349. [DOI: 10.1016/j.artmed.2022.102349] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/10/2021] [Revised: 05/24/2022] [Accepted: 07/04/2022] [Indexed: 12/28/2022]
39
Zou H, Yang F, Yin Z. Integrating multiple sequence features for identifying anticancer peptides. Comput Biol Chem 2022; 99:107711. [DOI: 10.1016/j.compbiolchem.2022.107711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 05/16/2022] [Accepted: 05/29/2022] [Indexed: 11/03/2022]
40
Zakharova E, Orsi M, Capecchi A, Reymond JL. Machine Learning Guided Discovery of Non-Hemolytic Membrane Disruptive Anti-Cancer Peptides. ChemMedChem 2022; 17:e202200291. [PMID: 35880810 PMCID: PMC9541320 DOI: 10.1002/cmdc.202200291] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/25/2022] [Revised: 06/29/2022] [Indexed: 12/05/2022]
Abstract
Most antimicrobial peptides (AMPs) and anticancer peptides (ACPs) fold into membrane disruptive cationic amphiphilic α‐helices, many of which are however also unpredictably hemolytic and toxic. Here we exploited the ability of recurrent neural networks (RNN) to distinguish active from inactive and non‐hemolytic from hemolytic AMPs and ACPs to discover new non‐hemolytic ACPs. Our discovery pipeline involved: 1) sequence generation using either a generative RNN or a genetic algorithm, 2) RNN classification for activity and hemolysis, 3) selection for sequence novelty, helicity and amphiphilicity, and 4) synthesis and testing. Experimental evaluation of thirty‐three peptides resulted in eleven active ACPs, four of which were non‐hemolytic, with properties resembling those of the natural ACP lasioglossin III. These experiments show the first example of direct machine learning guided discovery of non‐hemolytic ACPs.
Affiliation(s)
- Elena Zakharova: University of Bern (Universität Bern), Department of Chemistry, Biochemistry and Pharmaceutical Sciences, Switzerland
- Markus Orsi: University of Bern (Universität Bern), Department of Chemistry, Biochemistry and Pharmaceutical Sciences, Switzerland
- Alice Capecchi: University of Bern (Universität Bern), Department of Chemistry, Biochemistry and Pharmaceutical Sciences, Switzerland
- Jean-Louis Reymond: Universität Bern, Department of Chemistry and Biochemistry, Freiestrasse 3, 3012 Bern, Switzerland
41
Zhu L, Ye C, Hu X, Yang S, Zhu C. ACP-check: An anticancer peptide prediction model based on bidirectional long short-term memory and multi-features fusion strategy. Comput Biol Med 2022; 148:105868. [PMID: 35868046 DOI: 10.1016/j.compbiomed.2022.105868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 06/14/2022] [Accepted: 07/09/2022] [Indexed: 11/16/2022]
Abstract
The anticancer peptide is an emerging anticancer drug that has become an effective alternative to chemotherapy and targeted therapy due to fewer side effects and less resistance. The traditional biological experimental method for identifying anticancer peptides is a time-consuming and complicated process that hinders large-scale, rapid, and effective identification. In this paper, we propose a model based on a bidirectional long short-term memory network and multi-features fusion, called ACP-check, which employs a bidirectional long short-term memory network to extract time-dependent information features from peptide sequences, and combines them with amino acid sequence features including binary profile feature, dipeptide composition, the composition of k-spaced amino acid group pairs, amino acid composition, and sequence-order-coupling number. To verify the performance of the model, six benchmark datasets are selected, including ACPred-Fuse, ACPred-FL, ACP240, ACP740, and the main and alternate datasets of AntiCP2.0. In terms of Matthews correlation coefficient, ACP-check obtains 0.37, 0.82, 0.80, 0.75, 0.56, and 0.86 on the six datasets respectively, an improvement of 2%-86% over existing state-of-the-art anticancer peptide prediction methods. Furthermore, ACP-check achieves prediction accuracies of 0.91, 0.91, 0.90, 0.87, 0.78, and 0.93 respectively, an increase ranging from 1% to 49%. Overall, the comparison experiment shows that ACP-check can accurately identify anticancer peptides from sequence-level information. The code and data are available at http://www.cczubio.top/ACP-check/.
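Two of the handcrafted sequence features named above, amino acid composition and dipeptide composition, follow widely used definitions and can be sketched as below. This is an illustrative sketch only: the function names and the example peptide are hypothetical, and it is not the ACP-check implementation.

```python
from itertools import product
from typing import List

# The 20 standard amino acids in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> List[float]:
    """20-dimensional vector: frequency of each amino acid in the peptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """400-dimensional vector: frequency of each adjacent residue pair."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        raise ValueError("sequence must have at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[dp] / n_pairs for dp in DIPEPTIDES]


# Hypothetical toy peptide.
peptide = "AAKK"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
print(len(aac), len(dpc))
print(aac[AMINO_ACIDS.index("A")], dpc[DIPEPTIDES.index("AK")])
```

Both vectors have a fixed length regardless of peptide length, which is what lets them be concatenated with learned sequence features before classification.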
Affiliation(s)
- Lun Zhu: School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China
- Chenyang Ye: School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China
- Xuemei Hu: Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Sen Yang: School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China; Changzhou No.2 People's Hospital, the Affiliated Hospital of Nanjing Medical University, Changzhou, 213164, China
- Chenyang Zhu: School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China
42
MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:ph15060707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022] Open
Abstract
Bioactive peptides are typically small functional peptides with 2–20 amino acid residues and play versatile roles in metabolic and biological processes. Bioactive peptides are multi-functional, so it is vastly challenging to accurately detect all their functions simultaneously. We proposed a convolution neural network (CNN) and bi-directional long short-term memory (Bi-LSTM)-based deep learning method (called MPMABP) for recognizing multi-activities of bioactive peptides. The MPMABP stacked five CNNs at different scales, and used the residual network to preserve the information from loss. The empirical results showed that the MPMABP is superior to the state-of-the-art methods. Analysis on the distribution of amino acids indicated that the lysine preferred to appear in the anti-cancer peptide, the leucine in the anti-diabetic peptide, and the proline in the anti-hypertensive peptide. The method and analysis are beneficial to recognize multi-activities of bioactive peptides.
43
Feng G, Yao H, Li C, Liu R, Huang R, Fan X, Ge R, Miao Q. ME-ACP: Multi-view neural networks with ensemble model for identification of anticancer peptides. Comput Biol Med 2022; 145:105459. [DOI: 10.1016/j.compbiomed.2022.105459] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/15/2022] [Revised: 03/22/2022] [Accepted: 03/24/2022] [Indexed: 12/26/2022]
|
44
|
To Assist Oncologists: An Efficient Machine Learning-Based Approach for Anti-Cancer Peptides Classification. Sensors 2022; 22:s22114005. [PMID: 35684624 PMCID: PMC9185351 DOI: 10.3390/s22114005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2022] [Revised: 05/19/2022] [Accepted: 05/20/2022] [Indexed: 12/10/2022]
Abstract
In the modern technological era, anti-cancer peptides (ACPs) have been considered a promising cancer treatment. It is critical to find new ACPs to better understand their functional mechanisms and to support vaccine development. Because of the enormous number of peptide sequences generated in the post-genomic era, timely and efficient identification of ACPs using computational techniques is highly needed. Recently, numerous adaptive statistical algorithms have been developed for separating ACPs from non-ACPs (NACPs). Despite great advances, existing approaches still rely on insufficient feature descriptors and learning methods, which limits predictive performance. To address this, a trustworthy framework is developed for the precise identification of ACPs. In particular, the presented approach applies four feature encoding mechanisms to capture the motifs of the target class: amino acid composition, dipeptide composition, tripeptide composition, and an improved version of pseudo amino acid composition. Moreover, principal component analysis (PCA) is employed for feature pruning, selecting optimal, deep, and highly variable features. Because learning algorithms differ in nature, experiments are performed over numerous algorithms to select the best-performing method. After investigating the empirical outcomes, the support vector machine with the hybrid feature space performs best. The proposed framework achieved accuracies of 97.09% and 98.25% on the benchmark and independent datasets, respectively. The comparative analysis demonstrates that the proposed model outperforms existing methods and is beneficial in drug development and oncology.
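The composition-based encodings named in this abstract can be illustrated with a minimal sketch. It covers only amino acid and dipeptide composition, and it is not the authors' implementation: the 20-letter alphabet ordering and the function names `amino_acid_composition`, `dipeptide_composition`, and `hybrid_features` are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 residue frequencies of `seq`, ordered as AMINO_ACIDS."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 frequencies of overlapping adjacent residue pairs."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must have at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1  # number of overlapping pairs
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def hybrid_features(seq):
    """Concatenate both encodings into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A vector built this way could then be passed to a dimensionality reduction step and a classifier of the reader's choice; residues outside the assumed alphabet are simply not counted.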
|
45
|
Liu J, Gu Y, Zhu W, Zhang Z, Xin Y, Shen Y, He L, Du J. Expression profiles of circular RNA in human placental villus and decidua and prediction of drugs for recurrent spontaneous abortion. Am J Reprod Immunol 2022; 88:e13578. [PMID: 35583158 DOI: 10.1111/aji.13578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 04/27/2022] [Accepted: 05/13/2022] [Indexed: 11/29/2022] Open
Abstract
PROBLEM: We aimed to evaluate potential biomarkers and candidate drugs for recurrent spontaneous abortion (RSA) and explore functional circular RNA pathways involved in regulating RSA.
METHOD OF STUDY: Expression profiles of placental villus and decidua samples derived from females with RSA and those with healthy pregnancies who underwent induced abortion were analyzed using high-throughput RNA whole transcriptome sequencing. Abnormally expressed circular RNAs in a larger cohort of samples were validated using real-time quantitative polymerase chain reaction. Drug discovery and molecular docking were performed using online databases and the Autodock tool, respectively.
RESULTS: In total, 2103 and 2160 circular RNAs were detected in three pairs of villi and three pairs of decidual tissues, respectively. A total of 22 circular RNAs, 58 miRNAs, and 393 mRNAs with significantly different expression patterns were identified. Five circular RNAs were verified, and the expression of hsa_circ_0088485 was significantly upregulated in the RSA group (P = .041) with a high area under the curve value (0.727), sensitivity (76.5%), and specificity (64.7%). GO and KEGG enrichment analyses indicated that differentially expressed genes were associated with angiogenesis and cell adhesion. Drug discovery and molecular docking were analyzed based on 93 differentially expressed mRNAs of the ceRNA network. A total of 36 chemicals were identified as putative bioactive molecules for RSA, and one representative chemical was identified for docking with six proteins.
CONCLUSIONS: These findings provide novel insights into the mechanism of regulation of RSA by circular RNA and its clinical diagnosis and treatment.
Affiliation(s)
- Junwei Liu
- NHC Key Lab of Reproduction Regulation (Shanghai Institute for Biomedical and Pharmaceutical Technologies), School of Pharmacy, Fudan University, Shanghai, China
| | - Yan Gu
- The Second Hospital of Tianjin Medical University, Tianjin, China
| | - Weiqiang Zhu
- NHC Key Lab of Reproduction Regulation (Shanghai Institute for Biomedical and Pharmaceutical Technologies), School of Pharmacy, Fudan University, Shanghai, China
| | - Zhaofeng Zhang
- NHC Key Lab of Reproduction Regulation (Shanghai Institute for Biomedical and Pharmaceutical Technologies), School of Pharmacy, Fudan University, Shanghai, China
| | - Yawei Xin
- The Second Hospital of Tianjin Medical University, Tianjin, China
| | - Yupei Shen
- NHC Key Lab of Reproduction Regulation (Shanghai Institute for Biomedical and Pharmaceutical Technologies), School of Pharmacy, Fudan University, Shanghai, China
| | - Lin He
- Bio-X Center, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Jing Du
- NHC Key Lab of Reproduction Regulation (Shanghai Institute for Biomedical and Pharmaceutical Technologies), School of Pharmacy, Fudan University, Shanghai, China
| |
|
46
|
Multi-channel CNN based anticancer peptides identification. Anal Biochem 2022; 650:114707. [PMID: 35568159 DOI: 10.1016/j.ab.2022.114707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Revised: 01/27/2022] [Accepted: 04/27/2022] [Indexed: 11/20/2022]
Abstract
Cancer is one of the most dangerous diseases in the world and often leads to misery and death. Current treatments include different kinds of anticancer therapy, which exhibit different types of side effects. Because of certain physicochemical properties, anticancer peptides (ACPs) have opened a new path of treatment for this deadly disease. A well-performing methodology for identifying novel anticancer peptides is therefore of great importance in the fight against cancer. In addition to laboratory techniques, various machine learning and deep learning methodologies have been developed in recent years for this task. Although these models have shown reasonable predictive ability, there is still room for improvement in terms of performance and in exploring new types of algorithms. In this work, we have proposed a novel multi-channel convolutional neural network (CNN) for identifying anticancer peptides from protein sequences. We have collected data from the existing state-of-the-art methodologies and applied binary encoding for data preprocessing. We have also employed k-fold cross-validation to train our models on benchmark datasets and compared our models' performance on the independent datasets. The comparison has indicated our models' superiority on various evaluation metrics. We think our work can be a valuable asset in finding novel anticancer peptides. We have provided a user-friendly web server for academic purposes, and it is publicly available at: http://103.99.176.239/iacp-cnn/.
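The binary encoding mentioned in this abstract is commonly realized as a one-hot matrix with one row per residue. The sketch below shows that general idea only; the alphabet ordering, the fixed `max_len`, the zero-padding, and the all-zero row for unknown residues are assumptions for illustration, not details taken from the paper.

```python
# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {residue: i for i, residue in enumerate(AMINO_ACIDS)}


def binary_encode(seq, max_len):
    """One-hot encode `seq` as a max_len x 20 matrix of 0/1 integers.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Residues outside AMINO_ACIDS become all-zero rows.
    """
    matrix = []
    for residue in seq.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        matrix.append(row)
    # Pad so that every peptide yields the same input shape.
    matrix.extend([0] * len(AMINO_ACIDS) for _ in range(max_len - len(matrix)))
    return matrix
```

A fixed-shape matrix of this kind is what a convolutional model would typically consume, with the residue axis treated as the sequence dimension.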
|
47
|
Development of Anticancer Peptides Using Artificial Intelligence and Combinational Therapy for Cancer Therapeutics. Pharmaceutics 2022; 14:pharmaceutics14050997. [PMID: 35631583 PMCID: PMC9147327 DOI: 10.3390/pharmaceutics14050997] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/29/2022] [Revised: 04/28/2022] [Accepted: 05/04/2022] [Indexed: 01/27/2023] Open
Abstract
Cancer is a group of diseases involving abnormal cell growth, genomic alterations, and invasion of or spread to other parts of the body. Among therapeutic peptide drugs, anticancer peptides (ACPs) have been considered as agents that target and kill cancer cells, because cancer cells have unique characteristics compared with normal cells, such as a high negative charge and an abundance of microvilli in the cell membrane. ACPs have several advantages, such as high specificity, cost-effectiveness, low immunogenicity, minimal toxicity, and high tolerance under normal physiological conditions. However, the development and identification of ACPs are time-consuming and expensive with traditional wet-lab approaches. Applying artificial intelligence to these approaches can therefore save time and reduce the cost of identifying candidate ACPs. Recently, machine learning (ML), deep learning (DL), and hybrid learning (ML combined with DL) have been applied to the development of ACPs without experimental analysis, owing to advances in computing power and big data from the power system. Additionally, we suggest that combination therapy with classical approaches and ACPs might be one of the impactful approaches to increase the efficiency of cancer therapy.
|
48
|
Guo X, Jiang Y, Zou Q. Structured Sparse Regularized TSK Fuzzy System for predicting therapeutic peptides. Brief Bioinform 2022; 23:6570018. [PMID: 35438149 DOI: 10.1093/bib/bbac135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 03/19/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022] Open
Abstract
Therapeutic peptides act on the skeletal system, digestive system and blood system, have antibacterial properties and help relieve inflammation. To reduce the resource consumption of wet-lab experiments, many computational methods have been developed for identifying therapeutic peptides. Because traditional machine learning methods deal poorly with feature noise, we propose a novel therapeutic peptide identification method called Structured Sparse Regularized Takagi-Sugeno-Kang Fuzzy System on Within-Class Scatter (SSR-TSK-FS-WCS). Our method achieves good performance on multiple therapeutic peptide datasets and UCI datasets.
Affiliation(s)
- Xiaoyi Guo
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
| | - Yizhang Jiang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, P.R.China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
| |
|
49
|
ACPNet: A Deep Learning Network to Identify Anticancer Peptides by Hybrid Sequence Information. Molecules 2022; 27:molecules27051544. [PMID: 35268644 PMCID: PMC8912097 DOI: 10.3390/molecules27051544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2022] [Revised: 02/20/2022] [Accepted: 02/23/2022] [Indexed: 12/18/2022] Open
Abstract
Cancer is one of the most dangerous threats to human health. One of the issues is drug resistance, which leads to side effects after drug treatment. Numerous therapies have sought to mitigate drug resistance. Recently, anticancer peptides have emerged as novel and promising anticancer candidates that can inhibit tumor cell proliferation and migration and suppress the formation of tumor blood vessels, with fewer side effects. However, identifying anticancer peptides by biological experiments at high throughput is costly, laborious, and time-consuming. Therefore, accurately identifying anticancer peptides becomes a key and indispensable step for anticancer peptide therapy. Although some existing computational methods have been developed to predict anticancer peptides, their accuracy still needs to be improved. Thus, in this study, we propose a deep learning-based model, called ACPNet, to distinguish anticancer peptides from non-anticancer peptides (non-ACPs). ACPNet employs three different types of information during training: peptide sequence information, peptide physicochemical properties, and auto-encoding features. ACPNet is a hybrid deep learning network that fuses fully connected networks and recurrent neural networks. The comparison with other existing methods on the ACPs82 datasets shows that ACPNet not only achieves improvements of 1.2% in accuracy, 2.0% in F1-score, and 7.2% in recall, but also obtains balanced performance on the Matthews correlation coefficient. Meanwhile, ACPNet is verified on an independent dataset of 20 proven anticancer peptides, of which only one is predicted as a non-ACP. The comparison and independent validation experiment indicate that ACPNet can accurately distinguish anticancer peptides from non-ACPs.
|
50
|
Ahmed S, Muhammod R, Khan ZH, Adilina S, Sharma A, Shatabda S, Dehzangi A. ACP-MHCNN: an accurate multi-headed deep-convolutional neural network to predict anticancer peptides. Sci Rep 2021; 11:23676. [PMID: 34880291 PMCID: PMC8654959 DOI: 10.1038/s41598-021-02703-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 01/21/2021] [Accepted: 11/17/2021] [Indexed: 01/10/2023] Open
Abstract
Although advancing the therapeutic alternatives for treating deadly cancers has gained much attention globally, primary methods such as chemotherapy still have significant downsides and low specificity. Most recently, anticancer peptides (ACPs) have emerged as a potential therapeutic alternative with far fewer negative side effects. However, the identification of ACPs through wet-lab experiments is expensive and time-consuming. Hence, computational methods have emerged as viable alternatives. During the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model called ACP-MHCNN for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence, physicochemical, and evolutionary features for ACP identification using different numerical peptide representations while restraining parameter overhead. Rigorous experiments using cross-validation and an independent dataset show that ACP-MHCNN outperforms other models for anticancer peptide identification by a substantial margin on our employed benchmarks. ACP-MHCNN outperforms the state-of-the-art model by 6.3%, 8.6%, 3.7%, 4.0%, and 0.20 in terms of accuracy, sensitivity, specificity, precision, and MCC, respectively. ACP-MHCNN and its relevant codes and datasets are publicly available at: https://github.com/mrzResearchArena/Anticancer-Peptides-CNN. ACP-MHCNN is also publicly available as an online predictor at: https://anticancer.pythonanywhere.com/.
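The five metrics compared in this abstract all derive from the counts of a binary confusion matrix. The sketch below uses their standard definitions; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions for illustration. It also shows why the MCC gain is quoted as 0.20 rather than as a percentage: MCC ranges from -1 to 1, while the other four are proportions.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely; 0.0 for an empty denominator is an assumed convention."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # recall on the positive class
        "specificity": _ratio(tn, tn + fp),  # recall on the negative class
        "precision": _ratio(tp, tp + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

For example, `binary_metrics(tp=8, tn=7, fp=3, fn=2)` uses hypothetical counts, not results from any study listed here, and yields an accuracy of 0.75, a sensitivity of 0.8, and a specificity of 0.7.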
Affiliation(s)
- Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Rafsanjani Muhammod
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Zahid Hossain Khan
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Sheikh Adilina
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, 4111, Australia
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, 08102, USA.
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08102, USA.
| |
|