1
Ali FEM, Abdel-Reheim MA, Hassanein EHM, Abd El-Aziz MK, Althagafy HS, Badran KSA. Exploring the potential of drug repurposing for liver diseases: A comprehensive study. Life Sci 2024; 347:122642. [PMID: 38641047 DOI: 10.1016/j.lfs.2024.122642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Revised: 03/24/2024] [Accepted: 04/10/2024] [Indexed: 04/21/2024]
Abstract
Drug repurposing is the investigation of existing drugs for new indications. It offers an opportunity to identify new drug candidates more quickly and at lower cost than novel drug discovery and development. Despite its importance and potential, drug repurposing has no single definition accepted by healthcare providers and the World Health Organization. Many similar, interchangeable terms are used in the literature, which makes it difficult to collect and analyze uniform data on repurposed drugs. This research was based on general criteria for drug repurposing, with a focus on liver diseases. Many drugs have been investigated for their effects on liver diseases even though they were originally approved (or were on their way to approval) for other diseases. Some repurposing hypotheses were first drawn from the literature and then tested further. Recently, with advances in bioinformatics, scientists have begun to use drug libraries and computational systems that can analyze hundreds of drugs and produce a short list of candidates for pharmacological analysis. This study found that drug repurposing may help in managing liver diseases: it identifies available or under-investigation drugs that could help treat hepatitis, liver cirrhosis, Wilson disease, liver cancer, and fatty liver. However, further studies are needed to confirm the efficacy of these drugs on a large scale.
Affiliation(s)
- Fares E M Ali
  - Department of Pharmacology and Toxicology, Faculty of Pharmacy, Al-Azhar University, Assiut 71524, Egypt; Michael Sayegh, Faculty of Pharmacy, Aqaba University of Technology, Aqaba 77110, Jordan
- Mustafa Ahmed Abdel-Reheim
  - Department of Pharmaceutical Sciences, College of Pharmacy, Shaqra University, Shaqra 11961, Saudi Arabia; Department of Pharmacology and Toxicology, Faculty of Pharmacy, Beni-Suef University, Beni Suef 62521, Egypt
- Emad H M Hassanein
  - Department of Pharmacology and Toxicology, Faculty of Pharmacy, Al-Azhar University, Assiut 71524, Egypt
- Mostafa K Abd El-Aziz
  - Department of Pharmacology and Toxicology, Faculty of Pharmacy, Al-Azhar University, Assiut 71524, Egypt
- Hanan S Althagafy
  - Department of Biochemistry, Faculty of Science, University of Jeddah, Jeddah, Saudi Arabia
- Khalid S A Badran
  - Department of Pharmacology and Toxicology, Faculty of Pharmacy, Al-Azhar University, Assiut 71524, Egypt

2
Cao P, Yue M, Cheng Y, Sullivan MA, Chen W, Yu H, Li F, Wu S, Lv Y, Zhai X, Zhang Y. Naringenin prevents non-alcoholic steatohepatitis by modulating the host metabolome and intestinal microbiome in MCD diet-fed mice. Food Sci Nutr 2023; 11:7826-7840. [PMID: 38107095 PMCID: PMC10724642 DOI: 10.1002/fsn3.3700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 09/03/2023] [Accepted: 09/05/2023] [Indexed: 12/19/2023] Open
Abstract
Non-alcoholic steatohepatitis (NASH) is a severe inflammatory phase of the non-alcoholic fatty liver disease (NAFLD) spectrum and can progress to advanced stages of NAFLD if left untreated. This study uses multi-omics data to elucidate the mechanism underlying naringenin's reported benefit in alleviating NASH. Male mice were fed a NASH-inducing methionine-choline-deficient (MCD) diet with or without naringenin supplementation for 6 weeks. Naringenin prevented NASH-induced histopathological liver damage and reversed the abnormal levels of hepatic triglyceride (TG)/total cholesterol (TC), serum TG/TC, serum alanine aminotransferase/aspartate transaminase, and hepatic malondialdehyde and glutathione. Importantly, naringenin intervention significantly modulated the relative abundance of gut microbiota and the host metabolomic profile. We detected more than 700 metabolites in the serum and found that the gut genus levels of Anaeroplasma and the [Eubacterium] nodatum group were closely associated with xanthine, 2-picoline, and securinine, respectively. Tuzzerella alterations showed the highest number of associations with host endogenous metabolites such as FAHFA (8:0/10:0), FFA (20:2), carnitine C8:1, tridecanedioic acid, securinine, acetylvaline, DL-O-tyrosine, and Phe-Asn. This study indicates that the interplay between host serum metabolites and gut microbiota may contribute to the therapeutic effect of naringenin against NASH.
Affiliation(s)
- Peng Cao
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan, China
  - Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan, China
  - Hubei Key Laboratory of Biological Targeted Therapy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ming Yue
  - Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan, China
  - Department of Pharmacy, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yuanlei Cheng
  - Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan, China
  - Department of Pharmacy, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Mitchell A. Sullivan
  - Glycation and Diabetes, Mater Research Institute – The University of Queensland, Translational Research Institute, Brisbane, Queensland, Australia
- Wen Chen
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan, China
- Huifan Yu
  - Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan, China
- Fei Li
  - Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan, China
- Sanlan Wu
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan, China
- Yongning Lv
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan, China
- Xuejia Zhai
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan, China
- Yu Zhang
  - Department of Pharmacy, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Province Clinical Research Center for Precision Medicine for Critical Illness, Wuhan, China

3
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION: Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but because they are generally redundant, the important ones must be screened and selected. RESULT: Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to prevent the drawbacks of data explosion and excessive reliance on feature sorting results. Comparative experiments on 27 feature-sorting datasets show that IIFS can find more accurate and important features than existing methods. The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we wrap this method and deploy it on a web server for access at http://112.124.26.17:8005/.
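For orientation, the classic incremental feature selection (IFS) baseline that IIFS sets out to improve can be sketched as below. This is not the IIFS algorithm itself: the abstract does not spell out its nonadjacent subset search. The nearest-centroid evaluator, the toy data and the `ranked` ordering are all assumptions made for illustration.

```python
from statistics import mean


def nearest_centroid_loo_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen features."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Build class centroids from every sample except the held-out one.
        centroids = {}
        for cls in set(labels):
            members = [r for j, (r, l) in enumerate(zip(rows, labels)) if l == cls and j != i]
            centroids[cls] = [mean(m[f] for m in members) for f in feature_idx]
        point = [row[f] for f in feature_idx]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(rows)


def incremental_feature_selection(rows, labels, ranked_features):
    """Classic IFS: grow the subset along the ranking and keep the best-scoring prefix."""
    best_subset, best_score = [], -1.0
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = nearest_centroid_loo_accuracy(rows, labels, subset)
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy data (hypothetical): feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 9.0],
    [0.3, 9.0, 4.0],
    [0.9, 2.0, 8.0],
    [1.0, 8.0, 2.0],
    [1.1, 4.0, 5.0],
]
labels = [0, 0, 0, 1, 1, 1]
ranked = [0, 2, 1]  # hypothetical ranking produced by some feature-scoring method

subset, score = incremental_feature_selection(rows, labels, ranked)
print(subset, score)
```

Because plain IFS only ever evaluates prefixes of the ranking, its result depends heavily on the quality of that ranking, which is the reliance the abstract says IIFS reduces.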
Affiliation(s)
- Chaolu Meng
  - College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
- Ye Yuan
  - Beidahuang Industry Group General Hospital, Harbin, 150001, China
- Haiyan Zhao
  - College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
- Yue Pei
  - Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
- Zhi Li
  - Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China

4
Li X, Liao M, Wang B, Zan X, Huo Y, Liu Y, Bao Z, Xu P, Liu W. A drug repurposing method based on inhibition effect on gene regulatory network. Comput Struct Biotechnol J 2023; 21:4446-4455. [PMID: 37731599 PMCID: PMC10507583 DOI: 10.1016/j.csbj.2023.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/22/2023] Open
Abstract
Numerous computational drug repurposing methods have emerged as efficient alternatives to costly and time-consuming traditional drug discovery approaches. Some of these methods assume that a candidate drug should have a reversal effect on disease-associated genes. However, such methods are not applicable when there is limited overlap between disease-related genes and drug-perturbed genes. In this study, we proposed a novel Drug Repurposing method based on the Inhibition Effect on gene regulatory network (DRIE) to identify potential drugs for cancer treatment. DRIE integrated gene expression profiles and a gene regulatory network to calculate an inhibition score using the shortest path in the disease-specific network. Results on eleven datasets indicated the superior performance of DRIE compared with other state-of-the-art methods. Case studies showed that our method effectively discovered novel drug-disease associations. Our findings demonstrated that the top-ranked drug candidates had already been validated by the CTD database. Additionally, it clearly identified potential agents for three cancers (colorectal, breast, and lung cancer), which was beneficial when annotating drug-disease relationships in the CTD. This study proposed a novel framework for drug repurposing, which would be helpful for drug discovery and development.
Affiliation(s)
- Xianbin Li
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Minzhen Liao
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Bing Wang
  - School of Medicine, Southeast University, Nanjing, China
- Xiangzhen Zan
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Yanhao Huo
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Yue Liu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Zhenshen Bao
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Peng Xu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Wenbin Liu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China

5
Zhu W, Yuan SS, Li J, Huang CB, Lin H, Liao B. A First Computational Frame for Recognizing Heparin-Binding Protein. Diagnostics (Basel) 2023; 13:2465. [PMID: 37510209 PMCID: PMC10377868 DOI: 10.3390/diagnostics13142465] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/10/2023] [Revised: 07/13/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023] Open
Abstract
Heparin-binding protein (HBP) is a cationic antibacterial protein derived from multinuclear neutrophils and an important biomarker of infectious diseases. The correct identification of HBP is of great significance to the study of infectious diseases. This work provides the first HBP recognition framework based on machine learning to accurately identify HBP. By using four sequence descriptors, HBP and non-HBP samples were represented by discrete numbers. By inputting these features into a support vector machine (SVM) and random forest (RF) algorithm and comparing the prediction performances of these methods on training data and independent test data, it is found that the SVM-based classifier has the greatest potential to identify HBP. The model could produce an auROC of 0.981 ± 0.028 on training data using 10-fold cross-validation and an overall accuracy of 95.0% on independent test data. As the first model for HBP recognition, it will provide some help for infectious diseases and stimulate further research in related fields.
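The two evaluation quantities quoted above, auROC and k-fold cross-validation, can be illustrated with a small standard-library sketch. It is a generic illustration under stated assumptions, not the authors' pipeline: the scores and labels are invented, and `auroc` uses the pairwise (Mann-Whitney) definition, that is, the probability that a randomly chosen positive sample outscores a randomly chosen negative one.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical classifier scores: higher means "more likely HBP".
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
area = auroc(scores, labels)   # 7 of the 9 positive/negative pairs are ordered correctly
folds = k_fold_indices(10, 3)  # fold sizes 4, 3, 3
print(area, folds)
```

In 10-fold cross-validation each fold serves once as the validation set; a "mean ± spread" figure such as the one quoted in the abstract is typically a summary of the per-fold scores.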
Affiliation(s)
- Wen Zhu
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Shi-Shi Yuan
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jian Li
  - School of Basic Medical Sciences, Chengdu University, Chengdu 610106, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, ABa Teachers University, Chengdu 623002, China
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Bo Liao
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou 571158, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China

6
Luo X, Wang Y, Zou Q, Xu L. Recall DNA methylation levels at low coverage sites using a CNN model in WGBS. PLoS Comput Biol 2023; 19:e1011205. [PMID: 37315069 DOI: 10.1371/journal.pcbi.1011205] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Accepted: 05/22/2023] [Indexed: 06/16/2023] Open
Abstract
DNA methylation is an important regulator of gene transcription. WGBS is the gold-standard approach for base-pair-resolution quantification of DNA methylation, but it requires high sequencing depth. Many CpG sites have insufficient coverage in WGBS data, resulting in inaccurate DNA methylation levels at individual sites. Many state-of-the-art computational methods have been proposed to predict the missing values. However, many of them require either other omics datasets or cross-sample data, and most predict only the state of DNA methylation. In this study, we proposed RcWGBS, which can impute the missing (or low-coverage) values from the DNA methylation levels on the adjacent sides. Deep learning techniques were employed for accurate prediction. The WGBS datasets of H1-hESC and GM12878 were down-sampled. The average difference between the DNA methylation level at 12× depth predicted by RcWGBS and that at >50× depth in the H1-hESC and GM12878 cells is less than 0.03 and 0.01, respectively. RcWGBS performed better than METHimpute even though the sequencing depth was as low as 12×. Our work would help to process methylation data of low sequencing depth. It is beneficial for researchers to save sequencing costs and improve data utilization through computational methods.
Affiliation(s)
- Ximei Luo
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Yansu Wang
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, Guangdong, China

7
Zhang YF, Wang YH, Gu ZF, Pan XR, Li J, Ding H, Zhang Y, Deng KJ. Bitter-RF: A random forest machine model for recognizing bitter peptides. Front Med (Lausanne) 2023; 10:1052923. [PMID: 36778738 PMCID: PMC9909039 DOI: 10.3389/fmed.2023.1052923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 01/05/2023] [Indexed: 01/27/2023] Open
Abstract
Introduction: Bitter peptides are short peptides with potential medical applications. The huge potential behind their bitter taste remains to be tapped. To better explore the value of bitter peptides in practice, we need a more effective classification method for identifying them. Methods: In this study, we developed a random forest (RF)-based model, called Bitter-RF, using sequence information of the bitter peptides. Bitter-RF covers more comprehensive and extensive information by integrating 10 features extracted from the bitter peptides and achieves better results than the latest generation model on an independent validation set. Results: The proposed model can improve the classification accuracy of bitter peptides (AUROC = 0.98 on the independent test set) and extends the practical application of the RF method in protein classification tasks, where it had not previously been used to build a prediction model for bitter peptides. Discussion: We hope that Bitter-RF will be a convenient tool for scholars in bitter peptide research.
Affiliation(s)
- Yu-Fei Zhang
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yu-Hao Wang
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhi-Feng Gu
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xian-Run Pan
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Jian Li
  - School of Basic Medical Sciences, Chengdu University, Chengdu, China
- Hui Ding
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - *Correspondence: Hui Ding
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
  - *Correspondence: Yang Zhang
- Ke-Jun Deng
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - *Correspondence: Ke-Jun Deng

8
Su W, Deng S, Gu Z, Yang K, Ding H, Chen H, Zhang Z. Prediction of apoptosis protein subcellular location based on amphiphilic pseudo amino acid composition. Front Genet 2023; 14:1157021. [PMID: 36926588 PMCID: PMC10011625 DOI: 10.3389/fgene.2023.1157021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 02/20/2023] [Indexed: 03/08/2023] Open
Abstract
Introduction: Apoptosis proteins play an important role in cell apoptosis, which keeps the rates of cell proliferation and death in relative balance. Because the function of an apoptosis protein is closely related to its subcellular location, studying the subcellular locations of apoptosis proteins is of great significance. Many efforts in bioinformatics research have been aimed at predicting their subcellular location. However, the subcellular localization of apoptotic proteins needs to be carefully studied. Methods: In this paper, based on amphiphilic pseudo amino acid composition and the support vector machine algorithm, a new method was proposed for predicting the subcellular location of apoptosis proteins. Results and Discussion: The method achieved good performance on three data sets. The jackknife test accuracy on the three data sets reached 90.5%, 93.9% and 84.0%, respectively. Compared with previous methods, the prediction accuracies of APACC_SVM were improved.
Affiliation(s)
- Wenxia Su
  - College of Science, Inner Mongolia Agriculture University, Hohhot, China
- Shuyi Deng
  - School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- Zhifeng Gu
  - School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- Keli Yang
  - Nonlinear Research Institute, Baoji University of Arts and Sciences, Baoji, China
- Hui Ding
  - School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hui Chen
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Zhaoyue Zhang
  - School of Life Science and Technology, Center for Information Biology, University of Electronic Science and Technology of China, Chengdu, China; School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China

9
Wang Y, Zhang Y, Zhang Y, Gu Z, Zhang Z, Lin H, Deng K. Identification of adaptor proteins using the ANOVA feature selection technique. Methods 2022; 208:42-47. [DOI: 10.1016/j.ymeth.2022.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 10/01/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open

10
Zhao D, Wang L, Chen Z, Zhang L, Xu L. KRAS is a prognostic biomarker associated with diagnosis and treatment in multiple cancers. Front Genet 2022; 13:1024920. [PMID: 36330448 PMCID: PMC9624065 DOI: 10.3389/fgene.2022.1024920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 09/20/2022] [Indexed: 11/21/2022] Open
Abstract
KRAS encodes K-Ras proteins, which take part in the MAPK pathway. The expression level of KRAS is high in tumor patients. Our study compared KRAS expression levels across 33 kinds of tumor tissues. Additionally, we studied the association of KRAS expression levels with diagnostic and prognostic values, clinicopathological features, and tumor immunity. We established 22 immune-infiltrating cell expression datasets to calculate immune and stromal scores to evaluate the tumor microenvironment. KRAS, immune checkpoint genes and interacting genes were selected to construct the PPI network. We selected 79 immune checkpoint genes and interacting related genes to calculate the correlation. Based on the 33 tumor expression datasets, we conducted GSEA (gene set enrichment analysis) to show the KRAS and other co-expressed genes associated with cancers. KRAS may be a reliable prognostic biomarker in the diagnosis of cancer patients and has the potential to be included in cancer-targeted drugs.
Affiliation(s)
- Da Zhao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of food and drug, Shenzhen Polytechnic, Shenzhen, China
- Lizhuang Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Zheng Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of food and drug, Shenzhen Polytechnic, Shenzhen, China
- Lijun Zhang
  - School of food and drug, Shenzhen Polytechnic, Shenzhen, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
  - *Correspondence: Lei Xu

11
Yuan SS, Gao D, Xie XQ, Ma CY, Su W, Zhang ZY, Zheng Y, Ding H. IBPred: a sequence-based predictor for identifying ion binding protein in phage. Comput Struct Biotechnol J 2022; 20:4942-4951. [PMID: 36147670 PMCID: PMC9474292 DOI: 10.1016/j.csbj.2022.08.053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/16/2022] Open
Abstract
Ion binding proteins (IBPs) can selectively and non-covalently interact with ions. IBPs in phages also play an important role in biological processes. Therefore, accurate identification of IBPs is necessary for understanding their biological functions and the molecular mechanisms that involve binding to ions. Since experimental molecular biology methods for identifying IBPs are still labor-intensive and costly, it is helpful to develop computational methods that identify IBPs quickly and efficiently. In this work, a random forest (RF)-based model was constructed to quickly identify IBPs. Based on protein sequence information and residues’ physicochemical properties, the dipeptide composition combined with the physicochemical correlation between two residues was proposed for feature extraction. A feature selection technique called analysis of variance (ANOVA) was used to exclude redundant information. By comparison with other classification methods, we demonstrated that our method could identify IBPs accurately. Based on the model, a Python package named IBPred was built; its source code can be accessed at https://github.com/ShishiYuan/IBPred.
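Two of the building blocks named above, dipeptide composition and ANOVA-based feature scoring, can be sketched with the standard library alone. This is a minimal illustration, not the IBPred implementation: the physicochemical-correlation features are omitted, and the function names and example inputs are assumptions.

```python
from itertools import product
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional vector of dipeptide frequencies for one sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature: between-group over within-group variance."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    # Returns infinity when the within-group variance is zero.
    return between / within if within > 0 else float("inf")


vector = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
f_value = anova_f_score([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(len(vector), vector[DIPEPTIDES.index("AC")], f_value)
```

In a selection step, each of the 400 columns would be scored this way across the training sequences and the features ranked by F value, so that columns varying mostly within classes drop to the bottom.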

12
Liu S, Cui C, Chen H, Liu T. Ensemble Learning-Based Feature Selection for Phage Protein Prediction. Front Microbiol 2022; 13:932661. [PMID: 35910662 PMCID: PMC9335128 DOI: 10.3389/fmicb.2022.932661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022] Open
Abstract
Phage has high specificity for its host recognition. As a natural enemy of bacteria, it has been used to treat super bacteria many times. Identifying phage proteins from the original sequence is very important for understanding the relationship between phage and host bacteria and developing new antimicrobial agents. However, traditional experimental methods are both expensive and time-consuming. In this study, an ensemble learning-based feature selection method is proposed to find important features for phage protein identification. The method uses four types of protein sequence-derived features, quantifies the importance of each feature by adding perturbations to the features to influence the results, and finally splices the important features among the four types of features. In addition, we analyzed the selected features and their biological significance.
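The idea of quantifying a feature's importance by perturbing it and watching the result change can be sketched as follows. The abstract does not say which perturbation or which ensemble model the authors used, so this sketch assumes column shuffling (permutation importance) and a deliberately trivial stand-in model; `toy_model` and the data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def perturbation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled, for each feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)  # perturb only feature f, leave the others intact
            perturbed = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, perturbed, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Stand-in classifier (hypothetical): thresholds feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 3.0], [0.2, 7.0], [0.3, 1.0], [0.8, 6.0], [0.9, 2.0], [0.7, 5.0]]
labels = [0, 0, 0, 1, 1, 1]
importances = perturbation_importance(toy_model, rows, labels)
print(importances)
```

A feature the model never reads gets an importance of exactly zero, because shuffling it cannot change any prediction; features the model relies on show a non-negative drop here, since the baseline accuracy on this toy data is already 1.0.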
Affiliation(s)
- Songbo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chengmin Cui
  - Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing, China
- Huipeng Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - *Correspondence: Huipeng Chen
- Tong Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

13
Hua Y, Wang H, Ye Z, Zheng D, Zhang X. An integrated pan-cancer analysis of identifying biomarkers about the EGR family genes in human carcinomas. Comput Biol Med 2022. [DOI: 10.1016/j.compbiomed.2022.105889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 06/25/2022] [Accepted: 07/16/2022] [Indexed: 12/24/2022]

14
Liu P, Zhou Y, Dong X, Zheng B, Liang B, Liang R, Liu Z, Li L, Gong P, Wang F. ZNF165 Is Involved in the Regulation of Immune Microenvironment and Promoting the Proliferation and Migration of Hepatocellular Carcinoma by AhR/CYP1A1. J Immunol Res 2022; 2022:1-12. [PMID: 35692498 PMCID: PMC9177304 DOI: 10.1155/2022/4446805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 04/13/2022] [Accepted: 04/15/2022] [Indexed: 01/27/2023] Open
Abstract
The strong tumorigenic capacity and treatment resistance of hepatocellular carcinoma (HCC) make it a major threat to public health. ZNF165, a member of the Krüppel family of zinc-finger-containing transcription factors, is expressed in HCC; however, its specific role in HCC and the underlying molecular mechanism are yet to be elucidated. In this study, we observed that ZNF165 was overexpressed in liver cancer tissues and the immune microenvironment, and that higher ZNF165 expression was correlated with lower overall survival in liver cancer patients. ZNF165 knockdown in Bel7402 cells impaired the tryptophan/kynurenine/AhR/CYP1A1 axis. Moreover, the knockdown of CYP1A1 significantly inhibited the proliferation and migration of HCC cells, and ZNF165 promoted the transcriptional activity of AhR by facilitating the nuclear translocation of CYP1A1. In conclusion, the present study indicates that ZNF165 is highly expressed in liver tissues and the immune microenvironment. ZNF165 promoted the proliferation and migration of HCC cells by activating the tryptophan/kynurenine/AhR/CYP1A1 axis and promoting the expression of CYP1A1.
|
15
|
Liu P, Ding Y, Rong Y, Chen D. Prediction of cell penetrating peptides and their uptake efficiency using random forest‐based feature selections. AIChE J 2022. [DOI: 10.1002/aic.17781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Peng Liu
- Institute of Fundamental and Frontier Sciences University of Electronic Science and Technology of China Chengdu China
- Institute of Yangtze Delta Region (Quzhou) University of Electronic Science and Technology of China Quzhou China
| | - Yijie Ding
- Institute of Yangtze Delta Region (Quzhou) University of Electronic Science and Technology of China Quzhou China
| | - Ying Rong
- Beidahuang Industry Group General Hospital Harbin China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University Quzhou China
| |
|
16
|
Huang G, He X, Xue Z, Long Y, Liu J, Cai J, Tang P, Han B, Shen B, Huang R, Yan J. Rauwolfia vomitoria extract suppresses benign prostatic hyperplasia by inducing autophagic apoptosis through endoplasmic reticulum stress. BMC Complement Med Ther 2022; 22:125. [PMID: 35513857 PMCID: PMC9074266 DOI: 10.1186/s12906-022-03610-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Accepted: 04/25/2022] [Indexed: 11/26/2022] Open
Abstract
Background: The current drug treatments for benign prostatic hyperplasia (BPH) have negative side effects. Therefore, it is important to find effective alternative therapies with significantly fewer side effects. Our previous study revealed that Rauwolfia vomitoria (RWF) root bark extract reversed BPH development in a rat model. However, the molecular mechanism of its inhibitory effects on BPH remains largely unknown. Methods: BPH-1 and WPMY-1 cell lines, derived from BPH epithelial and prostatic stromal compartments, were selected to investigate how RWF extract inhibits BPH in vitro by MTT and flow cytometry assays. Microarray, quantitative real-time PCR, immunoblotting, and GFP-LC3 immunofluorescence assays were performed to evaluate the effects of RWF extract on endoplasmic reticulum stress (ER stress) and autophagic apoptosis pathways in the two cell lines. A human BPH ex vivo explant assay was also employed for validation. Results: RWF extract treatment decreased cell viability and induced apoptotic cell death in both BPH-1 and WPMY-1 cells in a concentration-dependent manner, with an increase in the pro-apoptotic PDCD4 protein. RWF extract induced autophagy by enhancing the levels of autophagic genes (ULK2 and SQSTM1/p62) and the LC3II:LC3I ratio, with an increase in GFP-LC3 puncta. Moreover, RWF extract activated PERK- and ATF6-associated ER stress pathways by inducing the transcriptional levels of EIF2AK3/PERK, DDIT3/CHOP and ATF6, accompanied by a reduction in the BiP protein level but not its mRNA level. Another ER stress pathway was not induced by RWF extract, as manifested by the lack of XBP1 splicing. Pharmacological inhibition of autophagy by 3-methyladenine abrogated apoptosis but not ER stress, whereas inhibition of ER stress by 4-phenylbutyrate alleviated the induction of autophagy and apoptosis. In addition, pretreatment with either 3-methyladenine or 4-phenylbutyrate suppressed RWF extract-induced cytotoxicity. Notably, the induction of PERK- and ATF6-related stress pathways and autophagic apoptosis was confirmed in a human BPH ex vivo explant. Conclusions: Our data demonstrate that RWF extract significantly suppressed the viability of BPH epithelial cells and BPH myofibroblasts by inducing apoptosis via upregulation of ER stress and autophagy. These data indicate that RWF extract is a potential novel alternative therapeutic approach for BPH. Supplementary Information: The online version contains supplementary material available at 10.1186/s12906-022-03610-4.
Affiliation(s)
- Guifang Huang
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Avenue, Nanjing, 210023, Jiangsu, China.,Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Xiao He
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Avenue, Nanjing, 210023, Jiangsu, China.,Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China
| | - Zesheng Xue
- Model Animal Research Center of Nanjing University, 12 Xuefu Road, Nanjing, 210061, Jiangsu, China
| | - Yiming Long
- Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.,University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Beijing, 100049, China
| | - Jiakuan Liu
- Department of Laboratory Animal Science, Fudan University, 130 Dong'an Road, Shanghai, 200032, China
| | - Jinming Cai
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 100 Haining Road, Shanghai, 200080, China
| | - Pengfei Tang
- Department of Urology, Shanghai General Hospital of Nanjing Medical University, Shanghai, 200080, China
| | - Bangmin Han
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 100 Haining Road, Shanghai, 200080, China
| | - Bing Shen
- Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, 100 Haining Road, Shanghai, 200080, China.,Department of Urology, Shanghai General Hospital of Nanjing Medical University, Shanghai, 200080, China
| | - Ruimin Huang
- School of Chinese Materia Medica, Nanjing University of Chinese Medicine, 138 Xianlin Avenue, Nanjing, 210023, Jiangsu, China.,Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, China.,University of Chinese Academy of Sciences, No.19(A) Yuquan Road, Beijing, 100049, China
| | - Jun Yan
- Department of Laboratory Animal Science, Fudan University, 130 Dong'an Road, Shanghai, 200032, China.
| |
|
17
|
Zhao S, Pan Q, Zou Q, Ju Y, Shi L, Su X, Liao C. Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. Computational and Mathematical Methods in Medicine 2022; 2022:1-11. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]
Abstract
Enhancers are a class of noncoding DNA elements located near structural genes. In recent years, their identification and classification have been a focus of bioinformatics research. However, because enhancers are freely scattered and variable in position, there is still considerable room for improvement even though prediction models have steadily become more accurate. In this paper, density-based spatial clustering of applications with noise (DBSCAN) was used to screen the physicochemical properties of dinucleotides, from which dinucleotide-based auto-cross covariance (DACC) features were extracted; the features were then reduced with the Python feature selection toolkit MRMD 2.0. The reduced features were input into a random forest to identify enhancers. The enhancer classification model was built with word2vec and an attention-based Bi-LSTM. The accuracies of our enhancer identification and classification models were 77.25% and 73.50%, respectively, and the Matthews correlation coefficients (MCCs) were 0.5470 and 0.4881, respectively, which were better than the performance of most predictors.
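To make the DACC features concrete, the sketch below computes auto and cross covariances of dinucleotide property values at several lags, following the commonly used definition (mean-centred products of property values separated by a lag, averaged over the sequence). It is an assumption that the paper uses exactly this form. The two property tables are placeholders, not real physicochemical values, and all names are hypothetical.

```python
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]

# Placeholder property tables (NOT real physicochemical values).
PROPS = {
    "prop_a": {d: i / 15 for i, d in enumerate(DINUCS)},
    "prop_b": {d: ((i * 7) % 16) / 15 for i, d in enumerate(DINUCS)},
}

def dinuc_series(seq, table):
    """Map a DNA sequence to the property value of each overlapping dinucleotide."""
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]

def cross_cov(a, b, lag):
    """Covariance between series a at position i and series b at position i + lag.

    With a == b this is the auto covariance. Requires lag < len(a).
    """
    n = len(a) - lag
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((a[i] - mean_a) * (b[i + lag] - mean_b) for i in range(n)) / n

def dacc(seq, props, max_lag):
    """Concatenate auto (u == v) and cross (u != v) covariances for lags 1..max_lag."""
    series = {name: dinuc_series(seq, t) for name, t in props.items()}
    feats = []
    for lag in range(1, max_lag + 1):
        for u in series:
            for v in series:
                feats.append(cross_cov(series[u], series[v], lag))
    return feats

features = dacc("ACGTTGCAAGCT", PROPS, max_lag=2)
```

With N properties and a maximum lag of LAG, this yields N x N x LAG values, so the vector length is fixed regardless of sequence length. That is what lets sequences of different lengths feed the same classifier.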
|
18
|
Zhang H, Zou Q, Ju Y, Song C, Chen D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220404145517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background:
DNA N6-methyladenine plays an important role in the restriction-modification system, which protects against invasion by foreign DNA. Experimental methods are time-consuming and costly, so computational methods have emerged. Support vector machine theory has received extensive attention in the bioinformatics field due to its solid theoretical foundation and many good characteristics.
Objective:
General machine learning methods include an important step of extracting features. This research omits that step and replaces it with an easy-to-obtain sequence distance matrix to obtain better results.
Method:
First, sequence alignment was used to obtain the similarity matrix. Then a novel transformation turned the similarity matrix into a distance matrix. Next, the similarity-distance matrix was made positive semi-definite so that it could be used as the kernel matrix. Finally, the LIBSVM software was applied to solve the support vector machine.
Results:
Five-fold cross-validation of this model on rice and mouse data achieved accuracies of 92.04% and 96.51%, respectively. This shows that the DB-SVM method has clear advantages over traditional machine learning methods. Meanwhile, this model achieved accuracies of 0.943, 0.982 and 0.818, Matthews correlation coefficients of 0.944, 0.982 and 0.838, and F1 scores of 0.942, 0.982 and 0.840 for the rice, M. musculus and cross-species genome datasets, respectively.
Conclusion:
These outcomes show that this model outperforms iIM-CNN and csDMA, the latest methods for predicting DNA 6mA modification.
Affiliation(s)
- Haoyu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen 361005, China
| | - Chenggang Song
- Beidahuang Industry Group General Hospital, Harbin 150001, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou 324000, China
| |
|
19
|
Wang Z, Zhang Y, Li Q, Zou Q, Liu Q. A road map for happiness: The psychological factors related cell types in various parts of human body from single cell RNA-seq data analysis. Comput Biol Med 2022; 143:105286. [PMID: 35183972 DOI: 10.1016/j.compbiomed.2022.105286] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/03/2022] [Revised: 01/16/2022] [Accepted: 01/24/2022] [Indexed: 12/13/2022]
Abstract
Extensive evidence from zoology, neurobiology, immunology and other fields has confirmed that psychological factors can produce remarkable physiological effects. Researchers have long been aware of the potential value of these effects and have wanted to harness them in the development of new drugs and therapies, for which mechanistic study is a necessary prerequisite. However, most of these studies are restricted to neuroscience, or start with blood samples and fall into the area of immunity. In this study, we focus on the psychological factor of happiness, mining publicly available single cell RNA sequencing (scRNA-seq) data for the expression of happiness-related genes, collected from various literature sources, in all cell types in the samples. We find that the expression of these genes is not restricted to neuro-regulated cells or tissue-resident immune cells; on the contrary, cell types that are unique to a tissue or organ and are not directly regulated by the nervous system account for the majority of cells expressing the happiness-related genes. Our research is a preliminary exploration of where the body responds to the mind at the cell level, and lays the foundation for more detailed mechanistic research.
Affiliation(s)
- Ziwei Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, China
| | - Ying Zhang
- Department of Anesthesiology, Hospital T.C.M Affiliated to Southwest Medical University, Luzhou, China
| | - Qun Li
- Department of Pain, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, China; Yangtze Delta Region Institute Quzhou, University of Electronic Science and Technology of China, Quzhou, Zhejiang, China.
| | - Qing Liu
- Department of Algology, Hospital T.C.M Affiliated to Southwest Medical University, Luzhou, China.
| |
|
20
|
Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: Relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Comput Biol Med 2022; 143:105322. [PMID: 35217342 DOI: 10.1016/j.compbiomed.2022.105322] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/05/2022] [Revised: 02/11/2022] [Accepted: 02/13/2022] [Indexed: 12/21/2022]
Abstract
Recently, a large number of studies have indicated that circRNAs with covalently closed loops play important roles in biological processes and have potential as diagnostic biomarkers. Therefore, research on the circRNA-disease relationship is helpful in disease diagnosis and treatment. However, traditional biological verification methods require considerable labor and time costs. In this paper, we propose a new computational method (RGCNCDA) to predict circRNA-disease associations based on relational graph convolutional networks (R-GCNs). The method first integrates the circRNA similarity network, miRNA similarity network, disease similarity network and association networks among them to construct a global heterogeneous network. Then, it employs the random walk with restart (RWR) and principal component analysis (PCA) models to learn low-dimensional and high-order information from the global heterogeneous network as the topological features. Finally, a prediction model based on an R-GCN encoder and a DistMult decoder is built to predict the potential disease-associated circRNA. The predicted results demonstrate that RGCNCDA performs significantly better than the other six state-of-the-art methods in a 5-fold cross validation. Furthermore, the case study illustrates that RGCNCDA can effectively discover potential circRNA-disease associations.
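The random walk with restart (RWR) step mentioned in this abstract can be illustrated on a small graph. The sketch below uses the standard update p(t+1) = (1 - r) W p(t) + r p(0) with a column-normalized adjacency matrix W. The restart probability, the toy path graph, and the function name are assumptions for illustration and are not taken from RGCNCDA.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walker restarting at `seed`."""
    n = len(adj)
    # Column-normalize so each column holds transition probabilities out of node j.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Toy path graph 0 - 1 - 2 - 3 (hypothetical, undirected, unweighted).
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
profile = random_walk_with_restart(adj, seed=0)
```

Running the walk once per node gives one probability profile per node; stacking these profiles yields a matrix that a dimensionality reduction step such as PCA could then compress into topological features.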
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Yanpeng Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Xi Su
- Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, China.
| | - Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China.
| |
|
21
|
Zhao S, Ding Y, Liu X, Su X. HKAM-MKM: A hybrid kernel alignment maximization-based multiple kernel model for identifying DNA-binding proteins. Comput Biol Med 2022; 145:105395. [PMID: 35334314 DOI: 10.1016/j.compbiomed.2022.105395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/07/2022] [Revised: 03/08/2022] [Accepted: 03/08/2022] [Indexed: 12/24/2022]
Abstract
The identification of DNA-binding proteins (DBPs) has always been a hot issue in the field of sequence classification. However, because experimental identification is very resource-intensive, the construction of a computational prediction model is worthwhile. This study developed and evaluated a hybrid kernel alignment maximization-based multiple kernel model (HKAM-MKM) for predicting DBPs. First, we collected two datasets and performed feature extraction on the sequences to obtain six feature groups, and then constructed the corresponding kernels. To ensure the effective utilisation of the base kernels and avoid ignoring the difference between a sample and its neighbours, we proposed local kernel alignment to calculate the kernel between each sample and its neighbours, with each sample as the centre. We combined the global and local kernel alignments to develop a hybrid kernel alignment model, and balanced the relationship between the two through parameters. By maximising the hybrid kernel alignment value, we obtained the weight of each kernel and then linearly combined the kernels using these weights. Finally, the fused kernel was input into a support vector machine for training and prediction. On the independent test sets PDB186 and PDB2272, we obtained the highest Matthews correlation coefficient (MCC) (0.768 and 0.5962, respectively) and the highest accuracy (87.1% and 78.43%, respectively), which were superior to the other predictors. Therefore, HKAM-MKM is an efficient prediction tool for DBPs.
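As a small illustration of the global kernel alignment that this abstract builds on, the sketch below computes the usual alignment score between a base kernel and the ideal kernel formed from the labels, A(K1, K2) = <K1, K2>_F / (||K1||_F ||K2||_F). The local alignment and the weight optimisation of HKAM-MKM are not reproduced here; the linear kernels, toy data, and function names are assumptions.

```python
import math

def frobenius_inner(A, B):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def kernel_alignment(K1, K2):
    """Cosine similarity between two kernel matrices under the Frobenius inner product."""
    return frobenius_inner(K1, K2) / math.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2))

def linear_kernel(X):
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def ideal_kernel(y):
    """Target kernel y y^T for labels in {-1, +1}."""
    return [[yi * yj for yj in y] for yi in y]

# Toy data (hypothetical): one informative feature group and one uninformative group.
y = [1, 1, -1, -1]
K_good = linear_kernel([[1.0], [0.9], [-1.0], [-1.1]])
K_noise = linear_kernel([[1.0], [-1.0], [1.0], [-1.0]])
target = ideal_kernel(y)

align_good = kernel_alignment(K_good, target)
align_noise = kernel_alignment(K_noise, target)
```

A base kernel with a higher alignment to the label kernel is a natural candidate for a larger weight in the fused kernel, which is the intuition behind alignment-based multiple kernel learning.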
|
22
|
Chen Z, Jiao S, Zhao D, Zou Q, Xu L, Zhang L, Su X. The Characterization of Structure and Prediction for Aquaporin in Tumour Progression by Machine Learning. Front Cell Dev Biol 2022; 10:845622. [PMID: 35178393 PMCID: PMC8844512 DOI: 10.3389/fcell.2022.845622] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 01/17/2022] [Indexed: 11/21/2022] Open
Abstract
Recurrence and new cases of cancer constitute a challenging human health problem. Aquaporins (AQPs) can be expressed in many types of tumours, including those of the brain, breast, pancreas, colon, skin, ovaries, and lungs, and the histological grade of cancer is positively correlated with AQP expression. Therefore, the identification of aquaporins is an area to explore. Computational tools play an important role in aquaporin identification. In this research, we propose a reliable, accurate and automated sequence predictor, iAQPs-RF, to identify AQPs. The feature extraction method was 188D (global protein sequence descriptor, GPSD). Six common classifiers, including random forest (RF), NaiveBayes (NB), support vector machine (SVM), XGBoost, logistic regression (LR) and decision tree (DT), were used for AQP classification. The classification results show that the random forest (RF) algorithm is the most suitable machine learning algorithm, with an accuracy of 97.689%. Analysis of variance (ANOVA) was used to analyse these characteristics. Feature ranking based on ANOVA, combined with an incremental feature selection (IFS) strategy, was applied to search for the optimal features. The classification results suggest that the 26th feature (neutral/hydrophobic) and 21st feature (hydrophobic) are the two most powerful and informative features that distinguish AQPs from non-AQPs. Previous studies reported that plasma membrane proteins have hydrophobic characteristics. Aquaporin subcellular localization prediction showed that all aquaporins were plasma membrane proteins with highly conserved transmembrane structures. In addition, the 3D structure of aquaporins was consistent with the localization results. Therefore, these studies confirmed that aquaporins possess hydrophobic properties. Although aquaporins are highly conserved transmembrane structures, the phylogenetic tree shows the diversity of aquaporins during evolution. The PCA showed that positive and negative samples were well separated by 54D features, indicating that the 54D feature can effectively classify aquaporins. The online prediction server is accessible at http://lab.malab.cn/~acy/iAQP.
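The ANOVA-based ranking followed by incremental feature selection can be sketched as below. The one-way ANOVA F statistic is the standard ratio of between-group to within-group variance. The toy data and function names are hypothetical, and the classifier that would score each candidate subset is left out.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the label groups."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values()) / (k - 1)
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g) / (n - k)
    return float("inf") if within == 0 else between / within

def rank_features(X, y):
    """Feature indices sorted from highest to lowest F statistic."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def incremental_subsets(ranking):
    """IFS candidates: the top 1, top 2, ... features in ranked order."""
    return [ranking[:m] for m in range(1, len(ranking) + 1)]

# Toy data (hypothetical): column 1 separates the classes, column 0 does not.
X = [[0.5, 1.0], [0.4, 2.0], [0.6, 3.0], [0.5, 5.0], [0.6, 6.0], [0.4, 7.0]]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_features(X, y)
subsets = incremental_subsets(ranking)
```

In an IFS run, each subset would be evaluated with cross-validation and the smallest subset reaching the best score would be kept.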
Affiliation(s)
- Zheng Chen
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Da Zhao
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.,Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Lijun Zhang
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China
| | - Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, China
| |
|
23
|
Zhang S, Jiang H, Gao B, Yang W, Wang G. Identification of Diagnostic Markers for Breast Cancer Based on Differential Gene Expression and Pathway Network. Front Cell Dev Biol 2022; 9:811585. [PMID: 35096840 PMCID: PMC8790293 DOI: 10.3389/fcell.2021.811585] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/09/2021] [Accepted: 12/13/2021] [Indexed: 11/13/2022] Open
Abstract
Background: Breast cancer is the second most common cancer worldwide; its incidence continues to rise, and it seriously threatens women's health. Therefore, it is very important to explore the characteristic changes of breast cancer at the gene level, including the screening of differentially expressed genes and the identification of diagnostic markers. Methods: The gene expression profiles of breast cancer were obtained from the TCGA database. The edgeR R software package was used to screen the differentially expressed genes between breast cancer patients and normal samples. Function and pathway enrichment analysis of these genes revealed significantly enriched functions and pathways. Next, these pathways were downloaded from the KEGG website, the gene interaction relations were extracted, and the KEGG pathway gene interaction network was constructed. The potential diagnostic markers of breast cancer were obtained by combining the differentially expressed genes with the key genes in the network. Finally, these markers were used to construct a diagnostic prediction model of breast cancer, and the predictive ability of the model and the diagnostic ability of the markers were verified with internal and external data. Results: A total of 1060 differentially expressed genes were identified between breast cancer patients and normal controls. Enrichment analysis revealed 28 significantly enriched pathways (p < 0.05). These were downloaded from the KEGG website, and the gene interaction relations were extracted to construct the KEGG pathway gene interaction network, which contained 1277 nodes and 7345 edges. The key nodes with a degree greater than 30 were extracted from the network, giving 154 genes. These 154 key genes shared 23 genes with the differentially expressed genes, and these 23 genes serve as potential diagnostic markers for breast cancer. The 23 genes were used as features to construct an SVM classification model, and the model had good predictive ability in both the training dataset and the validation dataset (AUC = 0.960 and 0.907, respectively). Conclusion: This study showed that differences in gene expression level are important for the diagnosis of breast cancer, and identified 23 breast cancer diagnostic markers, which provide valuable information for clinical diagnosis and basic treatment experiments.
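The marker selection step described in this abstract (network nodes with a degree above a threshold, intersected with the differentially expressed genes) reduces to a few set operations. The sketch below uses a made-up edge list with hypothetical gene labels and a small threshold; the study itself reports a degree cut-off of 30.

```python
from collections import defaultdict

def key_genes(edges, min_degree):
    """Genes whose number of distinct interaction partners exceeds min_degree."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops when counting degree
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {gene for gene, partners in neighbors.items() if len(partners) > min_degree}

def candidate_markers(edges, differentially_expressed, min_degree=30):
    """Intersection of high-degree network genes and differentially expressed genes."""
    return key_genes(edges, min_degree) & set(differentially_expressed)

# Toy network (hypothetical gene labels); G1 and G2 are the hubs.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"),
         ("G2", "G5"), ("G4", "G5"), ("G1", "G2")]
degs = ["G2", "G4", "G6"]
markers = candidate_markers(edges, degs, min_degree=2)
```

Using sets of neighbours means a duplicated edge, such as the repeated G1-G2 pair above, does not inflate a gene's degree.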
Affiliation(s)
- Shumei Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Haoran Jiang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Bo Gao
- Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
| | - Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| |
|
24
|
Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022; 12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022] Open
Abstract
The exploration of DNA-binding proteins (DBPs) is an important aspect of studying biological life activities. Research on life activities requires the support of scientific research results on DBPs. The decline in many life activities is closely related to DBPs. Generally, the detection method for identifying DBPs is achieved through biochemical experiments. This method is inefficient and requires considerable manpower, material resources and time. At present, several computational approaches have been developed to detect DBPs, among which machine learning (ML) algorithm-based computational techniques have shown excellent performance. In our experiments, our method uses fewer features and simpler recognition methods than other methods and simultaneously obtains satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, this feature information is spliced together, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) model is used to construct an effective predictive model. Compared with other excellent methods, our proposed method has achieved better results. The accuracy achieved by our method is 78.26% for PDB2272 and 85.48% for PDB186. The accuracy of the experimental results achieved by our strategy is similar to that of previous detection methods.
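The "splice, then standardize" preprocessing mentioned in this abstract can be sketched as follows. Z-score scaling is assumed here, since the abstract does not say which standardization was used, and the function names and toy vectors are hypothetical. The resulting matrix would then be passed to a gradient boosting classifier.

```python
import statistics

def splice(feature_groups):
    """Concatenate, per sample, the vectors produced by several feature extractors."""
    return [sum((list(vec) for vec in vecs), []) for vecs in zip(*feature_groups)]

def standardize(X):
    """Z-score each column; constant columns are left at zero."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

# Two hypothetical extractors applied to the same two sequences.
group_a = [[1.0, 2.0], [3.0, 4.0]]
group_b = [[5.0], [5.0]]
X = standardize(splice([group_a, group_b]))
```

In practice the column means and standard deviations would be estimated on the training split only and reused for the test split, so that no test information leaks into the model.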
Affiliation(s)
- Ziye Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
| | - Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yingjian Liang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
| |
|
25
|
Liu M, Chen H, Gao D, Ma C, Zhang Z, Manavalan B. Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features. Computational and Mathematical Methods in Medicine 2022; 2022:1-7. [PMID: 35069791 PMCID: PMC8769816 DOI: 10.1155/2022/7493834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2021] [Accepted: 12/16/2021] [Indexed: 11/28/2022]
Abstract
Helicobacter pylori (H. pylori) is the most common risk factor for gastric cancer worldwide. The membrane proteins of H. pylori are involved in bacterial adherence and play a vital role in the field of drug discovery. Thus, an accurate and cost-effective computational model is needed to predict the uncharacterized membrane proteins of H. pylori. In this study, a reliable benchmark dataset consisting of 114 membrane and 219 nonmembrane proteins was constructed based on UniProt. A support vector machine (SVM)-based model was developed for discriminating H. pylori membrane proteins from nonmembrane proteins by using sequence information. Cross-validation showed that our method achieved good performance with an accuracy of 91.29%. It is anticipated that the proposed model will be useful for the annotation of H. pylori membrane proteins and the development of new anti-H. pylori agents.
|
26
|
Zhang Z, Gong Y, Gao B, Li H, Gao W, Zhao Y, Dong B. SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles. Front Genet 2022; 12:809001. [PMID: 34987554 PMCID: PMC8721734 DOI: 10.3389/fgene.2021.809001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/04/2021] [Accepted: 11/15/2021] [Indexed: 12/20/2022] Open
Abstract
Soluble N-ethylmaleimide sensitive factor activating protein receptor (SNARE) proteins are a large family of transmembrane proteins located in organelles and vesicles. The important roles of SNARE proteins include initiating the vesicle fusion process and activating and fusing proteins as they undergo exocytosis activity, and SNARE proteins are also vital for the transport regulation of membrane proteins and non-regulatory vesicles. Therefore, it is of great significance to establish a method to efficiently identify SNARE proteins. However, the identification accuracy of existing methods such as SNARE CNN is not satisfactory. In our study, we developed a method based on a support vector machine (SVM) that can effectively recognize SNARE proteins. We used the position-specific scoring matrix (PSSM) method to extract features of SNARE protein sequences, used the support vector machine recursive elimination correlation bias reduction (SVM-RFE-CBR) algorithm to rank the importance of features, and then screened out the optimal subset of feature data based on the sorted results. We input the feature data into the model, used 10-fold cross-validation for training, and tested model performance on an independent dataset. In independent tests, our method identified SNARE proteins with a sensitivity of 68%, specificity of 94%, accuracy of 92%, area under the curve (AUC) of 84%, and Matthews correlation coefficient (MCC) of 0.48. The results of the experiment show that the common evaluation indicators of our method are excellent, indicating that our method performs better than other existing classification methods in identifying SNARE proteins.
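The evaluation measures quoted in this abstract all derive from the four confusion-matrix counts. The sketch below shows the standard formulas; the counts are invented for illustration and are not the study's data.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}

# Hypothetical counts: 100 positives and 100 negatives.
metrics = binary_metrics(tp=50, fn=50, tn=90, fp=10)
```

When negatives greatly outnumber positives, accuracy tracks specificity closely, which is one reason a high accuracy can coexist with a moderate sensitivity and MCC.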
Affiliation(s)
- Zixiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yue Gong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Bo Gao
- Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
| | - Hongfei Li
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Wentao Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Benzhi Dong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| |
|
27
|
Gu X, Guo L, Liao B, Jiang Q. Pseudo-188D: Phage Protein Prediction Based on a Model of Pseudo-188D. Front Genet 2021; 12:796327. [PMID: 34925468 PMCID: PMC8672092 DOI: 10.3389/fgene.2021.796327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Phages have seriously affected the biochemical systems of the world, and not only are phages related to our health, but medical treatments for many cancers and skin infections are related to phages; therefore, this paper sought to identify phage proteins. In this paper, a Pseudo-188D model was established. The digital features of the phage were extracted by PseudoKNC, an appropriate vector was selected by the AdaBoost tool, and features were extracted by 188D. Then, the extracted digital features were combined together, and finally, the viral proteins of the phage were predicted by a stochastic gradient descent algorithm. Our model effect reached 93.4853%. To verify the stability of our model, we randomly selected 80% of the downloaded data to train the model and used the remaining 20% of the data to verify the robustness of our model.
Affiliation(s)
- Xiaomei Gu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Institute of Yangtze River Delta, University of Electronic Science and Technology of China, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Lina Guo
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Qinghua Jiang
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China; School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
28
|
Abstract
Recently, several anti-inflammatory peptides (AIPs) have been found in the process of the inflammatory response, and these peptides have been used to treat some inflammatory and autoimmune diseases. Therefore, accurately identifying AIPs from given amino acid sequences is critical for the discovery of novel and efficient anti-inflammatory peptide-based therapeutics and for accelerating their application in therapy. In this paper, a random forest-based model called iAIPs is proposed for identifying AIPs. First, the original samples were encoded with three feature extraction methods: g-gap dipeptide composition (GDC), dipeptide deviation from the expected mean (DDE), and amino acid composition (AAC). Second, the optimal feature subset was generated by a two-step feature selection method, in which features were ranked by analysis of variance (ANOVA) and the optimal subset was then chosen by an incremental feature selection strategy. Finally, the optimal feature subset was input into the random forest classifier to construct the identification model. Experimental results showed that iAIPs achieved an AUC value of 0.822 on an independent test dataset, which indicated that the proposed model performs better than the existing methods. Furthermore, the extraction of features from peptide sequences provides a basis for evolutionary analysis. The study of peptide identification helps in understanding the diversity of species and analyzing their evolutionary history.
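Two of the encodings named in this abstract, AAC and g-gap dipeptide composition, can be sketched in a few lines. This is an illustrative sketch only: the function names, the example peptide, and the convention that "g-gap" means g residues lie between the paired positions are assumptions, since the abstract does not spell them out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: relative frequency of each residue (20 values)."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def g_gap_dipeptide(seq, g=1):
    """g-gap dipeptide composition: frequency of residue pairs with g residues between them (400 values)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(seq) - g - 1          # number of (i, i+g+1) position pairs
    if total <= 0:
        return [0.0] * len(pairs)
    for i in range(total):
        key = seq[i] + seq[i + g + 1]
        if key in counts:             # pairs with non-standard residues are not counted
            counts[key] += 1
    return [counts[p] / total for p in pairs]


peptide = "ACAC"                      # hypothetical toy peptide
aac_vec = aac(peptide)
gdc_vec = g_gap_dipeptide(peptide, g=1)
print(len(aac_vec), len(gdc_vec))
```

Concatenating such vectors gives the kind of hybrid feature matrix that a ranking step and a classifier would then consume.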
Affiliation(s)
- Dongxu Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Zhixia Teng
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
| | - Yanjuan Li
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
| |
|
29
|
Gong Y, Liao B, Wang P, Zou Q. DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins. Front Pharmacol 2021; 12:771808. [PMID: 34916947 PMCID: PMC8669608 DOI: 10.3389/fphar.2021.771808] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/07/2021] [Accepted: 11/15/2021] [Indexed: 01/09/2023] Open
Abstract
Drug targets are biological macromolecules or biomolecular structures that bind specifically to a particular drug to produce a therapeutic effect or regulate physiological functions. Because of the important value and role of drug targets, the prediction of potential drug targets has become a research hotspot in recent years. The key first step in the research and development of modern drugs is to identify potential drug targets. In this paper, a new predictor, DrugHybrid_BS, is developed based on hybrid features and Bagging-SVM to identify potentially druggable proteins. This method combines three features: monoDiKGap (k = 2), cross-covariance, and grouped amino acid composition. It removes redundant features and analyses key features through MRMD and MRMD2.0. The cross-validation results show that 96.9944% of the potentially druggable proteins can be accurately identified, and the accuracy on the independent test set reached 96.5665%. These results suggest that DrugHybrid_BS has the potential to become a useful predictive tool for druggable proteins. In addition, the hybrid key features combined with Bagging-SVM can identify 80.0343% of the potentially druggable proteins, which indicates the significance of this subset of features for research.
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Peng Wang
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
30
|
Han S, Wang N, Guo Y, Tang F, Xu L, Ju Y, Shi L. Application of Sparse Representation in Bioinformatics. Front Genet 2021; 12:810875. [PMID: 34976030 PMCID: PMC8715914 DOI: 10.3389/fgene.2021.810875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 12/01/2021] [Indexed: 11/15/2022] Open
Abstract
Inspired by L1-norm minimization methods such as basis pursuit, compressed sensing, and Lasso feature selection, sparse representation has emerged in recent years as a novel and powerful data processing method. Researchers have not only extended the sparse representation of a signal to image presentation, but also applied the sparsity of vectors to that of matrices. Moreover, sparse representation has been applied to pattern recognition with good results. Because of its multiple advantages, such as insensitivity to noise, strong robustness, less sensitivity to selected features, and no “overfitting” phenomenon, the application of sparse representation in bioinformatics should be studied further. This article reviews the development of sparse representation and explains its applications in bioinformatics, namely the use of low-rank representation matrices to identify and study cancer molecules and the use of low-rank sparse representations to analyze and process gene expression profiles. It also introduces related cancers and gene expression profile databases.
Affiliation(s)
- Shuguang Han
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Ning Wang
- Beidahuang Industry Group General Hospital, Harbin, China
| | - Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Furong Tang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- *Correspondence: Ying Ju; Lei Shi
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
| |
|
31
|
Abstract
Structural variations in the genome are closely related to human health and the occurrence and development of various diseases. To understand the mechanisms of diseases, find pathogenic targets, and carry out personalized precision medicine, it is critical to detect such variations. The rapid development of high-throughput sequencing technologies has accelerated the accumulation of large amounts of genomic mutation data, including synonymous mutations. Identifying pathogenic synonymous mutations that play important roles in the occurrence and development of diseases from all the available mutation data is of great importance. In this paper, machine learning theories and methods are reviewed, efficient and accurate pathogenic synonymous mutation prediction methods are developed, and a standardized three-level variant analysis framework is constructed. In addition, multiple variation tolerance prediction models are studied and integrated, and new ideas for structural variation detection based on deep information mining are explored.
Affiliation(s)
- Xiuchun Lin
- College of Information and Electrical Engineering, China Agricultural University, Beijing, China
| |
|
32
|
Ao C, Zou Q, Yu L. NmRF: identification of multispecies RNA 2'-O-methylation modification sites from RNA sequences. Brief Bioinform 2021; 23:6446272. [PMID: 34850821 DOI: 10.1093/bib/bbab480] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 08/09/2021] [Revised: 10/05/2021] [Accepted: 10/18/2021] [Indexed: 12/12/2022] Open
Abstract
2'-O-methylation (Nm) is a post-transcriptional modification of RNA that is catalyzed by 2'-O-methyltransferase and involves replacing the H on the 2'-hydroxyl group with a methyl group. The 2'-O-methylation modification site is detected in a variety of RNA types (miRNA, tRNA, mRNA, etc.), plays an important role in biological processes and is associated with different diseases. Few of its functional mechanisms have been elucidated so far, and traditional high-throughput experiments for exploring them are time-consuming and expensive. For a deeper understanding of the relevant biological mechanisms, it is necessary to develop efficient and accurate recognition tools based on machine learning. To this end, we constructed a predictor called NmRF, based on optimal mixed features and a random forest classifier, to identify 2'-O-methylation modification sites. The predictor can identify modification sites of multiple species at the same time. To obtain a better prediction model, a two-step strategy is adopted; that is, the optimal hybrid feature set is obtained by combining the light gradient boosting algorithm and an incremental feature selection strategy. In 10-fold cross-validation, the accuracies for Homo sapiens and Saccharomyces cerevisiae were 89.069% and 93.885%, and the AUCs were 0.9498 and 0.9832, respectively. The rigorous 10-fold cross-validation and independent tests confirm that the proposed method is significantly better than existing tools. A user-friendly web server is accessible at http://lab.malab.cn/~acy/NmRF.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
33
|
Chen J, Zhang Q, Liu T, Tang H. Roles of M6A Regulators in Hepatocellular Carcinoma: Promotion or Suppression. Curr Gene Ther 2021; 22:40-50. [PMID: 34825870 DOI: 10.2174/1566523221666211126105940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2021] [Revised: 06/15/2021] [Accepted: 10/14/2021] [Indexed: 11/22/2022]
Abstract
Hepatocellular carcinoma (HCC) is the sixth most commonly diagnosed cancer worldwide and has a poor prognosis. Although the pathological factors of hepatocellular carcinoma are well elucidated, the underlying molecular mechanisms remain unclear. N6-methyladenosine (m6A) is an adenosine methylation occurring at the N6 site, which is the most prevalent modification of eukaryotic mRNA. Recent studies have shown that m6A can regulate gene expression, thus modulating the processes of cell self-renewal, differentiation, and apoptosis. The methyl groups in m6A are installed by methyltransferases ("writers"), removed by demethylases ("erasers") and recognized by m6A-binding proteins ("readers"). In this review, we discuss the roles of the above regulators in the progression and prognosis of HCC, and summarize the clinical association between m6A modification and hepatocellular carcinoma, so as to provide more valuable information for clinical treatment.
Affiliation(s)
- Jiamao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| | - Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| | - Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| | - Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
| |
|
34
|
Abstract
BACKGROUND Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy represents the most promising treatment strategy due to its high efficacy and selectivity and lower side effects compared with traditional treatment. The identification of tumor T cell antigens is one of the most important tasks for antitumor vaccine development and molecular function investigation. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methodology is still challenging. METHODS In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding methods were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors and grouped amino acid and peptide composition. To improve the feature representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. RESULTS Finally, the top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF provides satisfactory performance, with balanced accuracy, specificity and sensitivity values of 83.71%, 78.73% and 88.69% over tenfold cross-validation as well as 73.14%, 62.67% and 83.61% over independent tests, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA. CONCLUSIONS We have shown that the proposed predictor iTTCA-RF is superior to the other latest models, and it will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
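The balanced accuracy values quoted in this abstract are consistent with the usual definition, the mean of sensitivity and specificity. A short check, using only the percentages reported above (the function name is illustrative):

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Percentages as reported in the abstract.
cv_bacc = balanced_accuracy(sensitivity=88.69, specificity=78.73)     # tenfold cross-validation
test_bacc = balanced_accuracy(sensitivity=83.61, specificity=62.67)   # independent test

print(f"cross-validation: {cv_bacc:.2f}%")
print(f"independent test: {test_bacc:.2f}%")
```

The drop from cross-validation to the independent test comes mostly from specificity, which falls by about 16 points while sensitivity falls by about 5.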
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Huannan Guo
- Department of Oncology, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China.
| | - Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China.
| |
|
35
|
Yang YH, Wang JS, Yuan SS, Liu ML, Su W, Lin H, Zhang ZY. A Survey for Predicting ATP Binding Residues of Proteins Using Machine Learning Methods. Curr Med Chem 2021; 29:789-806. [PMID: 34514982 DOI: 10.2174/0929867328666210910125802] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/24/2021] [Revised: 06/29/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
Protein-ligand interactions are necessary for the majority of protein functions. Adenosine-5'-triphosphate (ATP) is one such ligand; it plays a vital role as a coenzyme in providing energy for cellular activities, catalyzing biological reactions, and signaling. Knowing the ATP binding residues of proteins is helpful for protein function annotation and drug design. However, given the huge influx of protein sequences into databases in the post-genome era, experimentally identifying ATP binding residues is costly and time-consuming. To address this problem, computational methods have been developed to predict ATP binding residues. In this review, we briefly summarize the application of machine learning methods to detecting ATP binding residues of proteins. We expect this review to be helpful for further research.
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Jia-Shu Wang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Meng-Lu Liu
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Su
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zhao-Yue Zhang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
36
|
Zulfiqar H, Sun ZJ, Huang QL, Yuan SS, Lv H, Dao FY, Lin H, Li YW. Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli. Methods 2021; 203:558-563. [PMID: 34352373 DOI: 10.1016/j.ymeth.2021.07.011] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 07/01/2021] [Revised: 07/22/2021] [Accepted: 07/29/2021] [Indexed: 10/20/2022] Open
Abstract
N4-methylcytosine (4mC) is a type of DNA modification that can regulate several biological processes such as transcription regulation, replication and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded by the word embedding technique 'word2vec'. The obtained features were fed into a 1-D convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. Evaluation on an independent dataset showed that our model achieved an overall accuracy of 0.861, which was about 4.3% higher than the existing model. For the convenience of other researchers, we provide the data and source code of the model, which can be freely downloaded from https://github.com/linDing-groups/Deep-4mCW2V.
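Before a word2vec model can embed a DNA sequence, the sequence has to be split into "words". A common choice is overlapping k-mers; the sketch below assumes that choice, with k = 3 and stride 1 picked only for illustration, because the abstract does not state the tokenization used. The function names and the toy sequence are hypothetical.

```python
def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into k-mer 'words' taken every `stride` bases."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(token_lists):
    """Map each distinct k-mer to an integer id, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


tokens = kmer_tokens("GATTACA", k=3)      # toy sequence, not from the dataset
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

The resulting token lists are what an embedding trainer would take as its "sentences"; the learned vectors, stacked per sequence, would then form the input to a 1-D CNN.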
Affiliation(s)
- Hasan Zulfiqar
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lv
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
| | - Yan-Wen Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Key Laboratory of Intelligent Information Processing of Jilin Province, Northeast Normal University, Changchun 130117, China; Institute of Computational Biology, Northeast Normal University, Changchun 130117, China.
| |
|
37
|
Zulfiqar H, Yuan SS, Huang QL, Sun ZJ, Dao FY, Yu XL, Lin H. Identification of cyclin protein using gradient boost decision tree algorithm. Comput Struct Biotechnol J 2021; 19:4123-4131. [PMID: 34527186 PMCID: PMC8346528 DOI: 10.1016/j.csbj.2021.07.013] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022] Open
Abstract
Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, which activates cell cycle progression. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor prediction by sequence similarity-based methods. Thus, a machine learning model to identify cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation and composition/transition/distribution. Afterward, these features were optimized by using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validated results showed that our model identified cyclins with an accuracy of 93.06% and an AUC value of 0.971, which are higher than those of the two recent studies on the same data.
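The ranking-then-IFS step described in this abstract can be illustrated with a small sketch. It covers only the ANOVA ranking and the incremental search; mRMR is omitted, and `evaluate` is a hypothetical stand-in for whatever cross-validated score a real pipeline would compute. The data and all names are invented for illustration.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-score of a single feature: between-class over within-class variance."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand, k, n = mean(values), len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within else float("inf")


def rank_features(X, y):
    """Return feature indices sorted by descending F-score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def incremental_feature_selection(ranked, evaluate):
    """Add ranked features one at a time and keep the best-scoring prefix."""
    best_subset, best_score = [], float("-inf")
    for top_n in range(1, len(ranked) + 1):
        score = evaluate(ranked[:top_n])
        if score > best_score:
            best_subset, best_score = ranked[:top_n], score
    return best_subset, best_score


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.1, 6.0]]
y = [0, 0, 1, 1]
ranked = rank_features(X, y)
# Hypothetical scorer standing in for cross-validated accuracy.
best_subset, best_score = incremental_feature_selection(
    ranked, evaluate=lambda subset: 1.0 if subset == [0] else 0.5
)
print(ranked, best_subset, best_score)
```

In practice `evaluate` would train the classifier on the chosen columns and return its cross-validated accuracy, so the search cost grows linearly with the number of ranked features.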
Affiliation(s)
- Hasan Zulfiqar
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Qin-Lai Huang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Zi-Jie Sun
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fu-Ying Dao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou 570228, China
| | - Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
|
38
|
Abstract
A neurotoxin is essentially a protein that mainly acts on the nervous system; it has a selective toxic effect on the central nervous system and neuromuscular nodes, can cause muscle paralysis and respiratory paralysis, and has strong lethality. According to their principle of action, neurotoxins are divided into presynaptic neurotoxins and postsynaptic neurotoxins. Correctly identifying presynaptic and postsynaptic neurotoxins provides important clues for future drug development and the discovery of drug targets. Therefore, a predictive model, Neu_LR, was constructed in this paper. The monoMonokGap method was used to extract the frequency characteristics of presynaptic and postsynaptic neurotoxin sequences and to carry out feature selection. Then, based on the important features obtained after dimensionality reduction, the prediction model Neu_LR was constructed using a logistic regression algorithm and evaluated with ten-fold cross-validation and an independent test set. The final accuracy rates were 99.6078% and 94.1176%, respectively, which showed that the Neu_LR model had good predictive performance and robustness and could meet the prediction requirements for presynaptic and postsynaptic neurotoxins. The data and source code of the model can be freely downloaded from https://github.com/gyx123681/.
Affiliation(s)
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Yuxin Guo
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
| |
|
39
|
Ru X, Ye X, Sakurai T, Zou Q, Xu L, Lin C. Current status and future prospects of drug-target interaction prediction. Brief Funct Genomics 2021; 20:312-322. [PMID: 34189559 DOI: 10.1093/bfgp/elab031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/08/2021] [Revised: 06/01/2021] [Accepted: 06/04/2021] [Indexed: 01/09/2023] Open
Abstract
Drug-target interaction prediction is important for drug development and drug repurposing. Many computational methods have been proposed for drug-target interaction prediction because of their potential to reduce time and cost. In this review, we introduce the molecular docking and machine learning-based methods that have been widely applied to drug-target interaction prediction. In particular, machine learning-based methods are divided into different types according to the data processing form and task type. For each type of method, we provide a specific description and propose some solutions to improve its capability. Heterogeneous networks and learning to rank are also summarized in this review. As far as we know, this is the first comprehensive review that summarizes heterogeneous networks and learning to rank in the context of drug-target interaction prediction. Moreover, we propose three aspects that can be explored in depth in future research.
Affiliation(s)
| | - Xiucai Ye
- Department of Computer Science, and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
| | - Tetsuya Sakurai
- Department of Computer Science and is the director of the C-AIR, University of Tsukuba
| | - Quan Zou
- University of Electronic Science and Technology of China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic
| | | |
|
40
|
Jiao S, Xu L, Ju Y. CWLy-RF: A novel approach for identifying cell wall lyases based on random forest classifier. Genomics 2021; 113:2919-24. [PMID: 34186189 DOI: 10.1016/j.ygeno.2021.06.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2021] [Revised: 06/20/2021] [Accepted: 06/25/2021] [Indexed: 02/05/2023]
Abstract
Drug resistance of pathogenic bacteria has become increasingly serious due to the abuse of antibiotics in recent years. Researchers have found that cell wall lyases are effective antibacterial agents that can specifically recognize target bacteria and degrade bacterial peptidoglycan. Traditional wet experiments are usually expensive, time-consuming and laborious for the identification of lyases. Therefore, there is an urgent need to develop prediction tools based on computer methods to identify lyases quickly and accurately. In this paper, a new predictor, CWLy-RF, is proposed based on the random forest (RF) algorithm to identify cell wall lyases. In this method, we combined three features, namely, 400D, 188D and the composition of k-spaced amino acid group pairs, using mixed-feature representation methods. Afterward, we improved the feature representation ability with the selected top 100 features by using the information gain method and trained a predictive model using RF. The constructed prediction model is evaluated by using 10-fold cross-validation. The accuracy obtained was 96.09%, the AUC was 0.993, the MCC was 0.922, the sensitivity was 94.92%, and the specificity was 97.32%. We have proved that the proposed predictor CWLy-RF is superior to other latest models, and it will hopefully become an effective and useful tool for identifying lyases.
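Information gain, the criterion this abstract uses to pick the top 100 features, measures how much knowing a feature reduces the entropy of the class label. The sketch below binarizes each continuous feature at a fixed threshold; that discretization, the function names, and the toy values are assumptions for illustration, since the abstract does not describe how the gain was computed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at `threshold`."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions; empty partitions contribute nothing.
    conditional = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - conditional


labels = [1, 1, 0, 0]                                          # toy class labels
ig_useful = information_gain([0.9, 0.8, 0.1, 0.2], labels, threshold=0.5)
ig_useless = information_gain([0.9, 0.1, 0.8, 0.2], labels, threshold=0.5)
print(ig_useful, ig_useless)
```

Ranking all features by this score and keeping the highest-scoring ones gives a filter-style selection that is independent of the downstream classifier.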
|