1
Wu S, Xu J, Guo JT. Accurate prediction of nucleic acid binding proteins using protein language model. Bioinformatics Advances 2025; 5:vbaf008. [PMID: 39990254 PMCID: PMC11845279 DOI: 10.1093/bioadv/vbaf008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2024] [Revised: 12/20/2024] [Accepted: 01/15/2025] [Indexed: 02/25/2025]
Abstract
Motivation: Nucleic acid binding proteins (NABPs) play critical roles in various essential biological processes. Many machine learning-based methods have been developed to predict different types of NABPs. However, most of these studies have limited applications in predicting the types of NABPs for any given protein with unknown functions, due to several factors such as dataset construction, prediction scope and features used for training and testing. In addition, single-stranded DNA binding proteins (SSBs) have not been extensively investigated for identifying novel SSBs from proteins with unknown functions. Results: To improve prediction accuracy of different types of NABPs for any given protein, we developed hierarchical and multi-class models with machine learning-based methods and a feature extracted from the protein language model ESM2. Our results show that by combining the feature from ESM2 and machine learning methods, we can achieve high prediction accuracy of up to 95% for each stage in the hierarchical approach, and 85% overall prediction accuracy from the multi-class approach. More importantly, besides the much improved prediction of other types of NABPs, the models can be used to accurately predict SSBs, which are underexplored. Availability and implementation: The datasets and code can be found at https://figshare.com/projects/Prediction_of_nucleic_acid_binding_proteins_using_protein_language_model/211555.
Affiliation(s)
- Siwen Wu: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
- Jinbo Xu: Toyota Technological Institute at Chicago, Chicago, IL 60637, United States
- Jun-tao Guo: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
2
Pradhan UK, Naha S, Das R, Gupta A, Parsad R, Meher PK. RBProkCNN: Deep learning on appropriate contextual evolutionary information for RNA binding protein discovery in prokaryotes. Comput Struct Biotechnol J 2024; 23:1631-1640. [PMID: 38660008 PMCID: PMC11039349 DOI: 10.1016/j.csbj.2024.04.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/12/2024] [Accepted: 04/12/2024] [Indexed: 04/26/2024]
Abstract
RNA-binding proteins (RBPs) are central to key functions such as post-transcriptional regulation, mRNA stability, and adaptation to varied environmental conditions in prokaryotes. While the majority of research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods have emerged in recent years to identify RBPs, they have fallen short in accurately identifying prokaryotic RBPs due to their generic nature. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model meticulously designed for the accurate prediction of prokaryotic RBPs. The prediction process involves the utilization of eight shallow learning algorithms and four deep learning models, incorporating PSSM-based evolutionary features. By leveraging a convolutional neural network (CNN) and evolutionarily significant features selected through extreme gradient boosting variable importance measure, RBProkCNN achieved the highest accuracy in five-fold cross-validation, yielding 98.04% auROC and 98.19% auPRC. Furthermore, RBProkCNN demonstrated robust performance with an independent dataset, showcasing a commendable 95.77% auROC and 95.78% auPRC. Noteworthy is its superior predictive accuracy when compared to several state-of-the-art existing models. RBProkCNN is available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/), offering free access to interested users. This tool represents a substantial contribution, enriching the array of resources available for the accurate and efficient prediction of prokaryotic RBPs.
Affiliation(s)
- Upendra Kumar Pradhan: Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sanchita Naha: Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Ritwika Das: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Ajit Gupta: Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad: ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Prabina Kumar Meher: Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
3
Wu S, Guo JT. Improved prediction of DNA and RNA binding proteins with deep learning models. Brief Bioinform 2024; 25:bbae285. [PMID: 38856168 PMCID: PMC11163377 DOI: 10.1093/bib/bbae285] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/05/2024] [Revised: 05/20/2024] [Accepted: 05/31/2024] [Indexed: 06/11/2024]
Abstract
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.
Affiliation(s)
- Siwen Wu: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
- Jun-tao Guo: Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
4
Kumar N, Tripathi S, Sharma N, Patiyal S, Devi NL, Raghava GPS. A method for predicting linear and conformational B-cell epitopes in an antigen from its primary sequence. Comput Biol Med 2024; 170:108083. [PMID: 38295479 DOI: 10.1016/j.compbiomed.2024.108083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 12/26/2023] [Accepted: 01/27/2024] [Indexed: 02/02/2024]
Abstract
B cells are an essential component of the immune system and play a vital role in the immune response against pathogenic infection by producing antibodies. Existing methods predict either linear or conformational B-cell epitopes in an antigen. In this study, a single method was developed for predicting both types (linear/conformational) of B-cell epitopes. The dataset used in this study contains 3875 B-cell epitopes and 3996 non-B-cell epitopes, where B-cell epitopes consist of both linear and conformational B-cell epitopes. Our primary analysis indicates that certain residues (like Asp, Glu, Lys, and Asn) are more prominent in B-cell epitopes. We developed machine-learning based methods using different types of sequence composition and achieved the highest AUROC of 0.80 using dipeptide composition. In addition, models were developed on selected features, but no further improvement was observed. Our similarity-based method implemented using BLAST shows a high probability of correct prediction with poor sensitivity. Finally, we developed a hybrid model that combines alignment-free (dipeptide based random forest model) and alignment-based (BLAST-based similarity) models. Our hybrid model attained a maximum AUROC of 0.83 with an MCC of 0.49 on the independent dataset. Our hybrid model performs better than existing methods on an independent dataset used in this study. All models were trained and tested on 80% of the data using a cross-validation technique, and the final model was evaluated on 20% of the data, called an independent or validation dataset. A webserver and standalone package named "CLBTope" has been developed for predicting, designing, and scanning B-cell epitopes in an antigen sequence, available at https://webs.iiitd.edu.in/raghava/clbtope/.
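The dipeptide composition feature named in this abstract can be illustrated with a minimal sketch. It assumes the common convention of a 400-dimensional vector holding the fraction of each adjacent residue pair over the 20 standard amino acids; the abstract does not state the exact normalization used by the authors, and the function name `dipeptide_composition` is hypothetical.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each dipeptide among all valid adjacent pairs.

    Assumption: pairs containing a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or entirely non-standard: all-zero vector.
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: n / total for dp, n in counts.items()}


features = dipeptide_composition("ACAC")  # pairs: AC, CA, AC
```

A vector like this could then be passed to any classifier, such as the random forest the abstract mentions.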
Affiliation(s)
- Nishant Kumar: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Sadhana Tripathi: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Neelam Sharma: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Sumeet Patiyal: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Naorem Leimarembi Devi: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
- Gajendra P S Raghava: Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India
5
Iwaniak A, Minkiewicz P, Darewicz M. Bioinformatics and bioactive peptides from foods: Do they work together? Advances in Food and Nutrition Research 2024; 108:35-111. [PMID: 38461003 DOI: 10.1016/bs.afnr.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2024]
Abstract
We live in the Big Data Era which affects many aspects of science, including research on bioactive peptides derived from foods, which during the last few decades have been a focus of interest for scientists. These two issues, i.e., the development of computer technologies and progress in the discovery of novel peptides with health-beneficial properties, are closely interrelated. This Chapter presents the example applications of bioinformatics for studying biopeptides, focusing on main aspects of peptide analysis as the starting point, including: (i) the role of peptide databases; (ii) aspects of bioactivity prediction; (iii) simulation of peptide release from proteins. Bioinformatics can also be used for predicting other features of peptides, including ADMET, QSAR, structure, and taste. To answer the question asked "bioinformatics and bioactive peptides from foods: do they work together?", currently it is almost impossible to find examples of peptide research with no bioinformatics involved. However, theoretical predictions are not equivalent to experimental work and always require critical scrutiny. The aspects of compatibility of in silico and in vitro results are also summarized herein.
Affiliation(s)
- Anna Iwaniak: Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Piotr Minkiewicz: Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Małgorzata Darewicz: Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
6
Avila-Lopez P, Lauberth SM. Exploring new roles for RNA-binding proteins in epigenetic and gene regulation. Curr Opin Genet Dev 2024; 84:102136. [PMID: 38128453 PMCID: PMC11245729 DOI: 10.1016/j.gde.2023.102136] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 08/22/2023] [Revised: 11/12/2023] [Accepted: 11/15/2023] [Indexed: 12/23/2023]
Abstract
A significant portion of the human proteome comprises RNA-binding proteins (RBPs) that play fundamental roles in numerous biological processes. In the last decade, there has been a staggering increase in RBP identification and classification, which has fueled interest in the evolving roles of RBPs and RBP-driven molecular mechanisms. Here, we focus on recent insights into RBP-dependent regulation of the epigenetic and transcriptional landscape. We describe advances in methodologies that define the RNA-protein interactome and machine-learning algorithms that are streamlining RBP discovery and predicting new RNA-binding regions. Finally, we present how RBP dysregulation leads to alterations in tumor-promoting gene expression and discuss the potential for targeting these RBPs for the development of new cancer therapeutics.
Affiliation(s)
- Pedro Avila-Lopez: Simpson Querrey Institute for Epigenetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Shannon M Lauberth: Simpson Querrey Institute for Epigenetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
7
Pradhan UK, Meher PK, Naha S, Pal S, Gupta S, Gupta A, Parsad R. RBPLight: a computational tool for discovery of plant-specific RNA-binding proteins using light gradient boosting machine and ensemble of evolutionary features. Brief Funct Genomics 2023; 22:401-410. [PMID: 37158175 DOI: 10.1093/bfgp/elad016] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/05/2022] [Revised: 04/12/2023] [Accepted: 04/21/2023] [Indexed: 05/10/2023]
Abstract
RNA-binding proteins (RBPs) are essential for post-transcriptional gene regulation in eukaryotes, including splicing control, mRNA transport and decay. Thus, accurate identification of RBPs is important to understand gene expression and regulation of cell state. In order to detect RBPs, a number of computational models have been developed. These methods made use of datasets from several eukaryotic species, specifically from mice and humans. Although some models have been tested on Arabidopsis, these techniques fall short of correctly identifying RBPs for other plant species. Therefore, the development of a powerful computational model for identifying plant-specific RBPs is needed. In this study, we presented a novel computational model for locating RBPs in plants. Five deep learning models and ten shallow learning algorithms were utilized for prediction with 20 sequence-derived and 20 evolutionary feature sets. The highest repeated five-fold cross-validation accuracy, 91.24% AU-ROC and 91.91% AU-PRC, was achieved by light gradient boosting machine. When evaluated using an independent dataset, the developed approach achieved 94.00% AU-ROC and 94.50% AU-PRC. The proposed model achieved significantly higher accuracy for predicting plant-specific RBPs as compared to the currently available state-of-the-art RBP prediction models. Although certain models have already been trained and assessed on the model organism Arabidopsis, this is the first comprehensive computational model for the discovery of plant-specific RBPs. The web server RBPLight was also developed, which is publicly accessible at https://iasri-sg.icar.gov.in/rbplight/, for the convenience of researchers to identify RBPs in plants.
Affiliation(s)
- Upendra K Pradhan: Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Prabina K Meher: Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sanchita Naha: Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Soumen Pal: Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Sagar Gupta: CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur (HP) 176061, India
- Ajit Gupta: Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
- Rajender Parsad: ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India
8
Arican OC, Gumus O. PredDRBP-MLP: Prediction of DNA-binding proteins and RNA-binding proteins by multilayer perceptron. Comput Biol Med 2023; 164:107317. [PMID: 37562328 DOI: 10.1016/j.compbiomed.2023.107317] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/02/2023] [Revised: 07/27/2023] [Accepted: 08/07/2023] [Indexed: 08/12/2023]
Abstract
Proteins interact with many molecules in order to maintain the vital activities in cells. Proteins that interact with DNA are called DNA-binding proteins (DBP), and proteins that interact with RNA are called RNA-binding proteins (RBP). Since DBPs and RBPs are involved in critical biological processes, their classification is quite important. Although the convolutional neural network and bidirectional long-short-term memory hybrid model (CNN-BiLSTM) is very popular in DBP and RBP classification, it has problems such as requirement of high processing power and long training time. Therefore, a multilayer perceptron (MLP) based predictor, PredDRBP-MLP (Predictor of DNA-Binding Proteins and RNA-Binding Proteins - Multilayer Perceptron) was developed in this study. PredDRBP-MLP is an artificial learning model that performs multi-class classification of DBPs, RBPs and non-nucleic acid-binding proteins (NNABP). PredDRBP-MLP achieved quite successful results on the independent dataset, specifically in the NNABP class, compared to the existing predictors, in addition to requiring lower processing power and being able to train quicker compared to CNN-BiLSTM based predictors. In NNABP class, PredDRBP-MLP predictor achieved 0.578 precision, 0.522 recall and 0.549 F1-score, while other multi-class predictor achieved 0.486 precision, 0.183 recall and 0.266 F1-score. A desktop application was developed for PredDRBP-MLP. The application is freely accessible at https://sourceforge.net/projects/preddrbp-mlp.
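The F1-scores quoted in this abstract follow from the reported precision and recall through the standard harmonic-mean formula. The short sketch below reproduces that arithmetic; the function name `f1_score` is illustrative and is not taken from the PredDRBP-MLP code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NNABP-class values reported in the abstract.
pred_drbp_mlp = round(f1_score(0.578, 0.522), 3)   # 0.549
other_predictor = round(f1_score(0.486, 0.183), 3)  # 0.266
print(pred_drbp_mlp, other_predictor)
```

Both rounded results match the F1-scores stated in the abstract, which confirms that the three reported figures for each predictor are mutually consistent.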
Affiliation(s)
- Ozgur Can Arican: Department of Health Bioinformatics, Ege University, 35100, Izmir, Turkey
- Ozgur Gumus: Department of Computer Engineering, Ege University, 35100, Izmir, Turkey
9
Agarwal A, Kant S, Bahadur RP. Efficient mapping of RNA-binding residues in RNA-binding proteins using local sequence features of binding site residues in protein-RNA complexes. Proteins 2023; 91:1361-1379. [PMID: 37254800 DOI: 10.1002/prot.26528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 04/13/2023] [Accepted: 05/02/2023] [Indexed: 06/01/2023]
Abstract
Protein-RNA interactions play vital roles in a plethora of biological processes such as regulation of gene expression, protein synthesis, mRNA processing and biogenesis. Identification of RNA-binding residues (RBRs) in proteins is essential to understand RNA-mediated protein functioning, to perform site-directed mutagenesis and to develop novel targeted drug therapies. Moreover, the extensive gap between sequence and structural data restricts the identification of binding sites in unsolved structures. However, efficient use of computational methods that require only sequence to identify binding residues can bridge this huge sequence-structure gap. In this study, we have extensively studied the protein-RNA interface in known RNA-binding proteins (RBPs). We find that the interface is highly enriched in basic and polar residues, with Gly being the most common interface neighbor. We investigated several amino acid features and developed a method to predict putative RBRs from amino acid sequence. We have implemented a balanced random forest (BRF) classifier with local residue features of protein sequences for prediction. With 5-fold cross-validation, the sequence pattern derived dipeptide composition based BRF model (DCP-BRF) resulted in an accuracy of 87.9%, specificity of 88.8%, sensitivity of 82.2%, Matthews correlation coefficient of 0.60 and AUC of 0.93, performing better than a few existing methods. We further validated our prediction model on known human RBPs through RBR prediction and could map ~54% of them. Further, knowledge of binding site preferences obtained from computational predictions combined with experimental validations of potential RNA binding sites can enhance our understanding of protein-RNA interactions. This may serve to accelerate investigations on functional roles of many novel RBPs.
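The evaluation measures listed in this abstract (accuracy, specificity, sensitivity and the Matthews correlation coefficient) are all derived from a binary confusion matrix. The sketch below shows the standard definitions. The confusion-matrix counts in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not the paper's data, and `binary_metrics` is an illustrative name.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    # MCC denominator is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not from the paper).
example = binary_metrics(tp=8, fp=2, tn=8, fn=2)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when binding and non-binding residues are imbalanced.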
Affiliation(s)
- Ankita Agarwal: School of Bio Science, Indian Institute of Technology Kharagpur, Kharagpur, India; Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Shri Kant: Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur: Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
10
Solis-Miranda J, Chodasiewicz M, Skirycz A, Fernie AR, Moschou PN, Bozhkov PV, Gutierrez-Beltran E. Stress-related biomolecular condensates in plants. The Plant Cell 2023; 35:3187-3204. [PMID: 37162152 PMCID: PMC10473214 DOI: 10.1093/plcell/koad127] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 12/08/2022] [Revised: 04/07/2023] [Accepted: 04/27/2023] [Indexed: 05/11/2023]
Abstract
Biomolecular condensates are membraneless organelle-like structures that can concentrate molecules and often form through liquid-liquid phase separation. Biomolecular condensate assembly is tightly regulated by developmental and environmental cues. Although research on biomolecular condensates has intensified in the past 10 years, our current understanding of the molecular mechanisms and components underlying their formation remains in its infancy, especially in plants. However, recent studies have shown that the formation of biomolecular condensates may be central to plant acclimation to stress conditions. Here, we describe the mechanism, regulation, and properties of stress-related condensates in plants, focusing on stress granules and processing bodies, 2 of the most well-characterized biomolecular condensates. In this regard, we showcase the proteomes of stress granules and processing bodies in an attempt to suggest methods for elucidating the composition and function of biomolecular condensates. Finally, we discuss how biomolecular condensates modulate stress responses and how they might be used as targets for biotechnological efforts to improve stress tolerance.
Affiliation(s)
- Jorge Solis-Miranda: Instituto de Bioquimica Vegetal y Fotosintesis, Consejo Superior de Investigaciones Cientificas (CSIC)-Universidad de Sevilla, 41092 Sevilla, Spain
- Monika Chodasiewicz: Biological and Environmental Science and Engineering Division, Center for Desert Agriculture, King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
- Alisdair R Fernie: Max-Planck-Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
- Panagiotis N Moschou: Department of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, 75007 Uppsala, Sweden; Department of Biology, University of Crete, Heraklion 71409, Greece; Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology-Hellas, Heraklion 70013, Greece
- Peter V Bozhkov: Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, 75007 Uppsala, Sweden
- Emilio Gutierrez-Beltran: Instituto de Bioquimica Vegetal y Fotosintesis, Consejo Superior de Investigaciones Cientificas (CSIC)-Universidad de Sevilla, 41092 Sevilla, Spain; Departamento de Bioquimica Vegetal y Biologia Molecular, Facultad de Biologia, Universidad de Sevilla, 41012 Sevilla, Spain
11
Jin W, Brannan KW, Kapeli K, Park SS, Tan HQ, Gosztyla ML, Mujumdar M, Ahdout J, Henroid B, Rothamel K, Xiang JS, Wong L, Yeo GW. HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence. Mol Cell 2023; 83:2595-2611.e11. [PMID: 37421941 PMCID: PMC11098078 DOI: 10.1016/j.molcel.2023.06.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/03/2023] [Revised: 03/20/2023] [Accepted: 06/13/2023] [Indexed: 07/10/2023]
Abstract
RNA-binding proteins (RBPs) control RNA metabolism to orchestrate gene expression and, when dysfunctional, underlie human diseases. Proteome-wide discovery efforts predict thousands of RBP candidates, many of which lack canonical RNA-binding domains (RBDs). Here, we present a hybrid ensemble RBP classifier (HydRA), which leverages information from both intermolecular protein interactions and internal protein sequence patterns to predict RNA-binding capacity with unparalleled specificity and sensitivity using support vector machines (SVMs), convolutional neural networks (CNNs), and Transformer-based protein language models. Occlusion mapping by HydRA robustly detects known RBDs and predicts hundreds of uncharacterized RNA-binding associated domains. Enhanced CLIP (eCLIP) for HydRA-predicted RBP candidates reveals transcriptome-wide RNA targets and confirms RNA-binding activity for HydRA-predicted RNA-binding associated domains. HydRA accelerates construction of a comprehensive RBP catalog and expands the diversity of RNA-binding associated domains.
Affiliation(s)
- Wenhao Jin: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Kristopher W Brannan: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Katannya Kapeli: Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Samuel S Park: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Hui Qing Tan: Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Maya L Gosztyla: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Mayuresh Mujumdar: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Joshua Ahdout: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Bryce Henroid: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Katherine Rothamel: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Joy S Xiang: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
- Limsoon Wong: Department of Computer Science, National University of Singapore, Singapore, Singapore
- Gene W Yeo: Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA
12
Yan K, Feng J, Huang J, Wu H. iDRPro-SC: identifying DNA-binding proteins and RNA-binding proteins based on subfunction classifiers. Brief Bioinform 2023:bbad251. [PMID: 37405873 DOI: 10.1093/bib/bbad251] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Revised: 06/10/2023] [Accepted: 06/12/2023] [Indexed: 07/07/2023]
Abstract
Nucleic acid-binding proteins are proteins that interact with DNA and RNA to regulate gene expression and transcriptional control. The pathogenesis of many human diseases is related to abnormal gene expression. Therefore, recognizing nucleic acid-binding proteins accurately and efficiently has important implications for disease research. To address this question, some scientists have proposed the method of using sequence information to identify nucleic acid-binding proteins. However, different types of nucleic acid-binding proteins have different subfunctions, and these methods ignore their internal differences, so the performance of the predictor can be further improved. In this study, we proposed a new method, called iDRPro-SC, to predict the type of nucleic acid-binding proteins based on the sequence information. iDRPro-SC considers the internal differences of nucleic acid-binding proteins and combines their subfunctions to build a complete dataset. Additionally, we used an ensemble learning to characterize and predict nucleic acid-binding proteins. The results of the test dataset showed that iDRPro-SC achieved the best prediction performance and was superior to the other existing nucleic acid-binding protein prediction methods. We have established a web server that can be accessed online: http://bliulab.net/iDRPro-SC.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Jiawei Feng
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Jing Huang
- Huajian Yutong Technology (Beijing) Co., Ltd
- State Key Laboratory of Media Convergence Production Technology and Systems, Beijing 100803, China
- Xinhua New Media Culture Communication Co., Ltd
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| |
|
13
|
Wang Z, Zhu H. Exploiting liver metabolism for tissue-specific cancer targeting. NATURE CANCER 2023; 4:310-311. [PMID: 36977775 DOI: 10.1038/s43018-023-00530-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Affiliation(s)
- Zixi Wang
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
| | - Hao Zhu
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
| |
|
14
|
Li YT, Liu CJ, Kao JH, Lin LF, Tu HC, Wang CC, Huang PH, Cheng HR, Chen PJ, Chen DS, Wu HL. Metastatic tumor antigen 1 contributes to hepatocarcinogenesis posttranscriptionally through RNA-binding function. Hepatology 2023; 77:379-394. [PMID: 35073601 DOI: 10.1002/hep.32356] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/23/2021] [Revised: 01/07/2022] [Accepted: 01/12/2022] [Indexed: 01/28/2023]
Abstract
BACKGROUND AND AIMS Both nuclear and cytoplasmic overexpression of metastatic tumor antigen 1 (MTA1) contributes to tumorigenesis of HCC. Most studies have focused on nuclear MTA1 whose function is mainly a chromatin modifier regulating the expression of various cancer-promoting genes. By contrast, the molecular mechanisms of cytoplasmic MTA1 in carcinogenesis remain elusive. Here, we reveal a role of MTA1 in posttranscriptional gene regulation. APPROACH AND RESULTS We conducted the in vitro and in vivo RNA-protein interaction assays indicating that MTA1 could bind directly to the 3'-untranslated region of MYC RNA. Mutation at the first glycine of the conserved GXXG loop within a K-homology II domain-like structure in MTA1 (G78D) resulted in the loss of RNA-binding activity. We used gain- and loss-of-function strategy showing that MTA1, but not the G78D mutant, extended the half-life of MYC and protected it from the lethal-7-mediated degradation. The G78D mutant exhibited lower activity in promoting tumorigenesis than wild-type in vitro and in vivo. Furthermore, RNA-immunoprecipitation sequencing analysis demonstrated that MTA1 binds various oncogenesis-related mRNAs besides MYC. The clinical relevance of cytoplasmic MTA1 and its interaction with MYC were investigated using HBV-HCC cohorts with or without early recurrence. The results showed that higher cytoplasmic MTA1 level and MTA1-MYC interaction were associated with early recurrence. CONCLUSIONS MTA1 is a generic RNA-binding protein. Cytoplasmic MTA1 and its binding to MYC is associated with early recurrence in patients with HBV-HCC. This function enables it to regulate gene expression posttranscriptionally and contributes to hepatocarcinogenesis.
Affiliation(s)
- Yung-Tsung Li
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Chun-Jen Liu
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Jia-Horng Kao
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Li-Feng Lin
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
| | - Hui-Chu Tu
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
| | - Chih-Chiang Wang
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Po-Hsi Huang
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
| | - Huei-Ru Cheng
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Pei-Jer Chen
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| | - Ding-Shinn Chen
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
- Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Hui-Lin Wu
- Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
- Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
| |
|
15
|
Pande A, Patiyal S, Lathwal A, Arora C, Kaur D, Dhall A, Mishra G, Kaur H, Sharma N, Jain S, Usmani SS, Agrawal P, Kumar R, Kumar V, Raghava GPS. Pfeature: A Tool for Computing Wide Range of Protein Features and Building Prediction Models. J Comput Biol 2023; 30:204-222. [PMID: 36251780 DOI: 10.1089/cmb.2022.0241] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Indexed: 01/24/2023] Open
Abstract
In the last three decades, a wide range of protein features have been discovered to annotate a protein. Numerous attempts have been made to integrate these features in a software package/platform so that the user may compute a wide range of features from a single source. To complement the existing methods, we developed a method, Pfeature, for computing a wide range of protein features. Pfeature allows to compute more than 200,000 features required for predicting the overall function of a protein, residue-level annotation of a protein, and function of chemically modified peptides. It has six major modules, namely, composition, binary profiles, evolutionary information, structural features, patterns, and model building. Composition module facilitates to compute most of the existing compositional features, plus novel features. The binary profile of amino acid sequences allows to compute the fraction of each type of residue as well as its position. The evolutionary information module allows to compute evolutionary information of a protein in the form of a position-specific scoring matrix profile generated using Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST); fit for annotation of a protein and its residues. A structural module was developed for computing of structural features/descriptors from a tertiary structure of a protein. These features are suitable to predict the therapeutic potential of a protein containing non-natural or chemically modified residues. The model-building module allows to implement various machine learning techniques for developing classification and regression models as well as feature selection. Pfeature also allows the generation of overlapping patterns and features from a protein. A user-friendly Pfeature is available as a web server python library and stand-alone package.
Affiliation(s)
- Akshara Pande
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Anjali Lathwal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Gaurav Mishra
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Department of Electrical Engineering, Shiv Nadar University, Greater Noida, India
| | - Harpreet Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Neelam Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Shipra Jain
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Salman Sadullah Usmani
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Piyush Agrawal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Rajesh Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Vinod Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India; Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| |
|
16
|
Kumar N, Patiyal S, Choudhury S, Tomer R, Dhall A, Raghava GPS. DMPPred: a tool for identification of antigenic regions responsible for inducing type 1 diabetes mellitus. Brief Bioinform 2023; 24:6911429. [PMID: 36524996 DOI: 10.1093/bib/bbac525] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/22/2022] [Revised: 10/27/2022] [Accepted: 11/04/2022] [Indexed: 12/23/2022] Open
Abstract
There are a number of antigens that induce autoimmune response against β-cells, leading to type 1 diabetes mellitus (T1DM). Recently, several antigen-specific immunotherapies have been developed to treat T1DM. Thus, identification of T1DM associated peptides with antigenic regions or epitopes is important for peptide based-therapeutics (e.g. immunotherapeutic). In this study, for the first time, an attempt has been made to develop a method for predicting, designing, and scanning of T1DM associated peptides with high precision. We analysed 815 T1DM associated peptides and observed that these peptides are not associated with a specific class of HLA alleles. Thus, HLA binder prediction methods are not suitable for predicting T1DM associated peptides. First, we developed a similarity/alignment based method using Basic Local Alignment Search Tool and achieved a high probability of correct hits with poor coverage. Second, we developed an alignment-free method using machine learning techniques and got a maximum AUROC of 0.89 using dipeptide composition. Finally, we developed a hybrid method that combines the strength of both alignment free and alignment-based methods and achieves maximum area under the receiver operating characteristic of 0.95 with Matthew's correlation coefficient of 0.81 on an independent dataset. We developed a web server 'DMPPred' and stand-alone server for predicting, designing and scanning T1DM associated peptides (https://webs.iiitd.edu.in/raghava/dmppred/).
Affiliation(s)
- Nishant Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
| | - Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
| | - Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
| | - Ritu Tomer
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
| | - Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
| |
|
17
|
Du X, Hu J. Deep Multi-Label Joint Learning for RNA and DNA-Binding Proteins Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:307-320. [PMID: 35148267 DOI: 10.1109/tcbb.2022.3150280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
The recognition of DNA- (DBPs) and RNA-binding proteins (RBPs) is not only conducive to understanding cell function, but also a challenging task. Previous studies have shown that these proteins are usually considered separately due to different binding domains. In addition, due to the high similarity between DBPs and RBPs, it is possible for DBPs predictor to predict RBPs as DBPs, and vice versa, which leads to high cross-prediction rate. In this study, we creatively propose a novel deep multi-label joint learning framework to leverage the relationship between multiple labels and binding proteins. First, a multi-label variant network is designed to explore multi-scale context hidden information. Then, multi-label Long Short-Term Memory (multiLSTM) is used to mine the potential relationship between labels. Finally, the calibrated hidden features from variant network are considered for different levels of joint learning so that multiLSTM can better explore the correlation between them. Extensive experiments are also carried out to compare the proposed method with other existing methods. Furthermore, we also provide further insights into the importance of the relevant bioanalysis of proteins obtained from our model and summarize these binding proteins that are significantly related to a disease. Our method is freely available at http://39.108.90.186/dmlj.
|
18
|
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
Affiliation(s)
- Zhonghua Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Sushmita Basu
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| | - Xuantai Wu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
| |
|
19
|
Wang N, Zhang J, Liu B. iDRBP-EL: Identifying DNA- and RNA- Binding Proteins Based on Hierarchical Ensemble Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:432-441. [PMID: 34932484 DOI: 10.1109/tcbb.2021.3136905] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Identification of DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) from the primary sequences is essential for further exploring protein-nucleic acid interactions. Previous studies have shown that machine-learning-based methods can efficiently identify DBPs or RBPs. However, the information used in these methods is slightly unitary, and most of them only can predict DBPs or RBPs. In this study, we proposed a computational predictor iDRBP-EL to identify DNA- and RNA- binding proteins, and introduced hierarchical ensemble learning to integrate three level information. The method can integrate the information of different features, machine learning algorithms and data into one multi-label model. The ablation experiment showed that the fusion of different information can improve the prediction performance and overcome the cross-prediction problem. Experimental results on the independent datasets showed that iDRBP-EL outperformed all the other competing methods. Moreover, we established a user-friendly webserver iDRBP-EL (http://bliulab.net/iDRBP-EL), which can predict both DBPs and RBPs only based on protein sequences.
|
20
|
Liu Y, Niu G, Zhou J, Shen W, Corriou JP, Seferlis P. Hybrid Intelligent Fault Diagnosis Model Based on Improved MPCA-V for Sensors in a Laboratory-Scale Wastewater Treatment Process. Ind Eng Chem Res 2022. [DOI: 10.1021/acs.iecr.2c02334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Affiliation(s)
- Yin Liu
- State Key Laboratory of Pulp and Paper Engineering, South China University of Technology, Guangzhou 510640, P. R. China
| | - Guoqiang Niu
- State Key Laboratory of Pulp and Paper Engineering, South China University of Technology, Guangzhou 510640, P. R. China
| | - Jing Zhou
- State Key Laboratory of Pulp and Paper Engineering, South China University of Technology, Guangzhou 510640, P. R. China
| | - Wenhao Shen
- State Key Laboratory of Pulp and Paper Engineering, South China University of Technology, Guangzhou 510640, P. R. China
| | - Jean-Pierre Corriou
- Laboratoire Réactions et Génie des Procédés, UMR 7274-CNRS, Lorraine University, ENSIC 1, Rue Grandville BP, 20451 Nancy Cedex, France
| | - Panagiotis Seferlis
- Department of Mechanical Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
| |
|
21
|
Shahnazari M, Zakipour Z, Razi H, Moghadam A, Alemzadeh A. Bioinformatics approaches for classification and investigation of the evolution of the Na/K-ATPase alpha-subunit. BMC Ecol Evol 2022; 22:122. [PMID: 36289471 PMCID: PMC9609216 DOI: 10.1186/s12862-022-02071-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 09/29/2022] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Na,K-ATPase is a key protein in maintaining membrane potential that has numerous additional cellular functions. Its catalytic subunit (α) is found in a wide range of organisms, from prokaryotes to complex eukaryotes. Several studies have been done to identify the functions as well as to determine the evolutionary relationships of the α-subunit. However, a survey of a larger collection of protein sequences according to sequence similarity and their attributes is very important in revealing deeper evolutionary relationships and identifying specific amino acid differences among evolutionary groups that may have a functional role. RESULTS In this study, classification of 753 protein sequences using a phylogenetic tree resulted in four groups: prokaryotes (I), fungi and various kinds of Protista and some invertebrates (II), the main group of invertebrates (III), and vertebrates (IV), which was consistent with the species tree. The percent of sequences that acquired a specific motif for the α/β subunit assembly increased from group I to group IV. The vertebrate sequences were divided into four groups according to isoforms with each group conforming to the evolutionary path of vertebrates from fish to tetrapods. Data mining was used to identify the most effective attributes in classification of sequences. Using 1252 attributes extracted from the sequences, the decision tree classified them in five groups: Protista, prokaryotes, fungi, invertebrates and vertebrates. Also, vertebrates were divided into four subgroups (isoforms). Generally, the count of different dipeptides and amino acid ratios were the most significant attributes for grouping. Alignment of the sequences identified the effective position of the respective dipeptides in the separation of the groups: 208GC is apparently involved in the separation of vertebrates from the four other organism groups, and 41DH, 431FK, and 451KC were involved in the separation of vertebrate isoform types.
CONCLUSION The application of phylogenetic and decision tree analysis for Na,K-ATPase, provides a better understanding of the evolutionary changes according to the amino acid sequence and its related properties that could lead to the identification of effective attributes in the separation of sequences in different groups of phylogenetic tree. In this study, key evolution-related dipeptides are identified which can guide future experimental studies.
Affiliation(s)
- Marzieh Shahnazari
- Department of Plant Production and Genetics, School of Agriculture, Shiraz University, Shiraz, Iran
| | - Zahra Zakipour
- Department of Plant Production and Genetics, School of Agriculture, Shiraz University, Shiraz, Iran
| | - Hooman Razi
- Department of Plant Production and Genetics, School of Agriculture, Shiraz University, Shiraz, Iran
| | - Ali Moghadam
- Institute of Biotechnology, Shiraz University, Shiraz, Iran
| | - Abbas Alemzadeh
- Department of Plant Production and Genetics, School of Agriculture, Shiraz University, Shiraz, Iran.
| |
|
22
|
Balcerak A, Macech-Klicka E, Wakula M, Tomecki R, Goryca K, Rydzanicz M, Chmielarczyk M, Szostakowska-Rodzos M, Wisniewska M, Lyczek F, Helwak A, Tollervey D, Kudla G, Grzybowska EA. The RNA-Binding Landscape of HAX1 Protein Indicates Its Involvement in Translation and Ribosome Assembly. Cells 2022; 11:cells11192943. [PMID: 36230905 PMCID: PMC9564044 DOI: 10.3390/cells11192943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 09/13/2022] [Accepted: 09/15/2022] [Indexed: 11/18/2022] Open
Abstract
HAX1 is a human protein with no known homologues or structural domains. Mutations in the HAX1 gene cause severe congenital neutropenia through mechanisms that are poorly understood. Previous studies reported the RNA-binding capacity of HAX1, but the role of this binding in physiology and pathology remains unexplained. Here, we report the transcriptome-wide characterization of HAX1 RNA targets using RIP-seq and CRAC, indicating that HAX1 binds transcripts involved in translation, ribosome biogenesis, and rRNA processing. Using CRISPR knockouts, we find that HAX1 RNA targets partially overlap with transcripts downregulated in HAX1 KO, implying a role in mRNA stabilization. Gene ontology analysis demonstrated that genes differentially expressed in HAX1 KO (including genes involved in ribosome biogenesis and translation) are also enriched in a subset of genes whose expression correlates with HAX1 expression in four analyzed neoplasms. The functional connection to ribosome biogenesis was also demonstrated by gradient sedimentation ribosome profiles, which revealed differences in the small subunit:monosome ratio in HAX1 WT/KO. We speculate that changes in HAX1 expression may be important for the etiology of HAX1-linked diseases through dysregulation of translation.
Affiliation(s)
- Anna Balcerak
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Ewelina Macech-Klicka
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Maciej Wakula
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Rafal Tomecki
- Laboratory of RNA Processing and Decay, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-106 Warsaw, Poland
- Faculty of Biology, Institute of Genetics and Biotechnology, University of Warsaw, 02-106 Warsaw, Poland
| | - Krzysztof Goryca
- Genomics Core Facility, Centre of New Technologies University of Warsaw, 02-097 Warsaw, Poland
| | - Malgorzata Rydzanicz
- Department of Medical Genetics, Medical University of Warsaw, 02-106 Warsaw, Poland
| | - Mateusz Chmielarczyk
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Malgorzata Szostakowska-Rodzos
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Marta Wisniewska
- Laboratory of Biological Chemistry of Metal Ions, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-106 Warsaw, Poland
| | - Filip Lyczek
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| | - Aleksandra Helwak
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - David Tollervey
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, UK
| | - Grzegorz Kudla
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 2XU, UK
| | - Ewa A. Grzybowska
- Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, 02-781 Warsaw, Poland
| |
|
23
|
Feng J, Wang N, Zhang J, Liu B. iDRBP-ECHF: Identifying DNA- and RNA-binding proteins based on extensible cubic hybrid framework. Comput Biol Med 2022; 149:105940. [PMID: 36044786 DOI: 10.1016/j.compbiomed.2022.105940] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/01/2022] [Revised: 07/10/2022] [Accepted: 08/06/2022] [Indexed: 11/28/2022]
Abstract
Proteins interact with nucleic acids to regulate the life activities of organisms. Therefore, how to accurately and efficiently identify nucleic acid-binding proteins (NABPs) is particularly significant. Some sequence-based computational methods have been proposed to identify DNA- and RNA-binding proteins in previous studies. However, the benchmark datasets used by these methods ignore the proportion of NABPs in the real world, and some integration methods only integrate traditional machine learning algorithms, resulting in limited prediction performance. In this study, we proposed a sequence-based method called iDRBP-ECHF to predict the DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs). We constructed a benchmark dataset by considering the proportion of positive and negative samples in the real world, and used down-sampling to generate three relatively balanced datasets to train the iDRBP-ECHF. In addition, we incorporated the deep learning algorithms into the framework to obtain a more compact high-level feature representation of the input data. The results on two independent datasets show that it achieves the most advanced performance and is superior to the other existing sequence-based DBP and RBP prediction methods. In addition, we set up a webserver iDRBP-ECHF, which can be accessed at http://bliulab.net/iDRBP-ECHF.
Affiliation(s)
- Jiawei Feng
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
- Ning Wang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
- Jun Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China.
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.
24
Wang N, Zhang J, Liu B. IDRBP-PPCT: Identifying Nucleic Acid-Binding Proteins Based on Position-Specific Score Matrix and Position-Specific Frequency Matrix Cross Transformation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2284-2293. [PMID: 33780341 DOI: 10.1109/tcbb.2021.3069263] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 06/12/2023]
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two important types of nucleic acid-binding proteins (NABPs), which play important roles in biological processes such as replication, translation and transcription of genetic material. Some proteins (DRBPs) bind to both DNA and RNA and also play a key role in gene expression. Identification of DBPs, RBPs and DRBPs is important for studying protein-nucleic acid interactions. Computational methods are increasingly being proposed to automatically identify DNA- or RNA-binding proteins based only on protein sequences. One challenge is to design an effective protein representation method to convert protein sequences into fixed-dimension feature vectors. In this study, we proposed a novel protein representation method called Position-Specific Scoring Matrix (PSSM) and Position-Specific Frequency Matrix (PSFM) Cross Transformation (PPCT) to represent protein sequences. This method captures the evolutionary information in PSSM and PSFM, and their correlations. A new computational predictor called IDRBP-PPCT was proposed by combining PPCT and a two-layer framework based on the random forest algorithm to identify DBPs, RBPs and DRBPs. The experimental results on the independent dataset and the tomato genome demonstrated the effectiveness of the proposed method. A user-friendly web server for IDRBP-PPCT was constructed, which is freely available at http://bliulab.net/IDRBP-PPCT.
25
Peng X, Wang X, Guo Y, Ge Z, Li F, Gao X, Song J. RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins. Brief Bioinform 2022; 23:6596984. [PMID: 35649392 PMCID: PMC9294422 DOI: 10.1093/bib/bbac215] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/17/2022] [Revised: 04/25/2022] [Accepted: 05/06/2022] [Indexed: 11/27/2022] Open
Abstract
RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, the knowledge from the self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences, while in the second stage, a customized deep learning model was initialized based on an annotated pre-training RBP dataset before being fine-tuned on each corresponding target species dataset. This two-stage transfer learning framework enables the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive performance benchmarking of the RBP-TSTL models trained using the features generated by the self-supervised pre-trained model and other models trained using hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy based on the self-supervised pre-trained models. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that the proposed RBP-TSTL approach will be explored as a useful tool for the characterization of RNA-binding proteins and exploration of their sequence–structure–function relationships.
Affiliation(s)
- Xinxin Peng
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
- Yuming Guo
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria 3004, Australia
- Zongyuan Ge
- Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia; College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia; KAUST Computational Bioscience Research Center, King Abdullah University of Science and Technology
- Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
26
Parra ALC, Bezerra LP, Shawar DE, Neto NAS, Mesquita FP, da Silva GO, Souza PFN. Synthetic antiviral peptides: a new way to develop targeted antiviral drugs. Future Virol 2022. [DOI: 10.2217/fvl-2021-0308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
The global concern over emerging and re-emerging viral infections has spurred the search for novel antiviral agents. Peptides with antiviral activity stand out because their biocompatibility, specificity and effectiveness overcome limitations of the drugs currently in use. Synthetic peptides have been shown to be viable alternatives to natural peptides because of several difficulties in using the latter in clinical trials. Various platforms have been utilized by researchers to predict the most effective peptide sequences against HIV, influenza, dengue, MERS and SARS. Synthetic peptides are already employed in the treatment of HIV infection. The novelty of this study is to discuss, for the first time, the potential of synthetic peptides as antiviral molecules. We conclude that synthetic peptides can act as new weapons against viral threats to humans.
Affiliation(s)
- Aura LC Parra
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Leandro P Bezerra
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Dur E Shawar
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Nilton AS Neto
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Felipe P Mesquita
- Drug Research & Development Center (NPDM), Federal University of Ceará, Cel. Nunes de Melo, Rodolfo Teófilo, 1000, Fortaleza, Brazil
- Gabrielly O da Silva
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Pedro FN Souza
- Department of Biochemistry & Molecular Biology, Federal University of Ceara, Fortaleza, Ceara, 60440-554, Brazil
- Drug Research & Development Center (NPDM), Federal University of Ceará, Cel. Nunes de Melo, Rodolfo Teófilo, 1000, Fortaleza, Brazil
27
Zhang J, Yan K, Chen Q, Liu B. PreRBP-TL: prediction of species-specific RNA-binding proteins based on transfer learning. Bioinformatics 2022; 38:2135-2143. [PMID: 35176130 DOI: 10.1093/bioinformatics/btac106] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/22/2021] [Revised: 11/18/2021] [Accepted: 02/15/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION RNA-binding proteins (RBPs) play crucial roles in post-transcriptional regulation. Accurate identification of RBPs helps to understand gene expression and regulation. In recent years, some computational methods were proposed to identify RBPs. However, these methods fail to accurately identify RBPs from some specific species with limited data, such as bacteria. RESULTS In this study, we introduce a computational method called PreRBP-TL for identifying species-specific RBPs based on transfer learning. The weights of the prediction model were initialized by pretraining with the large general RBP dataset and then fine-tuned with the small species-specific RBP dataset by using transfer learning. The experimental results show that PreRBP-TL achieves better performance for identifying the species-specific RBPs from Human, Arabidopsis, Escherichia coli and Salmonella, outperforming eight state-of-the-art computational methods. It is anticipated that PreRBP-TL will become a useful method for identifying RBPs. AVAILABILITY AND IMPLEMENTATION For the convenience of researchers who wish to identify RBPs, the web server of PreRBP-TL was established, freely available at http://bliulab.net/PreRBP-TL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Qingcai Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
28
Zhang J, Hess WR, Zhang C. "Life is short, and art is long": RNA degradation in cyanobacteria and model bacteria. MLIFE 2022; 1:21-39. [PMID: 38818322 PMCID: PMC10989914 DOI: 10.1002/mlf2.12015] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/08/2022] [Revised: 03/03/2022] [Accepted: 03/03/2022] [Indexed: 06/01/2024]
Abstract
RNA turnover plays critical roles in the regulation of gene expression and allows cells to respond rapidly to environmental changes. In bacteria, the mechanisms of RNA turnover have been extensively studied in the models Escherichia coli and Bacillus subtilis, but not much is known in other bacteria. Cyanobacteria are a diverse group of photosynthetic organisms that have great potential for the sustainable production of valuable products using CO2 and solar energy. A better understanding of the regulation of RNA decay is important for both basic and applied studies of cyanobacteria. Genomic analysis shows that cyanobacteria have more than 10 ribonucleases and related proteins in common with E. coli and B. subtilis, and only a limited number of them have been experimentally investigated. In this review, we summarize the current knowledge about these RNA-turnover-related proteins in cyanobacteria. Although many of them are biochemically similar to their counterparts in E. coli and B. subtilis, they appear to have distinct cellular functions, suggesting a different mechanism of RNA turnover regulation in cyanobacteria. The identification of new players involved in the regulation of RNA turnover and the elucidation of their biological functions are among the future challenges in this field.
Affiliation(s)
- Ju‐Yuan Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology and Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Wolfgang R. Hess
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Cheng‐Cai Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology and Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Institut WUT‐AMU, Aix‐Marseille University and Wuhan University of Technology, Wuhan, China
29
Xie J, Zhang X, Zheng J, Hong X, Tong X, Liu X, Xue Y, Wang X, Zhang Y, Liu S. Two novel RNA-binding proteins identification through computational prediction and experimental validation. Genomics 2021; 114:149-160. [PMID: 34921931 DOI: 10.1016/j.ygeno.2021.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2021] [Revised: 08/05/2021] [Accepted: 12/13/2021] [Indexed: 11/16/2022]
Abstract
Since RBPs play important roles in the cell, it is particularly important to find new RBPs. We performed iRIP-seq and CLIP-seq to verify whether two proteins predicted by RBPPred, CLIP1 and DMD, are RBPs. The experimental results confirm that these two proteins have RNA-binding activity. We identified significantly enriched binding motifs UGGGGAGG, CUUCCG and CCCGU for CLIP1 (iRIP-seq), DMD (iRIP-seq) and DMD (CLIP-seq), respectively. The computational KEGG and GO analyses show that CLIP1 and DMD share some biological processes and functions. In addition, we found that the SNPs between DMD and its RNA partners may be associated with Becker muscular dystrophy, Duchenne muscular dystrophy, dilated cardiomyopathy 3B and cardiovascular phenotype. Across the data from thirteen cancers, CLIP1 and another 300 oncogenes always co-occur, and 123 of these 300 genes interact with CLIP1. These cancers may be associated with mutations occurring in both CLIP1 and the genes it interacts with.
Affiliation(s)
- Juan Xie
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xiaoli Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Jinfang Zheng
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xu Hong
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xiaoxue Tong
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xudong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yaqiang Xue
- Laboratory for Genome Regulation and Human Health, ABLife Inc., Wuhan, Hubei 430075, China
- Xuelian Wang
- ABLife BioBigData Institute, Wuhan, Hubei 430075, China
- Yi Zhang
- ABLife BioBigData Institute, Wuhan, Hubei 430075, China
- Shiyong Liu
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China.
30
Li HL, Pang YH, Liu B. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic Acids Res 2021; 49:e129. [PMID: 34581805 PMCID: PMC8682797 DOI: 10.1093/nar/gkab829] [Citation(s) in RCA: 146] [Impact Index Per Article: 36.5] [Received: 07/18/2021] [Revised: 08/24/2021] [Accepted: 09/09/2021] [Indexed: 01/08/2023] Open
Abstract
In order to uncover the meanings of the ‘book of life’, 155 different biological language models (BLMs) for DNA, RNA and protein sequence analysis are discussed in this study, which are able to extract the linguistic properties of the ‘book of life’. We also extend the BLMs into a system called BioSeq-BLM for automatically representing and analyzing the sequence data. Experimental results show that the predictors generated by BioSeq-BLM achieve comparable or even clearly better performance than the existing state-of-the-art predictors published in the literature, indicating that BioSeq-BLM will provide new approaches for biological sequence analysis based on natural language processing technologies, and contribute to the development of this very important field. To help readers use BioSeq-BLM for their own experiments, the corresponding web server and stand-alone package have been established and released, and can be freely accessed at http://bliulab.net/BioSeq-BLM/.
Affiliation(s)
- Hong-Liang Li
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Yi-He Pang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
31
Niu M, Wu J, Zou Q, Liu Z, Xu L. rBPDL: Predicting RNA-Binding Proteins Using Deep Learning. IEEE J Biomed Health Inform 2021; 25:3668-3676. [PMID: 33780344 DOI: 10.1109/jbhi.2021.3069259] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Abstract
RNA-binding protein (RBP) is a powerful and wide-ranging regulator that plays an important role in cell development, differentiation, metabolism, health and disease. The prediction of RBPs provides valuable guidance for biologists. Although experimental methods have made great progress in predicting RBP, they are time-consuming and not flexible. Therefore, we developed a network model, rBPDL, by combining a convolutional neural network and long short-term memory for multilabel classification of RBPs. Moreover, to achieve better prediction results, we used a voting algorithm for ensemble learning of the model. We compared rBPDL with state-of-the-art methods and found that rBPDL significantly improved identification performance for the RBP68 dataset, with a macro-Area Under Curve (AUC), micro-AUC, and weighted AUC of 0.936, 0.962, and 0.946, respectively. Furthermore, through AUC statistical analysis of the RBP domain, we analyzed the performance of rBPDL and found that the RBP identification performance in the same domain was similar. In addition, we analyzed the performance preferences and physicochemical properties of the binding protein amino acids and explored the characteristics that affect the binding by using the RBP86 dataset.
32
Gutierrez‐Beltran E, Elander PH, Dalman K, Dayhoff GW, Moschou PN, Uversky VN, Crespo JL, Bozhkov PV. Tudor staphylococcal nuclease is a docking platform for stress granule components and is essential for SnRK1 activation in Arabidopsis. EMBO J 2021; 40:e105043. [PMID: 34287990 PMCID: PMC8447601 DOI: 10.15252/embj.2020105043] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 03/18/2020] [Revised: 06/23/2021] [Accepted: 07/01/2021] [Indexed: 12/19/2022] Open
Abstract
Tudor staphylococcal nuclease (TSN; also known as Tudor-SN, p100, or SND1) is a multifunctional, evolutionarily conserved regulator of gene expression, exhibiting cytoprotective activity in animals and plants and oncogenic activity in mammals. During stress, TSN stably associates with stress granules (SGs), in a poorly understood process. Here, we show that in the model plant Arabidopsis thaliana, TSN is an intrinsically disordered protein (IDP) acting as a scaffold for a large pool of other IDPs, enriched for conserved stress granule components as well as novel or plant-specific SG-localized proteins. While approximately 30% of TSN interactors are recruited to stress granules de novo upon stress perception, 70% form a protein-protein interaction network present before the onset of stress. Finally, we demonstrate that TSN and stress granule formation promote heat-induced activation of the evolutionarily conserved energy-sensing SNF1-related protein kinase 1 (SnRK1), the plant orthologue of mammalian AMP-activated protein kinase (AMPK). Our results establish TSN as a docking platform for stress granule proteins, with an important role in stress signalling.
Affiliation(s)
- Emilio Gutierrez‐Beltran
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas (CSIC)‐Universidad de Sevilla, Sevilla, Spain
- Departamento de Bioquímica Vegetal y Biología Molecular, Facultad de Biología, Universidad de Sevilla, Sevilla, Spain
- Pernilla H Elander
- Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala, Sweden
- Kerstin Dalman
- Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala, Sweden
- Guy W Dayhoff
- Department of Chemistry, College of Art and Sciences, University of South Florida, Tampa, FL, USA
- Panagiotis N Moschou
- Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology ‐ Hellas, Heraklion, Greece
- Department of Plant Biology, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala, Sweden
- Department of Biology, University of Crete, Heraklion, Greece
- Vladimir N Uversky
- Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, Pushchino, Russia
- Jose L Crespo
- Instituto de Bioquímica Vegetal y Fotosíntesis, Consejo Superior de Investigaciones Científicas (CSIC)‐Universidad de Sevilla, Sevilla, Spain
- Peter V Bozhkov
- Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences and Linnean Center for Plant Biology, Uppsala, Sweden
33
Zhang J, Chen Q, Liu B. DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1451-1463. [PMID: 31722485 DOI: 10.1109/tcbb.2019.2952338] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 06/10/2023]
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two kinds of crucial proteins, which are associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) that interact with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L was proposed by combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). It is the first computational method that is able to identify DBPs, RBPs and DRBPs. Rigorous cross-validations and independent tests showed that DeepDRBP-2L is able to overcome the shortcomings of the existing methods and can go one step further to identify DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
34
Riediger M, Spät P, Bilger R, Voigt K, Maček B, Hess WR. Analysis of a photosynthetic cyanobacterium rich in internal membrane systems via gradient profiling by sequencing (Grad-seq). THE PLANT CELL 2021; 33:248-269. [PMID: 33793824 PMCID: PMC8136920 DOI: 10.1093/plcell/koaa017] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/10/2020] [Accepted: 11/12/2020] [Indexed: 05/23/2023]
Abstract
Although regulatory small RNAs have been reported in photosynthetic cyanobacteria, the lack of clear RNA chaperones involved in their regulation poses a conundrum. Here, we analyzed the full complement of cellular RNAs and proteins using gradient profiling by sequencing (Grad-seq) in Synechocystis 6803. Complexes with overlapping subunits such as the CpcG1-type versus the CpcL-type phycobilisomes or the PsaK1 versus PsaK2 photosystem I pre(complexes) could be distinguished, supporting the high quality of this approach. Clustering of the in-gradient distribution profiles followed by several additional criteria yielded a short list of potential RNA chaperones that include an YlxR homolog and a cyanobacterial homolog of the KhpA/B complex. The data suggest previously undetected complexes between accessory proteins and CRISPR-Cas systems, such as a Csx1-Csm6 ribonucleolytic defense complex. Moreover, the exclusive association of either RpoZ or 6S RNA with the core RNA polymerase complex and the existence of a reservoir of inactive sigma-antisigma complexes is suggested. The Synechocystis Grad-seq resource is available online at https://sunshine.biologie.uni-freiburg.de/GradSeqExplorer/ providing a comprehensive resource for the functional assignment of RNA-protein complexes and multisubunit protein complexes in a photosynthetic organism.
Affiliation(s)
- Matthias Riediger
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Philipp Spät
- Department of Quantitative Proteomics, Interfaculty Institute for Cell Biology, University of Tübingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
- Raphael Bilger
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Karsten Voigt
- IT Administration, Institute of Biology 3, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
- Boris Maček
- Department of Quantitative Proteomics, Interfaculty Institute for Cell Biology, University of Tübingen, Auf der Morgenstelle 15, 72076 Tübingen, Germany
- Wolfgang R Hess
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Schänzlestr. 1, 79104 Freiburg, Germany
35
The interactome of multifunctional HAX1 protein suggests its role in the regulation of energy metabolism, de-aggregation, cytoskeleton organization and RNA-processing. Biosci Rep 2021; 40:226900. [PMID: 33146709 PMCID: PMC7670567 DOI: 10.1042/bsr20203094] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/02/2020] [Revised: 10/14/2020] [Accepted: 11/02/2020] [Indexed: 01/07/2023] Open
Abstract
HCLS1-associated protein X-1 (HAX1) is a multifunctional protein involved in many cellular processes, including apoptosis, cell migration and calcium homeostasis, but its mode of action remains obscure. Multiple HAX1 protein partners have been identified, but they are involved in many distinct pathways, form different complexes and do not constitute a coherent group. By characterizing the HAX1 protein interactome using a targeted approach, we attempt to explain the multiple functions of HAX1 and its role in the cell. The presented analyses indicate that HAX1 interacts weakly with a wide spectrum of proteins and that its interactome tends to be cell-specific, which conforms to the profile of an intrinsically disordered protein (IDP). Moreover, we have identified a mitochondrial subset of HAX1 protein partners and preliminarily characterized its involvement in the cellular response to oxidative stress and aggregation.
36
Zhang ZM, Guan ZX, Wang F, Zhang D, Ding H. Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families. Med Chem 2021; 16:594-604. [PMID: 31584374 DOI: 10.2174/1573406415666191004125551] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2019] [Revised: 06/18/2019] [Accepted: 08/23/2019] [Indexed: 11/22/2022]
Abstract
Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. According to the alignments of the conserved domains, NRs are classified into either seven or eight subfamilies: (1) NR1: thyroid hormone like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1 like (fushi tarazu-F1 like), (6) NR6: germ cell nuclear factor like (germ cell nuclear factor), and (7) NR0: knirps like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX like (DAX, SHP), or, when NR0 is divided, (7) NR7: knirps like and (8) NR8: DAX like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomics era, the number of proteins with known sequences has increased explosively. Conventional methods for accurately classifying the family of NRs are experimental, with high cost and low efficiency. This has created a greater need for bioinformatics tools that effectively recognize NRs and their subfamilies for the purpose of understanding their biological function. In this review, we summarized the application of machine learning methods in the prediction of NRs from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.
Affiliation(s)
- Zi-Mei Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zheng-Xing Guan
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fang Wang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Dan Zhang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
37
Mishra A, Khanal R, Kabir WU, Hoque T. AIRBP: Accurate identification of RNA-binding proteins using machine learning techniques. Artif Intell Med 2021; 113:102034. [PMID: 33685590 DOI: 10.1016/j.artmed.2021.102034] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/11/2020] [Revised: 01/19/2021] [Accepted: 02/09/2021] [Indexed: 12/25/2022]
Abstract
Identification of RNA-binding proteins (RBPs) that bind to ribonucleic acid molecules is an important problem in computational biology and bioinformatics. Identifying RBPs is indispensable because they play crucial roles in post-transcriptional control of RNAs and in RNA metabolism, and have diverse roles in biological processes such as splicing, mRNA stabilization, mRNA localization, translation, RNA synthesis, folding-unfolding, modification, processing, and degradation. The existing experimental techniques for identifying RBPs are time-consuming and expensive. Therefore, identifying RBPs directly from sequence using computational methods can help annotate RBPs and support efficient experimental design. In this work, we present a method called AIRBP, which uses an advanced machine learning technique called stacking to predict RBPs from features extracted from evolutionary information, physicochemical properties, and disordered properties. Moreover, AIRBP uses the majority vote of RBPPred, DeepRBPPred, and the stacking model to predict RBPs. The results show that AIRBP attains Accuracy (ACC), Balanced Accuracy (BACC), F1-score, and Matthews Correlation Coefficient (MCC) of 95.84%, 94.71%, 0.928, and 0.899, respectively, on the training dataset using 10-fold cross-validation (CV). Further evaluation on independent test sets shows that AIRBP achieves ACC, BACC, F1-score, and MCC of 94.36%, 94.28%, 0.897, and 0.860 for the Human test set; 91.25%, 93.00%, 0.896, and 0.835 for the S. cerevisiae test set; and 90.60%, 90.41%, 0.934, and 0.775 for the A. thaliana test set, respectively. These results indicate that AIRBP outperforms the existing DeepRBPPred and TriPepSVM methods.
Therefore, the better-performing AIRBP can be useful for accurate identification and annotation of RBPs directly from sequence and can help gain valuable insight for treating critical diseases. Availability: Code and data are available at http://cs.uno.edu/~tamjid/Software/AIRBP/code_data.zip.
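The majority vote over three predictors and the four reported metrics (ACC, BACC, F1-score, MCC) can be sketched in plain Python. This is a minimal illustration and not AIRBP's code: the function names `majority_vote` and `binary_metrics` and the toy label vectors are hypothetical, and labels are assumed to be 1 for RBP and 0 for non-RBP.

```python
from math import sqrt


def majority_vote(*predictions):
    """Combine equally long binary label lists by per-sample majority."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def binary_metrics(y_true, y_pred):
    """Return ACC, BACC, F1 and MCC computed from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1_denominator = 2 * tp + fp + fn
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "ACC": (tp + tn) / len(pairs),
        "BACC": (sensitivity + specificity) / 2,
        "F1": 2 * tp / f1_denominator if f1_denominator else 0.0,
        "MCC": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Toy example: three hypothetical predictors voting on eight proteins.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
pred_a = [1, 1, 0, 0, 0, 0, 1, 0]
pred_b = [1, 0, 1, 0, 0, 0, 0, 0]
pred_c = [1, 1, 1, 1, 0, 0, 0, 0]
voted = majority_vote(pred_a, pred_b, pred_c)
print(voted, binary_metrics(y_true, voted))
```

In this toy case each predictor makes a different mistake, so the vote recovers the true labels, which is the usual motivation for combining predictors this way.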
Affiliation(s)
- Avdesh Mishra
- Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, TX, USA
- Reecha Khanal
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Wasi Ul Kabir
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
- Tamjidul Hoque
- Department of Computer Science, University of New Orleans, New Orleans, LA, USA
38
The evolutionary relationship of S15/NS1 RNA binding domains with a similar protein domain pattern - A computational approach. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
39
Sharma N, Patiyal S, Dhall A, Pande A, Arora C, Raghava GPS. AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes. Brief Bioinform 2020; 22:5985292. [PMID: 33201237 DOI: 10.1093/bib/bbaa294] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 06/21/2020] [Revised: 10/02/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022]
Abstract
AlgPred 2.0 is a web server developed for predicting allergenic proteins and allergenic regions in a protein. It is an updated version of AlgPred, developed in 2006. The dataset used for training, testing and validation consists of 10 075 allergens and 10 075 non-allergens. In addition, 10 451 experimentally validated immunoglobulin E (IgE) epitopes were used to identify antigenic regions in a protein. All models were trained on 80% of the data, called the training dataset, and their performance was evaluated using 5-fold cross-validation. The final model trained on the training dataset was then evaluated on the remaining 20% of the data, called the validation dataset; no two proteins in any two sets have more than 40% similarity. First, a Basic Local Alignment Search Tool (BLAST) search was performed against the dataset, and allergens were predicted based on their level of similarity to known allergens. Second, IgE epitopes obtained from the IEDB database were searched in the dataset to predict allergens based on their presence in a protein. Third, motif-based approaches, namely Multiple EM for Motif Elicitation (MEME) and Motif Alignment and Search Tool (MAST), were used to predict allergens. Fourth, allergen prediction models were developed using a wide range of machine learning techniques. Finally, an ensemble approach was used to predict allergenic proteins by combining the prediction scores of the different approaches. Our best model achieved an area under the receiver operating characteristic curve of 0.98 and a Matthews correlation coefficient of 0.85 on the validation dataset. The AlgPred 2.0 web server allows prediction of allergens, mapping of IgE epitopes, motif search and BLAST search (https://webs.iiitd.edu.in/raghava/algpred2/).
Affiliation(s)
- Neelam Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Akshara Pande
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
40
Heffron J, Mayer BK. Improved Virus Isoelectric Point Estimation by Exclusion of Known and Predicted Genome-Binding Regions. Appl Environ Microbiol 2020; 86:e01674-20. [PMID: 32978129 PMCID: PMC7657617 DOI: 10.1128/aem.01674-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/11/2020] [Accepted: 09/18/2020] [Indexed: 01/16/2023]
Abstract
Knowledge of the isoelectric points (pIs) of viruses is beneficial for predicting virus behavior in environmental transport and physical/chemical treatment applications. However, the empirically measured pIs of many viruses have thus far defied simple explanation, let alone prediction, based on the ionizable amino acid composition of the virus capsid. Here, we suggest an approach for predicting the pI of nonenveloped viruses by excluding capsid regions that stabilize the virus polynucleotide via electrostatic interactions. This method was applied first to viruses with known polynucleotide-binding regions (PBRs) and/or three-dimensional (3D) structures. Then, PBRs were predicted in a group of 32 unique viral capsid proteome sequences via conserved structures and sequence motifs. Removing predicted PBRs resulted in a significantly better fit to empirical pI values. After modification, mean differences between theoretical and empirical pI values were reduced from 2.1 ± 2.4 to 0.1 ± 1.7 pH units. IMPORTANCE This model fits predicted pIs to empirical values for a diverse set of viruses. The results suggest that many previously reported discrepancies between theoretical and empirical virus pIs can be explained by coulombic neutralization of PBRs of the inner capsid. Given the diversity of virus capsid structures, this nonarbitrary, heuristic approach to predicting virus pI offers an effective alternative to a simplistic, one-size-fits-all charge model of the virion. The accurate, structure-based prediction of PBRs of the virus capsid employed here may also be of general interest to structural virologists.
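The core idea, estimating a theoretical pI from ionizable residues and then recomputing it after excluding a polynucleotide-binding region, can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the authors' model: it treats a single peptide rather than a whole capsid, uses a Henderson-Hasselbalch charge model with illustrative pKa values (published pKa sets differ), and the sequence, region coordinates and function names are hypothetical.

```python
# Illustrative pKa values (an assumption; published pKa sets differ).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 9.0, 2.0


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # C-terminus
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exclude_regions(seq, regions):
    """Drop 0-based, half-open (start, end) regions from the sequence."""
    masked = set()
    for start, end in regions:
        masked.update(range(start, end))
    return "".join(aa for i, aa in enumerate(seq) if i not in masked)


# Hypothetical capsid fragment with an arginine/lysine-rich N-terminal arm.
capsid = "MKRKRKRK" + "DEDEAGSLVDEAG"
pi_full = isoelectric_point(capsid)
pi_trimmed = isoelectric_point(exclude_regions(capsid, [(0, 8)]))
print(f"pI full: {pi_full:.2f}, pI without PBR: {pi_trimmed:.2f}")
```

Because the excluded arm carries most of the basic residues, the recomputed pI is lower than the full-sequence estimate, which mirrors the direction of the correction described in the abstract.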
Affiliation(s)
- Joe Heffron
- Department of Civil, Construction and Environmental Engineering, Marquette University, Milwaukee, Wisconsin, USA
- Brooke K Mayer
- Department of Civil, Construction and Environmental Engineering, Marquette University, Milwaukee, Wisconsin, USA
41
Zhang J, Chen Q, Liu B. iDRBP_MMC: Identifying DNA-Binding Proteins and RNA-Binding Proteins Based on Multi-Label Learning Model and Motif-Based Convolutional Neural Network. J Mol Biol 2020; 432:5860-5875. [DOI: 10.1016/j.jmb.2020.09.008] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/04/2020] [Revised: 08/12/2020] [Accepted: 09/04/2020] [Indexed: 11/28/2022]
42
Chen YM, Zu XP, Li D. Identification of Proteins of Tobacco Mosaic Virus by Using a Method of Feature Extraction. Front Genet 2020; 11:569100. [PMID: 33193664 PMCID: PMC7581905 DOI: 10.3389/fgene.2020.569100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/03/2020] [Accepted: 09/09/2020] [Indexed: 12/03/2022]
Abstract
Tobacco mosaic virus (TMV) is widely distributed in the global tobacco industry and has a significant impact on tobacco production; it can reduce tobacco yield by 50-70%. In this study, we aimed to distinguish tobacco mosaic virus proteins from healthy tobacco leaf proteins using machine learning approaches. The experimental results showed that the support vector machine (SVM) algorithm achieved high accuracy across different feature extraction methods, and that the 188-dimensional feature extraction method improved classification accuracy. The SVM algorithm and the 188-dimensional feature extraction method were therefore selected as the final experimental methods. With 10-fold cross-validation, the SVM combined with the 188-dimensional features achieved 93.5% accuracy on the training set and 92.7% accuracy on the independent validation set. The evaluation indices further indicate that the method we developed is valid and robust.
Affiliation(s)
- Dan Li
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
43
Zhao Y, Du X. econvRBP: Improved ensemble convolutional neural networks for RNA binding protein prediction directly from sequence. Methods 2020; 181-182:15-23. [PMID: 31513916 DOI: 10.1016/j.ymeth.2019.09.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/21/2019] [Revised: 08/21/2019] [Accepted: 09/05/2019] [Indexed: 10/26/2022]
Abstract
RNA binding proteins (RBPs) govern RNA processing from synthesis to decay and play a key role in RNA transport, translation and degradation. Therefore, inferring RBP function from the amino acid sequence using computational methods has become an important topic in genome annotation. However, some challenges remain: (1) Shallow features: although it is well established that sequence determines structure, it is difficult to extract the essential features from the raw sequence. (2) Poor understanding: feature-based prediction methods mainly emphasize feature extraction, but limited understanding of proteins constrains feature engineering. (3) Feature fusion: multi-feature fusion is often used, but the features are not well integrated. In view of these challenges, we propose a novel ensemble convolutional neural network (econvRBP) to predict RBPs. To capture the local and global features of RNA binding proteins simultaneously, One Hot and Conjoint Triad encoding methods are first used to transform the amino acid sequence into local and global features, respectively. The local and global features are then combined for further high-level feature extraction using convolutional neural networks. Experiments with 10-fold cross-validation show that our method achieved the best performance among all predictors to date. We correctly predicted 99% of 2875 RBPs and 99% of 6782 non-RBPs, with an accuracy of 0.99. In addition, the datasets provided by RBPPred were also used to validate our models, yielding an accuracy of 0.87. These results indicate that econvRBP is the best-performing method at present and can provide reliable guidance for the detection of RBPs. econvRBP is available at http://47.100.203.218:3389/home.html/.
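The two encodings named in the abstract can be sketched in a few lines of Python. This is an illustration under assumptions, not the econvRBP implementation: the seven-class residue grouping is the one commonly used for the conjoint triad descriptor, the triad counts are normalized to frequencies (other normalizations exist), and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven residue classes commonly used for the conjoint triad descriptor
# (assumed here; the abstract does not state the grouping).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: index for index, group in enumerate(CLASSES) for aa in group}


def one_hot(seq):
    """Per-residue 20-dimensional indicator rows (local, position-wise view).

    Residues outside the 20 standard amino acids give an all-zero row.
    """
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in seq]


def conjoint_triad(seq):
    """343-dimensional frequency vector of class triads (global view).

    Residues outside the 20 standard amino acids are skipped, so their
    neighbours are treated as adjacent.
    """
    classes = [CLASS_OF[aa] for aa in seq if aa in CLASS_OF]
    counts = [0] * (7 ** 3)
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


sequence = "MKRKA"  # toy sequence
local_features = one_hot(sequence)
global_features = conjoint_triad(sequence)
print(len(local_features), len(local_features[0]), len(global_features))
```

The one-hot matrix grows with sequence length while the triad vector has a fixed size, which is why the two are natural candidates for separate network inputs that are combined later.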
Affiliation(s)
- Yuze Zhao
- School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Xiuquan Du
- School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
44
Kaur D, Arora C, Raghava GPS. A Hybrid Model for Predicting Pattern Recognition Receptors Using Evolutionary Information. Front Immunol 2020; 11:71. [PMID: 32082326 PMCID: PMC7002473 DOI: 10.3389/fimmu.2020.00071] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/18/2019] [Accepted: 01/13/2020] [Indexed: 12/17/2022]
Abstract
This study describes a method developed for predicting pattern recognition receptors (PRRs), which are an integral part of the immune system. The models developed here were trained and evaluated on the largest possible set of non-redundant PRRs, obtained from PRRDB 2.0, and non-pattern recognition receptors (Non-PRRs), obtained from Swiss-Prot. First, a similarity-based approach using BLAST was used to predict PRRs, with limited success due to a large number of no-hits. Second, machine learning-based models were developed using sequence composition and achieved a maximum MCC of 0.63. In addition, models developed using evolutionary information in the form of PSSM composition achieved a maximum MCC of 0.66. Finally, we developed hybrid models that combine a similarity-based approach using BLAST with machine learning-based models. Our best model, which combined BLAST with a PSSM-based model, achieved a maximum MCC of 0.82 with an AUROC of 0.95, exploiting the strengths of both similarity-based search and machine learning techniques. To serve the scientific community, we also developed a web server, "PRRpred", based on the best model developed in this study (http://webs.iiitd.edu.in/raghava/prrpred/).
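One simple way to combine a similarity search with a machine learning score is to shift the model score when BLAST finds a hit and fall back to the model alone when it does not. The Python sketch below illustrates that idea only; the abstract does not specify the combination rule, so the additive bonus, the threshold, the hit labels and the function names are all assumptions.

```python
def hybrid_score(ml_score, blast_hit=None, bonus=0.5):
    """Shift the ML score by a fixed bonus depending on the BLAST outcome.

    blast_hit is "PRR", "non-PRR", or None when BLAST returns no hit
    (hypothetical labels; the bonus value is an assumption).
    """
    if blast_hit == "PRR":
        return ml_score + bonus
    if blast_hit == "non-PRR":
        return ml_score - bonus
    return ml_score


def classify(ml_score, blast_hit=None, threshold=0.5):
    """Call a protein a PRR when the hybrid score reaches the threshold."""
    return hybrid_score(ml_score, blast_hit) >= threshold


# A weak ML score is rescued by a BLAST hit to a known PRR, while a
# no-hit query is decided by the ML score alone.
print(classify(0.3, "PRR"), classify(0.7, "non-PRR"), classify(0.6), classify(0.4))
```

The design addresses the limitation noted in the abstract: BLAST alone leaves many queries with no hit, and in those cases the rule degrades gracefully to the machine learning prediction.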
Affiliation(s)
- Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
45
Touati R, Messaoudi I, Oueslati AE, Lachiri Z, Kharrat M. Classification of intra-genomic helitrons based on features extracted from different orders of FCGS. INFORMATICS IN MEDICINE UNLOCKED 2020. [DOI: 10.1016/j.imu.2019.100271] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]
46
Zhang Y, Xie R, Wang J, Leier A, Marquez-Lago TT, Akutsu T, Webb GI, Chou KC, Song J. Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework. Brief Bioinform 2019; 20:2185-2199. [PMID: 30351377 PMCID: PMC6954445 DOI: 10.1093/bib/bby079] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 06/29/2018] [Revised: 07/28/2018] [Accepted: 08/01/2018] [Indexed: 11/15/2022]
Abstract
As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. 
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
Affiliation(s)
- Yanju Zhang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ruopeng Xie
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Jiangning Song
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
47
Bressin A, Schulte-Sasse R, Figini D, Urdaneta EC, Beckmann BM, Marsico A. TriPepSVM: de novo prediction of RNA-binding proteins based on short amino acid motifs. Nucleic Acids Res 2019; 47:4406-4417. [PMID: 30923827 PMCID: PMC6511874 DOI: 10.1093/nar/gkz203] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 11/30/2018] [Revised: 02/20/2019] [Accepted: 03/18/2019] [Indexed: 12/26/2022]
Abstract
In recent years, hundreds of novel RNA-binding proteins (RBPs) have been identified, leading to the discovery of novel RNA-binding domains. Furthermore, unstructured or disordered low-complexity regions of RBPs have been identified to play an important role in interactions with nucleic acids. However, these advances in understanding RBPs are limited mainly to eukaryotic species and we only have limited tools to faithfully predict RNA-binders in bacteria. Here, we describe a support vector machine-based method, called TriPepSVM, for the prediction of RNA-binding proteins. TriPepSVM applies string kernels to directly handle protein sequences using tri-peptide frequencies. Testing the method in human and bacteria, we find that several RBP-enriched tri-peptides occur more often in structurally disordered regions of RBPs. TriPepSVM outperforms existing applications, which consider classical structural features of RNA-binding or homology, in the task of RBP prediction in both human and bacteria. Finally, we predict 66 novel RBPs in Salmonella Typhimurium and validate the bacterial proteins ClpX, DnaJ and UbiG to associate with RNA in vivo.
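Counting tri-peptides and comparing two sequences through those counts is the basis of a k-mer spectrum kernel, one common kind of string kernel. The Python sketch below shows that general technique with k = 3; it is not the TriPepSVM code, the abstract does not say which string kernel variant is used, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def tripeptide_counts(seq, k=3):
    """Count every overlapping k-mer (tri-peptide for k = 3) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of the two k-mer count vectors (spectrum kernel)."""
    counts_a = tripeptide_counts(seq_a, k)
    counts_b = tripeptide_counts(seq_b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


# Toy sequences sharing the tri-peptide "RGG".
print(tripeptide_counts("RGGRGG"))
print(spectrum_kernel("RGGRGG", "ARGGA"))
```

A kernel of this form can be passed to a support vector machine as a precomputed similarity matrix, so the classifier works on sequences directly without a fixed-length feature table.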
Affiliation(s)
- Annkatrin Bressin
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Roman Schulte-Sasse
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Davide Figini
- IRI Life Sciences, Humboldt University Berlin, Philippstrasse 13, 10115 Berlin, Germany
- Erika C Urdaneta
- IRI Life Sciences, Humboldt University Berlin, Philippstrasse 13, 10115 Berlin, Germany
- Benedikt M Beckmann
- IRI Life Sciences, Humboldt University Berlin, Philippstrasse 13, 10115 Berlin, Germany
- Annalisa Marsico
- Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195 Berlin, Germany
- Free University of Berlin, Takustrasse 9, 14195 Berlin, Germany
- Institute of Computational Biology (ICB), Helmholtz Zentrum Munich, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
48
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions is critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine various aspects of RNA-protein interactions. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, experimental techniques cannot reveal the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques for studying the interactions between RNAs and proteins. Important progress has been achieved through the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions have become available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors of RNA-protein interactions is summarized in terms of datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
Affiliation(s)
- Amit Sagar
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
49
Faustino AF, Martins AS, Karguth N, Artilheiro V, Enguita FJ, Ricardo JC, Santos NC, Martins IC. Structural and Functional Properties of the Capsid Protein of Dengue and Related Flavivirus. Int J Mol Sci 2019; 20:E3870. [PMID: 31398956 PMCID: PMC6720645 DOI: 10.3390/ijms20163870] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/21/2019] [Revised: 08/05/2019] [Accepted: 08/06/2019] [Indexed: 02/07/2023]
Abstract
Dengue, West Nile and Zika, closely related viruses of the Flaviviridae family, are an increasing global threat due to the expansion of their mosquito vectors. They present a very similar viral particle with an outer lipid bilayer containing two viral proteins and, within it, the nucleocapsid core. This core is composed of the viral RNA complexed with multiple copies of the capsid protein, a crucial structural protein that mediates not only viral assembly but also encapsidation, by interacting with host lipid systems. The capsid is a homodimeric protein that contains a disordered N-terminal region, an intermediate flexible fold section and a very stable conserved fold region. Since a better understanding of its structure can shed light on its biological activity, we first compared and analyzed relevant mosquito-borne Flavivirus capsid protein sequences and their predicted structures. Then, we studied the alternative conformations enabled by the N-terminal region. Finally, using the dengue virus capsid protein as the main model, we correlated protein size, thermal stability and function with its structural and dynamic features. The findings suggest that the capsid protein's interaction with host lipid systems leads to minor allosteric changes that may modulate the specific binding of the protein to the viral RNA. Such a mechanism can be targeted in future drug development strategies, namely by using improved versions of pep14-23, a dengue virus capsid protein peptide inhibitor that we developed previously. Such knowledge can yield promising advances against Zika, dengue and closely related flaviviruses.
Affiliation(s)
- André F Faustino
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
- Ana S Martins
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
- Nina Karguth
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
- Vanessa Artilheiro
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
- Francisco J Enguita
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
- Joana C Ricardo
- Centro de Química-Física Molecular, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
- Nuno C Santos
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
- Ivo C Martins
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisbon, Portugal
50
Chauhan S, Ahmad S. Enabling full-length evolutionary profiles based deep convolutional neural network for predicting DNA-binding proteins from sequence. Proteins 2019; 88:15-30. [DOI: 10.1002/prot.25763] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/10/2019] [Revised: 06/01/2019] [Accepted: 06/15/2019] [Indexed: 12/22/2022]
Affiliation(s)
- Sucheta Chauhan
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
|