1
Zhang J, Wang R, Wei L. MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins. J Chem Inf Model 2024; 64:1050-1065. [PMID: 38301174 DOI: 10.1021/acs.jcim.3c01471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.
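The residue-level idea described in the abstract, pulling representations of same-class residues together and pushing different-class residues apart, can be illustrated with a generic supervised contrastive loss. This is a minimal sketch only: the function names, the cosine similarity, and the temperature value are assumptions chosen for illustration, and nothing here is taken from the MucLiPred implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's loss).

    For each anchor, samples sharing its label are positives; all other
    samples appear in the softmax denominator. Anchors without any
    positive are skipped.
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy residue embeddings (hypothetical): 1 = binding, 0 = non-binding.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(toy_embeddings, [1, 1, 0, 0]))
```

The loss is small when residues with the same label already sit close together in the embedding space, and large when labels and geometry disagree.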
Affiliation(s)
- Jiashuo Zhang
  - School of Software, Shandong University, Jinan 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan 250101, China
- Leyi Wei
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

2
Escamilla-Gutiérrez A, Córdova-Espinoza MG, Sánchez-Monciváis A, Tecuatzi-Cadena B, Regalado-García AG, Medina-Quero K. In silico selection of aptamers for bacterial toxins detection. J Biomol Struct Dyn 2023; 41:10909-10918. [PMID: 36546716 DOI: 10.1080/07391102.2022.2159529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/10/2022] [Indexed: 12/24/2022]
Abstract
The most commonly used toxins in biological warfare are staphylococcal enterotoxin B (3SEB), cholera toxin (1XTC), and botulinum toxin (3BTA). Uncovering novel strategies for identifying these toxins is paramount; therefore, aptamers are used for this purpose. Aptamers are single-stranded DNA or RNA oligonucleotides selected via Systematic Evolution of Ligands by Exponential Enrichment (SELEX) with high binding affinity and specificity against target molecules. However, SELEX in vitro is tedious; hence, adopting alternative in silico molecular docking approaches is necessary. We aimed to conduct molecular docking with accessible tools and obtain RNA aptamers. First, 4,820,095 sequences obtained from an initial library of 9.5 × 10^9 sequences generated with a Python script were used. The GraphClust program was used to create representative groups or clusters, and the DoGSiteScorer (https://proteins.plus/) was used to conduct binding site detection of the proteins: 5DO4 (thrombin), 3SEB, 1XTC, and 3BTA. rDock, HDock, and PatchDock were adopted, combining different docking program results (consensus scoring), to improve receptor-ligand prediction. An analysis of the poses and root mean square deviation (RMSD) was performed, and 468 structurally different aptamers were obtained. The DoGSiteScorer program predicted the binding site of each protein to direct the interaction with the aptamer. Candidate aptamers for 3SEB, 1XTC, and 3BTA were selected according to the pose value considering the closeness of the interaction with a lower mean of 45.923 Å, 45.854 Å, and 72.490 Å, respectively. Communicated by Ramaswamy H. Sarma.
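The abstract mentions consensus scoring across docking programs and an RMSD analysis of poses without stating how either was computed. The sketch below shows one common way to do both, under stated assumptions: the consensus is a mean of per-program ranks, the RMSD is computed without superposition, and all candidate names, scores, and score directions are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) points.

    No superposition is performed; the poses are assumed to share a frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2 for p, q in zip(coords_a, coords_b) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_a))


def consensus_ranking(scores_by_program, lower_is_better):
    """Order candidates by mean rank across programs (rank 1 = best).

    Assumes every program scored the same set of candidates.
    """
    rank_sums = {}
    for program, scores in scores_by_program.items():
        ordered = sorted(scores, key=scores.get, reverse=not lower_is_better[program])
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] = rank_sums.get(name, 0) + rank
    n_programs = len(scores_by_program)
    return sorted(rank_sums, key=lambda name: rank_sums[name] / n_programs)


# Hypothetical scores for three candidate aptamers from three programs.
toy_scores = {
    "program_a": {"apt1": -30.0, "apt2": -25.0, "apt3": -10.0},
    "program_b": {"apt1": -250.0, "apt2": -300.0, "apt3": -100.0},
    "program_c": {"apt1": 9000.0, "apt2": 8000.0, "apt3": 5000.0},
}
toy_direction = {"program_a": True, "program_b": True, "program_c": False}
print(consensus_ranking(toy_scores, toy_direction))
```

Rank averaging sidesteps the fact that different docking programs report scores on incompatible scales, which is why it is used here instead of averaging raw scores.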
Affiliation(s)
- Alejandro Escamilla-Gutiérrez
  - Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
  - Hospital General, Instituto Mexicano del Seguro Social IMSS, Ciudad de México, México
- María Guadalupe Córdova-Espinoza
  - Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Anahí Sánchez-Monciváis
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Brenda Tecuatzi-Cadena
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Ana Gabriela Regalado-García
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
- Karen Medina-Quero
  - Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México

3
Patiyal S, Dhall A, Bajaj K, Sahu H, Raghava GPS. Prediction of RNA-interacting residues in a protein using CNN and evolutionary profile. Brief Bioinform 2023; 24:6901899. [PMID: 36516298 DOI: 10.1093/bib/bbac538] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/08/2022] [Revised: 09/28/2022] [Accepted: 11/08/2022] [Indexed: 12/15/2022] Open
Abstract
This paper describes Pprint2, an improved version of Pprint developed for predicting RNA-interacting residues in a protein. The training and independent/validation datasets used in this study comprise 545 and 161 non-redundant RNA-binding proteins, respectively. All models were trained on the training dataset and evaluated on the validation dataset. The preliminary analysis reveals that positively charged amino acids such as H, R and K are more prominent among the RNA-interacting residues. Initially, machine learning-based models were developed using the binary profile and achieved a maximum area under the curve (AUC) of 0.68 on the validation dataset. The performance improved significantly, from AUC 0.68 to 0.76, when the evolutionary profile was used instead of the binary profile. The performance of the evolutionary profile-based model improved further, from AUC 0.76 to 0.82, when a convolutional neural network was used to develop the model. The final model, based on a convolutional neural network using evolutionary information, achieved an AUC of 0.82 with a Matthews correlation coefficient of 0.49 on the validation dataset. The best model outperforms existing methods when evaluated on the independent/validation dataset. User-friendly standalone software and a web server named 'Pprint2' have been developed for predicting RNA-interacting residues (https://webs.iiitd.edu.in/raghava/pprint2 and https://github.com/raghavagps/pprint2).
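The two metrics reported in the abstract, AUC and the Matthews correlation coefficient (MCC), can be computed from per-residue labels and prediction scores as sketched below. The function names and the toy labels are assumptions for illustration; this is a plain-Python sketch of the standard definitions, not code from Pprint2.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC from binary labels and binary predictions (1 = interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy per-residue labels and scores (hypothetical values).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(roc_auc(labels, scores), matthews_corrcoef(labels, predictions))
```

AUC is threshold-free, whereas MCC depends on the cutoff used to turn scores into binary predictions, so the two numbers answer slightly different questions about a residue-level predictor.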
Affiliation(s)
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Khushboo Bajaj
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Harshita Sahu
  - Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India

4
Zhang L, Yang J, Luo Y, Liu F, Yuan Y, Zhuang S. A p53/lnc-Ip53 Negative Feedback Loop Regulates Tumor Growth and Chemoresistance. Adv Sci (Weinh) 2020; 7:2001364. [PMID: 33173727 PMCID: PMC7610266 DOI: 10.1002/advs.202001364] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/14/2020] [Revised: 06/08/2020] [Indexed: 06/03/2023]
Abstract
Acetylation is a critical mechanism to modulate tumor-suppressive activity of p53, but the causative roles of long non-coding RNAs (lncRNAs) in p53 acetylation and their biological significance remain unexplored. Here, lncRNA LOC100294145 is discovered to be transactivated by p53 and is thus designated as lnc-Ip53 for lncRNA induced by p53. Furthermore, lnc-Ip53 impedes p53 acetylation by interacting with histone deacetylase 1 (HDAC1) and E1A binding protein p300 (p300) to prevent HDAC1 degradation and attenuate p300 activity, resulting in abrogation of p53 activity and subsequent cell proliferation and apoptosis resistance. Mouse xenograft models reveal that lnc-Ip53 promotes tumor growth and chemoresistance in vivo, which is attenuated by an HDAC inhibitor. Silencing lnc-Ip53 inhibits the growth of xenografts with wild-type p53, but not those expressing acetylation-resistant p53. Consistently, lnc-Ip53 is upregulated in multiple cancer types, including hepatocellular carcinoma (HCC). High levels of lnc-Ip53 are associated with low levels of acetylated p53 in human HCC and mouse xenografts, and are also correlated with poor survival of HCC patients. These findings identify a novel p53/lnc-Ip53 negative feedback loop in cells and indicate that abnormal upregulation of lnc-Ip53 represents an important mechanism to inhibit p53 acetylation/activity and thereby promote tumor growth and chemoresistance, which may be exploited for anticancer therapy.
Affiliation(s)
- Li‐Zhen Zhang
  - MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Collaborative Innovation Center for Cancer Medicine, Sun Yat‐sen University, Guangzhou 510275, China
- Jin‐E Yang
  - MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Collaborative Innovation Center for Cancer Medicine, Sun Yat‐sen University, Guangzhou 510275, China
- Yu‐Wei Luo
  - MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Collaborative Innovation Center for Cancer Medicine, Sun Yat‐sen University, Guangzhou 510275, China
- Feng‐Ting Liu
  - MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Collaborative Innovation Center for Cancer Medicine, Sun Yat‐sen University, Guangzhou 510275, China
- Yun‐Fei Yuan
  - Department of Hepatobiliary Oncology, Cancer Center, Sun Yat‐sen University, Guangzhou 510060, China
- Shi‐Mei Zhuang
  - MOE Key Laboratory of Gene Function and Regulation, School of Life Sciences, Collaborative Innovation Center for Cancer Medicine, Sun Yat‐sen University, Guangzhou 510275, China
  - Key Laboratory of Liver Disease of Guangdong Province, The Third Affiliated Hospital, Sun Yat‐sen University, Guangzhou 510630, China

5
Wekesa JS, Meng J, Luan Y. Multi-feature fusion for deep learning to predict plant lncRNA-protein interaction. Genomics 2020; 112:2928-2936. [PMID: 32437848 DOI: 10.1016/j.ygeno.2020.05.005] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/17/2019] [Revised: 04/22/2020] [Accepted: 05/05/2020] [Indexed: 12/28/2022]
Abstract
Long non-coding RNAs (lncRNAs) play key roles in regulating cellular biological processes through diverse molecular mechanisms, including binding to RNA-binding proteins. The majority of plant lncRNAs are functionally uncharacterized; thus, accurate prediction of plant lncRNA-protein interactions is imperative for subsequent functional studies. We present an integrative model, namely DRPLPI. Its distinguishing feature is that it makes predictions by multi-feature fusion. Structural features and four groups of sequence features are used, including tri-nucleotide composition, gapped k-mer, recursive complement and binary profile. We design a multi-head self-attention long short-term memory encoder-decoder network to extract generative high-level features. To obtain robust results, DRPLPI combines categorical boosting and extra trees into a single meta-learner. Experiments on Zea mays and Arabidopsis thaliana yielded areas under the precision-recall curve (AUPRC) of 0.9820 and 0.9652, respectively. The proposed method shows significant enhancement in prediction performance compared with existing state-of-the-art methods.
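Of the sequence features listed in the abstract, tri-nucleotide composition is the simplest to illustrate: it maps an RNA sequence to a 64-dimensional vector of 3-mer frequencies. The sketch below assumes overlapping windows, an A/C/G/U alphabet, and that windows containing other symbols are skipped; the function name is hypothetical and DRPLPI's exact feature definition may differ.

```python
from itertools import product


def trinucleotide_composition(seq, alphabet="ACGU"):
    """Return the frequency of each of the 64 tri-nucleotides in `seq`.

    Windows overlap (stride 1). DNA-style 'T' is mapped to 'U', and windows
    containing symbols outside the alphabet are ignored (an assumption).
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in kmers]


# Toy sequence (hypothetical); the vector is ordered AAA, AAC, ..., UUU.
features = trinucleotide_composition("AUGGCUAUGC")
print(len(features), sum(features))
```

Because the vector has a fixed length regardless of sequence length, it can be concatenated with other fixed-length features before being passed to a downstream model.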
Affiliation(s)
- Jael Sanyanda Wekesa
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China
  - School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000-00200, Kenya
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116023, China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116023, China

6
Wang Z, Lei X, Wu FX. Identifying Cancer-Specific circRNA-RBP Binding Sites Based on Deep Learning. Molecules 2019; 24:E4035. [PMID: 31703384 PMCID: PMC6891306 DOI: 10.3390/molecules24224035] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 09/29/2019] [Revised: 10/25/2019] [Accepted: 11/06/2019] [Indexed: 12/17/2022] Open
Abstract
Circular RNAs (circRNAs) are extensively expressed in cells and tissues, and play crucial roles in human diseases and biological processes. Recent studies have reported that circRNAs can function as RNA-binding protein (RBP) sponges, while RBPs can also be involved in back-splicing. The interaction with RBPs is also considered an important factor for investigating the function of circRNAs. Hence, it is necessary to understand the interaction mechanisms of circRNAs and RBPs, especially in human cancers. Here, we present a novel method based on deep learning to identify cancer-specific circRNA-RBP binding sites (CSCRSites), using only the nucleotide sequences as the input. In CSCRSites, an architecture with multiple convolution layers is used to detect the features of the raw circRNA sequence fragments and then identify the binding sites through a fully connected layer with a softmax output. The experimental results show that CSCRSites outperforms conventional machine learning classifiers and some representative deep learning methods on the benchmark data. In addition, the features learnt by CSCRSites are converted to sequence motifs, some of which match known human RNA motifs involved in human diseases, especially cancer. Therefore, as a deep learning-based tool, CSCRSites could significantly contribute to the functional analysis of cancer-associated circRNAs.
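A model that takes only nucleotide sequences as input typically starts from a one-hot encoding of each fragment and ends with a softmax over class scores. The sketch below illustrates just those two ends of such a pipeline, not the convolutional layers in between. The function names, the A/C/G/U column order, and the all-zero row for unknown symbols are assumptions made for illustration and are not taken from CSCRSites.

```python
import math


def one_hot(seq, alphabet="ACGU"):
    """Encode a sequence as a list of one-hot rows, one row per nucleotide.

    DNA-style 'T' is mapped to 'U'; symbols outside the alphabet become
    all-zero rows (an assumption, other conventions exist).
    """
    index = {base: i for i, base in enumerate(alphabet)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0.0] * len(alphabet)
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def softmax(logits):
    """Numerically stable softmax turning raw scores into probabilities."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy fragment and toy two-class scores (hypothetical values).
print(one_hot("ACGU"))
print(softmax([2.0, 0.5]))
```

The one-hot matrix has shape (fragment length, 4), which is the layout a 1D convolution over the sequence axis would consume.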
Affiliation(s)
- Zhengfeng Wang
  - School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
  - College of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
- Fang-Xiang Wu
  - Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada

7
Sagar A, Xue B. Recent Advances in Machine Learning Based Prediction of RNA-protein Interactions. Protein Pept Lett 2019; 26:601-619. [PMID: 31215361 DOI: 10.2174/0929866526666190619103853] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/16/2018] [Revised: 04/04/2019] [Accepted: 06/01/2019] [Indexed: 12/18/2022]
Abstract
The interactions between RNAs and proteins play critical roles in many biological processes. Therefore, characterizing these interactions is critical for mechanistic, biomedical, and clinical studies. Many experimental methods can be used to determine different aspects of RNA-protein interactions. However, because RNA-protein interactions are tissue-specific and condition-specific, and because these interactions are weak and frequently compete with each other, experimental techniques alone cannot uncover the complete spectrum of RNA-protein interactions. To mitigate these issues, continuous efforts have been devoted to developing high-quality computational techniques to study the interactions between RNAs and proteins. Much important progress has been achieved through the application of novel techniques and strategies, such as machine learning. In particular, with the development and application of CLIP techniques, more and more experimental data on RNA-protein interactions under specific biological conditions are available. Together, these CLIP data provide a rich source for developing advanced machine learning predictors. In this review, recent progress on computational predictors of RNA-protein interactions is summarized in terms of datasets, prediction strategies, and input features. Possible future developments are also discussed at the end of the review.
Affiliation(s)
- Amit Sagar
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States
- Bin Xue
  - Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida 33620, United States

8
Pan X, Yang Y, Xia C, Mirza AH, Shen H. Recent methodology progress of deep learning for RNA–protein interaction prediction. Wiley Interdiscip Rev RNA 2019; 10:e1544. [DOI: 10.1002/wrna.1544] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/28/2019] [Revised: 04/07/2019] [Accepted: 04/11/2019] [Indexed: 12/17/2022]
Affiliation(s)
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
  - IDLab, Department for Electronics and Information Systems, Ghent University, Ghent, Belgium
  - BASF Agriculture Solution, Ghent, Belgium
- Yang Yang
  - Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Chun‐Qiu Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Aashiq H. Mirza
  - Department of Pharmacology, Weill Cornell Medicine, New York, New York
- Hong‐Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
  - Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China

9
Agrawal P, Patiyal S, Kumar R, Kumar V, Singh H, Raghav PK, Raghava GPS. ccPDB 2.0: an updated version of datasets created and compiled from Protein Data Bank. Database (Oxford) 2019; 2019:5298333. [PMID: 30689843 PMCID: PMC6343045 DOI: 10.1093/database/bay142] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/31/2018] [Accepted: 12/09/2018] [Indexed: 12/20/2022]
Abstract
ccPDB 2.0 (http://webs.iiitd.edu.in/raghava/ccpdb) is an updated version of the manually curated database ccPDB, which maintains datasets required for developing methods to predict the structure and function of proteins. The number of datasets compiled from the literature increased from 45 to 141 in ccPDB 2.0. Similarly, the number of protein structures used for creating datasets increased from ~74 000 to ~137 000 (PDB March 2018 release). ccPDB 2.0 provides the same web services and flexible tools that were present in the previous version of the database. In the updated version, links to a number of methods developed in the past few years have also been incorporated. This updated resource is built on responsive templates that are compatible with smartphones and tablets (mobile, iPhone, iPad, etc.) as well as large-screen devices. In summary, ccPDB 2.0 is a user-friendly web-based platform that provides comprehensive and updated information about datasets.
Affiliation(s)
- Piyush Agrawal
  - Bioinformatics Center, CSIR-Institute of Microbial Technology, India
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Rajesh Kumar
  - Bioinformatics Center, CSIR-Institute of Microbial Technology, India
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Vinod Kumar
  - Bioinformatics Center, CSIR-Institute of Microbial Technology, India
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Harinder Singh
  - J. Craig Venter Institute, 9605 Medical Center Drive, Suite 150, Rockville, MD, USA
- Pawan Kumar Raghav
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, India