1
Roterman I, Stapor K, Konieczny L. Engagement of intrinsic disordered proteins in protein-protein interaction. Front Mol Biosci 2023; 10:1230922. [PMID: 37583961 PMCID: PMC10423874 DOI: 10.3389/fmolb.2023.1230922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 07/11/2023] [Indexed: 08/17/2023] Open
Abstract
Intrinsically disordered proteins (IDPs) attract the attention of many researchers engaged in protein structure analysis. The main criteria used in their identification are lack of secondary structure and significant structural variability. This variability takes forms that cannot be identified by the X-ray technique. In the present study, different criteria were used to assess the status of IDPs and their fragments recognized as intrinsically disordered regions (IDRs). The status of the hydrophobic core in proteins identified as IDPs and in their complexes was assessed. The status of IDRs as components of the ordering structure resulting from the construction of the hydrophobic core was also assessed. The hydrophobic core is understood as a structure encompassing the entire molecule in the form of a centrally located high concentration of hydrophobicity and a shell with a gradually decreasing level of hydrophobicity until it reaches a level close to zero on the protein surface. It is a model assuming that the protein folding process follows a micellization pattern aiming at exposing polar residues on the surface, with the simultaneous isolation of hydrophobic amino acids from the polar aqueous environment. Modeling the hydrophobicity distribution in a protein as a 3D Gaussian distribution spanning the protein molecule makes it possible to assess the degree of similarity to the assumed micelle-like distribution and to identify deviations and mismatches between the actual distribution and the idealized distribution. The FOD (fuzzy oil drop) model and its modified FOD-M version allow for the quantitative assessment of these differences and the assessment of the relationship of these areas to the protein function. In the present work, the sections of IDRs in protein complexes classified as IDPs are analyzed.
The classification "disordered" in the structural sense (lack of secondary structure or high flexibility) does not always entail a mismatch with the structure of the hydrophobic core. Particularly, the interface area, often consisting of IDRs, in many analyzed complexes shows the compliance of the hydrophobicity distribution with the idealized distribution, which proves that matching to the structure of the hydrophobic core does not require secondary structure ordering.
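The comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' FOD implementation: the residue coordinates, the sigma values, the observed hydrophobicity profile, and the choice of Kullback-Leibler divergence as the mismatch measure are all assumptions made for the example.

```python
import math

def idealized_hydrophobicity(coords, sigmas):
    """Normalized 3D Gaussian values at residue positions.

    coords: (x, y, z) positions relative to the molecule's center (hypothetical).
    sigmas: (sx, sy, sz) spread of the Gaussian along each axis (hypothetical).
    """
    sx, sy, sz = sigmas
    raw = [
        math.exp(-(x * x) / (2 * sx * sx)
                 - (y * y) / (2 * sy * sy)
                 - (z * z) / (2 * sz * sz))
        for x, y, z in coords
    ]
    total = sum(raw)
    # Normalize so the values sum to 1 and can be compared as a distribution.
    return [v / total for v in raw]

def kl_divergence(observed, target):
    """Mismatch between an observed and a target distribution (assumed measure)."""
    return sum(o * math.log2(o / t) for o, t in zip(observed, target) if o > 0)

# Hypothetical residue positions, ordered from the center outward.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0), (0.0, 0.0, 9.0)]
theoretical = idealized_hydrophobicity(coords, sigmas=(4.0, 4.0, 4.0))

# Hypothetical observed hydrophobicity profile, already normalized to sum to 1.
observed = [0.40, 0.30, 0.20, 0.10]
mismatch = kl_divergence(observed, theoretical)
print(theoretical, mismatch)
```

A small mismatch value would indicate a residue set whose hydrophobicity follows the idealized center-to-surface decay; a large value flags a local deviation.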
Affiliation(s)
- Irena Roterman
  - Department of Bioinformatics and Telemedicine, Jagiellonian University - Medical College, Kraków, Poland
- Katarzyna Stapor
  - Department of Applied Informatics, Faculty of Automatic, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Leszek Konieczny
  - Chair of Medical Biochemistry, Medical College, Jagiellonian University, Kraków, Poland
2
Biological soft matter: intrinsically disordered proteins in liquid-liquid phase separation and biomolecular condensates. Essays Biochem 2022; 66:831-847. [PMID: 36350034 DOI: 10.1042/ebc20220052] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/14/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 11/10/2022]
Abstract
The facts that many proteins with crucial biological functions do not have unique structures and that many biological processes are compartmentalized into liquid-like biomolecular condensates, which are formed via liquid-liquid phase separation (LLPS) and are not surrounded by a membrane, are revolutionizing modern biology. These phenomena are interlinked, as the presence of intrinsic disorder represents an important requirement for a protein to undergo LLPS that drives biogenesis of numerous membrane-less organelles (MLOs). Therefore, one can consider these phenomena as crucial constituents of a new IDP-LLPS-MLO field. Furthermore, intrinsically disordered proteins (IDPs), LLPS, and MLOs represent a clear link between molecular and cellular biology and soft matter and condensed soft matter physics. Both the IDP and LLPS/MLO fields are undergoing explosive development and generating an ever-increasing mountain of crucial data. These new data provide answers to so many long-standing questions that it is difficult to imagine that in the very recent past, protein scientists and cellular biologists operated without taking these revolutionary concepts into account. The goal of this essay is not to deliver a comprehensive review of the IDP-LLPS-MLO field but to provide a brief and rather subjective outline of some of the recent developments in these exciting fields.
3
Intrinsically Disordered Proteins: An Overview. Int J Mol Sci 2022; 23:ijms232214050. [PMID: 36430530 PMCID: PMC9693201 DOI: 10.3390/ijms232214050] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 09/04/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 11/16/2022] Open
Abstract
Many proteins and protein segments cannot attain a single stable three-dimensional structure under physiological conditions; instead, they adopt multiple interconverting conformational states. Such intrinsically disordered proteins or protein segments are highly abundant across proteomes, and are involved in various effector functions. This review focuses on different aspects of disordered proteins and disordered protein regions, which form the basis of the so-called "Disorder-function paradigm" of proteins. Additionally, various experimental approaches and computational tools used for characterizing disordered regions in proteins are discussed. Finally, the role of disordered proteins in diseases and their utility as potential drug targets are explored.
4
Chen R, Li X, Yang Y, Song X, Wang C, Qiao D. Prediction of protein-protein interaction sites in intrinsically disordered proteins. Front Mol Biosci 2022; 9:985022. [PMID: 36250006 PMCID: PMC9567019 DOI: 10.3389/fmolb.2022.985022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/03/2022] [Accepted: 07/27/2022] [Indexed: 11/25/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) participate in many biological processes by interacting with other proteins, including the regulation of transcription, translation, and the cell cycle. With the increasing amount of disorder sequence data available, it is thus crucial to identify the IDP binding sites for functional annotation of these proteins. Over the decades, many computational approaches have been developed to predict protein-protein binding sites of IDPs (IDP-PPIS) based on protein sequence information. Moreover, new IDP-PPIS predictors are developed every year with the rapid development of artificial intelligence. It is thus necessary to provide an up-to-date overview of the methods in this field. In this paper, we collected 30 representative predictors published recently and summarized the databases, features and algorithms. We described how the features were generated from public data and used for the prediction of IDP-PPIS, along with the methods to generate the feature representations. All the predictors were divided into three categories: scoring functions, machine learning-based prediction, and consensus approaches. For each category, we described the details of the algorithms and their performances. Hopefully, our manuscript will not only provide a full picture of the status quo of IDP binding prediction, but also serve as a guide for selecting among different methods. More importantly, it will shed light on future development trends and principles.
Affiliation(s)
- Ranran Chen
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Xinlu Li
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Yaqing Yang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Xixi Song
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Cheng Wang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
  - *Correspondence: Cheng Wang; Dongdong Qiao
- Dongdong Qiao
  - Shandong Mental Health Center, Shandong University, Jinan, China
  - *Correspondence: Cheng Wang; Dongdong Qiao
5
Zhou T, Rong J, Liu Y, Gong W, Li C. An ensemble approach to predict binding hotspots in protein-RNA interactions based on SMOTE data balancing and random grouping feature selection strategies. Bioinformatics 2022; 38:2452-2458. [PMID: 35253843 DOI: 10.1093/bioinformatics/btac138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2021] [Revised: 01/15/2022] [Accepted: 03/02/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: The identification of binding hotspots in protein-RNA interactions is crucial for understanding their potential recognition mechanisms and for drug design. Experimental methods have many limitations, since they are usually time-consuming and labor-intensive. Thus, an effective and efficient theoretical method is urgently needed. RESULTS: Here we present SREPRHot, a method to predict hotspots, defined as the residues whose mutation to alanine generates a binding free energy change ≥ 2.0 kcal/mol, while others use a cutoff of 1.0 kcal/mol to obtain balanced datasets. To deal with the dataset imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is used to generate minority samples and balance the dataset. Additionally, besides conventional features, we use two types of new features, residue interface propensity previously developed by us and topological features obtained using node-weighted networks, and propose an effective Random Grouping feature selection strategy combined with a two-step method to determine an optimal feature set. Finally, a stacking ensemble classifier is adopted to build our model. The results show SREPRHot achieves a good performance with SEN, MCC and AUC of 0.900, 0.557 and 0.829 on the independent testing dataset. The comparison study indicates SREPRHot shows a promising performance. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/ChunhuaLiLab/SREPRHot. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
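The hotspot definition and the SMOTE balancing step mentioned in this abstract can be sketched as below. This is a simplified, pure-Python illustration and not the SREPRHot code: the function names, the toy feature vectors, and the `k` and `seed` parameters are assumptions; only the 2.0 kcal/mol cutoff comes from the abstract.

```python
import random

def label_hotspots(ddg_values, cutoff=2.0):
    """Label a residue as a hotspot (1) when its binding free energy change >= cutoff (kcal/mol)."""
    return [1 if ddg >= cutoff else 0 for ddg in ddg_values]

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a randomly chosen minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbors of the base sample (squared Euclidean distance).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Hypothetical alanine-scanning values in kcal/mol.
labels = label_hotspots([0.5, 2.3, 1.1, 2.0, 0.2])

# Hypothetical feature vectors of the minority (hotspot) class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_new=4)
print(labels, synthetic)
```

Each synthetic sample lies on a line segment between two real minority samples, which is why the technique adds minority examples without simply duplicating them.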
Affiliation(s)
- Tong Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Jie Rong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Yang Liu
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, 100124, China
6
He H, Zhou Y, Chi Y, He J. Prediction of MoRFs based on sequence properties and convolutional neural networks. BioData Min 2021; 14:39. [PMID: 34391457 PMCID: PMC8364704 DOI: 10.1186/s13040-021-00275-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/13/2021] [Accepted: 08/08/2021] [Indexed: 12/02/2022] Open
Abstract
Background: Intrinsically disordered proteins possess flexible 3-D structures, which allows them to play an important role in a variety of biological functions. Molecular recognition features (MoRFs) are an important type of functional region; they are located within longer intrinsically disordered regions and undergo disorder-to-order transitions upon binding their interaction partners. Results: We develop a method, MoRFCNN, to predict MoRFs based on sequence properties and convolutional neural networks (CNNs). The sequence properties comprise structural and physicochemical properties that are used to describe the differences between MoRFs and non-MoRFs. In particular, to highlight the correlation between the target residue and adjacent residues, three windows are selected to preprocess the selected properties. After that, these calculated properties are combined into the feature matrix to predict MoRFs through the constructed CNN. Compared with other existing methods, MoRFCNN obtains better performance. Conclusions: MoRFCNN is a new individual MoRF prediction method that uses only protein sequence properties, without evolutionary information. The simulation results show that MoRFCNN is effective and competitive.
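The window-based preprocessing mentioned in this abstract might look something like the sketch below. The abstract does not give the window sizes or the aggregation used, so the sizes (3, 5, 7), the use of a simple mean, the truncation at sequence ends, and the toy property values are all assumptions for illustration rather than the MoRFCNN procedure.

```python
def window_average(values, size):
    """Average a per-residue property over a window centered on each residue.
    Windows are truncated at the sequence ends."""
    half = size // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def feature_matrix(values, window_sizes=(3, 5, 7)):
    """One row per residue, one column per window size (hypothetical sizes)."""
    columns = [window_average(values, s) for s in window_sizes]
    return [list(row) for row in zip(*columns)]

# Hypothetical per-residue property values (e.g. a hydropathy scale).
hydropathy = [1.0, 2.0, 3.0, 4.0, 5.0]
features = feature_matrix(hydropathy)
print(features)
```

With several properties, the per-property matrices would be concatenated column-wise before being passed to a classifier such as a CNN.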
Affiliation(s)
- Hao He
  - School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
- Yatong Zhou
  - School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
- Yue Chi
  - School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
- Jingfei He
  - School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, China
7
Lyngdoh DL, Nag N, Uversky VN, Tripathi T. Prevalence and functionality of intrinsic disorder in human FG-nucleoporins. Int J Biol Macromol 2021; 175:156-170. [PMID: 33548309 DOI: 10.1016/j.ijbiomac.2021.01.218] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/18/2020] [Revised: 01/19/2021] [Accepted: 01/31/2021] [Indexed: 11/27/2022]
Abstract
The nuclear-cytoplasmic transport of biomolecules is assisted by the nuclear pores composed of evolutionarily conserved proteins termed nucleoporins (Nups). The central Nups, characterized by multiple FG-repeats, are highly dynamic and contain a high level of intrinsically disordered protein regions (IDPRs). FG-Nups bind several protein partners and play critical roles in molecular interactions and the regulation of cellular functions through their IDPRs. In the present study, we performed a multiparametric bioinformatics analysis to characterize the prevalence and functionality of IDPRs in human FG-Nups. These analyses revealed that the sequences of all FG-Nups except Nup54 and Nup358 contained >50% IDPRs. Nup98, Nup153, and POM121 were extremely disordered with ~80% IDPRs. The functional disorder-based binding regions in the FG-Nups were identified. The phase separation analysis indicated that all FG-Nups have the potential to undergo liquid-liquid phase separation that could stabilize their liquid state. The inherent structural flexibility in FG-Nups is mechanistically and functionally advantageous. Since certain FG-Nups interact with disease-relevant protein aggregates, their complexes can be exploited for drug design. Furthermore, consideration of the FG-Nups from the intrinsic disorder perspective provides critical information that can guide future experimental studies to uncover novel pathways associated with diseases linked with protein misfolding and aggregation.
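A percentage such as ">50% IDPRs" is typically derived from per-residue disorder scores. The sketch below shows one way to compute it; the 0.5 score threshold is a common convention assumed here rather than a value taken from the abstract, and the scores themselves are made up.

```python
def disorder_fraction(scores, threshold=0.5):
    """Fraction of residues whose disorder score meets the (assumed) threshold."""
    disordered = sum(1 for s in scores if s >= threshold)
    return disordered / len(scores)

# Hypothetical per-residue disorder scores for a short sequence.
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.95, 0.4, 0.85, 0.75, 0.65]
fraction = disorder_fraction(scores)
mostly_disordered = fraction > 0.5  # the ">50% IDPRs" criterion from the abstract
print(fraction, mostly_disordered)
```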
Affiliation(s)
- Denzelle Lee Lyngdoh
  - Molecular and Structural Biophysics Laboratory, Department of Biochemistry, North-Eastern Hill University, Shillong 793022, India
- Niharika Nag
  - Molecular and Structural Biophysics Laboratory, Department of Biochemistry, North-Eastern Hill University, Shillong 793022, India
- Vladimir N Uversky
  - Department of Molecular Medicine and Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL 33620, United States
- Timir Tripathi
  - Molecular and Structural Biophysics Laboratory, Department of Biochemistry, North-Eastern Hill University, Shillong 793022, India
8
Sharma R, Kumar S, Tsunoda T, Kumarevel T, Sharma A. Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles. Anal Biochem 2020; 612:113954. [PMID: 32946833 DOI: 10.1016/j.ab.2020.113954] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/16/2020] [Revised: 08/26/2020] [Accepted: 09/10/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND: DNA-binding proteins perform important roles in cellular processes and are involved in many biological activities. These proteins include crucial protein-DNA binding domains and can interact with single-stranded or double-stranded DNA, and are accordingly classified as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Computational prediction of SSBs and DSBs helps in annotating protein functions and understanding protein-binding domains. RESULTS: Performance is reported using the DNA-binding protein dataset that was recently introduced by Wang et al. [1]. The proposed method achieved a sensitivity of 0.600, specificity of 0.792, AUC of 0.758, MCC of 0.369, accuracy of 0.744, and F-measure of 0.536 on the independent test set. CONCLUSION: The proposed method, which uses hidden Markov model (HMM) profiles for feature extraction, outperformed the benchmark method in the literature and achieved an overall improvement of approximately 3%. The source code and supplementary information of the proposed method are available at https://github.com/roneshsharma/Predict-DNA-binding-proteins/wiki.
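For readers unfamiliar with the metrics reported here, the sketch below computes sensitivity, specificity, accuracy, F-measure, and MCC from a confusion matrix using their standard definitions. The counts are hypothetical and chosen only to give round numbers; they are not the study's data and do not reproduce its reported values.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }

# Hypothetical counts for an imbalanced test set (50 positives, 100 negatives).
metrics = binary_metrics(tp=30, fn=20, tn=80, fp=20)
print(metrics)
```

On imbalanced data like this, accuracy can look acceptable while MCC stays modest, which is why abstracts in this area usually report both.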
Affiliation(s)
- Ronesh Sharma
  - School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
- Shiu Kumar
  - School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
- Tatsuhiko Tsunoda
  - Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory of Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, 113-0033, Japan
- Thirumananseri Kumarevel
  - Laboratory for Transcription Structural Biology, RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Alok Sharma
  - Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD, Australia
9
Abstract
Intrinsically disordered proteins (IDPs) and regions (IDRs) are commonly found in all proteomes analyzed so far. These proteins/regions are subject to numerous posttranslational modifications (PTMs) and alternative splicing, are involved in a wide range of cellular functions, and often facilitate protein-protein interactions (PPIs). Some of these proteins contain molecular recognition features (MoRFs), which are IDRs that bind to partner proteins and undergo disorder-to-order transitions. Although many IDPs/IDRs can fold upon binding, a large fraction of these proteins are known to maintain significant amounts of disorder in their bound states. Being well-recognized interaction specialists, IDPs/IDRs can participate in one-to-many and many-to-one interactions, where one IDP/IDR binds to multiple partners, potentially gaining very different structures in the bound state, or where multiple unrelated IDPs/IDRs bind to one partner. As a result, IDPs frequently serve as hubs (i.e., proteins with many links) in complex PPI networks. The goal of this chapter is to describe computational and bioinformatics tools that can be used to look at the disorder status of proteins within a given PPI network and also to gain some knowledge of the disorder-based functionality of the members of this network. To this end, the chapter describes the use of the UniProt and DisProt databases, several databases generating PPI networks (BioGRID, IntAct, DIP, MINT, HPRD, APID, KEGG, and STRING), Composition profiler, some tools for per-residue disorder prediction (PONDR® VLXT, PONDR® VL3, PONDR® VSL2, PONDR-FIT, and IUPred), the binary disorder classifiers CH-plot and CDF-plot and their combined CH-CDF analysis, web-based tools for the visualization of disorder distribution in a query protein (D2P2 and MobiDB), as well as some tools for evaluating the disorder-based functionality of proteins (ANCHOR, MoRFpred, DEPP, and ModPred).
Affiliation(s)
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
  - USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
  - Laboratory of New Methods in Biology, Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russian Federation
10
Katuwawala A, Ghadermarzi S, Kurgan L. Computational prediction of functions of intrinsically disordered regions. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2019; 166:341-369. [PMID: 31521235 DOI: 10.1016/bs.pmbts.2019.04.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/20/2022]
Abstract
Intrinsically disordered regions (IDRs) are abundant in nature, particularly among Eukaryotes. While they facilitate a wide spectrum of cellular functions including signaling, molecular assembly and recognition, translation, transcription and regulation, only several hundred IDRs are annotated functionally. This annotation gap motivates the development of fast and accurate computational methods that predict IDR functions directly from protein sequences. We introduce and describe a comprehensive collection of 25 methods that provide accurate predictions of IDRs that interact with proteins and nucleic acids, that function as flexible linkers and that moonlight multiple functions. Virtually all of these predictors can be accessed online and many were developed in the last few years. They utilize a wide range of predictive architectures and take advantage of modern machine learning algorithms. Our empirical analysis shows that predictors that are available as webservers enjoy high rates of citations, attesting to their practical value and popularity. The most cited methods include DISOPRED3, ANCHOR, alpha-MoRFpred, MoRFpred, fMoRFpred and MoRFCHiBi. We present two case studies to demonstrate that predictions produced by these computational tools are relatively easy to interpret and that they deliver valuable functional clues. However, the current computational tools cover a relatively narrow range of disorder functions. Further development efforts that would cover a broader range of functions should be pursued. We demonstrate that a sufficient number of functionally annotated IDRs associated with several other disorder functions is already available and can be used to design and validate novel predictors.
Affiliation(s)
- Akila Katuwawala
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, United States
11
Ahmad S, Gromiha MM, Raghava GPS, Schönbach C, Ranganathan S. APBioNet's annual International Conference on Bioinformatics (InCoB) returns to India in 2018. BMC Genomics 2019; 19:266. [PMID: 30999857 PMCID: PMC7402400 DOI: 10.1186/s12864-019-5582-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/12/2023] Open
Abstract
InCoB, one of the largest annual bioinformatics conferences in the Asia-Pacific region since its launch in 2002, returned to New Delhi, India after 12 years, with a conference attendance of 314 delegates. The 2018 conference had sessions on Big Data and Algorithms; Next Generation Sequencing and Omics Science; Structure, Function and Interactions; Disease and Drug Discovery; and Plant and Agricultural Bioinformatics. The conference also featured an industry track as well as panel discussions on Women in Bioinformatics and Democratization vs. Quality control in academic publishing. Asia Pacific Bioinformatics Interaction & Networking Society (APbians) was launched as an APBioNet Special Interest Group. Of the 52 oral presentations made, 22 were accepted in supplemental issues of BMC Bioinformatics, BMC Genomics or BMC Medical Genomics and are briefly reviewed here. Next year’s InCoB will be held in Jakarta, Indonesia from September 10–12, 2019.
Affiliation(s)
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110 067, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, 600 036, India
- Gajendra P S Raghava
  - Centre for Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, 110020, India
- Christian Schönbach
  - Department of Biology, School of Science and Technology, Nazarbayev University, Astana, Kazakhstan
  - International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811, Japan
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia
  - Transformational Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia
12
Deng L, Sui Y, Zhang J. XGBPRH: Prediction of Binding Hot Spots at Protein-RNA Interfaces Utilizing Extreme Gradient Boosting. Genes (Basel) 2019; 10:genes10030242. [PMID: 30901953 PMCID: PMC6471955 DOI: 10.3390/genes10030242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/16/2019] [Revised: 03/14/2019] [Accepted: 03/15/2019] [Indexed: 01/24/2023] Open
Abstract
Hot spot residues in protein-RNA complexes are vitally important for investigating the underlying molecular recognition mechanism. Accurately identifying protein-RNA binding hot spots is critical for drug design and protein engineering. Although some progress has been made by utilizing various available features and a series of machine learning approaches, these methods are still in their infancy. In this paper, we present a new computational method named XGBPRH, which is based on an eXtreme Gradient Boosting (XGBoost) algorithm and can effectively predict hot spot residues in protein-RNA interfaces utilizing an optimal set of properties. Firstly, we download 47 protein-RNA complexes and calculate a total of 156 sequence, structure, exposure, and network features. Next, we adopt a two-step feature selection algorithm to extract a combination of 6 optimal features from these 156 features. Compared with the state-of-the-art approaches, XGBPRH achieves better performance, with an area under the ROC curve (AUC) score of 0.817 and an F1-score of 0.802 on the independent test set. Meanwhile, we also apply XGBPRH to two case studies. The results demonstrate that the method can effectively identify novel energy hotspots.
Affiliation(s)
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China
- Yuanchao Sui
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China
- Jingpu Zhang
  - School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China