1
Han Z, Wu Z, Gong W, Zhou W, Chen L, Li C. Allosteric mechanism for SL RNA recognition by polypyrimidine tract binding protein RRM1: An atomistic MD simulation and network-based study. Int J Biol Macromol 2022; 221:763-772. [PMID: 36058398 DOI: 10.1016/j.ijbiomac.2022.08.181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/20/2022] [Accepted: 08/27/2022] [Indexed: 12/01/2022]
Abstract
Polypyrimidine tract-binding protein (PTB), an RNA-binding protein, is involved in the regulation of diverse processes in mRNA metabolism. However, the allosteric modulation of its binding with RNA remains unclear. We explore the dynamic characteristics of PTB RNA recognition motif 1 (RRM1) in its RNA-free and wild-type/mutant RNA-bound states to address this question using molecular dynamics (MD) simulation, perturbation response scanning (PRS) and protein structure network (PSN) models. It is found that RNA binding strengthens RRM1 stability, while the L151G mutation in the α3 helix, far away from the interface, makes the complex unstable. The latter is caused by long-distance dynamic couplings, which make intermolecular electrostatic and entropy energies unfavorable. The weakened couplings between interface β sheets and C-terminal parts upon mutation reveal that RNA recognition is co-regulated by these regions. Interestingly, PRS analysis reveals that the allostery caused by the perturbation on the α3 helix has already been pre-encoded in the equilibrium dynamics of the protein structure. PSN analysis shows the details of the allosteric signal transmission, revealing the necessity of strong couplings between the α3 helix and the interface for maintaining the high binding affinity. This study sheds light on the mechanisms of PTB allostery and RNA recognition and can provide important information for drug design.
Affiliation(s)
- Zhongjie Han
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Weikang Gong
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Wenxue Zhou
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Lei Chen
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
2
Ramakrishnan C, Nagarajan R, Sekijima M, Michael Gromiha M. Molecular dynamics simulations of cognate and non-cognate AspRS-tRNA Asp complexes. J Biomol Struct Dyn 2020; 39:493-501. [PMID: 31900102 DOI: 10.1080/07391102.2019.1711188] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Abstract
Aspartyl tRNA synthetase (AspRS), one of the 20 aminoacyl-tRNA synthetases, plays an important role in protein synthesis by catalyzing the aminoacylation reaction that synthesises aspartyl-tRNA (tRNAAsp). A typical three-dimensional structure of AspRS comprises three distinct domains for the recognition of cognate tRNA and catalysis, namely, the anti-codon binding domain/N-terminal domain, hinge domain and catalytic domain, which interact with the anti-codon loop, D-stem and acceptor arm of cognate tRNA, respectively. In this work, we have studied the structural characteristics of each domain of AspRS to understand the recognition mechanism of tRNAAsp using molecular dynamics simulations. The dynamics of AspRS-tRNAAsp complexes from E. coli (cognate and non-cognate), S. cerevisiae (cognate) and T. thermophilus (non-cognate) were compared to understand the differences in recognition of cognate and non-cognate tRNAs. Our results show that the conformational changes associated with the recognition of tRNA occur only in the cognate complexes. Among the cognate complexes, the conformational changes in yeast AspRS are more tightly controlled during tRNAAsp recognition than those in E. coli AspRS. Moreover, the functional motions required for tRNA recognition are observed only in the cognate complexes, and the conformational changes in AspRS and their recognition of tRNAAsp are organism specific. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- C Ramakrishnan
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- R Nagarajan
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- M Sekijima
  - Advanced Computational Drug Discovery Unit, Tokyo Institute of Technology, Yokohama, Japan
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
  - Advanced Computational Drug Discovery Unit, Tokyo Tech World Research Hub Initiative (WRHI), Institute of Innovative Research, Tokyo Institute of Technology, Yokohama, Japan
3
Tanwar H, Kumar DT, Doss CGP, Zayed H. Bioinformatics classification of mutations in patients with Mucopolysaccharidosis IIIA. Metab Brain Dis 2019; 34:1577-1594. [PMID: 31385193 PMCID: PMC6858298 DOI: 10.1007/s11011-019-00465-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 04/30/2019] [Accepted: 07/08/2019] [Indexed: 02/06/2023]
Abstract
Mucopolysaccharidosis (MPS) IIIA, also known as Sanfilippo syndrome type A, is a severe, progressive disease that affects the central nervous system (CNS). MPS IIIA is inherited in an autosomal recessive manner and is caused by a deficiency in the lysosomal enzyme sulfamidase, which is required for the degradation of heparan sulfate. The sulfamidase is produced by the N-sulphoglucosamine sulphohydrolase (SGSH) gene. In MPS IIIA patients, the excess of lysosomal storage of heparan sulfate often leads to mental retardation, hyperactive behavior, and connective tissue impairments, which occur due to various known missense mutations in the SGSH, leading to protein dysfunction. In this study, we focused on three mutations (R74C, S66W, and R245H) based on in silico pathogenic, conservation, and stability prediction tool studies. The three mutations were further subjected to molecular dynamic simulation (MDS) analysis using GROMACS simulation software to observe the structural changes they induced, and all the mutants exhibited maximum deviation patterns compared with the native protein. Conformational changes were observed in the mutants based on various geometrical parameters, such as conformational stability, fluctuation, and compactness, followed by hydrogen bonding, physicochemical properties, principal component analysis (PCA), and salt bridge analyses, which further validated the underlying cause of the protein instability. Additionally, secondary structure and surrounding amino acid analyses further confirmed the above results indicating the loss of protein function in the mutants compared with the native protein. The present results reveal the effects of three mutations on the enzymatic activity of sulfamidase, providing a molecular explanation for the cause of the disease. 
Thus, this study allows for a better understanding of the effect of SGSH mutations through the use of various computational approaches in terms of both structure and functions and provides a platform for the development of therapeutic drugs and potential disease treatments.
Affiliation(s)
- Himani Tanwar
  - Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- D Thirumal Kumar
  - Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- C George Priya Doss
  - Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Hatem Zayed
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, Doha, Qatar
4
Agrahari AK, Muskan M, George Priya Doss C, Siva R, Zayed H. Computational insights of K1444N substitution in GAP-related domain of NF1 gene associated with neurofibromatosis type 1 disease: a molecular modeling and dynamics approach. Metab Brain Dis 2018; 33:1443-1457. [PMID: 29804243 DOI: 10.1007/s11011-018-0251-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 11/12/2017] [Accepted: 05/17/2018] [Indexed: 12/18/2022]
Abstract
The NF1 gene encodes the neurofibromin protein, which is ubiquitously expressed, but most highly in the central nervous system. Non-synonymous SNPs (nsSNPs) in the NF1 gene were found to be associated with Neurofibromatosis Type 1 disease, which is characterized by the growth of tumors along nerves in the skin, brain, and other parts of the body. In this study, we used several in silico prediction tools to analyze 16 nsSNPs in the RAS-GAP domain of neurofibromin; the K1444N (K1423N) mutation was predicted as the most pathogenic. The comparative molecular dynamic simulation (MDS; 50 ns) between the wild type and the K1444N (K1423N) mutant suggested a significant change in the electrostatic potential. In addition, the RMSD, RMSF, Rg, hydrogen bond, and PCA analyses confirmed the loss of flexibility and increase in compactness of the mutant protein. Further, SASA analysis revealed exchange between hydrophobic and hydrophilic residues from the core of the RAS-GAP domain to the surface of the mutant domain, consistent with the secondary structure analysis that showed significant alteration in the mutant protein conformation. Our data indicate that the K1444N (K1423N) mutation increases the rigidity and compactness of the protein. This study provides evidence of the benefits of computational tools in predicting the pathogenicity of genetic mutations and suggests the application of MDS and different in silico prediction tools for variant assessment and classification in genetic clinics.
Affiliation(s)
- Ashish Kumar Agrahari
  - Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Meghana Muskan
  - Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- C George Priya Doss
  - Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- R Siva
  - Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- Hatem Zayed
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, Doha, Qatar
5
Kulandaisamy A, Srivastava A, Kumar P, Nagarajan R, Priya SB, Gromiha MM. Identification and Analysis of Key Residues in Protein-RNA Complexes. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1436-1444. [PMID: 29993582 DOI: 10.1109/tcbb.2018.2834387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Protein-RNA complexes play important roles in various biological processes. The functions of protein-RNA complexes are dictated by their interactions, binding, stability, and affinity. In this work, we have identified the key residues (KRs), which are involved in both stability and binding. We found that 42 percent of considered proteins share common binding and stabilizing residues, whereas these residues are distinct in 58 percent of the proteins. Overall, 5 percent of stabilizing and 3 percent of binding residues serve as key residues. These residues are enriched with the combination of polar, charged, aliphatic, and aromatic residues. Analysis on subclasses of protein-RNA complexes based on protein structural class, function and RNA type showed that regulatory proteins, and complexes with single stranded RNA and rRNA have appreciable number of key residues. Specifically, Arg, Tyr, and Thr are preferred in most of the subclasses of protein-RNA complexes. In addition, residues with similar chemical behavior have different preferences to be KRs, such that Arg, Tyr, Val, and Thr are preferred over Lys, Trp, Ile, and Ser, respectively. Atomic level contacts revealed that charged and polar-nonpolar contacts are dominant in enzymes, polar in structural, and nonpolar in regulatory proteins. On the other hand, polar-nonpolar contacts are enriched in all these classes of protein-RNA complexes. Further, the influence of sequence and structural features such as conservation score, surrounding hydrophobicity, solvent accessibility, secondary structure, and long-range order in key residues are also discussed. We envisage that the present study provides insights to understand the structural and functional aspects of protein-RNA complexes.
6
Hu W, Qin L, Li M, Pu X, Guo Y. A structural dissection of protein–RNA interactions based on different RNA base areas of interfaces. RSC Adv 2018; 8:10582-10592. [PMID: 35540439 PMCID: PMC9078961 DOI: 10.1039/c8ra00598b] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/20/2018] [Accepted: 03/05/2018] [Indexed: 11/21/2022] Open
Abstract
Protein–RNA interactions are very common cellular processes, but the mechanisms of interactions are not fully understood, mainly due to the complicated RNA structures. By the elaborate investigation on RNA structures of protein–RNA complexes, it was firstly found in this paper that RNAs in these complexes could be clearly classified into three classes (high, medium and low) based on the different levels of Pbase (the percentage of base area buried in the RNA interface). In view of the three RNA classes, more detailed analyses on protein–RNA interactions were comprehensively performed from various aspects, including interface area, structure, composition and interaction force, so as to achieve a deeper understanding of the recognition specificity for the three classes of protein–RNA interactions. According to our classification strategy, the three complex classes have significant differences in terms of almost all properties. Complexes in the high class have short and extended RNA structures and behave like protein–ssDNA interactions. Their hydrogen bonds and hydrophobic interactions are strong. For complexes in low class, their RNA structures are mainly double-stranded, like protein–dsDNA interactions, and electrostatic interactions frequently occur. The complexes in medium class have the longest RNA chains and largest average interface area. Meanwhile, they do not show any preference for the interaction force. On average, in terms of composition, secondary structures and intermolecular physicochemical properties, significant feature preferences can be observed in high and low complexes, but no highly specific features are found for medium complexes. We found that our proposed Pbase is an important parameter which can be used as a new determinant to distinguish protein–RNA complexes. For high and low complexes, we can more easily understand the specificity of the recognition process from the interface features than for medium complexes. 
Future work should focus on medium complexes, analyzing their structures across more features. Overall, this study may contribute to further understanding of the mechanism of protein–RNA interactions on a more detailed level. Qualitative and quantitative measurements of the influence of structure and composition of RNA interfaces on protein–RNA interactions.
Affiliation(s)
- Wen Hu
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Liu Qin
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
7
Zhang J, Ma Z, Kurgan L. Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains. Brief Bioinform 2017; 20:1250-1268. [DOI: 10.1093/bib/bbx168] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 08/30/2017] [Revised: 11/15/2017] [Indexed: 11/13/2022] Open
Abstract
Proteins interact with a variety of molecules including proteins and nucleic acids. We review a comprehensive collection of over 50 studies that analyze and/or predict these interactions. While majority of these studies address either solely protein–DNA or protein–RNA binding, only a few have a wider scope that covers both protein–protein and protein–nucleic acid binding. Our analysis reveals that binding residues are typically characterized with three hallmarks: relative solvent accessibility (RSA), evolutionary conservation and propensity of amino acids (AAs) for binding. Motivated by drawbacks of the prior studies, we perform a large-scale analysis to quantify and contrast the three hallmarks for residues that bind DNA-, RNA-, protein- and (for the first time) multi-ligand-binding residues that interact with DNA and proteins, and with RNA and proteins. Results generated on a well-annotated data set of over 23 000 proteins show that conservation of binding residues is higher for nucleic acid- than protein-binding residues. Multi-ligand-binding residues are more conserved and have higher RSA than single-ligand-binding residues. We empirically show that each hallmark discriminates between binding and nonbinding residues, even predicted RSA, and that combining them improves discriminatory power for each of the five types of interactions. Linear scoring functions that combine these hallmarks offer good predictive performance of residue-level propensity for binding and provide intuitive interpretation of predictions. Better understanding of these residue-level interactions will facilitate development of methods that accurately predict binding in the exponentially growing databases of protein sequences.
8
Chang S, Zhang DW, Xu L, Wan H, Hou TJ, Kong R. Exploring the molecular basis of RNA recognition by the dimeric RNA-binding protein via molecular simulation methods. RNA Biol 2016; 13:1133-1143. [PMID: 27592836 DOI: 10.1080/15476286.2016.1223007] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/14/2022] Open
Abstract
RNA-binding protein with multiple splicing (RBPMS) is critical for axon guidance, smooth muscle plasticity, and regulation of cancer cell proliferation and migration. Recently, different states of the RNA-recognition motif (RRM) of RBPMS, one in its free form and another in complex with CAC-containing RNA, were determined by X-ray crystallography. In this article, the free RRM domain, its wild type complex and 2 mutant complex systems are studied by molecular dynamics (MD) simulations. Through comparison of the free RRM domain and the complex systems, it is found that RNA binding helps stabilize the RNA-binding interface of the RRM domain, especially the C-terminal loop. Although both R38Q and T103A/K104A mutations reduce the binding affinity of the RRM domain and RNA, the underlying mechanisms are different. Principal component analysis (PCA) and Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) methods were used to explore the dynamical and recognition mechanisms of the RRM domain and RNA. The R38Q mutation is positioned on the homodimerization interface and mainly induces large fluctuations of the RRM domains. This mutation does not directly act on the RNA-binding interface, but some interfacial hydrogen bonds are weakened. In contrast, the T103A/K104A mutations are located on the RNA-binding interface of the RRM domain. These mutations break most of the high-occupancy hydrogen bonds in the RNA-binding interface. Meanwhile, the key interfacial residues lose their favorable energy contributions upon RNA binding. The ranking of calculated binding energies in the 3 complex systems is consistent with that of experimental binding affinities. These results will be helpful in understanding the RNA recognition mechanisms of the RRM domain.
Affiliation(s)
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Da-Wei Zhang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Lei Xu
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
- Hua Wan
  - College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
- Ting-Jun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Ren Kong
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
9
Nagarajan R, Archana A, Thangakani AM, Jemimah S, Velmurugan D, Gromiha MM. PDBparam: Online Resource for Computing Structural Parameters of Proteins. Bioinform Biol Insights 2016; 10:73-80. [PMID: 27330281 PMCID: PMC4909059 DOI: 10.4137/bbi.s38423] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/19/2016] [Revised: 04/20/2016] [Accepted: 04/24/2016] [Indexed: 02/07/2023] Open
Abstract
Understanding the structure-function relationship in proteins is a longstanding goal in molecular and computational biology. The development of structure-based parameters has helped to relate the structure with the function of a protein. Although several structural features have been reported in the literature, no single server can calculate a wide-ranging set of structure-based features from protein three-dimensional structures. In this work, we have developed a web-based tool, PDBparam, for computing more than 50 structure-based features for any given protein structure. These features are classified into four major categories: (i) interresidue interactions, which include short-, medium-, and long-range interactions, contact order, long-range order, total contact distance, contact number, and multiple contact index, (ii) secondary structure propensities such as α-helical propensity, β-sheet propensity, and propensity of amino acids to exist at various positions of α-helix and amino acid compositions in high B-value regions, (iii) physicochemical properties containing ionic interactions, hydrogen bond interactions, hydrophobic interactions, disulfide interactions, aromatic interactions, surrounding hydrophobicity, and buriedness, and (iv) identification of binding site residues in protein-protein, protein-nucleic acid, and protein-ligand complexes. The server can be freely accessed at http://www.iitm.ac.in/bioinfo/pdbparam/. We suggest the use of PDBparam as an effective tool for analyzing protein structures.
Affiliation(s)
- R. Nagarajan
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- A. Archana
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- A. Mary Thangakani
  - CAS in Crystallography and Biophysics, University of Madras, Chennai, India
  - Bioinformatics Infrastructure Facility, University of Madras, Chennai, India
- S. Jemimah
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- D. Velmurugan
  - CAS in Crystallography and Biophysics, University of Madras, Chennai, India
  - Bioinformatics Infrastructure Facility, University of Madras, Chennai, India
- M. Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
10
Anoosha P, Huang LT, Sakthivel R, Karunagaran D, Gromiha MM. Discrimination of driver and passenger mutations in epidermal growth factor receptor in cancer. Mutat Res 2015; 780:24-34. [PMID: 26264175 DOI: 10.1016/j.mrfmmm.2015.07.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 02/18/2015] [Revised: 05/21/2015] [Accepted: 07/07/2015] [Indexed: 06/04/2023]
Abstract
Cancer is one of the most life-threatening diseases and mutations in several genes are the vital cause in tumorigenesis. Protein kinases play essential roles in cancer progression and specifically, epidermal growth factor receptor (EGFR) is an important target for cancer therapy. In this work, we have developed a method to classify single amino acid polymorphisms (SAPs) in EGFR into disease-causing (driver) and neutral (passenger) mutations using both sequence and structure based features of the mutation site by machine learning approaches. We compiled a set of 222 features and selected a set of 21 properties utilizing feature selection methods, for maximizing the prediction performance. In a set of 540 mutants, we obtained an overall classification accuracy of 67.8% with 10 fold cross validation using support vector machines. Further, the mutations have been grouped into four sets based on secondary structure and accessible surface area, which enhanced the overall classification accuracy to 80.2%, 81.9%, 77.9% and 75.1% for helix, strand, coil-buried and coil-exposed mutants, respectively. The method was tested with a blind dataset of 60 mutations, which showed an average accuracy of 85.4%. These accuracy levels are superior to other methods available in the literature for EGFR mutants, with an increase of more than 30%. Moreover, we have screened all possible single amino acid polymorphisms (SAPs) in EGFR and suggested the probable driver and passenger mutations, which would help in the development of mutation specific drugs for cancer treatment.
Affiliation(s)
- P Anoosha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- Liang-Tsung Huang
  - Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan
- R Sakthivel
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- D Karunagaran
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
11
Motion GB, Howden AJM, Huitema E, Jones S. DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool. Nucleic Acids Res 2015; 43:e158. [PMID: 26304539 PMCID: PMC4678848 DOI: 10.1093/nar/gkv805] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/07/2015] [Accepted: 07/28/2015] [Indexed: 11/26/2022] Open
Abstract
There are currently 151 plants with draft genomes available but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.
Affiliation(s)
- Graham B Motion
  - Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Andrew J M Howden
  - Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Edgar Huitema
  - Division of Plant Sciences, University of Dundee at the James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Susan Jones
  - Information and Computational Sciences, James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK