1
|
Nawn D, Hassan SS, Sil M, Ghosh A, Goswami A, Basu P, Dayhoff GW, Lundstrom K, Uversky VN. The distal-proximal relationships among the human moonlighting proteins: Evolutionary hotspots and Darwinian checkpoints. Int J Biol Macromol 2024; 259:128998. [PMID: 38176503 DOI: 10.1016/j.ijbiomac.2023.128998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 12/19/2023] [Accepted: 12/21/2023] [Indexed: 01/06/2024]
Abstract
Moonlighting proteins, known for their ability to perform multiple, often unrelated functions within a single polypeptide chain, challenge the traditional "one gene, one protein, one function" paradigm. As organisms evolved, their genomes remained relatively stable in size, but the introduction of post-translational modifications and sub-strategies like protein promiscuity and intrinsic disorder enabled multifunctionality. Enzymes, in particular, exemplify this phenomenon, engaging in unrelated processes alongside their primary catalytic roles. This study employs a systematic, quantitative informatics approach to shed light on human moonlighting protein sequences. Phylogenetic analyses of human moonlighting proteins are presented, elucidating the distal-proximal relationships among these proteins based on sequence-derived quantitative features. The findings unveil the captivating world of human moonlighting proteins, urging further investigations in the emerging field of moonlighting proteomics, with the potential for significant contributions to our understanding of multifunctional proteins and their roles in diverse cellular processes and diseases.
Affiliation(s)
- Debaleena Nawn
  - Biological Science Division, Indian Statistical Institute, 203 B.T Road, Kolkata, 700108, West Bengal, India; Indian Research Institute for Integrated Medicine (IRIIM), Unsani, Howrah, 711302, West Bengal, India.
- Sk Sarif Hassan
  - Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur, West Bengal, India.
- Moumita Sil
  - Biological Science Division, Indian Statistical Institute, 203 B.T Road, Kolkata, 700108, West Bengal, India.
- Ankita Ghosh
  - Biological Science Division, Indian Statistical Institute, 203 B.T Road, Kolkata, 700108, West Bengal, India.
- Arunava Goswami
  - Biological Science Division, Indian Statistical Institute, 203 B.T Road, Kolkata, 700108, West Bengal, India.
- Pallab Basu
  - School of Physics, University of the Witwatersrand, Johannesburg, Braamfontein 2000, South Africa; Woxsen School of Sciences, Woxsen University, Hyderabad 500 033, Telangana, India.
- Guy W Dayhoff
  - Department of Chemistry, University of South Florida, Tampa, FL 33612, USA.
- Vladimir N Uversky
  - Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA.
|
2
|
Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023]
Abstract
As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. The improvement of coverage is because the number of structures predicted by AlphaFold is greatly increased, and on the other hand, PredGO can extensively use non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) entries in UniProt for human are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.
Affiliation(s)
- Rongtao Zheng
  - School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Zhijian Huang
  - School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 410000 Changsha, China
|
3
|
Khan K, Alhar MSO, Abbas MN, Abbas SQ, Kazi M, Khan SA, Sadiq A, Hassan SSU, Bungau S, Jalal K. Integrated Bioinformatics-Based Subtractive Genomics Approach to Decipher the Therapeutic Drug Target and Its Possible Intervention against Brucellosis. Bioengineering (Basel) 2022; 9:633. [PMID: 36354544 PMCID: PMC9687753 DOI: 10.3390/bioengineering9110633] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/16/2022] [Revised: 10/28/2022] [Accepted: 10/29/2022] [Indexed: 11/16/2023]
Abstract
Brucella suis, one of the causative agents of brucellosis, is a Gram-negative intracellular bacterium that may be found all over the globe, and it is a significant facultative zoonotic pathogen found in livestock. It may adapt to a phagocytic environment, reproduce, and develop resistance to harmful environments inside host cells, which is a crucial part of the Brucella life cycle, making it a worldwide menace. The molecular underpinnings of Brucella pathogenicity have been substantially elucidated due to comprehensive methods such as proteomics. Therefore, we aim to explore the complete Brucella suis proteome to prioritize the novel proteins as drug targets via subtractive proteo-genomics analysis, an effort to conjecture the existence of distinct pathways in the development of brucellosis. Consequently, 38 unique metabolic pathways having 503 proteins were observed while among these 503 proteins, the non-homologs (n = 421), essential (n = 350), drug-like (n = 114), virulence (n = 45), resistance (n = 42), and unique to pathogen proteins were retrieved from Brucella suis. The applied subsequent hierarchical shortlisting resulted in a protein, i.e., isocitrate lyase, that may act as a potential drug target, which was finalized after the extensive literature survey. The interacting partners for these shortlisted drug targets were identified through the STRING database. Moreover, structure-based studies were also performed on isocitrate lyase to further analyze its function. For that purpose, ~18,000 ZINC compounds were screened to identify new potent drug candidates against isocitrate lyase for brucellosis. It resulted in the shortlisting of six compounds, i.e., ZINC95543764, ZINC02688148, ZINC20115475, ZINC04232055, ZINC04231816, and ZINC04259566 that potentially inhibit isocitrate lyase. However, the ADMET profiling showed that all compounds fulfill ADMET properties except for ZINC20115475, which showed positive Ames activity, whereas ZINC02688148, ZINC04259566, ZINC04232055, and ZINC04231816 showed hepatotoxicity; all compounds were observed to have no skin sensitization. In light of these parameters, we recommend the ZINC95543764 compound for further experimental studies. According to the present research, which uses subtractive genomics, proteins that might serve as therapeutic targets and potential lead options for eradicating brucellosis have been narrowed down.
Affiliation(s)
- Kanwal Khan
  - Dr. Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi City 75270, Pakistan
- Muhammad Naseer Abbas
  - Department of Pharmacy, Kohat University of Science and Technology, Kohat 26000, Pakistan
- Syed Qamar Abbas
  - Department of Pharmacy, Sarhad University of Science and Technology, Peshawar 25000, Pakistan
- Mohsin Kazi
  - Department of Pharmaceutics, College of Pharmacy, P.O. Box-2457, King Saud University, Riyadh 11451, Saudi Arabia
- Saeed Ahmad Khan
  - Department of Pharmacy, Kohat University of Science and Technology, Kohat 26000, Pakistan
  - Division of Molecular Pharmaceutics and Drug Delivery, The University of Texas at Austin, 2409 University Ave., Austin, TX 78712, USA
- Abdul Sadiq
  - Department of Pharmacy, Faculty of Biological Sciences, University of Malakand, Chakdara 18000, Pakistan
- Syed Shams ul Hassan
  - Shanghai Key Laboratory for Molecular Engineering of Chiral Drugs, School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
  - Department of Natural Product Chemistry, School of Pharmacy, Shanghai Jiao Tong University, Shanghai 200240, China
- Simona Bungau
  - Department of Pharmacy, Faculty of Medicine and Pharmacy, University of Oradea, 410028 Oradea, Romania
- Khurshid Jalal
  - HEJ Research Institute of Chemistry, International Center for Chemical and Biological Sciences, University of Karachi, Karachi City 75270, Pakistan
|
4
|
Riziotis IG, Thornton JM. Capturing the geometry, function, and evolution of enzymes with 3D templates. Protein Sci 2022; 31:e4363. [PMID: 35762726 PMCID: PMC9207746 DOI: 10.1002/pro.4363] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/04/2022] [Revised: 05/06/2022] [Accepted: 05/14/2022] [Indexed: 11/05/2022]
Abstract
Structural templates are 3D signatures representing protein functional sites, such as ligand binding cavities, metal coordination motifs, or catalytic sites. Here we explore methods to generate template libraries and algorithms to query structures for conserved 3D motifs. Applications of templates are discussed, as well as some exemplar cases for examining evolutionary links in enzymes. We also introduce the concept of using more than one template per structure to represent flexible sites, as an approach to better understand catalysis through snapshots captured in enzyme structures. Functional annotation from structure is an important topic that has recently resurfaced due to the new more accurate methods of protein structure prediction. Therefore, we anticipate that template-based functional site detection will be a powerful tool in the task of characterizing a vast number of new protein models.
|
5
|
Babi J, Zhu L, Lin A, Uva A, El‐Haddad H, Peloewetse A, Tran H. Self‐assembled free‐floating nanomaterials from sequence‐defined polymers. Journal of Polymer Science 2021. [DOI: 10.1002/pol.20210366] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022]
Affiliation(s)
- Jon Babi
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Linglan Zhu
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Angela Lin
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Azalea Uva
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Hana El‐Haddad
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Atang Peloewetse
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
- Helen Tran
  - Department of Chemistry, University of Toronto, Toronto, Ontario, Canada
  - Department of Chemical Engineering, University of Toronto, Toronto, Ontario, Canada
|
6
|
Trisolini L, Gambacorta N, Gorgoglione R, Montaruli M, Laera L, Colella F, Volpicella M, De Grassi A, Pierri CL. FAD/NADH Dependent Oxidoreductases: From Different Amino Acid Sequences to Similar Protein Shapes for Playing an Ancient Function. J Clin Med 2019; 8:jcm8122117. [PMID: 31810296 PMCID: PMC6947548 DOI: 10.3390/jcm8122117] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/11/2019] [Revised: 11/11/2019] [Accepted: 11/18/2019] [Indexed: 12/29/2022]
Abstract
Flavoprotein oxidoreductases are members of a large protein family of specialized dehydrogenases, which include type II NADH dehydrogenase, pyridine nucleotide-disulphide oxidoreductases, ferredoxin-NAD+ reductases, NADH oxidases, and NADH peroxidases, playing a crucial role in the metabolism of several prokaryotes and eukaryotes. Although several studies have been performed on single members or protein subgroups of flavoprotein oxidoreductases, a comprehensive analysis on structure-function relationships among the different members and subgroups of this great dehydrogenase family is still missing. Here, we present a structural comparative analysis showing that the investigated flavoprotein oxidoreductases have a highly similar overall structure, although the investigated dehydrogenases are quite different in functional annotations and global amino acid composition. The different functional annotation is ascribed to their participation in species-specific metabolic pathways based on the same biochemical reaction, i.e., the oxidation of specific cofactors, like NADH and FADH2. Notably, the performed comparative analysis sheds light on conserved sequence features that reflect very similar oxidation mechanisms, conserved among flavoprotein oxidoreductases belonging to phylogenetically distant species, as the bacterial type II NADH dehydrogenases and the mammalian apoptosis-inducing factor protein, until now retained as unique protein entities in Bacteria/Fungi or Animals, respectively. Furthermore, the presented computational analyses will allow consideration of FAD/NADH oxidoreductases as a possible target of new small molecules to be used as modulators of mitochondrial respiration for patients affected by rare diseases or cancer showing mitochondrial dysfunction, or antibiotics for treating bacterial/fungal/protista infections.
Affiliation(s)
- Anna De Grassi
  - Correspondence: (A.D.G.); or (C.L.P.); Tel.: +39-080-544-3614 (A.D.G. & C.L.P.); Fax: +39-080-544-2770 (A.D.G. & C.L.P.)
- Ciro Leonardo Pierri
  - Correspondence: (A.D.G.); or (C.L.P.); Tel.: +39-080-544-3614 (A.D.G. & C.L.P.); Fax: +39-080-544-2770 (A.D.G. & C.L.P.)
|
7
|
Abundant Perithecial Protein (APP) from Neurospora is a primitive functional analog of ocular crystallins. Biochem Biophys Res Commun 2019; 516:796-800. [PMID: 31255285 DOI: 10.1016/j.bbrc.2019.06.102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/07/2019] [Accepted: 06/18/2019] [Indexed: 11/21/2022]
Abstract
The eye arose during the Cambrian explosion from pre-existing proteins that would have been recruited for the formation of the specialized components of this organ, such as the transparent lens. Proteins suitable for the role of lens crystallins would need to possess unusual physical properties, and the study of such earliest analogs of ocular crystallins would add to our understanding of the nature of recruitment of proteins as lens/corneal crystallins. We show that the Abundant Perithecial Protein (APP) of the fungi Neurospora and Sordaria fulfils the criteria for an early crystallin analog. The perithecia in these fungal species are phototropic, and APP accumulates at a high concentration in the neck of the pitcher-shaped perithecium. Spores are formed at the base of the perithecium, and light contributes to their maturation. The hydrodynamic properties of APP appear to exclude dimer formation or aggregation at high protein concentrations. APP is also deficient in Ca2+ binding, a property seen in its close homolog, the calcium-binding cell adhesion molecule (DdCAD-1) from Dictyostelium discoideum. Comparable to crystallins, APP occurs in high concentrations and seems to have dispensed with Ca2+ binding in exchange for greater stability. These crystallin-like attributes of APP lead us to demonstrate that it is a primitive form of ocular crystallins.
|
8
|
Evolutionary convergence in the biosyntheses of the imidazole moieties of histidine and purines. PLoS One 2018; 13:e0196349. [PMID: 29698445 PMCID: PMC5919458 DOI: 10.1371/journal.pone.0196349] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 02/08/2018] [Accepted: 04/11/2018] [Indexed: 12/14/2022]
Abstract
Background: The imidazole group is a ubiquitous chemical motif present in several key types of biomolecules. It is a structural moiety of purines, and plays a central role in biological catalysis as part of the side-chain of histidine, the amino acid most frequently found in the catalytic site of enzymes. Histidine biosynthesis starts with both ATP and the pentose phosphoribosyl pyrophosphate (PRPP), which is also the precursor for the de novo synthesis of purines. These two anabolic pathways are also connected by the imidazole intermediate 5-aminoimidazole-4-carboxamide ribotide (AICAR), which is synthesized in both routes but used only in purine biosynthesis. Rather surprisingly, the imidazole moieties of histidine and purines are synthesized by different, non-homologous enzymes. As discussed here, this phenomenon can be understood as a case of functional molecular convergence.
Results: In this work, we analyze these polyphyletic processes and argue that the independent origin of the corresponding enzymes is best explained by the differences in the function of each of the molecules to which the imidazole moiety is attached. Since the imidazole present in histidine is a catalytic moiety, its chemical arrangement allows it to act as an acid or a base. On the contrary, the de novo biosynthesis of purines starts with an activated ribose and all the successive intermediates are ribotides, with the key β-glycosidic bond joining the ribose and the imidazole moiety. This prevents purine ribonucleotides from exhibiting any imidazole-dependent catalytic activity, and may have been the critical trait for the evolution of two separate imidazole-synthesizing enzymes. We also suggest that, in evolutionary terms, the biosynthesis of purines predated that of histidine.
Conclusions: As reviewed here, other biosynthetic routes for imidazole molecules are also found in extant metabolism, including the autocatalytic cyclization that occurs during the formation of creatinine from creatine phosphate, as well as the internal cyclization of the Ala-Ser-Gly motif of some members of the ammonia-lyase and aminomutase families, which leads to the MIO cofactor. The diversity of imidazole-synthesizing pathways highlights the biological significance of this key chemical group, whose biosyntheses evolved independently several times.
|
9
|
Akand EH, Downard KM. Mutational analysis employing a phylogenetic mass tree approach in a study of the evolution of the influenza virus. Mol Phylogenet Evol 2017; 112:209-217. [DOI: 10.1016/j.ympev.2017.04.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/05/2017] [Revised: 03/29/2017] [Accepted: 04/05/2017] [Indexed: 11/28/2022]
|
10
|
Kitagawa H, Takeda K, Tsuboi R, Hayashi M, Sasaki JI, Imazato S. Influence of polymerization properties of 4-META/MMA-based resin on the activity of fibroblast growth factor-2. Dent Mater J 2017. [PMID: 28626207 DOI: 10.4012/dmj.2016-372] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
Dental adhesive resins based on 4-methacryloxyethyl trimellitate anhydride (4-META)/methyl methacrylate (MMA) have been utilized for root-end filling and the bonding of fractured roots. To increase the success rate of these treatments, it would be beneficial to promote the healing of surrounding tissue by applying growth factors. In this study, the influences of the polymerization properties of 4-META/MMA-based resins on the activity of fibroblast growth factor-2 (FGF-2) were evaluated in vitro. The temperature increase caused by the heat generation during polymerization of the 4-META/MMA-based resin was insufficient to change the structure and function of FGF-2. Unpolymerized monomers released from the cured 4-META/MMA-based resin had no negative influences on the ability of FGF-2 to promote the proliferation of osteoblast-like cells. These findings suggest that it is possible to use FGF-2 in combination with 4-META/MMA-based resins.
Affiliation(s)
- Haruaki Kitagawa
  - Department of Biomaterials Science, Osaka University Graduate School of Dentistry
- Kahoru Takeda
  - Department of Restorative Dentistry and Endodontology, Osaka University Graduate School of Dentistry
- Ririko Tsuboi
  - Department of Biomaterials Science, Osaka University Graduate School of Dentistry
  - Division for Interdisciplinary Dentistry, Osaka University Dental Hospital
- Mikako Hayashi
  - Department of Restorative Dentistry and Endodontology, Osaka University Graduate School of Dentistry
- Jun-Ichi Sasaki
  - Department of Biomaterials Science, Osaka University Graduate School of Dentistry
- Satoshi Imazato
  - Department of Biomaterials Science, Osaka University Graduate School of Dentistry
|
11
|
Abstract
The ProFunc web server is a tool for helping identify the function of a given protein whose 3D coordinates have been experimentally determined or homology modeled. It uses a cocktail of both sequence- and structure-based methods to identify matches to other proteins that may, in turn, suggest the query protein's most likely function. The server was originally developed to aid the worldwide structural genomics effort at the start of the millennium. It accepts a file containing the protein's 3D coordinates in PDB format, and, when processing is complete, sends an email containing a link to the password-protected result pages. The results include an at-a-glance summary, as well as separate pages containing more detailed analyses. The server can be found at: http://www.ebi.ac.uk/thornton-srv/databases/profunc.
Affiliation(s)
- Roman A Laskowski
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
12
|
Jez JM. Revisiting protein structure, function, and evolution in the genomic era. J Invertebr Pathol 2017; 142:11-15. [DOI: 10.1016/j.jip.2016.07.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/13/2016] [Revised: 06/04/2016] [Accepted: 07/28/2016] [Indexed: 02/05/2023]
|
13
|
Manrique-Carpintero NC, Tokuhisa JG, Ginzberg I, Veilleux RE. Allelic variation in genes contributing to glycoalkaloid biosynthesis in a diploid interspecific population of potato. Theor Appl Genet 2014; 127:391-405. [PMID: 24190104 DOI: 10.1007/s00122-013-2226-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 01/10/2013] [Accepted: 10/22/2013] [Indexed: 06/02/2023]
Abstract
Variation for allelic state within genes of both primary and secondary metabolism influences the quantity and quality of steroidal glycoalkaloids produced in potato leaves. Genetic factors associated with the biosynthesis and accumulation of steroidal glycoalkaloids (SGAs) in potato were addressed by a candidate gene approach and whole genome single nucleotide polymorphism (SNP) genotyping. Allelic sequences spanning coding regions of four candidate genes [3-hydroxy-3-methylglutaryl coenzyme A reductase 2 (HMG2); 2,3-squalene epoxidase; solanidine galactosyltransferase; and solanidine glucosyltransferase (SGT2)] were obtained from two potato species differing in SGA composition: Solanum chacoense (chc 80-1) and Solanum tuberosum group Phureja (phu DH). An F2 population was genotyped and foliar SGAs quantified. The concentrations of α-solanine, α-chaconine, leptine I, leptine II and total SGAs varied broadly among F2 individuals. F2 plants with chc 80-1 alleles for HMG2 or SGT2 accumulated significantly greater leptines and total SGAs compared to plants with phu DH alleles. Plants with chc 80-1 alleles at both loci expressed the greatest levels of total SGAs, α-solanine and α-chaconine. A significant positive correlation was found between α-solanine and α-chaconine accumulation as well as between leptine I and leptine II. A whole genome SNP genotyping analysis of an F2 subsample verified the importance of chc 80-1 alleles at HMG2 and SGT2 for SGA synthesis and accumulation and suggested additional candidate genes including some previously associated with SGA production. Loci on five and seven potato pseudochromosomes were associated with synthesis and accumulation of SGAs, respectively. Two loci, on pseudochromosomes 1 and 6, explained phenotypic segregation of α-solanine and α-chaconine synthesis. Knowledge of the genetic factors influencing SGA production in potato may assist breeding for pest resistance.
|
14
|
Abstract
Although more than 10^9 years have passed since the existence of the last universal common ancestor, proteins have yet to reach the limits of divergence. As a result, metabolic complexity is ever expanding. Identifying and understanding the mechanisms that drive and limit the divergence of protein sequence space impact not only evolutionary biologists investigating molecular evolution but also synthetic biologists seeking to design useful catalysts and engineer novel metabolic pathways. Investigations over the past 50 years indicate that the recruitment of enzymes for new functions is a key event in the acquisition of new metabolic capacity. In this review, we outline the genetic mechanisms that enable recruitment and summarize the present state of knowledge regarding the functional characteristics of extant catalysts that facilitate recruitment. We also highlight recent examples of enzyme recruitment, both from the historical record provided by phylogenetics and from enzyme evolution experiments. We conclude with a look to the future, which promises fruitful consequences from the convergence of molecular evolutionary theory, laboratory-directed evolution, and synthetic biology.
Affiliation(s)
- Cindy Schulenburg
- Laboratory of Organic Chemistry, ETH-Zürich, Zürich CH-8093, Switzerland
|
15
|
Xing S, Li M, Liu P. Evolution of S-domain receptor-like kinases in land plants and origination of S-locus receptor kinases in Brassicaceae. BMC Evol Biol 2013; 13:69. [PMID: 23510165 PMCID: PMC3616866 DOI: 10.1186/1471-2148-13-69] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 12/08/2011] [Accepted: 03/12/2013] [Indexed: 01/31/2023]
Abstract
Background: The S-domain serine/threonine receptor-like kinases (SRLKs) comprise one of the largest and most rapidly expanding subfamilies in the plant receptor-like/Pelle kinase (RLKs) family. The founding member of this subfamily, the S-locus receptor kinase (SRK), functions as the female determinant of specificity in the self-incompatibility (SI) responses of crucifers. Two classes of proteins resembling the extracellular S domain (designated S-domain receptor-like proteins, SRLPs) or the intracellular kinase domain (designated S-domain receptor-like cytoplasmic kinases, SRLCKs) of SRK are also ubiquitous in land plants, indicating that the SRLKs are composite molecules that originated by domain fusion of the two component proteins. Here, we explored the origin and diversification of SRLKs by phylogenomic methods.
Results: Based on the distribution patterns of SRLKs and SRLCKs in a reconciled species-domain tree, a maximum parsimony model was then established for simultaneously inferring and dating gene duplication/loss and fusion/fission events in SRLK evolution. Various SRK alleles from crucifer species were then included in our phylogenetic analyses to infer the origination of SRKs by identifying the proper outgroups.
Conclusions: Two gene fusion events were inferred, and the major gene fusion event, which occurred in the common ancestor of land plants, generated almost all extant SRLKs. The functional diversification of duplicated SRLKs was illustrated by molecular evolution analyses of SRKs. Our findings support that SRKs originated as two ancient haplotypes derived from a pair of tandem duplicate genes through random regulatory neo-/sub-functionalization in the common ancestor of the Brassicaceae.
Affiliation(s)
- Shilai Xing
- Department of Ecology, College of Resources and Environmental Sciences, China Agricultural University, Beijing 100193, People's Republic of China
|
16
|
Furnham N, Laskowski RA, Thornton JM. Abstracting knowledge from the protein data bank. Biopolymers 2012; 99:183-8. [DOI: 10.1002/bip.22107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/30/2012] [Accepted: 05/25/2012] [Indexed: 12/27/2022]
|
17
|
Evran S, Telefoncu A, Sterner R. Directed evolution of (βα)8-barrel enzymes: establishing phosphoribosylanthranilate isomerisation activity on the scaffold of the tryptophan synthase α-subunit. Protein Eng Des Sel 2012; 25:285-93. [DOI: 10.1093/protein/gzs015] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
|
18
|
Gao L, Gao F, Wang L, Geng C, Chi L, Zhao J, Qu Y. N-glycoform diversity of cellobiohydrolase I from Penicillium decumbens and synergism of nonhydrolytic glycoform in cellulose degradation. J Biol Chem 2012; 287:15906-15. [PMID: 22427663 DOI: 10.1074/jbc.m111.332890] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Indexed: 11/06/2022]
Abstract
Four cellobiohydrolase I (CBHI) glycoforms, namely, CBHI-A, CBHI-B, CBHI-C, and CBHI-D, were purified from the cultured broth of Penicillium decumbens JU-A10. All glycoforms had the same amino acid sequence but displayed different characteristics and biological functions. The effects of the N-glycans of the glycoforms on CBH activity were analyzed using mass spectrum data. Longer N-glycan chains at the Asn-137 of CBHI increased CBH activity. After the N-glycans were removed using site-directed mutagenesis and homologous expression in P. decumbens, the specific CBH activity of the recombinant CBHI without N-glycosylation increased by 65% compared with the wild-type CBHI with the highest specific activity. However, the activity was not stable. Only the N-glycosylation at Asn-137 can improve CBH activity by 40%. rCBHI with N-glycosylation only at Asn-470 exhibited no enzymatic activity. CBH activity was affected whether or not the protein was glycosylated, together with the N-glycosylation site and N-glycan structure. N-Glycosylation not only affects CBH activity but may also bring a new feature to a nonhydrolytic CBHI glycoform (CBHI-A). By supplementing CBHI-A to different commercial cellulase preparations, the glucose yield of lignocellulose hydrolysis increased by >20%. After treatment with a low dose (5 mg/g substrate) of CBHI-A at 50 °C for 7 days, the hydrogen-bond intensity and crystalline degree of cotton fibers decreased by 17 and 34%, respectively. These results may provide new guidelines for cellulase engineering.
Affiliation(s)
- Le Gao
- State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, China
19
Ghosh AK, Anderson DD, Weber IT, Mitsuya H. Enhancing protein backbone binding - a fruitful concept for combating drug-resistant HIV. Angew Chem Int Ed Engl 2012; 51:1778-802. [PMID: 22290878 PMCID: PMC7159617 DOI: 10.1002/anie.201102762] [Citation(s) in RCA: 117] [Impact Index Per Article: 9.0] [Received: 04/20/2011] [Indexed: 12/02/2022]
Abstract
The evolution of drug resistance is one of the most fundamental problems in medicine. In HIV/AIDS, the rapid emergence of drug-resistant HIV-1 variants is a major obstacle to current treatments. HIV-1 protease inhibitors are essential components of present antiretroviral therapies. However, with these protease inhibitors, resistance occurs through viral mutations that alter inhibitor binding, resulting in a loss of efficacy. This loss of potency has raised serious questions with regard to effective long-term antiretroviral therapy for HIV/AIDS. In this context, our research has focused on designing inhibitors that form extensive hydrogen-bonding interactions with the enzyme's backbone in the active site. In doing so, we limit the protease's ability to acquire drug resistance as the geometry of the catalytic site must be conserved to maintain functionality. In this Review, we examine the underlying principles of enzyme structure that support our backbone-binding concept as an effective means to combat drug resistance and highlight their application in our recent work on antiviral HIV-1 protease inhibitors.
Affiliation(s)
- Arun K Ghosh
- Department of Chemistry, Purdue University, West Lafayette, IN 47907, USA.
20
Sunagar K, Johnson WE, O'Brien SJ, Vasconcelos V, Antunes A. Evolution of CRISPs associated with toxicoferan-reptilian venom and mammalian reproduction. Mol Biol Evol 2012; 29:1807-22. [PMID: 22319140 DOI: 10.1093/molbev/mss058] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Indexed: 11/14/2022]
Abstract
Cysteine-rich secretory proteins (CRISPs) are glycoproteins found exclusively in vertebrates and have broad diversified functions. They are hypothesized to play important roles in mammalian reproduction and in reptilian venom, where they disrupt homeostasis of the prey through several mechanisms, including among others, blockage of cyclic nucleotide-gated and voltage-gated ion channels and inhibition of smooth muscle contraction. We evaluated the molecular evolution of CRISPs in toxicoferan reptiles at both nucleotide and protein levels relative to their nonvenomous mammalian homologs. We show that the evolution of CRISP gene in these reptiles is significantly influenced by positive selection and in snakes (ω = 3.84) more than in lizards (ω = 2.33), whereas mammalian CRISPs were under strong negative selection (CRISP1 = 0.55, CRISP2 = 0.40, and CRISP3 = 0.68). The use of ancestral sequence reconstruction, mapping of mutations on the three-dimensional structure, and detailed evaluation of selection pressures suggests that the toxicoferan CRISPs underwent accelerated evolution aided by strong positive selection and directional mutagenesis, whereas their mammalian homologs are constrained by negative selection. Gene and protein-level selection analyses identified 41 positively selected sites in snakes and 14 sites in lizards. Most of these sites are located on the molecular surface (nearly 76% in snakes and 79% in lizards), whereas the backbone of the protein retains a highly conserved structural scaffold. Nearly 46% of the positively selected sites occur in the cysteine-rich domain of the protein. This directional mutagenesis, where the hotspots of mutations are found on the molecular surface and functional domains of the protein, acts as a diversifying mechanism for the exquisite biological targeting of CRISPs in toxicoferan reptiles. 
Finally, our analyses suggest that the evolution of toxicoferan-CRISP venoms might have been influenced by the specific predatory mechanism employed by the organism. CRISPs in Elapidae, which mostly employ neurotoxins, have experienced less positive selection pressure (ω = 2.86) compared with the "nonvenomous" colubrids (ω = 4.10) that rely on grip and constriction to capture the prey, and the Viperidae, a lineage that mostly employs haemotoxins (ω = 4.19). Relatively lower omega estimates in Anguimorph lizards (ω = 2.33) than in snakes (ω = 3.84) suggest that lizards probably depend more on pace and powerful jaws for predation than on venom.
Affiliation(s)
- Kartik Sunagar
- CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
21
Ghosh AK, Anderson DD, Weber IT, Mitsuya H. Verstärkung der Bindung an das Proteinrückgrat - ein fruchtbares Konzept gegen die Arzneimittelresistenz von HIV [Enhancing binding to the protein backbone: a fruitful concept against HIV drug resistance]. Angew Chem Int Ed Engl 2012. [DOI: 10.1002/ange.201102762] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/16/2023]
22
Abstract
Phenotypes that vary in response to DNA mutations are essential for evolutionary adaptation and innovation. Therefore, it seems that robustness, a lack of phenotypic variability, must hinder adaptation. The main purpose of this review is to show why this is not necessarily correct. There are two reasons. The first is that robustness causes the existence of genotype networks: large connected sets of genotypes with the same phenotype. I discuss why genotype networks facilitate phenotypic variability. The second reason emerges from the evolutionary dynamics of evolving populations on genotype networks. I discuss how these dynamics can render highly robust phenotypes more variable, using examples from protein and RNA macromolecules. In addition, robustness can help avoid an important evolutionary conflict between the interests of individuals and populations, a conflict that can impede evolutionary adaptation.
Affiliation(s)
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Studies, University of Zurich, Y27-J-54 Winterthurerstrasse 190, 8057 Zurich, Switzerland.
23
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Rahman SA, Laskowski RA, Orengo CA, Thornton JM. FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies. Nucleic Acids Res 2012; 40:D776-82. [PMID: 22006843 PMCID: PMC3245072 DOI: 10.1093/nar/gkr852] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 08/10/2011] [Accepted: 09/24/2011] [Indexed: 11/12/2022]
Abstract
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree.
Affiliation(s)
- Nicholas Furnham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
24
The molecular origins of evolutionary innovations. Trends Genet 2011; 27:397-410. [PMID: 21872964 DOI: 10.1016/j.tig.2011.06.002] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Received: 04/25/2011] [Revised: 06/10/2011] [Accepted: 06/13/2011] [Indexed: 11/22/2022]
Abstract
The history of life is a history of evolutionary innovations, qualitatively new phenotypic traits that endow their bearers with new, often game-changing abilities. We know many individual examples of innovations and their natural history, but we know little about the fundamental principles of phenotypic variability that permit new phenotypes to arise. Most phenotypic innovations result from changes in three classes of systems: metabolic networks, regulatory circuits, and macromolecules. I here highlight two important features that these classes of systems share. The first is the ubiquity of vast genotype networks - connected sets of genotypes with the same phenotype. The second is the great phenotypic diversity of small neighborhoods around different genotypes in genotype space. I here explain that both features are essential for the phenotypic variability that can bring forth qualitatively new phenotypes. Both features emerge from a common cause, the robustness of phenotypes to perturbations, whose origins are linked to life in changing environments.
25
Wagner A. The low cost of recombination in creating novel phenotypes: Recombination can create new phenotypes while disrupting well-adapted phenotypes much less than mutation. Bioessays 2011; 33:636-46. [PMID: 21633964 DOI: 10.1002/bies.201100027] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Abstract
Recombination is often considered a disruptive force for well-adapted phenotypes, but recent evidence suggests that this cost of recombination can be small. A key benefit of recombination is that it can help create proteins and regulatory circuits with novel and useful phenotypes more efficiently than point mutation. Its effectiveness stems from the large-scale reorganization of genotypes that it causes, which can help explore far-flung regions in genotype space. Recent work on complex phenotypes in model gene regulatory circuits and proteins shows that the disruptive effects of recombination can be very mild compared to the effects of mutation. Recombination thus can have great benefits at a modest cost, but we do not understand the reasons well. A better understanding might shed light on the evolution of recombination and help improve evolutionary strategies in biochemical engineering.
Affiliation(s)
- Andreas Wagner
- Institute of Evolutionary Biology and Environmental Sciences, University of Zurich, Zurich, Switzerland.
26
Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011; 79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]
Abstract
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
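Two of the imbalance remedies named in this abstract, under-sampling the majority class and penalizing minority-class errors more heavily, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 3:1 sampling ratio, and the inverse-frequency weighting formula are assumptions chosen for illustration, and the resulting weights would still need to be passed to whatever SVM implementation is used.

```python
import random
from collections import Counter

def undersample(labels, ratio=3, seed=0):
    """Keep every positive (catalytic, label 1) and at most `ratio`
    randomly chosen negatives (label 0) per positive.
    Returns the sorted indices of the retained samples."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), ratio * len(pos)))
    return sorted(pos + keep_neg)

def class_weights(labels):
    """Inverse-frequency weights (an assumed, commonly used formula):
    n_samples / (n_classes * class_count). Rarer classes get larger
    weights, so their misclassification costs more during training."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

# Hypothetical toy data: 5 catalytic residues among 100.
labels = [1] * 5 + [0] * 95
kept = undersample(labels, ratio=3)
kept_labels = [labels[i] for i in kept]
weights = class_weights(kept_labels)
```

With the toy data, under-sampling keeps all 5 positives and 15 negatives, and the weights then favor the catalytic class by the remaining 3:1 imbalance.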
Affiliation(s)
- Ran Yahalom
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
27
Arenas NE, Salazar LM, Soto CY, Vizcaíno C, Patarroyo ME, Patarroyo MA, Gómez A. Molecular modeling and in silico characterization of Mycobacterium tuberculosis TlyA: possible misannotation of this tubercle bacilli-hemolysin. BMC Struct Biol 2011; 11:16. [PMID: 21443791 PMCID: PMC3072309 DOI: 10.1186/1472-6807-11-16] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 12/30/2010] [Accepted: 03/28/2011] [Indexed: 11/24/2022]
Abstract
Background: The TlyA protein has a controversial function as a virulence factor in Mycobacterium tuberculosis (M. tuberculosis). At present, its dual activity as hemolysin and RNA methyltransferase in M. tuberculosis has been indirectly proposed based on in vitro results. There is no evidence however for TlyA relevance in the survival of tubercle bacilli inside host cells or whether both activities are functionally linked. A thorough analysis of structure prediction for this mycobacterial protein in this study shows the need for reevaluating TlyA's function in virulence.
Results: Bioinformatics analysis of TlyA identified a ribosomal protein binding domain (S4 domain), located between residues 5 and 68 as well as an FtsJ-like methyltransferase domain encompassing residues 62 and 247, all of which have been previously described in translation machinery-associated proteins. Subcellular localization prediction showed that TlyA lacks a signal peptide and its hydrophobicity profile showed no evidence of transmembrane helices. These findings suggested that it may not be attached to the membrane, which is consistent with a cytoplasmic localization. Three-dimensional modeling of TlyA showed a consensus structure, having a common core formed by a six-stranded β-sheet between two α-helix layers, which is consistent with an RNA methyltransferase structure. Phylogenetic analyses showed high conservation of the tlyA gene among Mycobacterium species. Additionally, the nucleotide substitution rates suggested purifying selection during tlyA gene evolution and the absence of a common ancestor between TlyA proteins and bacterial pore-forming proteins.
Conclusion: Altogether, our manual in silico curation suggested that TlyA is involved in ribosomal biogenesis and that there is a functional annotation error regarding this protein family in several microbial and plant genomes, including the M. tuberculosis genome.
Affiliation(s)
- Nelson E Arenas
- Departamento de Química, Facultad de Ciencias, Universidad Nacional de Colombia, Carrera 45 No. 26-85 Bogotá, DC. Colombia
28
Jez JM. Toward protein engineering for phytoremediation: possibilities and challenges. Int J Phytoremediation 2011; 13 Suppl 1:77-89. [PMID: 22046752 DOI: 10.1080/15226514.2011.568537] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
The combination of rational protein engineering and directed evolution techniques allows for the redesign of enzymes with tailored properties for use in environmental remediation. This review summarizes current molecular methods for either altering or improving protein function and highlights examples of how these methods can address bioremediation problems. Although much of the protein engineering applied to environmental clean-up employs microbial systems, there is great potential for and significant challenges to translating these approaches to plant systems for phytoremediation purposes. Protein engineering technologies combined with genomic information and metabolic engineering strategies hold promise for the design of plants and microbes to remediate organic and inorganic pollutants.
Affiliation(s)
- Joseph M Jez
- Department of Biology, Washington University, St. Louis, Missouri 63130, USA.
29
A naturally chimeric type IIA topoisomerase in Aquifex aeolicus highlights an evolutionary path for the emergence of functional paralogs. Proc Natl Acad Sci U S A 2010; 107:22055-9. [PMID: 21076033 DOI: 10.1073/pnas.1012938107] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Bacteria frequently possess two type IIA DNA topoisomerases, gyrase and topo IV, which maintain chromosome topology by variously supercoiling, relaxing, and disentangling DNA. DNA recognition and functional output are thought to be controlled by the C-terminal domain (CTD) of the topoisomerase DNA binding subunit (GyrA/ParC). The deeply rooted organism Aquifex aeolicus encodes one type IIA topoisomerase conflictingly categorized as either DNA gyrase or topo IV. To resolve this enzyme's catalytic properties and heritage, we conducted a series of structural and biochemical studies on the isolated GyrA/ParC CTD and the holoenzyme. Whereas the CTD displays a global structure similar to that seen in bona fide GyrA and ParC paralogs, it lacks a key functional motif (the "GyrA-box") and fails to wrap DNA. Biochemical assays show that the A. aeolicus topoisomerase cannot supercoil DNA, but robustly removes supercoils and decatenates DNA, two hallmark activities of topo IV. Despite these properties, phylogenetic analyses place all functional domains except the CTD squarely within a gyrase lineage, and the A. aeolicus GyrB subunit is capable of supporting supercoiling with Escherichia coli GyrA, but not DNA relaxation with E. coli ParC. Moreover, swapping the A. aeolicus GyrA/ParC CTD with the GyrA CTD from Thermotoga maritima creates an enzyme that negatively supercoils DNA. These findings identify A. aeolicus as the first bacterial species yet found to exist without a functional gyrase, and suggest an evolutionary path for generation of bacterial type IIA paralogs.
30
Konc J, Janezic D. ProBiS: a web server for detection of structurally similar protein binding sites. Nucleic Acids Res 2010; 38:W436-40. [PMID: 20504855 PMCID: PMC2896105 DOI: 10.1093/nar/gkq479] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Indexed: 11/14/2022]
Abstract
A web server, ProBiS, freely available at http://probis.cmm.ki.si, is presented. This provides access to the program ProBiS (Protein Binding Sites), which detects protein binding sites based on local structural alignments. Detailed instructions and user guidelines for use of ProBiS are available at the server under 'HELP' and selected examples are provided under 'EXAMPLES'.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
31
Kim KM, Caetano-Anollés G. Emergence and evolution of modern molecular functions inferred from phylogenomic analysis of ontological data. Mol Biol Evol 2010; 27:1710-33. [PMID: 20418223 DOI: 10.1093/molbev/msq106] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
The biological processes that characterize the phenotypes of a living system are embodied in the function of molecules and hold the key to evolutionary history, delimiting natural selection and change. These processes and functions provide direct insight into the emergence, development, and organization of cellular life. However, detailed molecular functions make up a network-like hierarchy of relationships that tells little of evolutionary links between structure and function in biology. For example, Gene Ontology terms represent widely-used vocabularies of processes and functions with evolutionary relationships that are implicit but not defined. Here, we uncover patterns of global evolutionary history in ontological terms associated with the sequence of 38 genomes. These patterns unfold the metabolic origins of modern molecular functions and major biological transitions in evolution toward complex life. Phylogenies reveal the primordial appearance of hydrolases and transferases, with ATPase, GTPase, and helicase activities being the most ancient. This indicates that ancient catalysts were crucial for binding and transport, the emergence of nucleic acids and protein biopolymers, and the communication of primordial cells with the environment. Finally, the history of biological processes showed that cellular biopolymer metabolic processes preceded biopolymer biosynthesis and essential processes related to macromolecular formation, directly challenging the existence of an RNA world. Phylogenomic systematization of biological function takes the structure and function paradigm to a completely new level of abstraction, demonstrating a "metabolic first" origin of life. The approach uncovers patterns in the morphing of function that are unprecedented and necessary for systematic views in biology.
Affiliation(s)
- Kyung Mo Kim
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, IL, USA
32
Li H, Greene LH. Sequence and structural analysis of the chitinase insertion domain reveals two conserved motifs involved in chitin-binding. PLoS One 2010; 5:e8654. [PMID: 20084296 PMCID: PMC2805709 DOI: 10.1371/journal.pone.0008654] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Received: 07/31/2009] [Accepted: 12/05/2009] [Indexed: 01/01/2023]
Abstract
Background: Chitinases are prevalent in life and are found in species including archaea, bacteria, fungi, plants, and animals. They break down chitin, which is the second most abundant carbohydrate in nature after cellulose. Hence, they are important for maintaining a balance between carbon and nitrogen trapped as insoluble chitin in biomass. Chitinases are classified into two families, 18 and 19 glycoside hydrolases. In addition to a catalytic domain, which is a triosephosphate isomerase barrel, many family 18 chitinases contain another module, i.e., chitinase insertion domain. While numerous studies focus on the biological role of the catalytic domain in chitinase activity, the function of the chitinase insertion domain is not completely understood. Bioinformatics offers an important avenue in which to facilitate understanding the role of residues within the chitinase insertion domain in chitinase function.
Results: Twenty-seven chitinase insertion domain sequences, which include four experimentally determined structures and span five kingdoms, were aligned and analyzed using a modified sequence entropy parameter. Thirty-two positions with conserved residues were identified. The role of these conserved residues was explored by conducting a structural analysis of a number of holo-enzymes. Hydrogen bonding and van der Waals calculations revealed a distinct subset of four conserved residues constituting two sequence motifs that interact with oligosaccharides. The other conserved residues may be key to the structure, folding, and stability of this domain.
Conclusions: Sequence and structural studies of the chitinase insertion domains conducted within the framework of evolution identified four conserved residues which clearly interact with the substrates. Furthermore, evolutionary studies propose a link between the appearance of the chitinase insertion domain and the function of family 18 chitinases in the subfamily A.
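The abstract above scores alignment positions with a "modified sequence entropy parameter" whose definition is not given here. As a rough illustration of the general idea only, the sketch below uses plain Shannon entropy per alignment column, which is an assumption and not the authors' modified measure; the toy alignment, the 0.5-bit threshold, and the function names are likewise hypothetical. Gap characters, if present, would be counted as ordinary symbols.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.
    0 means fully conserved; higher values mean more variability."""
    counts = Counter(column)
    n = len(column)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def conserved_positions(alignment, max_entropy=0.5):
    """Return 0-based column indices whose entropy is at most `max_entropy`.
    `alignment` is a list of equal-length aligned sequences."""
    return [i for i, col in enumerate(zip(*alignment))
            if column_entropy(col) <= max_entropy]

# Hypothetical toy alignment of four sequences, four columns each.
alignment = ["GWYD", "GWFD", "GWYE", "GWAN"]
conserved = conserved_positions(alignment)
```

In the toy alignment the first two columns are invariant and are reported as conserved, while the last two columns each contain three different residues and are not.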
Affiliation(s)
- Hai Li
- Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, Virginia, United States of America
- Lesley H. Greene
- Department of Chemistry and Biochemistry, Old Dominion University, Norfolk, Virginia, United States of America
33
Roy A, Taraphder S. Transition path sampling study of the conformational fluctuation of His-64 in human carbonic anhydrase II. J Phys Chem B 2009; 113:12555-64. [PMID: 19685901 DOI: 10.1021/jp9010982] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Abstract
We report here a transition path sampling study of the conformational fluctuation of His-64 that is known to be important in the enzymatic catalysis of human carbonic anhydrase II. The dynamical transition between experimentally detected conformations of His-64 could not be observed using classical molecular dynamics trajectories extended to 3.5 ns, indicating the transition to be rare on the time scale of molecular dynamics. Using the transition path sampling method, an ensemble of transition paths between these two conformers has been generated and analyzed in detail to identify the mechanism of coupling of His-64 to its neighboring residues during the conformational transition. It is found that both Asn-62 and Tyr-7 may contribute toward retaining the His-64 residue in its outward conformation. Trp-5, on the other hand, shows marked motions at the transition state. The number of water molecules inside a part of the active site cavity and the corresponding cavity volume are also found to vary coupled to the His-64 conformational dynamics.
Affiliation(s)
- Arijit Roy
- Department of Chemistry, Indian Institute of Technology, Kharagpur 721302, India
34
Galant A, Arkus KA, Zubieta C, Cahoon RE, Jez JM. Structural basis for evolution of product diversity in soybean glutathione biosynthesis. Plant Cell 2009; 21:3450-8. [PMID: 19948790 PMCID: PMC2798330 DOI: 10.1105/tpc.109.071183] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 09/05/2009] [Revised: 10/09/2009] [Accepted: 11/05/2009] [Indexed: 05/05/2023]
Abstract
The redox active peptide glutathione is ubiquitous in nature, but some plants also synthesize glutathione analogs in response to environmental stresses. To understand the evolution of chemical diversity in the closely related enzymes homoglutathione synthetase (hGS) and glutathione synthetase (GS), we determined the structures of soybean (Glycine max) hGS in three states: apoenzyme, bound to γ-glutamylcysteine (γEC), and with hGSH, ADP, and a sulfate ion bound in the active site. Domain movements and rearrangement of active site loops change the structure from an open active site form (apoenzyme and γEC complex) to a closed active site form (hGSH·ADP·SO₄²⁻ complex). The structure of hGS shows that two amino acid differences in an active site loop provide extra space to accommodate the longer β-Ala moiety of hGSH in comparison to the glycinyl group of glutathione. Mutation of either Leu-487 or Pro-488 to an Ala improves catalytic efficiency using Gly, but a double mutation (L487A/P488A) is required to convert the substrate preference of hGS from β-Ala to Gly. These structures, combined with site-directed mutagenesis, reveal the molecular changes that define the substrate preference of hGS, explain the product diversity within evolutionarily related GS-like enzymes, and reinforce the critical role of active site loops in the adaptation and diversification of enzyme function.
Affiliation(s)
- Ashley Galant
- Department of Biology, Washington University, St. Louis, Missouri 63130
- Kiani A.J. Arkus
- Department of Biology, Washington University, St. Louis, Missouri 63130
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132
- Chloe Zubieta
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132
- Joseph M. Jez
- Department of Biology, Washington University, St. Louis, Missouri 63130
- Donald Danforth Plant Science Center, St. Louis, Missouri 63132
35
Abstract
Detection of structural motifs of residues in protein structures allows identification of structural or functional similarity between proteins. In the field of protein engineering, structural motif identification is essential to select protein scaffolds on which a motif of residues can be transferred to design a new protein with a given function. We describe here the RASMOT-3D PRO webserver (http://biodev.extra.cea.fr/rasmot3d/) that performs a systematic search in 3D structures of proteins for a set of residues exhibiting a particular topology. Comparison is based on Cα and Cβ atoms in two steps: inter-atomic distances and RMSD. RASMOT-3D PRO takes as input a PDB file containing the 3D coordinates of the searched motif and provides an interactive list of identified protein structures exhibiting residues with a topology similar to that of the searched motif. Each solution can be graphically examined on the website. The topological search can be conducted in structures described in PDB files uploaded by the user or in those deposited in the PDB. This characteristic, as well as the possibility to reject scaffolds sterically incompatible with the target, makes RASMOT-3D PRO a unique webtool in the field of protein engineering.
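The two-step comparison mentioned in this abstract (inter-atomic distances, then RMSD) can be sketched in simplified form. The code below is not the RASMOT-3D PRO implementation: it assumes the motif and candidate atoms are already listed in corresponding order, uses a hypothetical 1.0 Å tolerance, and replaces superposition-based RMSD with a distance-matrix RMSD, which needs no structural alignment but is a different quantity.

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """All inter-atomic distances, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def passes_distance_filter(motif, candidate, tol=1.0):
    """Step 1 (assumed form): every inter-atomic distance in the candidate
    must lie within `tol` of the corresponding motif distance."""
    return all(abs(dm - dc) <= tol
               for dm, dc in zip(pairwise_distances(motif),
                                 pairwise_distances(candidate)))

def distance_rmsd(motif, candidate):
    """Step 2 stand-in: RMSD between the two sets of pairwise distances.
    Invariant to rotation and translation of either structure."""
    dm = pairwise_distances(motif)
    dc = pairwise_distances(candidate)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(dm, dc)) / len(dm))

# Hypothetical three-atom motif and two candidates.
motif = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 10.0, y - 2.0, z + 1.0) for x, y, z in motif]   # same shape, translated
stretched = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 3.8, 0.0)]  # one atom moved 2.2 Å

hits = [c for c in (shifted, stretched) if passes_distance_filter(motif, c)]
```

The cheap distance filter discards the stretched candidate before any finer scoring, which is the usual reason for putting a distance check ahead of an RMSD calculation.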
Affiliation(s)
- Gaëlle Debret
- Service d'Ingénierie Moléculaire des Protéines (SIMOPRO), iBiTec-S, DSV, CEA, CE-Saclay, 91191 Gif Sur Yvette Cedex, France
36
Chiang RA, Sali A, Babbitt PC. Evolutionarily conserved substrate substructures for automated annotation of enzyme superfamilies. PLoS Comput Biol 2008; 4:e1000142. [PMID: 18670595 PMCID: PMC2453236 DOI: 10.1371/journal.pcbi.1000142] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 12/18/2007] [Accepted: 06/24/2008] [Indexed: 11/19/2022]
Abstract
The evolution of enzymes affects how well a species can adapt to new environmental conditions. During enzyme evolution, certain aspects of molecular function are conserved while other aspects can vary. Aspects of function that are more difficult to change or that need to be reused in multiple contexts are often conserved, while those that vary may indicate functions that are more easily changed or that are no longer required. In analogy to the study of conservation patterns in enzyme sequences and structures, we have examined the patterns of conservation and variation in enzyme function by analyzing graph isomorphisms among enzyme substrates of a large number of enzyme superfamilies. This systematic analysis of substrate substructures establishes the conservation patterns that typify individual superfamilies. Specifically, we determined the chemical substructures that are conserved among all known substrates of a superfamily and the substructures that are reacting in these substrates and then examined the relationship between the two. Across the 42 superfamilies that were analyzed, substantial variation was found in how much of the conserved substructure is reacting, suggesting that superfamilies may not be easily grouped into discrete and separable categories. Instead, our results suggest that many superfamilies may need to be treated individually for analyses of evolution, function prediction, and guiding enzyme engineering strategies. Annotating superfamilies with these conserved and reacting substructure patterns provides information that is orthogonal to information provided by studies of conservation in superfamily sequences and structures, thereby improving the precision with which we can predict the functions of enzymes of unknown function and direct studies in enzyme engineering. 
Because the method is automated, it is suitable for large-scale characterization and comparison of fundamental functional capabilities of both characterized and uncharacterized enzyme superfamilies.
Affiliation(s)
- Ranyee A. Chiang
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biosciences, University of California at San Francisco, San Francisco, California, United States of America
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biosciences, University of California at San Francisco, San Francisco, California, United States of America
- Patricia C. Babbitt
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and California Institute for Quantitative Biosciences, University of California at San Francisco, San Francisco, California, United States of America
|
37
|
Martin OC, Wagner A. New structural variation in evolutionary searches of RNA neutral networks. Biosystems 2007; 90:475-85. [PMID: 17276586 DOI: 10.1016/j.biosystems.2006.11.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 09/27/2006] [Revised: 11/28/2006] [Accepted: 11/28/2006] [Indexed: 11/24/2022]
Abstract
RNA secondary structure is an important computational model to understand how genetic variation maps into phenotypic (structural) variation. Evolutionary innovation in RNA structures is facilitated by neutral networks, large connected sets of RNA sequences that fold into the same structure. Our work extends and deepens previous studies on neutral networks. First, we show that even the 1-mutant neighborhood of a given sequence (genotype) G0 with structure (phenotype) P contains many structural variants that are not close to P. This holds for biological and generic RNA sequences alike. Second, we analyze the relation between new structures in the 1-neighborhoods of genotypes Gk that are only a moderate Hamming distance k away from G0, and the structure of G0 itself, both for biological and for generic RNA structures. Third, we analyze the relation between mutational robustness of a sequence and the distances of structural variants near this sequence. Our findings underscore the role of neutral networks in evolutionary innovation, and the role that high robustness can play in diminishing the potential for such innovation.
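The notions of a 1-mutant neighborhood and of Hamming distance used in this abstract can be made concrete with a short Python sketch. The function names and the example sequence are hypothetical, and the sketch covers only the genotype side: predicting the structure of each neighbor would require an RNA folding program, which is not shown here.

```python
ALPHABET = "ACGU"


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def one_mutant_neighbors(seq):
    """All sequences that differ from seq by exactly one point mutation."""
    return [
        seq[:i] + nt + seq[i + 1:]
        for i in range(len(seq))
        for nt in ALPHABET
        if nt != seq[i]
    ]


g0 = "GCAU"  # hypothetical toy genotype
neighbors = one_mutant_neighbors(g0)
print(len(neighbors), "one-mutant neighbors of", g0)
```

A sequence of length L has 3L one-mutant neighbors, so the neighborhood grows only linearly with sequence length, which is what makes exhaustive surveys of 1-neighborhoods feasible.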
|
38
|
Brown DP, Krishnamurthy N, Sjölander K. Automated protein subfamily identification and classification. PLoS Comput Biol 2007; 3:e160. [PMID: 17708678 PMCID: PMC1950344 DOI: 10.1371/journal.pcbi.0030160] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.8] [Received: 02/17/2006] [Accepted: 06/25/2007] [Indexed: 11/22/2022]
Abstract
Function prediction by homology is widely used to provide preliminary functional annotations for genes for which experimental evidence of function is unavailable or limited. This approach has been shown to be prone to systematic error, including percolation of annotation errors through sequence databases. Phylogenomic analysis avoids these errors in function prediction but has been difficult to automate for high-throughput application. To address this limitation, we present a computationally efficient pipeline for phylogenomic classification of proteins. This pipeline uses the SCI-PHY (Subfamily Classification in Phylogenomics) algorithm for automatic subfamily identification, followed by subfamily hidden Markov model (HMM) construction. A simple and computationally efficient scoring scheme using family and subfamily HMMs enables classification of novel sequences to protein families and subfamilies. Sequences representing entirely novel subfamilies are differentiated from those that can be classified to subfamilies in the input training set using logistic regression. Subfamily HMM parameters are estimated using an information-sharing protocol, enabling subfamilies containing even a single sequence to benefit from conservation patterns defining the family as a whole or in related subfamilies. SCI-PHY subfamilies correspond closely to functional subtypes defined by experts and to conserved clades found by phylogenetic analysis. Extensive comparisons of subfamily and family HMM performances show that subfamily HMMs dramatically improve the separation between homologous and non-homologous proteins in sequence database searches. Subfamily HMMs also provide extremely high specificity of classification and can be used to predict entirely novel subtypes. The SCI-PHY Web server at http://phylogenomics.berkeley.edu/SCI-PHY/ allows users to upload a multiple sequence alignment for subfamily identification and subfamily HMM construction. 
Biologists wishing to provide their own subfamily definitions can do so. Source code is available on the Web page. The Berkeley Phylogenomics Group PhyloFacts resource contains pre-calculated subfamily predictions and subfamily HMMs for more than 40,000 protein families and domains at http://phylogenomics.berkeley.edu/phylofacts/.
Affiliation(s)
- Duncan P Brown
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Nandini Krishnamurthy
- Department of Bioengineering, University of California, Berkeley, California, United States of America
- Kimmen Sjölander
- Department of Bioengineering, University of California, Berkeley, California, United States of America
|
39
|
Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007; 66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
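The leverage figures quoted in this abstract can be checked with simple arithmetic. The sketch below uses only the rounded numbers given in the abstract (186 structures, approximately 39,000 modeled sequences, 7900 models at 30% identity or better); the variable names are illustrative, and because the inputs are approximate the results are approximate as well.

```python
structures = 186            # unique NESG structures (from the abstract)
modeled_sequences = 39_000  # sequences covered by quality models (approximate)
stringent_models = 7_900    # models with at least 30% identity to the template

# Average leverage: models produced per solved structure.
average_leverage = modeled_sequences / structures

# Share of models that meet the stringent 30% identity criterion.
stringent_fraction = stringent_models / modeled_sequences

print(f"average leverage: {average_leverage:.0f} models per structure")
print(f"stringent share:  {stringent_fraction:.1%} of all models")
```

With these inputs the average comes out at roughly 210 models per structure, matching the abstract, and only about one model in five meets the 30% identity criterion.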
Affiliation(s)
- Nebojsa Mirkovic
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, New York 10021, USA
|
40
|
Soto JG, White SA, Reyes SR, Regalado R, Sanchez EE, Perez JC. Molecular evolution of PIII-SVMP and RGD disintegrin genes from the genus Crotalus. Gene 2006; 389:66-72. [PMID: 17112685 DOI: 10.1016/j.gene.2006.09.020] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 07/06/2006] [Revised: 09/12/2006] [Accepted: 09/22/2006] [Indexed: 11/16/2022]
Abstract
Several types of disintegrins have been isolated from Crotalus spp. rattlesnakes, including RGD disintegrins and PIII-SVMPs. We isolated six cDNAs from snake venom glands using RT-PCR. Three RGD disintegrin cDNAs (atroxatin, mojastin, and viridistatin) and three PIII-SVMP cDNAs (catroriarin, scutiarin, and viristiarin) were isolated from the rattlesnakes Crotalus atrox, Crotalus scutulatus scutulatus, and Crotalus viridis viridis, respectively. Atroxatin and viridistatin shared 90% amino acid identity with each other, and 87% identity with mojastin. Scutiarin and viristiarin were identical. All PIII-SVMPs isolated in this study shared the highest amino acid identity with Catrocollastatin. cDNA and protein sequences for RGD disintegrins, one MVD disintegrin, and PIII-SVMPs of the genus Crotalus (present in the NCBI database) were used in phylogenetic analysis. Neighbor-joining analysis of PIII-SVMP and RGD/MVD disintegrin-coding DNA sequences showed that these groups of genes fall into separate clades. A Phi(ST) pairwise comparison and Analysis of Molecular Variance (AMOVA) between PIII-SVMPs and RGD/MVD disintegrins showed significant genetic differences. Mutations observed in ten of the cDNAs analyzed did not affect Cys-coding sequences. Our K(A)/K(S) data suggest that rapid evolution occurred between the genes coding for PIII-SVMPs, resulting in the production of RGD disintegrin-coding genes. However, once these genes diverged, mutations in the PIII-SVMP-coding genes accumulated less frequently.
Affiliation(s)
- Julio G Soto
- Biological Sciences Department, San Jose State University, One Washington Square, Duncan Hall 254, San Jose, CA 95192-0100, United States.
|
41
|
Abstract
The ability to predict the function of a protein, given its sequence and/or 3D structure, is an essential requirement for exploiting the wealth of data made available by genomics and structural genomics projects and is therefore raising increasing interest in the computational biology community. To foster developments in the area as well as to establish the state of the art of present methods, a function prediction category was tentatively introduced in the 6th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP) worldwide experiment. The assessment of the performance of the methods was made difficult by at least two factors: (a) the experimentally determined function of the targets was not available at the time of assessment; (b) the experiment is run blindly, preventing verification of whether the convergence of different predictions towards the same functional annotation was due to the similarity of the methods or to a genuine signal detectable by different methodologies. In this work, we collected information about the methods used by the various predictors and revisited the results of the experiment by verifying how often and in which cases a convergent prediction was obtained by methods based on different rationale. We propose a method for classifying the type and redundancy of the methods. We also analyzed the cases in which a function for the target protein has become available. Our results show that predictions derived from a consensus of different methods can reach an accuracy as high as 80%. It follows that some of the predictions submitted to CASP6, once reanalyzed taking into account the type of converging methods, can provide very useful information to researchers interested in the function of the target proteins.
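The idea of trusting an annotation only when methods based on different rationale converge on it can be sketched as follows. This is an assumed, simplified scheme for illustration, not the classification procedure proposed in the paper: the method classes, the example annotations, and the `min_classes` threshold are all hypothetical.

```python
from collections import defaultdict


def consensus_annotation(predictions, min_classes=2):
    """Pick the annotation backed by the most distinct method classes.

    predictions: iterable of (method_class, annotation) pairs.
    Several predictors of the same class count once, so redundant
    methods cannot manufacture a consensus on their own.
    Returns (annotation, supporting_classes) or None.
    """
    support = defaultdict(set)
    for method_class, annotation in predictions:
        support[annotation].add(method_class)
    best = max(support, key=lambda a: len(support[a]), default=None)
    if best is None or len(support[best]) < min_classes:
        return None
    return best, sorted(support[best])


# Hypothetical predictions for one target.
predictions = [
    ("sequence", "hydrolase"),
    ("sequence", "hydrolase"),
    ("structure", "hydrolase"),
    ("surface", "hydrolase"),
    ("sequence", "kinase"),
    ("sequence", "kinase"),
]
print(consensus_annotation(predictions))
```

Counting distinct method classes rather than raw votes is the design choice that matters here: two sequence-based predictors agreeing with each other says less than one sequence-based and one structure-based predictor agreeing.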
|
42
|
Zhang Y, Hubner IA, Arakaki AK, Shakhnovich E, Skolnick J. On the origin and highly likely completeness of single-domain protein structures. Proc Natl Acad Sci U S A 2006; 103:2605-10. [PMID: 16478803 PMCID: PMC1413790 DOI: 10.1073/pnas.0509379103] [Citation(s) in RCA: 142] [Impact Index Per Article: 7.5] [Indexed: 11/18/2022]
Abstract
The size and origin of the protein fold universe are of fundamental and practical importance. Analysis of randomly generated, compact sticky homopolypeptide conformations constructed in generic simplified and all-atom protein models shows that all have similar folds in the library of solved structures, the Protein Data Bank, and conversely, all compact, single-domain protein structures in the Protein Data Bank have structural analogues in the compact model set. Thus, both sets are highly likely to be complete, with the protein fold universe arising from compact conformations of hydrogen-bonded secondary structures. Because side chains are represented by their Cβ atoms, these results also suggest that the observed protein folds are insensitive to the details of side-chain packing. Sequence specificity enters both in fine-tuning the structure and in thermodynamically stabilizing a given fold with respect to the set of alternatives. When the models are scanned against a three-dimensional active-site library, close geometric matches are frequently found. Thus, the presence of active-site-like geometries also seems to be a consequence of the packing of compact, secondary structural elements. These results have significant implications for the evolution of protein structure and function.
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington Street, Buffalo, NY 14203
- Isaac A. Hubner
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138
- Adrian K. Arakaki
- Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington Street, Buffalo, NY 14203
- Eugene Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, State University of New York, 901 Washington Street, Buffalo, NY 14203
- To whom correspondence should be sent at the present address:
Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318. E-mail:
|
43
|
Binkowski TA, Joachimiak A, Liang J. Protein surface analysis for function annotation in high-throughput structural genomics pipeline. Protein Sci 2006; 14:2972-81. [PMID: 16322579 PMCID: PMC2253251 DOI: 10.1110/ps.051759005] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.7] [Indexed: 10/25/2022]
Abstract
Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected on the basis of low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The SG success has resulted in an accelerated deposition of novel structures. In some cases the structural bioinformatics analysis applied to these novel structures has provided specific functional assignment. However, this approach has also uncovered limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone structure methodologies. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), for comparing the protein surfaces of geometrically defined pockets and voids was developed. pvSOAR was able to detect previously unrecognized and novel functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli as well as the CBS domains from inosine-5'-monophosphate dehydrogenase from Streptococcus pyogenes, conserved hypothetical protein Ta549 from Thermoplasma acidophilum, and CBS domain protein mt1622 from Methanobacterium thermoautotrophicum with the goal of inferring information about their biochemical function.
Affiliation(s)
- T Andrew Binkowski
- Department of Bioengineering, The University of Illinois, 851 South Morgan St., Room 218, Chicago, IL 60607, USA.
|
44
|
Koch MA, Waldmann H. Protein structure similarity clustering and natural product structure as guiding principles for chemical genomics. Ernst Schering Research Foundation Workshop 2006:89-109. [PMID: 16709001 DOI: 10.1007/978-3-540-37635-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/09/2023]
Abstract
The majority of all proteins are modularly built from a limited set of approximately 1,000 structural domains. The knowledge of a common protein fold topology in the ligand-sensing cores of protein domains can be exploited for the design of small-molecule libraries in the development of inhibitors and ligands. Thus, a novel strategy of clustering protein domain cores based exclusively on structure similarity considerations (protein structure similarity clustering, PSSC) has been successfully applied to the development of small-molecule inhibitors of acetylcholinesterase and the 11beta-hydroxysteroid dehydrogenases based on the structure of a naturally occurring Cdc25 inhibitor. The efficiency of making use of the scaffolds of natural products as biologically prevalidated starting points for the design of compound libraries is further highlighted by the development of benzopyran-based FXR ligands.
Affiliation(s)
- M A Koch
- Max Planck Institute of Molecular Physiology, Department of Chemical Biology, Dortmund, Germany
|
45
|
Ben-Shimon A, Eisenstein M. Looking at enzymes from the inside out: the proximity of catalytic residues to the molecular centroid can be used for detection of active sites and enzyme-ligand interfaces. J Mol Biol 2005; 351:309-26. [PMID: 16019028 DOI: 10.1016/j.jmb.2005.06.047] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Received: 04/20/2005] [Revised: 06/19/2005] [Accepted: 06/21/2005] [Indexed: 11/25/2022]
Abstract
Analysis of the distances of the exposed residues in 175 enzymes from the centroids of the molecules indicates that catalytic residues are very often found among the 5% of residues closest to the enzyme centroid. This property of catalytic residues is implemented in a new prediction algorithm (named EnSite) for locating the active sites of enzymes and in a new scheme for re-ranking enzyme-ligand docking solutions. EnSite examines only 5% of the molecular surface (represented by surface dots) that is closest to the centroid, identifying continuous surface segments and ranking them by their area size. EnSite ranks the correct prediction 1-4 in 97% of the cases in a dataset of 65 monomeric enzymes (rank 1 for 89% of the cases) and in 86% of the cases in a dataset of 176 monomeric and multimeric enzymes from all six top-level enzyme classifications (rank 1 in 74% of the cases). Importantly, identification of buried or flat active sites is straightforward because EnSite "looks" at the molecular surface from the inside out. Detailed examination of the results indicates that the proximity of the catalytic residues to the centroid is a property of the functional unit, defined as the assembly of domains or chains that form the active site (in most cases the functional unit corresponds to a single whole polypeptide chain). Using the functional unit in the prediction further improves the results. The new property of active sites is also used for re-evaluating enzyme-inhibitor unbound docking results. Sorting the docking solutions by the distance of the interface to the centroid of the enzyme improves remarkably the ranks of nearly correct solutions compared to ranks based on geometric-electrostatic-hydrophobic complementarity scores.
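The central observation of this abstract, that catalytic residues tend to lie among the 5% of exposed residues closest to the molecular centroid, can be illustrated with a minimal Python sketch. It is not EnSite itself, which works on surface dots and ranks continuous surface segments by area; the function names, the residue labels, and the toy coordinates below are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def closest_fraction(all_atoms, exposed_residues, fraction=0.05):
    """Names of the exposed residues nearest to the molecular centroid.

    all_atoms: list of (x, y, z) coordinates for the whole molecule.
    exposed_residues: dict mapping a residue name to one representative
    coordinate (an assumption made here for simplicity).
    """
    center = centroid(all_atoms)
    ranked = sorted(
        exposed_residues,
        key=lambda name: math.dist(exposed_residues[name], center),
    )
    keep = max(1, round(fraction * len(ranked)))
    return ranked[:keep]


# Toy molecule: centroid at the origin, 40 exposed residues on the x axis.
all_atoms = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
exposed = {f"R{i}": (float(i), 0.0, 0.0) for i in range(1, 41)}
candidates = closest_fraction(all_atoms, exposed)
print(candidates)
```

Because the ranking depends only on distance to the centroid, buried or flat sites are treated the same way as deep pockets, which is the "inside out" property the abstract highlights.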
Affiliation(s)
- Avraham Ben-Shimon
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
|
46
|
Chakrabarti R, Klibanov AM, Friesner RA. Sequence optimization and designability of enzyme active sites. Proc Natl Acad Sci U S A 2005; 102:12035-40. [PMID: 16103370 PMCID: PMC1189337 DOI: 10.1073/pnas.0505397102] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 04/08/2005] [Indexed: 11/18/2022]
Abstract
We recently found that many residues in enzyme active sites can be computationally predicted by the optimization of scoring functions based on substrate binding affinity, subject to constraints on the geometry of catalytic residues and protein stability. Here, we explore the generality of this surprising observation. First, the impact of hydrogen-bonding networks necessary for catalysis on the accuracy of sequence optimization is assessed; incorporation of these networks, where relevant, into the set of catalytic constraints is found to be essential. Next, the impact of multiple substrate selectivity on sequence optimization is probed by carrying out independent calculations for complexes of deoxyribonucleoside kinases with various cognate ligands, revealing how simultaneous selection pressures determined active-site sequences of these enzymes. Including previous calculations on simpler enzymes, computational sequence optimization correctly predicts 76% of all active-site residues tested (86% correct, with 93% similar, for naturally conserved residues). In these studies, the ligand is fixed in its native conformation. To assess the applicability of these methods to de novo active-site design, the effect of small ligand motions around the native pose is also examined. Robustness of sequence accuracy for topologically similar poses is demonstrated for selected kinases, but not for a model peptidase. Based on these observations, we introduce the notion of the designability of an enzyme active site, a metric that may be used to guide the search for protein scaffolds suitable for the introduction of de novo activity for a desired chemical reaction.
Affiliation(s)
- Raj Chakrabarti
- Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York, NY 10027, USA
|
47
|
Ferrè F, Ausiello G, Zanzoni A, Helmer-Citterich M. Functional annotation by identification of local surface similarities: a novel tool for structural genomics. BMC Bioinformatics 2005; 6:194. [PMID: 16076399 PMCID: PMC1190158 DOI: 10.1186/1471-2105-6-194] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 01/26/2005] [Accepted: 08/02/2005] [Indexed: 12/03/2022]
Abstract
Background: Protein function is often dependent on subsets of solvent-exposed residues that may exist in a similar three-dimensional configuration in non-homologous proteins, thus having different order and/or spacing in the sequence. Hence, functional annotation by means of sequence or fold similarity is not adequate for such cases. Results: We describe a method for the function-related annotation of protein structures by means of the detection of local structural similarity with a library of annotated functional sites. An automatic procedure was used to annotate the function of local surface regions. Next, we employed a sequence-independent algorithm to compare exhaustively these functional patches with a larger collection of protein surface cavities. After tuning and validating the algorithm on a dataset of well-annotated structures, we applied it to a list of protein structures that are classified as being of unknown function in the Protein Data Bank. By this strategy, we were able to provide functional clues to proteins that do not show any significant sequence or global structural similarity with proteins in the current databases. Conclusion: This method is able to spot structural similarities associated with function-related similarities, independently of sequence or fold resemblance, and is therefore a valuable tool for the functional analysis of uncharacterized proteins. Results are available at
Affiliation(s)
- Fabrizio Ferrè
- Boston College, Biology Department, Chestnut Hill MA, USA
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Italy
- Gabriele Ausiello
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Italy
- Andreas Zanzoni
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Italy
- Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Italy
|
48
|
Conant GC, Wagner A. The rarity of gene shuffling in conserved genes. Genome Biol 2005; 6:R50. [PMID: 15960802 PMCID: PMC1175970 DOI: 10.1186/gb-2005-6-6-r50] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 01/31/2005] [Revised: 03/23/2005] [Accepted: 04/13/2005] [Indexed: 12/02/2022]
Abstract
The incidence of gene shuffling is estimated in conserved genes in 10 organisms from the three domains of life. Successful gene shuffling is found to be very rare among such conserved genes. This suggests that gene shuffling may not be a major force in reshaping the core genomes of eukaryotes. Background: Among three sources of evolutionary innovation in gene function (point mutations, gene duplications, and gene shuffling, that is, recombination between dissimilar genes), gene shuffling is the most potent one. However, surprisingly little is known about its incidence on a genome-wide scale. Results: We have studied shuffling in genes that are conserved between distantly related species. Specifically, we estimated the incidence of gene shuffling in ten organisms from the three domains of life: eukaryotes, eubacteria, and archaea, considering only genes showing significant sequence similarity in pairwise genome comparisons. We found that successful gene shuffling is very rare among such conserved genes. For example, we could detect only 48 successful gene-shuffling events in the genome of the fruit fly Drosophila melanogaster which have occurred since its common ancestor with the worm Caenorhabditis elegans more than half a billion years ago. Conclusion: The incidence of gene shuffling is roughly an order of magnitude smaller than the incidence of single-gene duplication in eukaryotes, but it can approach or even exceed the gene-duplication rate in prokaryotes. If true in general, this pattern suggests that gene shuffling may not be a major force in reshaping the core genomes of eukaryotes. Our results also cast doubt on the notion that introns facilitate gene shuffling, both because prokaryotes show an appreciable incidence of gene shuffling despite their lack of introns and because we find no statistical association between exon-intron boundaries and recombined domains in the two multicellular genomes we studied.
Affiliation(s)
- Gavin C Conant
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland
- Andreas Wagner
- Department of Biology, The University of New Mexico, Albuquerque, NM 87131-0001, USA
|
49
|
Wang K, Samudrala R. FSSA: a novel method for identifying functional signatures from structural alignments. Bioinformatics 2005; 21:2969-77. [PMID: 15860561 DOI: 10.1093/bioinformatics/bti471] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Motivation: It is commonly believed that sequence determines structure, which in turn determines function. However, the presence of many proteins with the same structural fold but different functions suggests that global structure and function do not always correlate well. Results: We propose a method for accurate functional annotation, based on identification of functional signatures from structural alignments (FSSA) using the Structural Classification of Proteins (SCOP) database. The FSSA method is superior at function discrimination and classification compared with several methods that directly inherit functional annotation information from homology inference, such as Smith-Waterman, PSI-BLAST, hidden Markov models and structure comparison methods, for a large number of structural fold families. Our results indicate that the contributions of amino acid residue types and positions to structure and function are largely separable for proteins in multi-functional fold families.
Affiliation(s)
- Kai Wang
- Computational Genomics Group, Department of Microbiology, University of Washington Seattle, WA 98195, USA
|
50
|
Balamurugan R, Dekker FJ, Waldmann H. Design of compound libraries based on natural product scaffolds and protein structure similarity clustering (PSSC). Molecular BioSystems 2005; 1:36-45. [PMID: 16880961 DOI: 10.1039/b503623b] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Indexed: 01/08/2023]
Abstract
Recent advances in structural biology, bioinformatics and combinatorial chemistry have significantly impacted the discovery of small molecules that modulate protein functions. Natural products which have evolved to bind to proteins may serve as biologically validated starting points for the design of focused libraries that might provide protein ligands with enhanced quality and probability. The combined application of natural product derived scaffolds with a new approach that clusters proteins according to structural similarity of their ligand sensing cores provides a new principle for the design and synthesis of such libraries. This article discusses recent advances in the synthesis of natural product inspired compound collections and the application of protein structure similarity clustering for the development of such libraries.
Affiliation(s)
- Rengarajan Balamurugan
- Department of Chemical Biology, Max-Planck Institute of Molecular Physiology, Dortmund, Germany
|