1
Ghadermarzi S, Krawczyk B, Song J, Kurgan L. XRRpred: Accurate Predictor of Crystal Structure Quality from Protein Sequence. Bioinformatics 2021; 37:4366-4374. [PMID: 34247234 DOI: 10.1093/bioinformatics/btab509] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/26/2021] [Revised: 06/10/2021] [Accepted: 07/06/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: X-ray crystallography was used to produce nearly 90% of protein structures. These efforts were supported by numerous sequence-based tools that accurately predict crystallizable proteins. However, protein structures vary widely in their quality, typically measured with resolution and R-free. This impacts the ability to use these structures for some applications, including rational drug design and molecular docking, and motivates the development of methods that accurately predict structure quality. RESULTS: We introduce XRRpred, the first predictor of the resolution and R-free values from protein sequences. XRRpred relies on original sequence profiles, hand-crafted features, empirically selected and parametrized regressors, and modern resampling techniques. Using an independent test dataset, we show that XRRpred provides accurate predictions of resolution and R-free. We demonstrate that XRRpred's predictions correctly model the relationship between resolution and R-free and reproduce structure quality relations between structural classes of proteins. We also show that XRRpred significantly outperforms indirect alternative ways to predict structure quality, which include predictors of crystallization propensity and an alignment-based approach. XRRpred is available as a convenient webserver that allows batch predictions and offers informative visualization of the results. AVAILABILITY: http://biomine.cs.vcu.edu/servers/XRRPred/.
Affiliation(s)
- Sina Ghadermarzi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Bartosz Krawczyk
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
2
Susanti D, Frazier MC, Mukhopadhyay B. A Genetic System for Methanocaldococcus jannaschii: An Evolutionary Deeply Rooted Hyperthermophilic Methanarchaeon. Front Microbiol 2019; 10:1256. [PMID: 31333590 PMCID: PMC6616113 DOI: 10.3389/fmicb.2019.01256] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/18/2018] [Accepted: 05/20/2019] [Indexed: 12/20/2022] Open
Abstract
Phylogenetically deeply rooted methanogens belonging to the genus Methanocaldococcus living in deep-sea hydrothermal vents derive energy exclusively from hydrogenotrophic methanogenesis, one of the oldest respiratory metabolisms on Earth. These hyperthermophilic, autotrophic archaea synthesize their biomolecules from inorganic substrates and perform high temperature biocatalysis producing methane, a valuable fuel and potent greenhouse gas. The information processing and stress response systems of archaea are highly homologous to those of the eukaryotes. For this broad relevance, Methanocaldococcus jannaschii, the first hyperthermophilic chemolithotrophic organism that was isolated from a deep-sea hydrothermal vent, was also the first archaeon and third organism for which the whole genome sequence was determined. The research that followed uncovered a wealth of novel information in multiple fields, including those described above. M. jannaschii was found to carry ancient redox control systems, precursors of dissimilatory sulfate reduction enzymes, and a eukaryotic-like protein translocation system. It provided a platform for structural genomics and tools for incorporating unnatural amino acids into proteins. However, the assignments of in vivo relevance to these findings or interrogations of unknown aspects of M. jannaschii through genetic manipulations remained out of reach, as the organism was genetically intractable. This report presents tools and methods that remove this block. It is now possible to knock out or modify a gene in M. jannaschii and genetically fuse a gene with an affinity tag sequence, thereby allowing facile isolation of a protein with M. jannaschii-specific attributes. These tools have helped to genetically validate the role of a novel coenzyme F420-dependent sulfite reductase in conferring resistance to sulfite in M. jannaschii and to demonstrate that the organism possesses a deazaflavin-dependent system for neutralizing oxygen.
Affiliation(s)
- Dwi Susanti
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, United States
- Mary C Frazier
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, United States
- Biswarup Mukhopadhyay
  - Department of Biochemistry, Virginia Tech, Blacksburg, VA, United States
  - Biocomplexity Institute, Virginia Tech, Blacksburg, VA, United States
  - Virginia Tech Carilion School of Medicine, Virginia Tech, Blacksburg, VA, United States
3
Shao L, Gao H, Liu Z, Feng J, Tang L, Lin H. Identification of Antioxidant Proteins With Deep Learning From Sequence Information. Front Pharmacol 2018; 9:1036. [PMID: 30294271 PMCID: PMC6158654 DOI: 10.3389/fphar.2018.01036] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/06/2018] [Accepted: 08/27/2018] [Indexed: 01/26/2023] Open
Abstract
Antioxidant proteins have been found to be closely linked to disease control because of their ability to eliminate excess free radicals. Because of their medicinal value, research on identifying antioxidant proteins is on the rise. Many machine-learning classifiers have performed poorly owing to the nonlinear and unbalanced nature of biological data. Recently, deep learning techniques showed advantages over many state-of-the-art machine learning methods in various fields. In this study, a deep learning based classifier was proposed to identify antioxidant proteins based on a mixed g-gap dipeptide composition feature vector. The classifier employed a deep autoencoder to extract a nonlinear representation from the raw input. t-Distributed Stochastic Neighbor Embedding (t-SNE) was used for dimensionality reduction. A support vector machine was finally used for classification. The classifier achieved an F1 score of 0.8842 and an MCC of 0.7409 in 10-fold cross validation. Experimental results show that our proposed method outperformed the traditional machine learning methods and could be a promising tool for antioxidant protein identification. For the convenience of others' scientific research, we have developed a user-friendly web server called IDAod for antioxidant protein identification, which can be accessed freely at http://bigroup.uestc.edu.cn/IDAod/.
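To illustrate the feature vector named in the abstract, here is a minimal Python sketch of a g-gap dipeptide composition. It follows the commonly used definition (the frequency of residue pairs separated by `gap` positions), which is an assumption here, since the abstract does not spell out the formula. The function names, the choice of gaps `(0, 1, 2)`, and the handling of non-standard residues are hypothetical and not taken from the paper; the autoencoder, t-SNE, and SVM stages are not shown.

```python
from itertools import product

# The 20 standard amino acids; any other symbol in a sequence is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_dipeptide_composition(sequence, gap):
    """Return a 400-dimensional frequency vector of residue pairs
    (seq[i], seq[i + gap + 1]); gap=0 gives adjacent dipeptides."""
    seq = sequence.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    # Normalize by the number of counted pairs (assumed convention).
    return [counts[p] / total for p in PAIRS]


def mixed_g_gap_features(sequence, gaps=(0, 1, 2)):
    """Concatenate the compositions for several gaps (illustrative choice of gaps)."""
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide_composition(sequence, g))
    return features
```

With three gaps the concatenated vector has 1200 entries; a vector of this kind is what a downstream model would take as raw input.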
Affiliation(s)
- Lifen Shao
  - Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Hui Gao
  - Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Zhen Liu
  - Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Juan Feng
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Lixia Tang
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
4
Kim HN, Seok SH, Lee YS, Won HS, Seo MD. Crystal structure and functional characterization of SF216 from Shigella flexneri. FEBS Lett 2017; 591:3692-3703. [PMID: 28983914 DOI: 10.1002/1873-3468.12873] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/13/2017] [Revised: 10/01/2017] [Accepted: 10/02/2017] [Indexed: 12/18/2022]
Abstract
Shigella flexneri is a Gram-negative anaerobic bacterium that causes highly infectious bacterial dysentery in humans. Here, we solved the crystal structure of SF216, a hypothetical protein from the S. flexneri 5a strain M90T, at 1.7 Å resolution. The crystal structure of SF216 represents a homotrimer stabilized by intersubunit interactions and ion-mediated electrostatic interactions. Each subunit consists of three β-strands and five α-helices with the β-β-β-α-α-α-α-α topology. Based on the structural information, we also demonstrate that SF216 shows weak ribonuclease activity by a fluorescence quenching assay. Furthermore, we identify potential druggable pockets (putative hot spots) on the surface of the SF216 structure by computational mapping.
Affiliation(s)
- Ha-Neul Kim
  - Department of Molecular Science and Technology, Ajou University, Suwon, Gyeonggi, Korea
  - College of Pharmacy, Ajou University, Suwon, Gyeonggi, Korea
- Yoo-Sup Lee
  - Department of Molecular Science and Technology, Ajou University, Suwon, Gyeonggi, Korea
- Hyung-Sik Won
  - Department of Biotechnology, Research Institute and College of Biomedical and Health Science (RIBHS), Konkuk University, Chungju, Chungbuk, Korea
- Min-Duk Seo
  - Department of Molecular Science and Technology, Ajou University, Suwon, Gyeonggi, Korea
  - College of Pharmacy, Ajou University, Suwon, Gyeonggi, Korea
5
Herlihy SE, Tang Y, Phillips JE, Gomer RH. Functional similarities between the Dictyostelium protein AprA and the human protein dipeptidyl-peptidase IV. Protein Sci 2017; 26:578-585. [PMID: 28028841 DOI: 10.1002/pro.3107] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/31/2016] [Revised: 12/20/2016] [Accepted: 12/21/2016] [Indexed: 01/15/2023]
Abstract
Autocrine proliferation repressor protein A (AprA) is a protein secreted by Dictyostelium discoideum cells. Although there is very little sequence similarity between AprA and any human protein, AprA has a predicted structural similarity to the human protein dipeptidyl peptidase IV (DPPIV). AprA is a chemorepellent for Dictyostelium cells, and DPPIV is a chemorepellent for neutrophils. This led us to investigate if AprA and DPPIV have additional functional similarities. We find that like AprA, DPPIV is a chemorepellent for, and inhibits the proliferation of, D. discoideum cells, and that AprA binds some DPPIV binding partners such as fibronectin. Conversely, rAprA has DPPIV-like protease activity. These results indicate a functional similarity between two eukaryotic chemorepellent proteins with very little sequence similarity, and emphasize the usefulness of using a predicted protein structure to search a protein structure database, in addition to searching for proteins with similar sequences.
Affiliation(s)
- Sarah E Herlihy
  - Department of Biology, Texas A&M University, College Station, Texas
- Yu Tang
  - Department of Biology, Texas A&M University, College Station, Texas
- Richard H Gomer
  - Department of Biology, Texas A&M University, College Station, Texas
6
Peng H, Yang T, Whitaker BD, Shangguan L, Fang J. Calcium/calmodulin alleviates substrate inhibition in a strawberry UDP-glucosyltransferase involved in fruit anthocyanin biosynthesis. BMC Plant Biol 2016; 16:197. [PMID: 27609111 PMCID: PMC5017016 DOI: 10.1186/s12870-016-0888-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 04/13/2016] [Accepted: 09/01/2016] [Indexed: 05/06/2023]
Abstract
BACKGROUND: UDP-glucosyltransferase (UGT) is a key enzyme for anthocyanin biosynthesis, which by catalyzing glycosylation of anthocyanidins increases their solubility and accumulation in plants. Previously we showed that pre-harvest spray of CaCl2 enhanced anthocyanin accumulation in strawberry fruit by stimulating the expression of anthocyanin structural genes, including a fruit-specific FvUGT1. RESULTS: To further understand the regulation of anthocyanin biosynthesis, we conducted kinetic analysis of recombinant FvUGT1 on glycosylation of pelargonidin, the major anthocyanidin in strawberry fruit. At a fixed pelargonidin concentration, FvUGT1 catalyzed the sugar transfer from UDP-glucose, essentially following Michaelis-Menten kinetics. By contrast, at a fixed UDP-glucose concentration, pelargonidin above 150 μM exhibited marked partial substrate inhibition in an uncompetitive mode. These results suggest that the sugar acceptor at high concentration inhibits FvUGT1 activity by binding to another site in addition to the catalytic site. Furthermore, calcium/calmodulin specifically bound FvUGT1 at a site partially overlapping with the interdomain linker, and significantly relieved the substrate inhibition. In the presence of 0.1 and 0.5 μM calmodulin, Vmax was increased by 71.4% and 327%, respectively. CONCLUSIONS: FvUGT1 activity is inhibited by anthocyanidin, the sugar acceptor substrate, and calcium/calmodulin binding to FvUGT1 enhances anthocyanin accumulation via alleviation of this substrate inhibition.
Affiliation(s)
- Hui Peng
  - Agricultural Research Service of U.S. Department of Agriculture, From the Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA
  - Horticulture & Landscape College, Hunan Agricultural University, Changsha, Hunan 410128 China
- Tianbao Yang
  - Agricultural Research Service of U.S. Department of Agriculture, From the Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA
- Bruce D. Whitaker
  - Agricultural Research Service of U.S. Department of Agriculture, From the Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA
- Lingfei Shangguan
  - Agricultural Research Service of U.S. Department of Agriculture, From the Food Quality Laboratory, Beltsville Agricultural Research Center, Beltsville, MD 20705 USA
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095 China
- Jinggui Fang
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095 China
7
Babbitt GA, Coppola EE, Alawad MA, Hudson AO. Can all heritable biology really be reduced to a single dimension? Gene 2016; 578:162-8. [DOI: 10.1016/j.gene.2015.12.043] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 10/28/2015] [Revised: 12/16/2015] [Accepted: 12/17/2015] [Indexed: 12/23/2022]
8
Xu W, Peng H, Yang T, Whitaker B, Huang L, Sun J, Chen P. Effect of calcium on strawberry fruit flavonoid pathway gene expression and anthocyanin accumulation. Plant Physiol Biochem 2014; 82:289-98. [PMID: 25036468 DOI: 10.1016/j.plaphy.2014.06.015] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Received: 05/27/2014] [Accepted: 06/25/2014] [Indexed: 05/18/2023]
Abstract
Two diploid woodland strawberry (Fragaria vesca) inbred lines, Ruegen F7-4 (red fruit-bearing) and YW5AF7 (yellow fruit-bearing) were used to study the regulation of anthocyanin biosynthesis in fruit. Ruegen F7-4 fruit had similar total phenolics and anthocyanin contents to commercial octoploid (F. × ananassa) cultivar Seascape, while YW5AF7 exhibited relatively low total phenolics content and no anthocyanin accumulation. Foliar spray of CaCl2 boosted fruit total phenolics content, especially anthocyanins, by more than 20% in both Seascape and RF7-4. Expression levels of almost all the flavonoid pathway genes were comparable in Ruegen F7-4 and YW5AF7 green-stage fruit. However, at the turning and ripe stages, key anthocyanin structural genes, including flavanone 3-hydroxylase (F3H1), dihydroflavonol 4-reductase (DFR2), anthocyanidin synthase (ANS1), and UDP-glucosyltransferase (UGT1), were highly expressed in Ruegen F7-4 compared with YW5AF7 fruit. Calcium treatment further stimulated the expression of those genes in Ruegen F7-4 fruit. Anthocyanins isolated from petioles of YW5AF7 and Ruegen F-7 had the same HPLC-DAD profile, which differed from that of Ruegen F-7 fruit anthocyanins. All the anthocyanin structural genes except FvUGT1 were detected in petioles of YW5AF7 and Ruegen F-7. Taken together, these results indicate that the "yellow" gene in YW5AF7 is a fruit specific regulatory gene(s) for anthocyanin biosynthesis. Calcium can enhance accumulation of anthocyanins and total phenolics in fruit possibly via upregulation of anthocyanin structural genes. Our results also suggest that the anthocyanin biosynthesis machinery in petioles is different from that in fruit.
Affiliation(s)
- Wenping Xu
  - School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
  - Food Quality Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service of U.S. Department of Agriculture (USDA-ARS), 10300 Baltimore Avenue, Beltsville, MD 20705, USA
- Hui Peng
  - Food Quality Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service of U.S. Department of Agriculture (USDA-ARS), 10300 Baltimore Avenue, Beltsville, MD 20705, USA
  - College of Life Sciences, Guangxi Normal University, Guilin, Guangxi 541004, China
- Tianbao Yang
  - Food Quality Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service of U.S. Department of Agriculture (USDA-ARS), 10300 Baltimore Avenue, Beltsville, MD 20705, USA
- Bruce Whitaker
  - Food Quality Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service of U.S. Department of Agriculture (USDA-ARS), 10300 Baltimore Avenue, Beltsville, MD 20705, USA
- Luhong Huang
  - Food Quality Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service of U.S. Department of Agriculture (USDA-ARS), 10300 Baltimore Avenue, Beltsville, MD 20705, USA
  - Hunan Agricultural Product Processing Institute, Hunan Academy of Agricultural Sciences, Changsha, Hunan 410125, China
- Jianghao Sun
  - Food Composition and Methods Development Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service of U.S. Department of Agriculture, Beltsville, MD 20705, USA
- Pei Chen
  - Food Composition and Methods Development Laboratory, Beltsville Human Nutrition Research Center, Agricultural Research Service of U.S. Department of Agriculture, Beltsville, MD 20705, USA
9
van der Lee R, Buljan M, Lang B, Weatheritt RJ, Daughdrill GW, Dunker AK, Fuxreiter M, Gough J, Gsponer J, Jones D, Kim PM, Kriwacki R, Oldfield CJ, Pappu RV, Tompa P, Uversky VN, Wright P, Babu MM. Classification of intrinsically disordered regions and proteins. Chem Rev 2014; 114:6589-631. [PMID: 24773235 PMCID: PMC4095912 DOI: 10.1021/cr400525m] [Citation(s) in RCA: 1568] [Impact Index Per Article: 142.5] [Received: 09/23/2013] [Indexed: 12/11/2022]
Affiliation(s)
- Robin van der Lee
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
  - Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, 6500 HB Nijmegen, The Netherlands
- Marija Buljan
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Benjamin Lang
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Robert J. Weatheritt
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
- Gary W. Daughdrill
  - Department of Cell Biology, Microbiology, and Molecular Biology, University of South Florida, 3720 Spectrum Boulevard, Suite 321, Tampa, Florida 33612, United States
- A. Keith Dunker
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Monika Fuxreiter
  - MTA-DE Momentum Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, H-4032 Debrecen, Nagyerdei krt 98, Hungary
- Julian Gough
  - Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, United Kingdom
- Joerg Gsponer
  - Department of Biochemistry and Molecular Biology, Centre for High-Throughput Biology, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- David T. Jones
  - Bioinformatics Group, Department of Computer Science, University College London, London, WC1E 6BT, United Kingdom
- Philip M. Kim
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, Department of Molecular Genetics, and Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Richard W. Kriwacki
  - Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, Tennessee 38105, United States
- Christopher J. Oldfield
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Rohit V. Pappu
  - Department of Biomedical Engineering and Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri 63130, United States
- Peter Tompa
  - VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Vladimir N. Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer’s Research Institute, Morsani College of Medicine, University of South Florida, Tampa, Florida 33612, United States
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Peter E. Wright
  - Department of Integrative Structural and Computational Biology and Skaggs Institute of Chemical Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, United States
- M. Madan Babu
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, United Kingdom
10
11
Sun J, Liu X, Yang T, Slovin J, Chen P. Profiling polyphenols of two diploid strawberry (Fragaria vesca) inbred lines using UHPLC-HRMS(n). Food Chem 2014; 146:289-98. [PMID: 24176345 PMCID: PMC3902803 DOI: 10.1016/j.foodchem.2013.08.089] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 05/10/2013] [Revised: 08/08/2013] [Accepted: 08/20/2013] [Indexed: 11/22/2022]
Abstract
Phenolic compounds in the fruits of two diploid strawberry (Fragaria vesca f. semperflorens) inbred lines, Ruegen F7-4 (a red-fruited genotype) and YW5AF7 (a yellow-fruited genotype), were characterised using ultra-high-performance liquid chromatography coupled with tandem high-resolution mass spectrometry (UHPLC-HRMS(n)). The changes in anthocyanin composition during fruit development and between Ruegen F7-4 and YW5AF7 were studied. About 67 phenolic compounds, including taxifolin 3-O-arabinoside, glycosides of quercetin, kaempferol, cyanidin, pelargonidin, peonidin, ellagic acid derivatives, and other flavonols, were identified in these two inbred lines. Compared to the regular octoploid strawberry, unique phenolic compounds were found in F. vesca fruits, such as taxifolin 3-O-arabinoside (both) and peonidin 3-O-malonylglucoside (Ruegen F7-4). The results provide the basis for comparative analysis of polyphenolic compounds in yellow and red diploid strawberries, as well as with the cultivated octoploid strawberries.
Affiliation(s)
- Jianghao Sun
  - U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center, Food Composition and Methods Development Laboratory, Beltsville, MD 20705, United States
- Xianjin Liu
  - Jiangsu Academy of Agricultural Sciences, Institute of Food Safety and Quality, Nanjing, Jiangsu Province, China
- Tianbao Yang
  - U.S. Department of Agriculture, Agricultural Research Service, Food Quality Laboratory, Beltsville, MD 20705, United States
- Janet Slovin
  - U.S. Department of Agriculture, Agricultural Research Service, Genetic Improvement of Fruits and Vegetables Laboratory, Beltsville, MD 20705, United States
- Pei Chen
  - U.S. Department of Agriculture, Agricultural Research Service, Beltsville Human Nutrition Research Center, Food Composition and Methods Development Laboratory, Beltsville, MD 20705, United States
12
Brylinski M. Exploring the "dark matter" of a mammalian proteome by protein structure and function modeling. Proteome Sci 2013; 11:47. [PMID: 24321360 PMCID: PMC3866606 DOI: 10.1186/1477-5956-11-47] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/22/2013] [Accepted: 12/03/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: A growing body of evidence shows that gene products encoded by short open reading frames play key roles in numerous cellular processes. Yet, they are generally overlooked in genome assembly, escaping annotation because small protein-coding genes are difficult to predict computationally. Consequently, there are still a considerable number of small proteins whose functions are yet to be characterized. RESULTS: To address this issue, we apply a collection of structural bioinformatics algorithms to infer molecular function of putative small proteins from the mouse proteome. Specifically, we construct 1,743 confident structure models of small proteins, which reveal a significant structural diversity with a noticeably high helical content. A subsequent structure-based function annotation of small protein models exposes 178,745 putative protein-protein interactions with the remaining gene products in the mouse proteome, 1,100 potential binding sites for small organic molecules and 987 metal-binding signatures. CONCLUSIONS: These results strongly indicate that many small proteins adopt three-dimensional structures and are fully functional, playing important roles in transcriptional regulation, cell signaling and metabolism. Data collected through this work is freely available to the academic community at http://www.brylinski.org/content/databases to support future studies oriented on elucidating the functions of hypothetical small proteins.
Affiliation(s)
- Michal Brylinski
  - Department of Biological Sciences, Louisiana State University, 70803 Baton Rouge, LA, USA
13
Herlihy SE, Pilling D, Maharjan AS, Gomer RH. Dipeptidyl peptidase IV is a human and murine neutrophil chemorepellent. J Immunol 2013; 190:6468-77. [PMID: 23677473 DOI: 10.4049/jimmunol.1202583] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 01/10/2023]
Abstract
In Dictyostelium discoideum, AprA is a secreted protein that inhibits proliferation and causes chemorepulsion of Dictyostelium cells, yet AprA has little sequence similarity to any human proteins. We found that a predicted structure of AprA has similarity to human dipeptidyl peptidase IV (DPPIV). DPPIV is a serine protease present in extracellular fluids that cleaves peptides with a proline or alanine in the second position. In Insall chambers, DPPIV gradients below, similar to, and above the human serum DPPIV concentration cause movement of human neutrophils away from the higher concentration of DPPIV. A 1% DPPIV concentration difference between the front and back of the cell is sufficient to cause chemorepulsion. Neutrophil speed and viability are unaffected by DPPIV. DPPIV inhibitors block DPPIV-mediated chemorepulsion. In a murine model of acute respiratory distress syndrome, aspirated bleomycin induces a significant increase in the number of neutrophils in the lungs after 3 d. Oropharyngeal aspiration of DPPIV inhibits the bleomycin-induced accumulation of mouse neutrophils. These results indicate that DPPIV functions as a chemorepellent of human and mouse neutrophils, and they suggest new mechanisms to inhibit neutrophil accumulation in acute respiratory distress syndrome.
Affiliation(s)
- Sarah E Herlihy
  - Department of Biology, Texas A&M University, College Station, TX 77843, USA
14
Bitra A, Hussain B, Tanwar AS, Anand R. Identification of Function and Mechanistic Insights of Guanine Deaminase from Nitrosomonas europaea: Role of the C-Terminal Loop in Catalysis. Biochemistry 2013; 52:3512-22. [DOI: 10.1021/bi400068g] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
Affiliation(s)
- Aruna Bitra
  - Department of Chemistry, IIT Bombay, Mumbai, India 400076
- Bhukya Hussain
  - Department of Chemistry, IIT Bombay, Mumbai, India 400076
- Ruchi Anand
  - Department of Chemistry, IIT Bombay, Mumbai, India 400076
15
Mallipeddi PL, Joshi M, Briggs JM. Pharmacophore-Based Virtual Screening to Aid in the Identification of Unknown Protein Function. Chem Biol Drug Des 2012; 80:828-42. [DOI: 10.1111/j.1747-0285.2012.01408.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
16
Nagarajan R, Siva Balan S, Sabarinathan R, Kirti Vaishnavi M, Sekar K. Fragment Finder 2.0: a computing server to identify structurally similar fragments. J Appl Crystallogr 2012. [DOI: 10.1107/s0021889812001501] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022] Open
Abstract
Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone Cα angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.
|
17
|
Vorobjev YN. Blind docking method combining search of low-resolution binding sites with ligand pose refinement by molecular dynamics-based global optimization. J Comput Chem 2010; 31:1080-92. [PMID: 19821514 DOI: 10.1002/jcc.21394] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/28/2022]
Abstract
This study describes the development of a new blind hierarchical docking method, bhDock, its implementation, and accuracy assessment. The bhDock method uses a two-step algorithm. First, a comprehensive set of low-resolution binding sites is determined by analyzing the entire protein surface and ranked by a simple score function. Second, the ligand position is determined via a molecular dynamics-based method of global optimization starting from a small set of high-ranked low-resolution binding sites. The refinement of the ligand binding pose starts from uniformly distributed multiple initial ligand orientations and uses simulated annealing molecular dynamics coupled with guided force-field deformation of protein-ligand interactions to find the global minimum. Assessment of the bhDock method on a set of 37 protein-ligand complexes showed a prediction success rate of 78%, which is better than the rate reported for the most cited docking methods, such as AutoDock, DOCK, GOLD, and FlexX, on the same set of complexes.
Affiliation(s)
- Yury N Vorobjev
- Institute of Chemical Biology and Fundamental Medicine of the Siberian Branch of the Russian Academy of Science, Novosibirsk, Russia.
|
18
|
Rizvi SB, Shukla AK, Dubey VK. A simple method based on multiple alignment and phylogeny to derive a correlation between the protein fold and sequence via motif search. Interdiscip Sci 2009; 1:235-243. [PMID: 20640843 DOI: 10.1007/s12539-009-0041-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2009] [Revised: 04/18/2009] [Accepted: 05/11/2009] [Indexed: 05/29/2023]
Abstract
Predicting information regarding the structure of a protein from its sequence remains an uphill task. Though both are intimately linked, it has so far proved difficult to obtain a direct correlation between the two. In our present approach we use a simple method based on multiple alignment and phylogeny to derive a correlation between the protein structure and sequence via motif search. The protein families which we have considered are SH2 like, Homeodomain, Leucine rich repeat, Alphabeta knot trefoil knot and ferritin like helix bundle. We have been able to successfully predict the protein families with an average prediction accuracy of 81%, the highest being 89% and the lowest being 73% on our test data set.
Affiliation(s)
- Syed Baquer Rizvi
- Department of Biotechnology, Indian Institute of Technology Guwahati, Assam, 781039, India
|
19
|
Sadowski MI, Jones DT. The sequence-structure relationship and protein function prediction. Curr Opin Struct Biol 2009; 19:357-62. [PMID: 19406632 DOI: 10.1016/j.sbi.2009.03.008] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 02/03/2009] [Accepted: 03/16/2009] [Indexed: 11/28/2022]
Abstract
An incomplete understanding of protein sequence/structure/function relationships causes many difficulties for prediction methods. The highly complex nature of these relationships is a consequence of the interplay between physics and evolution that has been studied using a wide array of experimental and theoretical techniques. We review recent findings relating to conservation of sequence, structure and function and discuss their use in developing improved prediction methods.
Affiliation(s)
- M I Sadowski
- Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA UK
|
20
|
Buchko GW, Robinson H, Addlagatta A. Structural characterization of the protein cce_0567 from Cyanothece 51142, a metalloprotein associated with nitrogen fixation in the DUF683 family. BIOCHIMICA ET BIOPHYSICA ACTA 2009; 1794:627-33. [PMID: 19336042 PMCID: PMC3707797 DOI: 10.1016/j.bbapap.2009.01.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/19/2008] [Revised: 01/06/2009] [Accepted: 01/07/2009] [Indexed: 11/26/2022]
Abstract
The genomes of many cyanobacteria contain the sequence for a small protein with a common "Domain of Unknown Function" grouped into the DUF683 protein family. While the biological function of DUF683 is still not known, their genomic location within nitrogen fixation clusters suggests that DUF683 proteins may play a role in the process. The diurnal cyanobacterium Cyanothece sp. PCC 51142 contains a gene for a protein that falls into the DUF683 family, cce_0567 (78 aa, 9.0 kDa). In an effort to elucidate the biochemical role DUF683 proteins may play in nitrogen fixation, we have determined the first crystal structure for a protein in this family, cce_0567, to 1.84 Å resolution. Cce_0567 crystallized in space group P2(1) with two protein molecules and one Ni(2+) cation per asymmetric unit. The protein is composed of two alpha-helices, residues P11 to G41 (alpha1) and L49-E74 (alpha2), with the second alpha-helix containing a short 3(10)-helix (Y46-N48). A four-residue linker (L42-D45) between the helices allows them to form an anti-parallel bundle and cross over each other towards their termini. In solution it is likely that two molecules of cce_0567 form a rod-like dimer by the stacking interactions of approximately 1/2 of the protein. Histidine-36 is highly conserved in all known DUF683 proteins and the N2 nitrogen of the H36 side chain of each molecule in the dimer is coordinated with Ni(2+) in the crystal structure. The divalent cation Ni(2+) was titrated into (15)N-labeled cce_0567 and chemical shift perturbations were observed only in the (1)H-(15)N HSQC spectra for residues at, or near, the site of Ni(2+) binding observed in the crystal structure. There was no evidence for an increase in the size of cce_0567 upon binding Ni(2+), even in large molar excess of Ni(2+), indicating that a metal was not required for dimer formation. Circular dichroism spectroscopy indicated that cce_0567 was extremely robust, with a melting temperature of approximately 62 °C that was reversible.
Affiliation(s)
- Garry W Buchko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA.
|
21
|
Han GW, Rife C, Sawaya MR. Applications of bioinformatics to protein structures: how protein structure and bioinformatics overlap. Methods Mol Biol 2009; 569:157-172. [PMID: 19623490 DOI: 10.1007/978-1-59745-524-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
In this chapter, we will focus on the role of bioinformatics to analyze a protein after its protein structure has been determined. First, we present how to validate protein structures for quality assurance. Then, we discuss how to analyze protein-protein interfaces and how to predict the biomolecule which is the biological oligomeric state of the protein. Finally, we discuss how to search for homologs based on the 3-D structure which is an essential step for understanding protein function.
Affiliation(s)
- Gye Won Han
- Burnham Institute for Medical Research, La Jolla, CA, USA
|
22
|
Bernardes JS, Fernandez JH, Vasconcelos ATR. Structural descriptor database: a new tool for sequence-based functional site prediction. BMC Bioinformatics 2008; 9:492. [PMID: 19032768 PMCID: PMC2612011 DOI: 10.1186/1471-2105-9-492] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/10/2008] [Accepted: 11/25/2008] [Indexed: 11/19/2022] Open
Abstract
Background The Structural Descriptor Database (SDDB) is a web-based tool that predicts the function of proteins and functional site positions based on the structural properties of related protein families. Structural alignments and functional residues of a known protein set (defined as the training set) are used to build special Hidden Markov Models (HMM) called HMM descriptors. SDDB uses previously calculated and stored HMM descriptors for predicting active sites, binding residues, and protein function. The database integrates biologically relevant data filtered from several databases such as PDB, PDBSUM, CSA and SCOP. It accepts queries in fasta format and predicts functional residue positions, protein-ligand interactions, and protein function, based on the SCOP database. Results To assess the SDDB performance, we used different data sets. The Trypsin-like Serine protease data set assessed how well SDDB predicts functional sites when curated data is available. The SCOP family data set was used to analyze SDDB performance by using training data extracted from PDBSUM (binding sites) and from CSA (active sites). The ATP-binding experiment was used to compare our approach with the most current method. For all evaluations, significant improvements were obtained with SDDB. Conclusion SDDB performed better when trusty training data was available. SDDB worked better in predicting active sites rather than binding sites because the former are more conserved than the latter. Nevertheless, by using our prediction method we obtained results with precision above 70%.
Affiliation(s)
- Juliana S Bernardes
- Laboratório Nacional de Computação Científica LNCC/MTC, Quitandinha, Petrópolis, RJ, Brazil.
|
23
|
Abstract
The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up from target selection, cloning, expression, purification, and ultimately structural determination. At the time of this writing, structural information of more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.
|
24
|
Buchko GW, Sofia HJ. Backbone 1H, 13C, and 15N NMR assignments for the Cyanothece 51142 protein cce_0567: a protein associated with nitrogen fixation in the DUF683 family. BIOMOLECULAR NMR ASSIGNMENTS 2008; 2:25-28. [PMID: 19636916 DOI: 10.1007/s12104-007-9075-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/30/2007] [Accepted: 12/11/2007] [Indexed: 05/28/2023]
Abstract
Cyanothece 51142 contains a 78-residue protein, cce_0567, that falls into the DUF683 family of proteins associated with nitrogen fixation. Here we report the assignment of most of the main chain and 13C(beta) side chain resonances of the approximately 40 kDa homo-tetramer.
Affiliation(s)
- Garry W Buchko
- Biological Sciences Division, Pacific Northwest National Laboratory, Mail-Stop K8-98, P.O. Box 999, Richland, WA 99352, USA.
|
25
|
Bagautdinov B, Matsuura Y, Bagautdinova S, Kunishima N, Yutani K. Structure of putative CutA1 from Homo sapiens determined at 2.05 Å resolution. Acta Crystallogr Sect F Struct Biol Cryst Commun 2008; 64:351-7. [PMID: 18453701 PMCID: PMC2376402 DOI: 10.1107/s1744309108009846] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 02/29/2008] [Accepted: 04/10/2008] [Indexed: 11/10/2022]
Abstract
The structure of human brain CutA1 (HsCutA1) has been determined using diffraction data to 2.05 Å resolution. HsCutA1 has been implicated in the anchoring of acetylcholinesterase in neuronal cell membranes, while its bacterial homologue Escherichia coli CutA1 is involved in copper tolerance. Additionally, the structure of HsCutA1 bears similarity to that of the signal transduction protein PII, which is involved in regulation of nitrogen metabolism. Although several crystal structures of CutA1 from various sources with different rotation angles and degrees of interaction between trimer interfaces have been reported, the specific functional role of CutA1 is still unclear. In this study, the X-ray structure of HsCutA1 was determined in space group P2(1)2(1)2(1), with unit-cell parameters a = 68.69, b = 88.84, c = 125.33 Å and six molecules per asymmetric unit. HsCutA1 is a trimeric molecule with intertwined antiparallel beta-strands; each subunit has a molecular weight of 14.6 kDa and contains 135 amino-acid residues. In order to obtain clues to the possible function of HsCutA1, its crystal structure was compared with those of other CutA1 and PII proteins.
Affiliation(s)
- Bagautdin Bagautdinov
- Protein Structure Analysis Team, RIKEN SPring-8 Center, Harima Institute, 1-1-1 Kouto, Sayo-cho, Sayo-gun, Hyogo 679-5148, Japan.
|
26
|
Shin DH, Proudfoot M, Lim HJ, Choi IK, Yokota H, Yakunin AF, Kim R, Kim SH. Structural and enzymatic characterization of DR1281: A calcineurin-like phosphoesterase from Deinococcus radiodurans. Proteins 2008; 70:1000-9. [PMID: 17847097 DOI: 10.1002/prot.21584] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
We have determined the crystal structure of DR1281 from Deinococcus radiodurans. DR1281 is a protein of unknown function with over 170 homologs found in prokaryotes and eukaryotes. To elucidate the molecular function of DR1281, its crystal structure at 2.3 Å resolution was determined and a series of biochemical screens for catalytic activity was performed. The crystal structure shows that DR1281 has two domains, a small alpha domain and a putative catalytic domain formed by a four-layered structure of two beta-sheets flanked by five alpha-helices on both sides. The small alpha domain interacts with other molecules in the asymmetric unit and contributes to the formation of oligomers. The structural comparison of the putative catalytic domain with known structures suggested its biochemical function to be a phosphatase, phosphodiesterase, nuclease, or nucleotidase. Structural analyses with its homologues also indicated that there is a dinuclear center at the interface of two domains formed by Asp8, Glu37, Asn38, Asn65, His148, His173, and His175. An absolute requirement of metal ions for activity has been proved by enzymatic assay with various divalent metal ions. A panel of general enzymatic assays of DR1281 revealed metal-dependent catalytic activity toward model substrates for phosphatases (p-nitrophenyl phosphate) and phosphodiesterases (bis-p-nitrophenyl phosphate). Subsequent secondary enzymatic screens with natural substrates demonstrated significant phosphatase activity toward phosphoenolpyruvate and phosphodiesterase activity toward 2',3'-cAMP. Thus, our structural and enzymatic studies have identified the biochemical function of DR1281 as a novel phosphatase/phosphodiesterase and disclosed key conserved residues involved in metal binding and catalytic activity.
Affiliation(s)
- Dong Hae Shin
- College of Pharmacy, Ewha Womans University, Seoul 120-750, Republic of Korea.
|
27
|
Wang Q, Canutescu AA, Dunbrack RL. SCWRL and MolIDE: computer programs for side-chain conformation prediction and homology modeling. Nat Protoc 2008; 3:1832-47. [PMID: 18989261 PMCID: PMC2682191 DOI: 10.1038/nprot.2008.184] [Citation(s) in RCA: 156] [Impact Index Per Article: 9.2] [Indexed: 12/25/2022]
Abstract
SCWRL and MolIDE are software applications for prediction of protein structures. SCWRL is designed specifically for the task of prediction of side-chain conformations given a fixed backbone usually obtained from an experimental structure determined by X-ray crystallography or NMR. SCWRL is a command-line program that typically runs in a few seconds. MolIDE provides a graphical interface for basic comparative (homology) modeling using SCWRL and other programs. MolIDE takes an input target sequence and uses PSI-BLAST to identify and align templates for comparative modeling of the target. The sequence alignment to any template can be manually modified within a graphical window of the target-template alignment and visualization of the alignment on the template structure. MolIDE builds the model of the target structure on the basis of the template backbone, predicted side-chain conformations with SCWRL and a loop-modeling program for insertion-deletion regions with user-selected sequence segments. SCWRL and MolIDE can be obtained at (http://dunbrack.fccc.edu/Software.php).
Affiliation(s)
- Qiang Wang
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
|
28
|
Kim SM, Bowers PM, Pal D, Strong M, Terwilliger TC, Kaufmann M, Eisenberg D. Functional linkages can reveal protein complexes for structure determination. Structure 2007; 15:1079-89. [PMID: 17850747 DOI: 10.1016/j.str.2007.06.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 11/18/2005] [Revised: 05/25/2007] [Accepted: 06/01/2007] [Indexed: 11/19/2022]
Abstract
In the study of protein complexes, is there a computational method for inferring which combinations of proteins in an organism are likely to form a crystallizable complex? Here we attempt to answer this question, using the Protein Data Bank (PDB) to assess the usefulness of inferred functional protein linkages from the Prolinks database. We find that of the 242 nonredundant prokaryotic protein complexes shared between the current PDB and Prolinks, 44% (107/242) contain proteins linked at high confidence by one or more methods of computed functional linkages. Similarly, high-confidence linkages detect 47% of known Escherichia coli protein complexes, with 45% accuracy. Together these findings suggest that functional linkages will be useful in defining protein complexes for structural studies, including for structural genomics. We offer a database of inferred linkages corresponding to likely protein complexes for some 629,952 pairs of proteins in 154 prokaryotes and archaea.
Affiliation(s)
- Sul-Min Kim
- Department of Chemistry and Biochemistry, University of California-Los Angeles, Los Angeles, CA 90095, USA
|
29
|
Shin DH, Hou J, Chandonia JM, Das D, Choi IG, Kim R, Kim SH. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2007; 8:99-105. [PMID: 17764033 DOI: 10.1007/s10969-007-9025-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 05/16/2007] [Accepted: 07/27/2007] [Indexed: 11/26/2022]
Abstract
Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.
Affiliation(s)
- Dong Hae Shin
- College of Pharmacy, Ewha Womans University, Seoul, Korea
|
30
|
Kawabata T, Go N. Detection of pockets on protein surfaces using small and large probe spheres to find putative ligand binding sites. Proteins 2007; 68:516-29. [PMID: 17444522 DOI: 10.1002/prot.21283] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Indexed: 11/07/2022]
Abstract
One of the simplest ways to predict ligand binding sites is to identify pocket-shaped regions on the protein surface. Many programs have already been proposed to identify these pocket regions. Examination of their algorithms revealed that a pocket intrinsically has two arbitrary properties, "size" and "depth". We proposed a new definition for pockets using two explicit adjustable parameters that correspond to these two arbitrary properties. A pocket region is defined as a space into which a small probe can enter, but a large probe cannot. The radii of small and large probe spheres are the two parameters that correspond to the "size" and "depth" of the pockets, respectively. These values can be adjusted for each individual putative ligand molecule. To determine the optimal value of the large probe sphere radius, we generated pockets for thousands of protein structures in the database, using several sizes of large probe spheres, and examined the correspondence of these pockets with known binding site positions. A new measure of shallowness, a minimum inaccessible radius, R(inaccess), indicated that binding sites of coenzymes are very deep, while those for adenine/guanine mononucleotide have only medium shallowness and those for short peptides and oligosaccharides are shallow. The optimal radius of large probe spheres was 3-4 Å for the coenzymes, 4 Å for adenine/guanine mononucleotides, and 5 Å or more for peptides/oligosaccharides. Comparison of our program with two other popular pocket-finding programs showed that our program had a higher performance of detecting binding pockets, although it required more computational time.
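The two-probe definition in this abstract can be illustrated with a small sketch. This is not the authors' program: it is a two-dimensional toy analogue on a coarse grid, in which discs stand in for spheres, and the function names, the U-shaped arrangement of atoms, and the radii are all assumptions chosen for illustration. A grid point counts as a pocket point when some clash-free small probe covers it but no clash-free large probe does.

```python
import math

def clash_free(center, radius, atoms):
    """True if a probe disc of `radius` at `center` overlaps no atom (x, y, r)."""
    cx, cy = center
    return all(math.hypot(cx - ax, cy - ay) >= radius + ar for ax, ay, ar in atoms)

def swept_region(grid, radius, atoms):
    """Grid points covered by at least one clash-free probe centred on a grid point."""
    centers = [c for c in grid if clash_free(c, radius, atoms)]
    return {
        p for p in grid
        if any(math.hypot(p[0] - c[0], p[1] - c[1]) <= radius for c in centers)
    }

def pocket_points(atoms, grid, r_small, r_large):
    """Points reachable by the small probe but not by the large probe."""
    return swept_region(grid, r_small, atoms) - swept_region(grid, r_large, atoms)

# Hypothetical U-shaped "protein": two walls and a floor of radius-1 atoms,
# open at the top. Coordinates are in arbitrary units.
ys = [0.0, 1.5, 3.0, 4.5, 6.0]
atoms = [(0.0, y, 1.0) for y in ys] + [(6.0, y, 1.0) for y in ys]
atoms += [(x, 0.0, 1.0) for x in (1.5, 3.0, 4.5)]

# The grid needs a margin of at least the large radius around the atoms so that
# large probes approaching from outside are represented.
grid = [(float(x), float(y)) for x in range(-5, 12) for y in range(-5, 15)]

pockets = pocket_points(atoms, grid, r_small=1.0, r_large=3.0)
print((3.0, 3.0) in pockets, (3.0, 12.0) in pockets)
```

In this toy setup the centre of the cavity is reachable only by the small probe, while open space above it is reachable by both. Increasing `r_large` makes shallower depressions count as pockets, which mirrors the abstract's use of the large-probe radius as the "depth" parameter.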
Affiliation(s)
- Takeshi Kawabata
- Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan.
|
31
|
Rigden DJ. Understanding the cell in terms of structure and function: insights from structural genomics. Curr Opin Biotechnol 2006; 17:457-64. [PMID: 16890423 DOI: 10.1016/j.copbio.2006.07.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/02/2006] [Revised: 06/21/2006] [Accepted: 07/25/2006] [Indexed: 10/24/2022]
Abstract
Structural genomics programs are only now moving into the large-scale production phase, yet have already produced around 2000 protein structures. Through a widespread if not exclusive emphasis on structural novelty, our knowledge of the protein fold universe is improving rapidly. With this information comes the challenge of structure-based function annotation for the many target proteins about which little or nothing is known. Recent years have therefore seen the emergence of impressively diverse bioinformatics approaches to predict the function of a protein structure. Attention is now turning to means of combining these predictions with information from various other sources.
Affiliation(s)
- Daniel J Rigden
- School of Biological Sciences, University of Liverpool, Biosciences Building, Crown Street, Liverpool L69 7ZB, UK.
|
32
|
Kim SH, Shin DH, Liu J, Oganesyan V, Chen S, Xu QS, Kim JS, Das D, Schulze-Gahmen U, Holbrook SR, Holbrook EL, Martinez BA, Oganesyan N, DeGiovanni A, Lou Y, Henriquez M, Huang C, Jancarik J, Pufan R, Choi IG, Chandonia JM, Hou J, Gold B, Yokota H, Brenner SE, Adams PD, Kim R. Structural genomics of minimal organisms and protein fold space. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2006; 6:63-70. [PMID: 16211501 DOI: 10.1007/s10969-005-2651-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 06/24/2004] [Accepted: 02/15/2005] [Indexed: 11/29/2022]
Abstract
The initial aim of the Berkeley Structural Genomics Center is to obtain a near-complete structural complement of two minimal organisms, closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter fewer than 700 genes. To achieve this goal, the current protein targets have been selected starting with those predicted to be most tractable and likely to yield new structural and functional information. During the past 3 years, the semi-automated structural genomics pipeline has been set up from cloning, expression, purification, and ultimately to structural determination. The results from the pipeline substantially increased the coverage of the protein fold space of M. pneumoniae and M. genitalium. Furthermore, about 1/2 of the structures of 'unique' protein sequences revealed new and novel folds, and over 2/3 of the structures of previously annotated 'hypothetical proteins' inferred their molecular functions.
Affiliation(s)
- Sung-Hou Kim
- Department of Chemistry, University of California, Berkeley, 94720-5230, USA.
|
33
|
Yura K, Yamaguchi A, Go M. Coverage of whole proteome by structural genomics observed through protein homology modeling database. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2006; 7:65-76. [PMID: 17146617 PMCID: PMC1769342 DOI: 10.1007/s10969-006-9010-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 05/11/2006] [Accepted: 08/08/2006] [Indexed: 11/07/2022]
Abstract
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE ( http://daisy.nagahama-i-bio.ac.jp/Famsbase/ ), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structure of approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers the whole of an ORF product are rare. When the portion of an ORF with 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence; however, the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria is increased by 5%, and that for eubacteria by 7% in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structure of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.
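The genome and ORF counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative.

```python
# Genome counts by category, as listed in the abstract.
genomes = {"archaebacterial": 17, "eubacterial": 130, "eukaryotic": 18, "phage": 111}
total_genomes = sum(genomes.values())

modeled_orfs = 368_724    # ORFs with a modeled 3D structure
predicted_orfs = 734_193  # ORFs predicted across all genomes
coverage = modeled_orfs / predicted_orfs

print(f"{total_genomes} genomes, {coverage:.1%} of ORFs with a modeled structure")
```

The category counts sum to the stated 276 genomes, and the ratio of modeled to predicted ORFs is consistent with the abstract's "approximately 50%".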
Affiliation(s)
- Kei Yura
- Quantum Bioinformatics Team, Center for Computational Science and Engineering, Japan Atomic Energy Agency, Kyoto 619-0215, Japan.
|
34
|
Shin DH, Kim JS, Yokota H, Kim R, Kim SH. Crystal structure of the DUF16 domain of MPN010 from Mycoplasma pneumoniae. Protein Sci 2006; 15:921-8. [PMID: 16522803 PMCID: PMC2242467 DOI: 10.1110/ps.051993506] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
Abstract
We have determined the crystal structure of the DUF16 domain of unknown function encoded by the gene MPN010 of Mycoplasma pneumoniae at 1.8 Å resolution. The crystal structure revealed that this domain is composed of two separated homotrimeric coiled-coils. The shorter one consists of 11 highly conserved residues. The sequence comprises noncanonical heptad repeats that induce a right-handed coiled-coil structure. The longer one is composed of approximately nine heptad repeats. In this coiled-coil structure, there are three distinguishable regions that confer unique structural properties compared with other known homotrimeric coiled-coils. The first part, containing one stutter, is an unusual phenylalanine-rich region that is not found in any other coiled-coil structures. The second part is a highly conserved glutamine-rich region, frequently found in other trimeric coiled-coil structures. The last part is composed of prototype heptad repeats. The phylogenetic analysis of the DUF16 family together with a secondary structure prediction shows that the DUF16 family can be classified into five subclasses according to N-terminal sequences. Based on the structural comparison with other coiled-coil structures, a probable molecular function of the DUF16 family is discussed.
Affiliation(s)
- Dong Hae Shin
- College of Pharmacy, Ewha Womans University, Seoul 120-750, Korea
|
35
|
Powell BC, Hutchison CA. Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs. BMC Bioinformatics 2006; 7:31. [PMID: 16423288 PMCID: PMC1386717 DOI: 10.1186/1471-2105-7-31] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 08/03/2005] [Accepted: 01/19/2006] [Indexed: 11/23/2022] Open
Abstract
Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
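The first step described in this abstract, collecting every ORF of at least 30 codons, can be sketched as a six-frame scan. The abstract does not say how an ORF is delimited, so the sketch assumes an ORF runs from an ATG start codon to the next in-frame stop codon; the function names and the coordinate convention are illustrative and not taken from the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) for ATG..stop ORFs in all six frames.

    An ORF is kept when it has at least `min_codons` codons, not counting the
    stop codon. `start` and `end` are 0-based, end-exclusive offsets on the
    strand that was scanned (for "-", offsets refer to the reverse complement).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # first start codon opens the ORF
                elif codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # resume the search after the stop codon
    return orfs

# Hypothetical sequence: ATG plus 29 alanine codons (30 codons), then a stop.
example = "ATG" + "GCT" * 29 + "TAA"
for strand, start, end, orf in find_orfs(example):
    print(strand, start, end, len(orf) // 3 - 1, "codons")
```

A stop-to-stop definition of an ORF would need only a change to the start condition. The peptides translated from the collected ORFs would then be the input to the COG clustering step, which is not sketched here.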
Affiliation(s)
- Bradford C Powell
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Clyde A Hutchison
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- J. Craig Venter Institute, Rockville, Maryland, USA
36
Chandonia JM, Kim SH, Brenner SE. Target selection and deselection at the Berkeley Structural Genomics Center. Proteins 2005; 62:356-70. [PMID: 16276528 DOI: 10.1002/prot.20674] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high-throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
37
Noel JP, Austin MB, Bomati EK. Structure-function relationships in plant phenylpropanoid biosynthesis. Current Opinion in Plant Biology 2005; 8:249-53. [PMID: 15860421 PMCID: PMC2861907 DOI: 10.1016/j.pbi.2005.03.013] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Indexed: 05/20/2023]
Abstract
Plants, as sessile organisms, evolve and exploit metabolic systems to create a rich repertoire of complex natural products that hold adaptive significance for their survival in challenging ecological niches on earth. As an experimental tool set, structural biology provides a high-resolution means to uncover detailed information about the structure-function relationships of metabolic enzymes at the atomic level. Together with genomic and biochemical approaches and an appreciation of molecular evolution, structural enzymology holds great promise for addressing a number of questions relating to secondary or, more appropriately, specialized metabolism. Why is secondary metabolism so adaptable? How are reactivity, regio-chemistry and stereo-chemistry steered during the multi-step conversion of substrates into products? What are the vestigial structural and mechanistic traits that remain in biosynthetic enzymes during the diversification of substrate and product selectivity? What does the catalytic landscape look like as an enzyme family traverses all possible lineages en route to the acquisition of new substrate and/or product specificities? And how can one rationally engineer biosynthesis using the unique perspectives of evolution and structural biology to create novel chemicals for human use?
Affiliation(s)
- Joseph P Noel
- Jack Skirball Chemical Biology and Proteomics Laboratory, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, USA.
38
Crystal structure of the hypothetical protein TA1238 from Thermoplasma acidophilum: a new type of helical super-bundle. ACTA ACUST UNITED AC 2005. [DOI: 10.1007/s10969-004-3789-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
39
Stark A, Shkumatov A, Russell RB. Finding functional sites in structural genomics proteins. Structure 2005; 12:1405-12. [PMID: 15296734 DOI: 10.1016/j.str.2004.05.012] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 02/05/2004] [Revised: 05/13/2004] [Accepted: 05/14/2004] [Indexed: 10/26/2022]
Abstract
Assigning function to structures is an important aspect of structural genomics projects, since they frequently provide structures for uncharacterized proteins. Similarities uncovered by structure alignment can suggest a similar function, even in the absence of sequence similarity. For proteins adopting novel folds or those with many functions, this strategy can fail, but functional clues can still come from comparison of local functional sites involving a few key residues. Here we assess the general applicability of functional site comparison through the study of 157 proteins solved by structural genomics initiatives. For 17, the method bolsters confidence in predictions made based on overall fold similarity. For another 12 with new folds, it suggests functions, including a putative phosphotyrosine binding site in the Archaeal protein Mth1187 and an active site for a ribose isomerase. The approach is applied weekly to all new structures, providing a resource for those interested in using structure to infer function.
40
Galperin MY, Koonin EV. 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res 2004; 32:5452-63. [PMID: 15479782 PMCID: PMC524295 DOI: 10.1093/nar/gkh885] [Citation(s) in RCA: 309] [Impact Index Per Article: 14.7] [Indexed: 11/14/2022] Open
Abstract
Comparative genomics shows that a substantial fraction of the genes in sequenced genomes encodes 'conserved hypothetical' proteins, i.e. those that are found in organisms from several phylogenetic lineages but have not been functionally characterized. Here, we briefly discuss recent progress in functional characterization of prokaryotic 'conserved hypothetical' proteins and the possible criteria for prioritizing targets for experimental study. Based on these criteria, the chief one being wide phyletic spread, we offer two 'top 10' lists of highly attractive targets. The first list consists of proteins for which biochemical activity could be predicted with reasonable confidence but the biological function was predicted only in general terms, if at all ('known unknowns'). The second list includes proteins for which there is no prediction of biochemical activity, even if, for some, general biological clues exist ('unknown unknowns'). The experimental characterization of these and other 'conserved hypothetical' proteins is expected to reveal new, crucial aspects of microbial biology and could also lead to better functional prediction for medically relevant human homologs.
Affiliation(s)
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
41
Sanishvili R, Pennycooke M, Gu J, Xu X, Joachimiak A, Edwards AM, Christendat D. Crystal structure of the hypothetical protein TA1238 from Thermoplasma acidophilum: a new type of helical super-bundle. Journal of Structural and Functional Genomics 2004; 5:231-40. [PMID: 15704011 PMCID: PMC2792032 DOI: 10.1007/s10969-005-3789-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Key Words] [Received: 05/14/2004] [Accepted: 08/18/2004] [Indexed: 10/25/2022]
Abstract
The crystal structure of the hypothetical protein TA1238 from Thermoplasma acidophilum was solved with multiple-wavelength anomalous diffraction and refined at 2.0 Å resolution. The molecule consists of a typical four-helix antiparallel bundle with overhand connection. However, its oligomerization into a trimer leads to a coiled "super-helix" which is novel for such bundles. Its central feature, a six-stranded coiled coil, is also novel for proteins. TA1238 does not have strong sequence homologues in databases, but shows strong structural similarity with some proteins in the Protein Data Bank. The function could not be inferred from the sequence but the structure, with some rearrangement, bears some resemblance to the active site region of cobalamin adenosyltransferase (TA1434). Specifically, TA1238 retains Arg104, which is structurally equivalent to functionally critical Arg119 of TA1434. For such a conformational change, the overhand connection of TA1238 might need to be involved in a gating mechanism that might be modulated by ligands and/or by interactions with the physiological partners. This allowed us to hypothesize that TA1238 could be involved in cobalamin biosynthesis.
Affiliation(s)
- Ruslan Sanishvili
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA; (now at GM/CA Collaborative Access Team, Argonne National Laboratory)
- Micha Pennycooke
- Clinical Genomics Center, University Health Network, 101 College Street, Toronto, Ontario, Canada M5G 1L7
- Jun Gu
- Clinical Genomics Center, University Health Network, 101 College Street, Toronto, Ontario, Canada M5G 1L7
- Xiaohui Xu
- Clinical Genomics Center, University Health Network, 101 College Street, Toronto, Ontario, Canada M5G 1L7
- Andrzej Joachimiak
- Structural Biology Center, Midwest Center for Structural Genomics, Biosciences, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA; (now at GM/CA Collaborative Access Team, Argonne National Laboratory)
- Aled M. Edwards
- Banting and Best Department of Medical Research, University of Toronto, 112 College Street, Toronto, Ontario, Canada M5G 1L6
- Structural Genomics Consortium, University of Toronto, 112 College Street, Toronto, Ontario, Canada M5G 1L6
- Department of Medical Biophysics, University of Toronto, 610 University Avenue, Toronto, Ontario, Canada M5G 2M9
- Dinesh Christendat
- Department of Botany, University of Toronto, 25 Wilcocks Street, Toronto, Ontario, Canada M5S 3B2; (now at GM/CA Collaborative Access Team, Biosciences, Argonne National Laboratory)
42
Abstract
The problem of assigning a biochemical function to newly discovered proteins has been traditionally approached by expert enzymological analysis, sequence analysis, and structural modeling. In recent years, the appearance of databases containing protein-ligand interaction data for large numbers of protein classes and chemical compounds has provided new ways of investigating proteins for which the biochemical function is not completely understood. In this work, we introduce a method that utilizes ligand-binding data for functional classification of enzymes. The method makes use of the existing Enzyme Commission (EC) classification scheme and the data on interactions of small molecules with enzymes from the BRENDA database. A set of ligands that binds to an enzyme with unknown biochemical function serves as a query to search a protein-ligand interaction database for enzyme classes that are known to interact with a similar set of ligands. These classes provide hypotheses of the query enzyme's function and complement other computational annotations that take advantage of sequence and structural information. Similarity between sets of ligands is computed using point set similarity measures based upon similarity between individual compounds. We present the statistics of classification of the enzymes in the database by a cross-validation procedure and illustrate the application of the method on several examples.
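The ligand-set comparison described in this abstract can be sketched as follows. The abstract does not say which compound similarity or which point set measure the authors used, so everything concrete here is an assumption: compounds are represented as sets of fingerprint bits, compound similarity is the Tanimoto coefficient, set similarity is a symmetric mean of best matches, and the class labels and fingerprints are toy data. The names `tanimoto`, `set_similarity`, and `rank_classes` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def set_similarity(query, reference):
    """Symmetric mean of best-match similarities; both ligand sets must be non-empty."""
    def directed(xs, ys):
        # Average, over xs, of the best similarity each compound finds in ys.
        return sum(max(tanimoto(x, y) for y in ys) for x in xs) / len(xs)

    return 0.5 * (directed(query, reference) + directed(reference, query))


def rank_classes(query, class_ligands):
    """Rank enzyme classes by similarity of their ligand sets to the query set."""
    scores = {name: set_similarity(query, ligands)
              for name, ligands in class_ligands.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy fingerprints (sets of bit indices); not real compounds or EC classes.
query_ligands = [frozenset({1, 2, 3}), frozenset({4, 5})]
toy_classes = {
    "class_A": [frozenset({1, 2, 3}), frozenset({4, 5, 6})],
    "class_B": [frozenset({7, 8})],
}
ranked = rank_classes(query_ligands, toy_classes)
```

In this toy case `class_A` ranks first because each query ligand has a close match in its ligand set. The top-ranked classes would serve only as hypotheses about the query enzyme's function, in line with the abstract.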
Affiliation(s)
- Sergei Izrailev
- Johnson & Johnson Pharmaceutical Research and Development, Cranbury, New Jersey 08512, USA.