1
Standley DM, Nakanishi T, Xu Z, Haruna S, Li S, Nazlica SA, Katoh K. The evolution of structural genomics. Biophys Rev 2022; 14:1247-1253. [PMID: 36536641 PMCID: PMC9753067 DOI: 10.1007/s12551-022-01031-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Structural genomics began as a global effort in the 1990s to determine the tertiary structures of all protein families as a response to large-scale genome sequencing projects. The immediate outcome was an influx of tens of thousands of protein structures, many of which had unknown functions. At the time, the value of structural genomics was controversial. However, the structures themselves were only the most obvious output. In addition, these newly solved structures motivated the emergence of huge data science and infrastructure efforts, which, together with advances in Deep Learning, have brought about a revolution in computational molecular biology. Here, we review some of the computational research carried out at the Protein Data Bank Japan (PDBj) during the Protein 3000 project under the leadership of Haruki Nakamura, much of which continues to flourish today.
Affiliation(s)
- Daron M. Standley
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
- Tokuichiro Nakanishi
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
- Zichang Xu
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
- Soichiro Haruna
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
- Songling Li
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
- Sedat Aybars Nazlica
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
- Kazutaka Katoh
- Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, 565-0871 Japan
2
Dumpati R, Dulapalli R, Kondagari B, Ramatenki V, Vellanki S, Vadija R, Vuruputuri U. Suppressor of Cytokine Signalling-3 as a Drug Target for Type 2 Diabetes Mellitus: A Structure-Guided Approach. ChemistrySelect 2016. [DOI: 10.1002/slct.201600640] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Affiliation(s)
- Ramakrishna Dumpati
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
- Ramasree Dulapalli
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
- Bhargavi Kondagari
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
- Vishwanath Ramatenki
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
- Santhiprada Vellanki
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
- Rajender Vadija
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
- Uma Vuruputuri
- Department of Chemistry; University College of Science; Osmania University; Tarnaka, Hyderabad, Telangana INDIA - 500007
3
4
Kloppmann E, Punta M, Rost B. Structural genomics plucks high-hanging membrane proteins. Curr Opin Struct Biol 2012; 22:326-32. [PMID: 22622032 DOI: 10.1016/j.sbi.2012.05.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 02/11/2012] [Revised: 03/28/2012] [Accepted: 05/01/2012] [Indexed: 01/21/2023]
Abstract
Recent years have seen the establishment of structural genomics centers that explicitly target integral membrane proteins. Here, we review the advances in targeting these extremely high-hanging fruits of structural biology in high-throughput mode. We observe that the experimental determination of high-resolution structures of integral membrane proteins is increasingly successful both in terms of getting structures and of covering important protein families, for example, from Pfam. Structural genomics has begun to contribute significantly toward this progress. An important component of this contribution is the setup of robotic pipelines that generate a wealth of experimental data for membrane proteins. We argue that prediction methods for the identification of membrane regions and for the comparison of membrane proteins largely suffice to meet the challenges of target selection for structural genomics of membrane proteins. In contrast, we need better methods to prioritize the most promising members in a family of closely related proteins and to annotate protein function from sequence and structure in the absence of homology.
Affiliation(s)
- Edda Kloppmann
- Department of Bioinformatics and Computational Biology, Technical University Munich, Germany.
5
Punta M, Love J, Handelman S, Hunt JF, Shapiro L, Hendrickson WA, Rost B. Structural genomics target selection for the New York consortium on membrane protein structure. J Struct Funct Genomics 2009; 10:255-68. [PMID: 19859826 PMCID: PMC2780672 DOI: 10.1007/s10969-009-9071-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 04/22/2009] [Accepted: 09/30/2009] [Indexed: 01/02/2023]
Abstract
The New York Consortium on Membrane Protein Structure (NYCOMPS), a part of the Protein Structure Initiative (PSI) in the USA, has as its mission to establish a high-throughput pipeline for determination of novel integral membrane protein structures. Here we describe our current target selection protocol, which applies structural genomics approaches informed by the collective experience of our team of investigators. We first extract all annotated proteins from our reagent genomes, i.e. the 96 fully sequenced prokaryotic genomes from which we clone DNA. We filter this initial pool of sequences and obtain a list of valid targets. NYCOMPS defines valid targets as those that, among other features, have at least two predicted transmembrane helices, no predicted long disordered regions and, except for community nominated targets, no significant sequence similarity in the predicted transmembrane region to any known protein structure. Proteins that feed our experimental pipeline are selected by defining a protein seed and searching the set of all valid targets for proteins that are likely to have a transmembrane region structurally similar to that of the seed. We require sequence similarity aligning at least half of the predicted transmembrane region of seed and target. Seeds are selected according to their feasibility and/or biological interest, and they include both centrally selected targets and community nominated targets. As of December 2008, over 6,000 targets have been selected and are currently being processed by the experimental pipeline. We discuss how our target list may impact structural coverage of the membrane protein space.
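The validity filter described in this abstract can be pictured as a simple predicate over per-protein predictions. The sketch below is illustrative only: the `Candidate` fields and the `is_valid_target` function are hypothetical names, and because the abstract gives no numeric cutoffs for "long" disorder or "significant" similarity, those judgments are represented here as precomputed booleans.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-protein summary; all field names are assumptions."""
    name: str
    predicted_tm_helices: int            # number of predicted transmembrane helices
    has_long_disorder: bool              # any predicted long disordered region
    tm_similar_to_known_structure: bool  # significant similarity in the predicted TM region
    community_nominated: bool = False


def is_valid_target(c: Candidate) -> bool:
    """Apply the three criteria named in the abstract (its 'other features' are omitted)."""
    if c.predicted_tm_helices < 2:
        return False
    if c.has_long_disorder:
        return False
    # Community-nominated targets are exempt from the novelty requirement.
    if c.tm_similar_to_known_structure and not c.community_nominated:
        return False
    return True


# Made-up candidates, not taken from the paper.
candidates = [
    Candidate("a", 6, False, False),
    Candidate("b", 1, False, False),
    Candidate("c", 4, True, False),
    Candidate("d", 7, False, True),
    Candidate("e", 7, False, True, community_nominated=True),
]
valid = [c.name for c in candidates if is_valid_target(c)]
```

The seed-based selection step (requiring similarity over at least half of the predicted transmembrane region) would operate on the output of such a filter and is not modelled here.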
Affiliation(s)
- Marco Punta
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY, 10032, USA.
6
Schlessinger A, Liu J, Rost B. Natively unstructured loops differ from other loops. PLoS Comput Biol 2007; 3:e140. [PMID: 17658943 PMCID: PMC1924875 DOI: 10.1371/journal.pcbi.0030140] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Received: 09/27/2006] [Accepted: 06/05/2007] [Indexed: 11/24/2022]
Abstract
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.
The details of protein structures are important for function. Regions that do not adopt any regular structure in isolation (natively unstructured or disordered regions) initially appeared as a curious exception to this structure–function paradigm. It has become increasingly clear that unstructured regions are fundamental to many roles and that they are particularly important for multicellular organisms. Structural biology is just beginning to apprehend the stunning diversity of these roles. Here, we focused on unstructured regions dominated by a particular type of loop, namely the natively unstructured one. We developed a method that succeeded in the distinction between well-structured and natively unstructured loops. For the development, we did not use any experimental data for unstructured regions; when tested on experimental data, the method performed surprisingly well. Due to its different premises, the method captured very different aspects of unstructured regions than other methods that we tested. We applied the new method to two different problems. The first was the identification of proteins that may be difficult targets for structure determination. The second was the identification of worm proteins that have many interaction partners (more than seven) and unstructured regions. Surprisingly, we found unstructured regions of the loopy type in more than 50% of all the promiscuous worm proteins.
Affiliation(s)
- Avner Schlessinger
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
7
Stöckigt J, Barleben L, Panjikar S, Loris EA. 3D-Structure and function of strictosidine synthase--the key enzyme of monoterpenoid indole alkaloid biosynthesis. Plant Physiol Biochem 2008; 46:340-55. [PMID: 18280746 DOI: 10.1016/j.plaphy.2007.12.011] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 09/21/2007] [Indexed: 05/03/2023]
Abstract
Strictosidine synthase (STR; EC 4.3.3.2) plays a key role in the biosynthesis of monoterpenoid indole alkaloids by catalyzing the Pictet-Spengler reaction between tryptamine and secologanin, leading exclusively to 3alpha-(S)-strictosidine. The structure of the native enzyme from the Indian medicinal plant Rauvolfia serpentina represents the first example of a six-bladed four-stranded beta-propeller fold from the plant kingdom. Moreover, the architecture of the enzyme-substrate and enzyme-product complexes reveals deep insight into the active centre and mechanism of the synthase highlighting the importance of Glu309 as the catalytic residue. The present review describes the 3D-structure and function of R. serpentina strictosidine synthase and provides a summary of the strictosidine synthase substrate specificity studies carried out in different organisms to date. Based on the enzyme-product complex, this paper goes on to describe a rational, structure-based redesign of the enzyme, which offers the opportunity to produce novel strictosidine derivatives which can be used to generate alkaloid libraries of the N-analogues heteroyohimbine type. Finally, alignment studies of functionally expressed strictosidine synthases are presented and the evolutionary aspects of sequence- and structure-related beta-propeller folds are discussed.
Affiliation(s)
- Joachim Stöckigt
- College of Pharmaceutical Sciences, Zijingang Campus, Zhejiang University, 310058 Hangzhou, China.
8
Lopez G, Valencia A, Tress M. FireDB--a database of functionally important residues from proteins of known structure. Nucleic Acids Res 2006; 35:D219-23. [PMID: 17132832 PMCID: PMC1716728 DOI: 10.1093/nar/gkl897] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/26/2022]
Abstract
The FireDB database is a databank for functional information relating to proteins with known structures. It contains the most comprehensive and detailed repository of known functionally important residues, bringing together both ligand binding and catalytic residues in one site. The platform integrates biologically relevant data filtered from the close atomic contacts in Protein Data Bank crystal structures and reliably annotated catalytic residues from the Catalytic Site Atlas. The interface allows users to make queries by protein, ligand or keyword. Relevant biologically important residues are displayed in a simple and easy to read manner that allows users to assess binding site similarity across homologous proteins. Binding site residue variations can also be viewed with molecular visualization tools. The database is available at
Affiliation(s)
- Gonzalo Lopez
- Computational and Structural Biology Program, Spanish National Cancer Research Centre (CNIO) Melchor Fernández Almagro, 3, E-28029, Madrid, Spain.
9
Abstract
MOTIVATION: Despite the continuing advance in the experimental determination of protein structures, the gap between the number of known protein sequences and structures continues to increase. Prediction methods can bridge this sequence-structure gap only partially. Better predictions of non-local contacts between residues could improve comparative modeling, fold recognition and could assist in the experimental structure determination.
RESULTS: Here, we introduced PROFcon, a novel contact prediction method that combines information from alignments, from predictions of secondary structure and solvent accessibility, from the region between two residues and from the average properties of the entire protein. In contrast to some other methods, PROFcon predicted short and long proteins at similar levels of accuracy. As expected, PROFcon was clearly less accurate when tested on sparse evolutionary profiles, that is, on families with few homologs. Prediction accuracy was highest for proteins belonging to the SCOP alpha/beta class. PROFcon compared favorably with state-of-the-art prediction methods at the CASP6 meeting. While the performance may still be perceived as low, our method clearly pushed the mark higher. Furthermore, predictions are already accurate enough to seed predictions of global features of protein structure.
Affiliation(s)
- Marco Punta
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University 650 West 168th Street BB217, New York, NY 10032, USA.
10
Joseph-McCarthy D. Chapter 12 Structure-Based Lead Optimization. 2005. [DOI: 10.1016/s1574-1400(05)01012-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
11
Liu J, Hegyi H, Acton TB, Montelione GT, Rost B. Automatic target selection for structural genomics on eukaryotes. Proteins 2004; 56:188-200. [PMID: 15211504 DOI: 10.1002/prot.20012] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.9] [Indexed: 11/06/2022]
Abstract
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from currently five entirely sequenced eukaryotic target organisms based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common foldlike region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared suitable targets that were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
12
Abstract
Guessing the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Predictions were based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. Here, we introduced CHOPnet, a de novo method that predicts structural domains in the absence of homology to known domains. Our method was based on neural networks and relied exclusively on information available for all proteins. Evaluating sustained performance through rigorous cross-validation on proteins of known structure, we correctly predicted the number of domains in 69% of all proteins. For 50% of the two-domain proteins the centre of the predicted boundary was closer than 20 residues to the boundary assigned from three-dimensional (3D) structures; this was about eight percentage points better than predictions by 'equal split'. Our results appeared to compare favourably with those from previously published methods. CHOPnet may be useful to restrict the experimental testing of different fragments for structure determination in the context of structural genomics.
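The evaluation criterion quoted for two-domain proteins (a predicted boundary closer than 20 residues to the structure-derived boundary, compared against an "equal split" baseline) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, "equal split" is taken to mean cutting at the sequence midpoint, and the example data are made up rather than taken from the paper.

```python
def equal_split_boundary(length: int) -> int:
    """Assumed baseline: place the single domain boundary at the sequence midpoint."""
    return length // 2


def boundary_hit(predicted: int, observed: int, tolerance: int = 20) -> bool:
    """True if the predicted boundary centre is closer than `tolerance` residues."""
    return abs(predicted - observed) < tolerance


def hit_rate(proteins, predictor) -> float:
    """Fraction of two-domain proteins whose boundary is predicted within tolerance.

    `proteins` is a list of (sequence_length, observed_boundary) pairs and
    `predictor` maps a sequence length to a predicted boundary position.
    """
    hits = sum(boundary_hit(predictor(n), obs) for n, obs in proteins)
    return hits / len(proteins)


# Made-up (length, observed boundary) pairs for illustration only.
toy = [(200, 95), (300, 100), (180, 120), (250, 130)]
baseline = hit_rate(toy, equal_split_boundary)
```

A real predictor would take the sequence (or features derived from it) rather than only its length; the length-only signature is used here solely so that the baseline fits the same interface.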
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
13
Przybylski D, Rost B. Improving Fold Recognition Without Folds. J Mol Biol 2004; 341:255-69. [PMID: 15312777 DOI: 10.1016/j.jmb.2004.05.041] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Received: 02/03/2004] [Revised: 05/18/2004] [Accepted: 05/18/2004] [Indexed: 11/21/2022]
Abstract
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly out-performing sequence-only methods even without explicitly using structural information from any of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.
Affiliation(s)
- Dariusz Przybylski
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
14
Abstract
We developed a method, CHOP, that dissects proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, most domains spanned on average approximately 100 residues. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid, although previously not described, for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constituted an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA
15
John B, Sali A. Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res 2003; 31:3982-92. [PMID: 12853614 PMCID: PMC165975 DOI: 10.1093/nar/gkg460] [Citation(s) in RCA: 264] [Impact Index Per Article: 12.0] [Indexed: 11/12/2022]
Abstract
Comparative or homology protein structure modeling is severely limited by errors in the alignment of a modeled sequence with related proteins of known three-dimensional structure. To ameliorate this problem, we have developed an automated method that optimizes both the alignment and the model implied by it. This task is achieved by a genetic algorithm protocol that starts with a set of initial alignments and then iterates through re-alignment, model building and model assessment to optimize a model assessment score. During this iterative process: (i) new alignments are constructed by application of a number of operators, such as alignment mutations and cross-overs; (ii) comparative models corresponding to these alignments are built by satisfaction of spatial restraints, as implemented in our program MODELLER; (iii) the models are assessed by a variety of criteria, partly depending on an atomic statistical potential. When testing the procedure on a very difficult set of 19 modeling targets sharing only 4-27% sequence identity with their template structures, the average final alignment accuracy increased from 37 to 45% relative to the initial alignment (the alignment accuracy was measured as the percentage of positions in the tested alignment that were identical to the reference structure-based alignment). Correspondingly, the average model accuracy increased from 43 to 54% (the model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the corresponding Cα atoms in the superposed native structure). The present method also compares favorably with two of the most successful previously described methods, PSI-BLAST and SAM. The accuracy of the final models would be increased further if a better method for ranking of the models were available.
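The two accuracy measures defined in this abstract are simple percentages and can be sketched as below. This is an illustrative reading, not the authors' code: the function names are hypothetical, the model and native coordinates are assumed to be already superposed, "within 5 Å" is treated as inclusive, and an alignment is represented (by assumption) as a set of `(model_index, template_index)` residue pairs.

```python
import math


def model_accuracy(model_ca, native_ca, cutoff=5.0):
    """Percentage of model C-alpha atoms within `cutoff` angstroms of the
    corresponding native C-alpha atoms.

    Both inputs are equal-length lists of (x, y, z) tuples, assumed superposed.
    """
    if len(model_ca) != len(native_ca):
        raise ValueError("model and native must have the same number of atoms")
    close = sum(math.dist(m, n) <= cutoff for m, n in zip(model_ca, native_ca))
    return 100.0 * close / len(model_ca)


def alignment_accuracy(tested_pairs, reference_pairs):
    """Percentage of aligned positions in the tested alignment that are
    identical to the reference (structure-based) alignment."""
    tested = set(tested_pairs)
    return 100.0 * len(tested & set(reference_pairs)) / len(tested)


# Made-up coordinates and alignments for illustration only.
native = [(0.0, 0.0, 0.0)] * 4
model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 0.0, 6.0)]
m_acc = model_accuracy(model, native)

reference = {(0, 0), (1, 1), (2, 2), (3, 3)}
tested = {(0, 0), (1, 1), (2, 3), (3, 4)}
a_acc = alignment_accuracy(tested, reference)
```

In the toy data, two of four atoms lie within the cutoff and two of four aligned pairs match the reference, so both measures evaluate to 50%.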
Affiliation(s)
- Bino John
- Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, NY 10021, USA
16
Klebe G. From structure to recognition principles: mining in crystal data as a prerequisite for drug design. Ernst Schering Res Found Workshop 2003:103-26. [PMID: 12664538 DOI: 10.1007/978-3-662-05314-0_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- G Klebe
- Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, 35032 Marburg, Germany.
17
Joseph-McCarthy D, Alvarez JC. Automated generation of MCSS-derived pharmacophoric DOCK site points for searching multiconformation databases. Proteins 2003; 51:189-202. [PMID: 12660988 DOI: 10.1002/prot.10296] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
All docking methods employ some sort of heuristic to orient the ligand molecules into the binding site of the target structure. An automated method, MCSS2SPTS, for generating chemically labeled site points for docking is presented. MCSS2SPTS employs the program Multiple Copy Simultaneous Search (MCSS) to determine target-based theoretical pharmacophores. More specifically, chemically labeled site points are automatically extracted from selected low-energy functional-group minima and clustered together. These pharmacophoric site points can then be directly matched to the pharmacophoric features of database molecules with the use of either DOCK or PhDOCK to place the small molecules into the binding site. Several examples of the ability of MCSS2SPTS to reproduce the three-dimensional pharmacophoric features of ligands from known ligand-protein complex structures are discussed. In addition, a site-point set calculated for one human immunodeficiency virus 1 (HIV1) protease structure is used with PhDOCK to dock a set of HIV1 protease ligands; the docked poses are compared to the corresponding complex structures of the ligands. Finally, the use of an MCSS2SPTS-derived site-point set for acyl carrier protein synthase is compared to the use of atomic positions from a bound ligand as site points for a large-scale DOCK search. In general, MCSS2SPTS-generated site points focus the search on the more relevant areas and thereby allow for more effective sampling of the target site.
18
Godzik A, Canaves J, Grzechnik S, Jaroszewski L, Morse A, Ouyang J, Wang X, West B, Wooley J. Challenges of structural genomics: bioinformatics. 2003. [DOI: 10.1016/s1478-5382(03)02259-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/26/2022]
19
Abstract
The rapid growth of bio-sequence information has resulted in an increasing demand for reliable methods that group proteins. A few databases with curated alignments of protein families have demonstrated that expert-driven repositories can keep up with the data deluge in the genome era. These original resources implicitly identify domain-like modules in proteins. An increasing number of automatic methods have sprouted over the past few years that cluster the protein universe. Many of these implicitly dissect proteins into structural domain-like fragments. In a very coarse-grained evaluation, some of the automatic methods appear to be on par with expert-driven approaches. However, neither automatic nor manual methods are currently entirely up to the challenges of tasks such as target selection in structural genomics. Thus, we urgently need refined and sustained automatic clustering tools.
Affiliation(s)
- Jinfeng Liu
- CUBIC and North East Structural Genomics Consortium, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
20
Nair R, Rost B. Sequence conserved for subcellular localization. Protein Sci 2002; 11:2836-47. [PMID: 12441382 PMCID: PMC2373743 DOI: 10.1110/ps.0207402] [Citation(s) in RCA: 131] [Impact Index Per Article: 5.7] [Received: 05/11/2002] [Revised: 09/05/2002] [Accepted: 09/10/2002] [Indexed: 10/27/2022]
Abstract
The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes.
Affiliation(s)
- Rajesh Nair
- Columbia University Bioinformatics Center (CUBIC), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
21
|
Schmitt S, Kuhn D, Klebe G. A new method to detect related function among proteins independent of sequence and fold homology. J Mol Biol 2002; 323:387-406. [PMID: 12381328 DOI: 10.1016/s0022-2836(02)00811-2] [Citation(s) in RCA: 292] [Impact Index Per Article: 12.7] [Indexed: 11/20/2022]
Abstract
A new method has been developed to detect functional relationships among proteins independent of a given sequence or fold homology. It is based on the idea that protein function is intimately related to the recognition and subsequent response to the binding of a substrate or an endogenous ligand in a well-characterized binding pocket. Thus, recognition of similar ligands, supposedly linked to similar function, requires conserved recognition features exposed in terms of common physicochemical interaction properties via the functional groups of the residues flanking a particular binding cavity. Following a technique commonly used in the comparison of small molecule ligands, generic pseudocenters coding for possible interaction properties were assigned for a large sample set of cavities extracted from the entire PDB and stored in the database Cavbase. Using a particular query cavity a series of related cavities of decreasing similarity is detected based on a clique detection algorithm. The detected similarity is ranked according to property-based surface patches shared in common by the different clique solutions. The approach either retrieves protein cavities accommodating the same (e.g. co-factors) or closely related ligands or it extracts proteins exhibiting similar function in terms of a related catalytic mechanism. Finally the new method has strong potential to suggest alternative molecular skeletons in de novo design. The retrieval of molecular building blocks accommodated in a particular sub-pocket that shares similarity with the pocket in a protein studied by drug design can inspire the discovery of novel ligands.
Affiliation(s)
- Stefan Schmitt
- Inst. of Pharmaceutical Chemistry, Univ. of Marburg, Marbacher Weg 6, D-35032, Marburg, Germany
|
22
|
Abstract
Over the last decade, structural biologists have unravelled many proteins that appear natively disordered. Common assumptions are that many of these proteins adopt structure through binding and that the structural flexibility enables them to adopt different functions. Here, we investigated regions of more than 70 sequence-consecutive residues that have no regular secondary structure (NORS). Analysing 31 entirely sequenced organisms, we predicted five times as many proteins with NORS regions (loopy proteins) in eukaryotes (20%) than in prokaryotes and archaea (4%). Thousands of these NORS regions were over 150 residues long. The amino acid composition of NORS regions differed from that of loops in PDB. Although NORS proteins had significantly more residues in low-complexity regions than other proteins, simple cut-off thresholds for sequence bias missed most NORS regions. On average, NORS regions were evolutionarily at least as conserved as their flanking regions. Furthermore, yeast proteins with NORS regions had more protein-protein interaction partners than other proteins. Regulatory and transcription-related functions were over-represented in loopy proteins, whereas biosynthesis and energy metabolism were under-represented. Overall, our analysis confirmed that proteins with non-regular structures appear to play important functional roles, and they may adopt as yet unknown types of protein structures.
Affiliation(s)
- Jinfeng Liu
- Department of Pharmacology, Columbia University, New York, NY 10032, USA
|
23
|
Attwood TK, Blythe MJ, Flower DR, Gaulton A, Mabey JE, Maudling N, McGregor L, Mitchell AL, Moulton G, Paine K, Scordis P. PRINTS and PRINTS-S shed light on protein ancestry. Nucleic Acids Res 2002; 30:239-41. [PMID: 11752304 PMCID: PMC99143 DOI: 10.1093/nar/30.1.239] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022] Open
Abstract
The PRINTS database houses a collection of protein fingerprints. These may be used to make family and tentative functional assignments for uncharacterised sequences. The September 2001 release (version 32.0) includes 1600 fingerprints, encoding approximately 10 000 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. In addition to its continued steady growth, we report here its use as a source of annotation in the InterPro resource, and the use of its relational cousin, PRINTS-S, to model relationships between families, including those beyond the reach of conventional sequence analysis approaches. The database is accessible for BLAST, fingerprint and text searches at http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/.
Affiliation(s)
- T K Attwood
- School of Biological Sciences, The University of Manchester, Manchester M13 9PT, UK.
|
24
|
Liu J, Rost B. Comparing function and structure between entire proteomes. Protein Sci 2001; 10:1970-9. [PMID: 11567088 PMCID: PMC2374214 DOI: 10.1110/ps.10101] [Citation(s) in RCA: 202] [Impact Index Per Article: 8.4] [Received: 03/15/2001] [Revised: 07/06/2001] [Accepted: 07/12/2001] [Indexed: 12/22/2022]
Abstract
More than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that approximately 15%-30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%-5%), and we predicted approximately 15%-25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%-40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for approximately 20%-45% of all proteins; the regions with structural homology covered 20%-30% of all residues. These numbers may or may not suggest that there are 1200-2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes.
Affiliation(s)
- J Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
25
|
Orengo CA, Sillitoe I, Reeves G, Pearl FM. Review: what can structural classifications reveal about protein evolution? J Struct Biol 2001; 134:145-65. [PMID: 11551176 DOI: 10.1006/jsbi.2001.4398] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
In this article we present a review of the methods used for comparing and classifying protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications and we consider some of the problems confronting any general analyses of structural evolution in protein families. We also review some more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes and thereby reveal interesting trends in fold usage and recurrence.
Affiliation(s)
- C A Orengo
- Department of Biochemistry and Molecular Biology, University College, Gower Street, London, WC1E 6BT, United Kingdom
|
26
|
Armon A, Graur D, Ben-Tal N. ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 2001; 307:447-63. [PMID: 11243830 DOI: 10.1006/jmbi.2000.4474] [Citation(s) in RCA: 362] [Impact Index Per Article: 15.1] [Indexed: 11/22/2022]
Abstract
Experimental approaches for the identification of functionally important regions on the surface of a protein involve mutagenesis, in which exposed residues are replaced one after another while the change in binding to other proteins or changes in activity are recorded. However, practical considerations limit the use of these methods to small-scale studies, precluding a full mapping of all the functionally important residues on the surface of a protein. We present here an alternative approach involving the use of evolutionary data in the form of multiple-sequence alignment for a protein family to identify hot spots and surface patches that are likely to be in contact with other proteins, domains, peptides, DNA, RNA or ligands. The underlying assumption in this approach is that key residues that are important for binding should be conserved throughout evolution, just like residues that are crucial for maintaining the protein fold, i.e. buried residues. A main limitation in the implementation of this approach is that the sequence space of a protein family may be unevenly sampled, e.g. mammals may be overly represented. Thus, a seemingly conserved position in the alignment may reflect a taxonomically uneven sampling, rather than being indicative of structural or functional importance. To avoid this problem, we present here a novel methodology based on evolutionary relations among proteins as revealed by inferred phylogenetic trees, and demonstrate its capabilities for mapping binding sites in SH2 and PTB signaling domains. A computer program that implements these ideas is available freely at: http://ashtoret.tau.ac.il/~rony
Affiliation(s)
- A Armon
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
|
27
|
Jacoboni I, Martelli PL, Fariselli P, Compiani M, Casadio R. Predictions of protein segments with the same aminoacid sequence and different secondary structure: a benchmark for predictive methods. Proteins 2000; 41:535-44. [PMID: 11056040 DOI: 10.1002/1097-0134(20001201)41:4<535::aid-prot100>3.0.co;2-c] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
The most stringent test for predictive methods of protein secondary structure is whether identical short sequences that are known to be present with different conformations in different proteins known at atomic resolution can be correctly discriminated. In this study, we show that the prediction efficiency for this type of segment in unrelated proteins reaches an average accuracy per residue ranging from about 72 to 75% (depending on the alignment method used to generate the input sequence profile) only when methods of the third generation are used. A comparison of different methods based on segment statistics (2nd generation methods) and/or also including evolutionary information (3rd generation methods) indicates that the discrimination of the different conformations of identical segments is dependent on the method used for the prediction. Accuracy is similar when methods that perform similarly on secondary structure prediction are tested. When evolutionary information is taken into account, as compared to single sequence input, the number of correctly discriminated pairs is increased twofold. The results also highlight the predictive capability of neural networks for identical segments whose conformation differs in different proteins.
Affiliation(s)
- I Jacoboni
- Laboratory of Biocomputing, Centro Interdipartimentale per le Ricerche Biotecnologiche (CIRB), Bologna, Italy
|
28
|
Abstract
The combinatorial chemistry industry has made major advances in the handling and mixing of small volumes, and in the development of robust liquid-handling systems. In addition, developments have been made in the area of material handling for the high-throughput drug screening and combinatorial chemistry fields. Lastly, improvements in beamline optics at synchrotron sources have enabled the use of flash-frozen micron-sized (10-50 µm) crystals. The combination of these and other recent advances will make high-throughput protein crystallography possible. Further advances in high-throughput methods of protein crystallography will require application of the above developments and the accumulation of success/failure data in a more systematic manner. Major changes in crystallography technology will emerge based on the data collected by first-generation high-throughput systems.
Affiliation(s)
- R C Stevens
- The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA.
|
29
|
Cort JR, Yee A, Edwards AM, Arrowsmith CH, Kennedy MA. Structure-based functional classification of hypothetical protein MTH538 from Methanobacterium thermoautotrophicum. J Mol Biol 2000; 302:189-203. [PMID: 10964569 DOI: 10.1006/jmbi.2000.4052] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
The structure of MTH538, a previously uncharacterized hypothetical protein from Methanobacterium thermoautotrophicum, has been determined by NMR spectroscopy. MTH538 is one of numerous structural genomics targets selected in a genome-wide survey of uncharacterized sequences from this organism. MTH538 is a so-called singleton, a sequence not closely related to any other (known) sequences. The structure of MTH538 closely resembles the known structures of receiver domains from two-component response regulator systems, such as CheY, and is similar to the structures of flavodoxins and GTP-binding proteins. Tests on MTH538 for characteristic activities of CheY and flavodoxin were negative. MTH538 did not become phosphorylated in the presence of acetyl phosphate and Mg²⁺, although it appeared to bind Mg²⁺. MTH538 also did not bind flavin mononucleotide (FMN) or coenzyme F₄₂₀. Nevertheless, sequence and structure parallels between MTH538/CheY and two families of ATPase/phosphatase proteins suggest that MTH538 may have a role in a phosphorylation-independent two-component response regulator system.
Affiliation(s)
- J R Cort
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, EMSL 2569 K8-98, Richland, WA 99352, USA
|
30
|
Balasubramanian S, Schneider T, Gerstein M, Regan L. Proteomics of Mycoplasma genitalium: identification and characterization of unannotated and atypical proteins in a small model genome. Nucleic Acids Res 2000; 28:3075-82. [PMID: 10931922 PMCID: PMC108442 DOI: 10.1093/nar/28.16.3075] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Received: 05/16/2000] [Revised: 06/28/2000] [Accepted: 06/28/2000] [Indexed: 11/14/2022] Open
Abstract
We present the results of a comprehensive analysis of the proteome of Mycoplasma genitalium (MG), the smallest autonomously replicating organism that has been completely sequenced. Our aim was to identify and characterize all soluble proteins in MG that are structurally and functionally uncharacterized. We were particularly interested in identifying proteins that differed significantly from typical globular proteins, for example, proteins which are unstructured in the absence of a 'partner' molecule or those that exhibit unusual thermodynamic properties. This work is complementary to other structural genomics projects whose primary aim is to determine the three-dimensional structures of proteins with unknown folds. We have identified all the full-length open reading frames (ORFs) in MG that have no homologs of known structure and are of unknown function. Twenty-five of the total 483 ORFs fall into this category and we have expressed, purified and characterized 11 of them. We have used circular dichroism (CD) to rapidly investigate their biophysical properties. Our studies reveal that these proteins have a wide range of structures varying from highly helical to partially structured to unfolded or random coil. They also display a variety of thermodynamic properties ranging from cooperative unfolding to no detectable unfolding upon thermal denaturation. Several of these proteins are highly conserved from mycoplasma to man. Further information about target selection and CD results is available at http://bioinfo.mbb.yale.edu/genome
Affiliation(s)
- S Balasubramanian
- Department of Molecular Biophysics and Biochemistry, Department of Computer Science and Department of Chemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520-8114, USA
|
31
|
Grishin NV. C-terminal domains of Escherichia coli topoisomerase I belong to the zinc-ribbon superfamily. J Mol Biol 2000; 299:1165-77. [PMID: 10873443 DOI: 10.1006/jmbi.2000.3841] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Detection of remote evolutionary connections is increasingly difficult with sequence and structural divergence. A combination of sequence and structural analysis, in which statistically supported sequence similarity had a crucial impact, revealed that Escherichia coli topoisomerase I C-terminal fragment is evolutionarily related to the three tetracysteine zinc-binding domains of the enzyme. Spatial structure analysis of this C-terminal fragment indicates that it consists of two structurally similar domains and suggests homology between them. Sequence similarity between the zinc-binding domains of type Ia topoisomerases and transcription regulators of known spatial structure helps to conclude that E. coli topo I contains five copies of a zinc ribbon domain at the C terminus. Two of these domains, corresponding to the C-terminal fragment, lost their cysteine residues and are probably not able to bind zinc. Present analyses lead to the classification of the C-terminal fragment of E. coli topoisomerase I as a member of zinc ribbon superfamily, despite the absence of zinc-binding sites.
Affiliation(s)
- N V Grishin
- Biochemistry Department, University of Texas Southwestern Medical Center, 5323, Harry Hines Blvd, Dallas, TX, 75390-9038, USA.
|
32
|
Abstract
Here, we present a systematic analysis of the open-faced beta-sheet topologies in a set of non-redundant protein domain structures; in particular, we focus on the topological diversity of four-stranded beta-sheet motifs. Of the 96 topologies that are possible for a four-stranded beta-sheet, 42 were identified in known protein structures. Of these, four account for 50% of the structures that we have studied. Two sets of the topologies that were not observed may represent the section of the topological space that is not readily accessible to proteins on either thermodynamic or kinetic grounds. The first set contains topologies with alternating parallel and antiparallel beta-ladders. Their rare occurrence reflects the expectation that it is energetically unfavorable to match different hydrogen bonding patterns. The polypeptide chains in the second set of topologies go through convoluted paths and are expected to experience great kinetic frustrations during the folding processes. A knowledge of the potential causes for the topological preference of small beta-sheets also helps us to understand the topological properties of larger beta-sheet structures which frequently contain four-stranded motifs. The notion that protein topologies can only be taken from a confined and discrete space has important implications for structural genomics.
Affiliation(s)
- C Zhang
- Department of Chemistry, E.O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, 94720, USA
|
33
|
Minasov G, Teplova M, Stewart GC, Koonin EV, Anderson WF, Egli M. Functional implications from crystal structures of the conserved Bacillus subtilis protein Maf with and without dUTP. Proc Natl Acad Sci U S A 2000; 97:6328-33. [PMID: 10841541 PMCID: PMC18602 DOI: 10.1073/pnas.97.12.6328] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open
Abstract
Three-dimensional structures of functionally uncharacterized proteins may furnish insight into their functions. The potential benefits of three-dimensional structural information regarding such proteins are particularly obvious when the corresponding genes are conserved during evolution, implying an important function, and no functional classification can be inferred from their sequences. The Bacillus subtilis Maf protein is representative of a family of proteins that has homologs in many of the completely sequenced genomes from archaea, prokaryotes, and eukaryotes, but whose function is unknown. As an aid in exploring function, we determined the crystal structure of this protein at a resolution of 1.85 Å. The structure, in combination with multiple sequence alignment, reveals a putative active site. Phosphate ions present at this site and structural similarities between a portion of Maf and the anticodon-binding domains of several tRNA synthetases suggest that Maf may be a nucleic acid-binding protein. The crystal structure of a Maf-nucleoside triphosphate complex provides support for this hypothesis and hints at di- or oligonucleotides with either 5'- or 3'-terminal phosphate groups as ligands or substrates of Maf. A further clue comes from the observation that the structure of the Maf monomer bears similarity to that of the recently reported Methanococcus jannaschii Mj0226 protein. Just as for Maf, the structure of this predicted NTPase was determined as part of a structural genomics pilot project. The structural relation between Maf and Mj0226 was not apparent from sequence analysis approaches. These results emphasize the potential of structural genomics to reveal new unexpected connections between protein families previously considered unrelated.
Affiliation(s)
- G Minasov
- Department of Molecular Pharmacology and Biological Chemistry and The Drug Discovery Program, Northwestern University Medical School, Chicago, IL 60611, USA
|
34
|
Büssow K, Nordhoff E, Lübbert C, Lehrach H, Walter G. A human cDNA library for high-throughput protein expression screening. Genomics 2000; 65:1-8. [PMID: 10777659 DOI: 10.1006/geno.2000.6141] [Citation(s) in RCA: 108] [Impact Index Per Article: 4.3] [Indexed: 11/22/2022]
Abstract
We have constructed a human fetal brain cDNA library in an Escherichia coli expression vector for high-throughput screening of recombinant human proteins. Using robot technology, the library was arrayed in microtiter plates and gridded onto high-density filter membranes. Putative expression clones were detected on the filters using an antibody against the N-terminal sequence RGS-His₆ of fusion proteins. Positive clones were rearrayed into a new sublibrary, and 96 randomly chosen clones were analyzed. Expression products were analyzed by SDS-PAGE, affinity purification, and matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry, and the determined protein masses were compared to masses predicted from DNA sequencing data. It was found that 66% of these clones contained inserts in a correct reading frame. Sixty-four percent of the correct reading frame clones comprised the complete coding sequence of a human protein. High-throughput microtiter plate methods were developed for protein expression, extraction, purification, and mass spectrometric analyses. An enzyme assay for glyceraldehyde-3-phosphate dehydrogenase activity in native extracts was adapted to the microtiter plate format. Our data indicate that high-throughput screening of an arrayed protein expression library is an economical way of generating large numbers of clones producing recombinant human proteins for structural and functional analyses.
Affiliation(s)
- K Büssow
- Max Planck Institute of Molecular Genetics, Ihnestrasse 73, Berlin, 14195, Germany.
|
35
|
Fischer D. Rational structural genomics: affirmative action for ORFans and the growth in our structural knowledge. Protein Engineering 1999; 12:1029-30. [PMID: 10611394 DOI: 10.1093/protein/12.12.1029] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Affiliation(s)
- D Fischer
- Faculty of Natural Science, Department of Mathematics and Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel.
|
36
|
Blomberg N, Gabdoulline RR, Nilges M, Wade RC. Classification of protein sequences by homology modeling and quantitative analysis of electrostatic similarity. Proteins 1999; 37:379-87. [PMID: 10591098 DOI: 10.1002/(sici)1097-0134(19991115)37:3<379::aid-prot6>3.0.co;2-k] [Citation(s) in RCA: 72] [Impact Index Per Article: 2.8] [Indexed: 11/05/2022]
Abstract
Protein electrostatics plays a key role in ligand binding and protein-protein interactions. Therefore, similarities or dissimilarities in electrostatic potentials can be used as indicators of similarities or dissimilarities in protein function. We here describe a method to compare the electrostatic properties within protein families objectively and quantitatively. Three-dimensional structures are built from database sequences by comparative modeling. Molecular potentials are then computed for these with a continuum solvation model by finite difference solution of the Poisson-Boltzmann equation or analytically as a multipole expansion that permits rapid comparison of very large datasets. This approach is applied to 104 members of the Pleckstrin homology (PH) domain family. The deviation of the potentials of the homology models from those of the corresponding experimental structures is comparable to the variation of the potential in an ensemble of structures from nuclear magnetic resonance data or between snapshots from a molecular dynamics simulation. For this dataset, the results for analysis of the full electrostatic potential and the analysis using only monopole and dipole terms are very similar. The electrostatic properties of the PH domains are generally conserved despite the extreme sequence divergence in this family. Notable exceptions from this conservation are seen for PH domains linked to a Dbl homology (DH) domain and in proteins with internal PH domain repeats.
Affiliation(s)
- N Blomberg
- European Molecular Biology Laboratory, Heidelberg, Germany
|
37
|
Ponting CP, Aravind L, Schultz J, Bork P, Koonin EV. Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J Mol Biol 1999; 289:729-45. [PMID: 10369758 DOI: 10.1006/jmbi.1999.2827] [Citation(s) in RCA: 245] [Impact Index Per Article: 9.4] [Indexed: 11/22/2022]
Abstract
Phyletic distributions of eukaryotic signalling domains were studied using recently developed sensitive methods for protein sequence analysis, with an emphasis on the detection and accurate enumeration of homologues in bacteria and archaea. A major difference was found between the distributions of enzyme families that are typically found in all three divisions of cellular life and non-enzymatic domain families that are usually eukaryote-specific. Previously undetected bacterial homologues were identified for plant pathogenesis-related proteins, Pad1, von Willebrand factor type A, src homology 3 and YWTD repeat-containing domains. Comparisons of the domain distributions in eukaryotes and prokaryotes enabled distinctions to be made between the domains originating prior to the last common ancestor of all known life forms and those apparently originating as consequences of horizontal gene transfer events. A number of transfers of signalling domains from eukaryotes to bacteria were confidently identified, in contrast to only a single case of apparent transfer from eukaryotes to archaea.
Affiliation(s)
- C P Ponting
- National Center for Biotechnology Information National Library of Medicine, National Institutes of Health, Bldg. 38A, Bethesda, MD, 20894, USA.
|
38
|
Terwilliger TC, Berendzen J. Automated MAD and MIR structure solution. Acta Crystallogr D Biol Crystallogr 1999; 55:849-61. [PMID: 10089316 PMCID: PMC2746121 DOI: 10.1107/s0907444999000839] [Citation(s) in RCA: 2717] [Impact Index Per Article: 104.5] [Received: 09/14/1998] [Accepted: 01/15/1999] [Indexed: 11/13/2022]
Abstract
Obtaining an electron-density map from X-ray diffraction data can be difficult and time-consuming even after the data have been collected, largely because MIR and MAD structure determinations currently require many subjective evaluations of the qualities of trial heavy-atom partial structures before a correct heavy-atom solution is obtained. A set of criteria for evaluating the quality of heavy-atom partial solutions in macromolecular crystallography has been developed. These have allowed the conversion of the crystal structure-solution process into an optimization problem and have allowed its automation. The SOLVE software has been used to solve MAD data sets with as many as 52 selenium sites in the asymmetric unit. The automated structure-solution process developed is a major step towards the fully automated structure-determination, model-building and refinement procedure which is needed for genomic scale structure determinations.
Affiliation(s)
- T C Terwilliger
- Structural Biology Group, Mail Stop M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
|
39
|
Zarembinski TI, Hung LW, Mueller-Dieckmann HJ, Kim KK, Yokota H, Kim R, Kim SH. Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc Natl Acad Sci U S A 1998; 95:15189-93. [PMID: 9860944 PMCID: PMC28018 DOI: 10.1073/pnas.95.26.15189] [Citation(s) in RCA: 226] [Impact Index Per Article: 8.4] [Accepted: 10/28/1998] [Indexed: 11/18/2022] Open
Abstract
Many small bacterial, archaebacterial, and eukaryotic genomes have been sequenced, and the larger eukaryotic genomes are predicted to be completely sequenced within the next decade. In all genomes sequenced to date, a large portion of these organisms' predicted protein coding regions encode polypeptides of unknown biochemical, biophysical, and/or cellular functions. Three-dimensional structures of these proteins may suggest biochemical or biophysical functions. Here we report the crystal structure of one such protein, MJ0577, from a hyperthermophile, Methanococcus jannaschii, at 1.7-Å resolution. The structure contains a bound ATP, suggesting MJ0577 is an ATPase or an ATP-mediated molecular switch, which we confirm by biochemical experiments. Furthermore, the structure reveals different ATP binding motifs that are shared among many homologous hypothetical proteins in this family. This result indicates that structure-based assignment of molecular function is a viable approach for the large-scale biochemical assignment of proteins and for discovering new motifs, a basic premise of structural genomics.
Affiliation(s)
- T I Zarembinski
- Physical Biosciences Division of Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
40
|
Teichmann SA, Park J, Chothia C. Structural assignments to the Mycoplasma genitalium proteins show extensive gene duplications and domain rearrangements. Proc Natl Acad Sci U S A 1998; 95:14658-63. [PMID: 9843945 PMCID: PMC24505 DOI: 10.1073/pnas.95.25.14658] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022] Open
Abstract
The parasitic bacterium Mycoplasma genitalium has a small, reduced genome with close to a basic set of genes. As a first step toward determining the families of protein domains that form the products of these genes, we have used the multiple sequence programs PSI-BLAST and GEANFAMMER to match the sequences of the 467 gene products of M. genitalium to the sequences of the domains that form proteins of known structure [Protein Data Bank (PDB) sequences]. PDB sequences (274) match all of 106 M. genitalium sequences and some parts of another 85; thus, 41% of its total sequences are matched in all or part. The evolutionary relationships of the PDB domains that match M. genitalium are described in the structural classification of proteins (SCOP) database. Using this information, we show that the domains in the matched M. genitalium sequences come from 114 superfamilies and that 58% of them have arisen by gene duplication. This level of duplication is more than twice that found by using pairwise sequence comparisons. The PDB domain matches also describe the domain structure of the matched sequences: just over a quarter contain one domain and the rest have combinations of two or more domains.
Affiliation(s)
- S A Teichmann
- Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, United Kingdom.
|
41
|
Pautsch A, Schulz GE. Structure of the outer membrane protein A transmembrane domain. Nat Struct Biol 1998; 5:1013-7. [PMID: 9808047 DOI: 10.1038/2983] [Citation(s) in RCA: 393] [Impact Index Per Article: 14.6] [Indexed: 11/08/2022]
Abstract
The outer membrane protein A of Escherichia coli (OmpA) is an intensely studied example in the field of membrane protein folding. We have determined the structure of the OmpA transmembrane domain consisting of residues 1-171, by X-ray diffraction analysis, to a resolution of 2.5 Å. It consists of a regular, extended eight-stranded beta-barrel and appears to be constructed like an inverse micelle with large water-filled cavities, but does not form a pore. Surprisingly, the cavities seem to be highly conserved during evolution. The structure corroborates the concept that all outer membrane proteins consist of beta-barrels. The structure constitutes a beta-barrel membrane anchor that appears to be the outer membrane equivalent of the single-chain alpha-helix anchor of the inner membrane.
Affiliation(s)
- A Pautsch
- Institut für Organische Chemie und Biochemie, Albert-Ludwigs-Universität, Freiburg im Breisgau, Germany
|
42
|
Gerstein M, Hegyi H. Comparing genomes in terms of protein structure: surveys of a finite parts list. FEMS Microbiol Rev 1998; 22:277-304. [PMID: 10357579 DOI: 10.1111/j.1574-6976.1998.tb00371.x] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022] Open
Abstract
We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we described how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, being comprised of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome are currently known and this subset is prone to sampling bias. 
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid composition. The fraction of membrane proteins with a given number of TM helices falls off rapidly with more TM elements, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.
Affiliation(s)
- M Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.
|
43
|
Abstract
Genome sequencing projects continue to provide a flood of new protein sequences, and prediction methods remain an important means of adding structural information. Recently, there have been advances in secondary structure prediction, which feed, in turn, into improved fold recognition algorithms. Finally, there have been technical improvements in comparative modelling, and studies of the expected accuracy of three-dimensional structural models built by this method.
Affiliation(s)
- D R Westhead
- The European Bioinformatics Institute EMBL Outstation Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SD, UK.
|