1
Mahajan S, de Brevern AG, Sanejouand YH, Srinivasan N, Offmann B. Use of a structural alphabet to find compatible folds for amino acid sequences. Protein Sci 2014; 24:145-53. [PMID: 25297700 DOI: 10.1002/pro.2581] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/15/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]
Abstract
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa.
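The scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the log-odds form with add-one smoothing, the requirement that the query and the PB-encoded template have equal length (no gaps), the shuffle-based Z-score, and every name below (log_odds, thread_score, z_score, the toy counts) are assumptions made for illustration only.

```python
import math
import random


def log_odds(counts, background):
    """Convert per-PB amino-acid occurrence counts into log-odds scores.

    counts[pb][aa] is how often residue `aa` was seen in block `pb`;
    background[aa] is the overall residue frequency (assumed inputs).
    """
    scores = {}
    for pb, aa_counts in counts.items():
        total = sum(aa_counts.values())
        scores[pb] = {}
        for aa, bg in background.items():
            # Add-one smoothing so unseen residues do not give log(0).
            freq = (aa_counts.get(aa, 0) + 1) / (total + len(background))
            scores[pb][aa] = math.log(freq / bg)
    return scores


def thread_score(seq, pb_seq, scores):
    """Gapless global threading: sum the score of each residue on its PB."""
    if len(seq) != len(pb_seq):
        raise ValueError("sequence and PB string must have equal length")
    return sum(scores[pb][aa] for aa, pb in zip(seq, pb_seq))


def z_score(seq, pb_seq, scores, n_shuffles=200, seed=0):
    """Z-score of the observed score against shuffled versions of the query."""
    rng = random.Random(seed)
    observed = thread_score(seq, pb_seq, scores)
    residues = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(thread_score(residues, pb_seq, scores))
    mean = sum(null) / len(null)
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / len(null))
    return (observed - mean) / sd if sd > 0 else 0.0


# Toy data (hypothetical): two PB letters and a two-residue alphabet.
counts = {"m": {"A": 90, "L": 10}, "d": {"A": 10, "L": 90}}
background = {"A": 0.5, "L": 0.5}
scores = log_odds(counts, background)
```

In this toy setup a query whose residues match the preferences of each block scores higher than the same residues in the wrong order, and a hit would be kept only if its Z-score exceeded a chosen cutoff.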
Affiliation(s)
- Swapnil Mahajan
- Université de La Réunion, DSIMB, UMR-S S1134, Saint Denis Messag Cedex 09, La Réunion, F-97715, France; INSERM, UMR-S 1134, DSIMB, F-75739, Paris, France; Laboratoire d'Excellence, GR-Ex, Paris, F-75739, France; Université de Nantes, UFIP CNRS UMR 6286 Faculté des Sciences et Techniques, 2 rue de la Houssinière, 44392, Nantes Cedex 03, France
2
Gana R, Rao S, Huang H, Wu C, Vasudevan S. Structural and functional studies of S-adenosyl-L-methionine binding proteins: a ligand-centric approach. BMC Struct Biol 2013; 13:6. [PMID: 23617634 PMCID: PMC3662625 DOI: 10.1186/1472-6807-13-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 10/29/2012] [Accepted: 04/09/2013] [Indexed: 12/31/2022]
Abstract
BACKGROUND The post-genomic era poses several challenges. The biggest is the identification of biochemical function for protein sequences and structures resulting from genomic initiatives. Most sequences lack a characterized function and are annotated as hypothetical or uncharacterized. While homology-based methods are useful, and work well for sequences with sequence identities above 50%, they fail for sequences in the twilight zone (<30%) of sequence identity. For cases where sequence methods fail, structural approaches are often used, based on the premise that structure preserves function for longer evolutionary time-frames than sequence alone. It is now clear that no single method can be used successfully for functional inference. Given the growing need for functional assignments, we describe here a systematic new approach, designated ligand-centric, which is primarily based on analysis of ligand-bound/unbound structures in the PDB. Results of applying our approach to S-adenosyl-L-methionine (SAM) binding proteins are presented. RESULTS Our analysis included 1,224 structures that belong to 172 unique families of the Protein Information Resource Superfamily system. Our ligand-centric approach was divided into four levels: residue, protein/domain, ligand, and family levels. The residue level included the identification of conserved binding site residues based on structure-guided sequence alignments of representative members of a family, and the identification of conserved structural motifs. The protein/domain level included structural classification of proteins, Pfam domains, domain architectures, and protein topologies. The ligand level included ligand conformations, ribose sugar puckering, and the identification of conserved ligand-atom interactions. The family level included phylogenetic analysis. CONCLUSION We found that SAM bound to a total of 18 different fold types (I-XVIII). 
We identified 4 new fold types and 11 additional topological arrangements of strands within the well-studied Rossmann fold Methyltransferases (MTases). This extends the existing structural classification of SAM binding proteins. A striking correlation between fold type and the conformation of the bound SAM (classified as types) was found across the 18 fold types. Several site-specific rules were created for the assignment of functional residues to families and proteins that do not have a bound SAM or a solved structure.
Affiliation(s)
- Rajaram Gana
- Department of Biostatistics and Bioinformatics, Georgetown University Medical Center, Washington, DC 20007, USA
3
Cormier CY, Park JG, Fiacco M, Steel J, Hunter P, Kramer J, Singla R, LaBaer J. PSI:Biology-materials repository: a biologist's resource for protein expression plasmids. J Struct Funct Genomics 2011; 12:55-62. [PMID: 21360289 PMCID: PMC3184641 DOI: 10.1007/s10969-011-9100-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Received: 11/15/2010] [Accepted: 02/02/2011] [Indexed: 01/08/2023]
Abstract
The Protein Structure Initiative:Biology-Materials Repository (PSI:Biology-MR; MR; http://psimr.asu.edu ) sequence-verifies, annotates, stores, and distributes the protein expression plasmids and vectors created by the Protein Structure Initiative (PSI). The MR has developed an informatics and sample processing pipeline that manages this process for thousands of samples per month from nearly a dozen PSI centers. DNASU ( http://dnasu.asu.edu ), a freely searchable database, stores the plasmid annotations, which include the full-length sequence, vector information, and associated publications for over 130,000 plasmids created by our laboratory, by the PSI and other consortia, and by individual laboratories for distribution to researchers worldwide. Each plasmid links to external resources, including the PSI Structural Biology Knowledgebase ( http://sbkb.org ), which facilitates cross-referencing of a particular plasmid to additional protein annotations and experimental data. To expedite and simplify plasmid requests, the MR uses an expedited material transfer agreement (EP-MTA) network, where researchers from network institutions can order and receive PSI plasmids without institutional delays. As of March 2011, over 39,000 protein expression plasmids and 78 empty vectors from the PSI are available upon request from DNASU. Overall, the MR's repository of expression-ready plasmids, its automated pipeline, and the rapid process for receiving and distributing these plasmids more effectively allows the research community to dissect the biological function of proteins whose structures have been studied by the PSI.
Affiliation(s)
- Catherine Y. Cormier
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Jin G. Park
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Michael Fiacco
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Jason Steel
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Preston Hunter
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Jason Kramer
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Rajeev Singla
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
- Joshua LaBaer
- The Virginia G. Piper Center for Personalized Diagnostics at the Biodesign Institute at Arizona State University, Tempe, AZ 85287-6401
4
Nair R, Liu J, Soong TT, Acton TB, Everett JK, Kouranov A, Fiser A, Godzik A, Jaroszewski L, Orengo C, Montelione GT, Rost B. Structural genomics is the largest contributor of novel structural leverage. J Struct Funct Genomics 2009; 10:181-91. [PMID: 19194785 PMCID: PMC2705706 DOI: 10.1007/s10969-008-9055-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 11/15/2008] [Accepted: 12/08/2008] [Indexed: 11/28/2022]
Abstract
The Protein Structure Initiative (PSI) at the US National Institutes of Health (NIH) is funding four large-scale centers for structural genomics (SG). These centers systematically target many large families without structural coverage, as well as very large families with inadequate structural coverage. Here, we report a few simple metrics that demonstrate how successfully these efforts optimize structural coverage: while the PSI-2 (2005-now) contributed more than 8% of all structures deposited into the PDB, it contributed over 20% of all novel structures (i.e. structures for protein sequences with no structural representative in the PDB on the date of deposition). The structural coverage of the protein universe represented by today’s UniProt (v12.8) has increased linearly from 1992 to 2008; structural genomics has contributed significantly to the maintenance of this growth rate. Success in increasing novel leverage (defined in Liu et al. in Nat Biotechnol 25:849–851, 2007) has resulted from systematic targeting of large families. PSI’s per-structure contribution to novel leverage was over 4-fold higher than that for non-PSI structural biology efforts during the past 8 years. If the success of the PSI continues, it may just take another ~15 years to cover most sequences in the current UniProt database.
Affiliation(s)
- Rajesh Nair
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
5
Mazumder R, Vasudevan S. Structure-guided comparative analysis of proteins: principles, tools, and applications for predicting function. PLoS Comput Biol 2008; 4:e1000151. [PMID: 18818720 PMCID: PMC2515338 DOI: 10.1371/journal.pcbi.1000151] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C., United States of America
- Sona Vasudevan
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C., United States of America
6
Structural genomics: from genes to structures with valuable materials and many questions in between. Nat Methods 2008; 5:129-32. [PMID: 18235432 DOI: 10.1038/nmeth0208-129] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.
7
Small-scale, semi-automated purification of eukaryotic proteins for structure determination. J Struct Funct Genomics 2007; 8:153-66. [PMID: 17985212 PMCID: PMC2668602 DOI: 10.1007/s10969-007-9032-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 08/07/2007] [Accepted: 10/16/2007] [Indexed: 11/07/2022]
Abstract
A simple approach that allows cost-effective automated purification of recombinant proteins in levels sufficient for functional characterization or structural studies is described. Studies with four human stem cell proteins, an engineered version of green fluorescent protein, and other proteins are included. The method combines an expression vector (pVP62K) that provides in vivo cleavage of an initial fusion protein, a factorial designed auto-induction medium that improves the performance of small-scale production, and rapid, automated metal affinity purification of His8-tagged proteins. For initial small-scale production screening, single colony transformants were grown overnight in 0.4 ml of auto-induction medium, produced proteins were purified using the Promega Maxwell 16, and purification results were analyzed by Caliper LC90 capillary electrophoresis. The yield of purified [U-15N]-His8-Tcl-1 was 7.5 μg/ml of culture medium, of purified [U-15N]-His8-GFP was 68 μg/ml, and of purified selenomethionine-labeled AIA–GFP (His8 removed by treatment with TEV protease) was 172 μg/ml. The yield information obtained from a successful automated purification from 0.4 ml was used to inform the decision to scale up for a second meso-scale (10–50 ml) cell growth and automated purification. 1H–15N NMR HSQC spectra of His8-Tcl-1 and of His8-GFP prepared from 50 ml cultures showed excellent chemical shift dispersion, consistent with well folded states in solution suitable for structure determination. Moreover, AIA–GFP obtained by proteolytic removal of the His8 tag was subjected to crystallization screening, and yielded crystals under several conditions. Single crystals were subsequently produced and optimized by the hanging drop method. The structure was solved by molecular replacement at a resolution of 1.7 Å.
This approach provides an efficient way to carry out several key target screening steps that are essential for successful operation of proteomics pipelines with eukaryotic proteins: examination of total expression, determination of proteolysis of fusion tags, quantification of the yield of purified protein, and suitability for structure determination.
8
Marsden RL, Lewis TA, Orengo CA. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics 2007; 8:86. [PMID: 17349043 PMCID: PMC1829165 DOI: 10.1186/1471-2105-8-86] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 10/31/2006] [Accepted: 03/09/2007] [Indexed: 11/25/2022]
Abstract
Background Structural genomics initiatives were established with the aim of solving protein structures on a large-scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is focussed towards structurally characterising protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. Results In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from both structurally uncharacterised domain families, whilst in addition, pursuing additional targets from large structurally characterised protein superfamilies. Conclusion This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
- Tony A Lewis
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
- Christine A Orengo
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
9
Watson JD, Sanderson S, Ezersky A, Savchenko A, Edwards A, Orengo C, Joachimiak A, Laskowski RA, Thornton JM. Towards fully automated structure-based function prediction in structural genomics: a case study. J Mol Biol 2007; 367:1511-22. [PMID: 17316683 PMCID: PMC2566530 DOI: 10.1016/j.jmb.2007.01.063] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 07/28/2006] [Revised: 01/23/2007] [Accepted: 01/24/2007] [Indexed: 10/23/2022]
Abstract
As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
Affiliation(s)
- James D Watson
- EMBL--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
10
Lee D, Grant A, Marsden RL, Orengo C. Identification and distribution of protein families in 120 completed genomes using Gene3D. Proteins 2006; 59:603-15. [PMID: 15768405 DOI: 10.1002/prot.20409] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
Using a new protocol, PFscape, we undertake a systematic identification of protein families and domain architectures in 120 complete genomes. PFscape clusters sequences into protein families using a Markov clustering algorithm (Enright et al., Nucleic Acids Res 2002;30:1575-1584) followed by complete linkage clustering according to sequence identity. Within each protein family, domains are recognized using a library of hidden Markov models comprising CATH structural and Pfam functional domains. Domain architectures are then determined using DomainFinder (Pearl et al., Protein Sci 2002;11:233-244) and the protein family and domain architecture data are amalgamated in the Gene3D database (Buchan et al., Genome Res 2002;12:503-514). Using Gene3D, we have investigated protein sequence space, the extent of structural annotation, and the distribution of different domain architectures in completed genomes from all kingdoms of life. As with earlier studies by other researchers, the distribution of domain families shows power-law behavior such that the largest 2,000 domain families can be mapped to approximately 70% of nonsingleton genome sequences; the remaining sequences are assigned to much smaller families. While approximately 50% of domain annotations within a genome are assigned to 219 universal domain families, a much smaller proportion (< 10%) of protein sequences are assigned to universal protein families. This supports the mosaic theory of evolution whereby domain duplication followed by domain shuffling gives rise to novel domain architectures that can expand the protein functional repertoire of an organism. Functional data (e.g. COG/KEGG/GO) integrated within Gene3D result in a comprehensive resource that is currently being used in structure genomics initiatives and can be accessed via http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/.
Affiliation(s)
- David Lee
- Biomolecular Structure and Modelling Group, Department of Biochemistry, University College London, Gower Street, London.
11
Noble WS, Kuang R, Leslie C, Weston J. Identifying remote protein homologs by network propagation. FEBS J 2005; 272:5119-28. [PMID: 16218946 DOI: 10.1111/j.1742-4658.2005.04947.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
Abstract
Perhaps the most widely used applications of bioinformatics are tools such as psi-blast for searching sequence databases. We describe a recently developed protein database search algorithm called rankprop. rankprop relies upon a precomputed network of pairwise protein similarities. The algorithm performs a diffusion operation from a specified query protein across the protein similarity network. The resulting activation scores, assigned to each database protein, encode information about the global structure of the protein similarity network. This type of algorithm has a rich history in associationist psychology, artificial intelligence and web search. We describe the rankprop algorithm and its relatives, and we provide evidence that the algorithm successfully improves upon the rankings produced by psi-blast.
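The diffusion step described in this abstract can be sketched as a simple iterative propagation over a weighted graph. This is a generic illustration, not the published rankprop update rule: the damping factor alpha, the fixed iteration count, the assumption that each row of the weight matrix sums to at most 1, and the names propagate and rank are all hypothetical choices.

```python
def propagate(weights, query, alpha=0.9, n_iter=100):
    """Spread activation from `query` across a protein similarity network.

    weights[i][j] is the edge weight from protein i to protein j. Each row
    is assumed to sum to at most 1, so the iteration settles for alpha < 1.
    """
    n = len(weights)
    # Seed activation: the query itself plus its direct neighbours.
    seed = list(weights[query])
    seed[query] = 1.0
    act = seed[:]
    for _ in range(n_iter):
        # Each protein keeps its seed and receives damped activation
        # flowing in from every protein that points to it.
        act = [
            seed[j] + alpha * sum(weights[i][j] * act[i] for i in range(n))
            for j in range(n)
        ]
    return act


def rank(activations, query):
    """Return database proteins ordered by decreasing activation."""
    order = sorted(range(len(activations)), key=lambda j: -activations[j])
    return [j for j in order if j != query]


# Toy network (hypothetical): 0 - 1 - 2 form a chain, 3 is isolated.
toy_weights = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
toy_activations = propagate(toy_weights, query=0)
toy_ranking = rank(toy_activations, query=0)
```

In the toy network, protein 2 has no direct edge to the query but still receives activation through protein 1, which is the kind of indirect evidence a purely pairwise ranking would miss.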
Affiliation(s)
- William S Noble
- Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
12
Glaser F, Morris RJ, Najmanovich RJ, Laskowski RA, Thornton JM. A method for localizing ligand binding pockets in protein structures. Proteins 2005; 62:479-88. [PMID: 16304646 DOI: 10.1002/prot.20769] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.1] [Indexed: 11/09/2022]
Abstract
The accurate identification of ligand binding sites in protein structures can be valuable in determining protein function. Once the binding site is known, it becomes easier to perform in silico and experimental procedures that may allow the ligand type and the protein function to be determined. For example, binding pocket shape analysis relies heavily on the correct localization of the ligand binding site. We have developed SURFNET-ConSurf, a modular, two-stage method for identifying the location and shape of potential ligand binding pockets in protein structures. In the first stage, the SURFNET program identifies clefts in the protein surface that are potential binding sites. In the second stage, these clefts are trimmed in size by cutting away regions distant from highly conserved residues, as defined by the ConSurf-HSSP database. The largest clefts that remain tend to be those where ligands bind. To test the approach, we analyzed a nonredundant set of 244 protein structures from the PDB and found that SURFNET-ConSurf identifies a ligand binding pocket in 75% of them. The trimming procedure reduces the original cleft volumes by 30% on average, while still encompassing an average 87% of the ligand volume. From the analysis of the results we conclude that for those cases in which the ligands are found in large, highly conserved clefts, the combined SURFNET-ConSurf method gives pockets that are a better match to the ligand shape and location. We also show that this approach works better for enzymes than for nonenzyme proteins.
Affiliation(s)
- Fabian Glaser
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
13
Chandonia JM, Kim SH, Brenner SE. Target selection and deselection at the Berkeley Structural Genomics Center. Proteins 2005; 62:356-70. [PMID: 16276528 DOI: 10.1002/prot.20674] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
14
Abstract
ProTarget is a Web-based tool for the automatic prediction of fold novelty. It offers the structural genomics community a method for target selection by providing an online analysis of any new or pre-existing sequence for its relationship to any previously solved three-dimensional structure. ProTarget takes as input an amino acid sequence. Regions of this sequence that exhibit high similarity to an existing PDB (Protein Data Bank) sequence are removed, leaving one or more subsequences. Each of these subsequences is then analyzed against a clustering of the protein space to determine the likelihood of its representing a new structural superfamily. This likelihood is derived from the distance in the clustering between the (sub)sequence and sequences that have known structures. The output of ProTarget is a graphical visualization of the protein of interest together with the likelihood that a protein sequence represents a novel structural superfamily. ProTarget is updated regularly and currently covers over 160 000 protein sequences from the SwissProt and PDB databases. ProTarget is available at .
Affiliation(s)
- Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, Hebrew University of Jerusalem, 91904 Israel
- To whom correspondence should be addressed. Tel: +972 2 6585425; Fax: +972 2 6586448
15
Nelson CA, Pekosz A, Lee CA, Diamond MS, Fremont DH. Structure and intracellular targeting of the SARS-coronavirus Orf7a accessory protein. Structure 2005; 13:75-85. [PMID: 15642263 PMCID: PMC7125549 DOI: 10.1016/j.str.2004.10.010] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.6] [Received: 09/16/2004] [Revised: 10/18/2004] [Accepted: 10/19/2004] [Indexed: 11/17/2022]
Abstract
The open reading frame (ORF) 7a of the SARS-associated coronavirus (SARS-CoV) encodes a unique type I transmembrane protein of unknown function. We have determined the 1.8 Å resolution crystal structure of the N-terminal ectodomain of orf7a, revealing a compact seven-stranded β sandwich unexpectedly similar in fold and topology to members of the Ig superfamily. We also demonstrate that, in SARS-CoV-infected cells, the orf7a protein is expressed and retained intracellularly. Confocal microscopy studies using orf7a and orf7a/CD4 chimeras implicate the short cytoplasmic tail and transmembrane domain in trafficking of the protein within the endoplasmic reticulum and Golgi network. Taken together, our findings provide a structural and cellular framework in which to explore the role of orf7a in SARS-CoV pathogenesis.
Affiliation(s)
- Christopher A. Nelson
- Department of Pathology and Immunology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Andrew Pekosz
- Department of Pathology and Immunology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Department of Molecular Microbiology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Chung A. Lee
- Department of Pathology and Immunology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Michael S. Diamond
- Department of Pathology and Immunology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Department of Molecular Microbiology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Department of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Daved H. Fremont
- Department of Pathology and Immunology, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, Missouri 63110
- Ph: (314) 747-6547; Fax: (314) 362-8888
16
Siew N, Fischer D. Structural Biology Sheds Light on the Puzzle of Genomic ORFans. J Mol Biol 2004; 342:369-73. [PMID: 15327940 DOI: 10.1016/j.jmb.2004.06.073] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Received: 03/02/2004] [Revised: 06/09/2004] [Accepted: 06/19/2004] [Indexed: 10/26/2022]
Abstract
Genomic ORFans are orphan open reading frames (ORFs) with no significant sequence similarity to other ORFs. ORFans comprise 20-30% of the ORFs of most completely sequenced genomes. Because nothing can be learnt about ORFans via sequence homology, the functions and evolutionary origins of ORFans remain a mystery. Furthermore, because relatively few ORFans have been experimentally characterized, it has been suggested that most ORFans are not likely to correspond to functional, expressed proteins, but rather to spurious ORFs, pseudo-genes or to rapidly evolving proteins with non-essential roles. As a snapshot view of current ORFan structural studies, we searched for ORFans among proteins whose three-dimensional structures have been recently determined. We find that functional and structural studies of ORFans are not as underemphasized as previously suggested. These recently determined structures correspond to ORFans from all Kingdoms of life, and include proteins that have previously been functionally characterized, as well as structural genomics targets of unknown function labeled as "hypothetical proteins". This suggests that many of the ORFans in the databases are likely to correspond to expressed, functional (and even essential) proteins. Furthermore, the recently determined structures include examples of the various types of ORFans, suggesting that the functions and evolutionary origins of ORFans are diverse. Although this survey sheds some light on the ORFan mystery, further experimental studies are required to gain a better understanding of the role and origins of the tens of thousands of ORFans awaiting characterization.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University Beer-Sheva 84105, Israel
17
Mariani SM. Conference report--structural genomics: parsing the architecture of proteins highlights of the ABRF 2004--integrating technologies in proteomics and genomics, February 28-March 2, 2004; Portland, Oregon. MedGenMed 2004; 6:22. [PMID: 15266248 PMCID: PMC1395765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]