1
Noble D, Blundell TL, Kohl P. Progress in biophysics and molecular biology: A brief history of the journal. Progress in Biophysics and Molecular Biology 2018; 140:1-4. [PMID: 30526959 DOI: 10.1016/j.pbiomolbio.2018.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Affiliation(s)
- Denis Noble
- Department of Physiology, Anatomy & Genetics, Parks Road, Oxford, OX1 3PT, UK.
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1GA, UK.
- Peter Kohl
- Institute for Experimental Cardiovascular Medicine, Faculty of Medicine, University of Freiburg, Elsasser Str 2Q, 90110, Freiburg, Germany.
2
Raja M. Special Interaction of Anionic Phosphatidic Acid Promotes High Secondary Structure in Tetrameric Potassium Channel. J Membr Biol 2014; 247:747-52. [DOI: 10.1007/s00232-014-9704-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/16/2014] [Accepted: 06/27/2014] [Indexed: 10/25/2022]
3
Wilson WW, Delucas LJ. Applications of the second virial coefficient: protein crystallization and solubility. Acta Crystallographica Section F-Structural Biology Communications 2014; 70:543-54. [PMID: 24817708 PMCID: PMC4014317 DOI: 10.1107/s2053230x1400867x] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 03/26/2014] [Accepted: 04/16/2014] [Indexed: 11/10/2022]
Abstract
This article begins by highlighting some of the ground-based studies emanating from NASA's Microgravity Protein Crystal Growth (PCG) program. This is followed by a more detailed discussion of the history of and the progress made in one of the NASA-funded PCG investigations involving the use of measured second virial coefficients (B values) as a diagnostic indicator of solution conditions conducive to protein crystallization. A second application of measured B values involves the determination of solution conditions that improve or maximize the solubility of aqueous and membrane proteins. These two important applications have led to several technological improvements that simplify the experimental expertise required, enable the measurement of membrane proteins and improve the diagnostic capability and measurement throughput.
Affiliation(s)
- Lawrence J Delucas
- Center for Structural Biology, University of Alabama at Birmingham, 1720 Second Avenue South, Birmingham, AL 35294, USA
4
Gana R, Rao S, Huang H, Wu C, Vasudevan S. Structural and functional studies of S-adenosyl-L-methionine binding proteins: a ligand-centric approach. BMC Structural Biology 2013; 13:6. [PMID: 23617634 PMCID: PMC3662625 DOI: 10.1186/1472-6807-13-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 10/29/2012] [Accepted: 04/09/2013] [Indexed: 12/31/2022]
Abstract
BACKGROUND The post-genomic era poses several challenges. The biggest is the identification of biochemical function for protein sequences and structures resulting from genomic initiatives. Most sequences lack a characterized function and are annotated as hypothetical or uncharacterized. While homology-based methods are useful, and work well for sequences with sequence identities above 50%, they fail for sequences in the twilight zone (<30%) of sequence identity. For cases where sequence methods fail, structural approaches are often used, based on the premise that structure preserves function for longer evolutionary time-frames than sequence alone. It is now clear that no single method can be used successfully for functional inference. Given the growing need for functional assignments, we describe here a systematic new approach, designated ligand-centric, which is primarily based on analysis of ligand-bound/unbound structures in the PDB. Results of applying our approach to S-adenosyl-L-methionine (SAM) binding proteins are presented. RESULTS Our analysis included 1,224 structures that belong to 172 unique families of the Protein Information Resource Superfamily system. Our ligand-centric approach was divided into four levels: residue, protein/domain, ligand, and family levels. The residue level included the identification of conserved binding site residues based on structure-guided sequence alignments of representative members of a family, and the identification of conserved structural motifs. The protein/domain level included structural classification of proteins, Pfam domains, domain architectures, and protein topologies. The ligand level included ligand conformations, ribose sugar puckering, and the identification of conserved ligand-atom interactions. The family level included phylogenetic analysis. CONCLUSION We found that SAM bound to a total of 18 different fold types (I-XVIII). 
We identified 4 new fold types and 11 additional topological arrangements of strands within the well-studied Rossmann fold Methyltransferases (MTases). This extends the existing structural classification of SAM binding proteins. A striking correlation between fold type and the conformation of the bound SAM (classified as types) was found across the 18 fold types. Several site-specific rules were created for the assignment of functional residues to families and proteins that do not have a bound SAM or a solved structure.
Affiliation(s)
- Rajaram Gana
- Department of Biostatistics and Bioinformatics, Georgetown University Medical Center, Washington, DC 20007, USA
5
Chandonia JM, Brenner S. Update on the pfam5000 strategy for selection of structural genomics targets. Conference Proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference 2012; 2006:751-5. [PMID: 17282292 DOI: 10.1109/iembs.2005.1616523] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Structural Genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good financial value, and tractable. In 2003, we presented the "Pfam5000" strategy, which involves selecting the 5,000 most important families from the Pfam database as sources for targets. In this update, we show that although both the Pfam database and the number of sequenced genomes have increased in size, the expected benefits of the Pfam5000 strategy have not changed substantially. Solving the structures of proteins from the 5,000 largest Pfam families would allow accurate fold assignment for approximately 65% of all prokaryotic proteins (covering 54% of residues) and 63% of eukaryotic proteins (42% of residues). Fewer than 2,300 of the largest families on this list remain to be solved, making the project feasible in the next five years given the expected throughput to be achieved in the production phase of the Protein Structure Initiative.
Affiliation(s)
- J-M Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
6
Ranjbar B, Gill P. Circular Dichroism Techniques: Biomolecular and Nanostructural Analyses- A Review. Chem Biol Drug Des 2009; 74:101-20. [DOI: 10.1111/j.1747-0285.2009.00847.x] [Citation(s) in RCA: 410] [Impact Index Per Article: 25.6] [Indexed: 11/30/2022]
7
Bertonati C, Punta M, Fischer M, Yachdav G, Forouhar F, Zhou W, Kuzin AP, Seetharaman J, Abashidze M, Ramelot TA, Kennedy MA, Cort JR, Belachew A, Hunt JF, Tong L, Montelione GT, Rost B. Structural genomics reveals EVE as a new ASCH/PUA-related domain. Proteins 2009; 75:760-73. [PMID: 19191354 PMCID: PMC4080787 DOI: 10.1002/prot.22287] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/07/2022]
Abstract
We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links.
Affiliation(s)
- Claudia Bertonati
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
8
Mazumder R, Vasudevan S. Structure-guided comparative analysis of proteins: principles, tools, and applications for predicting function. PLoS Comput Biol 2008; 4:e1000151. [PMID: 18818720 PMCID: PMC2515338 DOI: 10.1371/journal.pcbi.1000151] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C., United States of America
- Sona Vasudevan
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, D.C., United States of America
9
Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, Strausberg RL, Frazier M, Venter JC. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol 2007; 5:e16. [PMID: 17355171 PMCID: PMC1821046 DOI: 10.1371/journal.pbio.0050016] [Citation(s) in RCA: 535] [Impact Index Per Article: 29.7] [Received: 03/24/2006] [Accepted: 08/15/2006] [Indexed: 02/04/2023] Open
Abstract
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature. 
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the Sorcerer II Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature. The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature.
Affiliation(s)
- Shibu Yooseph
- J. Craig Venter Institute, Rockville, Maryland, United States of America.
10
Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007; 66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
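The leverage figures quoted in this abstract can be checked with simple arithmetic. The sketch below assumes that average leverage is just the number of quality models divided by the number of template structures; the function name `average_leverage` and the constant names are illustrative and are not taken from the NESG pipeline.

```python
def average_leverage(n_models: int, n_structures: int) -> float:
    """Mean number of quality models per solved template structure."""
    if n_structures <= 0:
        raise ValueError("n_structures must be positive")
    return n_models / n_structures


# Figures quoted in the abstract (the model counts are approximate there).
N_MODELS = 39_000     # sequences covered by quality models
N_STRUCTURES = 186    # unique NESG structures from the first PSI phase
N_STRINGENT = 7_900   # models at least 30% sequence-identical to the template

leverage = average_leverage(N_MODELS, N_STRUCTURES)
stringent_fraction = N_STRINGENT / N_MODELS

print(f"average leverage: {leverage:.0f} models per structure")
print(f"share of models meeting the 30% criterion: {stringent_fraction:.0%}")
```

On these inputs the ratio rounds to the 210 models per structure stated in the abstract, and roughly one model in five meets the 30% identity criterion.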
Affiliation(s)
- Nebojsa Mirkovic
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, New York 10021, USA
11
Watson JD, Sanderson S, Ezersky A, Savchenko A, Edwards A, Orengo C, Joachimiak A, Laskowski RA, Thornton JM. Towards fully automated structure-based function prediction in structural genomics: a case study. J Mol Biol 2007; 367:1511-22. [PMID: 17316683 PMCID: PMC2566530 DOI: 10.1016/j.jmb.2007.01.063] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 07/28/2006] [Revised: 01/23/2007] [Accepted: 01/24/2007] [Indexed: 10/23/2022]
Abstract
As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.
Affiliation(s)
- James D Watson
- EMBL--European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
12
Chandonia JM, Kim SH. Structural proteomics of minimal organisms: conservation of protein fold usage and evolutionary implications. BMC Structural Biology 2006; 6:7. [PMID: 16566839 PMCID: PMC1488858 DOI: 10.1186/1472-6807-6-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 10/20/2005] [Accepted: 03/28/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Determining the complete repertoire of protein structures for all soluble, globular proteins in a single organism has been one of the major goals of several structural genomics projects in recent years. RESULTS We report that this goal has nearly been reached for several "minimal organisms"--parasites or symbionts with reduced genomes--for which over 95% of the soluble, globular proteins may now be assigned folds, overall 3-D backbone structures. We analyze the structures of these proteins as they relate to cellular functions, and compare conservation of fold usage between functional categories. We also compare patterns in the conservation of folds among minimal organisms and those observed between minimal organisms and other bacteria. CONCLUSION We find that proteins performing essential cellular functions closely related to transcription and translation exhibit a higher degree of conservation in fold usage than proteins in other functional categories. Folds related to transcription and translation functional categories were also overrepresented in minimal organisms compared to other bacteria.
Affiliation(s)
- John-Marc Chandonia
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Sung-Hou Kim
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Chemistry, University of California, Berkeley, CA 94720, USA
13
Chandonia JM, Brenner SE. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches. Proteins 2006; 58:166-79. [PMID: 15521074 DOI: 10.1002/prot.20298] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Indexed: 11/11/2022]
Abstract
Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the "Pfam5000" strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These strategies include complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at the European Bioinformatics Institute (EBI). Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68% of all prokaryotic proteins (covering 59% of residues) and 61% of eukaryotic proteins (40% of residues). More fine-grained coverage that would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example, to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. 
In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: A significant fraction (about 30-40% of the proteins and 40-60% of the residues) of each proteome is classified in small families, which may have little overlap with other species of interest. Random selection of targets from one or more genomes is similar to the Pfam5000 strategy in that proteins from larger families are more likely to be chosen, but substantial effort would be spent on small families.
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
14
Chandonia JM, Kim SH, Brenner SE. Target selection and deselection at the Berkeley Structural Genomics Center. Proteins 2005; 62:356-70. [PMID: 16276528 DOI: 10.1002/prot.20674] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.
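The coverage percentages in this abstract follow directly from the protein counts given in parentheses. The short sketch below recomputes them, assuming each percentage is the count divided by the proteome size and rounded to a whole number; the dictionary layout and the measure labels (`annotated`, `modeled`) are illustrative choices, not terms from the paper.

```python
def percent(covered: int, total: int) -> int:
    """Coverage as a whole-number percentage."""
    return round(100 * covered / total)


# Proteome sizes quoted in the abstract.
PROTEOME_SIZE = {"M. pneumoniae": 687, "M. genitalium": 486}

# (organism, measure) -> (count at present, count if all current targets were solved)
COUNTS = {
    ("M. pneumoniae", "annotated"): (391, 550),
    ("M. pneumoniae", "modeled"): (254, 438),
    ("M. genitalium", "annotated"): (348, 371),
    ("M. genitalium", "modeled"): (243, 283),
}

coverage = {
    key: (
        percent(now, PROTEOME_SIZE[key[0]]),
        percent(projected, PROTEOME_SIZE[key[0]]),
    )
    for key, (now, projected) in COUNTS.items()
}

for (organism, measure), (now, projected) in coverage.items():
    print(f"{organism:14s} {measure:10s} {now}% -> {projected}%")
```

The recomputed values agree with the rounded percentages stated in the abstract for both proteomes.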
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
15
Abstract
During the second half of the 20th century, biochemistry and subsequently molecular biology blossomed into the core upon which all biological and biomedical sciences now depend. A major part of these closely related disciplines has been the study of the structure and function of proteins and the diverse biological functions that they perform. Early experimentation necessarily focused on individual entities, selected mainly for their activities, but as technology improved there developed a tendency to look at proteins as larger, interactive groups or clusters. Spurred by the recent exponential production of genomic sequence data for a rapidly increasing number of species, protein chemistry has now evolved into a new discipline, proteomics. In addition to embracing the methods and approaches that have served protein scientists well in the past, it includes, and is perhaps best defined by, high-throughput analyses based in large part on 2D gel electrophoresis, MALDI and ESI mass spectrometry and combinatorial arrays. Proteomic targets include the identification of all genome products and a mapping of their interactions and expression profiles. These hold great promise for the identification of disease markers and drug targets, but are not without their challenges and pitfalls.
Affiliation(s)
- Ralph A Bradshaw
- Department of Biochemistry, University of Cambridge, Cambridge, UK.
16
Apic G, Huber W, Teichmann SA. Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination. Journal of Structural and Functional Genomics 2004; 4:67-78. [PMID: 14649290 DOI: 10.1023/a:1026113408773] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
There is a limited repertoire of domain families in nature that are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law like that of individual families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry and interaction between the domains will help understand the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.
Affiliation(s)
- Gordana Apic
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|
17
|
Wallace BA, Lees JG, Orry AJW, Lobley A, Janes RW. Analyses of circular dichroism spectra of membrane proteins. Protein Sci 2003; 12:875-84. [PMID: 12649445 PMCID: PMC2323856 DOI: 10.1110/ps.0229603] [Citation(s) in RCA: 124] [Impact Index Per Article: 5.6] [Received: 08/28/2002] [Revised: 12/19/2002] [Accepted: 12/19/2002] [Indexed: 10/27/2022]
Abstract
Circular dichroism (CD) spectroscopy is a valuable technique for the determination of protein secondary structures. Many linear and nonlinear algorithms have been developed for the empirical analysis of CD data, using reference databases derived from proteins of known structures. To date, the reference databases used by the various algorithms have all been derived from the spectra of soluble proteins. When applied to the analysis of soluble protein spectra, these methods generally produce calculated secondary structures that correspond well with crystallographic structures. In this study, however, it was shown that when applied to membrane protein spectra, the resulting calculations produce considerably poorer results. One source of this discrepancy may be the altered spectral peak positions (wavelength shifts) of membrane proteins due to the different dielectric of the membrane environment relative to that of water. These results have important consequences for studies that seek to use the existing soluble protein reference databases for the analyses of membrane proteins.
Affiliation(s)
- B A Wallace
- Department of Crystallography, Birkbeck College, University of London, London WC1E 7HX, UK.
|
18
|
Yang JK, Park MS, Waldo GS, Suh SW. Directed evolution approach to a structural genomics project: Rv2002 from Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 2003; 100:455-60. [PMID: 12524453 PMCID: PMC141016 DOI: 10.1073/pnas.0137017100] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] [Open Access]
Abstract
One of the serious bottlenecks in structural genomics projects is overexpression of the target proteins in soluble form. We have applied the directed evolution technique and prepared soluble mutants of the Mycobacterium tuberculosis Rv2002 gene product, the wild type of which had been expressed as inclusion bodies in Escherichia coli. A triple mutant I6T/V47M/T69K (Rv2002-M3) was chosen for structural and functional characterizations. Enzymatic assays indicate that the Rv2002-M3 protein has a high catalytic activity as a NADH-dependent 3α,20β-hydroxysteroid dehydrogenase. We have determined the crystal structures of a binary complex with NAD+ and a ternary complex with androsterone and NADH. The structure reveals that Asp-38 determines the cofactor specificity. The catalytic site includes the triad Ser-140, Tyr-153 and Lys-157. Additionally, it has an unusual feature, Glu-142. Enzymatic assays of the E142A mutant of Rv2002-M3 indicate that Glu-142 reverses the effect of Lys-157 in influencing the pKa of Tyr-153. This study suggests that the Rv2002 gene product is a unique member of the SDR family and is likely to be involved in steroid metabolism in M. tuberculosis. Our work demonstrates the power of the directed evolution technique as a general way of overcoming the difficulties in overexpressing the target proteins in soluble form.
Affiliation(s)
- Jin Kuk Yang
- Structural Proteomics Laboratory, School of Chemistry and Molecular Engineering, Seoul National University, Seoul 151-742, South Korea
|
19
|
Watson JD, Todd AE, Bray J, Laskowski RA, Edwards A, Joachimiak A, Orengo CA, Thornton JM. Target selection and determination of function in structural genomics. IUBMB Life 2003; 55:249-55. [PMID: 12880206 PMCID: PMC3366504 DOI: 10.1080/1521654031000123385] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 10/26/2022]
Abstract
The first crucial step in any structural genomics project is the selection and prioritization of target proteins for structure determination. There may be a number of selection criteria to be satisfied, including that the proteins have novel folds, that they be representatives of large families for which no structure is known, and so on. The better the selection at this stage, the greater is the value of the structures obtained at the end of the experimental process. This value can be further enhanced once the protein structures have been solved if the functions of the given proteins can also be determined. Here we describe the methods used at either end of the experimental process: firstly, sensitive sequence comparison techniques for selecting a high-quality list of target proteins, and secondly the various computational methods that can be applied to the eventual 3D structures to determine the most likely biochemical function of the proteins in question.
Affiliation(s)
- James D Watson
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
20
|
Abstract
High-throughput sequencing of human genomes and those of important model organisms (mouse, Drosophila melanogaster, Caenorhabditis elegans, fungi, archaea) and bacterial pathogens has laid the foundation for another "big science" initiative in biology. Together, X-ray crystallographers, nuclear magnetic resonance (NMR) spectroscopists, and computational biologists are pursuing high-throughput structural studies aimed at developing a comprehensive three-dimensional view of the protein structure universe. The new science of structural genomics promises more than 10,000 experimental protein structures and millions of calculated homology models of related proteins. The evolutionary underpinnings and technological challenges of automating target selection, protein expression and purification, sample preparation, NMR and X-ray data measurement/analysis, homology modeling, and structure/function annotation are discussed in detail. An informative case study from one of the structural genomics centers funded by the National Institutes of Health and the National Institute of General Medical Sciences (NIH/NIGMS) demonstrates how this experimental/computational pipeline will reveal important links between form and function in biology and provide new insights into evolution and human health and disease.
Affiliation(s)
- Stephen K Burley
- Howard Hughes Medical Institute, Laboratories of Molecular Biophysics, The Rockefeller University, New York, New York 10021, USA.
|
21
|
Abstract
Over the last decade, structural biologists have unravelled many proteins that appear natively disordered. Common assumptions are that many of these proteins adopt structure through binding and that the structural flexibility enables them to adopt different functions. Here, we investigated regions of more than 70 sequence-consecutive residues that have no regular secondary structure (NORS). Analysing 31 entirely sequenced organisms, we predicted five times as many proteins with NORS regions (loopy proteins) in eukaryotes (20%) as in prokaryotes and archaea (4%). Thousands of these NORS regions were over 150 residues long. The amino acid composition of NORS regions differed from that of loops in PDB. Although NORS proteins had significantly more residues in low-complexity regions than other proteins, simple cut-off thresholds for sequence bias missed most NORS regions. On average, NORS regions were evolutionarily at least as conserved as their flanking regions. Furthermore, yeast proteins with NORS regions had more protein-protein interaction partners than other proteins. Regulatory and transcription-related functions were over-represented in loopy proteins; biosynthesis and energy metabolism were under-represented. Overall, our analysis confirmed that proteins with non-regular structures appear to play important functional roles, and they may adopt as yet unknown types of protein structures.
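The NORS definition in this abstract (more than 70 consecutive residues without regular secondary structure) can be sketched as a scan over a per-residue secondary-structure string. This is a simplified reading of the abstract only: the function name `find_nors_regions` and the encoding (`H` for helix, `E` for strand, any other character for non-regular) are assumptions, and the published method may use a more tolerant criterion.

```python
import re


def find_nors_regions(ss, min_len=71):
    """Return half-open (start, end) index pairs of candidate NORS regions.

    ss: per-residue secondary-structure string; 'H' = helix, 'E' = strand,
        anything else is treated as non-regular (assumed encoding).
    min_len: minimum run length; 71 encodes "more than 70" residues.
    """
    pattern = re.compile(r"[^HE]{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(ss)]


# Toy string: 10 helix residues, an 80-residue loop, a short strand, a 30-residue loop.
example = "H" * 10 + "L" * 80 + "E" * 5 + "L" * 30
regions = find_nors_regions(example)
print(regions)  # only the 80-residue run qualifies
```

Strictly requiring zero helix or strand residues is the most literal interpretation; a window-based threshold would be needed to tolerate isolated predicted helix or strand residues inside a long loop.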
Affiliation(s)
- Jinfeng Liu
- Department of Pharmacology, Columbia University, New York, NY 10032, USA
|
22
|
Abstract
Pharmacogenomics requires the integration and analysis of genomic, molecular, cellular, and clinical data, and it thus offers a remarkable set of challenges to biomedical informatics. These include infrastructural challenges such as the creation of data models and databases for storing these data, the integration of these data with external databases, the extraction of information from natural language text, and the protection of databases with sensitive information. There are also scientific challenges in creating tools to support gene expression analysis, three-dimensional structural analysis, and comparative genomic analysis. In this review, we summarize the current uses of informatics within pharmacogenomics and show how the technical challenges that remain for biomedical informatics are typical of those that will be confronted in the postgenomic era.
Affiliation(s)
- Russ B Altman
- Stanford Medical Informatics, Stanford, California 94305-5479, USA.
|
23
|
Getz G, Vendruscolo M, Sachs D, Domany E. Automated assignment of SCOP and CATH protein structure classifications from FSSP scores. Proteins 2002; 46:405-15. [PMID: 11835515 DOI: 10.1002/prot.1176] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification is assigned to the fold level. Because the FSSP database is updated weekly, this method also makes it possible to update CATH and SCOP with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic in any protein structure classification that relies on structural information alone. Hence, we introduce the "twilight zone for structure classification." We further suggest that to resolve these ambiguous cases, other criteria of classification, based also on information about sequence and function, must be used.
Affiliation(s)
- Gad Getz
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
24
|
Blundell TL, Jhoti H, Abell C. High-throughput crystallography for lead discovery in drug design. Nat Rev Drug Discov 2002; 1:45-54. [PMID: 12119609 DOI: 10.1038/nrd706] [Citation(s) in RCA: 348] [Impact Index Per Article: 15.1] [Indexed: 11/09/2022]
Abstract
Knowledge of the three-dimensional structures of protein targets now emerging from genomic data has the potential to accelerate drug discovery greatly. X-ray crystallography is the most widely used technique for protein structure determination, but technical challenges and time constraints have traditionally limited its use primarily to lead optimization. Here, we describe how significant advances in process automation and informatics have aided the development of high-throughput X-ray crystallography, and discuss the use of this technique for structure-based lead discovery.
Affiliation(s)
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK.
|
25
|
Tian F, Valafar H, Prestegard JH. A dipolar coupling based strategy for simultaneous resonance assignment and structure determination of protein backbones. J Am Chem Soc 2001; 123:11791-6. [PMID: 11716736 DOI: 10.1021/ja011806h] [Citation(s) in RCA: 94] [Impact Index Per Article: 3.9] [Indexed: 11/28/2022]
Abstract
A new approach for simultaneous protein backbone resonance assignment and structure determination by NMR is introduced. This approach relies on recent advances in high-resolution NMR spectroscopy that allow observation of anisotropic interactions, such as dipolar couplings, from proteins partially aligned in field-ordered media. Residual dipolar couplings are used for both geometric information and a filter in the assembly of residues in a sequential manner. Experimental data were collected in less than one week on a small redox protein, rubredoxin, that was 15N-enriched but not enriched above 1% natural abundance in 13C. Given the acceleration possible with partial 13C enrichment, the protocol described should provide a very rapid route to protein structure determination. This is critical for the structural genomics initiative where protein expression and structural determination in a high-throughput manner will be needed.
Affiliation(s)
- F Tian
- Southeast Collaboratory for Structural Genomics, University of Georgia, Athens, Georgia 30602-4712, USA
|
26
|
Wallace BA, Janes RW. Synchrotron radiation circular dichroism spectroscopy of proteins: secondary structure, fold recognition and structural genomics. Curr Opin Chem Biol 2001; 5:567-71. [PMID: 11578931 DOI: 10.1016/s1367-5931(00)00243-x] [Citation(s) in RCA: 91] [Impact Index Per Article: 3.8] [Indexed: 11/16/2022]
Abstract
Recent developments in instrumentation and bioinformatics show that the technique of synchrotron radiation circular dichroism spectroscopy can provide novel information on protein secondary structures and folding motifs, and has the potential to play an important role in structural genomics studies, both as a means of target selection and as a high-throughput, low-sample-requiring screening method. This is possible because of the additional information content in the low-wavelength vacuum ultraviolet data obtainable with intense synchrotron radiation light sources, compared with that present in spectra from conventional lab-based circular dichroism instruments.
Affiliation(s)
- B A Wallace
- School of Crystallography, Birkbeck College, University of London, London WC1E 7HX, UK.
|
27
|
Abstract
Structural genomics projects aim to provide an experimental or computational three-dimensional model structure for all of the tractable macromolecules that are encoded by complete genomes. To this end, pilot centres worldwide are now exploring the feasibility of large-scale structure determination. Their experimental structures and computational models are expected to yield insight into the molecular function and mechanism of thousands of proteins. The pervasiveness of this information is likely to change the use of structure in molecular biology and biochemistry.
Affiliation(s)
- S E Brenner
- Department of Plant and Microbial Biology, University of California, 461A Koshland Hall, Berkeley, California 94720-3102, USA.
|
28
|
Abstract
Following the complete genome sequencing of an increasing number of organisms, structural biology is engaging in a systematic approach of high-throughput structure determination called structural genomics to create a complete inventory of protein folds/structures that will help predict functions for all proteins. First results show that structural genomics will be highly effective in finding functional annotations for proteins of unknown function.
Affiliation(s)
- P R Mittl
- Institute of Biochemistry, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
|
29
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447210 DOI: 10.1002/cfg.57] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] [Open Access]
|