1
Lipsh-Sokolik R, Fleishman SJ. Addressing epistasis in the design of protein function. Proc Natl Acad Sci U S A 2024; 121:e2314999121. [PMID: 39133844 PMCID: PMC11348311 DOI: 10.1073/pnas.2314999121] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/29/2024]
Abstract
Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.
Affiliation(s)
- Rosalie Lipsh-Sokolik
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7610001, Israel
2
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023]
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36% and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
3
Exploring the effect of tethered domains on the folding of Grb2 protein. Arch Biochem Biophys 2022; 731:109444. [DOI: 10.1016/j.abb.2022.109444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 09/27/2022] [Accepted: 10/14/2022] [Indexed: 11/17/2022]
4
Bordin N, Sillitoe I, Lees JG, Orengo C. Tracing Evolution Through Protein Structures: Nature Captured in a Few Thousand Folds. Front Mol Biosci 2021; 8:668184. [PMID: 34041266 PMCID: PMC8141709 DOI: 10.3389/fmolb.2021.668184] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/15/2021] [Accepted: 04/27/2021] [Indexed: 11/13/2022]
Abstract
This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.
Affiliation(s)
- Nicola Bordin
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
- Jonathan G Lees
- Department of Biological and Medical Sciences, Faculty of Health and Life Sciences, Oxford Brookes University, Oxford, United Kingdom
- Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
5
Chu X, Suo Z, Wang J. Investigating the trade-off between folding and function in a multidomain Y-family DNA polymerase. eLife 2020; 9:60434. [PMID: 33079059 PMCID: PMC7641590 DOI: 10.7554/elife.60434] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/26/2020] [Accepted: 10/16/2020] [Indexed: 01/01/2023]
Abstract
The way in which multidomain proteins fold has been a puzzling question for decades. Until now, the mechanisms and functions of domain interactions involved in multidomain protein folding have been obscure. Here, we develop structure-based models to investigate the folding and DNA-binding processes of the multidomain Y-family DNA polymerase IV (DPO4). We uncover shifts in the folding mechanism among ordered domain-wise folding, backtracking folding, and cooperative folding, modulated by interdomain interactions. These lead to ‘U-shaped’ DPO4 folding kinetics. We characterize the effects of interdomain flexibility on the promotion of DPO4–DNA (un)binding, which probably contributes to the ability of DPO4 to bypass DNA lesions, which is a known biological role of Y-family polymerases. We suggest that the native topology of DPO4 leads to a trade-off between fast, stable folding and tight functional DNA binding. Our approach provides an effective way to quantitatively correlate the roles of protein interactions in conformational dynamics at the multidomain level.
Affiliation(s)
- Xiakun Chu
- Department of Chemistry, State University of New York at Stony Brook, New York, United States
- Zucai Suo
- Department of Biomedical Sciences, College of Medicine, Florida State University, Tallahassee, United States
- Jin Wang
- Department of Chemistry, State University of New York at Stony Brook, New York, United States
6
Kumar V, Chaudhuri TK. Spontaneous refolding of the large multidomain protein malate synthase G proceeds through misfolding traps. J Biol Chem 2018; 293:13270-13283. [PMID: 29959230 DOI: 10.1074/jbc.ra118.003903] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/09/2018] [Revised: 06/28/2018] [Indexed: 11/06/2022]
Abstract
Most protein folding studies until now have focused on single-domain or truncated proteins. Although great insight into the folding of such systems has accumulated, very little is known about proteins containing multiple domains. It has been shown that the high stability of domains, in conjunction with inter-domain interactions, manifests as a frustrated energy landscape, causing complexity in the global folding pathway. However, multidomain proteins, despite containing independently foldable, loosely cooperative sections, can fold into native states with amazing speed and accuracy. To understand the complexity in mechanism, studies were conducted previously on the multidomain protein malate synthase G (MSG), an enzyme of the glyoxylate pathway with four distinct and adjacent domains. It was shown that the protein refolds to a functionally active intermediate state at a fast rate, which slowly produces the native state. Although experiments decoded the nature of the intermediate, a full description of the folding pathway was not elucidated. In this study, we use a battery of biophysical techniques to examine the protein's folding pathway. By using multiprobe kinetics studies and comparison with the equilibrium behavior of the protein against urea, we demonstrate that the unfolded polypeptide undergoes conformational compaction to a misfolded intermediate within milliseconds of refolding. The misfolded product appears to be stabilized under moderate denaturant concentrations. Further folding of the protein produces a stable intermediate, which undergoes partial unfolding-assisted large segmental rearrangements to achieve the native state. This study reveals an evolved folding pathway of the multidomain protein MSG, which involves surpassing the multiple misfolding traps during refolding.
Affiliation(s)
- Vipul Kumar
- Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, New Delhi 110016, India
- Tapan K Chaudhuri
- Kusuma School of Biological Sciences, Indian Institute of Technology, Delhi, New Delhi 110016, India
7
Crystal structure of an ASCH protein from Zymomonas mobilis and its ribonuclease activity specific for single-stranded RNA. Sci Rep 2017; 7:12303. [PMID: 28951575 PMCID: PMC5615036 DOI: 10.1038/s41598-017-12186-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/03/2017] [Accepted: 09/05/2017] [Indexed: 01/29/2023]
Abstract
Activating signal cointegrator-1 homology (ASCH) domains were initially reported in humans as a part of the ASC-1 transcriptional regulator, a component of a putative RNA-interacting protein complex; their presence has now been confirmed in a wide range of organisms. Here, we have determined the trigonal and monoclinic crystal structures of an ASCH domain-containing protein from Zymomonas mobilis (ZmASCH), and analyzed the structural determinants of its nucleic acid processing activity. The protein has a central β-barrel structure with several nearby α-helices. Positively charged surface patches form a cleft that runs through the pocket formed between the β-barrel and the surrounding α-helices. We further demonstrate by means of in vitro assays that ZmASCH binds nucleic acids, and degrades single-stranded RNAs in a magnesium ion-dependent manner with a cleavage preference for the phosphodiester bond between pyrimidine and adenine nucleotides. ZmASCH also removes a nucleotide at the 5′-end. Mutagenesis studies, guided by molecular dynamics simulations, confirmed that three residues (Tyr47, Lys53, and Ser128) situated in the cleft contribute to nucleic acid-binding and RNA cleavage activities. These structural and biochemical studies imply that prokaryotic ASCH may function to control the amount of cellular RNA.
8
Molecular Modeling and Its Applications in Protein Engineering. Synth Biol (Oxf) 2016. [DOI: 10.1007/978-3-319-22708-5_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
9
Cui X, Naveed H, Gao X. Finding optimal interaction interface alignments between biological complexes. Bioinformatics 2015; 31:i133-41. [PMID: 26072475 PMCID: PMC4765866 DOI: 10.1093/bioinformatics/btv242] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/02/2022]
Abstract
Motivation: Biological molecules perform their functions through interactions with other molecules. Structure alignment of interaction interfaces between biological complexes is an indispensable step in detecting their structural similarities, which are keys to understanding their evolutionary histories and functions. Although various structure alignment methods have been developed to successfully assess the similarities of protein structures or certain types of interaction interfaces, existing alignment tools cannot directly align arbitrary types of interfaces formed by protein, DNA or RNA molecules. Specifically, they require a ‘blackbox preprocessing’ to standardize interface types and chain identifiers. Yet their performance is limited and sometimes unsatisfactory. Results: Here we introduce a novel method, PROSTA-inter, that automatically determines and aligns interaction interfaces between two arbitrary types of complex structures. Our method uses sequentially remote fragments to search for the optimal superimposition. The optimal residue matching problem is then formulated as a maximum weighted bipartite matching problem to detect the optimal sequence order-independent alignment. Benchmark evaluation on all non-redundant protein–DNA complexes in PDB shows significant performance improvement of our method over TM-align and iAlign (with the ‘blackbox preprocessing’). Two case studies where our method discovers, for the first time, structural similarities between two pairs of functionally related protein–DNA complexes are presented. We further demonstrate the power of our method on detecting structural similarities between a protein–protein complex and a protein–RNA complex, which is biologically known as a protein–RNA mimicry case. Availability and implementation: The PROSTA-inter web-server is publicly available at http://www.cbrc.kaust.edu.sa/prosta/. Contact: xin.gao@kaust.edu.sa
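The residue-matching step mentioned in this abstract, a maximum weighted bipartite matching, can be illustrated with a minimal sketch. This is not the PROSTA-inter implementation: the score matrix, the function name `best_matching`, and the brute-force search over permutations are all assumptions made for illustration, and the approach is only practical for a handful of residues. It also assumes non-negative weights, so that matching every residue on the smaller side never lowers the total.

```python
from itertools import permutations


def best_matching(weights):
    """Return (total_weight, pairs) for a maximum weighted bipartite matching.

    weights[i][j] is a hypothetical, non-negative similarity score between
    residue i of interface A and residue j of interface B.
    Requires len(A) <= len(B). Brute force: every injective assignment of
    A-residues to B-residues is tried, so keep the inputs tiny.
    """
    n_a, n_b = len(weights), len(weights[0])
    if n_a > n_b:
        raise ValueError("expected no more rows than columns")
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_b), n_a):
        total = sum(weights[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


# Hypothetical integer scores. A greedy choice would pair residue 0 with
# residue 0 (score 9) first and end up with a lower total than the optimum.
weights = [
    [9, 8, 1],
    [8, 1, 1],
    [1, 2, 7],
]
total, pairs = best_matching(weights)
print(total, pairs)
```

In this toy input the best matching pairs residue 0 with 1 and residue 1 with 0, so the matched indices cross. That is the sense in which such a matching is independent of sequence order. For realistic interface sizes a polynomial-time assignment algorithm would replace the permutation search.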
Affiliation(s)
- Xuefeng Cui
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Hammad Naveed
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
10
Wilson WW, Delucas LJ. Applications of the second virial coefficient: protein crystallization and solubility. Acta Crystallographica Section F: Structural Biology Communications 2014; 70:543-54. [PMID: 24817708 PMCID: PMC4014317 DOI: 10.1107/s2053230x1400867x] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 03/26/2014] [Accepted: 04/16/2014] [Indexed: 11/10/2022]
Abstract
This article begins by highlighting some of the ground-based studies emanating from NASA's Microgravity Protein Crystal Growth (PCG) program. This is followed by a more detailed discussion of the history of and the progress made in one of the NASA-funded PCG investigations involving the use of measured second virial coefficients (B values) as a diagnostic indicator of solution conditions conducive to protein crystallization. A second application of measured B values involves the determination of solution conditions that improve or maximize the solubility of aqueous and membrane proteins. These two important applications have led to several technological improvements that simplify the experimental expertise required, enable the measurement of membrane proteins and improve the diagnostic capability and measurement throughput.
Affiliation(s)
- Lawrence J Delucas
- Center for Structural Biology, University of Alabama at Birmingham, 1720 Second Avenue South, Birmingham, AL 35294, USA
11
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects -- to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
- Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
12
Vyas VK, Ukawala RD, Ghate M, Chintha C. Homology modeling a fast tool for drug discovery: current perspectives. Indian J Pharm Sci 2012. [PMID: 23204616 PMCID: PMC3507339 DOI: 10.4103/0250-474x.102537] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Indexed: 11/22/2022]
Abstract
A major goal of structural biology involves the formation of protein-ligand complexes, in which the protein molecules act energetically in the course of binding. Therefore, an understanding of protein-ligand interactions is very important for structure-based drug design. Lack of knowledge of 3D structures has hindered efforts to understand the binding specificities of ligands with proteins. With the increase in modeling software and the growing number of known protein structures, homology modeling is rapidly becoming the method of choice for obtaining 3D coordinates of proteins. Homology modeling is a representation of the similarity of environmental residues at topologically corresponding positions in the reference proteins. In the absence of experimental data, model building on the basis of a known 3D structure of a homologous protein is at present the only reliable method to obtain structural information. Knowledge of the 3D structures of proteins provides invaluable insights into the molecular basis of their functions. Recent advances in homology modeling, particularly in detecting and aligning sequences with template structures, detecting distant homologues, modeling loops and side chains, and detecting errors in a model, have contributed to consistent prediction of protein structure, which was not possible even several years ago. This review focuses on the features and role of homology modeling in predicting protein structure and describes current developments in this field, with successful applications at different stages of drug design and discovery.
Affiliation(s)
- V K Vyas
- Department of Pharmaceutical Chemistry, Institute of Pharmacy, Nirma University, Ahmedabad-382 481, India
13
Rathankar N, Nirmala KA, Khanduja V, Nagendra HG. Identification of potential drug targets implicated in Parkinson's disease from human genome: insights of using fused domains in hypothetical proteins as probes. ISRN Neurology 2011; 2011:265253. [PMID: 22389811 PMCID: PMC3263550 DOI: 10.5402/2011/265253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/20/2011] [Accepted: 05/21/2011] [Indexed: 12/31/2022]
Abstract
High-throughput genome sequencing has led to a data explosion in sequence databanks, with an imbalance of sequence-structure-function relationships, resulting in a substantial fraction of proteins known as hypothetical proteins. Functions of such proteins can be assigned based on the analysis and characterization of the domains that they are made up of. Domains are basic evolutionary units of proteins and most proteins contain multiple domains. A subset of multidomain proteins has fused domains (overlapping domains), wherein sequence overlaps between two or more domains occur. These fused domains are a result of gene fusion events and their implication in diseases is well established. Hence, an attempt has been made in this paper to identify the fused domain-containing hypothetical proteins from the human genome that are homologous to parkinsonian targets present in the KEGG database. The results of this research identified 18 hypothetical proteins, with domains fused with ubiquitin domains and having homology with targets present in the parkinsonian pathway.
Affiliation(s)
- N Rathankar
- Department of Bioinformatics, School of Bioengineering, SRM University, Kattankulathur, Tamil Nadu 603 203, India
14
Parbhoo N, Stoychev SH, Fanucchi S, Achilonu I, Adamson RJ, Fernandes M, Gildenhuys S, Dirr HW. A Conserved Interdomain Interaction Is a Determinant of Folding Cooperativity in the GST Fold. Biochemistry 2011; 50:7067-75. [DOI: 10.1021/bi2006509] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Affiliation(s)
- Nishal Parbhoo
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Stoyan H. Stoychev
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Sylvia Fanucchi
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Ikechukwu Achilonu
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Roslin J. Adamson
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Manuel Fernandes
- School of Chemistry, University of the Witwatersrand, Johannesburg 2050, South Africa
- Samantha Gildenhuys
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Heini W. Dirr
- Protein Structure-Function Research Unit, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
15
Sikora M, Cieplak M. Mechanical stability of multidomain proteins and novel mechanical clamps. Proteins 2011; 79:1786-99. [PMID: 21465555 DOI: 10.1002/prot.23001] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 11/02/2010] [Revised: 01/04/2011] [Accepted: 01/10/2011] [Indexed: 11/12/2022]
Abstract
We estimate the size of mechanostability for 318 multidomain proteins which are single-chain and contain up to 1021 amino acids. We predict the existence of novel types of mechanical clamps in which interdomain contacts play an essential role. Mechanical clamps are structural regions which are the primary source of a protein's resistance to pulling. Among these clamps there is one that opposes tensile stress due to two domains swinging apart. This movement strains and then ruptures the contacts that hold the two domains together. Another clamp also involves tensile stress but it originates from an immobilization of a structural region by a surrounding knot-loop (without involving any disulfide bonds). Still another mechanism involves shear between helical regions belonging to two domains. We also consider the amyloid-prone cystatin C, which provides an example of a two-chain 3D domain-swapped protein. We predict that this protein should withstand remarkably large stress, perhaps on the order of 800 pN, when inducing a shearing strain. The survey is generated through molecular dynamics simulations performed within a structure-based coarse-grained model.
Affiliation(s)
- Mateusz Sikora
- Laboratory of Biological Physics, Institute of Physics, Polish Academy of Sciences, Warsaw 02-668, Poland
16
Park SY, Park JH, Kim JS. Cloning, expression, purification, crystallization and preliminary X-ray diffraction analysis of an ASCH domain-containing protein from Zymomonas mobilis ZM4. Acta Crystallogr Sect F Struct Biol Cryst Commun 2011; 67:310-2. [PMID: 21393833 PMCID: PMC3053153 DOI: 10.1107/s1744309110053467] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/13/2010] [Accepted: 12/20/2010] [Indexed: 11/10/2022]
Abstract
The human activating signal cointegrator 1 (ASC-1) homology (ASCH) domain is frequently observed in many organisms, although its function has not yet been clearly defined. In Zymomonas mobilis ZM4, the ZMO0922 gene encodes a polypeptide that includes an ASCH domain (zmASCH). To provide a better structural background for the probable role of ASCH domain-containing proteins, the ZMO0922 gene was cloned and expressed. The purified protein was crystallized from 30% (w/v) polyethylene glycol 400, 0.1 M cacodylic acid pH 6.5 and 0.2 M lithium sulfate. Diffraction data were collected to 2.1 Å resolution using synchrotron radiation. The crystal belonged to the primitive trigonal space group P3₁21 or P3₂21, with unit-cell parameters a = b = 51.67, c = 207.30 Å, α = β = 90, γ = 120°. Assuming the presence of one molecule in the asymmetric unit gave a Matthews coefficient of 4.69 Å³ Da⁻¹, corresponding to a solvent content of 73.7%.
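The arithmetic behind the Matthews coefficient and solvent content quoted in this abstract can be sketched as follows. The cell-volume formula for a cell with γ = 120° and the solvent estimate 1 − 1.23/V_M (Matthews, 1968) are standard. The function names are illustrative, and the molecular mass printed at the end is back-calculated from the reported V_M rather than taken from the abstract, so it should be read as an assumption.

```python
import math


def hexagonal_cell_volume(a, c):
    """Volume in Å^3 of a cell with a = b, alpha = beta = 90°, gamma = 120°."""
    return a * a * c * math.sin(math.radians(120.0))


def solvent_fraction(v_m):
    """Matthews estimate of solvent content: 1 - 1.23 / V_M (V_M in Å^3/Da)."""
    return 1.0 - 1.23 / v_m


# Unit-cell parameters and Matthews coefficient as reported in the abstract.
volume = hexagonal_cell_volume(51.67, 207.30)
v_m = 4.69

# P3(1)21 and P3(2)21 each have six general positions, so one molecule per
# asymmetric unit means Z = 6 molecules per unit cell.
z = 6

# Back-calculated from V_M = V / (Z * M); not a value stated in the abstract.
implied_mass = volume / (z * v_m)
solvent = solvent_fraction(v_m)

print(f"cell volume      : {volume:,.0f} Å^3")
print(f"implied mass     : {implied_mass:,.0f} Da (assumption, derived from V_M)")
print(f"solvent fraction : {solvent:.3f}")
```

The computed solvent fraction comes out close to the reported 73.7%. The small difference depends on rounding and on the exact partial specific volume assumed for the protein.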
Affiliation(s)
- Suk-Youl Park
- Department of Chemistry, Chonnam National University, Gwangju 500-757, Republic of Korea
- Jeong-Hoh Park
- Department of Chemistry, Chonnam National University, Gwangju 500-757, Republic of Korea
- Jeong-Sun Kim
- Department of Chemistry, Chonnam National University, Gwangju 500-757, Republic of Korea
17
Li G, Chen Q, Li J, Hu X, Zhao J. A Compact Disk-Like Centrifugal Microfluidic System for High-Throughput Nanoliter-Scale Protein Crystallization Screening. Anal Chem 2010; 82:4362-9. [DOI: 10.1021/ac902904m] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Affiliation(s)
- Gang Li
- Nanotechnology Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, People’s Republic of China, and Department of Physiology and Biophysics, Fudan University, Shanghai 200433, People’s Republic of China
- Qiang Chen
- Nanotechnology Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, People’s Republic of China, and Department of Physiology and Biophysics, Fudan University, Shanghai 200433, People’s Republic of China
- Junjun Li
- Nanotechnology Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, People’s Republic of China, and Department of Physiology and Biophysics, Fudan University, Shanghai 200433, People’s Republic of China
- Xiaojian Hu
- Nanotechnology Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, People’s Republic of China, and Department of Physiology and Biophysics, Fudan University, Shanghai 200433, People’s Republic of China
- Jianlong Zhao
- Nanotechnology Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, People’s Republic of China, and Department of Physiology and Biophysics, Fudan University, Shanghai 200433, People’s Republic of China
18
Liu X, Zhao YP. Donut-shaped fingerprint in homologous polypeptide relationships--a topological feature related to pathogenic structural changes in conformational disease. J Theor Biol 2009; 258:294-301. [PMID: 19248793 PMCID: PMC7094133 DOI: 10.1016/j.jtbi.2009.02.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 09/19/2008] [Revised: 01/06/2009] [Accepted: 02/11/2009] [Indexed: 02/05/2023]
Abstract
Features of the homologous relationships of proteins can provide a general picture of the protein universe, assist protein design and analysis, and further our comprehension of the evolution of organisms. Here we carried out a study of the evolution of protein molecules by investigating homologous relationships among residue segments. The motive was to identify detailed topological features of homologous relationships for short residue segments in the whole protein universe. Based on the data of a large number of non-redundant proteins, the universe of non-membrane polypeptides was analyzed by considering both residue mutations and structural conservation. By connecting homologous segments with edges, we obtained a homologous relationship network of the whole universe of short residue segments, which we named the graph of polypeptide relationships (GPR). Since the network is extremely complicated for topological transitions, to obtain an in-depth understanding, only subgraphs composed of vital nodes of the GPR were analyzed. Such analysis of vital subgraphs of the GPR revealed a donut-shaped fingerprint. Utilization of this topological feature revealed the switch sites (residues 188-202, where previously hidden fibril-forming "hot spots" begin to be exposed, providing a further opportunity for protein aggregation) of the conformational conversion of the normal alpha-helix-rich prion protein PrP(C) to the beta-sheet-rich PrP(Sc) that is thought to be responsible for a group of fatal neurodegenerative diseases, the transmissible spongiform encephalopathies. Efforts in analyzing other proteins related to various conformational diseases are also introduced.
Affiliation(s)
- Xin Liu
- Institute of Mechanics, Chinese Academy of Sciences, Beijing 100080, China
- Ya-Pu Zhao
- The State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China
19
Cutler TA, Mills BM, Lubin DJ, Chong LT, Loh SN. Effect of interdomain linker length on an antagonistic folding-unfolding equilibrium between two protein domains. J Mol Biol 2009; 386:854-68. [PMID: 19038264 PMCID: PMC2756608 DOI: 10.1016/j.jmb.2008.10.090] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 05/23/2008] [Revised: 10/28/2008] [Accepted: 10/31/2008] [Indexed: 10/21/2022]
Abstract
Fusion of one protein domain with another is a common event in both evolution and protein engineering experiments. When insertion is at an internal site (e.g., a surface loop or turn), as opposed to one of the termini, conformational strain can be introduced into both domains. Strain is manifested by an antagonistic folding-unfolding equilibrium between the two domains, which we previously showed can be parameterized by a coupling free-energy term (DeltaG(X)). The extent of strain is predicted to depend primarily on the ratio of the N-to-C distance of the guest protein to the distance between ends of the surface loop in the host protein. Here, we test that hypothesis by inserting ubiquitin (Ub) into the bacterial ribonuclease barnase (Bn), using peptide linkers from zero to 10 amino acids each. DeltaG(X) values are determined by measuring the extent to which Co(2+) binding to an engineered site on the Ub domain destabilizes the Bn domain. All-atom, unforced Langevin dynamics simulations are employed to gain structural insight into the mechanism of mechanically induced unfolding. Experimental and computational results find that the two domains are structurally and energetically uncoupled when linkers are long and that DeltaG(X) increases with decreasing linker length. When the linkers are fewer than two amino acids, strain is so great that one domain unfolds the other. However, the protein is able to refold as dimers and higher-order oligomers. The likely mechanism is a three-dimensional domain swap of the Bn domain, which relieves conformational strain. The simulations suggest that an effective route to mechanical unfolding begins with disruption of the hydrophobic core of Bn near the Ub insertion site.
Affiliation(s)
- Thomas A Cutler
- Department of Biochemistry and Molecular Biology, SUNY Upstate Medical University, 750 East Adams Street, Syracuse, NY 13210, USA
20
Ren Y, Gao J, Ge W, Li J. Thermal Unfolding of a Double-Domain Protein: Molecular Dynamics Simulation of Rhodanese. Ind Eng Chem Res 2008. [DOI: 10.1021/ie801441x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Ying Ren
- State Key Laboratory of Multiphase Complex System, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China, and Graduate University of the Chinese Academy of Sciences, Beijing 100039, China
- Jian Gao
- State Key Laboratory of Multiphase Complex System, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China, and Graduate University of the Chinese Academy of Sciences, Beijing 100039, China
- Wei Ge
- State Key Laboratory of Multiphase Complex System, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China, and Graduate University of the Chinese Academy of Sciences, Beijing 100039, China
- Jinghai Li
- State Key Laboratory of Multiphase Complex System, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China, and Graduate University of the Chinese Academy of Sciences, Beijing 100039, China
21
Batey S, Nickson AA, Clarke J. Studying the folding of multidomain proteins. HFSP JOURNAL 2008; 2:365-77. [PMID: 19436439 PMCID: PMC2645590 DOI: 10.2976/1.2991513] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Received: 07/14/2008] [Indexed: 11/19/2022]
Abstract
There have been relatively few detailed comprehensive studies of the folding of protein domains (or modules) in the context of their natural covalently linked neighbors. This is despite the fact that a significant proportion of the proteome consists of multidomain proteins. In this review we highlight some key experimental investigations of the folding of multidomain proteins to draw attention to the difficulties that can arise in analyzing such systems. The evidence suggests that interdomain interactions can significantly affect stability, folding, and unfolding rates. However, preliminary studies suggest that folding pathways are unaffected; to this extent, domains can be truly considered to be independent folding units. Nonetheless, it is clear that interactions between domains cannot be ignored, in particular when considering the effects of mutations.
Affiliation(s)
- Sarah Batey
- Department of Chemistry, MRC Centre for Protein Engineering, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Adrian A. Nickson
- Department of Chemistry, MRC Centre for Protein Engineering, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Jane Clarke
- Department of Chemistry, MRC Centre for Protein Engineering, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
22
Batey S, Clarke J. The folding pathway of a single domain in a multidomain protein is not affected by its neighbouring domain. J Mol Biol 2008; 378:297-301. [PMID: 18371978 PMCID: PMC2828540 DOI: 10.1016/j.jmb.2008.02.032] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 11/26/2007] [Revised: 01/21/2008] [Accepted: 02/15/2008] [Indexed: 11/24/2022]
Abstract
Domains are the structural, functional, and evolutionary components of proteins. Most folding studies to date have concentrated on the folding of single domains, but more than 70% of human proteins contain more than one domain, and interdomain interactions can affect both the stability and the folding kinetics. Whether the folding pathway is altered by interdomain interactions is not yet known. Here we investigated the effect of a folded neighbouring domain on the folding pathway of spectrin R16 (the 16th α-helical repeat from chicken brain α-spectrin) by using the two-domain construct R1516. The R16 folds faster and unfolds more slowly in the presence of its folded neighbour R15 (the 15th α-helical repeat from chicken brain α-spectrin). An extensive Φ-value analysis of the R16 domain in R1516 was completed to compare the transition state of the R16 domain alone with that of the R16 domain in a multidomain construct. The results indicate that the folding pathways are the same. This result validates the current approach of breaking up larger proteins into domains for the study of protein folding pathways.
23
Improving pairwise sequence alignment between distantly related proteins. Methods Mol Biol 2007. [PMID: 17993679 DOI: 10.1007/978-1-59745-514-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Sequence alignment between remotely related proteins has been one of the more difficult problems in structural biology. Improvements have been achieved by incorporating information that enhances the diversity of the substitution matrices. NdPASA is a web-based server that optimizes sequence alignments between proteins sharing low percentages of sequence identity. The program integrates structure information of the template sequence into a global alignment algorithm by employing amino acids' neighbor-dependent propensities for secondary structure as unique parameters for alignment. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. The server is designed to aid homologous protein structure modeling. It is most effective when the structure of the template sequence is known. NdPASA can be accessed online at www.fenglab.org/bioserver.html.
24
Batey S, Clarke J. Apparent cooperativity in the folding of multidomain proteins depends on the relative rates of folding of the constituent domains. Proc Natl Acad Sci U S A 2006; 103:18113-8. [PMID: 17108086 PMCID: PMC1636339 DOI: 10.1073/pnas.0604580103] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Received: 06/02/2006] [Indexed: 11/18/2022] Open
Abstract
Approximately 75% of eukaryotic proteins contain more than one so-called independently folding domain. However, there have been relatively few systematic studies to investigate the effect of interdomain interactions on protein stability and fewer still on folding kinetics. We present the folding of pairs of three-helix bundle spectrin domains as a paradigm to indicate how complex such an analysis can be. Equilibrium studies show an increase in denaturant concentration required to unfold the domains with only a single unfolding transition; however, in some cases, this is not accompanied by the increase in m value, which would be expected if the protein is a truly cooperative, all-or-none system. We analyze the complex kinetics of spectrin domain pairs, both wild-type and carefully selected mutants. By comparing these pairs, we are able to demonstrate that equilibrium data alone are insufficient to describe the folding of multidomain proteins and to quantify the effects that one domain can have on its neighbor.
Affiliation(s)
- Sarah Batey
- Department of Chemistry, University of Cambridge, Medical Research Council Centre for Protein Engineering, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Jane Clarke
- Department of Chemistry, University of Cambridge, Medical Research Council Centre for Protein Engineering, Lensfield Road, Cambridge CB2 1EW, United Kingdom
25
Abstract
Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function.
Affiliation(s)
- Zhexin Xiang
- Center for Molecular Modeling, Center for Information Technology, National Institutes of Health, Building 12A Room 2051, 12 South Drive, Bethesda, Maryland 20892-5624, USA.
26
Wang J, Feng JA. NdPASA: a novel pairwise protein sequence alignment algorithm that incorporates neighbor-dependent amino acid propensities. Proteins 2006; 58:628-37. [PMID: 15616964 DOI: 10.1002/prot.20359] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known. NdPASA can be accessed online at http://astro.temple.edu/feng/Servers/BioinformaticServers.htm.
Affiliation(s)
- Junwen Wang
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, USA
27
Batey S, Scott KA, Clarke J. Complex folding kinetics of a multidomain protein. Biophys J 2006; 90:2120-30. [PMID: 16387757 PMCID: PMC1386790 DOI: 10.1529/biophysj.105.072710] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Received: 08/17/2005] [Accepted: 12/05/2005] [Indexed: 11/18/2022] Open
Abstract
Spectrin domains are three-helix bundles, commonly found in large tandem arrays. Equilibrium studies have shown that spectrin domains are significantly stabilized by their neighbors. In this work we show that domain:domain interactions can also have profound effects on their kinetic behavior. We have studied the folding of a tandem pair of spectrin domains (R1617) using a combination of single- and double-jump stopped flow experiments (monitoring folding by both circular dichroism and fluorescence). Mutant proteins were also used to investigate the complex folding kinetics. We find that, although the domains fold and unfold individually, there is a single rate-determining step for both folding and unfolding of the protein. This is consistent with the equilibrium observation of cooperative folding of the entire two-domain protein. The results may have important biological implications. Not only will the protein fold more efficiently during cotranslational folding, but the ability of the multidomain protein to withstand thermal unfolding in the cell will be dramatically increased. This study suggests that caution has to be exercised when extrapolating from single domains to larger proteins with a number of independently folding modules arranged in tandem. The multidomain protein spectrin is certainly more than "the sum of its parts".
Affiliation(s)
- Sarah Batey
- Department of Chemistry, MRC Centre for Protein Engineering, University of Cambridge, Cambridge CB2 1EW, United Kingdom
28
Liang H, Landweber LF. Molecular mimicry: quantitative methods to study structural similarity between protein and RNA. RNA (NEW YORK, N.Y.) 2005; 11:1167-72. [PMID: 16043503 PMCID: PMC1370800 DOI: 10.1261/rna.7207205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/03/2023]
Abstract
With rapidly increasing availability of three-dimensional structures, one major challenge for the post-genome era is to infer the functions of biological molecules based on their structural similarity. While quantitative studies of structural similarity between the same type of biological molecules (e.g., protein vs. protein) have been carried out intensively, the comparable study of structural similarity between different types of biological molecules (e.g., protein vs. RNA) remains unexplored. Here we have developed a new bioinformatics approach to quantitatively study the structural similarity between two different types of biopolymers--proteins and RNA--based on the spatial distribution of conserved elements. We applied it to two previously proposed tRNA-protein mimicry pairs whose functional relatedness between two molecules has been recently determined experimentally. Our method detected the biologically meaningful signals, which are consistent with experimental evidence.
Affiliation(s)
- Han Liang
- Department of Chemistry, Princeton University, NJ 08544, USA
29
Shah PK, Aloy P, Bork P, Russell RB. Structural similarity to bridge sequence space: finding new families on the bridges. Protein Sci 2005; 14:1305-14. [PMID: 15840833 PMCID: PMC2253280 DOI: 10.1110/ps.041187405] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
Abstract
Structures for protein domains have increased rapidly in recent years owing to advances in structural biology and structural genomics projects. New structures are often similar to those solved previously, and such similarities can give insights into function by linking poorly understood families to those that are better characterized. They also allow the possibility of combining information to find still more proteins adopting a similar structure and sometimes a similar function, and to reprioritize families in structural genomics pipelines. We explore this possibility here by preparing merged profiles for pairs of structurally similar, but not necessarily sequence-similar, domains within the SMART and Pfam database by way of the Structural Classification of Proteins (SCOP). We show that such profiles are often able to successfully identify further members of the same superfamily and thus can be used to increase the sensitivity of database searching methods like HMMer and PSI-BLAST. We perform detailed benchmarks using the SMART and Pfam databases with four complete genomes frequently used as annotation benchmarks. We quantify the associated increase in structural information in Swissprot and discuss examples illustrating the applicability of this approach to understand functional and evolutionary relationships between protein families.
30
Batey S, Randles LG, Steward A, Clarke J. Cooperative Folding in a Multi-domain Protein. J Mol Biol 2005; 349:1045-59. [PMID: 15913648 DOI: 10.1016/j.jmb.2005.04.028] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Received: 02/11/2005] [Revised: 04/08/2005] [Accepted: 04/14/2005] [Indexed: 11/27/2022]
Abstract
Most protein domains are found in multi-domain proteins, yet most studies of protein folding have concentrated on small, single-domain proteins or on isolated domains from larger proteins. Spectrin domains are small (106 amino acid residues), independently folding domains consisting of three long alpha-helices. They are found in multi-domain proteins with a number of spectrin domains in tandem array. Structural studies have shown that in these arrays the last helix of one domain forms a continuous helix with the first helix of the following domain. It has been demonstrated that a number of spectrin domains are stabilised by their neighbours. Here we investigate the molecular basis for cooperativity between adjacent spectrin domains 16 and 17 from chicken brain alpha-spectrin (R16 and R17). We show that whereas the proteins unfold as a single cooperative unit at 25 degrees C, cooperativity is lost at higher temperatures and in the presence of stabilising salts. Mutations in the linker region also cause the cooperativity to be lost. However, the cooperativity does not rely on specific interactions in the linker region alone. Most mutations in the R17 domain cause a decrease in cooperativity, whereas proteins with mutations in the R16 domain still fold cooperatively. We propose a mechanism for this behaviour.
Affiliation(s)
- Sarah Batey
- University of Cambridge, Department of Chemistry, MRC Centre for Protein Engineering, Lensfield Rd, Cambridge CB2 1EW, UK
31
Hou J, Jun SR, Zhang C, Kim SH. Global mapping of the protein structure space and application in structure-based inference of protein function. Proc Natl Acad Sci U S A 2005; 102:3651-6. [PMID: 15705717 PMCID: PMC548596 DOI: 10.1073/pnas.0409772102] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022] Open
Abstract
We have constructed a map of the "protein structure space" by using the pairwise structural similarity scores calculated for all nonredundant protein structures determined experimentally. As expected, proteins with similar structures clustered together in the map and the overall distribution of structural classes of this map followed closely that of the map of the "protein fold space" we have reported previously. Consequently, proteins sharing similar molecular functions also were found to colocalize in the protein structure space map, pointing toward a previously undescribed scheme for structure-based functional inference for remote homologues based on the proximity in the map of the protein structure space. We found that this scheme consistently outperformed other predictions made by using either the raw scores or normalized Z-scores of pairwise DALI structure alignment.
Affiliation(s)
- Jingtong Hou
- Department of Chemistry and Graduate Program of Comparative Biochemistry, University of California, Berkeley, CA 94720, USA
32
Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E. Protein structure and evolutionary history determine sequence space topology. Genome Res 2005; 15:385-92. [PMID: 15741509 PMCID: PMC551565 DOI: 10.1101/gr.3133605] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Received: 08/10/2004] [Accepted: 11/23/2004] [Indexed: 11/24/2022]
Abstract
Understanding the observed variability in the number of homologs of a gene is a very important unsolved problem that has broad implications for research into coevolution of structure and function, gene duplication, pseudogene formation, and possibly for emerging diseases. Here, we attempt to define and elucidate some possible causes behind the observed irregularity in sequence space. We present evidence that sequence variability and functional diversity of a gene or fold family is influenced by quantifiable characteristics of the protein structure. These characteristics reflect the structural potential for sequence plasticity, i.e., the ability to accept mutation without losing thermodynamic stability. We identify a structural feature of a protein domain, contact density, that serves as a determinant of entropy in sequence space, i.e., the ability of a protein to accept mutations without destroying the fold (also known as fold designability). We show that the log of average gene family size exhibits statistical correlation (R² > 0.9) with contact density of its three-dimensional structure. We present evidence that the size of individual gene families is influenced not only by the designability of the structure, but also by evolutionary history, e.g., the amount of time the gene family was in existence. We further show that our observed statistical correlation between gene family size and contact density of the structure is valid on many levels of evolutionary divergence, i.e., not only for closely related sequences, but also for less-related fold and superfamily levels of homology.
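The abstract above does not say how contact density is computed. As a rough illustration only, the sketch below assumes one common convention: count residue pairs whose C-alpha atoms lie within a distance cutoff, skip pairs that are close in sequence, and divide by the number of residues. The function name, the 7.5 Å cutoff, and the minimum sequence separation are all assumptions made for this example, not values taken from the paper.

```python
import math


def contact_density(ca_coords, cutoff=7.5, min_separation=3):
    """Return contacts per residue for a list of C-alpha (x, y, z) tuples.

    cutoff (in angstroms) and min_separation are illustrative assumptions;
    the abstract does not state the definition the authors used.
    """
    n = len(ca_coords)
    if n == 0:
        return 0.0
    contacts = 0
    for i in range(n):
        # Skip pairs closer than min_separation along the chain.
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts += 1
    return contacts / n


# Toy coordinates: an extended chain versus a small closed square.
extended = [(3.8 * i, 0.0, 0.0) for i in range(5)]
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_density(extended), contact_density(square))
```

Under these assumptions a compact arrangement scores higher than an extended one, which is the qualitative behaviour a contact-density measure is meant to capture.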
33
Liu J, Hegyi H, Acton TB, Montelione GT, Rost B. Automatic target selection for structural genomics on eukaryotes. Proteins 2004; 56:188-200. [PMID: 15211504 DOI: 10.1002/prot.20012] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.7] [Indexed: 11/06/2022]
Abstract
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from currently five entirely sequenced eukaryotic target organisms based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common foldlike region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared suitable targets that were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
34
Das R, Gerstein M. A method using active-site sequence conservation to find functional shifts in protein families: application to the enzymes of central metabolism, leading to the identification of an anomalous isocitrate dehydrogenase in pathogens. Proteins 2004; 55:455-63. [PMID: 15048835 DOI: 10.1002/prot.10639] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
We have introduced a method to identify functional shifts in protein families. Our method is based on the calculation of an active-site conservation ratio, which we call the "ASC ratio." For a structurally based alignment of a protein family, this ratio is the average sequence similarity of the active-site region compared to the full-length protein. The active-site region is defined as all the residues within a certain radius of the known functionally important groups. Using our method, we have analyzed enzymes of central metabolism from a large number of genomes (35). We found that for most of the enzymes, the active-site region is more highly conserved than the full-length sequence. However, for three tricarboxylic acid (TCA)-cycle enzymes, active-site sequences are considerably more diverged (than full-length ones). In particular, we were able to identify in six pathogens a novel isocitrate dehydrogenase that has very low sequence similarity around the active site. Detailed sequence-structure analysis indicates that while the active-site structure of isocitrate dehydrogenase is most likely similar between pathogens and nonpathogens, the unusual sequence divergence could result from an extra domain added at the N-terminus. This domain has a leucine-rich motif similar to one in the Yersinia pestis cytotoxin and may therefore confer additional pathogenic functions.
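The ASC ratio described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it substitutes percent identity for the "sequence similarity" measure (the abstract does not name a scoring scheme), it takes the active-site columns as given rather than deriving them from a structural radius, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def mean_pairwise_identity(seqs, columns):
    """Average fraction of identical residues over all sequence pairs,
    restricted to the given alignment columns. Gap characters never match."""
    columns = list(columns)
    pair_scores = []
    for a, b in combinations(seqs, 2):
        matches = sum(1 for c in columns if a[c] == b[c] and a[c] != "-")
        pair_scores.append(matches / len(columns))
    return sum(pair_scores) / len(pair_scores)


def asc_ratio(seqs, active_site_columns):
    """Active-site conservation divided by full-length conservation.

    Assumes an alignment of equal-length strings with nonzero full-length
    identity. A value above 1 means the active site is more conserved
    than the protein as a whole.
    """
    full = mean_pairwise_identity(seqs, range(len(seqs[0])))
    site = mean_pairwise_identity(seqs, active_site_columns)
    return site / full


# Toy alignment; columns 1, 2 and 4 stand in for an active-site region.
alignment = ["ACDEFGHK", "ACDQFGHR", "TCDEFGYK"]
ratio = asc_ratio(alignment, [1, 2, 4])
print(round(ratio, 3))
```

In this toy case the three active-site columns are fully conserved while the full-length identity averages two thirds, so the ratio comes out at 1.5. A ratio below 1 would flag the kind of divergent active site the abstract reports for the TCA-cycle enzymes.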
Affiliation(s)
- Rajdeep Das
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
35
Apic G, Huber W, Teichmann SA. Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination. J Struct Funct Genomics 2004; 4:67-78. [PMID: 14649290 DOI: 10.1023/a:1026113408773] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
There is a limited repertoire of domain families in nature that are duplicated and combined in different ways to form the set of proteins in a genome. Most proteins in both prokaryote and eukaryote genomes consist of two or more domains, and we show that the family size distribution of multi-domain protein families follows a power law like that of individual families. Most domain pairs occur in four to six different domain architectures: in isolation and in combinations with different partners. We showed previously that within the set of all pairwise domain combinations, most small and medium-sized families are observed in combination with one or two other families, while a few large families are very versatile and combine with many different partners. Though this may appear to be a stochastic pattern, in which large families have more combination partners by virtue of their size, we establish here that all the domain families with more than three members in genomes are duplicated more frequently than would be expected by chance considering their number of neighbouring domains. This duplication of domain pairs is statistically significant for between one and three quarters of all families with seven or more members. For the majority of pairwise domain combinations, there is no known three-dimensional structure of the two domains together, and we term these novel combinations. Novel domain combinations are interesting and important targets for structural elucidation, as the geometry and interaction between the domains will help understand the function and evolution of multi-domain proteins. Of particular interest are those combinations that occur in the largest number of multi-domain proteins, and several of these frequent novel combinations contain DNA-binding domains.
Affiliation(s)
- Gordana Apic
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|
36
|
Abstract
Guessing the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Predictions were based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. Here, we introduced CHOPnet, a de novo method that predicts structural domains in the absence of homology to known domains. Our method was based on neural networks and relied exclusively on information available for all proteins. Evaluating sustained performance through rigorous cross-validation on proteins of known structure, we correctly predicted the number of domains in 69% of all proteins. For 50% of the two-domain proteins the centre of the predicted boundary was closer than 20 residues to the boundary assigned from three-dimensional (3D) structures; this was about eight percentage points better than predictions by 'equal split'. Our results appeared to compare favourably with those from previously published methods. CHOPnet may be useful to restrict the experimental testing of different fragments for structure determination in the context of structural genomics.
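The boundary criterion quoted in this abstract can be sketched as follows. This is a minimal illustration, not CHOPnet itself: it assumes that 'equal split' means placing the single boundary of a two-domain protein at the chain midpoint, and the function names are hypothetical.

```python
def equal_split_boundary(length: int) -> float:
    """Baseline (assumed meaning of 'equal split'): boundary at the chain midpoint."""
    return length / 2


def boundary_hit(predicted: float, observed: float, tolerance: int = 20) -> bool:
    """True when the predicted boundary centre is closer than `tolerance` residues
    to the boundary assigned from the 3D structure."""
    return abs(predicted - observed) < tolerance


def hit_rate(pairs, tolerance: int = 20) -> float:
    """Fraction of (predicted, observed) boundary pairs that count as hits."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one protein")
    return sum(boundary_hit(p, o, tolerance) for p, o in pairs) / len(pairs)
```

Running `hit_rate` once on a method's predictions and once on `equal_split_boundary` values for the same proteins gives the kind of comparison the abstract reports.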
Affiliation(s)
- Jinfeng Liu
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
|
37
|
Ranea JAG, Buchan DWA, Thornton JM, Orengo CA. Evolution of protein superfamilies and bacterial genome size. J Mol Biol 2004; 336:871-87. [PMID: 15095866 DOI: 10.1016/j.jmb.2003.12.044] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 09/24/2003] [Revised: 12/11/2003] [Accepted: 12/12/2003] [Indexed: 10/26/2022]
Abstract
We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that are distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.
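The two quantities named in this abstract, the correlation of a superfamily's occurrence profile with genome size and the ratio of its mean to its standard deviation, can be sketched as below. This is an illustrative outline only: the Pearson coefficient, the population standard deviation, and the `r_cutoff` value of 0.7 are assumptions, not the authors' stated procedure.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def classify(occurrences, genome_sizes, r_cutoff: float = 0.7) -> str:
    """Label a superfamily by how its occurrence tracks genome size.
    `r_cutoff` is a hypothetical threshold chosen for illustration."""
    r = pearson(occurrences, genome_sizes)
    return "size-dependent" if r >= r_cutoff else "size-independent"


def mean_to_sd(occurrences) -> float:
    """Ratio of mean to standard deviation; infinite for a perfectly even profile."""
    sd = pstdev(occurrences)
    return math.inf if sd == 0 else mean(occurrences) / sd
```

A high `mean_to_sd` value within the size-independent group would correspond to the evenly distributed superfamilies the abstract describes.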
Affiliation(s)
- Juan A G Ranea
- Biomolecular Structure and Modelling Group, Department of Biochemistry and Molecular Biology, University College London, London WC1E 6BT, UK.
|
38
|
Tiana G, Shakhnovich BE, Dokholyan NV, Shakhnovich EI. Imprint of evolution on protein structures. Proc Natl Acad Sci U S A 2004; 101:2846-51. [PMID: 14970345 PMCID: PMC365708 DOI: 10.1073/pnas.0306638101] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.1] [Received: 10/16/2003] [Accepted: 12/22/2003] [Indexed: 11/18/2022] Open
Abstract
We attempt to understand the evolutionary origin of protein folds by simulating their divergent evolution with a three-dimensional lattice model. Starting from an initial seed lattice structure, evolution of model proteins progresses by sequence duplication and subsequent point mutations. A new gene's ability to fold into a stable and unique structure is tested each time through direct kinetic folding simulations. Where possible, the algorithm accepts the new sequence and structure and thus a "new protein structure" is born. During the course of each run, this model evolutionary algorithm provides several thousand new proteins with diverse structures. Analysis of evolved structures shows that later evolved structures are more designable than seed structures, as judged by a recently developed structural determinant of protein designability as well as by direct estimates of designability for selected structures through thermodynamic sampling of their sequence space. We test the significance of this trend, predicted on lattice models, on real proteins and show that protein domains found only in eukaryotic organisms feature statistically significantly higher designability than their prokaryotic counterparts. These results present a fundamental view on protein evolution, highlighting the relative roles of structural selection and evolutionary dynamics in the genesis of modern proteins.
Affiliation(s)
- Guido Tiana
- Department of Physics and Istituto Nazionale di Fisica Nucleare, University of Milano, Via Celoria 16, 20133 Milan, Italy
|
39
|
Abstract
A new potential energy function representing the conformational preferences of sequentially local regions of a protein backbone is presented. This potential is derived from secondary structure probabilities such as those produced by neural network-based prediction methods. The potential is applied to the problem of remote homolog identification, in combination with a distance-dependent inter-residue potential and position-based scoring matrices. This fold recognition jury is implemented in a Java application called JThread. These methods are benchmarked on several test sets, including one released entirely after development and parameterization of JThread. In benchmark tests to identify known folds structurally similar to (but not identical with) the native structure of a sequence, JThread performs significantly better than PSI-BLAST, with 10% more structures identified correctly as the most likely structural match in a fold library, and 20% more structures correctly narrowed down to a set of five possible candidates. JThread also improves the average sequence alignment accuracy significantly, from 53% to 62% of residues aligned correctly. Reliable fold assignments and alignments are identified, making the method useful for genome annotation. JThread is applied to predicted open reading frames (ORFs) from the genomes of Mycoplasma genitalium and Drosophila melanogaster, identifying 20 new structural annotations in the former and 801 in the latter.
Affiliation(s)
- John Marc Chandonia
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA 94143-2240, USA
|
40
|
Grigoriev IV, Choi IG. Target selection for structural genomics: a single genome approach. OMICS 2003; 6:349-62. [PMID: 12626094 DOI: 10.1089/153623102321112773] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
We describe our strategy for selecting targets for protein structure determination in the context of structural genomics of a single genome. In the course of target selection, we have studied two of the smallest microbial genomes, Mycoplasma genitalium and Mycoplasma pneumoniae. To our surprise, we found that only 71 Mycoplasma genes or their orthologues can be considered as easy targets for high-throughput structural studies--far fewer than expected. We discuss the methods and criteria used for target selection and the reasons explaining the rarity of easy targets. First, despite the common opinion that protein folds can be predicted for only 30-50% of genes, the number of "truly unknown" structures is less than one-third. Second, due to the different codon usage, two thirds of Mycoplasma proteins cannot be directly expressed in E. coli in a high-throughput manner and require substitution by their homologues from other organisms. Third, membrane or large multi-domain proteins are difficult targets because of solubility and size issues and often require identification and structure determination of protein domains. Finally, we propose different approaches to address the difficult targets.
Affiliation(s)
- Igor V Grigoriev
- Department of Chemistry and E.O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA, USA.
|
41
|
Caetano-Anollés G, Caetano-Anollés D. An evolutionarily structured universe of protein architecture. Genome Res 2003; 13:1563-71. [PMID: 12840035 PMCID: PMC403752 DOI: 10.1101/gr.1161903] [Citation(s) in RCA: 121] [Impact Index Per Article: 5.5] [Received: 01/07/2003] [Accepted: 04/17/2003] [Indexed: 11/25/2022]
Abstract
Protein structural diversity encompasses a finite set of architectural designs. Embedded in these topologies are evolutionary histories that we here uncover using cladistic principles and measurements of protein-fold usage and sharing. The reconstructed phylogenies are inherently rooted and depict histories of protein and proteome diversification. Proteome phylogenies showed two monophyletic sister-groups delimiting Bacteria and Archaea, and a topology rooted in Eucarya. This suggests three dramatic evolutionary events and a common ancestor with a eukaryotic-like, gene-rich, and relatively modern organization. Conversely, a general phylogeny of protein architectures showed that structural classes of globular proteins appeared early in evolution and in defined order, the alpha/beta class being the first. Although most ancestral folds shared a common architecture of barrels or interleaved beta-sheets and alpha-helices, many were clearly derived, such as polyhedral folds in the all-alpha class and beta-sandwiches, beta-propellers, and beta-prisms in all-beta proteins. We also describe transformation pathways of architectures that are prevalently used in nature. For example, beta-barrels with increased curl and stagger were favored evolutionary outcomes in the all-beta class. Interestingly, we found cases where structural change followed the alpha-to-beta tendency uncovered in the tree of architectures. Lastly, we traced the total number of enzymatic functions associated with folds in the trees and show that there is a general link between structure and enzymatic function.
|
42
|
Jackson DB, Minch E, Munro RE. Bioinformatics. EXS 2003:31-69. [PMID: 12613171 DOI: 10.1007/978-3-0348-7997-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
43
|
Hou J, Sims GE, Zhang C, Kim SH. A global representation of the protein fold space. Proc Natl Acad Sci U S A 2003; 100:2386-90. [PMID: 12606708 PMCID: PMC151350 DOI: 10.1073/pnas.2628030100] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.1] [Indexed: 11/18/2022] Open
Abstract
One of the principal goals of the structural genomics initiative is to identify the total repertoire of protein folds and obtain a global view of the "protein structure universe." Here, we present a 3D map of the protein fold space in which structurally related folds are represented by spatially adjacent points. Such a representation reveals a high-level organization of the fold space that is intuitively interpretable. The shape of the fold space and the overall distribution of the folds are defined by three dominant trends: secondary structure class, chain topology, and protein domain size. Random coil-like structures of small proteins and peptides are mapped to a region where the three trends converge, offering an interesting perspective on both the demography of fold space and the evolution of protein structures.
Affiliation(s)
- Jingtong Hou
- Department of Chemistry and Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
44
|
Anantharaman V, Aravind L, Koonin EV. Emergence of diverse biochemical activities in evolutionarily conserved structural scaffolds of proteins. Curr Opin Chem Biol 2003; 7:12-20. [PMID: 12547421 DOI: 10.1016/s1367-5931(02)00018-2] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.0] [Indexed: 11/28/2022]
Abstract
Comparative analysis of numerous protein structures that have become available in the past few years, combined with genome comparison, has yielded new insights into the evolution of enzymes and their functions. In addition to the well-known diversification of substrate specificities, enzymes with several widespread catalytic folds, particularly the TIM barrel, the RRM-like domain and the double-stranded beta-helix (cupin) domain, have been extensively explored in 'reaction space', resulting in the evolution of numerous, diverse catalytic activities supported by the same structural scaffold. Common protein folds differ widely in the diversity of catalyzed reactions. The biochemical plasticity of a fold seems to hinge on the presence of a generic, symmetrical substrate-binding pocket as opposed to highly specialized binding sites.
Affiliation(s)
- Vivek Anantharaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
45
|
Buchan DWA, Rison SCG, Bray JE, Lee D, Pearl F, Thornton JM, Orengo CA. Gene3D: structural assignments for the biologist and bioinformaticist alike. Nucleic Acids Res 2003; 31:469-73. [PMID: 12520054 PMCID: PMC165498 DOI: 10.1093/nar/gkg051] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
The Gene3D database (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) provides structural assignments for genes within complete genomes. These are available via the internet from either the World Wide Web or FTP. Assignments are made using PSI-BLAST and subsequently processed using the DRange protocol. The DRange protocol is an empirically benchmarked method for assessing the validity of structural assignments made using sequence searching methods where appropriate assignment statistics are collected and made available. Gene3D links assignments to their appropriate entries in relevant structural and classification resources (PDBsum, CATH database and the Dictionary of Homologous Superfamilies). Release 2.0 of Gene3D includes 62 genomes, 2 eukaryotes, 10 archaea and 40 bacteria. Currently, structural assignments can be made for between 30 and 40 percent of any given genome. In any genome, around half of those genes assigned a structural domain are assigned a single domain and the other half of the genes are assigned multiple structural domains. Gene3D is linked to the CATH database and is updated with each new update of CATH.
Affiliation(s)
- Daniel W A Buchan
- Biomolecular Structure and Modelling Group, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
|
46
|
Krebs WG, Tsai J, Alexandrov V, Junker J, Jansen R, Gerstein M. Tools and Databases to Analyze Protein Flexibility; Approaches to Mapping Implied Features onto Sequences. Methods Enzymol 2003; 374:544-84. [PMID: 14696388 DOI: 10.1016/s0076-6879(03)74023-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]
Affiliation(s)
- W G Krebs
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
|
47
|
Nair R, Rost B. Sequence conserved for subcellular localization. Protein Sci 2002; 11:2836-47. [PMID: 12441382 PMCID: PMC2373743 DOI: 10.1110/ps.0207402] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.0] [Received: 05/11/2002] [Revised: 09/05/2002] [Accepted: 09/10/2002] [Indexed: 10/27/2022]
Abstract
The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes.
Affiliation(s)
- Rajesh Nair
- Columbia University Bioinformatics Center (CUBIC), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
|
48
|
Dokholyan NV, Shakhnovich B, Shakhnovich EI. Expanding protein universe and its origin from the biological Big Bang. Proc Natl Acad Sci U S A 2002; 99:14132-6. [PMID: 12384571 PMCID: PMC137849 DOI: 10.1073/pnas.202497999] [Citation(s) in RCA: 153] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022] Open
Abstract
The bottom-up approach to understanding the evolution of organisms is by studying molecular evolution. With the large number of protein structures identified in the past decades, we have discovered peculiar patterns that nature imprints on protein structural space in the course of evolution. In particular, we have discovered that the universe of protein structures is organized hierarchically into a scale-free network. By understanding the cause of these patterns, we attempt to glance at the very origin of life.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA.
|
49
|
Abstract
EVA is a web-based server that evaluates automatic structure prediction servers continuously and objectively. Since June 2000, EVA collected more than 20,000 secondary structure predictions. The EVA sets sufficed to conclude that the field of secondary structure prediction has advanced again. Accuracy increased substantially in the 1990s through using evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than 4% to its current height around 76% of all residues predicted correctly in one of the three states: helix, strand, or other. The best current methods solved most of the problems raised at earlier CASP meetings: All good methods now get segments right and perform well on strands. Is the recent increase in accuracy significant enough to make predictions even more useful? We believe the answer is affirmative. What is the limit of prediction accuracy? We shall see. All data are available through the EVA web site at [cubic.bioc.columbia.edu/eva/]. The raw data for the results presented are available at [eva]/sec/bup_common/2001_02_22/.
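The per-residue accuracy quoted in this abstract (the percentage of residues predicted correctly in one of three states) can be computed as in the sketch below. The one-letter state codes `H`, `E` and `L` for helix, strand and other are an assumption made for illustration, as is the function name; EVA's own scoring code is not shown in the source.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label matches the
    observed one. States are assumed to be H (helix), E (strand), L (other)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("need at least one residue")
    allowed = set("HEL")
    if not (set(predicted) <= allowed and set(observed) <= allowed):
        raise ValueError("states must be one of H, E, L")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3_accuracy("HHHEELLL", "HHHEEELL")` compares eight residues, seven of which agree.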
Affiliation(s)
- B Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA.
|
50
|
Hegyi H, Lin J, Greenbaum D, Gerstein M. Structural genomics analysis: characteristics of atypical, common, and horizontally transferred folds. Proteins 2002; 47:126-41. [PMID: 11933060 DOI: 10.1002/prot.10078] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]
Abstract
We conducted a structural genomics analysis of the folds and structural superfamilies in the first 20 completely sequenced genomes by focusing on the patterns of fold usage and trying to identify structural characteristics of typical and atypical folds. We assigned folds to sequences using PSI-BLAST, run with a systematic protocol to reduce the amount of computational overhead. On average, folds could be assigned to about a fourth of the ORFs in the genomes and about a fifth of the amino acids in the proteomes. More than 80% of all the folds in the SCOP structural classification were identified in one of the 20 organisms, with worm and E. coli having the largest number of distinct folds. Folds are particularly effective at comprehensively measuring levels of gene duplication, because they group together even very remote homologues. Using folds, we find the average level of duplication varies depending on the complexity of the organism, ranging from 2.4 in M. genitalium to 32 for the worm, values significantly higher than those observed based purely on sequence similarity. We rank the common folds in the 20 organisms, finding that the top three are the P-loop NTP hydrolase, the ferredoxin fold, and the TIM-barrel, and discuss in detail the many factors that affect and bias these rankings. We also identify atypical folds that are "unique" to one of the organisms in our study and compare the characteristics of these folds with the most common ones. We find that common folds tend to be more multifunctional and associated with more regular, "symmetrical" structures than the unique ones. In addition, many of the unique folds are associated with proteins involved in cell defense (e.g., toxins). We analyze specific patterns of fold occurrence in the genomes by associating some of them with instances of horizontal transfer and others with gene loss. In particular, we find three possible examples of transfer between archaea and bacteria and six between eukarya and bacteria. We make available our detailed results at http://genecensus.org/20.
Affiliation(s)
- Hedi Hegyi
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
|