1
Gadzała M, Kalinowska B, Banach M, Konieczny L, Roterman I. Determining protein similarity by comparing hydrophobic core structure. Heliyon 2017; 3:e00235. [PMID: 28217749 PMCID: PMC5300504 DOI: 10.1016/j.heliyon.2017.e00235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/01/2016] [Revised: 12/06/2016] [Accepted: 01/19/2017] [Indexed: 12/19/2022] Open
Abstract
Formal assessment of structural similarity is - next to protein structure prediction - arguably the most important unsolved problem in proteomics. In this paper we propose a similarity criterion based on commonalities between the proteins' hydrophobic cores. The hydrophobic core emerges as a result of conformational changes through which each residue reaches its intended position in the protein body. A quantitative criterion based on this phenomenon has been proposed in the framework of the CASP challenge. The structure of the hydrophobic core - including the placement and scope of any deviations from the idealized model - may indirectly point to areas of importance from the point of view of the protein's biological function. Our analysis focuses on an arbitrarily selected target from the CASP11 challenge. The proposed measure, while compliant with CASP criteria (70-80% correlation), involves certain adjustments which acknowledge the presence of factors other than simple spatial arrangement of solids.
Affiliation(s)
- M. Gadzała
- AGH - Academic Computer Center − Cyfronet, Nawojki 11, Kraków 30-950, Poland
- B. Kalinowska
- Faculty of Physics, Astronomy, Applied Computer Science − Jagiellonian University, Łojasiewicza 11, Kraków 30-348, Poland
- M. Banach
- Department of Bioinformatics and Telemedicine, Jagiellonian University − Medical College, Łazarza 16, Krakow 31-530, Poland
- L. Konieczny
- Chair of Medical Biochemistry, Jagiellonian University − Medical College, Kopernika 7, Kraków 31-034, Poland
- I. Roterman
- Department of Bioinformatics and Telemedicine, Jagiellonian University − Medical College, Łazarza 16, Krakow 31-530, Poland
2
Shatnawi M, Zaki N, Yoo PD. Protein inter-domain linker prediction using Random Forest and amino acid physiochemical properties. BMC Bioinformatics 2014; 15 Suppl 16:S8. [PMID: 25521329 PMCID: PMC4290662 DOI: 10.1186/1471-2105-15-s16-s8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate prediction of protein domain linkers and boundaries is often regarded as the initial step of protein tertiary structure and function predictions. Such information not only enhances protein-targeted drug development but also reduces the experimental cost of protein analysis by allowing researchers to work on a set of smaller and independent units. In this study, we propose a novel and accurate domain-linker prediction approach based on protein primary structure information only. We utilize a nature-inspired machine-learning model called Random Forest along with a novel domain-linker profile that contains physiochemical and domain-linker information of amino acid sequences. RESULTS The proposed approach was tested on two well-known benchmark protein datasets and achieved 68% sensitivity and 99% precision, which is better than any existing protein domain-linker predictor. Without applying any data balancing technique such as class weighting and data re-sampling, the proposed approach is able to accurately classify inter-domain linkers from highly imbalanced datasets. CONCLUSION Our experimental results prove that the proposed approach is useful for domain-linker identification in highly imbalanced single- and multi-domain proteins.
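For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how sensitivity and precision are computed from per-residue binary labels. The function name and the toy labels are hypothetical and are not taken from the paper; the label convention (1 = linker residue, 0 = domain residue) is an assumption made for illustration.

```python
def sensitivity_and_precision(true_labels, predicted_labels):
    """Return (sensitivity, precision) for binary labels.

    Assumed convention: 1 marks a linker residue, 0 marks a domain residue.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # linkers found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # linkers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical, heavily imbalanced toy data: 3 linker residues out of 10.
true_labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
predicted_labels = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
sens, prec = sensitivity_and_precision(true_labels, predicted_labels)
```

In this toy case the predictor misses one of three linker residues but raises no false alarms, which mirrors the pattern reported above: moderate sensitivity alongside very high precision.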
3
Zimic M, Gutiérrez AH, Gilman RH, López C, Quiliano M, Evangelista W, Gonzales A, García HH, Sheen P. Immunoinformatics prediction of linear epitopes from Taenia solium TSOL18. Bioinformation 2011; 6:271-4. [PMID: 21738328 PMCID: PMC3124692 DOI: 10.6026/97320630006271] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 06/02/2011] [Accepted: 06/03/2011] [Indexed: 11/23/2022] Open
Abstract
Cysticercosis is a public health problem in several developing countries. The oncosphere protein TSOL18 is the most immunogenic and protective antigen ever reported against porcine cysticercosis, although no specific epitope has been identified to account for these properties. Recent evidence suggests that protection might be associated with conformational epitopes. Linear epitopes from TSOL18 were computationally predicted and evaluated for immunogenicity and protection against porcine cysticercosis. A synthetic peptide was designed based on predicted linear B cell and T cell epitopes that are exposed on the surface of the theoretically modeled structure of TSOL18. Three surface epitopes from TSOL18 were predicted as immunogenic. A peptide comprising a linear arrangement of these epitopes was chemically synthesized. The capacity of the synthetic peptide to protect pigs against an oral challenge with Taenia solium proglottids was tested in a vaccine trial. The synthetic peptide was able to produce IgG antibodies in pigs and was associated with a reduction in the number of cysts, although it was not able to provide complete protection, defined as the complete absence of cysts at necropsy. This study demonstrated that predicted B cell and T cell epitopes from TSOL18 were not able to completely protect pigs against an oral challenge with Taenia solium proglottids. Therefore, other linear epitopes or conformational epitopes may be responsible for the protection conferred by TSOL18.
Affiliation(s)
- Mirko Zimic
- Unidad de Bioinformática. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Andrés Hazaet Gutiérrez
- Unidad de Bioinformática. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Robert Hugh Gilman
- Laboratorio de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Department of International Health. Bloomberg School of Public Health, Johns Hopkins University
- César López
- Unidad de Bioinformática. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Miguel Quiliano
- Unidad de Bioinformática. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Wilfredo Evangelista
- Unidad de Bioinformática. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Armando Gonzales
- Facultad de Veterinaria, Universidad Nacional Mayor de San Marcos, Perú
- Héctor Hugo García
- Laboratorio de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
- Cysticercosis Unit, Instituto de Ciencias Neurológicas, Perú
- Patricia Sheen
- Laboratorio de Enfermedades Infecciosas. Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia
4
Yan RX, Si JN, Wang C, Zhang Z. DescFold: a web server for protein fold recognition. BMC Bioinformatics 2009; 10:416. [PMID: 20003426 PMCID: PMC2803855 DOI: 10.1186/1471-2105-10-416] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 07/09/2009] [Accepted: 12/14/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Machine learning-based methods have been proven to be powerful in developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on the improvement of DescFold by incorporating more powerful descriptors and setting up a user-friendly web server. RESULTS In seeking more powerful descriptors, the profile-profile alignment score generated from the COMPASS algorithm was first considered as a new descriptor (i.e., PPA). When considering a profile-profile alignment between two proteins in the context of fold recognition, one protein is regarded as a template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching its structural neighbors with the assistance of the TM-align structural alignment algorithm. Moreover, the COMPASS algorithm was used again to derive a profile-structural-profile-alignment-based descriptor (i.e., PSPA). We trained and tested the new DescFold in a total of 1,835 highly diverse proteins extracted from the SCOP 1.73 version. When the PPA and PSPA descriptors were introduced, the new DescFold boosts the performance of fold recognition substantially. Using the SCOP_1.73_40% dataset as the fold library, the DescFold web server based on the trained SVM models was further constructed. 
To provide a large-scale test for the new DescFold, a stringent test set of 1,866 proteins was selected from the SCOP 1.75 version. At a false positive rate below 5%, the new DescFold is able to correctly recognize structural homologs at the fold level for nearly 46% of test proteins. We also benchmarked the DescFold method against several well-established fold recognition algorithms through the LiveBench targets and Lindahl dataset. CONCLUSIONS The new DescFold method was intensively benchmarked and showed very competitive performance compared with some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.
Affiliation(s)
- Ren-Xiang Yan
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
5
Abstract
The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue–residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html
Affiliation(s)
- Kevin Karplus
- Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA.
6
Doxey AC, Lynch MDJ, Müller KM, Meiering EM, McConkey BJ. Insights into the evolutionary origins of clostridial neurotoxins from analysis of the Clostridium botulinum strain A neurotoxin gene cluster. BMC Evol Biol 2008; 8:316. [PMID: 19014598 PMCID: PMC2605760 DOI: 10.1186/1471-2148-8-316] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 04/21/2008] [Accepted: 11/14/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Clostridial neurotoxins (CNTs) are the most deadly toxins known and causal agents of botulism and tetanus neuroparalytic diseases. Despite considerable progress in understanding CNT structure and function, the evolutionary origins of CNTs remain a mystery as they are unique to Clostridium and possess a sequence and structural architecture distinct from other protein families. Uncovering the origins of CNTs would be a significant contribution to our understanding of how pathogens evolve and generate novel toxin families. RESULTS The C. botulinum strain A genome was examined for potential homologues of CNTs. A key link was identified between the neurotoxin and the flagellin gene (CBO0798) located immediately upstream of the BoNT/A neurotoxin gene cluster. This flagellin sequence displayed the strongest sequence similarity to the neurotoxin and NTNH homologue out of all proteins encoded within C. botulinum strain A. The CBO0798 gene contains a unique hypervariable region, which in closely related flagellins encodes a collagenase-like domain. Remarkably, these collagenase-containing flagellins were found to possess the characteristic HEXXH zinc-protease motif responsible for the neurotoxin's endopeptidase activity. Additional links to collagenase-related sequences and functions were detected by further analysis of CNTs and surrounding genes, including sequence similarities to collagen-adhesion domains and collagenases. Furthermore, the neurotoxin's HCRn domain was found to exhibit both structural and sequence similarity to eukaryotic collagen jelly-roll domains. CONCLUSION Multiple lines of evidence suggest that the neurotoxin and adjacent genes evolved from an ancestral collagenase-like gene cluster, linking CNTs to another major family of clostridial proteolytic toxins. Duplication, reshuffling and assembly of neighboring genes within the BoNT/A neurotoxin gene cluster may have led to the neurotoxin's unique architecture. This work provides new insights into the evolution of C. botulinum neurotoxins and the evolutionary mechanisms underlying the origins of virulent genes.
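The HEXXH zinc-protease motif mentioned in this abstract is a simple sequence pattern (His, Glu, any two residues, His), so locating it can be illustrated with a short regular-expression scan. This is only an illustrative sketch, not the authors' search procedure; the function name and the toy sequence are hypothetical.

```python
import re

# H, E, any two residues, H. The lookahead lets overlapping hits be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(protein_sequence):
    """Return (1-based start position, matched residues) for each HEXXH hit."""
    sequence = protein_sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(sequence)]


# Hypothetical toy sequence, not a real flagellin or neurotoxin fragment.
hits = find_hexxh("MKTAHELGHNLG")
```

A pattern this short will match by chance in many unrelated proteins, so a hit is at most a starting point for the kind of domain-level and structural comparison the abstract describes.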
Affiliation(s)
- Andrew C Doxey
- Department of Biology, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada.
7
Altman RB, Bergman CM, Blake J, Blaschke C, Cohen A, Gannon F, Grivell L, Hahn U, Hersh W, Hirschman L, Jensen LJ, Krallinger M, Mons B, O'Donoghue SI, Peitsch MC, Rebholz-Schuhmann D, Shatkay H, Valencia A. Text mining for biology--the way forward: opinions from leading scientists. Genome Biol 2008; 9 Suppl 2:S7. [PMID: 18834498 PMCID: PMC2559991 DOI: 10.1186/gb-2008-9-s2-s7] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Indexed: 11/23/2022] Open
Abstract
This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger workflows; and suggestions for additional challenge evaluations, new applications, and additional resources needed to make progress.
Affiliation(s)
- Russ B Altman
- Stanford University, Stanford, California, 94305-5444, USA
8
Cheng J, Baldi P. Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics 2007; 8:113. [PMID: 17407573 PMCID: PMC1852326 DOI: 10.1186/1471-2105-8-113] [Citation(s) in RCA: 174] [Impact Index Per Article: 10.2] [Received: 12/28/2006] [Accepted: 04/02/2007] [Indexed: 11/12/2022] Open
Abstract
Background Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved. Results Here we develop a new contact map predictor (SVMcon) that uses support vector machines to predict medium- and long-range contacts. SVMcon integrates profiles, secondary structure, relative solvent accessibility, contact potentials, and other useful features. On the same test data set, SVMcon's accuracy is 4% higher than the latest version of the CMAPpro contact map predictor. SVMcon recently participated in the seventh edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7) experiment and was evaluated along with seven other contact map predictors. SVMcon was ranked as one of the top predictors, yielding the second best coverage and accuracy for contacts with sequence separation >= 12 on 13 de novo domains. Conclusion We describe SVMcon, a new contact map predictor that uses SVMs and a large set of informative features. SVMcon yields good performance on medium- to long-range contact predictions and can be modularly incorporated into a structure prediction pipeline.
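To make the notion of "contacts with sequence separation >= 12" concrete, here is a minimal sketch that derives such contacts from known coordinates, which is how a reference contact map for evaluating a predictor could be built. The abstract does not state the contact definition, so the 8 Å cutoff and the use of one representative atom per residue are assumptions; the function name and toy coordinates are hypothetical.

```python
import math


def long_range_contacts(coords, distance_cutoff=8.0, min_separation=12):
    """List residue pairs (i, j), 0-based, that are in contact.

    coords: one (x, y, z) tuple per residue, e.g. a C-alpha atom (assumed).
    A pair counts as a contact when j - i >= min_separation and the
    Euclidean distance is below distance_cutoff (in angstroms, assumed).
    """
    contacts = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < distance_cutoff:
                contacts.append((i, j))
    return contacts


# Hypothetical toy chain: 13 residues along a line, then one residue
# folded back next to the start of the chain.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(13)] + [(0.0, 5.0, 0.0)]
contacts = long_range_contacts(toy_coords)
```

Only the folded-back residue produces contacts here, because neighbouring residues along the chain are excluded by the separation filter even though they are spatially close.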
Affiliation(s)
- Jianlin Cheng
- School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816-2362, USA
- Pierre Baldi
- School of Information and Computer Sciences, University of California Irvine, Irvine, CA 92617, USA
9
Barberis M, De Gioia L, Ruzzene M, Sarno S, Coccetti P, Fantucci P, Vanoni M, Alberghina L. The yeast cyclin-dependent kinase inhibitor Sic1 and mammalian p27Kip1 are functional homologues with a structurally conserved inhibitory domain. Biochem J 2006; 387:639-47. [PMID: 15649124 PMCID: PMC1134993 DOI: 10.1042/bj20041299] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Indexed: 01/06/2023]
Abstract
In Saccharomyces cerevisiae, Sic1, an inhibitor of Cdk (cyclin-dependent kinase), blocks the activity of S-Cdk1 (Cdk1/Clb5,6) kinase that is required for DNA replication. Deletion of Sic1 causes premature DNA replication from fewer origins, extension of the S phase and inefficient separation of sister chromatids during anaphase. Despite the well-documented relevance of Sic1 inhibition of S-Cdk1 for cell cycle control and genome instability, the molecular mechanism by which Sic1 inhibits S-Cdk1 activity remains obscure. In this paper, we show that Sic1 is functionally and structurally related to the mammalian Cki (Cdk inhibitor) p27Kip1 of the Kip/Cip family. A molecular model of the inhibitory domain of Sic1 bound to the Cdk2-cyclin A complex suggested that the yeast inhibitor might productively interface with the mammalian Cdk2-cyclin A complex. Consistent with this, Sic1 is able to bind to, and strongly inhibit the kinase activity of, the Cdk2-cyclin A complex. In addition, comparison of the different inhibitory patterns obtained using histone H1 or GST (glutathione S-transferase)-pRb (retinoblastoma protein) fusion protein as substrate (the latter of which recognizes both the docking site and the catalytic site of Cdk2-cyclin A) offers interesting suggestions for the inhibitory mechanism of Sic1. Finally, overexpression of the KIP1 gene in vivo in Saccharomyces cerevisiae, like overexpression of the related SIC1 gene, rescues the cell cycle-related phenotype of a sic1Δ strain. Taken together, these findings strongly indicate that budding yeast Sic1 and mammalian p27Kip1 are functional homologues with a structurally conserved inhibitory domain.
Affiliation(s)
- Matteo Barberis
- *Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy
- Luca De Gioia
- *Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy
- Maria Ruzzene
- †Dipartimento di Chimica Biologica, Università di Padova, Viale G. Colombo 3, 35121 Padova, Italy
- Stefania Sarno
- †Dipartimento di Chimica Biologica, Università di Padova, Viale G. Colombo 3, 35121 Padova, Italy
- Paola Coccetti
- *Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy
- Piercarlo Fantucci
- *Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy
- Marco Vanoni
- *Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy
- Lilia Alberghina
- *Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano-Bicocca, Piazza della Scienza 2, 20126 Milano, Italy
- To whom correspondence should be addressed (email )
10
Wallner B, Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci 2006; 15:900-13. [PMID: 16522791 PMCID: PMC2242478 DOI: 10.1110/ps.051799606] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.8] [Indexed: 10/24/2022]
Abstract
In this study we present two methods to predict the local quality of a protein model: ProQres and ProQprof. ProQres is based on structural features that can be calculated from a model, while ProQprof uses alignment information and can only be used if the model is created from an alignment. In addition, we also propose a simple approach based on local consensus, Pcons-local. We show that all these methods perform better than state-of-the-art methodologies and that, when applicable, the consensus approach is by far the best approach to predict local structure quality. It was also found that ProQprof performed better than other methods for models based on distant relationships, while ProQres performed best for models based on closer relationship, i.e., a model has to be reasonably good to make a structural evaluation useful. Finally, we show that a combination of ProQprof and ProQres (ProQlocal) performed better than any other nonconsensus method for both high- and low-quality models. Additional information and Web servers are available at: http://www.sbc.su.se/~bjorn/ProQ/.
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden.
11
Challis RJ, Goodacre SL, Hewitt GM. Evolution of spider silks: conservation and diversification of the C-terminus. Insect Mol Biol 2006; 15:45-56. [PMID: 16469067 DOI: 10.1111/j.1365-2583.2005.00606.x] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 05/06/2023]
Abstract
Analysis of DNA sequences coding for the C-terminus of spider silk proteins from a range of spiders suggests that many silk C-termini share a common origin, and that their physical properties have been highly conserved over several hundred million years. These physical properties are compatible with roles in protein synthesis, silk function and in recruiting accessory proteins. Phylogenetic relationships among different silk genes suggest that any recombination has been insufficient to homogenize the different types of silk gene, which appear to have evolved independently of one another. The types of nucleotide substitutions that have occurred suggest that selection may have operated differently in the various silk lineages. Amino acid sequences of flagelliform silk C-termini differ substantially from the other types of spider silk studied, but they are expected to have very similar physical properties and may perform a similar function.
Affiliation(s)
- R J Challis
- IEB, University of Edinburgh, King's Buildings, West Mains Road, Edinburgh, UK
12
Draker R, Roper RL, Petric M, Tellier R. The complete sequence of the bovine torovirus genome. Virus Res 2005; 115:56-68. [PMID: 16137782 PMCID: PMC7114287 DOI: 10.1016/j.virusres.2005.07.005] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Received: 03/16/2005] [Revised: 07/05/2005] [Accepted: 07/12/2005] [Indexed: 12/15/2022]
Abstract
Viruses in the family Coronaviridae have elicited new interest, with the outbreaks caused by SARS-HCoV in 2003 and the recent discovery of a new human coronavirus, HCoV-NL63. The genus Torovirus, within the family Coronaviridae, is less well characterized, in part because toroviruses cannot yet be grown in cell culture (except for the Berne virus). In this study, we determined the sequence of the complete genome of Breda-1 (BoTV-1), a bovine torovirus. This is the first complete torovirus genome sequence to be reported. BoTV-1 RNA was amplified using long RT-PCR and the amplicons sequenced. The genome has a length of 28.475 kb and consists mainly of the replicase gene (∼20.2 kb), which contains two large overlapping ORFs, ORF1a and ORF1b, encoding polyproteins pp1a and pp1b, respectively. Sequence analysis identified conserved domains within the predicted sequences of pp1a and pp1b. Sequence alignments and protein secondary structure prediction data suggest the presence of a 3C-like serine protease domain with similarity to the arterivirus 3C-like serine protease and a single papain-like cysteine protease domain with similarity to the picornavirus leader protease. The ADRP (APPR-1″) domain – unique to the Coronaviridae – was also located in BoTV pp1a. In addition, several hydrophobic domains were identified that are typical of a nidovirus replicase. Within the pp1b sequence the polymerase and helicase domains were identified, as well as sequences predicted to be involved in ribosomal frameshifting, including the conserved slippery sequence UUUAAAC and two potential pseudoknot structures.
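As a small illustration of how a conserved slippery sequence such as UUUAAAC can be located in a genome, the sketch below scans a nucleotide string for every occurrence of the heptamer. It is not the authors' analysis pipeline; the function name and the toy sequence are hypothetical, and treating T as U (so cDNA input also works) is an assumption made for convenience.

```python
def find_slippery_sites(sequence, motif="UUUAAAC"):
    """Return 1-based start positions of every occurrence of the motif.

    The input is upper-cased and T is read as U, so both RNA and cDNA
    strings are accepted (an assumption of this sketch).
    """
    rna = sequence.upper().replace("T", "U")
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = rna.find(motif, start + 1)  # continue just past this hit
    return positions


# Hypothetical toy sequence containing the heptamer once as RNA, once as cDNA.
sites = find_slippery_sites("GGCUUUAAACGGGTTTAAACUU")
```

A heptamer match alone does not establish a frameshift site; as the abstract notes, candidate sites are assessed together with downstream pseudoknot structures.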
Affiliation(s)
- Ryan Draker
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ont., Canada
- Rachel L. Roper
- Department of Microbiology and Immunology, Brody School of Medicine, East Carolina University, NC, USA
- Martin Petric
- British Columbia Center for Disease Control, Vancouver, BC, Canada
- Raymond Tellier
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ont., Canada
- Division of Microbiology, The Hospital for Sick Children, 555 University Avenue, Toronto, Ont., Canada M5G 1X8
- Corresponding author. Tel.: +1 416 813 6592; fax: +1 416 813 6257.
13
Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res 2005; 33:W284-8. [PMID: 15980471 PMCID: PMC1160179 DOI: 10.1093/nar/gki418] [Citation(s) in RCA: 456] [Impact Index Per Article: 24.0] [Indexed: 11/22/2022] Open
Abstract
The FFAS03 server provides a web interface to the third generation of the profile–profile alignment and fold-recognition algorithm of fold and function assignment system (FFAS) [L. Rychlewski, L. Jaroszewski, W. Li and A. Godzik (2000), Protein Sci., 9, 232–241]. Profile–profile algorithms use information present in sequences of homologous proteins to amplify the patterns defining the family. As a result, they enable detection of remote homologies beyond the reach of other methods. FFAS, initially developed in 2000, is consistently one of the best ranked fold prediction methods in the CAFASP and LiveBench competitions. It is also used by several fold-recognition consensus methods and meta-servers. The FFAS03 server accepts a user supplied protein sequence and automatically generates a profile, which is then compared with several sets of sequence profiles of proteins from PDB, COG, PFAM and SCOP. The profile databases used by the server are automatically updated with the latest structural and sequence information. The server provides access to the alignment analysis, multiple alignment, and comparative modeling tools. Access to the server is open for both academic and commercial researchers. The FFAS03 server is available at .
Affiliation(s)
- Adam Godzik
- To whom correspondence should be addressed. Tel: +1 858 646 3168; Fax: +1 858 713 9925;
14
Abstract
The Na exchanger regulatory factor (NHERF) family of epithelial-enriched PDZ domain scaffolding proteins plays important roles in maintaining and regulating epithelial cell function. The NHERFs exhibit some overlap in tissue distribution and binding partners, suggesting redundant functions. Yet, it is clear that each NHERF protein exhibits distinct properties, translating into unique cellular functions. The work summarized in this review suggests the most recently identified family member, NHERF4, is the most divergent. Additional investigation is needed, however, to understand more completely the role of NHERF4 in the context of the NHERF family.
Affiliation(s)
- William R Thelin
- Department of Cell and Developmental Biology, The University of North Carolina at Chapel Hill, CB 7090, Chapel Hill, NC 27599, USA
15
Ginalski K, Grishin NV, Godzik A, Rychlewski L. Practical lessons from protein structure prediction. Nucleic Acids Res 2005; 33:1874-91. [PMID: 15805122 PMCID: PMC1074308 DOI: 10.1093/nar/gki327] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022] Open
Abstract
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in recent years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignments. Recent advances in assessment of prediction quality are also discussed.
Affiliation(s)
- Krzysztof Ginalski
- BioInfoBank Institute, ul. Limanowskiego 24A, 60-744 Poznań, Poland
- Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Pawińskiego 5a, 02-106 Warsaw, Poland
- Department of Biochemistry, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9038, USA
- Nick V. Grishin
- Department of Biochemistry, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9038, USA
- Howard Hughes Medical Institute, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA
- Adam Godzik
- The Burnham Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Leszek Rychlewski
- BioInfoBank Institute, ul. Limanowskiego 24A, 60-744 Poznań, Poland
- To whom correspondence should be addressed. Tel: +48 604 628805; Fax: +48 61 8643350;
|
16
|
Goulielmos GN, Eliopoulos E, Loukas M, Tsakas S. Functional constraints of 6-phosphogluconate dehydrogenase (6-PGD) based on sequence and structural information. J Mol Evol 2005; 59:358-71. [PMID: 15553090 DOI: 10.1007/s00239-004-2630-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
Abstract
The pentose phosphate cycle is considered a major source of NADPH and pentose needed for nucleic acid biosynthesis. 6-Phosphogluconate dehydrogenase (6PGD), an enzyme participating in this cycle, catalyzes the oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate with the subsequent release of CO2 and the reduction of NADP. We have determined the amino acid sequence of 6PGD of Bactrocera oleae and constructed a three-dimensional model based on the homologous known sheep structure. In a comparative study of 6PGD sequences from numerous species, all the conserved and variable regions of the enzyme were analyzed and the regions of functional importance were localized, an effort motivated in part by the direct involvement of the enzyme in various human diseases. Thus, analysis of amino acid variability of 37 6PGD sequences revealed that all regions important for the catalytic activity, such as those forming the substrate and coenzyme binding sites, are highly conserved in all species examined. Moreover, several amino acid residues responsible for substrate and coenzyme specificity were also found to be identical in all species examined. The highest percentage of protein divergence is observed at two regions that accumulate mutations, located at the parts of the two domains of the enzyme most distant from their interface. These peripheral regions, which are not functionally important, are highly variable and are predicted to be antigenic, and thus may be regions of antibody recognition. Furthermore, locating the differences between dipteran 6PGD sequences on the three-dimensional model suggests probable positions of the amino acid residues that differ among the fast, intermediate, and slow allozymic variants of B. oleae.
Affiliation(s)
- George N Goulielmos
- Department of Genetics, Agricultural University of Athens, Iera Odos 75, Votanikos, 118 55 Athens, Greece.
|
17
|
Hashimoto Y, Lawrence P. Comparative analysis of selected genes from Diachasmimorpha longicaudata entomopoxvirus and other poxviruses. J Insect Physiol 2005; 51:207-20. [PMID: 15749105 PMCID: PMC7094658 DOI: 10.1016/j.jinsphys.2004.10.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 10/14/2004] [Accepted: 10/22/2004] [Indexed: 05/16/2023]
Abstract
The Diachasmimorpha longicaudata entomopoxvirus (DlEPV) is the first symbiotic EPV described from a parasitic wasp. The DlEPV is introduced into the tephritid fruit fly larval host along with the wasp egg at oviposition. We sequenced a shotgun genomic library of the DlEPV DNA and analyzed and compared the predicted protein sequences of eight ORFs with those of selected poxviruses and other organisms. BlastP searches showed that five of these are homologous to poxvirus putative proteins such as metalloprotease, a putative membrane protein, late transcription factor-3, virion surface protein, and poly (A) polymerase (PAP) regulatory small subunit. Three of these are similar to those of other organisms such as the gamma-glutamyltransferase (GGT) of Arabidopsis thaliana, eukaryotic initiation factor 4A (eIF4A) of Caenorhabditis briggsae and lambda phage integrase (lambda-Int) of Enterococcus faecium. Transcription motifs for early (TGA,A/T,XXXXA) or late (TAAATG, TAAT, or TAAAT) gene expression conserved in poxviruses were identified with those ORFs. Phylogenetic analysis of multiple alignments of five ORFs and 20 poxvirus homologous sequences and of a concatenate of multiple alignments suggested that DlEPV probably diverged from the ancestral node between the fowlpox virus and the genus B, lepidopteran and orthopteran EPVs, to which Amsacta moorei and Melanoplus sanguinipes EPV, respectively, belong. The DlEPV putative GGT, eIF4A, and lambda-Int contained many conserved domains that typified these proteins. These homologues may be involved in either viral pathogenicity or enhancing parasitism via the gamma-glutamyl cycle and compensation of eIF4A levels in the parasitized fly, or via the integration of a portion of the viral genome into the wasp and/or parasitized fly.
Affiliation(s)
- P.O. Lawrence
- Department of Entomology and Nematology, University of Florida, Gainesville, FL 32611-0620, USA
|
18
|
Eliopoulos E, Goulielmos GN, Loukas M. Functional constraints of alcohol dehydrogenase (ADH) of tephritidae and relationships with other Dipteran species. J Mol Evol 2004; 58:493-505. [PMID: 15170253 DOI: 10.1007/s00239-003-2568-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/23/2003] [Accepted: 11/04/2003] [Indexed: 10/26/2022]
Abstract
Alcohol dehydrogenase is considered a very important enzyme in insect metabolism because it is involved (in its homodimeric form) in the catalysis of the reversible conversion of various alcohols in larval feeding sites to their corresponding aldehydes and ketones, thus contributing to detoxification and metabolic purposes. Using 14 amino acid ADH sequences recently determined in our laboratory, we constructed a three-dimensional (3D) model of olive fruit fly Bactrocera oleae ADH1 and ADH2, based on the known homologous Drosophila lebanonensis ADH structure, and the amino acid residues that have been proposed as being responsible for catalysis were located on it. Moreover, in a comparative study of the ADH sequences, the residues occupying characteristic positions in the ADH of species of the Bactrocera and Ceratitis genera (called genus-specific) as well as residues appearing only in ADH1 or ADH2 (called isozymic-specific) were defined and localized on the 3D model. All regions important for catalytic activity, such as those forming the substrate- and coenzyme-binding sites, are highly conserved in all tephritid species examined. Genus-specific amino acids are located on the outside of the protein, on loops and regions predicted to be antigenic. The higher percentage of genus-specific amino acid variation seems to be centered in the NAD adenine-binding site, located near the surface of the protein molecule. Nine of 12 isozymic-specific positions are lined along an "arc" on the surface of the protein, thus linking the two "monomer bases" of the dimer via the C-terminal interacting loops. Furthermore, the distribution of isozymic- and genus-specific amino acids on the monomer-monomer interface may have some evolutionary significance. Most amino acids predicted to be antigenic are positioned in peripheral regions of nonfunctional importance, but surprisingly, an additional antigenic region is contained within the (highly conserved in tephritids) C-terminal tail.
Affiliation(s)
- Elias Eliopoulos
- Department of Genetics, Agricultural University of Athens, Iera Odos 75, Votanikos, 118 55 Athens, Greece
|
19
|
Abstract
We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted partitioning. The speed and accuracy of MUSCLE are compared with T-Coffee, MAFFT and CLUSTALW on four test sets of reference alignments: BAliBASE, SABmark, SMART and a new benchmark, PREFAB. MUSCLE achieves the highest, or joint highest, rank in accuracy on each of these sets. Without refinement, MUSCLE achieves average accuracy statistically indistinguishable from T-Coffee and MAFFT, and is the fastest of the tested methods for large numbers of sequences, aligning 5000 sequences of average length 350 in 7 min on a current desktop computer. The MUSCLE program, source code and PREFAB test data are freely available at http://www.drive5.com/muscle.
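As an illustration of the k-mer counting idea mentioned in the abstract above, the following minimal Python sketch estimates a distance between two unaligned protein sequences from the k-mers they share. It is not MUSCLE's implementation: the function names `kmer_counts` and `kmer_distance`, the default `k = 3`, and the normalisation by the shorter sequence are assumptions chosen for illustration only.

```python
from collections import Counter


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Return 1 - (shared k-mers / k-mers in the shorter sequence).

    Illustrative normalisation only; the abstract does not give a formula.
    """
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must contain at least k residues")
    # Counter & Counter is a multiset intersection (minimum of each count).
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


if __name__ == "__main__":
    print(kmer_distance("ACDEFG", "ACDEFG"))  # identical sequences: 0.0
    print(kmer_distance("AAAA", "CCCC", k=2))  # no shared k-mers: 1.0
```

No alignment is needed to obtain such a value, which is why a k-mer based estimate is attractive when many sequences have to be compared pairwise.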
|
20
|
Kopp J, Schwede T. The SWISS-MODEL Repository of annotated three-dimensional protein structure homology models. Nucleic Acids Res 2004; 32:D230-4. [PMID: 14681401 PMCID: PMC308743 DOI: 10.1093/nar/gkh008] [Citation(s) in RCA: 243] [Impact Index Per Article: 12.2] [Indexed: 11/14/2022] Open
Abstract
The SWISS-MODEL Repository is a database of annotated three-dimensional comparative protein structure models generated by the fully automated homology-modelling pipeline SWISS-MODEL. The Repository currently contains about 300,000 three-dimensional models for sequences from the Swiss-Prot and TrEMBL databases. The content of the Repository is updated on a regular basis incorporating new sequences, taking advantage of new template structures becoming available and reflecting improvements in the underlying modelling algorithms. Each entry consists of one or more three-dimensional protein models, the superposed template structures, the alignments on which the models are based, a summary of the modelling process and a force field based quality assessment. The SWISS-MODEL Repository can be queried via an interactive website at http://swissmodel.expasy.org/repository/. Annotation and cross-linking of the models with other databases, e.g. Swiss-Prot on the ExPASy server, allow for seamless navigation between protein sequence and structure information. The aim of the SWISS-MODEL Repository is to provide access to an up-to-date collection of annotated three-dimensional protein models generated by automated homology modelling, bridging the gap between sequence and structure databases.
Affiliation(s)
- Jürgen Kopp
- Biozentrum der Universität Basel and Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, CH 4056 Basel, Switzerland
|
21
|
Eyrich VA, Rost B. META-PP: single interface to crucial prediction servers. Nucleic Acids Res 2003; 31:3308-10. [PMID: 12824314 PMCID: PMC168978 DOI: 10.1093/nar/gkg572] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Received: 02/15/2003] [Revised: 04/08/2003] [Accepted: 04/08/2003] [Indexed: 11/14/2022] Open
Abstract
The META-PP server (http://cubic.bioc.columbia.edu/meta/) simplifies access to a battery of public protein structure and function prediction servers by providing a common and stable web-based interface. The goal is to make these powerful and increasingly essential methods more readily available to nonexpert users and the bioinformatics community at large. At present META-PP provides access to a selected set of high-quality servers in the areas of comparative modelling, threading/fold recognition, secondary structure prediction and more specialized fields like contact and function prediction.
Affiliation(s)
- Volker A Eyrich
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
|
22
|
Abstract
The ability to separate correct models of protein structures from less correct models is of the greatest importance for protein structure prediction methods. Several studies have examined the ability of different types of energy function to detect the native, or native-like, protein structure from a large set of decoys. In contrast to earlier studies, we examine here the ability to detect models that only show limited structural similarity to the native structure. These correct models are defined by the existence of a fragment that shows significant similarity between this model and the native structure. It has been shown that the existence of such fragments is useful for comparing the performance between different fold recognition methods and that this performance correlates well with performance in fold recognition. We have developed ProQ, a neural-network-based method to predict the quality of a protein model that extracts structural features, such as frequency of atom-atom contacts, and predicts the quality of a model, as measured either by LGscore or MaxSub. We show that ProQ performs at least as well as other measures when identifying the native structure and is better at the detection of correct models. This performance is maintained over several different test sets. ProQ can also be combined with the Pcons fold recognition predictor (Pmodeller) to increase its performance, with the main advantage being the elimination of a few high-scoring incorrect models. Pmodeller was successful in CASP5, and results from the latest LiveBench, LiveBench-6, indicate that Pmodeller has a higher specificity than Pcons alone.
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-106 91 Stockholm, Sweden
|
23
|
González B, Campillo N, Garrido F, Gasset M, Sanz-Aparicio J, Pajares MA. Active-site-mutagenesis study of rat liver betaine-homocysteine S-methyltransferase. Biochem J 2003; 370:945-52. [PMID: 12487625 PMCID: PMC1223237 DOI: 10.1042/bj20021510] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 09/27/2002] [Revised: 12/09/2002] [Accepted: 12/17/2002] [Indexed: 11/17/2022]
Abstract
A site-directed-mutagenesis study of putative active-site residues in rat liver betaine-homocysteine S-methyltransferase has been carried out. Identification of these amino acids was based on data derived from a structural model of the enzyme. No alterations in the CD spectra or the gel-filtration chromatography elution pattern were observed with the mutants, thus suggesting no modification in the secondary structure content or in the association state of the proteins. All the mutants obtained showed a reduction of the enzyme activity, the most dramatic effect being that of Glu(159), followed by Tyr(77) and Asp(26). Changes in affinity for either of the substrates, homocysteine or betaine, were detected when substitutions were performed of Glu(21), Asp(26), Phe(74) and Cys(186). Interestingly, Asp(26), postulated to be involved in homocysteine binding, has a strong effect on affinity for betaine. The relevance of these results is discussed in the light of very recent structural data obtained for the human enzyme.
Affiliation(s)
- Beatriz González
- Instituto de Química-Física Rocasolano (CSIC), Serrano 119, 28006 Madrid, Spain
|
24
|
Ko DC, Binkley J, Sidow A, Scott MP. The integrity of a cholesterol-binding pocket in Niemann-Pick C2 protein is necessary to control lysosome cholesterol levels. Proc Natl Acad Sci U S A 2003; 100:2518-25. [PMID: 12591949 PMCID: PMC151373 DOI: 10.1073/pnas.0530027100] [Citation(s) in RCA: 157] [Impact Index Per Article: 7.5] [Accepted: 01/03/2003] [Indexed: 11/18/2022] Open
Abstract
The neurodegenerative disease Niemann-Pick Type C2 (NPC2) results from mutations in the NPC2 (HE1) gene that cause abnormally high cholesterol accumulation in cells. We find that purified NPC2, a secreted soluble protein, binds cholesterol specifically with a much higher affinity (K(d) = 30-50 nM) than previously reported. Genetic and biochemical studies identified single amino acid changes that prevent both cholesterol binding and the restoration of normal cholesterol levels in mutant cells. The amino acids that affect cholesterol binding surround a hydrophobic pocket in the NPC2 protein structure, identifying a candidate sterol-binding location. On the basis of evolutionary analysis and mutagenesis, three other regions of the NPC2 protein emerged as important, including one required for efficient secretion.
MESH Headings
- Amino Acid Sequence
- Amino Acids/metabolism
- Animals
- Binding Sites
- CHO Cells
- Carrier Proteins
- Cells, Cultured
- Cholesterol/chemistry
- Cholesterol/metabolism
- Chromatography, Gel
- Cricetinae
- Culture Media, Conditioned/pharmacology
- DNA, Complementary/metabolism
- Dose-Response Relationship, Drug
- Evolution, Molecular
- Fibroblasts/metabolism
- Filipin/chemistry
- Glycoproteins/chemistry
- Glycoproteins/physiology
- Humans
- Kinetics
- Mice
- Models, Molecular
- Molecular Sequence Data
- Mutagenesis, Site-Directed
- Mutation
- Plasmids/metabolism
- Protein Binding
- Protein Conformation
- Protein Structure, Tertiary
- Sequence Homology, Amino Acid
- Software
- Time Factors
- Vesicular Transport Proteins
Affiliation(s)
- Dennis C Ko
- Department of Developmental Biology, Beckman Center B300, 279 Campus Drive, Stanford University School of Medicine, Stanford, CA 94305-5329, USA
|
25
|
Beebe K, Ribas de Pouplana L, Schimmel P. Elucidation of tRNA-dependent editing by a class II tRNA synthetase and significance for cell viability. EMBO J 2003; 22:668-75. [PMID: 12554667 PMCID: PMC140749 DOI: 10.1093/emboj/cdg065] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.5] [Received: 10/09/2002] [Revised: 12/03/2002] [Accepted: 12/05/2002] [Indexed: 11/14/2022] Open
Abstract
Editing of misactivated amino acids by class I tRNA synthetases is encoded by a specialized internal domain specific to class I enzymes. In contrast, little is known about editing activities of the structurally distinct class II enzymes. Here we show that the class II alanyl-tRNA synthetase (AlaRS) has a specialized internal domain that appears weakly related to an appended domain of threonyl-tRNA synthetase (ThrRS), but is unrelated to that found in class I enzymes. Editing of misactivated glycine or serine was shown to require a tRNA cofactor. Specific mutations in the aforementioned domain disrupt editing and lead to production of mischarged tRNA. This class-specific editing domain was found to be essential for cell growth, in the presence of elevated concentrations of glycine or serine. In contrast to ThrRS, where the editing domain is not found in all three kingdoms of living organisms, it was incorporated early into AlaRSs and is present throughout evolution. Thus, tRNA-dependent editing by AlaRS may have been critical for making the genetic code sufficiently accurate to generate the tree of life.
Affiliation(s)
- Paul Schimmel
- The Skaggs Institute for Chemical Biology, The Scripps Research Institute, Beckman Center, BCC379, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA
Corresponding author e-mail:
|
26
|
Swalla BM, Gumport RI, Gardner JF. Conservation of structure and function among tyrosine recombinases: homology-based modeling of the lambda integrase core-binding domain. Nucleic Acids Res 2003; 31:805-18. [PMID: 12560475 PMCID: PMC149183 DOI: 10.1093/nar/gkg142] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Tyrosine recombinases participate in diverse biological processes by catalyzing recombination between specific DNA sites. Although a conserved protein fold has been described for the catalytic (CAT) domains of five recombinases, structural relationships between their core-binding (CB) domains remain unclear. Despite differences in the specificity and affinity of core-type DNA recognition, a conserved binding mechanism is suggested by the shared two-domain motif in crystal structure models of the recombinases Cre, XerD and Flp. We have found additional evidence for conservation of the CB domain fold. Comparison of XerD and Cre crystal structures showed that their CB domains are closely related; the three central alpha-helices of these domains are superposable to within 1.44 Å. A structure-based multiple sequence alignment containing 25 diverse CB domain sequences provided evidence for widespread conservation of both structural and functional elements in this fold. Based upon the Cre and XerD crystal structures, we employed homology modeling to construct a three-dimensional structure for the lambda integrase CB domain. The model provides a conceptual framework within which many previously identified, functionally important amino acid residues were investigated. In addition, the model predicts new residues that may participate in core-type DNA binding or dimerization, thereby providing hypotheses for future genetic and biochemical experiments.
|
27
|
Grundhoff A, Ganem D. The latency-associated nuclear antigen of Kaposi's sarcoma-associated herpesvirus permits replication of terminal repeat-containing plasmids. J Virol 2003; 77:2779-83. [PMID: 12552022 PMCID: PMC141125 DOI: 10.1128/jvi.77.4.2779-2783.2003] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.8] [Indexed: 11/20/2022] Open
Abstract
The latency-associated nuclear antigen (LANA) of Kaposi's sarcoma-associated herpesvirus can associate with mitotic chromosomes and promote latent episome maintenance and segregation. Here we report that LANA also mediates the replication of plasmid DNAs bearing viral terminal repeats. The predicted secondary structure of LANA's C terminus reveals striking similarity to the known structure of the DNA-binding domain of Epstein-Barr virus EBNA1, despite the absence of primary sequence homology between these proteins, suggesting conservation of the key mechanistic features of latent gammaherpesvirus DNA replication.
Affiliation(s)
- Adam Grundhoff
- Departments of Microbiology and Medicine, Howard Hughes Medical Institute, University of California Medical Center, San Francisco, CA 94143-0414, USA
|
28
|
Rigden DJ, Setlow P, Setlow B, Bagyan I, Stein RA, Jedrzejas MJ. PrfA protein of Bacillus species: prediction and demonstration of endonuclease activity on DNA. Protein Sci 2002; 11:2370-81. [PMID: 12237459 PMCID: PMC2373696 DOI: 10.1110/ps.0216802] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
The prfA gene product of Gram-positive bacteria is unusual in being implicated in several cellular processes: cell wall synthesis, chromosome segregation, and DNA recombination and repair. However, no homology of PrfA with other proteins has been evident. Here we report a structural relationship between PrfA and the restriction enzyme PvuII, and thereby produce models that predict that PrfA binds DNA. Indeed, wild-type Bacillus stearothermophilus PrfA, but not a catalytic site mutant, nicked one strand of supercoiled plasmid templates leaving 5'-phosphate and 3'-hydroxyl termini. This activity, much lower on linear or relaxed circular double-stranded DNA or on single-stranded DNA, is consistent with a role for this protein in chromosome segregation, DNA recombination, or DNA repair.
Affiliation(s)
- Daniel J Rigden
- National Centre of Genetic Resources and Biotechnology, Cenargen/Embrapa, Brasília, Brazil, D.F. 70770-900.
|
29
|
Samudrala R, Levitt M. A comprehensive analysis of 40 blind protein structure predictions. BMC Struct Biol 2002; 2:3. [PMID: 12150712 PMCID: PMC122083 DOI: 10.1186/1472-6807-2-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Received: 04/09/2002] [Accepted: 08/01/2002] [Indexed: 11/21/2022]
Abstract
BACKGROUND We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. RESULTS For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the Cα atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å Cα RMSD for 60-100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å Cα RMSD for residues 1-80 for T110/rbfa. CONCLUSIONS The areas of protein structure prediction that work well, and areas that need improvement, are discernible by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.
Affiliation(s)
- Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Michael Levitt
- Department of Structural Biology, Stanford University, School of Medicine, Stanford, CA 94305, USA
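The Cα RMSD figures quoted in the abstract above follow the usual definition, RMSD = sqrt((1/N) Σ |m_i − r_i|²) over N matched Cα atoms. The Python sketch below computes that quantity for two coordinate lists that are assumed to be already superposed and matched residue by residue; the optimal-superposition step normally applied beforehand is omitted, and the name `ca_rmsd` and the sample coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD over matched C-alpha coordinates, in the units of the input.

    Assumes both structures are already superposed; no fitting is done here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
        for (mx, my, mz), (rx, ry, rz) in zip(model, reference)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(round(ca_rmsd(model, reference), 3))  # sqrt(4 / 2) = 1.414
```

Because the value depends on which residues are matched and on how the structures were superposed, an RMSD is only comparable across models when those choices are held fixed.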
|
30
|
Harton JA, O'Connor W, Conti BJ, Linhoff MW, Ting JPY. Leucine-rich repeats of the class II transactivator control its rate of nuclear accumulation. Hum Immunol 2002; 63:588-601. [PMID: 12072194 DOI: 10.1016/s0198-8859(02)00400-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
Activation of class II major histocompatibility complex (MHC) gene expression is regulated by a master regulator, class II transcriptional activator (CIITA). Transactivation by CIITA requires its nuclear import. This study will address a mechanistic role for the leucine-rich repeats (LRR) of CIITA in regulating nuclear translocation by mutating 12 individual consensus-motif "leucine" residues in both its alpha-motifs and beta-motifs. While some leucine mutations in the LRR motif of CIITA cause congruent loss of transactivation function and nuclear import, other alanine substitutions in both the alpha-helices and the beta-sheets have normal transactivation function but a loss of nuclear accumulation (i.e., functional mutants). This seeming paradox is resolved by the observations that nuclear accumulation of these functional mutants does occur but is significantly less than wild-type. This difference is revealed only in the presence of leptomycin B and actinomycin D, which permit examination of nuclear accumulation unencumbered by nuclear export and new CIITA synthesis. Further analysis of these mutants reveals that at limiting concentrations of CIITA, a dramatic difference in transactivation function between mutants and wild-type CIITA is easily detected, in agreement with their lowered nuclear accumulation. These experiments reveal an interesting aspect of LRR in controlling the amount of nuclear accumulation.
Affiliation(s)
- Jonathan A Harton
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
|
31
|
Cristobal S, Zemla A, Fischer D, Rychlewski L, Elofsson A. A study of quality measures for protein threading models. BMC Bioinformatics 2001; 2:5. [PMID: 11545673 PMCID: PMC55330 DOI: 10.1186/1471-2105-2-5] [Citation(s) in RCA: 148] [Impact Index Per Article: 6.4] [Received: 04/09/2001] [Accepted: 08/01/2001] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction of protein structures is one of the fundamental challenges in biology today. To fully understand how well different prediction methods perform, it is necessary to use measures that evaluate their performance. Every two years, starting in 1994, the CASP (Critical Assessment of protein Structure Prediction) process has been organized to evaluate the ability of different predictors to blindly predict the structure of proteins. To capture different features of the models, several measures have been developed during the CASP processes. However, these measures have not been examined in detail before. In an attempt to develop fully automatic measures that can be used in CASP, as well as in other types of benchmarking experiments, we have compared twenty-one measures. These include the measures used in CASP3 and CASP2 as well as measures introduced later. We have studied their ability to distinguish between the better and worse models submitted to CASP3 and the correlation between them. RESULTS Using a small set of 1340 models for 23 different targets we show that most methods correlate with each other. Most pairs of measures show a correlation coefficient of about 0.5. The correlation is slightly higher for measures of similar types. We found that a significant problem when developing automatic measures is how to deal with proteins of different length. Comparison between different measures is also complicated, as many measures depend on the size of the target. We show that the manual assessment can be reproduced to about 70% using automatic measures. Alignment-independent measures detect slightly more of the models with the correct fold, while alignment-dependent measures agree better when selecting the best models for each target. Finally we show that using automatic measures would, to a large extent, reproduce the assessors' ranking of the predictors at CASP3.
CONCLUSIONS We show that given a sufficient number of targets the manual and automatic measures would have given almost identical results at CASP3. If the intent is to reproduce the type of scoring done by the manual assessor in CASP3, the best approach might be to use a combination of alignment-independent and alignment-dependent measures, as used in several recent studies.
Affiliation(s)
- Susana Cristobal
- Cell and Molecular Biology Department, Box 596, BMC, Uppsala University, SE-751 24 Uppsala, Sweden
- Adam Zemla
- Lawrence Livermore National Laboratory, 7000 East Ave., Livermore, CA 94550-9234, USA
- Daniel Fischer
- Department Bioinformatics/Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
- Leszek Rychlewski
- International Institute of Molecular and Cell Biology, Ks. Trojdena 4, 02-109 Warsaw, Poland
- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
|
32
|
Rodrigues-Lima F, Deloménie C, Goodfellow GH, Grant DM, Dupret JM. Homology modelling and structural analysis of human arylamine N-acetyltransferase NAT1: evidence for the conservation of a cysteine protease catalytic domain and an active-site loop. Biochem J 2001; 356:327-34. [PMID: 11368758 PMCID: PMC1221842 DOI: 10.1042/0264-6021:3560327] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/17/2022]
Abstract
Arylamine N-acetyltransferases (EC 2.3.1.5) (NATs) catalyse the biotransformation of many primary arylamines, hydrazines and their N-hydroxylated metabolites, thereby playing an important role in both the detoxification and metabolic activation of numerous xenobiotics. The recently published crystal structure of the Salmonella typhimurium NAT (StNAT) revealed the existence of a cysteine protease-like (Cys-His-Asp) catalytic triad. In the present study, a three-dimensional homology model of human NAT1, based upon the crystal structure of StNAT [Sinclair, Sandy, Delgoda, Sim and Noble (2000) Nat. Struct. Biol. 7, 560-564], is presented. Alignment of StNAT and NAT1, together with secondary structure predictions, has defined a consensus region (residues 29-131) in which 37% of the residues are conserved. Homology modelling provided a good quality model of the corresponding region in human NAT1. The location of the catalytic triad was found to be identical in StNAT and NAT1. Comparison of active-site structural elements revealed that a loop of similar length is conserved in both species (residues 122-131 in the NAT1 model and residues 122-133 in StNAT). This observation may explain the involvement of residues 125, 127 and 129 in human NAT substrate selectivity. Our model, and the fact that cysteine protease inhibitors do not affect the activity of NAT1, suggest that human NATs may have adapted a common catalytic mechanism from cysteine proteases to accommodate it for acetyl-transfer reactions.
Affiliation(s)
- F Rodrigues-Lima
- CNRS-UMR7000, Faculté de Médecine Pitié-Salpêtrière, 105 bd de l'Hôpital, 75013 Paris, France
|
33
|
Bujnicki JM, Elofsson A, Fischer D, Rychlewski L. LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci 2001; 10:352-61. [PMID: 11266621 PMCID: PMC2373940 DOI: 10.1110/ps.40501] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.4] [Indexed: 10/17/2022]
Abstract
We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets--where standard methods such as PSI-BLAST fail--the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An "ideally combined consensus" prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.
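The gain from an "ideally combined consensus" can be sketched as follows. This is an illustration under one assumption: a target counts as solved by the ideal consensus when at least one server gives a correct assignment. The abstract does not spell out the exact definition, and the server names and outcomes below are hypothetical, not LiveBench results.

```python
def fraction_correct(results):
    """Per-server fraction of targets with a correct assignment.

    `results` maps server name -> list of booleans, one per target.
    """
    return {server: sum(hits) / len(hits) for server, hits in results.items()}


def ideal_consensus_fraction(results):
    """Fraction of targets for which at least one server is correct."""
    per_target = zip(*results.values())
    outcomes = [any(target_hits) for target_hits in per_target]
    return sum(outcomes) / len(outcomes)


# Hypothetical outcomes for three servers on six targets (not LiveBench data).
results = {
    "server_a": [True, True, False, False, False, False],
    "server_b": [True, False, True, False, False, False],
    "server_c": [False, True, False, False, True, False],
}
per_server = fraction_correct(results)
consensus = ideal_consensus_fraction(results)
best_single = max(per_server.values())
```

In this toy data each server solves two of six targets, while the ideal consensus solves four, because each server is correct on some target that another server misses. The ideal consensus is an upper bound: it presumes knowing, after the fact, which server to trust for each target.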
Affiliation(s)
- J M Bujnicki
- Bioinformatics Laboratory, International Institute of Molecular and Cell Biology, 02-109 Warsaw, Poland
|
34
|
Iwadate M, Ebisawa K, Umeyama H. Comparative Modeling of CAFASP2 Competition. CHEM-BIO INFORMATICS JOURNAL 2001. [DOI: 10.1273/cbij.1.136] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Affiliation(s)
- Mitsuo Iwadate
- Department of Biomolecular Design, School of Pharmaceutical Sciences, Kitasato University
- Kazuyoshi Ebisawa
- Department of Biomolecular Design, School of Pharmaceutical Sciences, Kitasato University
- Hideaki Umeyama
- Department of Biomolecular Design, School of Pharmaceutical Sciences, Kitasato University
|
35
|
Abstract
The threading approach to protein fold recognition attempts to evaluate how well a query sequence fits into an already-solved fold. 3D-1D threaders rely on matching 1-dimensional strings of 3-dimensional information predicted from the query sequence with corresponding features of the target structure. In many cases this is combined with a sequence comparison. The combination of sequence and structure information has been shown to improve the accuracy of fold recognition, relative to the exclusive use of sequence or structure. In this paper, we review progress made since the introduction of threading methods a decade ago, highlighting recent advances. We focus on two emerging methods that are unconventional 3D-1D threaders: proximity correlation matrices and parallel cascade identification.
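The idea of 3D-1D matching can be shown with a deliberately simplified sketch. It assumes the "1-dimensional strings of 3-dimensional information" are per-residue secondary-structure states, and it scores a query against each fold by ungapped sliding with simple identity counting. Real threaders typically use gapped alignment, additional features and tuned scoring functions; all names and strings below are hypothetical.

```python
def match_score(predicted, observed):
    """Count positions where predicted and observed structural states agree."""
    return sum(1 for p, o in zip(predicted, observed) if p == o)


def best_ungapped_fit(query_states, fold_states):
    """Slide the query along the fold without gaps; return (best score, offset)."""
    n, m = len(query_states), len(fold_states)
    if n > m:
        raise ValueError("query is longer than the fold in this toy example")
    best = (-1, 0)
    for offset in range(m - n + 1):
        score = match_score(query_states, fold_states[offset:offset + n])
        if score > best[0]:
            best = (score, offset)
    return best


def rank_folds(query_states, fold_library):
    """Rank folds by their best ungapped score, highest first."""
    fits = {name: best_ungapped_fit(query_states, states)
            for name, states in fold_library.items()}
    return sorted(fits.items(), key=lambda item: item[1][0], reverse=True)


# Hypothetical strings: H = helix, E = strand, C = coil.
query_states = "HHHHCCEEEE"           # states predicted from the query sequence
fold_library = {
    "fold_1": "CCHHHHCCEEEECC",       # states observed in a solved structure
    "fold_2": "EEEECCEEEECCEE",
}
ranking = rank_folds(query_states, fold_library)
```

Here `fold_1` ranks first because the predicted helix-coil-strand pattern of the query matches it exactly at offset 2. Combining such a structural score with a sequence-similarity term would correspond to the sequence-plus-structure combination the abstract mentions.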
Affiliation(s)
- R David
- Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
|