Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Karplus K, Barrett C, Cline M, Diekhans M, Grate L, Hughey R. Predicting protein structure using only sequence information. Proteins 1999;Suppl 3:121-5. [PMID: 10526360 DOI: 10.1002/(sici)1097-0134(1999)37:3+<121::aid-prot16>3.3.co;2-h] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Cited by Other Article(s)

Kanto S, Grynberg M, Kaneko Y, Fujita J, Satake M. A variant of Runx2 that differs from the bone isoform in its splicing is expressed in spermatogenic cells. PeerJ 2016;4:e1862. [PMID: 27069802 PMCID: PMC4824880 DOI: 10.7717/peerj.1862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2016] [Accepted: 03/09/2016] [Indexed: 11/20/2022] Open

Abstract

Background. Members of the Runx gene family encode transcription factors that bind to DNA in a sequence-specific manner. Among the three Runx proteins, Runx2 comprises 607 amino acid (aa) residues, is expressed in bone, and plays crucial roles in osteoblast differentiation and bone development. We examined whether the Runx2 gene is also expressed in testes.

Methods. Murine testes from 1-, 2-, 3-, 4-, and 10-week-old male mice of the C57BL/6J strain and W∕W^v strain were used throughout the study. Northern blot analyses were performed using extracts from the murine testes. Sequencing of cDNA clones and 5′-rapid amplification of cDNA ends were performed to determine the full length of the transcripts, which revealed that the testicular Runx2 transcript encodes a novel protein of 106 aa residues. An antiserum was generated using the amino-terminal 15 aa of Runx2 (Met¹ to Gly¹⁵) as an antigen, and immunoblot analyses were performed to detect the predicted polypeptide of 106 aa residues with the initiating Met¹. With the affinity-purified anti-Runx2 antibody, immunohistochemical analyses were performed to elucidate the localization of the protein. Furthermore, bioinformatic analyses were performed to predict the function of the protein.

Results. A Runx2 transcript was detected in testes and was specifically expressed in germ cells. Determination of the transcript structure indicated that the testicular Runx2 is a splice isoform. The predicted testicular Runx2 polypeptide is composed of only 106 aa residues, lacks a Runt domain, and appears to be a basic protein with a predominantly alpha-helical conformation. Immunoblot analyses with an anti-Runx2 antibody revealed that Met¹ in the deduced open reading frame of Runx2 is used as the initiation codon to express an 11 kDa protein. Furthermore, immunohistochemical analyses revealed that the Runx2 polypeptide was located in the nuclei, and was detected in spermatocytes at the stages of late pachytene, diplotene and second meiotic cells as well as in round spermatids. Bioinformatic analyses suggested that the testicular Runx2 is a histone-like protein.

Discussion. A variant of Runx2 that differs from the bone isoform in its splicing is expressed in pachytene spermatocytes and round spermatids in testes, and encodes a histone-like, nuclear protein of 106 aa residues. Considering its nuclear localization and differentiation stage-dependent expression, Runx2 may function as a chromatin-remodeling factor during spermatogenesis. We thus conclude that a single Runx2 gene can encode two different types of nuclear proteins, a previously defined transcription factor in bone and cartilage and a short testicular variant that lacks a Runt domain.

Tong J, Sadreyev RI, Pei J, Kinch LN, Grishin NV. Using homology relations within a database markedly boosts protein sequence similarity search. Proc Natl Acad Sci U S A 2015;112:7003-8. [PMID: 26038555 PMCID: PMC4460465 DOI: 10.1073/pnas.1424324112] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] Open

Yaseen A, Li Y. Template-based C8-SCORPION: a protein 8-state secondary structure prediction method using structural information and context-based features. BMC Bioinformatics 2014;15 Suppl 8:S3. [PMID: 25080939 PMCID: PMC4120151 DOI: 10.1186/1471-2105-15-s8-s3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022] Open

Tataru P, Sand A, Hobolth A, Mailund T, Pedersen CNS. Algorithms for hidden Markov models restricted to occurrences of regular expressions. Biology (Basel) 2013;2:1282-95. [PMID: 24833225 PMCID: PMC4009796 DOI: 10.3390/biology2041282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/28/2013] [Revised: 10/08/2013] [Accepted: 11/05/2013] [Indexed: 11/24/2022]

Conotoxin protein classification using free scores of words and support vector machines. BMC Bioinformatics 2011;12:217. [PMID: 21619696 PMCID: PMC3133552 DOI: 10.1186/1471-2105-12-217] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 12/27/2010] [Accepted: 05/29/2011] [Indexed: 11/23/2022] Open

Arenas NE, Salazar LM, Soto CY, Vizcaíno C, Patarroyo ME, Patarroyo MA, Gómez A. Molecular modeling and in silico characterization of Mycobacterium tuberculosis TlyA: possible misannotation of this tubercle bacilli-hemolysin. BMC Struct Biol 2011;11:16. [PMID: 21443791 PMCID: PMC3072309 DOI: 10.1186/1472-6807-11-16] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 12/30/2010] [Accepted: 03/28/2011] [Indexed: 11/24/2022]

Abstract

Background

The TlyA protein has a controversial function as a virulence factor in Mycobacterium tuberculosis (M. tuberculosis). At present, its dual activity as hemolysin and RNA methyltransferase in M. tuberculosis has been indirectly proposed based on in vitro results. There is no evidence however for TlyA relevance in the survival of tubercle bacilli inside host cells or whether both activities are functionally linked. A thorough analysis of structure prediction for this mycobacterial protein in this study shows the need for reevaluating TlyA's function in virulence.

Results

Bioinformatics analysis of TlyA identified a ribosomal protein binding domain (S4 domain) located between residues 5 and 68, as well as an FtsJ-like methyltransferase domain spanning residues 62 to 247, both of which have been previously described in translation machinery-associated proteins. Subcellular localization prediction showed that TlyA lacks a signal peptide, and its hydrophobicity profile showed no evidence of transmembrane helices. These findings suggested that it may not be attached to the membrane, which is consistent with a cytoplasmic localization. Three-dimensional modeling of TlyA showed a consensus structure with a common core formed by a six-stranded β-sheet between two α-helix layers, which is consistent with an RNA methyltransferase structure. Phylogenetic analyses showed high conservation of the tlyA gene among Mycobacterium species. Additionally, the nucleotide substitution rates suggested purifying selection during tlyA gene evolution and the absence of a common ancestor between TlyA proteins and bacterial pore-forming proteins.

Conclusion

Altogether, our manual in silico curation suggested that TlyA is involved in ribosomal biogenesis and that there is a functional annotation error regarding this protein family in several microbial and plant genomes, including the M. tuberculosis genome.

Kalkhof S, Haehn S, Paulsson M, Smyth N, Meiler J, Sinz A. Computational modeling of laminin N-terminal domains using sparse distance constraints from disulfide bonds and chemical cross-linking. Proteins 2010;78:3409-27. [PMID: 20939100 PMCID: PMC5079110 DOI: 10.1002/prot.22848] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 04/23/2010] [Revised: 07/16/2010] [Accepted: 07/25/2010] [Indexed: 11/10/2022]

Kountouris P, Hirst JD. Predicting beta-turns and their types using predicted backbone dihedral angles and secondary structures. BMC Bioinformatics 2010;11:407. [PMID: 20673368 PMCID: PMC2920885 DOI: 10.1186/1471-2105-11-407] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 05/12/2010] [Accepted: 07/31/2010] [Indexed: 11/29/2022] Open

Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010;5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open

Abstract

BACKGROUND

Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases.

METHODOLOGY/PRINCIPAL FINDINGS

We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile hidden Markov model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed.

CONCLUSIONS/SIGNIFICANCE

For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.

Kountouris P, Hirst JD. Prediction of backbone dihedral angles and protein secondary structure using support vector machines. BMC Bioinformatics 2009;10:437. [PMID: 20025785 PMCID: PMC2811710 DOI: 10.1186/1471-2105-10-437] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 07/09/2009] [Accepted: 12/22/2009] [Indexed: 11/26/2022] Open

Green JR, Korenberg MJ, Aboul-Magd MO. PCI-SS: MISO dynamic nonlinear protein secondary structure prediction. BMC Bioinformatics 2009;10:222. [PMID: 19615046 PMCID: PMC2720391 DOI: 10.1186/1471-2105-10-222] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 11/19/2008] [Accepted: 07/17/2009] [Indexed: 11/10/2022] Open

Abstract

Background

Since the function of a protein is largely dictated by its three dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification.

Results

Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at . In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting structure prediction in a machine-readable format. To our knowledge, this represents the only publicly available SOAP-interface for a protein secondary structure prediction service with published WSDL interface definition.

Conclusion

Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds its greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall error rate (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
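
To make the evaluation terms in this abstract concrete, the following minimal Python sketch computes a Q3 score as the fraction of residues whose three-state label is predicted correctly, and shows how a three-state labelling can be split into binary targets, one per state. The function names and the H/E/C alphabet are assumptions chosen for illustration; this is not the PCI-SS implementation.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def one_vs_rest_labels(observed: str, state: str) -> list:
    """Binary targets for one sub-problem: 1 where the residue is in `state`, else 0."""
    return [1 if s == state else 0 for s in observed]


# Hypothetical 4-residue example using H (helix), E (strand), C (other).
score = q3_accuracy("HHEC", "HHCC")
helix_targets = one_vs_rest_labels("HHCC", "H")
```

Each of the three binary sub-problems would receive its own target list (for H, E, and C in this assumed alphabet), and the outputs of the binary classifiers would then need to be combined into a single three-state call.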

Karplus K. SAM-T08, HMM-based protein structure prediction. Nucleic Acids Res 2009;37:W492-7. [PMID: 19483096 PMCID: PMC2703928 DOI: 10.1093/nar/gkp403] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 11/29/2022] Open

Wang Y, Sadreyev RI, Grishin NV. PROCAIN: protein profile comparison with assisting information. Nucleic Acids Res 2009;37:3522-30. [PMID: 19357092 PMCID: PMC2699500 DOI: 10.1093/nar/gkp212] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open

Katzman S, Barrett C, Thiltgen G, Karchin R, Karplus K. PREDICT-2ND: a tool for generalized protein local structure prediction. Bioinformatics 2008;24:2453-9. [PMID: 18757875 DOI: 10.1093/bioinformatics/btn438] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]

Abstract

MOTIVATION

Predictions of protein local structure, derived from sequence alignment information alone, provide visualization tools for biologists to evaluate the importance of amino acid residue positions of interest in the absence of X-ray crystal/NMR structures or homology models. They are also useful as inputs to sequence analysis and modeling tools, such as hidden Markov models (HMMs), which can be used to search for homology in databases of known protein structure. In addition, local structure predictions can be used as a component of cost functions in genetic algorithms that predict protein tertiary structure. We have developed a program (predict-2nd) that trains multilayer neural networks and have applied it to numerous local structure alphabets, tuning network parameters such as the number of layers, the number of units in each layer and the window sizes of each layer. We have had the most success with four-layer networks, with gradually increasing window sizes at each layer.

RESULTS

Because the four-layer neural nets occasionally get trapped in poor local optima, our training protocol now uses many different random starts, with short training runs, followed by more training on the best performing networks from the short runs. One recent addition to the program is the option to add a guide sequence to the profile inputs, increasing the number of inputs per position by 20. We find that use of a guide sequence provides a small but consistent improvement in the predictions for several different local-structure alphabets.

AVAILABILITY

Local structure prediction with the methods described here is available for use online at http://www.soe.ucsc.edu/compbio/SAM_T08/T08-query.html. The source code and example networks for PREDICT-2ND are available at http://www.soe.ucsc.edu/~karplus/predict-2nd/ A required C++ library is available at http://www.soe.ucsc.edu/~karplus/ultimate/

Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection. BMC Bioinformatics 2008;9:298. [PMID: 18590572 PMCID: PMC2459191 DOI: 10.1186/1471-2105-9-298] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 01/08/2008] [Accepted: 07/01/2008] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments improves performance of fold recognition and remote homolog detection remarkably. However, it is not clear which parts of sequences are essential for the performance improvement.

RESULTS

The performance of fold recognition and remote homolog detection using NMF features is compared to that of the unmodified profile-profile alignment (PPA) features by estimating Receiver Operating Characteristic (ROC) scores. The overall performance is noticeably improved. For fold recognition at the fold level, SVM with NMF features recognizes 30% of homologous proteins at > 0.99 ROC scores, while the original PPA features, HHsearch, and PSI-BLAST recognize almost none. For detecting remote homologs that are related at the superfamily level, NMF features also achieve higher performance than the original PPA features. At > 0.90 ROC50 scores, 25% of proteins correctly detect remotely related proteins with NMF features, whereas only 1% of proteins detect remote homologs with the original PPA features. In addition, we investigate the effect of the number of positive training examples and the number of basis vectors on performance improvement. We also analyze the ability of NMF to extract essential features by comparing NMF basis vectors with functionally important sites and structurally conserved regions of proteins. The results show that NMF basis vectors have significant overlap with functional sites from PROSITE and with structurally conserved regions from the multiple structural alignments generated by MUSTANG. The correlation between NMF basis vectors and biologically essential parts of proteins supports our conjecture that NMF basis vectors can explicitly represent important sites of proteins.

CONCLUSION

The present work demonstrates that applying NMF to profile-profile alignments can reveal essential features of proteins and that these features significantly improve the performance of fold recognition and remote homolog detection.

Singh A, Kushwaha HR, Sharma P. Molecular modelling and comparative structural account of aspartyl beta-semialdehyde dehydrogenase of Mycobacterium tuberculosis (H37Rv). J Mol Model 2008;14:249-63. [PMID: 18236087 DOI: 10.1007/s00894-008-0267-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/09/2007] [Accepted: 01/03/2008] [Indexed: 11/29/2022]

Yao XQ, Zhu H, She ZS. A dynamic Bayesian network approach to protein secondary structure prediction. BMC Bioinformatics 2008;9:49. [PMID: 18218144 PMCID: PMC2266706 DOI: 10.1186/1471-2105-9-49] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 09/05/2007] [Accepted: 01/25/2008] [Indexed: 11/19/2022] Open

Abstract

Background

Protein secondary structure prediction methods based on probabilistic models such as the hidden Markov model (HMM) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).

Results

In this paper, we report a new probabilistic method for protein secondary structure prediction, based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy compared to other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q₃ accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that a significant improvement in prediction accuracy is achieved when it is combined with other methods by a simple consensus.

Conclusion

The DBN method, using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues, produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different nature, the DBN and NN combine to form a more accurate method, DBNN. Future improvement may be achieved by combining DBNN with an SVM-type method.
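
The "simple consensus" mentioned in this abstract is not specified further here. One common reading is a per-residue majority vote over the outputs of several predictors, which the following minimal Python sketch illustrates. The function name, the tie-breaking rule (the state seen first among the tied ones wins), and the H/E/C labels are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter


def majority_consensus(predictions: list) -> str:
    """Per-position majority vote over equal-length secondary structure strings."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        # most_common(1) returns the most frequent state; among ties it
        # returns the state encountered first, i.e. from the earliest predictor.
        state, _count = Counter(column).most_common(1)[0]
        consensus.append(state)
    return "".join(consensus)


# Hypothetical outputs from three predictors for a 4-residue chain.
combined = majority_consensus(["HHEC", "HEEC", "CHEC"])
```

Listing the most trusted predictor first makes its call the tie-breaker under this assumed rule.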

Sadreyev RI, Tang M, Kim BH, Grishin NV. COMPASS server for remote homology inference. Nucleic Acids Res 2007;35:W653-8. [PMID: 17517780 PMCID: PMC1933213 DOI: 10.1093/nar/gkm293] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open

Saha RP, Chakrabarti P. Molecular modeling and characterization of Vibrio cholerae transcription regulator HlyU. BMC Struct Biol 2006;6:24. [PMID: 17116251 PMCID: PMC1665450 DOI: 10.1186/1472-6807-6-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 08/17/2006] [Accepted: 11/20/2006] [Indexed: 11/15/2022]

Ku CJ, Yona G. The distance-profile representation and its application to detection of distantly related protein families. BMC Bioinformatics 2005;6:282. [PMID: 16316461 PMCID: PMC1345692 DOI: 10.1186/1471-2105-6-282] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/25/2005] [Accepted: 11/29/2005] [Indexed: 11/11/2022] Open

Choo KH, Tong JC, Zhang L. Recent applications of Hidden Markov Models in computational biology. Genomics Proteomics Bioinformatics 2005;2:84-96. [PMID: 15629048 PMCID: PMC5172443 DOI: 10.1016/s1672-0229(04)02014-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/17/2022]

Nakai S, Li-Chan ECY, Dou J. Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins. BMC Biochem 2005;6:9. [PMID: 15904486 PMCID: PMC1173080 DOI: 10.1186/1471-2091-6-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/08/2004] [Accepted: 05/18/2005] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families.

RESULTS

Hydrophobicity and beta-turn propensity of reference segments with 3-7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme.

CONCLUSION

Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available.
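
The idea of averaging a side-chain property over short segments can be sketched as a sliding-window mean along the sequence. The Python example below is a minimal illustration only: it assumes the Kyte-Doolittle hydropathy scale as the property and a hypothetical function name, neither of which is stated in this abstract, and it does not reproduce the similarity-constant part of the authors' homology similarity search.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of property scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_averages(sequence: str, window: int = 3) -> list:
    """Mean property value of every contiguous segment of `window` residues."""
    if window < 1:
        raise ValueError("window must be at least 1")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    # One average per segment start; empty if the sequence is shorter than the window.
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Hypothetical 7-residue peptide scanned with 3-residue segments.
profile = window_averages("ACDEFGH", window=3)
```

Plotting such a profile against sequence position, for window sizes in the 3 to 7 range mentioned above, gives the kind of position-dependent curve from which candidate sites could be read off.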

Ginalski K, Grishin NV, Godzik A, Rychlewski L. Practical lessons from protein structure prediction. Nucleic Acids Res 2005;33:1874-91. [PMID: 15805122 PMCID: PMC1074308 DOI: 10.1093/nar/gki327] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022] Open

Wu KP, Lin HN, Chang JM, Sung TY, Hsu WL. HYPROSP: a hybrid protein secondary structure prediction algorithm--a knowledge-based approach. Nucleic Acids Res 2004;32:5059-65. [PMID: 15448186 PMCID: PMC521652 DOI: 10.1093/nar/gkh836] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open

Sadreyev RI, Baker D, Grishin NV. Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci 2004;12:2262-72. [PMID: 14500884 PMCID: PMC2366929 DOI: 10.1110/ps.03197403] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]

Zhang DQ, Liu B, Feng DR, He YM, Wang SQ, Wang HB, Wang JF. Significance of conservative asparagine residues in the thermal hysteresis activity of carrot antifreeze protein. Biochem J 2004;377:589-95. [PMID: 14531728 PMCID: PMC1223888 DOI: 10.1042/bj20031249] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Received: 08/15/2003] [Revised: 10/06/2003] [Accepted: 10/08/2003] [Indexed: 11/17/2022]

Koh IYY, Eyrich VA, Marti-Renom MA, Przybylski D, Madhusudhan MS, Eswar N, Graña O, Pazos F, Valencia A, Sali A, Rost B. EVA: Evaluation of protein structure prediction servers. Nucleic Acids Res 2003;31:3311-5. [PMID: 12824315 PMCID: PMC169025 DOI: 10.1093/nar/gkg619] [Citation(s) in RCA: 134] [Impact Index Per Article: 6.4] [Indexed: 11/14/2022] Open

Mallick P, Weiss R, Eisenberg D. The directional atomic solvation energy: an atom-based potential for the assignment of protein sequences to known folds. Proc Natl Acad Sci U S A 2002;99:16041-6. [PMID: 12461172 PMCID: PMC138561 DOI: 10.1073/pnas.252626399] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open

Samudrala R, Levitt M. A comprehensive analysis of 40 blind protein structure predictions. BMC Struct Biol 2002;2:3. [PMID: 12150712 PMCID: PMC122083 DOI: 10.1186/1472-6807-2-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/09/2002] [Accepted: 08/01/2002] [Indexed: 11/21/2022]

Koretke KK, Russell RB, Lupas AN. Fold recognition without folds. Protein Sci 2002;11:1575-9. [PMID: 12021456 PMCID: PMC2373620 DOI: 10.1110/ps.3590102] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]

MacGregor EA. Possible structure and active site residues of starch, glycogen, and sucrose synthases. J Protein Chem 2002;21:297-306. [PMID: 12168700 DOI: 10.1023/a:1019701621256] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Bujnicki JM, Rychlewski L. RNA:(guanine-N2) methyltransferases RsmC/RsmD and their homologs revisited--bioinformatic analysis and prediction of the active site based on the uncharacterized Mj0882 protein structure. BMC Bioinformatics 2002;3:10. [PMID: 11929612 PMCID: PMC102759 DOI: 10.1186/1471-2105-3-10] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2001] [Accepted: 04/03/2002] [Indexed: 01/01/2023] Open

Bonneau R, Baker D. Ab initio protein structure prediction: progress and prospects. Annu Rev Biophys Biomol Struct 2001;30:173-89. [PMID: 11340057 DOI: 10.1146/annurev.biophys.30.1.173] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Rognes T. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches. Nucleic Acids Res 2001;29:1647-52. [PMID: 11266569 PMCID: PMC31274 DOI: 10.1093/nar/29.7.1647] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Bujnicki JM, Elofsson A, Fischer D, Rychlewski L. LiveBench-1: continuous benchmarking of protein structure prediction servers. Protein Sci 2001;10:352-61. [PMID: 11266621 PMCID: PMC2373940 DOI: 10.1110/ps.40501] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]

Abstract

We present a novel, continuous approach aimed at the large-scale assessment of the performance of available fold-recognition servers. Six popular servers were investigated: PDB-Blast, FFAS, T98-lib, GenTHREADER, 3D-PSSM, and INBGU. The assessment was conducted using as prediction targets a large number of selected protein structures released from October 1999 to April 2000. A target was selected if its sequence showed no significant similarity to any of the proteins previously available in the structural database. Overall, the servers were able to produce structurally similar models for one-half of the targets, but significantly accurate sequence-structure alignments were produced for only one-third of the targets. We further classified the targets into two sets: easy and hard. We found that all servers were able to find the correct answer for the vast majority of the easy targets if a structurally similar fold was present in the server's fold libraries. However, among the hard targets--where standard methods such as PSI-BLAST fail--the most sensitive fold-recognition servers were able to produce similar models for only 40% of the cases, half of which had a significantly accurate sequence-structure alignment. Among the hard targets, the presence of updated libraries appeared to be less critical for the ranking. An "ideally combined consensus" prediction, where the results of all servers are considered, would increase the percentage of correct assignments by 50%. Each server had a number of cases with a correct assignment, where the assignments of all the other servers were wrong. This emphasizes the benefits of considering more than one server in difficult prediction tasks. The LiveBench program (http://BioInfo.PL/LiveBench) is being continued, and all interested developers are cordially invited to join.

Jaroszewski L, Rychlewski L, Godzik A. Improving the quality of twilight-zone alignments. Protein Sci 2000;9:1487-96. [PMID: 10975570 PMCID: PMC2144727 DOI: 10.1110/ps.9.8.1487] [Citation(s) in RCA: 99] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Abstract

Several recent publications illustrated advantages of using sequence profiles in recognizing distant homologies between proteins. At the same time, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm, but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study this question for several supersensitive protein algorithms that were previously compared in their recognition sensitivity (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several different methods, which included sequence-sequence, sequence-profile, and profile-profile alignment methods. We show that incorporation of evolutionary information encoded in sequence profiles into alignment calculation methods significantly increases the alignment accuracy, bringing them closer to the alignments obtained from structure comparison. In general, alignment quality is correlated with recognition and alignment score significance. For every alignment method, alignments with statistically significant scores correlate with both correct structural templates and good quality alignments. At the same time, average alignment lengths differ in various methods, making the comparison between them difficult. For instance, the alignments obtained by FFAS, the profile-profile alignment algorithm developed in our group, are always longer than the alignments obtained with the PSI-BLAST algorithms. To address this problem, we develop methods to truncate or extend alignments to cover a specified percentage of protein lengths. In most cases, the elongation of the alignment by profile-profile methods is reasonable, adding fragments of similar structure. The examples of erroneous alignment are examined and it is shown that they can be identified based on the model quality.
