
For: Doolittle RF. Similar amino acid sequences: chance or common ancestry? Science 1981;214:149-59. [PMID: 7280687 DOI: 10.1126/science.7280687] [Citation(s) in RCA: 623] [Impact Index Per Article: 14.2] [Indexed: 01/24/2023]

Number

Cited by Other Article(s)

101

Sitbon E, Pietrokovski S. Occurrence of protein structure elements in conserved sequence regions. BMC Struct Biol 2007;7:3. [PMID: 17210087 PMCID: PMC1781454 DOI: 10.1186/1472-6807-7-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 09/04/2006] [Accepted: 01/09/2007] [Indexed: 11/19/2022]

102

Raghava GPS, Barton GJ. Quantification of the variation in percentage identity for protein sequence alignments. BMC Bioinformatics 2006;7:415. [PMID: 16984632 PMCID: PMC1592310 DOI: 10.1186/1471-2105-7-415] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 12/23/2005] [Accepted: 09/19/2006] [Indexed: 11/26/2022]

Abstract

Background

Percentage Identity (PID) is frequently quoted in discussion of sequence alignments since it appears simple and easy to understand. However, although there are several different ways to calculate percentage identity and each may yield a different result for the same alignment, the method of calculation is rarely reported. Accordingly, quantification of the variation in PID caused by the different calculations would help in interpreting PID values in the literature. In this study, the variation in PID was quantified systematically on a reference set of 1028 alignments generated by comparison of the protein three-dimensional structures. Since the alignment algorithm may also affect the range of PID, this study also considered the effect of algorithm, and the combination of algorithm and PID method.

Results

The maximum variation in PID due to the calculation method was 11.5% while the effect of alignment algorithm on PID was up to 14.6% across three popular alignment methods. The combined effect of alignment algorithm and PID calculation gave a variation of up to 22% on the test data, with an average of 5.3% ± 2.8% for sequence pairs with < 30% identity. In order to see which PID method was most highly correlated with structural similarity, four different PID calculations were compared to similarity scores (Sc) from the comparison of the corresponding protein three-dimensional structures. The highest correlation coefficient for a PID calculation was 0.80. In contrast, the more sophisticated Z-score calculated by reference to randomized sequences gave a correlation coefficient of 0.84.

Conclusion

Although it is well known amongst expert sequence analysts that PID is a poor score for discriminating between protein sequences, the apparent simplicity of the percentage identity score encourages its widespread use in establishing cutoffs for structural similarity. This paper illustrates that not only is PID a poor measure of sequence similarity when compared to the Z-score, but that there is also a large uncertainty in reported PID values. Since better alternatives to PID exist to quantify sequence similarity, these should be quoted where possible in preference to PID. The findings presented here should prove helpful to those new to sequence analysis, and in warning those who seek to interpret the value of a PID reported in the literature.
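The abstract's central point, that one alignment can yield several different PID values, can be illustrated with a minimal Python sketch. The abstract does not list the four calculations used in the paper, so the four denominators below (all alignment columns, aligned residue pairs, shorter sequence length, mean sequence length) are assumptions chosen because they are commonly used; the function name `percent_identities` and the example alignment are hypothetical.

```python
def percent_identities(row_a: str, row_b: str, gap: str = "-") -> dict[str, float]:
    """Return percentage identity under four assumed denominators.

    row_a and row_b are the two rows of a pairwise alignment (equal length,
    gaps written as `gap`). The denominators are illustrative assumptions,
    not necessarily the exact definitions used in the cited paper.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    # Ignore columns where both rows contain a gap.
    columns = [(x, y) for x, y in zip(row_a, row_b) if not (x == gap and y == gap)]

    identical = sum(1 for x, y in columns if x == y and x != gap)
    aligned_pairs = sum(1 for x, y in columns if x != gap and y != gap)
    len_a = sum(1 for x, _ in columns if x != gap)
    len_b = sum(1 for _, y in columns if y != gap)

    denominators = {
        "alignment_columns": len(columns),     # residues plus gap positions
        "aligned_pairs": aligned_pairs,        # residue-residue columns only
        "shorter_sequence": min(len_a, len_b), # length of the shorter sequence
        "mean_length": (len_a + len_b) / 2,    # mean of the two lengths
    }
    return {
        name: (100.0 * identical / d if d else 0.0)
        for name, d in denominators.items()
    }


# Hypothetical toy alignment, for illustration only.
example = percent_identities("ACDEFG-HIK", "ACD--GLHLK")
for name, value in example.items():
    print(f"{name}: {value:.1f}%")
```

For this toy alignment the four denominators give values ranging from 60% to about 86%, which shows why a quoted PID is hard to interpret without the calculation method.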


103

Sidhu A, Yang ZR. Prediction of signal peptides using bio-basis function neural networks and decision trees. Appl Bioinformatics 2006;5:13-9. [PMID: 16539533 DOI: 10.2165/00822942-200605010-00002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/02/2022]

104

Pham TD. LPC Cepstral Distortion Measure for Protein Sequence Comparison. IEEE Trans Nanobioscience 2006;5:83-8. [PMID: 16805103 DOI: 10.1109/tnb.2006.875029] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]

105

Krishna SS, Sadreyev RI, Grishin NV. A tale of two ferredoxins: sequence similarity and structural differences. BMC Struct Biol 2006;6:8. [PMID: 16603087 PMCID: PMC1459171 DOI: 10.1186/1472-6807-6-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 07/19/2005] [Accepted: 04/09/2006] [Indexed: 11/10/2022]

106

Williams TJ, Zhang CL, Scott JH, Bazylinski DA. Evidence for autotrophy via the reverse tricarboxylic acid cycle in the marine magnetotactic coccus strain MC-1. Appl Environ Microbiol 2006;72:1322-9. [PMID: 16461683 PMCID: PMC1392968 DOI: 10.1128/aem.72.2.1322-1329.2006] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.1] [Received: 09/25/2005] [Accepted: 11/30/2005] [Indexed: 11/20/2022]

Abstract

Strain MC-1 is a marine, microaerophilic, magnetite-producing, magnetotactic coccus phylogenetically affiliated with the α-Proteobacteria. Strain MC-1 grew chemolithotrophically with sulfide and thiosulfate as electron donors with HCO3-/CO2 as the sole carbon source. Experiments with cells grown microaerobically in liquid with thiosulfate and H14CO3-/14CO2 showed that all cell carbon was derived from H14CO3-/14CO2 and therefore that MC-1 is capable of chemolithoautotrophy. Cell extracts did not exhibit ribulose-1,5-bisphosphate carboxylase-oxygenase (RubisCO) activity, nor were RubisCO genes found in the draft genome of MC-1. Thus, unlike other chemolithoautotrophic, magnetotactic bacteria, strain MC-1 does not appear to utilize the Calvin-Benson-Bassham cycle for autotrophy. Cell extracts did not exhibit carbon monoxide dehydrogenase activity, indicating that the acetyl-coenzyme A pathway also does not function in strain MC-1. The 13C content of whole cells of MC-1 relative to the 13C content of the inorganic carbon source (Δδ13C) was -11.4‰. Cellular fatty acids showed enrichment of 13C relative to whole cells. Strain MC-1 cell extracts showed activities for several key enzymes of the reverse (reductive) tricarboxylic acid (rTCA) cycle including fumarate reductase, pyruvate:acceptor oxidoreductase and 2-oxoglutarate:acceptor oxidoreductase. Although ATP citrate lyase (another key enzyme of the rTCA cycle) activity was not detected in strain MC-1 using commonly used assays, cell extracts did cleave citrate, and the reaction was dependent upon the presence of ATP and coenzyme A. Thus, we infer the presence of an ATP-dependent citrate-cleaving mechanism. These results are consistent with the operation of the rTCA cycle in MC-1. Strain MC-1 appears to be the first known representative of the α-Proteobacteria to use the rTCA cycle for autotrophy.


107

Li H, Li J, Wong L. Discovering motif pairs at interaction sites from protein sequences on a proteome-wide scale. Bioinformatics 2006;22:989-96. [PMID: 16446278 DOI: 10.1093/bioinformatics/btl020] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]

Abstract

MOTIVATION

Protein-protein interaction, mediated by protein interaction sites, is intrinsic to many functional processes in the cell. In this paper, we propose a novel method to discover patterns in protein interaction sites. We observed from protein interaction networks that there exists a kind of significant substructure called interacting protein group pairs, which exhibit an all-versus-all interaction between the two protein sets in such a pair. The full interaction between the pair indicates a common interaction mechanism shared by the proteins in the pair, which can be referred to as an interaction type. Motif pairs at the interaction sites of the protein group pairs can be used to represent such an interaction type, with each motif derived from the sequences of a protein group by standard motif discovery algorithms. The systematic discovery of all pairs of interacting protein groups from large protein interaction networks is a computationally challenging problem. By a careful and sophisticated problem transformation, the problem is solved using efficient algorithms for mining frequent patterns, a problem extensively studied in data mining.

RESULTS

We found 5349 pairs of interacting protein groups from a yeast interaction dataset. The expected value of sequence identity within the groups is only 7.48%, indicating non-homology within these protein groups. We derived 5343 motif pairs from these group pairs, represented in the form of blocks. Comparing our motifs with domains in the BLOCKS and PRINTS databases, we found that our blocks could be mapped to an average of 3.08 correlated blocks in these two databases. The mapped blocks occur in 4221 out of a total of 6794 domains (protein groups) in these two databases. Comparing our motif pairs with iPfam, consisting of 3045 interacting domain pairs derived from PDB, we found 47 matches occurring in 105 distinct PDB complexes. Comparing with another putative domain interaction database, InterDom, we found 203 matches.

AVAILABILITY

http://research.i2r.a-star.edu.sg/BindingMotifPairs/resources.

SUPPLEMENTARY INFORMATION

http://research.i2r.a-star.edu.sg/BindingMotifPairs and Bioinformatics online.


108

Huang YM, Bystroff C. Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions. Bioinformatics 2005;22:413-22. [PMID: 16352653 DOI: 10.1093/bioinformatics/bti828] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]

109

Kunin V, Goldovsky L, Darzentas N, Ouzounis CA. The net of life: reconstructing the microbial phylogenetic network. Genome Res 2005;15:954-9. [PMID: 15965028 PMCID: PMC1172039 DOI: 10.1101/gr.3666505] [Citation(s) in RCA: 164] [Impact Index Per Article: 8.2] [Indexed: 11/24/2022]

110

Wen ZN, Wang KL, Li ML, Nie FS, Yang Y. Analyzing functional similarity of protein sequences with discrete wavelet transform. Comput Biol Chem 2005;29:220-8. [PMID: 15979042 DOI: 10.1016/j.compbiolchem.2005.04.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 10/09/2004] [Accepted: 04/14/2005] [Indexed: 10/25/2022]

111

Haas BJ, Wortman JR, Ronning CM, Hannick LI, Smith RK, Maiti R, Chan AP, Yu C, Farzad M, Wu D, White O, Town CD. Complete reannotation of the Arabidopsis genome: methods, tools, protocols and the final release. BMC Biol 2005;3:7. [PMID: 15784138 PMCID: PMC1082884 DOI: 10.1186/1741-7007-3-7] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.6] [Received: 11/01/2004] [Accepted: 03/22/2005] [Indexed: 11/29/2022]

112

A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities. BMC Bioinformatics 2005;6:49. [PMID: 15757521 PMCID: PMC555736 DOI: 10.1186/1471-2105-6-49] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 11/25/2004] [Accepted: 03/10/2005] [Indexed: 11/15/2022]

113

Kunin V, Ahren D, Goldovsky L, Janssen P, Ouzounis CA. Measuring genome conservation across taxa: divided strains and united kingdoms. Nucleic Acids Res 2005;33:616-21. [PMID: 15681613 PMCID: PMC548337 DOI: 10.1093/nar/gki181] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]

114

Pirun M, Babnigg G, Stevens FJ. Template-based recognition of protein fold within the midnight and twilight zones of protein sequence similarity. J Mol Recognit 2005;18:203-12. [PMID: 15540237 DOI: 10.1002/jmr.728] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]

Abstract

Most homologous pairs of proteins have no significant sequence similarity to each other and are not identified by direct sequence comparison or profile-based strategies. However, multiple sequence alignments of low similarity homologues typically reveal a limited number of positions that are well conserved despite diversity of function. It may be inferred that conservation at most of these positions is the result of the importance of the contribution of these amino acids to the folding and stability of the protein. As such, these amino acids and their relative positions may define a structural signature. We demonstrate that extraction of this fold template provides the basis for the sequence database to be searched for patterns consistent with the fold, enabling identification of homologs that are not recognized by global sequence analysis. The fold template method was developed to address the need for a tool that could comprehensively search the midnight and twilight zones of protein sequence similarity without reliance on global statistical significance. Manual implementations of the fold template method were performed on three folds--immunoglobulin, c-lectin and TIM barrel. Following proof of concept of the template method, an automated version of the approach was developed. This automated fold template method was used to develop fold templates for 10 of the more populated folds in the SCOP database. The fold template method developed three-dimensional structural motifs or signatures that were able to return a diverse collection of proteins, while maintaining a low false positive rate. Although the results of the manual fold template method were more comprehensive than the automated fold template method, the diversity of the results from the automated fold template method surpassed those of current methods that rely on statistical significance to infer evolutionary relationships among divergent proteins.


115

Stevens FJ. Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: validation. J Mol Recognit 2005;18:139-49. [PMID: 15558595 DOI: 10.1002/jmr.721] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]

116

Hsieh MJ, Luo R. Physical scoring function based on AMBER force field and Poisson-Boltzmann implicit solvent for protein structure prediction. Proteins 2004;56:475-86. [PMID: 15229881 DOI: 10.1002/prot.20133] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Indexed: 11/07/2022]

Abstract

A well-behaved physics-based all-atom scoring function for protein structure prediction is analyzed with several widely used all-atom decoy sets. The scoring function, termed AMBER/Poisson-Boltzmann (PB), is based on a refined AMBER force field for intramolecular interactions and an efficient PB model for solvation interactions. Testing on the chosen decoy sets shows that the scoring function, which is designed to consider detailed chemical environments, is able to consistently discriminate all 62 native crystal structures after considering the heteroatom groups, disulfide bonds, and crystal packing effects that are not included in the decoy structures. When NMR structures are considered in the testing, the scoring function is able to discriminate 8 out of 10 targets. In the more challenging test of selecting near-native structures, the scoring function also performs very well: for the majority of the targets studied, the scoring function is able to select decoys that are close to the corresponding native structures as evaluated by ranking numbers and backbone Cα root mean square deviations. Various important components of the scoring function are also studied to understand their discriminative contributions toward the rankings of native and near-native structures. It is found that neither the nonpolar solvation energy as modeled by the surface area model nor a higher protein dielectric constant improves its discriminative power. The terms remaining to be improved are related to 1-4 interactions. The most troublesome term is found to be the large and highly fluctuating 1-4 electrostatics term, not the dihedral-angle term. These data support ongoing efforts in the community to develop protein structure prediction methods with physics-based potentials that are competitive with knowledge-based potentials.


117

Hall BG. Comparison of the accuracies of several phylogenetic methods using protein and DNA sequences. Mol Biol Evol 2004;22:792-802. [PMID: 15590907 DOI: 10.1093/molbev/msi066] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022]

118

Ghosh P. Process of protein transport by the type III secretion system. Microbiol Mol Biol Rev 2004;68:771-95. [PMID: 15590783 PMCID: PMC539011 DOI: 10.1128/mmbr.68.4.771-795.2004] [Citation(s) in RCA: 305] [Impact Index Per Article: 14.5] [Indexed: 11/20/2022]

119

Söding J. Protein homology detection by HMM-HMM comparison. Bioinformatics 2004;21:951-60. [PMID: 15531603 DOI: 10.1093/bioinformatics/bti125] [Citation(s) in RCA: 1825] [Impact Index Per Article: 86.9] [Indexed: 11/14/2022]

120

Ouyang Z, Zhu H, Wang J, She ZS. Multivariate entropy distance method for prokaryotic gene identification. J Bioinform Comput Biol 2004;2:353-73. [PMID: 15297987 DOI: 10.1142/s0219720004000624] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Revised: 07/10/2003] [Indexed: 11/18/2022]

121

Bazylinski DA, Dean AJ, Williams TJ, Long LK, Middleton SL, Dubbels BL. Chemolithoautotrophy in the marine, magnetotactic bacterial strains MV-1 and MV-2. Arch Microbiol 2004;182:373-87. [PMID: 15338111 DOI: 10.1007/s00203-004-0716-y] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.1] [Received: 04/29/2004] [Revised: 06/14/2004] [Accepted: 07/19/2004] [Indexed: 11/28/2022]

122

Bhaduri A, Pugalenthi G, Gupta N, Sowdhamini R. iMOT: an interactive package for the selection of spatially interacting motifs. Nucleic Acids Res 2004;32:W602-5. [PMID: 15215459 PMCID: PMC441513 DOI: 10.1093/nar/gkh375] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]

123

Sadreyev RI, Grishin NV. Estimates of statistical significance for comparison of individual positions in multiple sequence alignments. BMC Bioinformatics 2004;5:106. [PMID: 15296518 PMCID: PMC516024 DOI: 10.1186/1471-2105-5-106] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 04/01/2004] [Accepted: 08/05/2004] [Indexed: 11/17/2022]

124

May ACW. Percent Sequence Identity. Structure 2004;12:737-8. [PMID: 15130466 DOI: 10.1016/j.str.2004.04.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]

125

Newlove T, Konieczka JH, Cordes MHJ. Secondary Structure Switching in Cro Protein Evolution. Structure 2004;12:569-81. [PMID: 15062080 DOI: 10.1016/j.str.2004.02.024] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Received: 09/29/2003] [Revised: 01/05/2004] [Accepted: 01/05/2004] [Indexed: 11/28/2022]

126

Pandit SB, Bhadra R, Gowri VS, Balaji S, Anand B, Srinivasan N. SUPFAM: a database of sequence superfamilies of protein domains. BMC Bioinformatics 2004;5:28. [PMID: 15113407 PMCID: PMC394316 DOI: 10.1186/1471-2105-5-28] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Received: 12/30/2003] [Accepted: 03/15/2004] [Indexed: 11/10/2022]

Abstract

BACKGROUND

SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated, using profile matching, to result in sequence superfamilies of known structure. Subsequently all-against-all family profile matches are made to deduce a list of new potential superfamilies of yet unknown structure.

DESCRIPTION

The current version of SUPFAM (release 1.4) corresponds to significant enhancements and major developments compared to the earlier and basic version. In the present version we have used RPS-BLAST, which is robust and sensitive, for profile matching. The reliability of connections between protein families is ensured better than before by use of benchmarked criteria involving strict e-value cut-off and a minimal alignment length condition. An e-value based indication of reliability of connections is now presented in the database. Web access to a RPS-BLAST-based tool to associate a query sequence to one of the family profiles in SUPFAM is available with the current release. In terms of the scientific content the present release of SUPFAM is entirely reorganized with the use of 6190 Pfam families and 2317 structural families derived from SCOP. Due to a steep increase in the number of sequence and structural families used in SUPFAM the details of scientific content in the present release are almost entirely complementary to previous basic version. Of the 2286 families, we could relate 245 Pfam families with apparently no structural information to families of known 3-D structures, thus resulting in the identification of new families in the existing superfamilies. Using the profiles of 3904 Pfam families of yet unknown structure, an all-against-all comparison involving sequence-profile match resulted in clustering of 96 Pfam families into 39 new potential superfamilies.

CONCLUSION

SUPFAM presents many non-trivial superfamily relationships of sequence families involved in a variety of functions and hence the information content is of interest to a wide scientific community. The grouping of related proteins without a known structure in SUPFAM is useful in identifying priority targets for structural genomics initiatives and in the assignment of putative functions. Database URL: http://pauling.mbu.iisc.ernet.in/~supfam.


127

Sunyaev SR, Bogopolsky GA, Oleynikova NV, Vlasov PK, Finkelstein AV, Roytberg MA. From analysis of protein structural alignments toward a novel approach to align protein sequences. Proteins 2003;54:569-82. [PMID: 14748004 DOI: 10.1002/prot.10503] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]

128

Enright AJ, Kunin V, Ouzounis CA. Protein families and TRIBES in genome sequence space. Nucleic Acids Res 2003;31:4632-8. [PMID: 12888524 PMCID: PMC169885 DOI: 10.1093/nar/gkg495] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]

129

Hung LH, Samudrala R. PROTINFO: Secondary and tertiary protein structure prediction. Nucleic Acids Res 2003;31:3296-9. [PMID: 12824311 PMCID: PMC168948 DOI: 10.1093/nar/gkg541] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Received: 02/14/2003] [Revised: 03/31/2003] [Accepted: 03/31/2003] [Indexed: 11/14/2022]

130

Simmons MP, Freudenstein JV. The effects of increasing genetic distance on alignment of, and tree construction from, rDNA internal transcribed spacer sequences. Mol Phylogenet Evol 2003;26:444-51. [PMID: 12644403 DOI: 10.1016/s1055-7903(02)00366-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]

131

Sadreyev R, Grishin N. COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J Mol Biol 2003;326:317-36. [PMID: 12547212 DOI: 10.1016/s0022-2836(02)01371-2] [Citation(s) in RCA: 198] [Impact Index Per Article: 9.0] [Indexed: 11/21/2022]

132

Edwards YJK, Cottage A. Bioinformatics methods to predict protein structure and function. A practical approach. Mol Biotechnol 2003;23:139-66. [PMID: 12632698 DOI: 10.1385/mb:23:2:139] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]

133

Koehl P, Levitt M. Sequence variations within protein families are linearly related to structural variations. J Mol Biol 2002;323:551-62. [PMID: 12381308 PMCID: PMC2692051 DOI: 10.1016/s0022-2836(02)00971-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]

134

Samudrala R, Levitt M. A comprehensive analysis of 40 blind protein structure predictions. BMC Struct Biol 2002;2:3. [PMID: 12150712 PMCID: PMC122083 DOI: 10.1186/1472-6807-2-3] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Received: 04/09/2002] [Accepted: 08/01/2002] [Indexed: 11/21/2022]

135

de Trad CH, Fang Q, Cosic I. Protein sequence comparison based on the wavelet transform approach. Protein Eng Des Sel 2002;15:193-203. [PMID: 11932490 DOI: 10.1093/protein/15.3.193] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022]

136

Campos F, Richardson M. The complete amino acid sequence of the α-amylase inhibitor I-2 from seeds of ragi (Indian finger millet, Eleusine coracana Gaertn.). FEBS Lett 2001. [DOI: 10.1016/0014-5793(84)80130-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.8] [Indexed: 10/18/2022]

137

Grishin NV. Fold change in evolution of protein structures. J Struct Biol 2001;134:167-85. [PMID: 11551177 DOI: 10.1006/jsbi.2001.4335] [Citation(s) in RCA: 342] [Impact Index Per Article: 14.3] [Indexed: 11/22/2022]

138

Balaji S, Srinivasan N. Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous proteins. Protein Eng 2001;14:219-26. [PMID: 11391013 DOI: 10.1093/protein/14.4.219] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]

Abstract

The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise) and also simultaneous superposition (multiple) of all the members has been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well and variations are seen only in the low sequence similarity cases often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with Pairwise alignment extracted from Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to vary rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.

139

Reddy BV, Li WW, Shindyalov IN, Bourne PE. Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins. Proteins 2001. [DOI: 10.1002/1097-0134(20010201)42:2<148::aid-prot20>3.0.co;2-r] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]

140

Reddy BV, Li WW, Shindyalov IN, Bourne PE. Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins. Proteins 2001;42:148-63. [PMID: 11119639 DOI: 10.1002/1097-0134(20010201)42:2<148::aid-prot20>3.0.co;2-r] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Abstract

An all-against-all protein structure comparison using the Combinatorial Extension (CE) algorithm applied to a representative set of PDB structures revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html). These substructures represent commonly identified folds, domains, or components thereof. Most of the subsequences forming these similar substructures have no significant sequence similarity. We present a method to identify conserved amino acid positions and residue-dependent property clusters within these subsequences starting with structure alignments. Each of the subsequences is aligned to its homologues in SWALL, a nonredundant protein sequence database. The most similar sequences are purged into a common frequency matrix, and weighted homologues of each one of the subsequences are used in scoring for conserved key amino acid positions (CKAAPs). We have set the top 20% of the high-scoring positions in each substructure to be CKAAPs. It is hypothesized that CKAAPs may be responsible for the common folding patterns in either a local or global view of the protein-folding pathway. Where a significant number of structures exist, CKAAPs have also been identified in structure alignments of complete polypeptide chains from the same protein family or superfamily. Evidence to support the presence of CKAAPs comes from other computational approaches and experimental studies of mutation and protein-folding experiments, notably the Paracelsus challenge. Finally, the structural environment of CKAAPs versus non-CKAAPs is examined for solvent accessibility, hydrogen bonding, and secondary structure. The identification of CKAAPs has important implications for protein engineering, fold recognition, modeling, and structure prediction studies and is dependent on the availability of structures and an accurate structure alignment methodology. Proteins 2001;42:148-163.
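The selection step in the abstract above (keeping the top 20% of high-scoring positions in each substructure as CKAAPs) can be illustrated with a minimal Python sketch. The abstract does not say how the per-position scores are computed, how ties are broken, or how the 20% cutoff is rounded, so the function name `select_ckaaps`, the rounding up, the tie-break by position, and the example scores are all assumptions made for illustration, not the authors' implementation.

```python
import math


def select_ckaaps(scores, fraction=0.20):
    """Return the 0-based indices of the top-scoring positions, in sequence order.

    scores   -- one conservation score per alignment position (hypothetical values)
    fraction -- share of positions to keep; 0.20 follows the abstract's "top 20%"

    Assumptions: the cutoff is rounded up so at least one position is kept,
    and ties are broken in favour of the earlier position.
    """
    if not scores:
        return []
    n_keep = max(1, math.ceil(fraction * len(scores)))
    # Rank positions by descending score, then by ascending index for ties.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:n_keep])


# Made-up scores for a 10-residue substructure.
example_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.0]
print(select_ckaaps(example_scores))  # two positions kept: [1, 3]
```

Returning the indices in sequence order rather than score order keeps the result easy to map back onto the aligned substructure.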

141

Grishin NV. KH domain: one motif, two folds. Nucleic Acids Res 2001;29:638-43. [PMID: 11160884 PMCID: PMC30387 DOI: 10.1093/nar/29.3.638] [Citation(s) in RCA: 241] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2000] [Revised: 12/01/2000] [Accepted: 12/01/2000] [Indexed: 11/14/2022] Open

142

Balaji S, Sujatha S, Kumar SS, Srinivasan N. PALI-a database of Phylogeny and ALIgnment of homologous protein structures. Nucleic Acids Res 2001;29:61-5. [PMID: 11125050 PMCID: PMC29825 DOI: 10.1093/nar/29.1.61] [Citation(s) in RCA: 65] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2000] [Revised: 10/25/2000] [Accepted: 10/25/2000] [Indexed: 11/13/2022] Open

Abstract

PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 'orphans' (single member families). Using the web interface involving PSI-BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.

143

Chiu TL, Goldstein RA. How to generate improved potentials for protein tertiary structure prediction: a lattice model study. Proteins 2000;41:157-63. [PMID: 10966569 DOI: 10.1002/1097-0134(20001101)41:2<157::aid-prot10>3.0.co;2-w] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

144

Villar HO, Koehler RT. Amino acid preferences of small, naturally occurring polypeptides. Biopolymers 2000;53:226-32. [PMID: 10679627 DOI: 10.1002/(sici)1097-0282(200003)53:3<226::aid-bip2>3.0.co;2-#] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

145

Thomas MC, García-Pérez JL, Alonso C, López MC. Molecular characterization of KMP11 from Trypanosoma cruzi: a cytoskeleton-associated protein regulated at the translational level. DNA Cell Biol 2000;19:47-57. [PMID: 10668791 DOI: 10.1089/104454900314708] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

146

Chopra S, Brendel V, Zhang J, Axtell JD, Peterson T. Molecular characterization of a mutable pigmentation phenotype and isolation of the first active transposable element from Sorghum bicolor. Proc Natl Acad Sci U S A 1999;96:15330-5. [PMID: 10611384 PMCID: PMC24819 DOI: 10.1073/pnas.96.26.15330] [Citation(s) in RCA: 87] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

147

Desiere F, Lucchini S, Brüssow H. Comparative sequence analysis of the DNA packaging, head, and tail morphogenesis modules in the temperate cos-site Streptococcus thermophilus bacteriophage Sfi21. Virology 1999;260:244-53. [PMID: 10417259 DOI: 10.1006/viro.1999.9830] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

148

Fraternali F, Pastore A. Modularity and homology: modelling of the type II module family from titin. J Mol Biol 1999;290:581-93. [PMID: 10390355 DOI: 10.1006/jmbi.1999.2876] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

149

Thompson JD, Plewniak F, Poch O. A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res 1999;27:2682-90. [PMID: 10373585 PMCID: PMC148477 DOI: 10.1093/nar/27.13.2682] [Citation(s) in RCA: 387] [Impact Index Per Article: 14.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

150

Benvenga S, Alesci S, Trimarchi F, Facchiano A. Homologies of the thyroid sodium-iodide symporter with bacterial and viral proteins. J Endocrinol Invest 1999;22:535-40. [PMID: 10475151 DOI: 10.1007/bf03343605] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]