1
|
Leelananda SP, Kloczkowski A, Jernigan RL. Fold-specific sequence scoring improves protein sequence matching. BMC Bioinformatics 2016; 17:328. [PMID: 27578239 PMCID: PMC5006591 DOI: 10.1186/s12859-016-1198-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2016] [Accepted: 08/24/2016] [Indexed: 11/10/2022] Open
Abstract
Background Sequence matching is extremely important for applications throughout biology, particularly for discovering information such as functional and evolutionary relationships, and also for discriminating between unimportant and disease mutants. At present the functions of a large fraction of genes are unknown; improvements in sequence matching will improve gene annotations. Universal amino acid substitution matrices such as Blosum62 are used to measure sequence similarities and to identify distant homologues, regardless of the structure class. However, such single matrices do not take into account important structural information evident within the different topologies of proteins and treats substitutions within all protein folds identically. Others have suggested that the use of structural information can lead to significant improvements in sequence matching but this has not yet been very effective. Here we develop novel substitution matrices that include not only general sequence information but also have a topology specific component that is unique for each CATH topology. This novel feature of using a combination of sequence and structure information for each protein topology significantly improves the sequence matching scores for the sequence pairs tested. We have used a novel multi-structure alignment method for each homology level of CATH in order to extract topological information. Results We obtain statistically significant improved sequence matching scores for 73 % of the alpha helical test cases. On average, 61 % of the test cases showed improvements in homology detection when structure information was incorporated into the substitution matrices. On average z-scores for homology detection are improved by more than 54 % for all cases, and some individual cases have z-scores more than twice those obtained using generic matrices. Our topology specific similarity matrices also outperform other traditional similarity matrices and single matrix based structure methods. When default amino acid substitution matrix in the Psi-blast algorithm is replaced by our structure-based matrices, the structure matching is significantly improved over conventional Psi-blast. It also outperforms results obtained for the corresponding HMM profiles generated for each topology. Conclusions We show that by incorporating topology-specific structure information in addition to sequence information into specific amino acid substitution matrices, the sequence matching scores and homology detection are significantly improved. Our topology specific similarity matrices outperform other traditional similarity matrices, single matrix based structure methods, also show improvement over conventional Psi-blast and HMM profile based methods in sequence matching. The results support the discriminatory ability of the new amino acid similarity matrices to distinguish between distant homologs and structurally dissimilar pairs. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1198-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Sumudu P Leelananda
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, 112 Office and Lab Building, Ames, IA, 50011-3020, USA.,Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, 112 Office and Lab Building, Ames, IA, 50011-3020, USA.,Present Address: 2120 Newman and Wolfrom Laboratory, The Ohio State University, 100 W 18th Ave, Columbus, OH, 43210, USA.,Present Address: Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, 43205, USA
| | - Andrzej Kloczkowski
- Present Address: Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, 43205, USA.,Present Address: Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, 43205, USA
| | - Robert L Jernigan
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, 112 Office and Lab Building, Ames, IA, 50011-3020, USA. .,Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, 112 Office and Lab Building, Ames, IA, 50011-3020, USA.
| |
Collapse
|
2
|
Tong J, Pei J, Grishin NV. SFESA: a web server for pairwise alignment refinement by secondary structure shifts. BMC Bioinformatics 2015; 16:282. [PMID: 26335387 PMCID: PMC4558796 DOI: 10.1186/s12859-015-0711-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2015] [Accepted: 08/19/2015] [Indexed: 12/01/2022] Open
Abstract
Background Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. Results We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. Conclusions SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
Collapse
Affiliation(s)
- Jing Tong
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
| | - Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
| | - Nick V Grishin
- Department of Biophysics and Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA. .,Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX, 75390-9050, USA.
| |
Collapse
|
3
|
Gniewek P, Kolinski A, Kloczkowski A, Gront D. BioShell-Threading: versatile Monte Carlo package for protein 3D threading. BMC Bioinformatics 2014; 15:22. [PMID: 24444459 PMCID: PMC3937128 DOI: 10.1186/1471-2105-15-22] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2012] [Accepted: 11/18/2013] [Indexed: 11/26/2022] Open
Abstract
Background The comparative modeling approach to protein structure prediction inherently relies on a template structure. Before building a model such a template protein has to be found and aligned with the query sequence. Any error made on this stage may dramatically affects the quality of result. There is a need, therefore, to develop accurate and sensitive alignment protocols. Results BioShell threading software is a versatile tool for aligning protein structures, protein sequences or sequence profiles and query sequences to a template structures. The software is also capable of sub-optimal alignment generation. It can be executed as an application from the UNIX command line, or as a set of Java classes called from a script or a Java application. The implemented Monte Carlo search engine greatly facilitates the development and benchmarking of new alignment scoring schemes even when the functions exhibit non-deterministic polynomial-time complexity. Conclusions Numerical experiments indicate that the new threading application offers template detection abilities and provides much better alignments than other methods. The package along with documentation and examples is available at: http://bioshell.pl/threading3d.
Collapse
Affiliation(s)
| | | | | | - Dominik Gront
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
| |
Collapse
|
4
|
Gront D, Blaszczyk M, Wojciechowski P, Kolinski A. BioShell Threader: protein homology detection based on sequence profiles and secondary structure profiles. Nucleic Acids Res 2012; 40:W257-62. [PMID: 22693216 PMCID: PMC3394251 DOI: 10.1093/nar/gks555] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
The BioShell package has recently been extended with a web server for protein homology detection based on profile-to-profile alignment (known as 1D threading). Its aim is to assign structural templates to each domain of the query. The server uses sequence profiles that describe observed sequence variability and secondary structure profiles providing expected probability for a certain secondary structure type at a given position in a protein. Three independent predictors are used to increase the rate of successful predictions. Careful evaluation shows that there is nearly 80% chance that the query sequence belongs to the same SCOP family as the top scoring template. The Bioshell Threader server is freely available at: http://www.bioshell.pl/threader/.
Collapse
Affiliation(s)
- Dominik Gront
- University of Warsaw, Faculty of Chemistry, Pasteura 1, 02-093 Warsaw, Poland.
| | | | | | | |
Collapse
|