Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Fox G, Sievers F, Higgins DG. Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments. ACTA ACUST UNITED AC 2015;32:814-20. [PMID: 26568625 PMCID: PMC5939968 DOI: 10.1093/bioinformatics/btv592] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2015] [Accepted: 10/10/2015] [Indexed: 01/03/2023]

For:	Fox G, Sievers F, Higgins DG. Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments. ACTA ACUST UNITED AC 2015;32:814-20. [PMID: 26568625 PMCID: PMC5939968 DOI: 10.1093/bioinformatics/btv592] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2015] [Accepted: 10/10/2015] [Indexed: 01/03/2023]

Number

Cited by Other Article(s)

Shrestha B, Adhikari B. Scoring protein sequence alignments using deep Learning. Bioinformatics 2022;38:2988-2995. [PMID: 35385080 DOI: 10.1093/bioinformatics/btac210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2021] [Revised: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022] Open

Lladós J, Cores F, Guirado F, Lérida JL. Accurate consistency-based MSA reducing the memory footprint. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021;208:106237. [PMID: 34198017 DOI: 10.1016/j.cmpb.2021.106237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/13/2020] [Accepted: 06/08/2021] [Indexed: 06/13/2023]

Wright ES. RNAconTest: comparing tools for noncoding RNA multiple sequence alignment based on structural consistency. RNA (NEW YORK, N.Y.) 2020;26:531-540. [PMID: 32005745 PMCID: PMC7161358 DOI: 10.1261/rna.073015.119] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Accepted: 01/28/2020] [Indexed: 05/05/2023]

Katoh K, Rozewicki J, Yamada KD. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 2020;20:1160-1166. [PMID: 28968734 PMCID: PMC6781576 DOI: 10.1093/bib/bbx108] [Citation(s) in RCA: 4439] [Impact Index Per Article: 887.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Revised: 07/27/2017] [Indexed: 11/28/2022] Open

Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020;18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 132] [Impact Index Per Article: 26.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open

Fukuda H, Tomii K. DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment. BMC Bioinformatics 2020;21:10. [PMID: 31918654 PMCID: PMC6953294 DOI: 10.1186/s12859-019-3190-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Accepted: 11/04/2019] [Indexed: 12/30/2022] Open

Abstract

Background

Recently developed methods of protein contact prediction, a crucially important step for protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating to an increasing degree such that abundant sequences to construct an MSA of a target protein are readily obtainable. Nevertheless, many cases present different ends of the number of sequences that can be included in an MSA used for contact prediction. The abundant sequences might degrade prediction results, but opportunities remain for a limited number of sequences to construct an MSA. To resolve these persistent issues, we strove to develop a novel framework using DNNs in an end-to-end manner for contact prediction.

Results

We developed neural network models to improve precision of both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to sequences in a deep MSA. Moreover, for shallow MSAs, adding a few sequential features was useful to increase the prediction accuracy of long-range contacts in our model. Based on these models, we expanded our model to a multi-task model to achieve higher accuracy by incorporating predictions of secondary structures and solvent-accessible surface areas. Moreover, we demonstrated that ensemble averaging of our models can raise accuracy. Using past CASP target protein domains, we tested our models and demonstrated that our final model is superior to or equivalent to existing meta-predictors.

Conclusions

The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences have become accessible, including metagenomic sequences, which might degrade contact prediction results. Under such circumstances, our model can provide a means to reduce noise automatically. According to results of tertiary structure prediction based on contacts and secondary structures predicted by our model, more accurate three-dimensional models of a target protein are obtainable than those from existing ECA methods, starting from its MSA. DeepECA is available from https://github.com/tomiilab/DeepECA.

Collapse

Modi V, Dunbrack RL. A Structurally-Validated Multiple Sequence Alignment of 497 Human Protein Kinase Domains. Sci Rep 2019;9:19790. [PMID: 31875044 PMCID: PMC6930252 DOI: 10.1038/s41598-019-56499-4] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2019] [Accepted: 11/14/2019] [Indexed: 12/21/2022] Open

Nakamura T, Yamada KD, Tomii K, Katoh K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 2019;34:2490-2492. [PMID: 29506019 PMCID: PMC6041967 DOI: 10.1093/bioinformatics/bty121] [Citation(s) in RCA: 612] [Impact Index Per Article: 102.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2017] [Accepted: 02/28/2018] [Indexed: 12/03/2022] Open

Sievers F, Higgins DG. QuanTest2: benchmarking multiple sequence alignments using secondary structure prediction. Bioinformatics 2019;36:90-95. [PMID: 31292629 PMCID: PMC9881607 DOI: 10.1093/bioinformatics/btz552] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2019] [Revised: 06/17/2019] [Accepted: 07/09/2019] [Indexed: 02/02/2023] Open

Le Q, Sievers F, Higgins DG. Protein multiple sequence alignment benchmarking through secondary structure prediction. Bioinformatics 2018;33:1331-1337. [PMID: 28093407 PMCID: PMC5408826 DOI: 10.1093/bioinformatics/btw840] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2016] [Accepted: 01/10/2017] [Indexed: 12/26/2022] Open

Abstract

Motivation

Multiple sequence alignment (MSA) is commonly used to analyze sets of homologous protein or DNA sequences. This has lead to the development of many methods and packages for MSA over the past 30 years. Being able to compare different methods has been problematic and has relied on gold standard benchmark datasets of ‘true’ alignments or on MSA simulations. A number of protein benchmark datasets have been produced which rely on a combination of manual alignment and/or automated superposition of protein structures. These are either restricted to very small MSAs with few sequences or require manual alignment which can be subjective. In both cases, it remains very difficult to properly test MSAs of more than a few dozen sequences. PREFAB and HomFam both rely on using a small subset of sequences of known structure and do not fairly test the quality of a full MSA.

Results

In this paper we describe QuanTest, a fully automated and highly scalable test system for protein MSAs which is based on using secondary structure prediction accuracy (SSPA) to measure alignment quality. This is based on the assumption that better MSAs will give more accurate secondary structure predictions when we include sequences of known structure. SSPA measures the quality of an entire alignment however, not just the accuracy on a handful of selected sequences. It can be scaled to alignments of any size but here we demonstrate its use on alignments of either 200 or 1000 sequences. This allows the testing of slow accurate programs as well as faster, less accurate ones. We show that the scores from QuanTest are highly correlated with existing benchmark scores. We also validate the method by comparing a wide range of MSA alignment options and by including different levels of mis-alignment into MSA, and examining the effects on the scores.

Availability and Implementation

QuanTest is available from http://www.bioinf.ucd.ie/download/QuanTest.tgz

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Sievers F, Higgins DG. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci 2017;27:135-145. [PMID: 28884485 DOI: 10.1002/pro.3290] [Citation(s) in RCA: 1210] [Impact Index Per Article: 151.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Revised: 09/01/2017] [Accepted: 09/05/2017] [Indexed: 01/05/2023]

Deorowicz S, Debudaj-Grabysz A, Gudyś A. FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Sci Rep 2016;6:33964. [PMID: 27670777 PMCID: PMC5037421 DOI: 10.1038/srep33964] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2016] [Accepted: 08/31/2016] [Indexed: 11/10/2022] Open

Yamada KD, Tomii K, Katoh K. Application of the MAFFT sequence alignment program to large data-reexamination of the usefulness of chained guide trees. Bioinformatics 2016;32:3246-3251. [PMID: 27378296 PMCID: PMC5079479 DOI: 10.1093/bioinformatics/btw412] [Citation(s) in RCA: 219] [Impact Index Per Article: 24.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2016] [Accepted: 06/20/2016] [Indexed: 11/26/2022] Open