101
|
|
102
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and 3004=3004-- upgm] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
103
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and 3004=3004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
104
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and 9880=5640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
105
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 order by 1-- hywi] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
106
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and 9948=8094-- epkf] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
107
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and (select (case when (6087=6087) then null else ctxsys.drithsx.sn(1,6087) end) from dual) is null-- fslz] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
|
108
|
|
109
|
|
110
|
|
111
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 rlike (select (case when (1029=8761) then 0x31302e313039332f6d6f6c6265762f6d7374303130 else 0x28 end))-- rjgs] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
112
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 or extractvalue(7211,concat(0x5c,0x7162627a71,(select (elt(7211=7211,1))),0x71787a7671))] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
113
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 or extractvalue(7211,concat(0x5c,0x7162627a71,(select (elt(7211=7211,1))),0x71787a7671))-- uqub] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
114
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and (select (case when (6101=6101) then null else cast((chr(106)||chr(117)||chr(65)||chr(65)) as numeric) end)) is null-- fzjd] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
|
115
|
|
116
|
|
117
|
|
118
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and (select (case when (6087=6087) then null else ctxsys.drithsx.sn(1,6087) end) from dual) is null] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
119
|
|
120
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 rlike (select (case when (2170=5954) then 0x31302e313039332f6d6f6c6265762f6d7374303130 else 0x28 end))] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
121
|
|
122
|
|
123
|
|
124
|
|
125
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and (select (case when (6786=7864) then null else cast((chr(113)||chr(75)||chr(106)||chr(118)) as numeric) end)) is null] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
126
|
|
127
|
|
128
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 rlike (select (case when (4734=4734) then 0x31302e313039332f6d6f6c6265762f6d7374303130 else 0x28 end))-- tpuz] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|
129
|
|
130
|
|
131
|
Katoh K, Standley DM. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 2013. [DOI: 10.1093/molbev/mst010 and (select (case when (9712=5547) then null else ctxsys.drithsx.sn(1,9712) end) from dual) is null] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
132
|
Sievers F, Dineen D, Wilm A, Higgins DG. Making automated multiple alignments of very large numbers of protein sequences. ACTA ACUST UNITED AC 2013; 29:989-95. [PMID: 23428640 DOI: 10.1093/bioinformatics/btt093] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
MOTIVATION Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of >100 000 sequences in reasonable times. At present, there are no systematic analyses concerning the scalability of the alignment quality as the number of aligned sequences is increased. RESULTS We benchmarked a wide range of widely used MSA packages using a selection of protein families with some known structures and found that the accuracy of such alignments decreases markedly as the number of sequences grows. This is more or less true of all packages and protein families. The phenomenon is mostly due to the accumulation of alignment errors, rather than problems in guide-tree construction. This is partly alleviated by using iterative refinement or selectively adding sequences. The average accuracy of progressive methods by comparison with structure-based benchmarks can be improved by incorporating information derived from high-quality structural alignments of sequences with solved structures. This suggests that the availability of high quality curated alignments will have to complement algorithmic and/or software developments in the long-term. AVAILABILITY AND IMPLEMENTATION Benchmark data used in this study are available at http://www.clustal.org/omega/homfam-20110613-25.tar.gz and http://www.clustal.org/omega/bali3fam-26.tar.gz. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Fabian Sievers
- School of Medicine and Medical Science, Conway Institute, University College Dublin, Dublin 4, Ireland.
| | | | | | | |
Collapse
|
133
|
Abstract
We address the problem of the joint statistical inference of phylogenetic trees and multiple sequence alignments from unaligned molecular sequences. This problem is generally formulated in terms of string-valued evolutionary processes along the branches of a phylogenetic tree. The classic evolutionary process, the TKF91 model [Thorne JL, Kishino H, Felsenstein J (1991) J Mol Evol 33(2):114-124] is a continuous-time Markov chain model composed of insertion, deletion, and substitution events. Unfortunately, this model gives rise to an intractable computational problem: The computation of the marginal likelihood under the TKF91 model is exponential in the number of taxa. In this work, we present a stochastic process, the Poisson Indel Process (PIP), in which the complexity of this computation is reduced to linear. The Poisson Indel Process is closely related to the TKF91 model, differing only in its treatment of insertions, but it has a global characterization as a Poisson process on the phylogeny. Standard results for Poisson processes allow key computations to be decoupled, which yields the favorable computational profile of inference under the PIP model. We present illustrative experiments in which Bayesian inference under the PIP model is compared with separate inference of phylogenies and alignments.
Collapse
Affiliation(s)
- Alexandre Bouchard-Côté
- Department of Statistics, University of British Columbia, Vancouver, BC, Canada V6T 1Z4; and
| | - Michael I. Jordan
- Departments of Statistics and Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
| |
Collapse
|
134
|
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 2013; 30:772-80. [PMID: 23329690 PMCID: PMC3603318 DOI: 10.1093/molbev/mst010] [Citation(s) in RCA: 26825] [Impact Index Per Article: 2235.4] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.
Collapse
Affiliation(s)
- Kazutaka Katoh
- Immunology Frontier Research Center, Osaka University, Suita, Osaka, Japan.
| | | |
Collapse
|
135
|
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
|
136
|
Abstract
Two methods to add unaligned sequences into an existing multiple sequence alignment have been implemented as the ‘–add’ and ‘–addfragments’ options in the MAFFT package. The former option is a basic one and applicable only to full-length sequences, whereas the latter option is applicable even when the unaligned sequences are short and fragmentary. These methods internally infer the phylogenetic relationship among the sequences in the existing alignment and the phylogenetic positions of unaligned sequences. Benchmarks based on two independent simulations consistently suggest that the “–addfragments” option outperforms recent methods, PaPaRa and PAGAN, in accuracy for difficult problems and that these three methods appropriately handle easy problems. Availability:http://mafft.cbrc.jp/alignment/software/ Contact:katoh@ifrec.osaka-u.ac.jp Supplementary information:Supplementary data are available at Bioinformatics online
Collapse
Affiliation(s)
- Kazutaka Katoh
- Laboratory of Systems Immunology, Immunology Frontier Research Center, Osaka University, Yamadaoka, Suita 565-0871, Japan.
| | | |
Collapse
|
137
|
Szalkowski AM. Fast and robust multiple sequence alignment with phylogeny-aware gap placement. BMC Bioinformatics 2012; 13:129. [PMID: 22694311 PMCID: PMC3495709 DOI: 10.1186/1471-2105-13-129] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2011] [Accepted: 06/13/2012] [Indexed: 12/02/2022] Open
Abstract
Background ProGraphMSA is a state-of-the-art multiple sequence alignment tool which produces phylogenetically sensible gap patterns while maintaining robustness by allowing alternative splicings and errors in the branching pattern of the guide tree. Results This is achieved by incorporating a graph-based sequence representation combined with the advantages of the phylogeny-aware gap placement algorithm of Prank. Further, we account for variations in the substitution pattern by implementing context-specific profiles as in CS-Blast and by estimating amino acid frequencies from input data. Conclusions ProGraphMSA shows good performance and competitive execution times in various benchmarks.
Collapse
Affiliation(s)
- Adam M Szalkowski
- Department of Computer Science, ETH Zürich, Universitätstrasse, Switzerland.
| |
Collapse
|