Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lunter GA, Miklós I, Song YS, Hein J. An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J Comput Biol 2004;10:869-89. [PMID: 14980015 DOI: 10.1089/106652703322756122] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

For:	Lunter GA, Miklós I, Song YS, Hein J. An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. J Comput Biol 2004;10:869-89. [PMID: 14980015 DOI: 10.1089/106652703322756122] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Number

Cited by Other Article(s)

Nute M, Saleh E, Warnow T. Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets. Syst Biol 2019;68:396-411. [PMID: 30329135 PMCID: PMC6472439 DOI: 10.1093/sysbio/syy068] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2018] [Revised: 09/27/2018] [Accepted: 10/11/2018] [Indexed: 01/15/2023] Open

Levy Karin E, Ashkenazy H, Hein J, Pupko T. A Simulation-Based Approach to Statistical Alignment. Syst Biol 2018;68:252-266. [DOI: 10.1093/sysbio/syy059] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2018] [Accepted: 09/10/2018] [Indexed: 12/26/2022] Open

Holmes IH. Solving the master equation for Indels. BMC Bioinformatics 2017;18:255. [PMID: 28494756 PMCID: PMC5427538 DOI: 10.1186/s12859-017-1665-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2017] [Accepted: 04/30/2017] [Indexed: 01/09/2023] Open

Nute M, Warnow T. Scaling statistical multiple sequence alignment to large datasets. BMC Genomics 2016;17:764. [PMID: 28185555 PMCID: PMC5123300 DOI: 10.1186/s12864-016-3101-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Levy Karin E, Rabin A, Ashkenazy H, Shkedy D, Avram O, Cartwright RA, Pupko T. Inferring Indel Parameters using a Simulation-based Approach. Genome Biol Evol 2015;7:3226-38. [PMID: 26537226 PMCID: PMC4700945 DOI: 10.1093/gbe/evv212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open

Bouchard-Côté A. A note on probabilistic models over strings: the linear algebra approach. Bull Math Biol 2013;75:2529-50. [PMID: 24135792 DOI: 10.1007/s11538-013-9906-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2013] [Accepted: 09/19/2013] [Indexed: 11/28/2022]

Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]

Liu K, Warnow T. Treelength optimization for phylogeny estimation. PLoS One 2012;7:e33104. [PMID: 22442677 PMCID: PMC3307723 DOI: 10.1371/journal.pone.0033104] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2011] [Accepted: 02/03/2012] [Indexed: 01/05/2023] Open

Warnow T. Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent. PLOS CURRENTS 2012;4:RRN1308. [PMID: 22453901 PMCID: PMC3299439 DOI: 10.1371/currents.rrn1308] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 03/12/2012] [Indexed: 12/04/2022]

Schmitt I, Barker FK. Phylogenetic methods in natural product research. Nat Prod Rep 2009;26:1585-602. [PMID: 19936388 DOI: 10.1039/b910458p] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]

Bradley RK, Holmes I. Evolutionary triplet models of structured RNA. PLoS Comput Biol 2009;5:e1000483. [PMID: 19714212 PMCID: PMC2725318 DOI: 10.1371/journal.pcbi.1000483] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2008] [Accepted: 07/23/2009] [Indexed: 12/31/2022] Open

Abstract

The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a "transducer composition" algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.

Collapse

Fast statistical alignment. PLoS Comput Biol 2009;5:e1000392. [PMID: 19478997 PMCID: PMC2684580 DOI: 10.1371/journal.pcbi.1000392] [Citation(s) in RCA: 229] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2008] [Accepted: 04/20/2009] [Indexed: 02/01/2023] Open

Abstract

We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

Biological sequence alignment is one of the fundamental problems in comparative genomics, yet it remains unsolved. Over sixty sequence alignment programs are listed on Wikipedia, and many new programs are published every year. However, many popular programs suffer from pathologies such as aligning unrelated sequences and producing discordant alignments in protein (amino acid) and codon (nucleotide) space, casting doubt on the accuracy of the inferred alignments. Inaccurate alignments can introduce large and unknown systematic biases into downstream analyses such as phylogenetic tree reconstruction and substitution rate estimation. We describe a new program for multiple sequence alignment which can align protein, RNA and DNA sequence and improves on the accuracy of existing approaches on benchmarks of protein and RNA structural alignments and simulated mammalian and fly genomic alignments. Our approach, which seeks to find the alignment which is closest to the truth under our statistical model, leaves unrelated sequences largely unaligned and produces concordant alignments in protein and codon space. It is fast enough for difficult problems such as aligning orthologous genomic regions or aligning hundreds or thousands of proteins. It furthermore has a companion GUI for visualizing the estimated alignment reliability.

Collapse

Miklós I, Novák Á, Satija R, Lyngsø R, Hein J. Stochastic models of sequence evolution including insertion—deletion events. Stat Methods Med Res 2009;18:453-85. [DOI: 10.1177/0962280208099500] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]

Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput Biol 2008;4:e1000172. [PMID: 18787703 PMCID: PMC2527138 DOI: 10.1371/journal.pcbi.1000172] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2007] [Accepted: 07/31/2008] [Indexed: 11/19/2022] Open

Do CB, Katoh K. Protein multiple sequence alignment. Methods Mol Biol 2008;484:379-413. [PMID: 18592193 DOI: 10.1007/978-1-59745-398-1_25] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]

Bradley RK, Holmes I. Transducers: an emerging probabilistic framework for modeling indels on trees. Bioinformatics 2007;23:3258-62. [PMID: 17804440 DOI: 10.1093/bioinformatics/btm402] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Diallo AB, Makarenkov V, Blanchette M. Exact and heuristic algorithms for the Indel Maximum Likelihood Problem. J Comput Biol 2007;14:446-61. [PMID: 17572023 DOI: 10.1089/cmb.2007.a006] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Mitrophanov AY, Borodovsky M. Convergence rate estimation for the TKF91 model of biological sequence length evolution. Math Biosci 2007;209:470-85. [PMID: 17448505 DOI: 10.1016/j.mbs.2007.02.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2006] [Revised: 02/17/2007] [Accepted: 02/23/2007] [Indexed: 10/23/2022]

Chindelevitch L, Li Z, Blais E, Blanchette M. On the inference of parsimonious indel evolutionary scenarios. J Bioinform Comput Biol 2006;4:721-44. [PMID: 16960972 DOI: 10.1142/s0219720006002168] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2005] [Revised: 12/02/2005] [Accepted: 12/31/2005] [Indexed: 11/18/2022]

Song YS. A Sufficient Condition for Reducing Recursions in Hidden Markov Models. Bull Math Biol 2006;68:361-84. [PMID: 16794935 DOI: 10.1007/s11538-005-9045-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2004] [Accepted: 05/09/2005] [Indexed: 11/25/2022]

Fleissner R, Metzler D, von Haeseler A. Simultaneous statistical multiple alignment and phylogeny reconstruction. Syst Biol 2006;54:548-61. [PMID: 16085574 DOI: 10.1080/10635150590950371] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022] Open

Finding Maximum Likelihood Indel Scenarios. COMPARATIVE GENOMICS 2006. [DOI: 10.1007/11864127_14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]

Redelings BD, Suchard MA. Joint Bayesian estimation of alignment and phylogeny. Syst Biol 2005;54:401-18. [PMID: 16012107 DOI: 10.1080/10635150590947041] [Citation(s) in RCA: 165] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022] Open

Csurös M, Miklós I. Statistical alignment of retropseudogenes and their functional paralogs. Mol Biol Evol 2005;22:2457-71. [PMID: 16107591 DOI: 10.1093/molbev/msi238] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Lunter G, Miklós I, Drummond A, Jensen JL, Hein J. Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 2005;6:83. [PMID: 15804354 PMCID: PMC1087833 DOI: 10.1186/1471-2105-6-83] [Citation(s) in RCA: 158] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2005] [Accepted: 04/01/2005] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Two central problems in computational biology are the determination of the alignment and phylogeny of a set of biological sequences. The traditional approach to this problem is to first build a multiple alignment of these sequences, followed by a phylogenetic reconstruction step based on this multiple alignment. However, alignment and phylogenetic inference are fundamentally interdependent, and ignoring this fact leads to biased and overconfident estimations. Whether the main interest be in sequence alignment or phylogeny, a major goal of computational biology is the co-estimation of both.

RESULTS

We developed a fully Bayesian Markov chain Monte Carlo method for coestimating phylogeny and sequence alignment, under the Thorne-Kishino-Felsenstein model of substitution and single nucleotide insertion-deletion (indel) events. In our earlier work, we introduced a novel and efficient algorithm, termed the "indel peeling algorithm", which includes indels as phylogenetically informative evolutionary events, and resembles Felsenstein's peeling algorithm for substitutions on a phylogenetic tree. For a fixed alignment, our extension analytically integrates out both substitution and indel events within a proper statistical model, without the need for data augmentation at internal tree nodes, allowing for efficient sampling of tree topologies and edge lengths. To additionally sample multiple alignments, we here introduce an efficient partial Metropolized independence sampler for alignments, and combine these two algorithms into a fully Bayesian co-estimation procedure for the alignment and phylogeny problem. Our approach results in estimates for the posterior distribution of evolutionary rate parameters, for the maximum a-posteriori (MAP) phylogenetic tree, and for the posterior decoding alignment. Estimates for the evolutionary tree and multiple alignment are augmented with confidence estimates for each node height and alignment column. Our results indicate that the patterns in reliability broadly correspond to structural features of the proteins, and thus provides biologically meaningful information which is not existent in the usual point-estimate of the alignment. Our methods can handle input data of moderate size (10-20 protein sequences, each 100-200 bp), which we analyzed overnight on a standard 2 GHz personal computer.

CONCLUSION

Joint analysis of multiple sequence alignment, evolutionary trees and additional evolutionary parameters can be now done within a single coherent statistical framework.

Collapse

Blanchette M, Green ED, Miller W, Haussler D. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res 2004;14:2412-23. [PMID: 15574820 PMCID: PMC534665 DOI: 10.1101/gr.2800104] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2004] [Accepted: 10/05/2004] [Indexed: 11/25/2022]

Knudsen B, Miyamoto MM. Sequence Alignments and Pair Hidden Markov Models Using Evolutionary History. J Mol Biol 2003;333:453-60. [PMID: 14529629 DOI: 10.1016/j.jmb.2003.08.015] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]