Reference Citation Analysis

For: Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 2004;5:6. [PMID: 14736341 PMCID: PMC344529 DOI: 10.1186/1471-2105-5-6] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Received: 11/05/2003] [Accepted: 01/21/2004] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Hubley R, Wheeler TJ, Smit AFA. Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families. NAR Genom Bioinform 2022;4:lqac040. [PMID: 35591887 PMCID: PMC9112768 DOI: 10.1093/nargab/lqac040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/18/2021] [Revised: 03/29/2022] [Accepted: 04/29/2022] [Indexed: 02/06/2023] Open

Suzuki Y. Methods for making multiple alignment of genomic sequences for severe acute respiratory syndrome coronavirus 2. Meta Gene 2020;26:100785. [PMID: 32835005 PMCID: PMC7434624 DOI: 10.1016/j.mgene.2020.100785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2020] [Revised: 07/24/2020] [Accepted: 08/14/2020] [Indexed: 11/21/2022] Open

MySSP: Non-stationary evolutionary sequence simulation, including indels. Evol Bioinform Online 2017. [DOI: 10.1177/117693430500100007] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/15/2022] Open

Chowdhury B, Garai G. A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics 2017;109:419-431. [PMID: 28669847 DOI: 10.1016/j.ygeno.2017.06.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 04/26/2017] [Revised: 05/27/2017] [Accepted: 06/27/2017] [Indexed: 01/04/2023]

Vincent BJ, Estrada J, DePace AH. The appeasement of Doug: a synthetic approach to enhancer biology. Integr Biol (Camb) 2016;8:475-84. [DOI: 10.1039/c5ib00321k] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/13/2022]

De Witte D, Van de Velde J, Decap D, Van Bel M, Audenaert P, Demeester P, Dhoedt B, Vandepoele K, Fostier J. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements. Bioinformatics 2015;31:3758-66. [PMID: 26254488 PMCID: PMC4653392 DOI: 10.1093/bioinformatics/btv466] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/09/2014] [Accepted: 08/03/2015] [Indexed: 11/14/2022] Open

Park HJ, Lee SE, Kim HB, Isaacson RE, Seo KW, Song KH. Association of obesity with serum leptin, adiponectin, and serotonin and gut microflora in beagle dogs. J Vet Intern Med 2014;29:43-50. [PMID: 25407880 PMCID: PMC4858068 DOI: 10.1111/jvim.12455] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 04/15/2014] [Revised: 06/12/2014] [Accepted: 08/12/2014] [Indexed: 12/17/2022] Open

Abstract

Background

Serotonin (5‐hydroxytryptamine, 5HT) is involved in hypothalamic regulation of energy consumption. Also, the gut microbiome can influence neuronal signaling to the brain through vagal afferent neurons. Therefore, serotonin concentrations in the central nervous system and the composition of the microbiota can be related to obesity.

Objective

To examine adipokine and serotonin concentrations and the gut microbiota in lean dogs and dogs with experimentally induced obesity.

Animals

Fourteen healthy Beagle dogs were used in this study.

Methods

Seven Beagle dogs in the obese group were fed commercial food ad libitum over a period of 6 months to increase their weight, and seven Beagle dogs in the lean group were fed a restricted amount of the same diet over the same period to maintain optimal body condition. Peripheral leptin, adiponectin, 5HT, and cerebrospinal fluid (CSF‐5HT) levels were measured by ELISA. Fecal samples were collected from the lean and obese groups 6 months after obesity was induced. Targeted pyrosequencing of the 16S rRNA gene was performed using a Genome Sequencer FLX plus system.

Results

Leptin concentrations were higher in the obese group (1.98 ± 1.00) compared to those of the lean group (1.12 ± 0.07, P = .025). Adiponectin and 5‐hydroxytryptamine of cerebrospinal fluid (CSF‐5HT) concentrations were higher in the lean group (27.1 ± 7.28) than in the obese group (14.4 ± 5.40, P = .018). Analysis of the microbiome revealed that the diversity of the microbial community was lower in the obese group. Microbes from the phylum Firmicutes (85%) were the predominant group in the gut microbiota of lean dogs. However, bacteria from the phylum Proteobacteria (76%) were the predominant group in the gut microbiota of dogs in the obese group.

Conclusions and Clinical Importance

Decreased 5HT levels in the obese group might increase the risk of obesity because of increased appetite. Microflora enriched with gram‐negative bacteria might be related to chronic inflammation status in obese dogs.

FOGSAA: Fast Optimal Global Sequence Alignment Algorithm. Sci Rep 2014;3:1746. [PMID: 23624407 PMCID: PMC3638164 DOI: 10.1038/srep01746] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 12/13/2012] [Accepted: 04/15/2013] [Indexed: 11/13/2022] Open

Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C. Use of ChIP-Seq data for the design of a multiple promoter-alignment method. Nucleic Acids Res 2012;40:e52. [PMID: 22230796 PMCID: PMC3326335 DOI: 10.1093/nar/gkr1292] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 01/22/2023] Open

Zhen Y, Andolfatto P. Methods to detect selection on noncoding DNA. Methods Mol Biol 2012;856:141-59. [PMID: 22399458 DOI: 10.1007/978-1-61779-585-5_6] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]

Reineke AR, Bornberg-Bauer E, Gu J. Evolutionary divergence and limits of conserved non-coding sequence detection in plant genomes. Nucleic Acids Res 2011;39:6029-43. [PMID: 21470961 PMCID: PMC3152334 DOI: 10.1093/nar/gkr179] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 06/25/2010] [Revised: 02/22/2011] [Accepted: 03/15/2011] [Indexed: 12/17/2022] Open

Cao MD, Dix TI, Allison L. A genome alignment algorithm based on compression. BMC Bioinformatics 2010;11:599. [PMID: 21159205 PMCID: PMC3022628 DOI: 10.1186/1471-2105-11-599] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/08/2010] [Accepted: 12/16/2010] [Indexed: 11/26/2022] Open

Abstract

BACKGROUND

Traditional genome alignment methods consider sequence alignment as a variation of the string edit distance problem, and perform alignment by matching characters of the two sequences. They are often computationally expensive and unable to deal with low information regions. Furthermore, they lack a well-principled objective function to measure the performance of sets of parameters. Since genomic sequences carry genetic information, this article proposes that the information content of each nucleotide in a position should be considered in sequence alignment. An information-theoretic approach for pairwise genome local alignment, namely XMAligner, is presented. Instead of comparing sequences at the character level, XMAligner considers a pair of nucleotides from two sequences to be related if their mutual information in context is significant. The information content of nucleotides in sequences is measured by a lossless compression technique.

RESULTS

Experiments on both simulated data and real data show that XMAligner is superior to conventional methods, especially on distantly related sequences and statistically biased data. XMAligner can align sequences of eukaryote genome size with only a modest hardware requirement. Importantly, the method has an objective function which can obviate the need to choose parameter values for high quality alignment. The alignment results from XMAligner can be integrated into a visualisation tool for viewing.

CONCLUSIONS

The information-theoretic approach for sequence alignment is shown to overcome the mentioned problems of conventional character matching alignment methods. The article shows that, as genomic sequences are meant to carry information, considering the information content of nucleotides is helpful for genomic sequence alignment.

AVAILABILITY

Downloadable binaries, documentation and data can be found at ftp://ftp.infotech.monash.edu.au/software/DNAcompress-XM/XMAligner/.

Aniba MR, Poch O, Thompson JD. Issues in bioinformatics benchmarking: the case study of multiple sequence alignment. Nucleic Acids Res 2010;38:7353-63. [PMID: 20639539 PMCID: PMC2995051 DOI: 10.1093/nar/gkq625] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 04/27/2010] [Revised: 06/10/2010] [Accepted: 06/29/2010] [Indexed: 11/13/2022] Open

Nguyen TT, Almon RR, Dubois DC, Jusko WJ, Androulakis IP. Comparative analysis of acute and chronic corticosteroid pharmacogenomic effects in rat liver: transcriptional dynamics and regulatory structures. BMC Bioinformatics 2010;11:515. [PMID: 20946642 PMCID: PMC2973961 DOI: 10.1186/1471-2105-11-515] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/04/2010] [Accepted: 10/14/2010] [Indexed: 12/11/2022] Open

Abstract

Background

Comprehensively understanding corticosteroid pharmacogenomic effects is an essential step towards an insight into the underlying molecular mechanisms for both beneficial and detrimental clinical effects. Nevertheless, even in a single tissue different methods of corticosteroid administration can induce different patterns of expression and regulatory control structures. Therefore, rich in vivo datasets of pharmacological time-series with two dosing regimens sampled from rat liver are examined for temporal patterns of changes in gene expression and their regulatory commonalities.

Results

The study addresses two issues, including (1) identifying significant transcriptional modules coupled with dynamic expression patterns and (2) predicting relevant common transcriptional controls to better understand the underlying mechanisms of corticosteroid adverse effects. Following the orientation of meta-analysis, an extended computational approach that explores the concept of agreement matrix from consensus clustering has been proposed with the aims of identifying gene clusters that share common expression patterns across multiple dosing regimens as well as handling challenges in the analysis of microarray data from heterogeneous sources, e.g. different platforms and time-grids in this study. Six significant transcriptional modules coupled with typical patterns of expression have been identified. Functional analysis reveals that virtually all enriched functions (gene ontologies, pathways) in these modules are shown to be related to metabolic processes, implying the importance of these modules in adverse effects under the administration of corticosteroids. Relevant putative transcriptional regulators (e.g. RXRF, FKHD, SP1F) are also predicted to provide another source of information towards better understanding the complexities of expression patterns and the underlying regulatory mechanisms of those modules.

Conclusions

We have proposed a framework to identify significant coexpressed clusters of genes across multiple conditions experimented from different microarray platforms, time-grids, and also tissues if applicable. Analysis on rich in vivo datasets of corticosteroid time-series yielded significant insights into the pharmacogenomic effects of corticosteroids, especially the relevance to metabolic side-effects. This has been illustrated through enriched metabolic functions in those transcriptional modules and the presence of GRE binding motifs in those enriched pathways, providing significant modules for further analysis on pharmacogenomic corticosteroid effects.

Nakato R, Gotoh O. Cgaln: fast and space-efficient whole-genome alignment. BMC Bioinformatics 2010;11:224. [PMID: 20433723 PMCID: PMC2873541 DOI: 10.1186/1471-2105-11-224] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 01/02/2010] [Accepted: 04/30/2010] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can treat sequences that are only a few Mb long at once, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer.

RESULTS

We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory.

CONCLUSIONS

Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.

Kim J, Sinha S. Towards realistic benchmarks for multiple alignments of non-coding sequences. BMC Bioinformatics 2010;11:54. [PMID: 20102627 PMCID: PMC2823711 DOI: 10.1186/1471-2105-11-54] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 08/07/2009] [Accepted: 01/26/2010] [Indexed: 02/02/2023] Open

Abstract

BACKGROUND

With the continued development of new computational tools for multiple sequence alignment, it is necessary today to develop benchmarks that aid the selection of the most effective tools. Simulation-based benchmarks have been proposed to meet this necessity, especially for non-coding sequences. However, it is not clear if such benchmarks truly represent real sequence data from any given group of species, in terms of the difficulty of alignment tasks.

RESULTS

We find that the conventional simulation approach, which relies on empirically estimated values for various parameters such as substitution rate or insertion/deletion rates, is unable to generate synthetic sequences reflecting the broad genomic variation in conservation levels. We tackle this problem with a new method for simulating non-coding sequence evolution, by relying on genome-wide distributions of evolutionary parameters rather than their averages. We then generate synthetic data sets to mimic orthologous sequences from the Drosophila group of species, and show that these data sets truly represent the variability observed in genomic data in terms of the difficulty of the alignment task. This allows us to make significant progress towards estimating the alignment accuracy of current tools in an absolute sense, going beyond only a relative assessment of different tools. We evaluate six widely used multiple alignment tools in the context of Drosophila non-coding sequences, and find the accuracy to be significantly different from previously reported values. Interestingly, the performance of most tools degrades more rapidly when there are more insertions than deletions in the data set, suggesting an asymmetric handling of insertions and deletions, even though none of the evaluated tools explicitly distinguishes these two types of events. We also examine the accuracy of two existing tools for annotating insertions versus deletions, and find their performance to be close to optimal in Drosophila non-coding sequences if provided with the true alignments.

CONCLUSION

We have developed a method to generate benchmarks for multiple alignments of Drosophila non-coding sequences, and shown it to be more realistic than traditional benchmarks. Apart from helping to select the most effective tools, these benchmarks will help practitioners of comparative genomics deal with the effects of alignment errors, by providing accurate estimates of the extent of these errors.

Patterns of DNA-sequence divergence between Drosophila miranda and D. pseudoobscura. J Mol Evol 2009;69:601-11. [PMID: 19859648 DOI: 10.1007/s00239-009-9298-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 02/03/2009] [Accepted: 10/07/2009] [Indexed: 12/22/2022]

Rajapakse J, Chen C, Ho SL. Comparative genomic workflow: discovery of conserved noncoding DNA patterns. IEEE Eng Med Biol Mag 2009;28:19-24. [PMID: 19622420 DOI: 10.1109/memb.2009.932910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]

Löytynoja A, Goldman N. A model of evolution and structure for multiple sequence alignment. Philos Trans R Soc Lond B Biol Sci 2009;363:3913-9. [PMID: 18852103 PMCID: PMC2592536 DOI: 10.1098/rstb.2008.0170] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022] Open

He X, Ling X, Sinha S. Alignment and prediction of cis-regulatory modules based on a probabilistic model of evolution. PLoS Comput Biol 2009;5:e1000299. [PMID: 19293946 PMCID: PMC2657044 DOI: 10.1371/journal.pcbi.1000299] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 07/01/2008] [Accepted: 01/22/2009] [Indexed: 11/30/2022] Open

Abstract

Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context.

Comparison of noncoding DNA sequences across species has the potential to significantly improve our understanding of gene regulation and our ability to annotate regulatory regions of the genome. This potential is evident from recent publications analyzing 12 Drosophila genomes for regulatory annotation. However, because noncoding sequences are much less structured than coding sequences, their interspecies comparison presents technical challenges, such as ambiguity about how to align them and how to predict transcription factor binding sites, which are the fundamental units that make up regulatory sequences. This article describes how to build an integrated probabilistic framework that performs alignment and binding site prediction simultaneously, in the process improving the accuracy of both tasks. It defines a stochastic model for the evolution of entire “cis-regulatory modules,” with its highlight being a novel theoretical treatment of the commonly observed loss and gain of binding sites during evolution. This new evolutionary model forms the backbone of newly developed software for the prediction of new cis-regulatory modules, alignment of known modules to elucidate general principles of cis-regulatory evolution, or both. The new software is demonstrated to provide benefits in performance of these two crucial genomics tasks.

McLean C, Bejerano G. Dispensability of mammalian DNA. Genome Res 2008;18:1743-51. [PMID: 18832441 DOI: 10.1101/gr.080184.108] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 01/05/2023]

Liu J, Xu X, Stormo GD. The cis-regulatory map of Shewanella genomes. Nucleic Acids Res 2008;36:5376-90. [PMID: 18701645 PMCID: PMC2532739 DOI: 10.1093/nar/gkn515] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022] Open

Abstract

While hundreds of microbial genomes are sequenced, the challenge remains to define their cis-regulatory maps. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary abilities to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach on five Shewanella genomes that allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matching to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Using the same approach on a more focused gene set, 990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions, we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data. 
The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems.

Huang W, Nevins JR, Ohler U. Phylogenetic simulation of promoter evolution: estimation and modeling of binding site turnover events and assessment of their impact on alignment tools. Genome Biol 2008;8:R225. [PMID: 17956628 PMCID: PMC2246299 DOI: 10.1186/gb-2007-8-10-r225] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 04/11/2007] [Revised: 10/20/2007] [Accepted: 10/24/2007] [Indexed: 02/07/2023] Open

Jones SJM. Prediction of genomic functional elements. Annu Rev Genomics Hum Genet 2008;7:315-38. [PMID: 16824019 DOI: 10.1146/annurev.genom.7.080505.115745] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]

Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 2008;9:303-13. [PMID: 18347593 DOI: 10.1038/nrg2185] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]

Li L, Zhu Q, He X, Sinha S, Halfon MS. Large-scale analysis of transcriptional cis-regulatory modules reveals both common features and distinct subclasses. Genome Biol 2008;8:R101. [PMID: 17550599 PMCID: PMC2394749 DOI: 10.1186/gb-2007-8-6-r101] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 04/11/2007] [Revised: 05/23/2007] [Accepted: 06/05/2007] [Indexed: 02/01/2023] Open

Abstract

Analysis of 280 experimentally-verified cis-regulatory modules from Drosophila reveals features both common to all and unique to distinct subclasses of modules.

Background

Transcriptional cis-regulatory modules (for example, enhancers) play a critical role in regulating gene expression. While many individual regulatory elements have been characterized, they have never been analyzed as a class.

Results

We have performed the first such large-scale study of cis-regulatory modules in order to determine whether they have common properties that might aid in their identification and contribute to our understanding of the mechanisms by which they function. A total of 280 individual, experimentally verified cis-regulatory modules from Drosophila were analyzed for a range of sequence-level and functional properties. We report here that regulatory modules do indeed share common properties, among them an elevated GC content, an increased level of interspecific sequence conservation, and a tendency to be transcribed into RNA. However, we find that dense clustering of transcription factor binding sites, especially homotypic clustering, which is commonly believed to be a general characteristic of regulatory modules, is rather a feature that belongs chiefly to a specific subclass. This has important implications for current computational approaches, many of which are biased toward this subset. We explore two new strategies to assess binding site clustering and gauge their performances with respect to their ability to detect all 280 modules and various functionally coherent subsets.

Conclusion

Our findings demonstrate that cis-regulatory modules share common features that help to define them as a class and that may lead to new insights into mechanisms of gene regulation. However, these properties alone may not be sufficient to reliably distinguish regulatory from non-regulatory sequences. We also demonstrate that there are distinct subclasses of cis-regulatory modules that are more amenable to in silico detection than others and that these differences must be taken into account when attempting genome-wide regulatory element discovery.

Iwama H, Hori Y, Matsumoto K, Murao K, Ishida T. ReAlignerV: web-based genomic alignment tool with high specificity and robustness estimated by species-specific insertion sequences. BMC Bioinformatics 2008;9:112. [PMID: 18294369 PMCID: PMC2267439 DOI: 10.1186/1471-2105-9-112] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/13/2007] [Accepted: 02/22/2008] [Indexed: 11/23/2022] Open

Lyons E, Freeling M. How to usefully compare homologous plant genes and chromosomes as DNA sequences. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2008;53:661-73. [PMID: 18269575 DOI: 10.1111/j.1365-313x.2007.03326.x] [Citation(s) in RCA: 308] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/19/2023]

Janky R, van Helden J. Evaluation of phylogenetic footprint discovery for predicting bacterial cis-regulatory elements and revealing their evolution. BMC Bioinformatics 2008;9:37. [PMID: 18215291 PMCID: PMC2248561 DOI: 10.1186/1471-2105-9-37] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2007] [Accepted: 01/23/2008] [Indexed: 11/24/2022] Open

Benavides E, Baum R, McClellan D, Sites JW. Molecular phylogenetics of the lizard genus Microlophus (squamata:tropiduridae): aligning and retrieving indel signal from nuclear introns. Syst Biol 2008;56:776-97. [PMID: 17907054 DOI: 10.1080/10635150701618527] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022] Open

Abstract

We use a multigene data set (the mitochondrial locus and nine nuclear gene regions) to test phylogenetic relationships in the South American "lava lizards" (genus Microlophus) and describe a strategy for aligning noncoding sequences that accounts for differences in tempo and class of mutational events. We focus on seven nuclear introns that vary in size and frequency of multibase length mutations (i.e., indels) and present a manual alignment strategy that incorporates indels for each intron. Our method is based on mechanistic explanations of intron evolution and does not require a guide tree. We also use a progressive alignment algorithm (Probabilistic Alignment Kit; PRANK) that distinguishes insertions from deletions and avoids the "gapcost" conundrum. We describe an approach to selecting a guide tree purged of ambiguously aligned regions and use this to refine PRANK performance. We show that although manual alignment is successful in finding repeat motifs and the most obvious indels, some regions can only be subjectively aligned, and there are limits to the size and complexity of a data matrix for which this approach can be taken. PRANK alignments identified more parsimony-informative indels while simultaneously increasing nucleotide identity in conserved sequence blocks flanking the indel regions. When comparing manual and PRANK with two widely used methods (CLUSTAL, MUSCLE) for the alignment of the most length-variable intron, only PRANK recovered a tree congruent at deeper nodes with the combined data tree inferred from all nuclear gene regions. We take this concordance as an objective function of alignment quality and present a strongly supported phylogenetic hypothesis for Microlophus relationships. From this hypothesis we show that (1) a coded indel data partition derived from the PRANK alignment contributed significantly to nodal support and (2) the indel data set permitted detection of significant conflict between mitochondrial and nuclear data partitions, which we hypothesize arose from secondary contact of distantly related taxa, followed by hybridization and mtDNA introgression.

Lunter G, Rocco A, Mimouni N, Heger A, Caldeira A, Hein J. Uncertainty in homology inferences: assessing and improving genomic sequence alignment. Genome Res 2007;18:298-309. [PMID: 18073381 DOI: 10.1101/gr.6725608] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Abstract

Sequence alignment underpins all of comparative genomics, yet it remains an incompletely solved problem. In particular, the statistical uncertainty within inferred alignments is often disregarded, while parametric or phylogenetic inferences are considered meaningless without confidence estimates. Here, we report on a theoretical and simulation study of pairwise alignments of genomic DNA at human-mouse divergence. We find that >15% of aligned bases are incorrect in existing whole-genome alignments, and we identify three types of alignment error, each leading to systematic biases in all algorithms considered. Careful modeling of the evolutionary process improves alignment quality; however, these improvements are modest compared with the remaining alignment errors, even with exact knowledge of the evolutionary model, emphasizing the need for statistical approaches to account for uncertainty. We develop a new algorithm, Marginalized Posterior Decoding (MPD), which explicitly accounts for uncertainties, is less biased and more accurate than other algorithms we consider, and reduces the proportion of misaligned bases by a third compared with the best existing algorithm. To our knowledge, this is the first nonheuristic algorithm for DNA sequence alignment to show robust improvements over the classic Needleman-Wunsch algorithm. Despite this, considerable uncertainty remains even in the improved alignments. We conclude that a probabilistic treatment is essential, both to improve alignment quality and to quantify the remaining uncertainty. This is becoming increasingly relevant with the growing appreciation of the importance of noncoding DNA, whose study relies heavily on alignments. Alignment errors are inevitable, and should be considered when drawing conclusions from alignments. Software and alignments to assist researchers in doing this are provided at http://genserv.anat.ox.ac.uk/grape/.
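
For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of the classic Needleman-Wunsch global alignment score. It is not the authors' MPD algorithm or their software; the function name and the scoring values (match, mismatch, and a linear gap penalty) are illustrative assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap penalty).

    The scoring parameters are illustrative defaults, not values from the paper.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

The function returns a single optimal score and says nothing about how confident each aligned column is, which is the limitation the abstract argues a probabilistic treatment should address.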

Wang AX, Ruzzo WL, Tompa M. How accurately is ncRNA aligned within whole-genome multiple alignments? BMC Bioinformatics 2007;8:417. [PMID: 17963514 PMCID: PMC2206062 DOI: 10.1186/1471-2105-8-417] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2007] [Accepted: 10/26/2007] [Indexed: 11/12/2022] Open

Mannhaupt G, Feldmann H. Genomic evolution of the proteasome system among hemiascomycetous yeasts. J Mol Evol 2007;65:529-40. [PMID: 17909694 DOI: 10.1007/s00239-007-9031-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2007] [Revised: 07/24/2007] [Accepted: 08/17/2007] [Indexed: 10/22/2022]

Blanchette M. Computation and Analysis of Genomic Multi-Sequence Alignments. Annu Rev Genomics Hum Genet 2007;8:193-213. [PMID: 17489682 DOI: 10.1146/annurev.genom.8.080706.092300] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Alekseyenko AV, Kim N, Lee CJ. Global analysis of exon creation versus loss and the role of alternative splicing in 17 vertebrate genomes. RNA (NEW YORK, N.Y.) 2007;13:661-70. [PMID: 17369312 PMCID: PMC1852814 DOI: 10.1261/rna.325107] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/02/2023]

Doniger SW, Fay JC. Frequent gain and loss of functional transcription factor binding sites. PLoS Comput Biol 2007;3:e99. [PMID: 17530920 PMCID: PMC1876492 DOI: 10.1371/journal.pcbi.0030099] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2006] [Accepted: 04/19/2007] [Indexed: 01/20/2023] Open

Abstract

Cis-regulatory sequences are not always conserved across species. Divergence within cis-regulatory sequences may result from the evolution of species-specific patterns of gene expression or the flexible nature of the cis-regulatory code. The identification of functional divergence in cis-regulatory sequences is therefore important for both understanding the role of gene regulation in evolution and annotating regulatory elements. We have developed an evolutionary model to detect the loss of constraint on individual transcription factor binding sites (TFBSs). We find that a significant fraction of functionally constrained binding sites have been lost in a lineage-specific manner among three closely related yeast species. Binding site loss has previously been explained by turnover, where the concurrent gain and loss of a binding site maintains gene regulation. We estimate that nearly half of all loss events cannot be explained by binding site turnover. Recreating the mutations that led to binding site loss confirms that these sequence changes affect gene expression in some cases. We also estimate that there is a high rate of binding site gain, as more than half of experimentally identified S. cerevisiae binding sites are not conserved across species. The frequent gain and loss of TFBSs implies that cis-regulatory sequences are labile and, in the absence of turnover, may contribute to species-specific patterns of gene expression.

Research in the field of molecular evolution is focused on understanding the genetic basis of functional differences between species. Protein coding sequences have traditionally been the focus of these studies, as the genetic code enables a detailed study of the strength of selection acting on amino acid sequences. However, from the earliest cross-species sequence comparisons, it was clear that protein sequences among closely related species are too similar to explain the observed phenotypic diversity. This led to the hypothesis that the evolution of gene regulation has played a key role in generating diversity between species. The availability of numerous complete genome sequences has made it possible to begin testing this hypothesis. In this work, the authors use an evolutionary model to identify functional divergence within transcription factor binding sites, the core functional elements involved in gene regulation. Applying this model to the baker's yeast, Saccharomyces cerevisiae, and its three closest relatives, the authors find that a substantial fraction of the ancestral binding sites have been lost in a species-specific manner. In some cases the loss of the binding site creates gene expression differences that may be indicative of species-specific changes in gene regulation. This work provides a useful computational framework that will allow further study of the conservation of cis-regulatory sequences and their role in molecular evolution.

Kumar S, Filipski A. Multiple sequence alignment: in pursuit of homologous DNA positions. Genome Res 2007;17:127-35. [PMID: 17272647 DOI: 10.1101/gr.5232407] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Ogden TH, Rosenberg MS. Alignment and Topological Accuracy of the Direct Optimization approach via POY and Traditional Phylogenetics via ClustalW + PAUP*. Syst Biol 2007;56:182-93. [PMID: 17454974 DOI: 10.1080/10635150701281102] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open

Rosenberg MS. MySSP: non-stationary evolutionary sequence simulation, including indels. Evol Bioinform Online 2007;1:81-3. [PMID: 19325855 PMCID: PMC2658873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Considerations in the identification of functional RNA structural elements in genomic alignments. BMC Bioinformatics 2007;8:33. [PMID: 17263882 PMCID: PMC1803800 DOI: 10.1186/1471-2105-8-33] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2006] [Accepted: 01/30/2007] [Indexed: 11/25/2022] Open

Abstract

Background

Accurate identification of novel, functional noncoding (nc) RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs have been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries.

Results

We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, EvoFold, and several variants of simple thermodynamic stability) on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preservation of dinucleotide content in shuffling the alignments resulted in a drastic increase in estimated false-positive detection rates for ncRNA elements, precluding evaluation of higher order alignments, which cannot be adequately shuffled while maintaining both dinucleotides and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments. The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component) was significantly higher for real than shuffled sequence, while the distribution for coding sequences was lower than that of corresponding shuffles.

Conclusion

Accurate prediction of novel RNA structural elements in genome sequence remains a difficult problem, and development of an appropriate negative-control strategy for multiple alignments is an important practical challenge. Nonetheless, the general trends we observed for the distributions of predicted ncRNAs across genomic features are biologically meaningful, supporting the presence of secondary structural elements in many 3' UTRs, and providing evidence for evolutionary selection against secondary structures in coding regions.
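
To make the negative-control idea in this abstract concrete, here is a minimal sketch of the simpler mononucleotide-level column shuffle that the Results use as a point of comparison. It is not shuffle-pair.pl and does not preserve dinucleotide frequency; the function names and the choice of column classes are assumptions made for illustration.

```python
import random


def column_class(x, y):
    """Classify an alignment column by gap placement and conservation."""
    return (x == "-", y == "-", x == y)


def shuffle_alignment(row_a, row_b, seed=None):
    """Permute columns of a pairwise alignment within each column class.

    Gap positions in each row and the per-column match/mismatch pattern stay
    fixed; only the bases occupying those positions are permuted.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    rng = random.Random(seed)
    columns = list(zip(row_a, row_b))
    by_class = {}
    for idx, (x, y) in enumerate(columns):
        by_class.setdefault(column_class(x, y), []).append(idx)
    shuffled = list(columns)
    for indices in by_class.values():
        permuted = indices[:]
        rng.shuffle(permuted)
        # Move each source column into a slot of the same class.
        for dst, src in zip(indices, permuted):
            shuffled[dst] = columns[src]
    return ("".join(c[0] for c in shuffled),
            "".join(c[1] for c in shuffled))
```

Scoring many such shuffles with a prediction tool gives an empirical background distribution. The abstract's point is that a background of this kind, which ignores dinucleotide content, yields much lower estimated false-positive rates than a dinucleotide-preserving one.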

Cheng JF, Priest JR, Pennacchio LA. Comparative genomics: a tool to functionally annotate human DNA. Methods Mol Biol 2007;366:229-51. [PMID: 17568128 DOI: 10.1007/978-1-59745-030-0_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]

Müller F, Borycki AG. Sequence analyses to study the evolutionary history and cis-regulatory elements of Hedgehog genes. Methods Mol Biol 2007;397:231-250. [PMID: 18025724 DOI: 10.1007/978-1-59745-516-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]

Morgenstern B. Alignment of genomic sequences using DIALIGN. Methods Mol Biol 2007;395:195-204. [PMID: 17993675 DOI: 10.1007/978-1-59745-514-5_12] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]

Comparative analysis and visualization of genomic sequences using VISTA browser and associated computational tools. Methods Mol Biol 2007;395:3-16. [PMID: 17993664 DOI: 10.1007/978-1-59745-514-5_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]

Down TA, Bergman CM, Su J, Hubbard TJP. Large-scale discovery of promoter motifs in Drosophila melanogaster. PLoS Comput Biol 2006;3:e7. [PMID: 17238282 PMCID: PMC1779301 DOI: 10.1371/journal.pcbi.0030007] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2006] [Accepted: 12/01/2006] [Indexed: 11/28/2022] Open

Abstract

A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.

In contrast to the genomic sequences that encode proteins, little is known about the regulatory elements that instruct the cell as to when and where a given gene should be active. Regulatory elements are thought to consist of clusters of short DNA words (motifs), each of which acts as a binding site for a sequence-specific DNA-binding protein. Thus, building a comprehensive dictionary of such motifs is an important step towards a broader understanding of gene regulation. Using the recently published NestedMICA method for detecting overrepresented motifs in a set of sequences, we build a dictionary of 120 motifs from regulatory sequences in the fruitfly genome, 87 of which are novel. Analysis of positional biases, conservation across species, and association with specific patterns of gene expression in fruitfly embryos suggests that the great majority of these newly discovered motifs represent functional regulatory elements. In addition to providing an initial motif dictionary for one of the most intensively studied model organisms, this work provides an analytical framework for the comprehensive discovery of regulatory motifs in complex animal genomes.

Uchiyama I, Higuchi T, Kobayashi I. CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes. BMC Bioinformatics 2006;7:472. [PMID: 17062155 PMCID: PMC1643837 DOI: 10.1186/1471-2105-7-472] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2006] [Accepted: 10/24/2006] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The recent accumulation of closely related genomic sequences provides a valuable resource for the elucidation of the evolutionary histories of various organisms. However, although numerous alignment calculation and visualization tools have been developed to date, the analysis of complex genomic changes, such as large insertions, deletions, inversions, translocations and duplications, still presents certain difficulties.

RESULTS

We have developed a comparative genome analysis tool, named CGAT, which allows detailed comparisons of closely related bacteria-sized genomes mainly through visualizing middle-to-large-scale changes to infer underlying mechanisms. CGAT displays precomputed pairwise genome alignments on both dotplot and alignment viewers with scrolling and zooming functions, and allows users to move along the pre-identified orthologous alignments. Users can place several types of information on this alignment, such as the presence of tandem repeats or interspersed repetitive sequences and changes in G+C contents or codon usage bias, thereby facilitating the interpretation of the observed genomic changes. In addition to displaying precomputed alignments, the viewer can dynamically calculate the alignments between specified regions; this feature is especially useful for examining the alignment boundaries, as these boundaries are often obscure and can vary between programs. Besides the alignment browser functionalities, CGAT also contains an alignment data construction module, which contains various procedures that are commonly used for pre- and post-processing for large-scale alignment calculation, such as the split-and-merge protocol for calculating long alignments, chaining adjacent alignments, and ortholog identification. Indeed, CGAT provides a general framework for the calculation of genome-scale alignments using various existing programs as alignment engines, which allows users to compare the outputs of different alignment programs. Earlier versions of this program have been used successfully in our research to infer the evolutionary history of apparently complex genome changes between closely related eubacteria and archaea.

CONCLUSION

CGAT is a practical tool for analyzing complex genomic changes between closely related genomes using existing alignment programs and other sequence analysis tools combined with extensive manual inspection.

Ogden TH, Rosenberg MS. How should gaps be treated in parsimony? A comparison of approaches using simulation. Mol Phylogenet Evol 2006;42:817-26. [PMID: 17011794 DOI: 10.1016/j.ympev.2006.07.021] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2006] [Revised: 07/07/2006] [Accepted: 07/22/2006] [Indexed: 10/24/2022]

Pollard DA, Moses AM, Iyer VN, Eisen MB. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. BMC Bioinformatics 2006;7:376. [PMID: 16904011 PMCID: PMC1613255 DOI: 10.1186/1471-2105-7-376] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2006] [Accepted: 08/14/2006] [Indexed: 01/01/2023] Open

Abstract

Background

Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear.

Results

Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool-specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments.

Conclusion

Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
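
One common way to quantify the alignment accuracy discussed in this abstract, when the true alignment is known from simulation, is the fraction of truly homologous residue pairs that a tool places in the same column. The sketch below illustrates that measure for a pairwise alignment; it is not taken from CisEvolver, and the function names are assumptions.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) ungapped residue indices sharing a column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # positions in the ungapped sequences
    for x, y in zip(row_a, row_b):
        if x != "-" and y != "-":
            pairs.add((i, j))
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def pairwise_accuracy(reference, test):
    """Fraction of reference residue pairs that the test alignment recovers.

    Both arguments are (row_a, row_b) tuples of gapped strings that describe
    the same two underlying sequences.
    """
    ref = aligned_pairs(*reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(*test)) / len(ref)
```

For example, with the simulated truth `("ACGT-A", "AC-TGA")` and a gap-free test alignment `("ACGTA", "ACTGA")`, three of the four true residue pairs are recovered, so the accuracy is 0.75.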

Ogden TH, Rosenberg MS. Multiple sequence alignment accuracy and phylogenetic inference. Syst Biol 2006;55:314-28. [PMID: 16611602 DOI: 10.1080/10635150500541730] [Citation(s) in RCA: 155] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022] Open

Wang J, Keightley PD, Johnson T. MCALIGN2: faster, accurate global pairwise alignment of non-coding DNA sequences based on explicit models of indel evolution. BMC Bioinformatics 2006;7:292. [PMID: 16762073 PMCID: PMC1534069 DOI: 10.1186/1471-2105-7-292] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2005] [Accepted: 06/08/2006] [Indexed: 11/10/2022] Open