1
Noureen M, Kawashima T, Arita M. Genetic Markers of Genome Rearrangements in Helicobacter pylori. Microorganisms 2021; 9:621. [PMID: 33802974 PMCID: PMC8002640 DOI: 10.3390/microorganisms9030621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Revised: 03/11/2021] [Accepted: 03/12/2021] [Indexed: 11/16/2022]
Abstract
Helicobacter pylori exhibits a diverse genomic structure with high mutation and recombination rates. Various genetic elements function as drivers of this genomic diversity, including genome rearrangements. Identifying the association of these elements with rearrangements can pave the way to understanding the evolution of its genome. We analyzed the order of orthologous genes among 72 publicly available complete genomes to identify large genome rearrangements, and rearrangement breakpoints were compared with the positions of insertion sequences, genomic islands, and restriction modification genes. Comparison of the shared inversions revealed the conserved genomic elements across strains from different geographical locations. Some were region-specific and others were global, indicating that highly shared rearrangements and their markers were more ancestral than strain- or region-specific ones. The locations of genomic islands were an important factor for the occurrence of the rearrangements. Comparative genomics helps to evaluate the conservation of various elements contributing to the diversity across genomes.
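The abstract above identifies rearrangements by comparing the order of orthologous genes between genomes. A minimal sketch of that idea follows, assuming each genome is a single circular chromosome given as a list of signed ortholog labels. The function names and the toy gene orders are hypothetical and are not taken from the paper. Each gene is split into a tail and a head extremity, so an inverted segment changes only the two adjacencies at its ends.

```python
def extremities(gene):
    """Split a signed gene such as '-b' into its (left, right) extremities."""
    name = gene.lstrip("+-")
    tail, head = (name, "t"), (name, "h")
    # A reverse-strand gene is read head first.
    return (head, tail) if gene.startswith("-") else (tail, head)


def adjacencies(order, circular=True):
    """Return the set of adjacencies (unordered extremity pairs) of a gene order."""
    ends = [extremities(g) for g in order]
    pairs = list(zip(ends, ends[1:]))
    if circular and len(ends) > 1:
        pairs.append((ends[-1], ends[0]))  # close the circular chromosome
    return {frozenset((left[1], right[0])) for left, right in pairs}


def breakpoints(reference, query):
    """Adjacencies present in the reference order but missing from the query."""
    return adjacencies(reference) - adjacencies(query)


# Hypothetical toy data: the segment b..c is inverted in the query genome.
reference = ["a", "b", "c", "d", "e"]
query = ["a", "-c", "-b", "d", "e"]
for bp in sorted(breakpoints(reference, query), key=sorted):
    print(sorted(bp))
```

The breakpoint positions found this way are what one would then compare against the coordinates of insertion sequences or genomic islands.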
Affiliation(s)
- Mehwish Noureen
  - Department of Genetics, SOKENDAI University, Yata 1111, Mishima 411-8540, Shizuoka, Japan
- Takeshi Kawashima
  - Bioinformation and DDBJ Center, National Institute of Genetics, Yata 1111, Mishima 411-8540, Shizuoka, Japan
- Masanori Arita
  - Bioinformation and DDBJ Center, National Institute of Genetics, Yata 1111, Mishima 411-8540, Shizuoka, Japan
  - RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama 230-0045, Kanagawa, Japan

2
Abstract
The domestication of animals led to a major shift in human subsistence patterns, from a hunter-gatherer to a sedentary agricultural lifestyle, which ultimately resulted in the development of complex societies. Over the past 15,000 years, the phenotype and genotype of multiple animal species, such as dogs, pigs, sheep, goats, cattle and horses, have been substantially altered during their adaptation to the human niche. Recent methodological innovations, such as improved ancient DNA extraction methods and next-generation sequencing, have enabled the sequencing of whole ancient genomes. These genomes have helped reconstruct the process by which animals entered into domestic relationships with humans and were subjected to novel selection pressures. Here, we discuss and update key concepts in animal domestication in light of recent contributions from ancient genomics.

3
Luhmann N, Lafond M, Thevenin A, Ouangraoua A, Wittler R, Chauve C. The SCJ Small Parsimony Problem for Weighted Gene Adjacencies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1364-1373. [PMID: 28166504 DOI: 10.1109/tcbb.2017.2661761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in extant genomes in the phylogeny to define potential ancestral gene adjacencies, and either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion, or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming at reducing the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We consider that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can, e.g., be obtained through the recent availability of ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or through probabilistic analysis of gene adjacency evolution. We show the NP-hardness of our problem variant and propose a Fixed-Parameter Tractable algorithm based on the Sankoff-Rousseau dynamic programming algorithm that also allows sampling of co-optimal solutions. We apply our approach to mammalian and bacterial data providing different degrees of complexity. We show that including adjacency weights in the objective has a significant impact in reducing the fragmentation of the reconstructed ancestral gene orders. An implementation is available at http://github.com/nluhmann/PhySca.
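To make the objective described above concrete, here is a minimal sketch. It assumes genomes are represented as sets of adjacencies, for which the SCJ distance is the size of the symmetric difference. The exact weight term is not given in the abstract, so the penalty used here (one minus the weight of each adjacency kept at an internal node) is an assumption for illustration only. The names `weighted_objective` and `alpha`, and the toy tree, are hypothetical and are not taken from PhySca.

```python
def scj_distance(adj_a, adj_b):
    """SCJ distance between two adjacency sets: one cut or join per differing adjacency."""
    return len(adj_a ^ adj_b)


def weighted_objective(tree_edges, labeling, weights, alpha):
    """Convex combination of the SCJ cost on the tree and an adjacency-weight penalty.

    tree_edges: iterable of (parent, child) node names
    labeling:   dict mapping node name to its set of adjacencies
    weights:    dict mapping internal node name to {adjacency: weight in [0, 1]}
    alpha:      trade-off in [0, 1]; alpha = 1 gives pure SCJ parsimony
    """
    scj_cost = sum(scj_distance(labeling[u], labeling[v]) for u, v in tree_edges)
    # ASSUMPTION: a kept adjacency of weight w costs (1 - w); unweighted ones cost 1.
    weight_cost = sum(
        1.0 - node_weights.get(adj, 0.0)
        for node, node_weights in weights.items()
        for adj in labeling[node]
    )
    return alpha * scj_cost + (1.0 - alpha) * weight_cost


# Hypothetical toy instance: a root R with two leaves X and Y.
ab = frozenset({"a_h", "b_t"})
bc = frozenset({"b_h", "c_t"})
labeling = {"R": {ab}, "X": {ab, bc}, "Y": {ab}}
weights = {"R": {ab: 0.9, bc: 0.4}}
print(weighted_objective([("R", "X"), ("R", "Y")], labeling, weights, alpha=0.5))
```

This only evaluates a given labeling. Finding the best labeling is the hard part that the paper addresses with its fixed-parameter tractable algorithm.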

4
Feng S, Li H, Song F, Wang Y, Stejskal V, Cai W, Li Z. A novel mitochondrial genome fragmentation pattern in Liposcelis brunnea, the type species of the genus Liposcelis (Psocodea: Liposcelididae). Int J Biol Macromol 2019; 132:1296-1303. [DOI: 10.1016/j.ijbiomac.2019.04.034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/27/2019] [Revised: 03/22/2019] [Accepted: 04/05/2019] [Indexed: 10/27/2022]

5
Luhmann N, Chauve C, Stoye J, Wittler R. Scaffolding of Ancient Contigs and Ancestral Reconstruction in a Phylogenetic Framework. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:2094-2100. [PMID: 29993816 DOI: 10.1007/978-3-319-12418-6_17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
Ancestral genome reconstruction is an important task in analyzing the evolution of genomes. Recent progress in sequencing ancient DNA led to the publication of so-called paleogenomes and allows the integration of this sequencing data in genome evolution analysis. However, the de novo assembly of ancient genomes is usually fragmented owing to, among other factors, DNA degradation over time. Integrated phylogenetic assembly addresses the issue of genome fragmentation in the ancient DNA assembly while aiming to improve the reconstruction of all ancient genomes in the phylogeny simultaneously. The fragmented assembly of the ancient genome can be represented as an assembly graph, indicating contradicting ordering information of contigs. In this setting, our approach is to compare the ancient data with extant finished genomes. We generalize a reconstruction approach minimizing the Single-Cut-or-Join rearrangement distance towards multifurcating trees and include edge lengths to improve the reconstruction in practice. This results in a polynomial time algorithm that includes additional ancient DNA data at one node in the tree, resulting in consistent reconstructions of ancestral genomes.

6
Wang D, Wang L. GRSR: a tool for deriving genome rearrangement scenarios from multiple unichromosomal genome sequences. BMC Bioinformatics 2018; 19:291. [PMID: 30367596 PMCID: PMC6101096 DOI: 10.1186/s12859-018-2268-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
Background: Genome rearrangements describe changes in the genetic linkage relationship of large chromosomal regions, involving reversals, transpositions, block interchanges, deletions, insertions, fissions, fusions and translocations. Many algorithms for calculating rearrangement scenarios between two genomes have been proposed. Very often, the calculated rearrangement scenario is not unique for the same pair of permutations. Hence, deciding which calculated rearrangement scenario is more biologically meaningful becomes an essential task. Up to now, several mechanisms for genome rearrangements have been studied. One important theory is that genome rearrangement may be mediated by repeats, especially for reversal events. Many reversal regions are found to be flanked by a pair of inverted repeats. As a result, whether there are repeats at the breakpoints of a calculated rearrangement event can shed light on whether that event is biologically meaningful. To our knowledge, there is no tool which can automatically identify rearrangement events and check whether there exist repeats at the breakpoints of each calculated rearrangement event.
Results: In this paper, we describe a new tool named GRSR which allows us to compare multiple unichromosomal genomes to identify “independent” (obvious) rearrangement events such as reversals, (inverted) block interchanges and (inverted) transpositions and automatically searches for repeats at the breakpoints of each rearrangement event. We apply our tool to the complete genomes of 28 Mycobacterium tuberculosis strains and 24 Shewanella strains, respectively. In both Mycobacterium tuberculosis and Shewanella strains, our tool finds many reversal regions flanked by a pair of inverted repeats. In particular, the GRSR tool also finds an inverted transposition and an inverted block interchange in Shewanella, where the repeats at the ends of rearrangement regions remain unchanged after the rearrangement event. To our knowledge, this is the first time such a phenomenon for inverted transposition and inverted block interchange is reported in Shewanella.
Conclusions: From the calculated results, there are many examples supporting the theory that the existence of repeats at the breakpoints of a rearrangement event can make the sequences at the breakpoints remain unchanged before and after the rearrangement events, suggesting that the conservation of ends could possibly be a common phenomenon in many types of genome rearrangement events.
Affiliation(s)
- Dan Wang
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Hong Kong, People's Republic of China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Hong Kong, People's Republic of China
  - University of Hong Kong Shenzhen Research Institute, Shenzhen Hi-Tech Industrial Park, Nanshan District, Shenzhen, People's Republic of China

7
Anselmetti Y, Luhmann N, Bérard S, Tannier E, Chauve C. Comparative Methods for Reconstructing Ancient Genome Organization. Methods Mol Biol 2018; 1704:343-362. [PMID: 29277873 DOI: 10.1007/978-1-4939-7463-4_13] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]
Abstract
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.
Affiliation(s)
- Yoann Anselmetti
  - Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
- Nina Luhmann
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
  - International Research Training Group 1906, Bielefeld University, Bielefeld, Germany
- Sèverine Bérard
  - Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
- Eric Tannier
  - UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive", Inria Grenoble Rhône-Alpes and University of Lyon, Lyon, France
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6

8
Luhmann N, Doerr D, Chauve C. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microb Genom 2017; 3:e000123. [PMID: 29114402 PMCID: PMC5643016 DOI: 10.1099/mgen.0.000123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/01/2017] [Accepted: 06/07/2017] [Indexed: 12/12/2022]
Abstract
Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method, AGapEs (ancestral gap estimation), to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseille outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprising, respectively, five and six scaffolds with 95% of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains.
Affiliation(s)
- Nina Luhmann
  - Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
  - International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes", Bielefeld University, Bielefeld, Germany
- Daniel Doerr
  - Genome Informatics, Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
  - School of Computer and Communication Sciences, EPFL, 1015 Lausanne, Switzerland
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada

9
Wang D, Li S, Guo F, Ning K, Wang L. Core-genome scaffold comparison reveals the prevalence that inversion events are associated with pairs of inverted repeats. BMC Genomics 2017; 18:268. [PMID: 28356070 PMCID: PMC5372343 DOI: 10.1186/s12864-017-3655-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/08/2016] [Accepted: 03/22/2017] [Indexed: 01/01/2023]
Abstract
Background: Genome rearrangement describes gross changes of chromosomal regions, plays an important role in evolutionary biology and has profound impacts on phenotype in organisms ranging from microbes to humans. As more complete genomes have become available, many genomic comparisons have been conducted in order to find genome rearrangements and the mechanisms which underlie the rearrangement events. In our opinion, genomic comparison of different individuals/strains within the same species (pan-genome) is more helpful to reveal the mechanisms for genome rearrangements since genomes of the same species are much closer to each other.
Results: We study the mechanism for inversion events via core-genome scaffold comparison of different strains within the same species. We focus on two kinds of bacteria, Pseudomonas aeruginosa and Escherichia coli, and investigate the inversion events among different strains of the same species. We find an interesting phenomenon that long (larger than 10,000 bp) inversion regions are flanked by a pair of Inverted Repeats (IRs). This mechanism can also explain why breakpoint reuse occurs for inversion events. We study the prevalence of the phenomenon and find that it is a major mechanism for inversions. The other observation is that for different rearrangement events such as transposition and inverted block interchange, the two ends of the swapped regions are also associated with repeats so that after the rearrangement operations the two ends of the swapped regions remain unchanged. To our knowledge, this is the first time such a phenomenon is reported for transposition events.
Conclusions: In both Pseudomonas aeruginosa and Escherichia coli strains, IRs were found at the two ends of long sequence inversions. The two ends of the inversion remained unchanged before and after the inversion event. The existence of IRs can explain the breakpoint reuse phenomenon. We also observed that other rearrangement operations such as transposition, inverted transposition, and inverted block interchange, had repeats (not necessarily inverted) at the ends of each segment, where the ends remained unchanged before and after the rearrangement operations. This suggests that the conservation of ends could possibly be a common phenomenon in many types of chromosome rearrangement events.
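The observation that the ends of an inversion remain unchanged when the region is bounded by inverted repeats can be checked on a toy sequence. The sketch below is only an illustration under simple assumptions: sequences are plain strings over A, C, G and T, the repeat copies lie exactly at the two ends of the inverted region, and the function names and example sequence are hypothetical rather than taken from the paper's pipeline.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def bounded_by_inverted_repeats(region, k):
    """True if the first k bases of region are the reverse complement of its last k bases."""
    return 0 < k <= len(region) // 2 and region[:k] == reverse_complement(region[-k:])


def invert(genome, start, end):
    """Replace genome[start:end] by its reverse complement (a reversal event)."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


# Hypothetical toy region: a 5 bp repeat, an interior segment, and the repeat's reverse complement.
repeat = "ACGGT"
region = repeat + "TTTAAAGG" + reverse_complement(repeat)
genome = "GG" + region + "CC"
inverted = invert(genome, 2, 2 + len(region))

print(bounded_by_inverted_repeats(region, 5))
print(inverted[2:7] == genome[2:7], inverted[-7:-2] == genome[-7:-2])
```

Only the interior of the region changes. Both repeat copies read the same before and after the reversal, which matches the conservation of ends described in the abstract.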
Affiliation(s)
- Dan Wang
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Hong Kong, SAR, People's Republic of China
- Shuaicheng Li
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Hong Kong, SAR, People's Republic of China
- Fei Guo
  - School of Computer Science and Technology, Tianjin University, Tianjin, People's Republic of China
- Kang Ning
  - School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, People's Republic of China
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Ave., Hong Kong, SAR, People's Republic of China
  - University of Hong Kong Shenzhen Research Institute, Shenzhen Hi-Tech Industrial Park, Nanshan District, Shenzhen, People's Republic of China

10
Rajaraman A, Zanetti JPP, Manuch J, Chauve C. Algorithms and Complexity Results for Genome Mapping Problems. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:418-430. [PMID: 26887011 DOI: 10.1109/tcbb.2016.2528239] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Genome mapping algorithms aim at computing an ordering of a set of genomic markers based on local ordering information such as adjacencies and intervals of markers. In most genome mapping models, markers are assumed to occur uniquely in the resulting map. We introduce algorithmic questions that consider repeats, i.e., markers that can have several occurrences in the resulting map. We show that, provided with an upper bound on the copy number of repeated markers and with intervals that span full repeat copies, called repeat spanning intervals, the problem of deciding if a set of adjacencies and repeat spanning intervals admits a genome representation is tractable if the target genome can contain linear and/or circular chromosomal fragments. We also show that extracting a maximum cardinality or weight subset of repeat spanning intervals given a set of adjacencies that admits a genome realization is NP-hard but fixed-parameter tractable in the maximum copy number and the number of adjacent repeats, and tractable if intervals contain a single repeated marker.

11
Rajaraman A, Ma J. Reconstructing ancestral gene orders with duplications guided by synteny level genome reconstruction. BMC Bioinformatics 2016; 17:414. [PMID: 28185565 PMCID: PMC5123302 DOI: 10.1186/s12859-016-1262-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/24/2023]
Abstract
Background: Reconstructing ancestral gene orders in the presence of duplications is important for a better understanding of genome evolution. Current methods for ancestral reconstruction are limited by either computational constraints or the availability of reliable gene trees, and often ignore duplications altogether. Recently, methods that consider duplications in ancestral reconstructions have been developed, but the quality of reconstruction, counted as the number of contiguous ancestral regions found, decreases rapidly with the number of duplicated genes, complicating the application of such approaches to mammalian genomes. However, such high fragmentation is not encountered when reconstructing mammalian genomes at the synteny-block level, although the relative positions of genes in such reconstruction cannot be recovered.
Results: We propose a new heuristic method, MultiRes, to reconstruct ancestral gene orders with duplications guided by homologous synteny blocks for a set of related descendant genomes. The method uses a synteny-level reconstruction to break the gene-order problem into several subproblems, which are then combined in order to disambiguate duplicated genes. We applied this method to both simulated and real data. Our results showed that MultiRes outperforms other methods in terms of gene content, gene adjacency, and common interval recovery.
Conclusions: This work demonstrates that the inclusion of synteny-level information can help us obtain better gene-level reconstructions. Our algorithm provides a basic toolbox for reconstructing ancestral gene orders with duplications. The source code of MultiRes is available at https://github.com/ma-compbio/MultiRes.
Affiliation(s)
- Ashok Rajaraman
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, 15213, USA

12
Andam CP, Worby CJ, Chang Q, Campana MG. Microbial Genomics of Ancient Plagues and Outbreaks. Trends Microbiol 2016; 24:978-990. [PMID: 27618404 DOI: 10.1016/j.tim.2016.08.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 05/28/2016] [Revised: 07/29/2016] [Accepted: 08/16/2016] [Indexed: 01/22/2023]
Abstract
The recent use of next-generation sequencing methods to investigate historical disease outbreaks has provided us with an unprecedented ability to address important and long-standing questions in epidemiology, pathogen evolution, and human history. In this review, we present major findings that illustrate how microbial genomics has provided new insights into the nature and etiology of infectious diseases of historical importance, such as plague, tuberculosis, and leprosy. Sequenced isolates collected from archaeological remains also provide evidence for the timing of historical evolutionary events as well as geographic spread of these pathogens. Elucidating the genomic basis of virulence in historical diseases can provide relevant information on how we can effectively understand the emergence and re-emergence of infectious diseases today and in the future.
Affiliation(s)
- Cheryl P Andam
  - Harvard T. H. Chan School of Public Health, Department of Epidemiology, Boston, MA 02115, USA
  - University of New Hampshire, Department of Molecular, Cellular and Biomedical Sciences, Durham, NH 03824, USA
- Colin J Worby
  - Harvard T. H. Chan School of Public Health, Department of Epidemiology, Boston, MA 02115, USA
- Qiuzhi Chang
  - Harvard T. H. Chan School of Public Health, Department of Epidemiology, Boston, MA 02115, USA
- Michael G Campana
  - Smithsonian Conservation Biology Institute, Center for Conservation Genomics, 3001 Connecticut Avenue NW, Washington, DC 20008, USA

13
Abstract
Background: Even for moderate-size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless of what the model is and which specific question is to be answered. Therefore, giving one optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples is then sufficient for statistical inference.
Contribution: In this paper, we give a mini-review of the state of the art in sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. In addition, we give a Gibbs sampler for sampling most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real-life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/

14
Abstract
This paper presents new structural and algorithmic results around the scaffolding problem, which occurs prominently in next generation sequencing. The problem can be formalized as an optimization problem on a special graph, the "scaffold graph". We prove that the problem is polynomial if this graph is a tree by providing a dynamic programming algorithm for this case. This algorithm serves as a basis to deduce an exact algorithm for general graphs using a tree decomposition of the input. We explore other structural parameters, proving a linear-size problem kernel with respect to the size of a feedback-edge set on a restricted version of Scaffolding. Finally, we examine some parameters of scaffold graphs, which are based on real-world genomes, revealing that the feedback edge set is significantly smaller than the input size.
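One of the parameters mentioned above, the size of a feedback-edge set, is easy to compute for any undirected graph: the edges left over after building a spanning forest form a minimum feedback-edge set of size |E| - |V| + (number of connected components). The sketch below assumes the scaffold graph is given as a plain edge list over integer vertex ids, which ignores the special contig and link structure the paper relies on. The function name and the toy graph are hypothetical.

```python
def feedback_edge_set(num_vertices, edges):
    """Return the edges that close a cycle when edges are added one by one.

    Uses union-find to grow a spanning forest; every edge that joins two
    vertices already in the same tree is reported. The result is a minimum
    feedback-edge set of the undirected graph.
    """
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    feedback = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            feedback.append((u, v))   # this edge would close a cycle
        else:
            parent[root_u] = root_v   # merge the two trees
    return feedback


# Hypothetical toy graph: a triangle 0-1-2 with a pendant vertex 3.
print(feedback_edge_set(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))
```

When this set is empty the graph is a forest, which corresponds to the tree case for which the abstract reports a polynomial-time dynamic programming algorithm.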
Affiliation(s)
- Mathias Weller
  - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM) - Université de Montpellier - UMR 5506 CNRS, 161 rue Ada, 34090 Montpellier, France
  - Institut de Biologie Computationnelle, Lirmm Bât 5 - 860 rue de St Priest, 34090 Montpellier, France
- Annie Chateau
  - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM) - Université de Montpellier - UMR 5506 CNRS, 161 rue Ada, 34090 Montpellier, France
  - Institut de Biologie Computationnelle, Lirmm Bât 5 - 860 rue de St Priest, 34090 Montpellier, France
- Rodolphe Giroudeau
  - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM) - Université de Montpellier - UMR 5506 CNRS, 161 rue Ada, 34090 Montpellier, France

15
Anselmetti Y, Berry V, Chauve C, Chateau A, Tannier E, Bérard S. Ancestral gene synteny reconstruction improves extant species scaffolding. BMC Genomics 2015; 16 Suppl 10:S11. [PMID: 26450761 PMCID: PMC4603332 DOI: 10.1186/1471-2164-16-s10-s11] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo, that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that pursues the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes.
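The evaluation described above, in which adjacencies are hidden by simulated fragmentation and then counted when recovered, reduces to a precision and sensitivity calculation over sets of adjacencies. The sketch below uses the standard definitions and treats an adjacency as an unordered pair of gene names. The function name and the toy data are hypothetical and do not reproduce the ARt-DeCo evaluation itself.

```python
def precision_sensitivity(predicted, hidden):
    """Compare predicted adjacencies against the adjacencies removed by fragmentation.

    predicted, hidden: iterables of (gene, gene) pairs; pair order is ignored.
    Returns (precision, sensitivity); a ratio with an empty denominator is 0.0.
    """
    predicted = {frozenset(pair) for pair in predicted}
    hidden = {frozenset(pair) for pair in hidden}
    true_positives = len(predicted & hidden)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(hidden) if hidden else 0.0
    return precision, sensitivity


# Hypothetical toy data: four adjacencies were hidden, three were proposed.
hidden = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")]
predicted = [("g2", "g1"), ("g2", "g3"), ("g7", "g8")]
precision, sensitivity = precision_sensitivity(predicted, hidden)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")
```

High precision with lower sensitivity, as reported in the abstract, would correspond to few wrong proposals but many hidden adjacencies left unrecovered.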
Collapse
Affiliation(s)
- Yoann Anselmetti
- Institut des Sciences de l'Évolution de Montpellier (ISE-M), Place Eugène Bataillon, Montpellier, 34095, France
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Vincent Berry
- Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, Canada
| | - Annie Chateau
- Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
| | - Sèverine Bérard
- Institut des Sciences de l'Évolution de Montpellier (ISE-M), Place Eugène Bataillon, Montpellier, 34095, France
- Institut de Biologie Computationnelle (IBC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Université Montpellier - CNRS, 161 rue Ada, Montpellier, 34090, France
|
16
|
Duchemin W, Daubin V, Tannier E. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence. BMC Genomics 2015; 16 Suppl 10:S9. [PMID: 26450112 PMCID: PMC4603589 DOI: 10.1186/1471-2164-16-s10-s9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late middle ages plague agent. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements has impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. RESULTS We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing nearly a complete genome sequence for the ancestor, containing a chromosome and three plasmids. CONCLUSION The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison.
Affiliation(s)
- Wandrille Duchemin
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Vincent Daubin
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
| | - Eric Tannier
- Laboratoire de Biométrie et Biologie Évolutive, LBBE, UMR CNRS 5558, University of Lyon 1, 43 boulevard du 11 novembre 1918, 69622 Villeurbanne, France
- Institut National de Recherche en Informatique et en Automatique (INRIA) Grenoble Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France
|
17
|
Bosi E, Donati B, Galardini M, Brunetti S, Sagot MF, Lió P, Crescenzi P, Fani R, Fondi M. MeDuSa: a multi-draft based scaffolder. Bioinformatics 2015; 31:2443-51. [PMID: 25810435 DOI: 10.1093/bioinformatics/btv171] [Citation(s) in RCA: 302] [Impact Index Per Article: 30.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2014] [Accepted: 03/19/2015] [Indexed: 01/07/2023] Open
Abstract
MOTIVATION Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines. RESULTS In this article we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalizes the scaffolding problem by means of a combinatorial optimization formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it requires neither prior knowledge of the microorganism dataset under analysis (e.g. their phylogenetic relationships) nor the availability of paired-end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MeDuSa on eukaryotic datasets has also been evaluated, leading to interesting results.
Affiliation(s)
- Emanuele Bosi
- Department of Biology, ComBo, Florence Computational Biology Group, Department of Biology, LEMM, Laboratory of Microbial and Molecular Evolution Florence, University of Florence, I-50019 Sesto F.no, Italy
| | - Beatrice Donati
- INRIA Rhône-Alpes, Villeurbanne Cedex, France, Université de Lyon, F-69000 Lyon, France, Dipartimento di Ingegneria dell'Informazione, University of Florence, I-50139 Firenze, Italy
| | - Marco Galardini
- EMBL-EBI - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK
| | - Sara Brunetti
- Dipartimento di Ingegneria dell'Informazione e Scienze Matematiche, University of Siena, Siena I-53100, Italy
| | - Marie-France Sagot
- INRIA Rhône-Alpes, Villeurbanne Cedex, France, Université de Lyon, F-69000 Lyon, France, Université Lyon 1, CNRS, UMR5558, 69622 Villeurbanne Cedex, France
| | - Pietro Lió
- Computer Laboratory, University of Cambridge, CB3 0FD Cambridge, UK
| | - Pierluigi Crescenzi
- Dipartimento di Ingegneria dell'Informazione, University of Florence, I-50139 Firenze, Italy
| | - Renato Fani
- Department of Biology, ComBo, Florence Computational Biology Group, Department of Biology, LEMM, Laboratory of Microbial and Molecular Evolution Florence, University of Florence, I-50019 Sesto F.no, Italy
| | - Marco Fondi
- Department of Biology, ComBo, Florence Computational Biology Group, Department of Biology, LEMM, Laboratory of Microbial and Molecular Evolution Florence, University of Florence, I-50019 Sesto F.no, Italy
|
18
|
Tang H, Zhang X, Miao C, Zhang J, Ming R, Schnable JC, Schnable PS, Lyons E, Lu J. ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol 2015; 16:3. [PMID: 25583564 PMCID: PMC4305236 DOI: 10.1186/s13059-014-0573-1] [Citation(s) in RCA: 264] [Impact Index Per Article: 26.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2014] [Accepted: 12/15/2014] [Indexed: 11/29/2022] Open
Abstract
The ordering and orientation of genomic scaffolds to reconstruct chromosomes is an essential step during de novo genome assembly. Because this process utilizes various mapping techniques that each provides an independent line of evidence, a combination of multiple maps can improve the accuracy of the resulting chromosomal assemblies. We present ALLMAPS, a method capable of computing a scaffold ordering that maximizes colinearity across a collection of maps. ALLMAPS is robust against common mapping errors, and generates sequences that are maximally concordant with the input maps. ALLMAPS is a useful tool in building high-quality genome assemblies. ALLMAPS is available at: https://github.com/tanghaibao/jcvi/wiki/ALLMAPS.
Affiliation(s)
- Haibao Tang
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian Province, China; School of Plant Sciences, iPlant Collaborative, University of Arizona, Tucson, AZ, 85721, USA; Data2Bio LLC, 2079 Roy J. Carver Co-Lab, Ames, Iowa, 50011, USA.
| | - Xingtan Zhang
- J. Craig Venter Institute, 9704 Medical Center Dr, Rockville, MD, 20850, USA.
| | - Chenyong Miao
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian Province, China.
| | - Jisen Zhang
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian Province, China.
| | - Ray Ming
- Center for Genomics and Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian Province, China.
| | - James C Schnable
- Data2Bio LLC, 2079 Roy J. Carver Co-Lab, Ames, Iowa, 50011, USA; Department of Agronomy and Horticulture, University of Nebraska, Lincoln, NE, 68588, USA.
| | - Patrick S Schnable
- Data2Bio LLC, 2079 Roy J. Carver Co-Lab, Ames, Iowa, 50011, USA; Department of Agronomy, Iowa State University, Ames, IA, 50011, USA.
| | - Eric Lyons
- School of Plant Sciences, iPlant Collaborative, University of Arizona, Tucson, AZ, 85721, USA.
| | - Jianguo Lu
- Heilongjiang River Fisheries Research Institute, Harbin, 150070, China.
|
19
|
Major transitions in human evolution revisited: a tribute to ancient DNA. J Hum Evol 2014; 79:4-20. [PMID: 25532800 DOI: 10.1016/j.jhevol.2014.06.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Revised: 06/06/2014] [Accepted: 06/19/2014] [Indexed: 11/23/2022]
Abstract
The origin and diversification of modern humans have been characterized by major evolutionary transitions and demographic changes. Patterns of genetic variation within modern populations can help with reconstructing this ∼200 thousand year-long population history. However, by combining this information with genomic data from ancient remains, one can now directly access our evolutionary past and reveal our population history in much greater detail. This review outlines the main recent achievements in ancient DNA research and illustrates how the field recently moved from the polymerase chain reaction (PCR) amplification of short mitochondrial fragments to whole-genome sequencing and thereby revisited our own history. Ancient DNA research has revealed the routes that our ancestors took when colonizing the planet, whom they admixed with, how they domesticated plant and animal species, how they genetically responded to changes in lifestyle, and also, which pathogens decimated their populations. These approaches promise to soon solve many pending controversies about our own origins that are indecipherable from modern patterns of genetic variation alone, and therefore provide an extremely powerful toolkit for a new generation of molecular anthropologists.
|
20
|
Hofreiter M, Paijmans JLA, Goodchild H, Speller CF, Barlow A, Fortes GG, Thomas JA, Ludwig A, Collins MJ. The future of ancient DNA: Technical advances and conceptual shifts. Bioessays 2014; 37:284-93. [PMID: 25413709 DOI: 10.1002/bies.201400160] [Citation(s) in RCA: 119] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Technological innovations such as next generation sequencing and DNA hybridisation enrichment have resulted in multi-fold increases in both the quantity of ancient DNA sequence data and the time depth for DNA retrieval. To date, over 30 ancient genomes have been sequenced, moving from 0.7× coverage (mammoth) in 2008 to more than 50× coverage (Neanderthal) in 2014. Studies of rapid evolutionary changes, such as the evolution and spread of pathogens and the genetic responses of hosts, or the genetics of domestication and climatic adaptation, are developing swiftly and the importance of palaeogenomics for investigating evolutionary processes during the last million years is likely to increase considerably. However, these new datasets require new methods of data processing and analysis, as well as conceptual changes in interpreting the results. In this review we highlight important areas of future technical and conceptual progress and discuss research topics in the rapidly growing field of palaeogenomics.
Affiliation(s)
- Michael Hofreiter
- Institute for Biochemistry and Biology, University of Potsdam, Potsdam, Germany; Department of Biology, University of York, York, UK
|
21
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
|