1
Kwon D, Park N, Wy S, Lee D, Park W, Chai HH, Cho IC, Lee J, Kwon K, Kim H, Moon Y, Kim J, Kim J. Identification and characterization of structural variants related to meat quality in pigs using chromosome-level genome assemblies. BMC Genomics 2024; 25:299. [PMID: 38515031 PMCID: PMC10956321 DOI: 10.1186/s12864-024-10225-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 03/14/2024] [Indexed: 03/23/2024] Open
Abstract
BACKGROUND: Many studies have been performed to identify genomic loci and genes associated with meat quality in pigs. However, the full genetic architecture of the trait remains unclear, in part because related structural variations (SVs) have not been accurately identified, owing to the shortage of target breeds, the limitations of sequencing data, and the incompleteness of genome assemblies. The recent generation of a new pig breed with superior meat quality, called Nanchukmacdon, and its chromosome-level genome assembly (the NCMD assembly) has provided new opportunities. RESULTS: By applying assembly-based SV calling approaches to various pig genome assemblies, including that of Nanchukmacdon, the impact of SVs on meat quality was investigated. In particular, by checking the commonality of SVs with other pig breeds, a total of 13,819 Nanchukmacdon-specific SVs (NSVs) were identified, which may affect the unique meat quality of Nanchukmacdon. The regulatory potential of NSVs for the expression of nearby genes was further examined using transcriptome- and epigenome-based analyses in different tissues. CONCLUSIONS: Whole-genome comparisons based on chromosome-level genome assemblies have led to the discovery of SVs affecting meat quality in pigs, and their regulatory potential was analyzed. The identified NSVs will provide new insights into the genetic architecture underlying meat quality in pigs. Finally, this study confirms the utility of chromosome-level genome assemblies and multi-omics analysis for understanding unique phenotypes.
Affiliation(s)
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Nayoung Park
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Suyeon Wy
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Daehwan Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Woncheoul Park
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- Han-Ha Chai
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- In-Cheol Cho
  - Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, Jeju, 63242, Republic of Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Kisang Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Heesun Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Youngbeen Moon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Juyeon Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
2
Hartmann T, Middendorf M, Bernt M. Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2024; 2802:215-245. [PMID: 38819562 DOI: 10.1007/978-1-0716-3838-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
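To make the cut-and-join idea concrete, here is a minimal Python sketch of one well-known model of this kind, the double-cut-and-join (DCJ) distance. It is an illustration only and is not taken from the chapter or from any tool it reviews. It assumes a single circular chromosome per genome, identical gene content, and no duplicated genes; the function names `adjacencies` and `dcj_distance` are hypothetical. Under these assumptions the distance is N - C, where N is the number of genes and C is the number of cycles in the adjacency graph of the two genomes.

```python
def adjacencies(genome):
    """Map each gene extremity to its neighbour in a circular signed gene order.

    A gene g has a tail (g, 't') and a head (g, 'h'); a negative sign means
    the gene is read in the reverse direction.
    """
    def left(g):
        return (abs(g), 't' if g > 0 else 'h')

    def right(g):
        return (abs(g), 'h' if g > 0 else 't')

    partner = {}
    n = len(genome)
    for k in range(n):
        a, b = genome[k], genome[(k + 1) % n]  # wrap around: circular chromosome
        x, y = right(a), left(b)
        partner[x] = y
        partner[y] = x
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance N - C for two circular genomes on the same gene set (assumption)."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same genes")
    pa, pb = adjacencies(genome_a), adjacencies(genome_b)
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B until the cycle closes.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(genome_a) - cycles


if __name__ == "__main__":
    # One inversion of the segment (2, 3) separates these two gene orders.
    print(dcj_distance([1, 2, 3, 4], [1, -3, -2, 4]))
```

Linear chromosomes, unequal gene content, and duplicated genes need the extended formulations discussed in the literature and are deliberately left out of this sketch.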
Affiliation(s)
- Tom Hartmann
  - Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany
- Martin Middendorf
  - Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany
3
Cribbie EP, Doerr D, Chauve C. AGO, a Framework for the Reconstruction of Ancestral Syntenies and Gene Orders. Methods Mol Biol 2024; 2802:247-265. [PMID: 38819563 DOI: 10.1007/978-1-0716-3838-5_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Reconstructing ancestral gene orders from the genome data of extant species is an important problem in comparative and evolutionary genomics. In a phylogenomics setting that accounts for gene family evolution through gene duplication and gene loss, the reconstruction of ancestral gene orders involves several steps, including multiple sequence alignment, the inference of reconciled gene trees, and the inference of ancestral syntenies and gene adjacencies. For each of the steps of such a process, several methods can be used and implemented using a growing corpus of, often parameterized, tools; in practice, interfacing such tools into an ancestral gene order reconstruction pipeline is far from trivial. This chapter introduces AGO, a Python-based framework for creating ancestral gene order reconstruction pipelines that allows different bioinformatics tools to be interfaced and parameterized. The authors illustrate the features of AGO by reconstructing ancestral gene orders for the X chromosome of three ancestral Anopheles species using three different pipelines. AGO is freely available at https://github.com/cchauve/AGO-pipeline.
Affiliation(s)
- Evan P Cribbie
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Daniel Doerr
  - Department for Endocrinology and Diabetology, Medical Faculty and University Hospital Düsseldorf, German Diabetes Center (DDZ), Leibniz Institute for Diabetes Research, and Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
4
Berdan EL, Barton NH, Butlin R, Charlesworth B, Faria R, Fragata I, Gilbert KJ, Jay P, Kapun M, Lotterhos KE, Mérot C, Durmaz Mitchell E, Pascual M, Peichel CL, Rafajlović M, Westram AM, Schaeffer SW, Johannesson K, Flatt T. How chromosomal inversions reorient the evolutionary process. J Evol Biol 2023; 36:1761-1782. [PMID: 37942504 DOI: 10.1111/jeb.14242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 09/13/2023] [Accepted: 10/05/2023] [Indexed: 11/10/2023]
Abstract
Inversions are structural mutations that reverse the sequence of a chromosome segment and reduce the effective rate of recombination in the heterozygous state. They play a major role in adaptation, as well as in other evolutionary processes such as speciation. Although inversions have been studied since the 1920s, they remain difficult to investigate because the reduced recombination conferred by them strengthens the effects of drift and hitchhiking, which in turn can obscure signatures of selection. Nonetheless, numerous inversions have been found to be under selection. Given recent advances in population genetic theory and empirical study, here we review how different mechanisms of selection affect the evolution of inversions. A key difference between inversions and other mutations, such as single nucleotide variants, is that the fitness of an inversion may be affected by a larger number of frequently interacting processes. This considerably complicates the analysis of the causes underlying the evolution of inversions. We discuss the extent to which these mechanisms can be disentangled, and by which approach.
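The structural effect of an inversion, as defined in the first sentence of this abstract, can be sketched in a few lines of Python. This is a toy illustration on a signed gene order, not a method from the review: the function name `invert_segment` and the half-open index convention are assumptions made for the example. Reversing a segment also flips the orientation (sign) of every gene in it.

```python
def invert_segment(gene_order, start, end):
    """Return a copy of a signed gene order with gene_order[start:end] inverted.

    The segment is reversed and each gene in it changes orientation (sign).
    Indices follow Python's half-open slice convention (an assumption of this sketch).
    """
    if not 0 <= start <= end <= len(gene_order):
        raise ValueError("segment must lie within the chromosome")
    inverted = [-gene for gene in reversed(gene_order[start:end])]
    return gene_order[:start] + inverted + gene_order[end:]


if __name__ == "__main__":
    standard = [1, 2, 3, 4, 5]
    arrangement = invert_segment(standard, 1, 4)
    print(arrangement)                        # genes 2..4 reversed and sign-flipped
    print(invert_segment(arrangement, 1, 4))  # inverting the same segment again restores the order
```

The sketch captures only the change in gene order; the reduced recombination in heterozygotes that the abstract discusses is a population-level consequence and is not modelled here.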
Affiliation(s)
- Emma L Berdan
  - Bioinformatics Core, Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
  - Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden
- Nicholas H Barton
  - Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
- Roger Butlin
  - Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden
  - Ecology and Evolutionary Biology, School of Bioscience, The University of Sheffield, Sheffield, UK
- Brian Charlesworth
  - Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Rui Faria
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Inês Fragata
  - CHANGE - Global Change and Sustainability Institute/Animal Biology Department, cE3c - Center for Ecology, Evolution and Environmental Changes, Faculty of Sciences, University of Lisbon, Lisbon, Portugal
- Paul Jay
  - Center for GeoGenetics, University of Copenhagen, Copenhagen, Denmark
- Martin Kapun
  - Center for Anatomy and Cell Biology, Medical University of Vienna, Vienna, Austria
  - Central Research Laboratories, Natural History Museum of Vienna, Vienna, Austria
- Katie E Lotterhos
  - Department of Marine and Environmental Sciences, Northeastern University, Boston, Massachusetts, USA
- Claire Mérot
  - UMR 6553 Ecobio, Université de Rennes, OSUR, CNRS, Rennes, France
- Esra Durmaz Mitchell
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Functional Genomics & Metabolism Research Unit, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark
- Marta Pascual
  - Departament de Genètica, Microbiologia i Estadística, Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Catherine L Peichel
  - Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Marina Rafajlović
  - Department of Marine Sciences, University of Gothenburg, Gothenburg, Sweden
  - Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, Gothenburg, Sweden
- Anja M Westram
  - Institute of Science and Technology Austria (ISTA), Klosterneuburg, Austria
  - Faculty of Biosciences and Aquaculture, Nord University, Bodø, Norway
- Stephen W Schaeffer
  - Department of Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Kerstin Johannesson
  - Linnaeus Centre for Marine Evolutionary Biology, University of Gothenburg, Gothenburg, Sweden
  - Tjärnö Marine Laboratory, Department of Marine Sciences, University of Gothenburg, Strömstad, Sweden
- Thomas Flatt
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
5
Kwon D, Park N, Wy S, Lee D, Chai HH, Cho IC, Lee J, Kwon K, Kim H, Moon Y, Kim J, Park W, Kim J. A chromosome-level genome assembly of the Korean crossbred pig Nanchukmacdon (Sus scrofa). Sci Data 2023; 10:761. [PMID: 37923776 PMCID: PMC10624824 DOI: 10.1038/s41597-023-02661-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 10/17/2023] [Indexed: 11/06/2023] Open
Abstract
As plentiful high-quality genome assemblies have been accumulated, reference-guided genome assembly can be a good approach to reconstruct a high-quality assembly. Here, we present a chromosome-level genome assembly of the Korean crossbred pig called Nanchukmacdon (the NCMD assembly) using the reference-guided assembly approach with short and long reads. The NCMD assembly contains 20 chromosome-level scaffolds with a total size of 2.38 Gbp (N50: 138.77 Mbp). Its BUSCO score is 93.1%, which is comparable to the pig reference assembly, and a total of 20,588 protein-coding genes, 8,651 non-coding genes, and 996.14 Mbp of repetitive elements are annotated. The NCMD assembly was also used to close many gaps in the pig reference assembly. This NCMD assembly and annotation provide foundational resources for the genomic analyses of pig and related species.
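For readers unfamiliar with the N50 statistic quoted in this abstract, the following minimal Python sketch shows how it is conventionally computed from a list of scaffold lengths: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions and are not data from the NCMD assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


if __name__ == "__main__":
    toy_scaffolds = [80, 70, 50, 40, 30, 20]  # hypothetical lengths, arbitrary units
    print(n50(toy_scaffolds))
```

In the toy example the total is 290, so the two longest scaffolds (80 + 70 = 150) are enough to pass the halfway mark and the N50 is 70.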
Affiliation(s)
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Nayoung Park
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Suyeon Wy
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Daehwan Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Han-Ha Chai
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- In-Cheol Cho
  - Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, Jeju, 63242, Republic of Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Kisang Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Heesun Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Youngbeen Moon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Juyeon Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Woncheoul Park
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
6
Zabelkin A, Avdeyev P, Alexeev N. TruEst: a better estimator of evolutionary distance under the INFER model. J Math Biol 2023; 87:25. [PMID: 37423919 DOI: 10.1007/s00285-023-01955-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 06/11/2023] [Accepted: 06/15/2023] [Indexed: 07/11/2023]
Abstract
Genome rearrangements are evolutionary events that shuffle genomic architectures. The number of genome rearrangements that happened between two genomes is often used as the evolutionary distance between these species. This number is often estimated as the minimum number of genome rearrangements required to transform one genome into another, an estimate that is reliable only for closely related genomes. Such estimates often underestimate the evolutionary distance for genomes that have substantially diverged from each other, and advanced statistical methods can be used to improve accuracy. Several statistical estimators have been developed under various evolutionary models, of which the most complete one, INFER, takes into account different degrees of genome fragility. We present TruEst, an efficient tool that estimates the evolutionary distance between genomes under the INFER model of genome rearrangements. We apply our method to both simulated and real data. It shows high accuracy on the simulated data. On real datasets of mammalian genomes, the method found several pairs of genomes for which the estimated distances are highly consistent with previous ancestral reconstruction studies.
Affiliation(s)
- Alexey Zabelkin
  - International Laboratory "Computer Technologies", ITMO University, Saint Petersburg, Russia
- Pavel Avdeyev
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
7
Reconstruction of hundreds of reference ancestral genomes across the eukaryotic kingdom. Nat Ecol Evol 2023; 7:355-366. [PMID: 36646945 PMCID: PMC9998269 DOI: 10.1038/s41559-022-01956-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/17/2022] [Accepted: 11/22/2022] [Indexed: 01/18/2023]
Abstract
Ancestral sequence reconstruction is a fundamental aspect of molecular evolution studies and can trace small-scale sequence modifications through the evolution of genomes and species. In contrast, fine-grained reconstructions of ancestral genome organizations are still in their infancy, limiting our ability to draw comprehensive views of genome and karyotype evolution. Here we reconstruct the detailed gene contents and organizations of 624 ancestral vertebrate, plant, fungi, metazoan and protist genomes, 183 of which are near-complete chromosomal gene order reconstructions. Reconstructed ancestral genomes are similar to their descendants in terms of gene content as expected and agree precisely with reference cytogenetic and in silico reconstructions when available. By comparing successive ancestral genomes along the phylogenetic tree, we estimate the intra- and interchromosomal rearrangement history of all major vertebrate clades at high resolution. This freely available resource introduces the possibility to follow evolutionary processes at genomic scales in chronological order, across multiple clades and without relying on a single extant species as reference.
8
Sim M, Lee J, Kwon D, Lee D, Park N, Wy S, Ko Y, Kim J. Reference-based read clustering improves the de novo genome assembly of microbial strains. Comput Struct Biotechnol J 2022; 21:444-451. [PMID: 36618978 PMCID: PMC9804104 DOI: 10.1016/j.csbj.2022.12.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 12/17/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022] Open
Abstract
Constructing accurate microbial genome assemblies is necessary to understand genetic diversity in microbial genomes and its functional consequences. However, this remains a challenging task, especially when only short-read sequencing technologies are used. Here, we present a new read-clustering algorithm, called RBRC, for improving de novo microbial genome assembly by accurately estimating read proximity using multiple reference genomes. The performance of RBRC was confirmed by simulation-based evaluation in terms of assembly contiguity and the number of misassemblies, and RBRC was successfully applied to existing fungal and bacterial genomes, improving the quality of the assemblies without using additional sequencing data. RBRC is a read-clustering algorithm that can be used (i) for generating high-quality genome assemblies of microbial strains when genome assemblies of related strains are available, and (ii) for upgrading existing microbial genome assemblies when the generation of additional sequencing data, such as long reads, is difficult.
Affiliation(s)
- Mikang Sim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Daehwan Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Nayoung Park
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Suyeon Wy
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
- Younhee Ko
  - Division of Biomedical Engineering, Hankuk University of Foreign Studies, Gyeonggi-do 17035, Republic of Korea
- Jaebum Kim (corresponding author)
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Republic of Korea
9
Abstract
Computational reconstruction of ancestral mammalian karyotypes revealed a comprehensive picture of the chromosome rearrangements that occurred over the evolutionary history of mammals. Ancient gene order, in some cases extending to full chromosomes, was found conserved for more than 300 My, demonstrating strong evolutionary constraint against rearrangements in some regions. Conserved segments of chromosomes are enriched for genes that control developmental processes. Therefore, Darwinian selection likely maintains ancient gene combinations while allowing for genomic innovations within or near chromosomal sites that break and rearrange over evolutionary time. The revealed relationship between the three-dimensional structure of chromosomes and the evolutionary stability of chromosome segments provides additional insights into the mechanisms of chromosome evolution and diseases associated with genome rearrangements.
Decrypting the rearrangements that drive mammalian chromosome evolution is critical to understanding the molecular bases of speciation, adaptation, and disease susceptibility. Using 8 scaffolded and 26 chromosome-scale genome assemblies representing 23/26 mammal orders, we computationally reconstructed ancestral karyotypes and syntenic relationships at 16 nodes along the mammalian phylogeny. Three different reference genomes (human, sloth, and cattle) representing phylogenetically distinct mammalian superorders were used to assess reference bias in the reconstructed ancestral karyotypes and to expand the number of clades with reconstructed genomes. The mammalian ancestor likely had 19 pairs of autosomes, with nine of the smallest chromosomes shared with the common ancestor of all amniotes (three still conserved in extant mammals), demonstrating a striking conservation of synteny for ∼320 My of vertebrate evolution. The numbers and types of chromosome rearrangements were classified for transitions between the ancestral mammalian karyotype, descendent ancestors, and extant species. For example, 94 inversions, 16 fissions, and 14 fusions that occurred over 53 My differentiated the therian from the descendent eutherian ancestor. The highest breakpoint rate was observed between the mammalian and therian ancestors (3.9 breakpoints/My). Reconstructed mammalian ancestor chromosomes were found to have distinct evolutionary histories reflected in their rates and types of rearrangements. The distributions of genes, repetitive elements, topologically associating domains, and actively transcribed regions in multispecies homologous synteny blocks and evolutionary breakpoint regions indicate that purifying selection acted over millions of years of vertebrate evolution to maintain syntenic relationships of developmentally important genes and regulatory landscapes of gene-dense chromosomes.
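As a small worked example of the arithmetic behind such per-branch figures, the Python sketch below converts the event counts quoted for the therian-to-eutherian branch (94 inversions, 16 fissions, 14 fusions over 53 My) into an average number of rearrangement events per million years. The function name `events_per_my` is hypothetical, and the result counts events, not breakpoints: a single event can create more than one breakpoint, so this figure is not directly comparable with the 3.9 breakpoints/My reported for a different branch.

```python
def events_per_my(event_counts, branch_length_my):
    """Average number of rearrangement events per million years on one branch."""
    if branch_length_my <= 0:
        raise ValueError("branch length must be positive")
    return sum(event_counts.values()) / branch_length_my


if __name__ == "__main__":
    # Counts and branch length as quoted in the abstract above.
    therian_to_eutherian = {"inversions": 94, "fissions": 16, "fusions": 14}
    rate = events_per_my(therian_to_eutherian, 53)
    print(f"{rate:.2f} rearrangement events per My")
```

The 124 events over 53 My correspond to roughly 2.34 events per My on this branch.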
10
Abstract
The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an Integer Linear Program algorithm to solve the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model that includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it is able to reconstruct very efficiently and accurately ancestral gene orders in a very comprehensive evolutionary model.
Affiliation(s)
- Daniel Doerr
  - Faculty of Medicine, Heinrich Heine University, Düsseldorf, Germany
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Canada
11
Xu Q, Jin L, Zhang Y, Zhang X, Zheng C, Leebens-Mack JH, Sankoff D. Ancestral Flowering Plant Chromosomes and Gene Orders Based on Generalized Adjacencies and Chromosomal Gene Co-Occurrences. J Comput Biol 2021; 28:1156-1179. [PMID: 34783601 DOI: 10.1089/cmb.2021.0340] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023] Open
Abstract
Recurrent whole genome duplication and the ensuing loss of redundant genes (fractionation) complicate efforts to reconstruct the gene orders and chromosomes of the ancestors associated with the nodes of a phylogeny. Loss of genes disrupts the gene adjacencies key to current techniques. With our RACCROCHE pipeline, instead of starting with the inference of short ancestral segments, we suggest delaying the choice of gene adjacencies while we accumulate many more syntenically validated generalized (gapped) adjacencies. We obtain longer ancestral contigs using maximum weight matching (MWM). Similarly, we do not construct chromosomes by successively piecing together contigs into larger segments, but rather compile counts of pairwise contig co-occurrences on the set of extant genomes and use these to cluster the contigs. Chromosome-level contig assemblies for a monoploid genome emerge naturally at each node of the phylogeny and the contigs then can be ordered along the chromosome. Sampling alternative MWM solutions, visualizing heat maps, and applying gap statistics allow us to estimate the number of chromosomes in the reconstruction. We introduce several measures of quality: length of contigs, continuity of contig structure on successive ancestors, coverage of the extant genome by the reconstruction, and rearrangement relations among the inferred chromosomes. The reconstructed ancestors are visualized by painting the ancestral projections on the descendant genomes. We submit genomes drawn from a broad range of monocot orders to our pipeline, confirming the tetraploidization event "tau" in the stem lineage between the alismatids and the lilioids. We show additional applications to the Solanaceae and to four Brassica genomes, producing evidence about the monoploid ancestor in each case.
Affiliation(s)
- Qiaoji Xu
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
- Lingling Jin
  - Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
- Yue Zhang
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
- Xiaomeng Zhang
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
- Chunfang Zheng
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
- David Sankoff
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
12
Chua M, Tan A, Tremblay-Savard O. BOPAL 2.0 and a study of tRNA and rRNA gene evolution in Clostridium. J Bioinform Comput Biol 2021; 19:2140007. [PMID: 34775921 DOI: 10.1142/s0219720021400072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
We present BOPAL 2.0, an improved version of the BOPAL algorithm for the evolutionary history inference of tRNA and rRNA genes in bacterial genomes. Our approach can infer complete evolutionary scenarios and ancestral gene orders on a phylogeny and considers a wide range of events such as duplications, deletions, substitutions, inversions and transpositions. It is based on the fact that tRNA and rRNA genes are often organized in operons/clusters in bacteria, and this information is used to help identify orthologous genes for each genome comparison. BOPAL 2.0 introduces new features, such as a triple-wise alignment step, context-aware singleton matching and a second pass of the algorithm. Evaluation on simulated datasets shows that BOPAL 2.0 outperforms the original BOPAL in terms of the accuracy of inferred events and ancestral genomes. We also present a study of the tRNA/rRNA gene evolution in the Clostridium genus, in which the organization of these genes is very divergent. Our results indicate that tRNA and rRNA genes in Clostridium have evolved through numerous duplications, losses, transpositions and substitutions, but very few inversions were inferred.
Affiliation(s)
- Meghan Chua
  - Department of Computer Science, University of Manitoba, 103 Dafoe Rd W, Winnipeg, Manitoba R3T 5V6, Canada
- Anthony Tan
  - Department of Computer Science, University of Manitoba, 103 Dafoe Rd W, Winnipeg, Manitoba R3T 5V6, Canada
- Olivier Tremblay-Savard
  - Department of Computer Science, University of Manitoba, 103 Dafoe Rd W, Winnipeg, Manitoba R3T 5V6, Canada
13
Zhou Y, Shearwin-Whyatt L, Li J, Song Z, Hayakawa T, Stevens D, Fenelon JC, Peel E, Cheng Y, Pajpach F, Bradley N, Suzuki H, Nikaido M, Damas J, Daish T, Perry T, Zhu Z, Geng Y, Rhie A, Sims Y, Wood J, Haase B, Mountcastle J, Fedrigo O, Li Q, Yang H, Wang J, Johnston SD, Phillippy AM, Howe K, Jarvis ED, Ryder OA, Kaessmann H, Donnelly P, Korlach J, Lewin HA, Graves J, Belov K, Renfree MB, Grutzner F, Zhou Q, Zhang G. Platypus and echidna genomes reveal mammalian biology and evolution. Nature 2021; 592:756-762. [PMID: 33408411 PMCID: PMC8081666 DOI: 10.1038/s41586-020-03039-0] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 12/04/2019] [Accepted: 07/30/2020] [Indexed: 12/13/2022]
Abstract
Egg-laying mammals (monotremes) are the only extant mammalian outgroup to therians (marsupial and eutherian animals) and provide key insights into mammalian evolution [1,2]. Here we generate and analyse reference genomes of the platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus), which represent the only two extant monotreme lineages. The nearly complete platypus genome assembly has anchored almost the entire genome onto chromosomes, markedly improving the genome continuity and gene annotation. Together with our echidna sequence, the genomes of the two species allow us to detect the ancestral and lineage-specific genomic changes that shape both monotreme and mammalian evolution. We provide evidence that the monotreme sex chromosome complex originated from an ancestral chromosome ring configuration. The formation of such a unique chromosome complex may have been facilitated by the unusually extensive interactions between the multi-X and multi-Y chromosomes that are shared by the autosomal homologues in humans. Further comparative genomic analyses unravel marked differences between monotremes and therians in haptoglobin genes, lactation genes and chemosensory receptor genes for smell and taste that underlie the ecological adaptation of monotremes.
Affiliation(s)
- Yang Zhou
- BGI-Shenzhen, Shenzhen, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Linda Shearwin-Whyatt
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | - Jing Li
- MOE Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, China
| | - Zhenzhen Song
- BGI-Shenzhen, Shenzhen, China
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
| | - Takashi Hayakawa
- Faculty of Environmental Earth Science, Hokkaido University, Sapporo, Japan
- Japan Monkey Centre, Inuyama, Japan
| | - David Stevens
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | - Jane C Fenelon
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Emma Peel
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Yuanyuan Cheng
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Filip Pajpach
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | - Natasha Bradley
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | | | - Masato Nikaido
- School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
| | - Joana Damas
- The Genome Center, University of California, Davis, CA, USA
| | - Tasman Daish
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | - Tahlia Perry
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | - Zexian Zhu
- MOE Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, China
| | - Yuncong Geng
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ying Sims
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
| | - Jonathan Wood
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
| | - Bettina Haase
- The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | | | - Olivier Fedrigo
- The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | - Qiye Li
- BGI-Shenzhen, Shenzhen, China
| | - Huanming Yang
- BGI-Shenzhen, Shenzhen, China
- James D. Watson Institute of Genome Sciences, Hangzhou, China
- University of the Chinese Academy of Sciences, Beijing, China
- Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, China
| | - Jian Wang
- BGI-Shenzhen, Shenzhen, China
- James D. Watson Institute of Genome Sciences, Hangzhou, China
| | - Stephen D Johnston
- School of Agriculture and Food Sciences, The University of Queensland, Gatton, Queensland, Australia
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Kerstin Howe
- Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
| | - Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | | | - Henrik Kaessmann
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
| | - Peter Donnelly
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | | | - Harris A Lewin
- The Genome Center, University of California, Davis, CA, USA
- Department of Evolution and Ecology, College of Biological Sciences, University of California, Davis, CA, USA
- Department of Reproduction and Population Health, School of Veterinary Medicine, University of California, Davis, CA, USA
| | - Jennifer Graves
- Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia
- Institute for Applied Ecology, University of Canberra, Canberra, Australian Capital Territory, Australia
- School of Life Sciences, La Trobe University, Melbourne, Victoria, Australia
| | - Katherine Belov
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Marilyn B Renfree
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Frank Grutzner
- School of Biological Sciences, The Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia.
| | - Qi Zhou
- MOE Laboratory of Biosystems Homeostasis and Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, Hangzhou, China.
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria.
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
| | - Guojie Zhang
- BGI-Shenzhen, Shenzhen, China.
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
14
Saud Z, Kortsinoglou AM, Kouvelis VN, Butt TM. Telomere length de novo assembly of all 7 chromosomes and mitogenome sequencing of the model entomopathogenic fungus, Metarhizium brunneum, by means of a novel assembly pipeline. BMC Genomics 2021; 22:87. [PMID: 33509090 PMCID: PMC7842015 DOI: 10.1186/s12864-021-07390-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/13/2020] [Accepted: 01/13/2021] [Indexed: 12/31/2022] Open
Abstract
Background More accurate and complete reference genomes have improved understanding of gene function, biology, and evolutionary mechanisms. Hybrid genome assembly approaches leverage benefits of both long, relatively error-prone reads from third-generation sequencing technologies and short, accurate reads from second-generation sequencing technologies, to produce more accurate and contiguous de novo genome assemblies in comparison to using either technology independently. In this study, we present a novel hybrid assembly pipeline that allowed for both mitogenome de novo assembly and telomere length de novo assembly of all 7 chromosomes of the model entomopathogenic fungus, Metarhizium brunneum. Results The improved assembly allowed for better ab initio gene prediction, and a more BUSCO-complete proteome set was generated in comparison to the eight current NCBI reference Metarhizium spp. genomes. Remarkably, we note that including the mitogenome in ab initio gene prediction training improved overall gene prediction. The assembly was further validated by comparing contig assembly agreement across various assemblers, assessing the assembly performance of each tool. Genomic synteny and orthologous protein clusters were compared between Metarhizium brunneum and three other Hypocreales species with complete genomes, identifying core proteins and listing orthologous protein clusters shared uniquely between the two entomopathogenic fungal species, so as to further facilitate the understanding of molecular mechanisms underpinning fungal-insect pathogenesis. Conclusions The novel assembly pipeline may be used for other haploid fungal species, helping to meet the need for high-quality reference fungal genomes and leading to better understanding of fungal genomic evolution, chromosome structuring and gene regulation.
Affiliation(s)
- Zack Saud
- Department of Biosciences, College of Science, Swansea University, Singleton Park, Swansea, Wales, SA2 8PP, UK.
| | - Alexandra M Kortsinoglou
- Department of Genetics and Biotechnology, Faculty of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, 15701, Athens, Greece
| | - Vassili N Kouvelis
- Department of Genetics and Biotechnology, Faculty of Biology, National and Kapodistrian University of Athens, Panepistimiopolis, 15701, Athens, Greece
| | - Tariq M Butt
- Department of Biosciences, College of Science, Swansea University, Singleton Park, Swansea, Wales, SA2 8PP, UK.
15
16
3D Genome of macaque fetal brain reveals evolutionary innovations during primate corticogenesis. Cell 2021; 184:723-740.e21. [PMID: 33508230 DOI: 10.1016/j.cell.2021.01.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 11/25/2019] [Revised: 11/09/2020] [Accepted: 12/31/2020] [Indexed: 02/06/2023]
Abstract
Elucidating the regulatory mechanisms of human brain evolution is essential to understanding human cognition and mental disorders. We generated multi-omics profiles and constructed a high-resolution map of 3D genome architecture of rhesus macaque during corticogenesis. By comparing the 3D genomes of human, macaque, and mouse brains, we identified many human-specific chromatin structure changes, including 499 topologically associating domains (TADs) and 1,266 chromatin loops. The human-specific loops are significantly enriched in enhancer-enhancer interactions, and the regulated genes show human-specific expression changes in the subplate, a transient zone of the developing brain critical for neural circuit formation and plasticity. Notably, many human-specific sequence changes are located in the human-specific TAD boundaries and loop anchors, which may generate new transcription factor binding sites and chromatin structures in human. Collectively, the presented data highlight the value of comparative 3D genome analyses in dissecting the regulatory mechanisms of brain development and evolution.
17
Altenhoff AM, Train CM, Gilbert KJ, Mediratta I, Mendes de Farias T, Moi D, Nevers Y, Radoykova HS, Rossier V, Warwick Vesztrocy A, Glover NM, Dessimoz C. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res 2021; 49:D373-D379. [PMID: 33174605 PMCID: PMC7779010 DOI: 10.1093/nar/gkaa1007] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 09/15/2020] [Revised: 10/10/2020] [Accepted: 10/14/2020] [Indexed: 01/11/2023] Open
Abstract
OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
| | - Clément-Marie Train
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
| | - Kimberly J Gilbert
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Ishita Mediratta
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Department of Computer Science and Information Systems, BITS Pilani K.K. Birla Goa Campus, India
| | | | - David Moi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Yannis Nevers
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Hale-Seda Radoykova
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
- Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
| | - Victor Rossier
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Alex Warwick Vesztrocy
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Natasha M Glover
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
| | - Christophe Dessimoz
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
- Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
18
Li J, Zhang J, Liu J, Zhou Y, Cai C, Xu L, Dai X, Feng S, Guo C, Rao J, Wei K, Jarvis ED, Jiang Y, Zhou Z, Zhang G, Zhou Q. A new duck genome reveals conserved and convergently evolved chromosome architectures of birds and mammals. Gigascience 2021; 10:giaa142. [PMID: 33406261 PMCID: PMC7787181 DOI: 10.1093/gigascience/giaa142] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/26/2020] [Revised: 10/31/2020] [Accepted: 11/16/2020] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Ducks have a typical avian karyotype that consists of macro- and microchromosomes, but a pair of much less differentiated ZW sex chromosomes compared to chickens. To elucidate the evolution of chromosome architectures between ducks and chickens, and between birds and mammals, we produced a nearly complete chromosomal assembly of a female Pekin duck by combining long-read sequencing and multiplatform scaffolding techniques. RESULTS A major improvement of genome assembly and annotation quality resulted from the successful resolution of lineage-specific propagated repeats that fragmented the previous Illumina-based assembly. We found that the duck topologically associated domains (TAD) are demarcated by putative binding sites of the insulator protein CTCF, housekeeping genes, or transitions of active/inactive chromatin compartments, indicating conserved mechanisms of spatial chromosome folding with mammals. There are extensive overlaps of TAD boundaries between duck and chicken, and also between the TAD boundaries and chromosome inversion breakpoints. This suggests strong natural selection pressure on maintaining regulatory domain integrity, or vulnerability of TAD boundaries to DNA double-strand breaks. The duck W chromosome retains 2.5-fold more genes relative to chicken. Similar to the independently evolved human Y chromosome, the duck W evolved massive dispersed palindromic structures, and a pattern of sequence divergence with the Z chromosome that reflects stepwise suppression of homologous recombination. CONCLUSIONS Our results provide novel insights into the conserved and convergently evolved chromosome features of birds and mammals, and also importantly add to the genomic resources for poultry studies.
Affiliation(s)
- Jing Li
- MOE Laboratory of Biosystems Homeostasis & Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
| | - Jilin Zhang
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, 5 Nobels väg, Stockholm 17177, Sweden
| | - Jing Liu
- MOE Laboratory of Biosystems Homeostasis & Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
- Department of Neuroscience and Developmental Biology, University of Vienna, 1 Universitätsring, Vienna 1090, Austria
| | - Yang Zhou
- BGI-Shenzhen, 146 Beishan Industrial Zone, Shenzhen 518083, China
| | - Cheng Cai
- MOE Laboratory of Biosystems Homeostasis & Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
| | - Luohao Xu
- MOE Laboratory of Biosystems Homeostasis & Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
- Department of Neuroscience and Developmental Biology, University of Vienna, 1 Universitätsring, Vienna 1090, Austria
| | - Xuelei Dai
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, 3 Taicheng Road, Yangling 712100, China
| | - Shaohong Feng
- BGI-Shenzhen, 146 Beishan Industrial Zone, Shenzhen 518083, China
| | - Chunxue Guo
- BGI-Shenzhen, 146 Beishan Industrial Zone, Shenzhen 518083, China
| | - Jinpeng Rao
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, 88 Jiefang Road, Hangzhou 310052, China
| | - Kai Wei
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, 88 Jiefang Road, Hangzhou 310052, China
| | - Erich D Jarvis
- Laboratory of Neurogenetics of Language, The Rockefeller University, 1230 York Ave, NY 10065, USA
- Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, MD 20815, USA
| | - Yu Jiang
- Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, 3 Taicheng Road, Yangling 712100, China
| | - Zhengkui Zhou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, 12 Zhong Guan Cun Da Jie, Beijing, China
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Jinsha Road, Shenzhen 518120, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, 32 East Jiaochang Road, Kunming 650223, China
- Section for Ecology and Evolution, Department of Biology, University of Copenhagen, 10 Nørregade, DK-2100 Copenhagen, Denmark
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 32 East Jiaochang Road, Kunming 650223, China
| | - Qi Zhou
- MOE Laboratory of Biosystems Homeostasis & Protection and Zhejiang Provincial Key Laboratory for Cancer Molecular Cell Biology, Life Sciences Institute, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
- Department of Neuroscience and Developmental Biology, University of Vienna, 1 Universitätsring, Vienna 1090, Austria
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, 88 Jiefang Road, Hangzhou 310052, China
19
Garrett Vieira F, Samaniego Castruita JA, Gilbert MTP. Using in silico predicted ancestral genomes to improve the efficiency of paleogenome reconstruction. Ecol Evol 2020; 10:12700-12709. [PMID: 33304488 PMCID: PMC7713980 DOI: 10.1002/ece3.6925] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/11/2020] [Revised: 09/23/2020] [Accepted: 09/28/2020] [Indexed: 01/20/2023] Open
Abstract
Paleogenomics is the nascent discipline concerned with sequencing and analysis of genome-scale information from historic, ancient, and even extinct samples. While once inconceivable due to the challenges of DNA damage, contamination, and the technical limitations of PCR-based Sanger sequencing, it has rapidly become a reality following the dawn of the second-generation sequencing revolution. However, a significant challenge facing ancient DNA studies on extinct species is the lack of closely related reference genomes against which to map the sequencing reads from ancient samples. Although bioinformatic efforts to improve the assemblies have focused mainly on mapping algorithms, in this article we explore the potential of an alternative approach, namely using a reconstructed ancestral genome as the reference for mapping DNA sequences of ancient samples. Specifically, we present a preliminary proof of concept for a general framework and demonstrate how, under certain evolutionary divergence thresholds, considerable mapping improvements can be easily obtained.
Affiliation(s)
- Filipe Garrett Vieira
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - José Alfredo Samaniego Castruita
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - M. Thomas P. Gilbert
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- University Museum, Norwegian University of Science and Technology, Trondheim, Norway
20
Abstract
The study of chromosome evolution is undergoing a resurgence of interest owing to advances in DNA sequencing technology that facilitate the production of chromosome-scale whole-genome assemblies de novo. This review focuses on the history, methods, discoveries, and current challenges facing the field, with an emphasis on vertebrate genomes. A detailed examination of the literature on the biology of chromosome rearrangements is presented, specifically the relationship between chromosome rearrangements and phenotypic evolution, adaptation, and speciation. A critical review of the methods for identifying, characterizing, and visualizing chromosome rearrangements and computationally reconstructing ancestral karyotypes is presented. We conclude by looking to the future, identifying the enormous technical and scientific challenges presented by the accumulation of hundreds and eventually thousands of chromosome-scale assemblies.
Affiliation(s)
- Joana Damas
- The Genome Center, University of California, Davis, California 95616, USA
| | - Marco Corbo
- The Genome Center, University of California, Davis, California 95616, USA
| | - Harris A Lewin
- The Genome Center, University of California, Davis, California 95616, USA
- Department of Evolution and Ecology, College of Biological Sciences, University of California, Davis, California 95616, USA
21
Paszek J, Tiuryn J, Górecki P. Minimizing genomic duplication episodes. Comput Biol Chem 2020; 89:107260. [PMID: 33038778 DOI: 10.1016/j.compbiolchem.2020.107260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/20/2020] [Accepted: 04/02/2020] [Indexed: 11/17/2022]
Abstract
BACKGROUND The study of genomic duplication is fundamental to understanding the process of evolution. In evolutionary molecular biology, many approaches focus on discovering the occurrences of gene duplications and multiple gene duplication episodes and their locations in the Tree of Life. To reconstruct such episodes, one can cluster single gene duplications inferred by reconciling a set of gene trees with a species tree. RESULTS We propose an efficient quadratic-time algorithm to solve the problem of genomic duplication clustering, in which input gene trees are rooted, episode locations are restricted to preserve the minimal number of single gene duplications, clustering rules are described by the minimum episodes method, and the goal is based on the recently introduced approach of minimizing the maximal number of duplication episodes on a single path, called here the MP score. Based on our theoretical results, we show new algorithmic relationships between the MP score and the minimum episodes (ME) score, defined as the minimal number of duplication episodes. CONCLUSIONS Our evaluation analysis on three empirical datasets demonstrates that, under the model in which the minimal number of duplications is preserved, the duplication clusterings with minimal MP score support the clusterings with the minimal total number of duplication episodes. AVAILABILITY The software is available at https://bitbucket.org/pgor17/rmp.
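The two scores contrasted in this abstract can be illustrated with a minimal sketch. It only evaluates one given placement of duplication episodes on a species tree; it does not implement the authors' quadratic-time clustering algorithm. The tree encoding (`children`), the per-node episode counts (`episodes`), and both function names are hypothetical choices made for this illustration. The total computed here corresponds to the ME score only when the placement is one that minimizes the number of episodes.

```python
from typing import Dict, List

def total_episodes(episodes: Dict[str, int]) -> int:
    """Total number of duplication episodes placed on the species tree."""
    return sum(episodes.values())

def max_episodes_on_path(children: Dict[str, List[str]],
                         episodes: Dict[str, int],
                         root: str) -> int:
    """Largest number of episodes found on any single root-to-leaf path."""
    best = 0
    # Iterative depth-first search carrying the running count along each path.
    stack = [(root, episodes.get(root, 0))]
    while stack:
        node, count = stack.pop()
        kids = children.get(node, [])
        if not kids:  # leaf reached: the path is complete
            best = max(best, count)
        for kid in kids:
            stack.append((kid, count + episodes.get(kid, 0)))
    return best

# Hypothetical species tree: root -> (A, B), A -> (a1, a2).
children = {"root": ["A", "B"], "A": ["a1", "a2"]}
# Hypothetical number of duplication episodes assigned to each node.
episodes = {"root": 1, "A": 2, "B": 1, "a1": 0, "a2": 1}

total = total_episodes(episodes)                             # 1 + 2 + 1 + 0 + 1
path_max = max_episodes_on_path(children, episodes, "root")  # path root, A, a2
print(total, path_max)
```

In this toy placement the total is 5 while the per-path maximum is 4, which shows that the two objectives measure different things even on a small tree.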
Affiliation(s)
- Jarosław Paszek
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097 Warsaw, Poland.
| | - Jerzy Tiuryn
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097 Warsaw, Poland.
| | - Paweł Górecki
- Warsaw University, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, 02-097 Warsaw, Poland.
22
Lallemand T, Leduc M, Landès C, Rizzon C, Lerat E. An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice. Genes (Basel) 2020; 11:E1046. [PMID: 32899740 PMCID: PMC7565063 DOI: 10.3390/genes11091046] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/30/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/11/2022] Open
Abstract
Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions, with major implications such as speciation events. Various processes are known to allow a gene to be duplicated, and different models explain how duplicated genes can be maintained in genomes. Because of their importance, the identification of duplicated genes is essential when studying genome evolution, but it can still be a challenge due to the various fates duplicated genes can encounter. In this review, we first describe the evolutionary processes that give rise to duplicated genes and then the various bioinformatic approaches that can be used to identify them in genome sequences. Indeed, these bioinformatic approaches differ according to the underlying duplication mechanism. Hence, understanding the specificity of the duplicated genes of interest is a great asset for tool selection and should be taken into account when exploring a biological question.
Affiliation(s)
- Tanguy Lallemand
- IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
| | - Martin Leduc
- IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
| | - Claudine Landès
- IRHS, Agrocampus-Ouest, INRAE, Université d’Angers, SFR 4207 QuaSaV, 49071 Beaucouzé, France
| | - Carène Rizzon
- Laboratoire de Mathématiques et Modélisation d’Evry (LaMME), Université d’Evry Val d’Essonne, Université Paris-Saclay, UMR CNRS 8071, ENSIIE, USC INRAE, 23 bvd de France, CEDEX, 91037 Evry Paris, France
| | - Emmanuelle Lerat
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, F-69622 Villeurbanne, France
23
Drillon G, Champeimont R, Oteri F, Fischer G, Carbone A. Phylogenetic Reconstruction Based on Synteny Block and Gene Adjacencies. Mol Biol Evol 2020; 37:2747-2762. [PMID: 32384156 PMCID: PMC7475045 DOI: 10.1093/molbev/msaa114] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/20/2022] Open
Abstract
Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently from the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint issued from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, named partial splits, respectively, supporting the two block adjacencies defining the breakpoint. Considering all partial splits issued from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping sister genomes minimizing genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two data sets of 13 vertebrates and 21 yeast genomes by using up to 130,000 and 179,000 breakpoints, respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data PhyChro reconstructs phylogenies perfectly in almost all cases, and shows the highest accuracy compared with other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in <15 min.
Affiliation(s)
- Guénola Drillon
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative, UMR 7238, Paris, France
- Raphaël Champeimont
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative, UMR 7238, Paris, France
- Francesco Oteri
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative, UMR 7238, Paris, France
- Gilles Fischer
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative, UMR 7238, Paris, France
- Alessandra Carbone
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative, UMR 7238, Paris, France
  - Institut Universitaire de France, Paris, France
|
24
|
Delabre M, El-Mabrouk N, Huber KT, Lafond M, Moulton V, Noutahi E, Castellanos MS. Evolution through segmental duplications and losses: a Super-Reconciliation approach. Algorithms Mol Biol 2020; 15:12. [PMID: 32508979 PMCID: PMC7249433 DOI: 10.1186/s13015-020-00171-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/26/2019] [Accepted: 05/05/2020] [Indexed: 02/02/2023] Open
Abstract
The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists of inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. The existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.
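As background for the abstract above, the classical single-family reconciliation it starts from can be sketched with the standard LCA mapping: each gene tree node is mapped to the smallest species clade containing its species, and a node counts as a duplication when it maps to the same clade as one of its children. This is a minimal sketch of that classical per-family step only, not of the Super-Reconciliation algorithm; the nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def clades(tree):
    """Return (leaf set of `tree`, list of leaf sets of all its nodes).

    Trees are nested tuples whose leaves are species names (hypothetical
    encoding chosen for this sketch).
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves, nodes = frozenset(), []
    for child in tree:
        child_leaves, child_nodes = clades(child)
        leaves |= child_leaves
        nodes += child_nodes
    nodes.append(leaves)
    return leaves, nodes


def lca(species_clades, species):
    """Smallest species-tree clade containing all of `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_clades):
    """Return (LCA mapping of the root, number of duplication nodes)."""
    if isinstance(gene_tree, str):
        return lca(species_clades, frozenset([gene_tree])), 0
    results = [count_duplications(c, species_clades) for c in gene_tree]
    mapping = lca(species_clades, frozenset().union(*(m for m, _ in results)))
    dups = sum(d for _, d in results)
    # A node is a duplication if it maps to the same clade as a child.
    if any(m == mapping for m, _ in results):
        dups += 1
    return mapping, dups


# Toy example: genes are labeled by the species they belong to.
_, species_clades = clades((("a", "b"), "c"))
_, n_dups = count_duplications(((("a", "b"), ("a", "b")), "c"), species_clades)
```

In the toy gene tree the subtree holding two copies of the (a, b) pair is the only duplication node, which is the kind of event that a segmental model would try to explain jointly across neighboring gene families.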
|
25
|
Evolution of the Human Chromosome 13 Synteny: Evolutionary Rearrangements, Plasticity, Human Disease Genes and Cancer Breakpoints. Genes (Basel) 2020; 11:genes11040383. [PMID: 32244767 PMCID: PMC7230465 DOI: 10.3390/genes11040383] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/10/2020] [Revised: 03/27/2020] [Accepted: 03/27/2020] [Indexed: 01/29/2023] Open
Abstract
The history of each human chromosome can be studied through comparative cytogenetic approaches in mammals, which permit the identification of human chromosomal homologies and rearrangements between species. Comparative banding, chromosome painting, Bacterial Artificial Chromosome (BAC) mapping and genome data permit researchers to formulate hypotheses about ancestral chromosome forms. Human chromosome 13 has been previously shown to be conserved as a single syntenic element in the Ancestral Primate Karyotype; in this context, in order to study and verify the conservation of primate chromosomes homologous to human chromosome 13, we mapped a selected set of BAC probes in three platyrrhine species, characterised by a high level of rearrangements, using fluorescence in situ hybridisation (FISH). Our mapping data on Saguinus oedipus, Callithrix argentata and Alouatta belzebul provide insight into the evolution of human chromosome 13 synteny from a comparative perspective among primate species, showing rearrangements across taxa. Furthermore, in a wider perspective, we have revised previous cytogenomic literature data on chromosome 13 evolution in eutherian mammals, showing a complex origin of the eutherian mammal ancestral karyotype which has still not been completely clarified. Moreover, we analysed biomedical aspects (the OMIM and Mitelman databases) regarding human chromosome 13, showing that this autosome is characterised by a certain level of plasticity that has been implicated in many human cancers and diseases.
|
26
|
Wang J, Cui B, Zhao Y, Guo M. A New Algorithm for Identifying Genome Rearrangements in the Mammalian Evolution. Front Genet 2019; 10:1020. [PMID: 31737036 PMCID: PMC6828935 DOI: 10.3389/fgene.2019.01020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2019] [Accepted: 09/24/2019] [Indexed: 11/13/2022] Open
Abstract
Genome rearrangements are evolutionary events at the genome level, and analyzing them gives a global view of species evolution. We introduce a new method called RGRPT (recovering the genome rearrangements based on phylogenetic tree) to identify genome rearrangements. We test RGRPT using simulated data. The experimental results show that RGRPT has high sensitivity and specificity compared with other tools when predicting rearrangement events. We use RGRPT to predict the rearrangement events of six mammalian genomes (human, chimpanzee, rhesus macaque, mouse, rat, and dog). RGRPT recognized a total of 1,157 rearrangement events at 10 kb resolution, including 858 reversals, 16 translocations, 249 transpositions, and 34 fusions/fissions, and 475 rearrangement events at 50 kb resolution, including 332 reversals, 13 translocations, 94 transpositions, and 36 fusions/fissions. The source code of RGRPT is available from https://github.com/wangjuanimu/data-of-genome-rearrangement.
Affiliation(s)
- Juan Wang
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Bo Cui
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Yulan Zhao
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing University of Civil Engineering and Architecture, Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
|
27
|
Ren L, Huang W, Cannon SB. Reconstruction of ancestral genome reveals chromosome evolution history for selected legume species. The New Phytologist 2019; 223:2090-2103. [PMID: 30834536 DOI: 10.1111/nph.15770] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/19/2018] [Accepted: 02/24/2019] [Indexed: 05/18/2023]
Abstract
Reconstruction of an ancestral genome for a set of plant species has been a challenging task because of complex histories that may include whole-genome duplications, segmental duplications, independent gene duplications or losses, diploidization and rearrangement events. Here, we describe the reconstruction of a hypothetical ancestral genome for the papilionoid legumes (the largest subfamily within the third largest family in flowering plants), and evaluate the results relative to phylogenetic and chromosomal count data for this group of legumes, spanning 294 diverse papilionoid genera. To reconstruct the ancestral genomes for nine legume species with sequenced genomes, we used a maximum likelihood approach combined with a novel method for identifying informative markers for this purpose. Analyzing genomes from four species within the Phaseoleae, two in Dalbergieae, two in the 'inverted repeat loss' clade, and one in the Robinieae, we infer a common ancestral genome with nine chromosomes. The reconstructed genome structural histories are consistent with chromosomal and phylogenetic histories, but we also infer that a common ancestor with nine chromosomes was probably intermediate to an earlier state of 14 chromosomes following a whole-genome duplication that pre-dated the radiation of the papilionoid legumes, evidence for which is found in early-diverging papilionoid lineages.
Affiliation(s)
- Longhui Ren
  - Interdepartmental Genetics Graduate Program, 2014 Molecular Biology, Iowa State University, 2437 Pammel Drive, Ames, IA, 50011, USA
- Wei Huang
  - Department of Agronomy, Iowa State University, 716 Farm House Ln, Ames, IA, 50011, USA
- Steven B Cannon
  - Corn Insects and Crop Genetics Research Unit, US Department of Agriculture-Agricultural Research Service, 819 Wallace Rd, Ames, IA, 50011, USA
|
28
|
Luhmann N, Lafond M, Thevenin A, Ouangraoua A, Wittler R, Chauve C. The SCJ Small Parsimony Problem for Weighted Gene Adjacencies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1364-1373. [PMID: 28166504 DOI: 10.1109/tcbb.2017.2661761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Reconstructing ancestral gene orders in a given phylogeny is a classical problem in comparative genomics. Most existing methods compare conserved features in extant genomes in the phylogeny to define potential ancestral gene adjacencies, and either try to reconstruct all ancestral genomes under a global evolutionary parsimony criterion, or, focusing on a single ancestral genome, use a scaffolding approach to select a subset of ancestral gene adjacencies, generally aiming at reducing the fragmentation of the reconstructed ancestral genome. In this paper, we describe an exact algorithm for the Small Parsimony Problem that combines both approaches. We consider that gene adjacencies at internal nodes of the species phylogeny are weighted, and we introduce an objective function defined as a convex combination of these weights and the evolutionary cost under the Single-Cut-or-Join (SCJ) model. The weights of ancestral gene adjacencies can, e.g., be obtained through the recent availability of ancient DNA sequencing data, which provide a direct hint at the genome structure of the considered ancestor, or through probabilistic analysis of gene adjacency evolution. We show the NP-hardness of our problem variant and propose a Fixed-Parameter Tractable algorithm based on the Sankoff-Rousseau dynamic programming algorithm that also allows sampling of co-optimal solutions. We apply our approach to mammalian and bacterial data providing different degrees of complexity. We show that including adjacency weights in the objective has a significant impact in reducing the fragmentation of the reconstructed ancestral gene orders. An implementation is available at http://github.com/nluhmann/PhySca.
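The evolutionary cost term mentioned in the abstract above rests on the Single-Cut-or-Join distance, which for two genomes on the same gene set is the number of adjacencies present in one genome but not the other. The sketch below computes that pairwise distance for linear chromosomes given as signed gene lists. It does not implement the weighted small parsimony algorithm of the paper; the function names and the signed-list encoding are assumptions made for illustration.

```python
def extremities(gene):
    """Return (left, right) extremities of a signed gene as it is read."""
    name = abs(gene)
    tail, head = (name, "t"), (name, "h")
    # A forward gene is read tail to head, a reversed gene head to tail.
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(chromosomes):
    """Set of adjacencies of a genome made of linear chromosomes.

    Each chromosome is a list of signed integers (hypothetical encoding);
    each gene is assumed to occur exactly once in the genome.
    """
    adj = set()
    for chrom in chromosomes:
        for left_gene, right_gene in zip(chrom, chrom[1:]):
            _, right_end = extremities(left_gene)
            left_end, _ = extremities(right_gene)
            adj.add(frozenset((right_end, left_end)))
    return adj


def scj_distance(genome_a, genome_b):
    """SCJ distance: cuts needed in A plus joins needed to reach B."""
    a, b = adjacencies(genome_a), adjacencies(genome_b)
    return len(a - b) + len(b - a)


# Toy genomes: inverting gene 2 costs two cuts and two joins.
inversion_cost = scj_distance([[1, 2, 3]], [[1, -2, 3]])
```

Because the distance decomposes over individual adjacencies, each adjacency can be scored independently along the tree, which is what makes it natural to combine with per-adjacency weights in a single objective.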
|
29
|
Yang Y, Zhang Y, Ren B, Dixon JR, Ma J. Comparing 3D Genome Organization in Multiple Species Using Phylo-HMRF. Cell Syst 2019; 8:494-505.e14. [PMID: 31229558 PMCID: PMC6706282 DOI: 10.1016/j.cels.2019.05.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/14/2019] [Accepted: 05/22/2019] [Indexed: 12/30/2022]
Abstract
Recent whole-genome mapping approaches for the chromatin interactome have offered new insights into 3D genome organization. However, our knowledge of the evolutionary patterns of 3D genome in mammals remains limited. In particular, there are no existing phylogenetic-model-based methods to analyze chromatin interactions as continuous features. Here, we develop phylogenetic hidden Markov random field (Phylo-HMRF) to identify evolutionary patterns of 3D genome based on multi-species Hi-C data by jointly utilizing spatial constraints among genomic loci and continuous-trait evolutionary models. We used Phylo-HMRF to uncover cross-species 3D genome patterns based on Hi-C data from the same cell type in four primate species (human, chimpanzee, bonobo, and gorilla). The identified evolutionary patterns of 3D genome correlate with features of genome structure and function. This work provides a new framework to analyze multi-species continuous genomic features with spatial constraints and has the potential to help reveal the evolutionary principles of 3D genome organization.
Affiliation(s)
- Yang Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Bing Ren
  - Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, Moores Cancer Center and Institute of Genomic Medicine, UCSD School of Medicine, La Jolla, CA 92093, USA
- Jesse R Dixon
  - Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
30
|
Avdeyev P, Jiang S, Alekseyev MA. Linearization of Median Genomes Under the Double-Cut-and-Join-Indel Model. Evol Bioinform Online 2019; 15:1176934318820534. [PMID: 31217687 PMCID: PMC6557028 DOI: 10.1177/1176934318820534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/16/2018] [Accepted: 11/27/2018] [Indexed: 11/17/2022] Open
Abstract
Reconstruction of the median genome consisting of linear chromosomes from three given genomes is known to be intractable. There exist efficient methods for solving a relaxed version of this problem, where the median genome is allowed to have circular chromosomes. We propose a method for construction of an approximate solution to the original problem from a solution to the relaxed problem and prove a bound on its approximation error. Our method also provides insights into the combinatorial structure of genome transformations with respect to appearance of circular chromosomes.
Affiliation(s)
- Pavel Avdeyev
  - Computational Biology Institute, The George Washington University, Washington, DC, USA
- Shuai Jiang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA
- Max A Alekseyev
  - Computational Biology Institute, The George Washington University, Washington, DC, USA
|
31
|
Kwon D, Lee J, Kim J. GMASS: a novel measure for genome assembly structural similarity. BMC Bioinformatics 2019; 20:147. [PMID: 30885117 PMCID: PMC6423833 DOI: 10.1186/s12859-019-2710-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/26/2018] [Accepted: 03/03/2019] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Thanks to recent advancements in next-generation sequencing (NGS) technologies, large amounts of genomic data, in the form of short DNA sequences known as reads, have been accumulating. Diverse assemblers have been developed to generate high-quality de novo assemblies from NGS reads, but their outputs differ considerably because of algorithmic differences. However, there are no well-structured measures to show the similarity or difference between assemblies. RESULTS We developed a new measure, called the GMASS score, for comparing two genome assemblies in terms of their structure. The GMASS score was developed based on the distribution pattern of the number and coverage of similar regions between a pair of assemblies. The new measure was able to show structural similarity between assemblies when evaluated on simulated assembly datasets. The application of the GMASS score to compare assemblies in recently published benchmark datasets showed the divergent performance of current assemblers as well as its ability to compare assemblies. CONCLUSION The GMASS score is a novel measure for representing structural similarity between two assemblies. It will contribute to the understanding of assembly output and to the development of de novo assemblers.
Affiliation(s)
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
|
32
|
Farré M, Kim J, Proskuryakova AA, Zhang Y, Kulemzina AI, Li Q, Zhou Y, Xiong Y, Johnson JL, Perelman PL, Johnson WE, Warren WC, Kukekova AV, Zhang G, O'Brien SJ, Ryder OA, Graphodatsky AS, Ma J, Lewin HA, Larkin DM. Evolution of gene regulation in ruminants differs between evolutionary breakpoint regions and homologous synteny blocks. Genome Res 2019; 29:576-589. [PMID: 30760546 PMCID: PMC6442394 DOI: 10.1101/gr.239863.118] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/25/2018] [Accepted: 02/08/2019] [Indexed: 02/02/2023]
Abstract
The role of chromosome rearrangements in driving evolution has been a long-standing question of evolutionary biology. Here we focused on ruminants as a model to assess how rearrangements may have contributed to the evolution of gene regulation. Using reconstructed ancestral karyotypes of Cetartiodactyls, Ruminants, Pecorans, and Bovids, we traced patterns of gross chromosome changes. We found that the lineage leading to the ruminant ancestor after the split from other cetartiodactyls was characterized by mostly intrachromosomal changes, whereas the lineage leading to the pecoran ancestor (including all livestock ruminants) included multiple interchromosomal changes. We observed that the liver cell putative enhancers in the ruminant evolutionary breakpoint regions are highly enriched for DNA sequences under selective constraint acting on lineage-specific transposable elements (TEs) and a set of 25 specific transcription factor (TF) binding motifs associated with recently active TEs. Coupled with gene expression data, we found that genes near ruminant breakpoint regions exhibit more divergent expression profiles among species, particularly in cattle, which is consistent with the phylogenetic origin of these breakpoint regions. This divergence was significantly greater in genes with enhancers that contain at least one of the 25 specific TF binding motifs and located near bovidae-to-cattle lineage breakpoint regions. Taken together, by combining ancestral karyotype reconstructions with analysis of cis regulatory element and gene expression evolution, our work demonstrated that lineage-specific regulatory elements colocalized with gross chromosome rearrangements may have provided valuable functional modifications that helped to shape ruminant evolution.
Affiliation(s)
- Marta Farré
  - Royal Veterinary College, University of London, London NW1 0TU, United Kingdom
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, Korea
- Anastasia A Proskuryakova
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk 630090, Russia
  - Synthetic Biology Unit, Novosibirsk State University, Novosibirsk 630090, Russia
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Qiye Li
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Yang Zhou
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Yingqi Xiong
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Jennifer L Johnson
  - Department of Animal Sciences, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Polina L Perelman
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk 630090, Russia
  - Synthetic Biology Unit, Novosibirsk State University, Novosibirsk 630090, Russia
- Warren E Johnson
  - Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia 22630, USA
  - Walter Reed Biosystematics Unit, Museum Support Center, Smithsonian Institution, Suitland, Maryland 20746, USA
- Wesley C Warren
  - Bond Life Sciences Center, University of Missouri, Columbia, Missouri 63201, USA
- Anna V Kukekova
  - Department of Animal Sciences, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Guojie Zhang
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
  - Centre for Social Evolution, Department of Biology, University of Copenhagen, DK-2100 Copenhagen, Denmark
- Stephen J O'Brien
  - Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg 199004, Russia
  - Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, Florida 33004, USA
- Oliver A Ryder
  - Institute for Conservation Research, San Diego Zoo, Escondido, California 92027, USA
- Alexander S Graphodatsky
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk 630090, Russia
  - Synthetic Biology Unit, Novosibirsk State University, Novosibirsk 630090, Russia
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Harris A Lewin
  - Department of Evolution and Ecology and the UC Davis Genome Center, University of California, Davis, California 95616, USA
- Denis M Larkin
  - Royal Veterinary College, University of London, London NW1 0TU, United Kingdom
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk 630090, Russia
|
33
|
Pont C, Wagner S, Kremer A, Orlando L, Plomion C, Salse J. Paleogenomics: reconstruction of plant evolutionary trajectories from modern and ancient DNA. Genome Biol 2019; 20:29. [PMID: 30744646 PMCID: PMC6369560 DOI: 10.1186/s13059-019-1627-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 12/15/2022] Open
Abstract
How contemporary plant genomes originated and evolved is a fascinating question. One approach uses reference genomes from extant species to reconstruct the sequence and structure of their common ancestors over deep timescales. A second approach focuses on the direct identification of genomic changes at a shorter timescale by sequencing ancient DNA preserved in subfossil remains. Merged within the nascent field of paleogenomics, these complementary approaches provide insights into the evolutionary forces that shaped the organization and regulation of modern genomes and open novel perspectives in fostering genetic gain in breeding programs and establishing tools to predict future population changes in response to anthropogenic pressure and global warming.
Affiliation(s)
- Caroline Pont
  - INRA-UCA UMR 1095 Génétique Diversité et Ecophysiologie des Céréales, 63100, Clermont-Ferrand, France
- Stefanie Wagner
  - Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, allées Jules Guesde, Bâtiment A, 31000, Toulouse, France
  - INRA-Université Bordeaux UMR1202, Biodiversité Gènes et Communautés, 33610, Cestas, France
- Antoine Kremer
  - INRA-Université Bordeaux UMR1202, Biodiversité Gènes et Communautés, 33610, Cestas, France
- Ludovic Orlando
  - Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, CNRS UMR 5288, allées Jules Guesde, Bâtiment A, 31000, Toulouse, France
  - Centre for GeoGenetics, Natural History Museum of Denmark, Øster Voldgade, 1350K, Copenhagen, Denmark
- Christophe Plomion
  - INRA-Université Bordeaux UMR1202, Biodiversité Gènes et Communautés, 33610, Cestas, France
- Jerome Salse
  - INRA-UCA UMR 1095 Génétique Diversité et Ecophysiologie des Céréales, 63100, Clermont-Ferrand, France
|
34
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
|
35
|
Guyeux C, Al-Nuaimi B, AlKindy B, Couchot JF, Salomon M. On the reconstruction of the ancestral bacterial genomes in genus Mycobacterium and Brucella. BMC Systems Biology 2018; 12:100. [PMID: 30458842 PMCID: PMC6245693 DOI: 10.1186/s12918-018-0618-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
Abstract
BACKGROUND To reconstruct the evolutionary history of DNA sequences, novel models of increasing complexity regarding the number of free parameters taken into account in the sequence evolution, as well as faster and more accurate algorithms, and statistical and computational methods, are needed. More particularly, as the principal forces that have led to major structural changes are genome rearrangements (such as translocations, fusions, and so on), understanding their underlying mechanisms, among other things via ancestral genome reconstruction, is essential. In this problem, since finding the ancestral genomes that minimize the number of rearrangements in a phylogenetic tree is known to be NP-hard for three or more genomes, heuristics are commonly chosen to obtain approximations of the exact solution. The aim of this work is to show that another path is possible. RESULTS Various algorithms and software already deal with the difficult nature of the problem of reconstruction of the ancestral genome, but they do not function with precision, in particular when indels or single nucleotide polymorphisms fall into repeated sequences. In this article, and despite the theoretical NP-hardness of the ancestral reconstruction problem, we show that an exact solution can be found in practice in various cases, encompassing organelles and some bacteria. A practical example proves that an accurate reconstruction, which also makes it possible to highlight homoplasic events, can be obtained. This is illustrated by the reconstruction of ancestral genomes of two bacterial pathogens, belonging to the Mycobacterium and Brucella genera. CONCLUSIONS By putting together automatically reconstructed ancestral regions with handmade ones for problematic cases, we show that an accurate reconstruction of ancestors of the Brucella genus and of the Mycobacterium tuberculosis complex is possible. By doing so, we are able to investigate the evolutionary history of each pathogen by computing their common ancestors. They can be investigated extensively, by studying the gene content evolution over time, the resistance acquisition, and the impacts of mobile elements on genome plasticity.
Affiliation(s)
- Christophe Guyeux
  - FEMTO-ST Institute, UMR 6174 CNRS, DISC Computer Science Department, Univ. Bourgogne Franche-Comté (UBFC), 16 Route de Gray, Besançon, 25000 France
- Bashar Al-Nuaimi
  - FEMTO-ST Institute, UMR 6174 CNRS, DISC Computer Science Department, Univ. Bourgogne Franche-Comté (UBFC), 16 Route de Gray, Besançon, 25000 France
  - Department of Computer Science, Diyala University, Diyala, 32001 Iraq
- Bassam AlKindy
  - Department of Computer Science, Al-Mustansiriyah University, Baghdad, 10052 Iraq
- Jean-François Couchot
  - FEMTO-ST Institute, UMR 6174 CNRS, DISC Computer Science Department, Univ. Bourgogne Franche-Comté (UBFC), 16 Route de Gray, Besançon, 25000 France
- Michel Salomon
  - FEMTO-ST Institute, UMR 6174 CNRS, DISC Computer Science Department, Univ. Bourgogne Franche-Comté (UBFC), 16 Route de Gray, Besançon, 25000 France
|
36
|
Luhmann N, Chauve C, Stoye J, Wittler R. Scaffolding of Ancient Contigs and Ancestral Reconstruction in a Phylogenetic Framework. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:2094-2100. [PMID: 29993816 DOI: 10.1007/978-3-319-12418-6_17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Ancestral genome reconstruction is an important task for analyzing the evolution of genomes. Recent progress in sequencing ancient DNA has led to the publication of so-called paleogenomes and allows the integration of this sequencing data into genome evolution analysis. However, the de novo assembly of ancient genomes is usually fragmented due to DNA degradation over time, among other factors. Integrated phylogenetic assembly addresses the issue of genome fragmentation in the ancient DNA assembly while aiming to improve the reconstruction of all ancient genomes in the phylogeny simultaneously. The fragmented assembly of the ancient genome can be represented as an assembly graph, indicating contradicting ordering information of contigs. In this setting, our approach is to compare the ancient data with extant finished genomes. We generalize a reconstruction approach minimizing the Single-Cut-or-Join rearrangement distance towards multifurcating trees and include edge lengths to improve the reconstruction in practice. This results in a polynomial time algorithm that includes additional ancient DNA data at one node in the tree, resulting in consistent reconstructions of ancestral genomes.
|
37
|
Damas J, Kim J, Farré M, Griffin DK, Larkin DM. Reconstruction of avian ancestral karyotypes reveals differences in the evolutionary history of macro- and microchromosomes. Genome Biol 2018; 19:155. [PMID: 30290830 PMCID: PMC6173868 DOI: 10.1186/s13059-018-1544-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 01/12/2018] [Accepted: 09/18/2018] [Indexed: 11/29/2022] Open
Abstract
Background: Reconstruction of ancestral karyotypes is critical for our understanding of genome evolution, allowing for the identification of the gross changes that shaped extant genomes. The identification of such changes and their time of occurrence can shed light on the biology of each species, clade and their evolutionary history. However, this is impeded by both the fragmented nature of the majority of genome assemblies and the limitations of the available software to work with them. These limitations are particularly apparent in birds, with only 10 chromosome-level assemblies reported thus far. Algorithmic approaches applied to fragmented genome assemblies can nonetheless help define patterns of chromosomal change in defined taxonomic groups. Results: Here, we make use of the DESCHRAMBLER algorithm to perform the first large-scale study of ancestral chromosome structure and evolution in birds. This algorithm allows us to reconstruct the overall genome structure of 14 key nodes of avian evolution from the Avian ancestor to the ancestor of the Estrildidae, Thraupidae and Fringillidae families. Conclusions: Analysis of these reconstructions provides important insights into the variability of rearrangement rates during avian evolution and allows the detection of patterns related to the chromosome distribution of evolutionary breakpoint regions. Moreover, the inclusion of microchromosomes in our reconstructions allows us to provide novel insights into the evolution of these avian chromosomes, specifically.
Affiliation(s)
- Joana Damas: Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, NW1 0TU, UK
- Jaebum Kim: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Marta Farré: Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, NW1 0TU, UK
- Darren K Griffin: School of Biosciences, University of Kent, Canterbury, CT2 7NY, UK
- Denis M Larkin: Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, NW1 0TU, UK
|
38
|
Lee J, Lee D, Sim M, Kwon D, Kim J, Ko Y, Kim J. mySyntenyPortal: an application package to construct websites for synteny block analysis. BMC Bioinformatics 2018; 19:216. [PMID: 29871588 PMCID: PMC5989462 DOI: 10.1186/s12859-018-2219-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/15/2017] [Accepted: 05/28/2018] [Indexed: 01/01/2023] Open
Abstract
Background: Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) is essential in comparative genomics. However, these tasks require significant computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier, but they cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. Results: To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. Conclusions: mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.
Affiliation(s)
- Jongin Lee: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Daehwan Lee: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Mikang Sim: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Daehong Kwon: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Juyeon Kim: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
- Younhee Ko: Division of Biomedical Engineering, Hankuk University of Foreign Studies, Yongin, 17035, South Korea
- Jaebum Kim: Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
|
39
|
Anselmetti Y, Duchemin W, Tannier E, Chauve C, Bérard S. Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes. BMC Genomics 2018; 19:96. [PMID: 29764366 PMCID: PMC5954271 DOI: 10.1186/s12864-018-4466-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/02/2022] Open
Abstract
Background: Genome rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, the detection of genome rearrangements is often hampered by current deficiencies in data and methods: genomes obtained from short sequence reads generally have very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic questions. Results: We present a computational method, ADseq, which, by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods, overcomes these two caveats. ADseq simultaneously provides improved assemblies and ancestral genomes, with statistical supports on all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADseq to provide improved assemblies and a genome history, made of duplications, losses, gene translocations and rearrangements, of 18 complete Anopheles genomes, including several important malaria vectors. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes. Conclusions: We demonstrate the method's ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements, suggesting that the phylogenetic signal they carry can differ from the phylogenetic signal carried by gene sequences, which are more prone to introgression.
Affiliation(s)
- Yoann Anselmetti: ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France; Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
- Wandrille Duchemin: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France; INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
- Eric Tannier: Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France; INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
- Cedric Chauve: Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A1S6, BC, Canada
- Sèverine Bérard: ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
|
40
|
Abstract
BACKGROUND The reconstruction of ancestral genomes must deal with the problem of resolution, necessarily involving a trade-off between trying to identify genomic details and being overwhelmed by noise at higher resolutions. RESULTS We use the median reconstruction at the synteny block level, of the ancestral genome of the order Gentianales, based on coffee, Rhazya stricta and grape, to exemplify the effects of resolution (granularity) on comparative genomic analyses. CONCLUSIONS We show how decreased resolution blurs the differences between evolving genomes, with respect to rate, mutational process and other characteristics.
Affiliation(s)
- Chunfang Zheng: Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
- Yuji Jeong: Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
- Madisyn Gabrielle Turcotte: Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
- David Sankoff: Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
|
41
|
Buckley RM, Kortschak RD, Adelson DL. Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse. PLoS Comput Biol 2018; 14:e1006091. [PMID: 29677183 PMCID: PMC5931693 DOI: 10.1371/journal.pcbi.1006091] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/06/2017] [Revised: 05/02/2018] [Accepted: 03/15/2018] [Indexed: 12/31/2022] Open
Abstract
The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or "churning" in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against.
Affiliation(s)
- Reuben M. Buckley: Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
- R. Daniel Kortschak: Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
- David L. Adelson: Department of Genetics and Evolution, The University of Adelaide, North Tce, Adelaide, Australia
|
42
|
Capilla L, Sánchez-Guillén RA, Farré M, Paytuví-Gallart A, Malinverni R, Ventura J, Larkin DM, Ruiz-Herrera A. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia. Genome Biol Evol 2018; 8:3703-3717. [PMID: 28175287 PMCID: PMC5521730 DOI: 10.1093/gbe/evw276] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Accepted: 11/08/2016] [Indexed: 12/16/2022] Open
Abstract
Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to understanding the dynamics of their composition, the evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in the other mammalian species considered here. We identified novel lineage- and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction, pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. The second is that chromatin conformation, an aspect that has often been overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints.
Affiliation(s)
- Laia Capilla: Genome Integrity and Instability Group, Institut de Biotecnologia i Biomedicina (IBB), Universitat Autònoma de Barcelona (UAB), Barcelona, Spain; Departament de Biologia Animal, Biologia Vegetal i Ecologia, Universitat Autònoma de Barcelona (UAB), Barcelona, Spain
- Rosa Ana Sánchez-Guillén: Genome Integrity and Instability Group, Institut de Biotecnologia i Biomedicina (IBB), Universitat Autònoma de Barcelona (UAB), Barcelona, Spain; Biología Evolutiva, Instituto de Ecología A.C, Xalapa, Veracruz, Apartado, Mexico
- Marta Farré: Biología Evolutiva, Instituto de Ecología A.C, Xalapa, Veracruz, Apartado, Mexico
- Andreu Paytuví-Gallart: Department of Comparative Biomedical Sciences, The Royal Veterinary College, London, UK; Sequentia Biotech S.L. Calle Comte d'Urgell, Barcelona, Spain
- Roberto Malinverni: Departament de Biologia Cel·lular, Fisiologia i Immunologia, Universitat Autònoma de Barcelona (UAB), Barcelona, Spain
- Jacint Ventura: Departament de Biologia Animal, Biologia Vegetal i Ecologia, Universitat Autònoma de Barcelona (UAB), Barcelona, Spain
- Denis M Larkin: Biología Evolutiva, Instituto de Ecología A.C, Xalapa, Veracruz, Apartado, Mexico
- Aurora Ruiz-Herrera: Genome Integrity and Instability Group, Institut de Biotecnologia i Biomedicina (IBB), Universitat Autònoma de Barcelona (UAB), Barcelona, Spain; Sequentia Biotech S.L. Calle Comte d'Urgell, Barcelona, Spain
|
43
|
Avdeyev P, Jiang S, Alekseyev MA. Implicit Transpositions in DCJ Scenarios. Front Genet 2018; 8:212. [PMID: 29312438 PMCID: PMC5733028 DOI: 10.3389/fgene.2017.00212] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/05/2017] [Accepted: 11/29/2017] [Indexed: 11/13/2022] Open
Abstract
Genome rearrangements are large-scale evolutionary events that shuffle genomic architectures. The minimal number of such events between two genomes is often used in phylogenomic studies to measure the evolutionary distance between the genomes. Double-Cut-and-Join (DCJ) operations represent a convenient model of most common genome rearrangements (reversals, translocations, fissions, and fusions), while other genome rearrangements, such as transpositions, can be modeled by pairs of DCJs. Since the DCJ model does not directly account for transpositions, their impact on DCJ scenarios is unclear. In the present work, we study implicit appearance of transpositions (as pairs of DCJs) in DCJ scenarios. We consider shortest DCJ scenarios satisfying the maximum parsimony assumption, as well as more general DCJ scenarios based on some realistic but less restrictive assumptions. In both cases, we derive a uniform lower bound for the rate of implicit transpositions, which depends only on the genomes but not a particular DCJ scenario between them. Our results imply that implicit appearance of transpositions in DCJ scenarios may be unavoidable or even abundant for some pairs of genomes. We estimate that for mammalian genomes implicit transpositions constitute at least 6% of genome rearrangements.
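To make the claim that a transposition costs two DCJs concrete, the following minimal sketch computes the DCJ distance between two genomes made only of circular chromosomes, using the standard relation distance = (number of genes) - (number of cycles in the adjacency graph). It is an illustration under simplifying assumptions (no linear chromosomes, identical gene content, each gene present once), not the analysis performed in the paper; all function names are hypothetical.

```python
from typing import Dict, List

Genome = List[List[int]]  # each inner list is one circular chromosome of signed genes


def _ends(gene: int):
    """Return (entry, exit) extremities of a signed gene, e.g. -2 -> ('2h', '2t')."""
    name = str(abs(gene))
    return (name + "t", name + "h") if gene > 0 else (name + "h", name + "t")


def partner_map(genome: Genome) -> Dict[str, str]:
    """Map every gene extremity to the extremity it is adjacent to."""
    partner: Dict[str, str] = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # wrap around: chromosome is circular
            left, right = _ends(gene)[1], _ends(nxt)[0]
            partner[left] = right
            partner[right] = left
    return partner


def dcj_distance_circular(a: Genome, b: Genome) -> int:
    """DCJ distance N - C for circular genomes with the same gene content."""
    pa, pb = partner_map(a), partner_map(b)
    if set(pa) != set(pb):
        raise ValueError("genomes must share the same gene content")
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating adjacencies of genome A and genome B.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return len(pa) // 2 - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))  # inversion of gene 2
    print(dcj_distance_circular([[1, 2, 3]], [[1, 3, 2]]))   # genes 2 and 3 swapped
```

In the second toy comparison the swap of genes 2 and 3 on a circular chromosome behaves like a transposition, and the adjacency graph collapses into a single cycle, which is why it is counted as a pair of DCJs rather than one.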
Affiliation(s)
- Pavel Avdeyev: Department of Mathematics and the Computational Biology Institute, George Washington University, Washington, DC, United States
- Shuai Jiang: Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, United States
- Max A Alekseyev: Department of Mathematics and the Computational Biology Institute, George Washington University, Washington, DC, United States
|
44
|
Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, Scornavacca C, Daubin V, Tannier E. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies. Genome Biol Evol 2018; 9:1312-1319. [PMID: 28402423 PMCID: PMC5441342 DOI: 10.1093/gbe/evx069] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 04/07/2017] [Indexed: 12/15/2022] Open
Abstract
DeCoSTAR is software that aims at reconstructing the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017).
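The idea of sampling optimal and near-optimal solutions from a Boltzmann–Gibbs distribution can be sketched generically as follows: each candidate scenario with cost c receives a weight proportional to exp(-c / kT), so the most parsimonious scenarios dominate while costlier ones keep a small, tunable probability. This is a toy illustration over an explicit list of candidates, not DeCoSTAR's implementation or interface; the function names, the `kT` parameter name, and the example costs are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def boltzmann_weights(costs: Sequence[float], kT: float = 1.0) -> List[float]:
    """Normalized Boltzmann-Gibbs probabilities for a list of scenario costs."""
    if kT <= 0:
        raise ValueError("kT must be positive")
    best = min(costs)
    # Subtracting the minimum cost keeps exp() numerically stable
    # and does not change the normalized probabilities.
    raw = [math.exp(-(c - best) / kT) for c in costs]
    total = sum(raw)
    return [w / total for w in raw]


def sample_solution(
    solutions: Sequence[T],
    costs: Sequence[float],
    kT: float = 1.0,
    rng: Optional[random.Random] = None,
) -> T:
    """Draw one solution with probability proportional to exp(-cost / kT)."""
    rng = rng or random.Random()
    return rng.choices(list(solutions), weights=boltzmann_weights(costs, kT), k=1)[0]


if __name__ == "__main__":
    scenarios = ["scenario A", "scenario B", "scenario C"]  # hypothetical candidates
    costs = [3.0, 3.0, 5.0]  # e.g. weighted sums of adjacency gains and breakages
    print(boltzmann_weights(costs, kT=0.5))
    print(sample_solution(scenarios, costs, kT=0.5, rng=random.Random(0)))
```

Lowering `kT` concentrates the samples on the parsimonious scenarios, while raising it spreads probability toward near-optimal ones; the fraction of samples containing a given adjacency can then serve as a support value.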
Affiliation(s)
- Wandrille Duchemin: Inria Grenoble Rhône-Alpes, Montbonnot, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Yoann Anselmetti: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France; Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Murray Patterson: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France; Experimental Algorithmics Lab (AlgoLab), Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Viale Sarca, Milano, Italy
- Yann Ponty: CNRS, Ecole Polytechnique, LIX UMR7161, Palaiseau, France; Inria Saclay, EP AMIB, Palaiseau, France
- Sèverine Bérard: Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France; LIRMM, Université de Montpellier, CNRS, Montpellier, France
- Cedric Chauve: Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Celine Scornavacca: Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Vincent Daubin: Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Eric Tannier: Inria Grenoble Rhône-Alpes, Montbonnot, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
|
45
|
Anselmetti Y, Luhmann N, Bérard S, Tannier E, Chauve C. Comparative Methods for Reconstructing Ancient Genome Organization. Methods Mol Biol 2018; 1704:343-362. [PMID: 29277873 DOI: 10.1007/978-1-4939-7463-4_13] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]
Abstract
Comparative genomics considers the detection of similarities and differences between extant genomes, and, based on more or less formalized hypotheses regarding the involved evolutionary processes, inferring ancestral states explaining the similarities and an evolutionary history explaining the differences. In this chapter, we focus on the reconstruction of the organization of ancient genomes into chromosomes. We review different methodological approaches and software, applied to a wide range of datasets from different kingdoms of life and at different evolutionary depths. We discuss relations with genome assembly, and potential approaches to validate computational predictions on ancient genomes that are almost always only accessible through these predictions.
Affiliation(s)
- Yoann Anselmetti: Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
- Nina Luhmann: Faculty of Technology, Bielefeld University, Bielefeld, Germany; Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany; International Research Training Group 1906, Bielefeld University, Bielefeld, Germany
- Sèverine Bérard: Institut des Sciences de l'Évolution, Université Montpellier 2, Montpellier, France
- Eric Tannier: UMR CNRS 5558 - LBBE "Biométrie et Biologie Évolutive", Inria Grenoble Rhône-Alpes and University of Lyon, Lyon, France
- Cedric Chauve: Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, BC, Canada, V5A 1S6
|
46
|
Choe J, Kim JE, Lee BW, Lee JH, Nam M, Park YI, Jo SH. A comparative synteny analysis tool for target-gene SNP marker discovery: connecting genomics data to breeding in Solanaceae. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2018:5032609. [PMID: 29873704 PMCID: PMC6007222 DOI: 10.1093/database/bay047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/18/2017] [Accepted: 04/23/2018] [Indexed: 11/20/2022]
Abstract
Molecular breeders need to overcome the difficulties in applying abundant genomic information to crop breeding. Candidate orthologs would be discovered more efficiently in less-studied crops if the information gained from studies of related crops were used. We developed a comparative analysis tool and web-based genome viewer to identify orthologous genes based on synteny as well as sequence similarity between tomato, pepper and potato. The tool has a step-by-step interface with multiple viewing levels to support the easy and accurate exploration of functional orthologs. Furthermore, it provides access to single-nucleotide polymorphism markers from the massive genetic resource pool in order to accelerate the development of molecular markers for candidate orthologs in the Solanaceae. This tool provides a bridge between genome data and breeding by supporting effective marker development, data utilization and communication. Database URL: http://tgsol.seeders.co.kr/scomp/
Affiliation(s)
- Junkyoung Choe: SEEDERS Inc, Daejeon 34015, Republic of Korea; School of Medicine, Biological Sciences, Chungnam National University, Daejeon 34134, Republic of Korea
- Ji-Eun Kim: SEEDERS Inc, Daejeon 34015, Republic of Korea
- Moon Nam: SEEDERS Inc, Daejeon 34015, Republic of Korea
- Youn-Il Park: School of Medicine, Biological Sciences, Chungnam National University, Daejeon 34134, Republic of Korea
|
47
|
Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2018; 1704:261-289. [PMID: 29277869 DOI: 10.1007/978-1-4939-7463-4_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 01/30/2023]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
|
48
|
Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res 2017. [PMID: 28645144 PMCID: PMC5737078 DOI: 10.1093/nar/gkx554] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 01/23/2023] Open
Abstract
Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that have been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, like human and other non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly-related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics.
Affiliation(s)
- Virag Sharma: Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Michael Hiller: Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
|
49
|
Romanenko SA, Serdyukova NA, Perelman PL, Pavlova SV, Bulatova NS, Golenishchev FN, Stanyon R, Graphodatsky AS. Intrachromosomal Rearrangements in Rodents from the Perspective of Comparative Region-Specific Painting. Genes (Basel) 2017; 8:E215. [PMID: 28867774 PMCID: PMC5615349 DOI: 10.3390/genes8090215] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/11/2017] [Revised: 08/22/2017] [Accepted: 08/23/2017] [Indexed: 01/31/2023] Open
Abstract
It has long been hypothesized that chromosomal rearrangements play a central role in different evolutionary processes, particularly in speciation and adaptation. Interchromosomal rearrangements have been extensively mapped using chromosome painting. However, intrachromosomal rearrangements have only been described using molecular cytogenetics in a limited number of mammals, including a few rodent species. This situation is unfortunate because intrachromosomal rearrangements are more abundant than interchromosomal rearrangements and probably contain essential phylogenomic information. Significant progress in the detection of intrachromosomal rearrangements is now possible, due to recent advances in molecular biology and bioinformatics. We investigated the level of intrachromosomal rearrangement in the Arvicolinae subfamily, a species-rich taxon characterized by a very high rate of karyotype evolution. We made a set of region-specific probes by microdissection for a single syntenic region represented by the p-arm of chromosome 1 of Alexandromys oeconomus, and hybridized the probes onto the chromosomes of four arvicolines (Microtus agrestis, Microtus arvalis, Myodes rutilus, and Dicrostonyx torquatus). These experiments allowed us to show the intrachromosomal rearrangements in the subfamily at a significantly higher level of resolution than previously described. We found a number of paracentric inversions in the karyotypes of M. agrestis and M. rutilus, as well as multiple inversions and a centromere shift in the karyotype of M. arvalis. We propose that during karyotype evolution, arvicolines underwent a significant number of complex intrachromosomal rearrangements that were not previously detected.
Affiliation(s)
- Svetlana A Romanenko
  - Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia.
  - Synthetic Biological Unit, Novosibirsk State University, 630090 Novosibirsk, Russia.
- Natalya A Serdyukova
  - Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia.
- Polina L Perelman
  - Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia.
  - Synthetic Biological Unit, Novosibirsk State University, 630090 Novosibirsk, Russia.
- Svetlana V Pavlova
  - A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, 119071 Moscow, Russia.
- Nina S Bulatova
  - A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, 119071 Moscow, Russia.
- Roscoe Stanyon
  - Department of Biology, Anthropology Laboratories, University of Florence, 50122 Florence, Italy.
- Alexander S Graphodatsky
  - Institute of Molecular and Cellular Biology, Siberian Branch of the Russian Academy of Sciences, 630090 Novosibirsk, Russia.
  - Synthetic Biological Unit, Novosibirsk State University, 630090 Novosibirsk, Russia.
|
50
|
Feng B, Zhou L, Tang J. Ancestral Genome Reconstruction on Whole Genome Level. Curr Genomics 2017; 18:306-315. [PMID: 29081686 PMCID: PMC5635614 DOI: 10.2174/1389202918666170307120943] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/09/2016] [Revised: 10/08/2016] [Accepted: 11/03/2016] [Indexed: 11/22/2022] Open
Abstract
Comparative genomics, evolutionary biology, and cancer research require tools to elucidate evolutionary trajectories and reconstruct ancestral genomes. Various methods have been developed to infer the genome content and gene order of ancestral genomes from genomic structural variants. There are two main kinds of computational approaches to ancestral genome reconstruction. Distance/event-based approaches employ genome evolutionary models and reconstruct the ancestral genomes that minimize the total distance or number of events over the edges of the given phylogeny. Homology/adjacency-based approaches search for conserved gene adjacencies and genome structures, and assemble these regions into ancestral genomes at the internal nodes of the given phylogeny. We review the principles and algorithms of those approaches that can reconstruct ancestral genomes at the whole-genome level. We discuss their advantages and limitations in dealing with various genome datasets, evolutionary events, and reconstruction problems, as well as the improvements and developments made in subsequent research. We select the four best-known and most powerful approaches from the distance/event-based and homology/adjacency-based categories and compare their performance on different kinds of datasets and evolutionary events. In our experiment, GASTS performs best on problems with equal genome content that involve only genome rearrangement events, whereas PMAG++ performs best on problems with unequal genome content that involve all possible complicated evolutionary events.
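The two families of approaches described in this abstract both rest on gene adjacencies, which can be illustrated with a minimal sketch. The code below is not the algorithm of GASTS, PMAG++, or any other reviewed tool; it only assumes that each genome is a single linear chromosome written as a list of signed gene identifiers (a negative sign meaning reversed orientation) and that all genomes share the same gene content. The function names `adjacencies`, `conserved_adjacencies`, and `breakpoint_distance` are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

# A gene extremity is (gene id, "h" for head or "t" for tail).
Extremity = Tuple[int, str]
Adjacency = FrozenSet[Extremity]


def _ends(gene: int) -> Tuple[Extremity, Extremity]:
    """Return (left end, right end) of a signed gene as it is read left to right."""
    g = abs(gene)
    # A forward gene reads tail -> head; a reversed gene reads head -> tail.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome: List[int]) -> Set[Adjacency]:
    """Collect the unordered pairs of extremities that touch in one linear chromosome."""
    result: Set[Adjacency] = set()
    for left, right in zip(genome, genome[1:]):
        result.add(frozenset({_ends(left)[1], _ends(right)[0]}))
    return result


def conserved_adjacencies(genomes: List[List[int]]) -> Set[Adjacency]:
    """Adjacencies present in every genome (the raw material of adjacency-based methods)."""
    return set.intersection(*(adjacencies(g) for g in genomes))


def breakpoint_distance(a: List[int], b: List[int]) -> int:
    """Number of adjacencies of `a` that are absent from `b` (a simple distance measure)."""
    return len(adjacencies(a) - adjacencies(b))


if __name__ == "__main__":
    reference = [1, 2, 3, 4]
    inverted = [1, -3, -2, 4]  # genes 2 and 3 inverted as a block
    print(conserved_adjacencies([reference, inverted]))
    print(breakpoint_distance(reference, inverted))
```

Because adjacencies are stored as unordered pairs of extremities, reading a chromosome from the other end gives the same set, so a whole-chromosome flip counts as zero breakpoints. A distance/event-based method would minimize a measure of this kind summed over the edges of the phylogeny, whereas an adjacency-based method would start from the conserved set and assemble it into ancestral gene orders.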
Affiliation(s)
- Bing Feng
  - School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Lingxi Zhou
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
|