1
Garcia-Erill G, Liu S, Le MD, Hurley MM, Nguyen HD, Nguyen DQ, Nguyen DH, Santander CG, Sánchez Barreiro F, Gomes Martins NF, Hanghøj K, Salleh FM, Ramos-Madrigal J, Wang X, Sinding MHS, Morales HE, Stæger FF, Wilkinson N, Meng G, Pečnerová P, Yang C, Rasmussen MS, Schubert M, Dunn RR, Moltke I, Zhang G, Chen L, Wang W, Cao TT, Nguyen HM, Siegismund HR, Albrechtsen A, Gilbert MTP, Heller R. Genomes of critically endangered saola are shaped by population structure and purging. Cell 2025:S0092-8674(25)00390-3. [PMID: 40328258 DOI: 10.1016/j.cell.2025.03.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Revised: 12/20/2024] [Accepted: 03/25/2025] [Indexed: 05/08/2025]
Abstract
The saola is one of the most elusive large mammals, standing at the brink of extinction. We constructed a reference genome and resequenced 26 saola individuals, confirming the saola as a basal member of the Bovini. Despite its small geographic range, we found that the saola is partitioned into two populations with high genetic differentiation (FST = 0.49). We estimate that these populations diverged and started declining 5,000-20,000 years ago, possibly due to climate changes and exacerbated by increasing human activities. The saola has long tracts without genomic diversity; however, most of these tracts are not shared by the two populations. Saolas carry a high genetic load, yet their gradual decline resulted in the purging of the most deleterious genetic variation. Finally, we find that combining the two populations, e.g., in an eventual captive breeding program, would mitigate the genetic load and increase the odds of species survival.
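To give a feel for what a differentiation value such as FST = 0.49 expresses, the following minimal sketch computes Wright's FST at a single biallelic locus for two equally weighted populations, using FST = (HT - HS) / HT. The function name and the example allele frequencies are hypothetical, and this textbook formula is an illustrative assumption, not the estimator used in the study.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if both populations were pooled into one.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within the two populations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall, so no differentiation is measurable
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies: identical populations versus strongly diverged ones.
    print(wright_fst(0.5, 0.5))
    print(wright_fst(0.1, 0.9))
```

Genome-wide estimates combine many loci and correct for sample size, so real analyses rely on dedicated estimators rather than this single-locus form.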
Affiliation(s)
- Genís Garcia-Erill
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark; Bioinformatics Research Centre, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Shanlin Liu
  - Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Minh Duc Le
  - Faculty of Environmental Sciences, University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai Road, Hanoi, Vietnam; Vietnam and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi, 19 Le Thanh Tong, Hanoi, Vietnam
- Martha M Hurley
  - Center for Biodiversity and Conservation, American Museum of Natural History, New York, NY, USA
- Hung Dinh Nguyen
  - Forest Inventory and Planning Institute, Ministry of Agriculture and Rural Development, Hanoi, Vietnam
- Dzung Quoc Nguyen
  - Forest Inventory and Planning Institute, Ministry of Agriculture and Rural Development, Hanoi, Vietnam
- Dzung Huy Nguyen
  - Forest Inventory and Planning Institute, Ministry of Agriculture and Rural Development, Hanoi, Vietnam
- Cindy G Santander
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kristian Hanghøj
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Faezah Mohd Salleh
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark; Department of Biosciences, Faculty of Science, Universiti Teknologi Malaysia, 81310 Johor Bahru, Johor, Malaysia
- Xi Wang
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Guanliang Meng
  - Zoological Research Museum Alexander Koenig, LIB, Bonn, Germany
- Mikkel Schubert
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Robert R Dunn
  - Department of Applied Ecology, North Carolina State University, Raleigh, NC, USA
- Ida Moltke
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Guojie Zhang
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark; Center of Evolutionary & Organismal Biology, Zhejiang University School of Medicine, Hangzhou 310058, China
- Lei Chen
  - Center for Ecological and Environmental Science, Northwestern Polytechnical University, Xi'an 710072, China
- Wen Wang
  - Center for Ecological and Environmental Science, Northwestern Polytechnical University, Xi'an 710072, China
- Trung Tien Cao
  - Institute of Biology, Chemistry and Environment, Vinh University, Vinh, Vietnam
- Ha Manh Nguyen
  - Center for Nature Conservation and Development, No. 05, 56/119 Tu Lien Street, Hanoi, Vietnam
- Hans R Siegismund
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- M Thomas P Gilbert
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark; University Museum, NTNU, Trondheim, Norway
- Rasmus Heller
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
2
Jensen EL, Marchisio C, Ochoa A, Gray R, Parra V, Miller JM, Çilingir FG, Caccone A. Synteny Enabled Upgrade of the Galapagos Giant Tortoise Genome Improves Inferences of Runs of Homozygosity. Ecol Evol 2025; 15:e71358. [PMID: 40290375 PMCID: PMC12032190 DOI: 10.1002/ece3.71358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2024] [Revised: 03/26/2025] [Accepted: 04/15/2025] [Indexed: 04/30/2025] [Open Access]
Abstract
The utility and importance of whole-genome sequences are recognized across various fields, including evolution and conservation. However, for some taxa, like extinct species, using methods to generate contiguous genomes that rely on high-quality DNA is impossible. In such cases, an alternative may be to employ synteny-based methods using a genome from a closely related taxon to generate more complete genomes. Here we update the reference genome for the Pinta Island Galapagos giant tortoise (Chelonoidis abingdonii) without conducting additional sequencing through rescaffolding against the most closely related chromosome-level genome assembly, the Aldabra giant tortoise (Aldabrachelys gigantea). This effort resulted in a much more contiguous genome, CheloAbing_2.0, with an N50 that is two orders of magnitude longer and large reductions in L50 and the number of gaps. We then examined the impact of the CheloAbing_2.0 genome on estimates of runs of homozygosity (ROH) using genome resequencing data from 37 individual Galapagos giant tortoises from the 13 extant lineages to test the mechanisms by which a fragmented assembly may over- or underestimate the number and extent of ROH. The use of CheloAbing_2.0 resulted in individual estimates of inbreeding, including ROH proportion (FROH), number (NROH), and cumulative length (SROH), that were statistically different from those derived from the earlier genome assembly. This improved genome will serve as a resource for future efforts focusing on the ecology, evolution, and conservation of this species group. More broadly, our results highlight that synteny-based scaffolding is promising for generating contiguous genomes without needing additional data types.
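The contiguity metrics mentioned above can be made concrete with a short sketch. N50 is the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size, and L50 is the number of scaffolds needed to get there. The function name and the toy scaffold lengths below are hypothetical and are not taken from the CheloAbing assemblies.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that pushes the running sum to half the assembly defines N50.
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: running sum always reaches half the total")


if __name__ == "__main__":
    fragmented = [80, 70, 50, 40, 30, 20]   # toy lengths, arbitrary units
    rescaffolded = [150, 50, 40, 30, 20]    # the two longest pieces joined into one
    print(n50_l50(fragmented))
    print(n50_l50(rescaffolded))
```

Joining scaffolds raises N50 and lowers L50 without changing the total assembly length, which is the pattern a rescaffolding step is expected to produce.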
Affiliation(s)
- Evelyn L. Jensen
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, UK
- Chiara Marchisio
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, UK
  - Faculty of Health and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Alexander Ochoa
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, USA
- Rachel Gray
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, UK
- Vanessa Parra
  - Biology Department, University of Kentucky, Lexington, Kentucky, USA
- Joshua M. Miller
  - Department of Biological Sciences, MacEwan University, Edmonton, Canada
- F. Gözde Çilingir
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
  - Swiss Federal Institute for Research WSL, Birmensdorf, Switzerland
- Adalgisa Caccone
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, USA
3
Kim MS, Lee JS, Yang Z, Hagiwara A, Kim DH, Lee JS. Comparative genome analysis and global methylation patterns for epigenetic study in the brackish water flea Diaphanosoma celebensis. Comp Biochem Physiol Part D Genomics Proteomics 2025; 55:101493. [PMID: 40174405 DOI: 10.1016/j.cbd.2025.101493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2025] [Revised: 03/19/2025] [Accepted: 03/21/2025] [Indexed: 04/04/2025]
Abstract
The brackish water flea Diaphanosoma celebensis is a crucial organism in brackish and estuarine ecosystems, acting as a key trophic link between primary producers and higher trophic levels. Its small size, short life cycle, and high reproductive capacity make it an ideal model for studying ecological responses to environmental stressors, especially in polluted environments. This study provides a chromosome-level genome assembly of D. celebensis, consisting of 22 chromosomes with an N50 of 4,113,329 base pairs and 95.1% completeness, achieved by combining de novo assembly with Hi-C data from D. dubium. Whole-genome bisulfite sequencing (WGBS) revealed distinct DNA methylation patterns, with exons showing higher methylation than introns and intergenic regions. A detailed analysis identified four gene clusters based on methylation levels. Cluster δ (highly methylated), enriched for pathways related to protein processing, ribosomal activity, and ubiquitin-mediated proteolysis, suggests a regulatory mechanism for stress adaptation in D. celebensis. In contrast, cluster α (hypomethylated), associated with transcription regulation and neural functions, highlights genes involved in cellular processes that may respond dynamically to environmental changes. Functional gene comparisons indicated significant differences in pathways related to ion transport and ubiquitination, emphasizing the unique adaptations of D. celebensis to its brackish environment. These findings provide a deeper understanding of the species' genomic and epigenetic regulation, offering valuable insights for future studies on its adaptation to environmental pollutants.
Affiliation(s)
- Min-Sub Kim
  - Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
- Jin-Sol Lee
  - Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
- Zhou Yang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, School of Biological Sciences, Nanjing Normal University, Nanjing 210023, China
- Atsushi Hagiwara
  - Institute of Integrated Science and Technology, Graduate School of Fisheries Science and Environmental Sciences, Nagasaki University, Nagasaki, Japan
- Duck-Hyun Kim
  - Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
- Jae-Seong Lee
  - Department of Biological Sciences, College of Science, Sungkyunkwan University, Suwon 16419, South Korea
4
Wy S, Kwon D, Park W, Chai HH, Cho IC, Kim J. A chromosome-level genome assembly of the Korean minipig (Sus scrofa). Sci Data 2024; 11:840. [PMID: 39097649 PMCID: PMC11297928 DOI: 10.1038/s41597-024-03680-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 07/25/2024] [Indexed: 08/05/2024] [Open Access]
Abstract
Recent advancements in sequencing and genome assembly technologies have led to rapid generation of high-quality genome assemblies for various species and breeds. Despite the importance of minipigs as an animal model in biomedical research, the construction of high-quality genome assemblies of minipigs still lags behind other pig breeds. To address this problem, we constructed a high-quality chromosome-level genome assembly of the Korean minipig (KMP) utilizing multiple different types of sequencing reads and reference genomes. The KMP assembly included 19 chromosome-level sequences with a total length of 2.52 Gb and N50 of 137 Mb. Comparative analyses with the pig reference genome (Sscrofa11.1) demonstrated comparable contiguity and completeness of the KMP assembly. Additionally, genome annotation analyses identified 22,666 protein-coding genes and repetitive elements occupying 40.10% of the genome. The KMP assembly and genome annotation provide valuable resources that can contribute to various future research on minipig and other pig breeds.
Affiliation(s)
- Suyeon Wy
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Woncheoul Park
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- Han-Ha Chai
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- In-Cheol Cho
  - Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, Jeju, 63242, Republic of Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
5
Sandoval-Velasco M, Dudchenko O, Rodríguez JA, Pérez Estrada C, Dehasque M, Fontsere C, Mak SST, Khan R, Contessoto VG, Oliveira Junior AB, Kalluchi A, Zubillaga Herrera BJ, Jeong J, Roy RP, Christopher I, Weisz D, Omer AD, Batra SS, Shamim MS, Durand NC, O'Connell B, Roca AL, Plikus MV, Kusliy MA, Romanenko SA, Lemskaya NA, Serdyukova NA, Modina SA, Perelman PL, Kizilova EA, Baiborodin SI, Rubtsov NB, Machol G, Rath K, Mahajan R, Kaur P, Gnirke A, Garcia-Treviño I, Coke R, Flanagan JP, Pletch K, Ruiz-Herrera A, Plotnikov V, Pavlov IS, Pavlova NI, Protopopov AV, Di Pierro M, Graphodatsky AS, Lander ES, Rowley MJ, Wolynes PG, Onuchic JN, Dalén L, Marti-Renom MA, Gilbert MTP, Aiden EL. Three-dimensional genome architecture persists in a 52,000-year-old woolly mammoth skin sample. Cell 2024; 187:3541-3562.e51. [PMID: 38996487 DOI: 10.1016/j.cell.2024.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/30/2023] [Revised: 03/07/2024] [Accepted: 06/03/2024] [Indexed: 07/14/2024]
Abstract
Analyses of ancient DNA typically involve sequencing the surviving short oligonucleotides and aligning to genome assemblies from related, modern species. Here, we report that skin from a female woolly mammoth (†Mammuthus primigenius) that died 52,000 years ago retained its ancient genome architecture. We use PaleoHi-C to map chromatin contacts and assemble its genome, yielding 28 chromosome-length scaffolds. Chromosome territories, compartments, loops, Barr bodies, and inactive X chromosome (Xi) superdomains persist. The active and inactive genome compartments in mammoth skin more closely resemble Asian elephant skin than other elephant tissues. Our analyses uncover new biology. Differences in compartmentalization reveal genes whose transcription was potentially altered in mammoths vs. elephants. Mammoth Xi has a tetradic architecture, not bipartite like human and mouse. We hypothesize that, shortly after this mammoth's death, the sample spontaneously freeze-dried in the Siberian cold, leading to a glass transition that preserved subfossils of ancient chromosomes at nanometer scale.
Affiliation(s)
- Olga Dudchenko
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Juan Antonio Rodríguez
  - Center for Evolutionary Hologenomics, University of Copenhagen, DK-1353 Copenhagen, Denmark; Centre Nacional d'Anàlisi Genòmica, CNAG, 08028 Barcelona, Spain
- Cynthia Pérez Estrada
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA
- Marianne Dehasque
  - Centre for Palaeogenetics, SE-106 91 Stockholm, Sweden; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 10405 Stockholm, Sweden; Department of Zoology, Stockholm University, SE-106 91 Stockholm, Sweden
- Claudia Fontsere
  - Center for Evolutionary Hologenomics, University of Copenhagen, DK-1353 Copenhagen, Denmark
- Sarah S T Mak
  - Center for Evolutionary Hologenomics, University of Copenhagen, DK-1353 Copenhagen, Denmark
- Ruqayya Khan
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Achyuth Kalluchi
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Bernardo J Zubillaga Herrera
  - Department of Physics, Northeastern University, Boston, MA 02115, USA; Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02215, USA
- Jiyun Jeong
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Renata P Roy
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA; Departments of Biology and Physics, Texas Southern University, Houston, TX 77004, USA
- Ishawnia Christopher
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David Weisz
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Arina D Omer
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sanjit S Batra
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Muhammad S Shamim
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Neva C Durand
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Brendan O'Connell
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA; Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR 97239, USA
- Alfred L Roca
  - Department of Animal Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Maksim V Plikus
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697, USA
- Mariya A Kusliy
  - Institute of Molecular and Cellular Biology SB RAS, Novosibirsk 630090, Russia
- Natalya A Lemskaya
  - Institute of Molecular and Cellular Biology SB RAS, Novosibirsk 630090, Russia
- Svetlana A Modina
  - Institute of Molecular and Cellular Biology SB RAS, Novosibirsk 630090, Russia
- Polina L Perelman
  - Institute of Molecular and Cellular Biology SB RAS, Novosibirsk 630090, Russia
- Elena A Kizilova
  - Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russia
- Nikolai B Rubtsov
  - Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russia
- Gur Machol
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Krisha Rath
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Ragini Mahajan
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA; Department of Biosciences, Rice University, Houston, TX 77005, USA
- Parwinder Kaur
  - UWA School of Agriculture and Environment, University of Western Australia, Perth, WA 6009, Australia
- Andreas Gnirke
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Rob Coke
  - San Antonio Zoo, San Antonio, TX 78212, USA
- Aurora Ruiz-Herrera
  - Departament de Biologia Cel·lular, Fisiologia i Immunologia and Genome Integrity and Instability Group, Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain
- Naryya I Pavlova
  - Institute of Biological Problems of Cryolitezone SB RAS, Yakutsk 677000, Russia
- Albert V Protopopov
  - Academy of Sciences of Sakha Republic, Yakutsk 677000, Russia; North-Eastern Federal University, Yakutsk 677027, Russia
- Michele Di Pierro
  - Department of Physics, Northeastern University, Boston, MA 02115, USA; Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02215, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- M Jordan Rowley
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Peter G Wolynes
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA; Department of Biosciences, Rice University, Houston, TX 77005, USA; Departments of Physics, Astronomy, & Chemistry, Rice University, Houston, TX 77005, USA
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA; Department of Biosciences, Rice University, Houston, TX 77005, USA; Departments of Physics, Astronomy, & Chemistry, Rice University, Houston, TX 77005, USA
- Love Dalén
  - Centre for Palaeogenetics, SE-106 91 Stockholm, Sweden; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, 10405 Stockholm, Sweden; Department of Zoology, Stockholm University, SE-106 91 Stockholm, Sweden
- Marc A Marti-Renom
  - Centre Nacional d'Anàlisi Genòmica, CNAG, 08028 Barcelona, Spain; Centre for Genomic Regulation, The Barcelona Institute for Science and Technology, 08003 Barcelona, Spain; ICREA, 08010 Barcelona, Spain; Universitat Pompeu Fabra, 08002 Barcelona, Spain
- M Thomas P Gilbert
  - Center for Evolutionary Hologenomics, University of Copenhagen, DK-1353 Copenhagen, Denmark; University Museum NTNU, 7012 Trondheim, Norway
- Erez Lieberman Aiden
  - The Center for Genome Architecture and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Center for Theoretical Biological Physics, Rice University, Houston, TX 77030, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
6
Liu K, Xie N, Wang Y, Liu X. The Utilization of Reference-Guided Assembly and In Silico Libraries Improves the Draft Genome of Clarias batrachus and Culter alburnus. Mar Biotechnol (NY) 2023; 25:907-917. [PMID: 37661218 DOI: 10.1007/s10126-023-10248-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 08/28/2023] [Indexed: 09/05/2023]
Abstract
Long-read sequencing technologies can generate highly contiguous genome assemblies compared to short-read methods. However, their higher cost often poses a significant barrier. To address this, we explore the utilization of mapping-based genome assembly and reference-guided assembly as cost-effective alternative approaches. We assess the efficacy of these approaches in improving the contiguity of Clarias batrachus and Culter alburnus draft genomes. Our findings demonstrate that employing an iterative mapping strategy leads to a reduction in assembly errors. Specifically, after three iterations, the Mismatches per 100 kbp value for the C. batrachus genome decreased from 2447.20 to 2432.67, reaching a minimum of 2422.67 after two iterations. Additionally, the N50 value for the C. batrachus genome increased from 362,143 to 1,315,126 bp, with a maximum of 1,315,403 bp after two iterations. Furthermore, we achieved Mismatches per 100 kbp values of 3.70 for the reference-guided assembly of C. batrachus and 0.34 for C. alburnus. Correspondingly, the N50 value for the C. batrachus and C. alburnus genomes increased from 362,143 bp and 3,686,385 bp to 2,026,888 bp and 43,735,735 bp, respectively. Finally, we successfully utilized the improved C. batrachus and C. alburnus genomes to compare genome studies using the combined approach of Ragout and Ragtag. Through a comprehensive comparative analysis of mapping-based and reference-guided genome assembly methods, we shed light on the specific contributions of reference-guided assembly in reducing assembly errors and improving assembly continuity and integrity. These advancements establish reference-guided assembly and the utilization of in silico libraries as a promising and suitable approach for comparative genomics studies.
Affiliation(s)
- Kai Liu
  - Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
- Nan Xie
  - Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
- Yuxi Wang
  - Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
- Xinyi Liu
  - Institute of Fishery Science, Hangzhou Academy of Agricultural Sciences, Hangzhou, 310024, China
7
Romanenko SA, Kliver SF, Serdyukova NA, Perelman PL, Trifonov VA, Seluanov A, Gorbunova V, Azpurua J, Pereira JC, Ferguson-Smith MA, Graphodatsky AS. Integration of fluorescence in situ hybridization and chromosome-length genome assemblies revealed synteny map for guinea pig, naked mole-rat, and human. Sci Rep 2023; 13:21055. [PMID: 38030702 PMCID: PMC10687270 DOI: 10.1038/s41598-023-46595-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 11/02/2023] [Indexed: 12/01/2023] [Open Access]
Abstract
Descriptions of karyotypes of many animal species are currently available. In addition, there has been a significant increase in the number of sequenced genomes and an ever-improving quality of genome assembly. To close the gap between genomic and cytogenetic data we applied fluorescent in situ hybridization (FISH) and Hi-C technology to make the first full chromosome-level genome comparison of the guinea pig (Cavia porcellus), naked mole-rat (Heterocephalus glaber), and human. Comparative chromosome maps obtained by FISH with chromosome-specific probes link genomic scaffolds to individual chromosomes and orient them relative to centromeres and heterochromatic blocks. Hi-C assembly made it possible to close all gaps on the comparative maps and to reveal additional rearrangements that distinguish the karyotypes of the three species. As a result, we integrated the bioinformatic and cytogenetic data and adjusted the previous comparative maps and genome assemblies of the guinea pig, naked mole-rat, and human. Syntenic associations in the two hystricomorphs indicate features of their putative ancestral karyotype. We postulate that the two approaches applied in this study complement one another and provide complete information about the organization of these genomes at the chromosome level.
Affiliation(s)
- Svetlana A Romanenko
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Sergei F Kliver
  - Center for Evolutionary Hologenomics, The Globe Institute, The University of Copenhagen, Copenhagen, Denmark
- Natalia A Serdyukova
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Polina L Perelman
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
- Vladimir A Trifonov
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
- Andrei Seluanov
  - Department of Biology, University of Rochester, Rochester, NY, USA
- Vera Gorbunova
  - Department of Biology, University of Rochester, Rochester, NY, USA
- Jorge Azpurua
  - Department of Biochemistry and Molecular Medicine, The George Washington University, Washington, DC, USA
- Jorge C Pereira
  - Animal and Veterinary Research Centre, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal
  - Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Malcolm A Ferguson-Smith
  - Cambridge Resource Centre for Comparative Genomics, Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Alexander S Graphodatsky
  - Institute of Molecular and Cellular Biology, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia
8
Kwon D, Park N, Wy S, Lee D, Chai HH, Cho IC, Lee J, Kwon K, Kim H, Moon Y, Kim J, Park W, Kim J. A chromosome-level genome assembly of the Korean crossbred pig Nanchukmacdon (Sus scrofa). Sci Data 2023; 10:761. [PMID: 37923776 PMCID: PMC10624824 DOI: 10.1038/s41597-023-02661-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2023] [Accepted: 10/17/2023] [Indexed: 11/06/2023] [Open Access]
Abstract
As plentiful high-quality genome assemblies have been accumulated, reference-guided genome assembly can be a good approach to reconstruct a high-quality assembly. Here, we present a chromosome-level genome assembly of the Korean crossbred pig called Nanchukmacdon (the NCMD assembly) using the reference-guided assembly approach with short and long reads. The NCMD assembly contains 20 chromosome-level scaffolds with a total size of 2.38 Gbp (N50: 138.77 Mbp). Its BUSCO score is 93.1%, which is comparable to the pig reference assembly, and a total of 20,588 protein-coding genes, 8,651 non-coding genes, and 996.14 Mbp of repetitive elements are annotated. The NCMD assembly was also used to close many gaps in the pig reference assembly. This NCMD assembly and annotation provide foundational resources for the genomic analyses of pig and related species.
Affiliation(s)
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Nayoung Park
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Suyeon Wy
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Daehwan Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Han-Ha Chai
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- In-Cheol Cho
  - Subtropical Livestock Research Institute, National Institute of Animal Science, RDA, Jeju, 63242, Republic of Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Kisang Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Heesun Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Youngbeen Moon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Juyeon Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Woncheoul Park
  - Animal Genomics and Bioinformatics Division, National Institute of Animal Science, RDA, Wanju, 55365, Republic of Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| |
|
9
|
Herrera M, Ravasi T, Laudet V. Anemonefishes: A model system for evolutionary genomics. F1000Res 2023; 12:204. [PMID: 37928172 PMCID: PMC10624958 DOI: 10.12688/f1000research.130752.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/20/2023] [Indexed: 11/07/2023] Open
Abstract
Anemonefishes are an iconic group of coral reef fish particularly known for their mutualistic relationship with sea anemones. This mutualism is especially intriguing as it likely prompted the rapid diversification of anemonefish. Understanding the genomic architecture underlying this process has indeed become one of the holy grails of evolutionary research in these fishes. Recently, anemonefishes have also been used as a model system to study the molecular basis of highly complex traits such as color patterning, social sex change, larval dispersal and life span. Extensive genomic resources including several high-quality reference genomes, a linkage map, and various genetic tools have indeed enabled the identification of genomic features controlling some of these fascinating attributes, but also provided insights into the molecular mechanisms underlying adaptive responses to changing environments. Here, we review the latest findings and new avenues of research that have led to this group of fish being regarded as a model for evolutionary genomics.
Affiliation(s)
- Marcela Herrera
- Marine Eco-Evo-Devo Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
| | - Timothy Ravasi
- Marine Climate Change Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
- Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Queensland, 4811, Australia
| | - Vincent Laudet
- Marine Eco-Evo-Devo Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Okinawa, 904-0495, Japan
- Marine Research Station, Institute of Cellular and Organismic Biology (ICOB), Academia Sinica, 23-10, Dah-Uen Rd, Jiau Shi I-Lan 262, Taiwan
| |
|
10
|
Poisson W, Prunier J, Carrier A, Gilbert I, Mastromonaco G, Albert V, Taillon J, Bourret V, Droit A, Côté SD, Robert C. Chromosome-level assembly of the Rangifer tarandus genome and validation of cervid and bovid evolution insights. BMC Genomics 2023; 24:142. [PMID: 36959567 PMCID: PMC10037892 DOI: 10.1186/s12864-023-09189-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/05/2022] [Accepted: 02/14/2023] [Indexed: 03/25/2023] Open
Abstract
BACKGROUND: Genome assembly into chromosomes facilitates several analyses including cytogenetics, genomics and phylogenetics. Despite rapid development in bioinformatics, however, assembly beyond scaffolds remains challenging, especially in species without closely related well-assembled and available reference genomes. So far, four draft genomes of Rangifer tarandus (caribou or reindeer, a circumpolar distributed cervid species) have been published, but none with chromosome-level assembly. This emblematic northern species is of high interest in ecological studies and conservation since most populations are declining. RESULTS: We have designed specific probes based on Oligopaint FISH technology to upgrade the latest published reindeer and caribou chromosome-level genomes. Using this oligonucleotide-based method, we found six mis-assembled scaffolds and physically mapped 68 of the largest scaffolds representing 78% of the most recent R. tarandus genome assembly. Combining physical mapping and comparative genomics, it was possible to document chromosomal evolution among Cervidae and closely related bovids. CONCLUSIONS: Our results provide validation for the current chromosome-level genome assembly as well as resources to use chromosome banding in studies of Rangifer tarandus.
Affiliation(s)
- William Poisson
- Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
- Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
- Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
| | - Julien Prunier
- Département de biochimie, microbiologie et bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, QC, Canada
| | - Alexandra Carrier
- Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
- Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
- Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
| | - Isabelle Gilbert
- Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada
- Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada
- Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada
| | | | - Vicky Albert
- Ministère des Forêts, de la Faune et des Parcs du Québec (MFFP), Québec, QC, Canada
| | - Joëlle Taillon
- Ministère des Forêts, de la Faune et des Parcs du Québec (MFFP), Québec, QC, Canada
| | - Vincent Bourret
- Ministère des Forêts, de la Faune et des Parcs du Québec (MFFP), Québec, QC, Canada
| | - Arnaud Droit
- Département de médecine moléculaire, Faculté de médecine, Université Laval, Québec, QC, Canada
| | - Steeve D Côté
- Caribou Ungava, Département de biologie and Centre d'études nordiques, Faculté des sciences et de génie, Université Laval, Québec, QC, Canada
| | - Claude Robert
- Département des sciences animales, Faculté des sciences de l'agriculture et de l'alimentation, Université Laval, Québec, QC, Canada.
- Centre de Recherche en Reproduction, Développement et Santé Intergénérationnelle, Québec, QC, Canada.
- Réseau Québécois en reproduction, Saint-Hyacinthe, QC, Canada.
| |
|
11
|
Minio A, Cochetel N, Vondras AM, Massonnet M, Cantu D. Assembly of complete diploid-phased chromosomes from draft genome sequences. G3 Genes|Genomes|Genetics 2022; 12:6605224. [PMID: 35686922 PMCID: PMC9339290 DOI: 10.1093/g3journal/jkac143] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/11/2022] [Accepted: 05/30/2022] [Indexed: 01/27/2023]
Abstract
De novo genome assembly is essential for genomic research. High-quality genomes assembled into phased pseudomolecules are challenging to produce and often contain assembly errors because of repeats, heterozygosity, or the chosen assembly strategy. Although algorithms that produce partially phased assemblies exist, haploid draft assemblies that may lack biological information remain favored because they are easier to generate and use. We developed HaploSync, a suite of tools that produces fully phased, chromosome-scale diploid genome assemblies, and performs extensive quality control to limit assembly artifacts. HaploSync scaffolds sequences from a draft diploid assembly into phased pseudomolecules guided by a genetic map and/or the genome of a closely related species. HaploSync generates a report that visualizes the relationships between current and legacy sequences, for both haplotypes, and displays their gene and marker content. This quality control helps the user identify misassemblies and guides HaploSync’s correction of scaffolding errors. Finally, HaploSync fills assembly gaps with unplaced sequences and resolves collapsed homozygous regions. In a series of plant, fungal, and animal kingdom case studies, we demonstrate that HaploSync efficiently increases the assembly contiguity of phased chromosomes, improves completeness by filling gaps, corrects scaffolding, and correctly phases highly heterozygous, complex regions.
Affiliation(s)
- Andrea Minio
- Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA
| | - Noé Cochetel
- Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA
| | - Amanda M Vondras
- Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA
| | - Mélanie Massonnet
- Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA
| | - Dario Cantu
- Department of Viticulture and Enology, University of California Davis, Davis, CA 95616, USA
| |
|
12
|
Population Scale Analysis of Centromeric Satellite DNA Reveals Highly Dynamic Evolutionary Patterns and Genomic Organization in Long-Tailed and Rhesus Macaques. Cells 2022; 11:cells11121953. [PMID: 35741082 PMCID: PMC9221937 DOI: 10.3390/cells11121953] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/19/2022] [Revised: 06/12/2022] [Accepted: 06/14/2022] [Indexed: 02/04/2023] Open
Abstract
Centromeric satellite DNA (cen-satDNA) consists of highly divergent repeat monomers, each approximately 171 base pairs in length. Here, we investigated the genetic diversity in the centromeric region of two primate species: long-tailed (Macaca fascicularis) and rhesus (Macaca mulatta) macaques. Fluorescence in situ hybridization and bioinformatic analysis showed the chromosome-specific organization and dynamic nature of cen-satDNA sequences, and their substantial diversity, with distinct subfamilies across macaque populations, suggesting increased turnovers. Comparative genomics identified high level polymorphisms spanning a 120 bp deletion region and a remarkable interspecific variability in cen-satDNA size and structure. Population structure analysis detected admixture patterns within populations, indicating their high divergence and rapid evolution. However, differences in cen-satDNA profiles appear to not be involved in hybrid incompatibility between the two species. Our study provides a genomic landscape of centromeric repeats in wild macaques and opens new avenues for exploring their impact on the adaptive evolution and speciation of primates.
|
13
|
Rahman A, Pachter L. SWALO: scaffolding with assembly likelihood optimization. Nucleic Acids Res 2021; 49:e117. [PMID: 34417615 PMCID: PMC8599790 DOI: 10.1093/nar/gkab717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/22/2020] [Revised: 06/16/2021] [Accepted: 08/16/2021] [Indexed: 01/01/2023] Open
Abstract
Scaffolding, i.e., ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or a similar number of correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
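The joining rule described in this abstract can be illustrated with a small sketch. This is not Swalo's code: the function name, the `margin` cutoff, and the reading of "unambiguously joined" as "only one candidate join increases the likelihood" are all assumptions made for illustration, and the log-likelihood gains are taken as given rather than computed from a generative model.

```python
from typing import Dict, Optional


def choose_join(gains: Dict[str, float], margin: float = 10.0) -> Optional[str]:
    """Pick a partner contig for one contig end, or None if nothing is convincing.

    gains maps each candidate partner to the increase in assembly
    log-likelihood that joining to it would give (hypothetical inputs).
    margin is an illustrative cutoff, not a Swalo parameter.
    """
    # Only joins that improve the likelihood are worth considering.
    improving = {name: gain for name, gain in gains.items() if gain > 0}
    if not improving:
        return None
    ranked = sorted(improving.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) == 1:
        # Assumed meaning of "unambiguous": a single improving candidate.
        return ranked[0][0]
    (best_name, best_gain), (_, second_gain) = ranked[0], ranked[1]
    # Otherwise require the best join to beat the runner-up by a clear margin.
    if best_gain - second_gain >= margin:
        return best_name
    return None


print(choose_join({"contig_B": 50.0, "contig_C": 12.0}))
print(choose_join({"contig_B": 50.0, "contig_C": 45.0}))
```

Declining to join when two candidates score similarly trades some contiguity for fewer incorrect links, which matches the emphasis the abstract places on linking very few contigs incorrectly.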
Affiliation(s)
- Atif Rahman
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA; Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Lior Pachter
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA; Departments of Mathematics and Molecular & Cell Biology, University of California, Berkeley, CA 94720, USA; Departments of Biology and Computing & Mathematical Sciences, California Institute of Technology, Pasadena, CA 91103, USA
| |
|
14
|
Yang L, Malhotra R, Chikhi R, Elleder D, Kaiser T, Rong J, Medvedev P, Poss M. Recombination marks the evolutionary dynamics of a recently endogenized retrovirus. Mol Biol Evol 2021; 38:5423-5436. [PMID: 34480565 PMCID: PMC8662619 DOI: 10.1093/molbev/msab252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022] Open
Abstract
All vertebrate genomes have been colonized by retroviruses along their evolutionary trajectory. Although endogenous retroviruses (ERVs) can contribute important physiological functions to contemporary hosts, such benefits are attributed to long-term coevolution of ERV and host because germline infections are rare and expansion is slow, and because the host effectively silences them. The genomes of several outbred species including mule deer (Odocoileus hemionus) are currently being colonized by ERVs, which provides an opportunity to study ERV dynamics at a time when few are fixed. We previously established the locus-specific distribution of cervid ERV (CrERV) in populations of mule deer. In this study, we determine the molecular evolutionary processes acting on CrERV at each locus in the context of phylogenetic origin, genome location, and population prevalence. A mule deer genome was de novo assembled from short- and long-insert mate pair reads and CrERV sequence generated at each locus. We report that CrERV composition and diversity have recently measurably increased by horizontal acquisition of a new retrovirus lineage. This new lineage has further expanded CrERV burden and CrERV genomic diversity by activating and recombining with existing CrERV. Resulting interlineage recombinants then endogenize and subsequently expand. CrERV loci are significantly closer to genes than expected if integration were random and gene proximity might explain the recent expansion of one recombinant CrERV lineage. Thus, in mule deer, retroviral colonization is a dynamic period in the molecular evolution of CrERV that also provides a burst of genomic diversity to the host population.
Affiliation(s)
- Lei Yang
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Raunaq Malhotra
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Rayan Chikhi
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Daniel Elleder
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Vídeňská 1083, 14220 Prague, Czech Republic
| | - Theodora Kaiser
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Jesse Rong
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Paul Medvedev
- Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA, 16802, USA
| |
|
15
|
Charlesworth D, Graham C, Trivedi U, Gardner J, Bergero R. PromethION sequencing and assembly of the genome of Micropoecilia picta, a fish with a highly degenerated Y chromosome. Genome Biol Evol 2021; 13:6326803. [PMID: 34297069 PMCID: PMC8449826 DOI: 10.1093/gbe/evab171] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
We here describe sequencing and assembly of both the autosomes and the sex chromosome in M. picta, the closest related species to the guppy, Poecilia reticulata. Poecilia (Micropoecilia) picta is a close outgroup for studying the guppy, an important organism for studies in evolutionary ecology and in sex chromosome evolution. The guppy XY pair (LG12) has long been studied as a test case for the importance of sexually antagonistic variants in selection for suppressed recombination between Y and X chromosomes. The guppy Y chromosome is not degenerated, but appears to carry functional copies of all genes that are present on its X counterpart. The X chromosomes of M. picta (and its relative M. parae) are homologous to the guppy XY pair, but their Y chromosomes are highly degenerated, and no genes can be identified in the fully Y-linked region. A complete genome sequence of a M. picta male may therefore contribute to understanding how the guppy Y evolved. These fish species' genomes are estimated to be about 750 Mb, with high densities of repetitive sequences, suggesting that long-read sequencing is needed. We evaluated several assembly approaches, and used our results to investigate the extent of Y chromosome degeneration in this species.
Affiliation(s)
- Deborah Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, EH9 3LF, UK
| | - Chay Graham
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, EH9 3LF, UK; University of Cambridge, Department of Biochemistry, Sanger Building, 80 Tennis Ct Rd, Cambridge, CB2 1GA, UK
| | - Urmi Trivedi
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, EH9 3LF, UK
| | - Jim Gardner
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, EH9 3LF, UK
| | - Roberta Bergero
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, EH9 3LF, UK
| |
|
16
|
Extreme Y chromosome polymorphism corresponds to five male reproductive morphs of a freshwater fish. Nat Ecol Evol 2021; 5:939-948. [PMID: 33958755 DOI: 10.1038/s41559-021-01452-w] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 09/25/2020] [Accepted: 03/23/2021] [Indexed: 02/02/2023]
Abstract
Loss of recombination between sex chromosomes often depletes Y chromosomes of functional content and genetic variation, which might limit their potential to generate adaptive diversity. Males of the freshwater fish Poecilia parae occur as one of five discrete morphs, all of which shoal together in natural populations where morph frequency has been stable for over 50 years. Each morph uses a different complex reproductive strategy and morphs differ dramatically in colour, body size and mating behaviour. Morph phenotype is passed perfectly from father to son, indicating there are five Y haplotypes segregating in the species, which encode the complex male morph characteristics. Here, we examine Y diversity in natural populations of P. parae. Using linked-read sequencing on multiple P. parae females and males of all five morphs, we find that the genetic architecture of the male morphs evolved on the Y chromosome after recombination suppression had occurred with the X. Comparing Y chromosomes between each of the morphs, we show that, although the Ys of the three minor morphs that differ in colour are highly similar, there are substantial amounts of unique genetic material and divergence between the Ys of the three major morphs that differ in reproductive strategy, body size and mating behaviour. Altogether, our results suggest that the Y chromosome is able to overcome the constraints of recombination loss to generate extreme diversity, resulting in five discrete Y chromosomes that control complex reproductive strategies.
|
17
|
Huang S, He X, Wang G, Bao E. AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads. Brief Bioinform 2021; 22:6146772. [PMID: 33621981 DOI: 10.1093/bib/bbab022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/18/2021] [Revised: 02/12/2021] [Accepted: 02/16/2021] [Indexed: 11/13/2022] Open
Abstract
Contigs assembled from third-generation sequencing long reads are usually more complete than those from second-generation short reads. However, the current algorithms still have difficulty in assembling the long reads into the ideal complete and accurate genome, or the theoretical best result [1]. To improve the long read contigs and with more and more fully sequenced genomes available, it could still be possible to use the similar genome-assisted reassembly method [2], which was initially proposed for the short reads making use of a closely related genome (similar genome) to the sequencing genome (target genome). The method aligns the contigs and reads to the similar genome, and then extends and refines the aligned contigs with the aligned reads. Here, we introduce AlignGraph2, a similar genome-assisted reassembly pipeline for the PacBio long reads. The AlignGraph2 pipeline is the second version of the AlignGraph algorithm proposed by us but completely redesigned; it accepts either error-prone or HiFi long reads as input and contains four novel algorithms: a similarity-aware alignment algorithm and an alignment filtration algorithm for alignment of the long reads and preassembled contigs to the similar genome, and a reassembly algorithm and a weight-adjusted consensus algorithm for extension and refinement of the preassembled contigs. In our performance tests on both error-prone and HiFi long reads, AlignGraph2 can align 5.7-27.2% more long reads and 7.3-56.0% more bases than some current alignment algorithms and is more efficient than or comparable to the others. For contigs assembled with various de novo algorithms and aligned to similar genomes (aligned contigs), AlignGraph2 can extend 8.7-94.7% of them (extendable contigs), and obtain contigs of 7.0-249.6% larger N50 value and 5.2-87.7% smaller number of indels per 100 kbp (extended contigs). With genomes of decreased similarities, AlignGraph2 also has relatively stable performance.
The AlignGraph2 software can be downloaded for free from this site: https://github.com/huangs001/AlignGraph2.
Affiliation(s)
- Shien Huang
- Group of Interdisciplinary Information Sciences, School of Software Engineering, Beijing Jiaotong University, China
| | - Xinyu He
- Group of Interdisciplinary Information Sciences, School of Software Engineering, Beijing Jiaotong University, China
| | - Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, China
| | - Ergude Bao
- Interdisciplinary Information Sciences, School of Software Engineering, Beijing Jiaotong University, China
| |
|
18
|
|
19
|
Minkin I, Medvedev P. Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nat Commun 2020; 11:6327. [PMID: 33303762 PMCID: PMC7728760 DOI: 10.1038/s41467-020-19777-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 09/28/2019] [Accepted: 10/29/2020] [Indexed: 11/29/2022] Open
Abstract
Multiple whole-genome alignment is a challenging problem in bioinformatics. Despite many successes, current methods are not able to keep up with the growing number, length, and complexity of assembled genomes, especially when computational resources are limited. Approaches based on compacted de Bruijn graphs to identify and extend anchors into locally collinear blocks have potential for scalability, but current methods do not scale to mammalian genomes. We present an algorithm, SibeliaZ-LCB, for identifying collinear blocks in closely related genomes based on analysis of the de Bruijn graph. We further incorporate this into a multiple whole-genome alignment pipeline called SibeliaZ. SibeliaZ shows run-time improvements over other methods while maintaining accuracy. On sixteen recently-assembled strains of mice, SibeliaZ runs in under 16 hours on a single machine, while other tools did not run to completion for eight mice within a week. SibeliaZ makes a significant step towards improving scalability of multiple whole-genome alignment and collinear block reconstruction algorithms on a single machine.
Affiliation(s)
- Ilia Minkin
- Department of Computer Science and Engineering, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA.
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
| |
|
20
|
Chida AR, Ravi S, Jayaprasad S, Paul K, Saha J, Suresh C, Whadgar S, Kumar N, Rao K R, Ghosh C, Choudhary B, Subramani S, Srinivasan S. A Near-Chromosome Level Genome Assembly of Anopheles stephensi. Front Genet 2020; 11:565626. [PMID: 33312190 PMCID: PMC7703621 DOI: 10.3389/fgene.2020.565626] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/25/2020] [Accepted: 09/28/2020] [Indexed: 12/31/2022] Open
Abstract
Malaria remains a major healthcare risk to growing economies like India, and a chromosome-level reference genome of Anopheles stephensi is critical for successful vector management and understanding of vector evolution using comparative genomics. We report chromosome-level assemblies of an Indian strain, STE2, and a Pakistani strain SDA-500 by combining draft genomes of the two strains using a homology-based iterative approach. The resulting assembly IndV3/PakV3 with L50 of 9/12 and N50 6.3/6.9 Mb had scaffolds long enough for building 90% of the euchromatic regions of the three chromosomes, IndV3s/PakV3s, using low-resolution physical markers and enabled the generation of the next version of genome assemblies, IndV4/PakV4, using HiC data. We have validated these assemblies using contact maps against publicly available HiC raw data from two strains including STE2 and another lab strain of An. stephensi from UCI and compare the quality of the assemblies with other assemblies made available as preprints since the submission of the manuscript. We show that the IndV3s and IndV4 assemblies are sensitive in identifying a homozygous 2Rb inversion in the UCI strain and a 2Rb polymorphism in the STE2 strain. Multiple tandem copies of CYP6a14, 4c1, and 4c21 genes, implicated in insecticide resistance, lie within this inversion locus. Comparison of assembled genomes suggests a variation of 1 in 81 positions between the UCI and STE2 lab strains, 1 in 82 between SDA-500 and UCI strain, and 1 in 113 between SDA-500 and STE2 strains of An. stephensi, which are closer than 1 in 68 variations among individuals from two other lab strains sequenced and reported here. Based on the developmental transcriptome and orthology of all the 54 olfactory receptors (ORs) to those of other Anopheles species, we identify an OR with the potential for host recognition in the genus Anopheles. A comparative analysis of An. stephensi genomes with the completed genomes of a few other Anopheles species suggests limited inter-chromosomal gene flow and loss of synteny within chromosomal arms even among the closely related species.
Affiliation(s)
- Afiya Razia Chida
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Samathmika Ravi
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | | | - Kiran Paul
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Jaysmita Saha
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Chinjusha Suresh
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Saurabh Whadgar
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Naveen Kumar
- Tata Institute for Genetics and Society Center at inStem, Bangalore, India
| | - Raksha Rao K
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Chaitali Ghosh
- Tata Institute for Genetics and Society Center at inStem, Bangalore, India
| | - Bibha Choudhary
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
| | - Suresh Subramani
- Tata Institute for Genetics and Society Center at inStem, Bangalore, India
| | - Subhashini Srinivasan
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
- Tata Institute for Genetics and Society Center at inStem, Bangalore, India
| |
Collapse
|
21
|
Deb S, Jayaprasad S, Ravi S, Rao KR, Whadgar S, Hariharan N, Dixit S, Sunil M, Choudhary B, Stevanato P, Ramireddy E, Srinivasan S. Classification of Grain Amaranths Using Chromosome-Level Genome Assembly of Ramdana, A. hypochondriacus. Frontiers in Plant Science 2020; 11:579529. [PMID: 33262776 PMCID: PMC7686145 DOI: 10.3389/fpls.2020.579529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/20/2020] [Accepted: 10/12/2020] [Indexed: 06/12/2023]
Abstract
In the age of genomics-based crop improvement, a high-quality genome of a local landrace adapted to the local environmental conditions is critically important. Grain amaranths produce highly nutritional grains with a multitude of desirable properties including C4 photosynthesis highly sought-after in other crops. For improving the agronomic traits of grain amaranth and for the transfer of desirable traits to dicot crops, a reference genome of a local landrace is necessary. Toward this end, our lab had initiated sequencing the genome of Amaranthus (A.) hypochondriacus (A.hyp_K_white) and had reported a draft genome in 2014. We selected this landrace because it is well adapted for cultivation in India during the last century and is currently a candidate for TILLING-based crop improvement. More recently, a high-quality chromosome-level assembly of A. hypochondriacus (PI558499, Plainsman) was reported. Here, we report a chromosome-level assembly of A.hyp_K_white (AhKP) using low-coverage PacBio reads, contigs from the reported draft genome of A.hyp_K_white, raw HiC data and reference genome of Plainsman (A.hyp.V.2.1). The placement of A.hyp_K_white on the phylogenetic tree of grain amaranths of known accessions clearly suggests that A.hyp_K_white is genetically distal from Plainsman and is most closely related to the accession PI619259 from Nepal (Ramdana). Furthermore, the classification of another accession, Suvarna, adapted to the local environment and selected for yield and other desirable traits, is clearly Amaranthus cruentus. A classification based on hundreds of thousands of SNPs validated taxonomy-based classification for a majority of the accessions providing the opportunity for reclassification of a few.
Affiliation(s)
- Saptarathi Deb
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Samathmika Ravi
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, viale dell’Università, Legnaro, Italy
- K. Raksha Rao
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Saurabh Whadgar
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Nivedita Hariharan
  - Institute for Stem Cell Science and Regenerative Medicine (InStem), Bengaluru, India
- Shubham Dixit
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Meeta Sunil
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Bibha Choudhary
  - Institute of Bioinformatics and Applied Biotechnology, Bengaluru, India
- Piergiorgio Stevanato
  - Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, viale dell’Università, Legnaro, Italy
- Eswarayya Ramireddy
  - Indian Institute of Science Education and Research, Tirupati, Tirupati, India
|
22
|
Liang P, Saqib HSA, Ni X, Shen Y. Long-read sequencing and de novo genome assembly of marine medaka (Oryzias melastigma). BMC Genomics 2020; 21:640. [PMID: 32938378 PMCID: PMC7493909 DOI: 10.1186/s12864-020-07042-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/03/2020] [Accepted: 08/31/2020] [Indexed: 01/13/2023]
Abstract
BACKGROUND Marine medaka (Oryzias melastigma) is considered as an important ecotoxicological indicator to study the biochemical, physiological and molecular responses of marine organisms towards increasing amount of pollutants in marine and estuarine waters. RESULTS In this study, we reported a high-quality and accurate de novo genome assembly of marine medaka through the integration of single-molecule sequencing, Illumina paired-end sequencing, and 10X Genomics linked-reads. The 844.17 Mb assembly is estimated to cover more than 98% of the genome and is more continuous with fewer gaps and errors than the previous genome assembly. Comparison of O. melastigma with closely related species showed significant expansion of gene families associated with DNA repair and ATP-binding cassette (ABC) transporter pathways. We identified 274 genes that appear to be under significant positive selection and are involved in DNA repair, cellular transportation processes, conservation and stability of the genome. The positive selection of genes and the considerable expansion in gene numbers, especially related to stimulus responses provide strong supports for adaptations of O. melastigma under varying environmental stresses. CONCLUSIONS The highly contiguous marine medaka genome and comparative genomic analyses will increase our understanding of the underlying mechanisms related to its extraordinary adaptation capability, leading towards acceleration in the ongoing and future investigations in marine ecotoxicology.
Affiliation(s)
- Pingping Liang
  - College of the Environment and Ecology, Xiamen University, Xiamen, 361102, China
- Hafiz Sohaib Ahmed Saqib
  - State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xiaomin Ni
  - College of the Environment and Ecology, Xiamen University, Xiamen, 361102, China
  - Fudan University, Shanghai, 200240, China
- Yingjia Shen
  - College of the Environment and Ecology, Xiamen University, Xiamen, 361102, China
|
23
|
Liu Z, Feng J, Yu B, Ma Q, Liu B. The functional determinants in the organization of bacterial genomes. Brief Bioinform 2020; 22:5892344. [PMID: 32793986 DOI: 10.1093/bib/bbaa172] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/30/2020] [Revised: 06/30/2020] [Accepted: 07/07/2020] [Indexed: 12/13/2022]
Abstract
Bacterial genomes are now recognized as interacting intimately with cellular processes. Uncovering organizational mechanisms of bacterial genomes has been a primary focus of researchers to reveal the potential cellular activities. The advances in both experimental techniques and computational models provide a tremendous opportunity for understanding these mechanisms, and various studies have been proposed to explore the organization rules of bacterial genomes associated with functions recently. This review focuses mainly on the principles that shape the organization of bacterial genomes, both locally and globally. We first illustrate local structures as operons/transcription units for facilitating co-transcription and horizontal transfer of genes. We then clarify the constraints that globally shape bacterial genomes, such as metabolism, transcription and replication. Finally, we highlight challenges and opportunities to advance bacterial genomic studies and provide application perspectives of genome organization, including pathway hole assignment and genome assembly and understanding disease mechanisms.
Affiliation(s)
- Bin Yu
  - College of Mathematics and Physics, Qingdao University of Science and Technology
- Qin Ma
  - Department of Biomedical Informatics, the Ohio State University
|
24
|
Kuhl H, Li L, Wuertz S, Stöck M, Liang XF, Klopp C. CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes. Gigascience 2020; 9:giaa034. [PMID: 32449778 PMCID: PMC7247394 DOI: 10.1093/gigascience/giaa034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/31/2019] [Revised: 01/29/2020] [Accepted: 03/24/2020] [Indexed: 01/14/2023]
Abstract
BACKGROUND Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute-servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. CONCLUSIONS CSA can speed up and simplify chromosome-level assembly and significantly lower costs of large-scale family-level vertebrate genome projects.
Affiliation(s)
- Heiner Kuhl
  - Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
- Ling Li
  - Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
  - College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University; Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, No.1 Shizishan Street, Hongshan District, 430070 Wuhan, Hubei Province, P.R. China
- Sven Wuertz
  - Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
- Matthias Stöck
  - Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, 12587 Berlin, Germany
- Xu-Fang Liang
  - College of Fisheries, Chinese Perch Research Center, Huazhong Agricultural University; Innovation Base for Chinese Perch Breeding, Key Lab of Freshwater Animal Breeding, Ministry of Agriculture, No.1 Shizishan Street, Hongshan District, 430070 Wuhan, Hubei Province, P.R. China
- Christophe Klopp
  - Sigenae, Bioinfo Genotoul, Mathématiques et Informatique Appliquées de Toulouse, INRAe, 24 Chemin de Borde Rouge, 31320 Auzeville-Tolosane, Castanet Tolosan, France
|
25
|
Farré M, Li Q, Darolti I, Zhou Y, Damas J, Proskuryakova AA, Kulemzina AI, Chemnick LG, Kim J, Ryder OA, Ma J, Graphodatsky AS, Zhang G, Larkin DM, Lewin HA. An integrated chromosome-scale genome assembly of the Masai giraffe (Giraffa camelopardalis tippelskirchi). Gigascience 2020; 8:5542321. [PMID: 31367745 PMCID: PMC6669057 DOI: 10.1093/gigascience/giz090] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/09/2019] [Revised: 06/12/2019] [Accepted: 07/09/2019] [Indexed: 11/14/2022]
Abstract
BACKGROUND The Masai giraffe (Giraffa camelopardalis tippelskirchi) is the largest-bodied giraffe and the world's tallest terrestrial animal. With its extreme size and height, the giraffe's unique anatomical and physiological adaptations have long been of interest to diverse research fields. Giraffes are also critical to ecosystems of sub-Saharan Africa, with their long neck serving as a conduit to food sources not shared by other herbivores. Although the genome of a Masai giraffe has been sequenced, the assembly was highly fragmented and suboptimal for genome analysis. Herein we report an improved giraffe genome assembly to facilitate evolutionary analysis of the giraffe and other ruminant genomes. FINDINGS Using SOAPdenovo2 and 170 Gbp of Illumina paired-end and mate-pair reads, we generated a 2.6-Gbp male Masai giraffe genome assembly, with a scaffold N50 of 3 Mbp. The incorporation of 114.6 Gbp of Chicago library sequencing data resulted in a HiRise SOAPdenovo + Chicago assembly with an N50 of 48 Mbp and containing 95% of expected genes according to BUSCO analysis. Using the Reference-Assisted Chromosome Assembly tool, we were able to order and orient scaffolds into 42 predicted chromosome fragments (PCFs). Using fluorescence in situ hybridization, we placed 153 cattle bacterial artificial chromosomes onto giraffe metaphase spreads to assess and assign the PCFs on 14 giraffe autosomes and the X chromosome resulting in the final assembly with an N50 of 177.94 Mbp. In this assembly, 21,621 protein-coding genes were identified using both de novo and homology-based predictions. CONCLUSIONS We have produced the first chromosome-scale genome assembly for a Giraffidae species. This assembly provides a valuable resource for the study of artiodactyl evolution and for understanding the molecular basis of the unique adaptive traits of giraffes. 
In addition, the assembly will provide a powerful resource to assist conservation efforts of Masai giraffe, whose population size has declined by 52% in recent years.
Affiliation(s)
- Marta Farré
  - Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London NW1 0TU, UK
  - School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK
- Qiye Li
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
  - China National Genebank, BGI-Shenzhen, Shenzhen 518083, China
- Iulia Darolti
  - Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London NW1 0TU, UK
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Yang Zhou
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
  - Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark
- Joana Damas
  - Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London NW1 0TU, UK
  - The Genome Center, University of California, Davis, CA 95616, USA
- Anastasia A Proskuryakova
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk 630090, Russia
  - Novosibirsk State University, Novosibirsk 630090, Russia
- Leona G Chemnick
  - San Diego Institute for Conservation Research, San Diego Zoo Global, Escondido, CA, USA
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, South Korea
- Oliver A Ryder
  - San Diego Institute for Conservation Research, San Diego Zoo Global, Escondido, CA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alexander S Graphodatsky
  - Institute of Molecular and Cellular Biology, SB RAS, Novosibirsk 630090, Russia
  - Novosibirsk State University, Novosibirsk 630090, Russia
- Guojie Zhang
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
  - China National Genebank, BGI-Shenzhen, Shenzhen 518083, China
  - Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark
- Denis M Larkin
  - Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London NW1 0TU, UK
  - The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences (ICG SB RAS), Novosibirsk 630090, Russia
- Harris A Lewin
  - The Genome Center, University of California, Davis, CA 95616, USA
  - Department of Evolution and Ecology, College of Biological Sciences, and the Department of Reproduction and Population Health, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
|
26
|
Waterhouse RM, Aganezov S, Anselmetti Y, Lee J, Ruzzante L, Reijnders MJMF, Feron R, Bérard S, George P, Hahn MW, Howell PI, Kamali M, Koren S, Lawson D, Maslen G, Peery A, Phillippy AM, Sharakhova MV, Tannier E, Unger MF, Zhang SV, Alekseyev MA, Besansky NJ, Chauve C, Emrich SJ, Sharakhov IV. Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biol 2020; 18:1. [PMID: 31898513 PMCID: PMC6939337 DOI: 10.1186/s12915-019-0728-3] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022]
Abstract
Background New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies. Results We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi. Conclusions Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
Affiliation(s)
- Robert M Waterhouse
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Sergey Aganezov
  - Department of Computer Science, Princeton University, Princeton, NJ, 08450, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Jiyoung Lee
  - The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Livio Ruzzante
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Maarten J M F Reijnders
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Romain Feron
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Sèverine Bérard
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Phillip George
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Matthew W Hahn
  - Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
- Paul I Howell
  - Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
- Maryam Kamali
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
  - Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Daniel Lawson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Gareth Maslen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Ashley Peery
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Maria V Sharakhova
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
  - Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050
- Eric Tannier
  - Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622, Villeurbanne, France
  - Institut national de recherche en informatique et en automatique, Montbonnot, 38334, Grenoble, Rhône-Alpes, France
- Maria F Unger
  - Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
- Simo V Zhang
  - Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
- Max A Alekseyev
  - Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA, 20147, USA
- Nora J Besansky
  - Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Scott J Emrich
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
- Igor V Sharakhov
  - The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
  - Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050
|
27
|
Fan H, Wu Q, Wei F, Yang F, Ng BL, Hu Y. Chromosome-level genome assembly for giant panda provides novel insights into Carnivora chromosome evolution. Genome Biol 2019; 20:267. [PMID: 31810476 PMCID: PMC6898958 DOI: 10.1186/s13059-019-1889-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/30/2019] [Accepted: 11/15/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Chromosome evolution is an important driver of speciation and species evolution. Previous studies have detected chromosome rearrangement events among different Carnivora species using chromosome painting strategies. However, few of these studies have focused on chromosome evolution at a nucleotide resolution due to the limited availability of chromosome-level Carnivora genomes. Although the de novo genome assembly of the giant panda is available, current short read-based assemblies are limited to moderately sized scaffolds, making the study of chromosome evolution difficult. RESULTS Here, we present a chromosome-level giant panda draft genome with a total size of 2.29 Gb. Based on the giant panda genome and published chromosome-level dog and cat genomes, we conduct six large-scale pairwise synteny alignments and identify evolutionary breakpoint regions. Interestingly, gene functional enrichment analysis shows that for all of the three Carnivora genomes, some genes located in evolutionary breakpoint regions are significantly enriched in pathways or terms related to sensory perception of smell. In addition, we find that the sweet receptor gene TAS1R2, which has been proven to be a pseudogene in the cat genome, is located in an evolutionary breakpoint region of the giant panda, suggesting that interchromosomal rearrangement may play a role in the cat TAS1R2 pseudogenization. CONCLUSIONS We show that the combined strategies employed in this study can be used to generate efficient chromosome-level genome assemblies. Moreover, our comparative genomics analyses provide novel insights into Carnivora chromosome evolution, linking chromosome evolution to functional gene evolution.
Affiliation(s)
- Huizhong Fan
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Qi Wu
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Fuwen Wei
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Fengtang Yang
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Bee Ling Ng
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yibo Hu
  - CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
|
28
|
Construction of High-Resolution RAD-Seq Based Linkage Map, Anchoring Reference Genome, and QTL Mapping of the Sex Chromosome in the Marine Medaka Oryzias melastigma. G3: Genes, Genomes, Genetics 2019; 9:3537-3545. [PMID: 31530635 PMCID: PMC6829124 DOI: 10.1534/g3.119.400708] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
Medaka (Oryzias sp.) is an important fish species in ecotoxicology and considered as a model species due to its biological features including small body size and short generation time. Since Japanese medaka Oryzias latipes is a freshwater species with access to an excellent genome resource, the marine medaka Oryzias melastigma is also applicable for the marine ecotoxicology. In genome era, a high-density genetic linkage map is a very useful resource in genomic research, providing a means for comparative genomic analysis and verification of de novo genome assembly. In this study, we developed a high-density genetic linkage map for O. melastigma using restriction-site associated DNA sequencing (RAD-seq). The genetic map consisted of 24 linkage groups with 2,481 single nucleotide polymorphism (SNP) markers. The total map length was 1,784 cM with an average marker space of 0.72 cM. The genetic map was integrated with the reference-assisted chromosome assembly (RACA) of O. melastigma, which anchored 90.7% of the assembled sequence onto the linkage map. The values of complete Benchmarking Universal Single-Copy Orthologs were similar to RACA assembly but N50 (23.74 Mb; total genome length 779.4 Mb; gap 5.29%) increased to 29.99 Mb (total genome length 778.7 Mb; gap 5.2%). Using MapQTL analysis with SNP markers, we identified a major quantitative trait locus for sex traits on the Om10. The integration of the genetic map with the reference genome of marine medaka will serve as a good resource for studies in molecular toxicology, genomics, CRISPR/Cas9, and epigenetics.
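The map statistics quoted in this abstract can be cross-checked with simple arithmetic. The sketch below, in Python, uses only the three figures reported above (1,784 cM, 2,481 SNP markers, 24 linkage groups); the variable names are illustrative, and the abstract does not say whether the authors divided by the number of markers or by the number of marker intervals, so both conventions are shown as assumptions.

```python
# Figures taken from the abstract above.
total_map_length_cm = 1784
snp_markers = 2481
linkage_groups = 24

# Convention A (assumed): total length divided by the number of markers.
spacing_per_marker = total_map_length_cm / snp_markers

# Convention B (assumed): total length divided by the number of intervals,
# where each linkage group with n markers has n - 1 intervals.
marker_intervals = snp_markers - linkage_groups
spacing_per_interval = total_map_length_cm / marker_intervals

print(round(spacing_per_marker, 2))    # 0.72
print(round(spacing_per_interval, 2))  # 0.73
```

Under these assumptions, the reported average spacing of 0.72 cM agrees with the per-marker convention, and the per-interval value differs by only about 0.01 cM.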
|
29
|
Palmer DH, Rogers TF, Dean R, Wright AE. How to identify sex chromosomes and their turnover. Mol Ecol 2019; 28:4709-4724. [PMID: 31538682 PMCID: PMC6900093 DOI: 10.1111/mec.15245] [Citation(s) in RCA: 104] [Impact Index Per Article: 17.3] [Received: 07/01/2019] [Revised: 09/05/2019] [Accepted: 09/13/2019] [Indexed: 12/12/2022]
Abstract
Although sex is a fundamental component of eukaryotic reproduction, the genetic systems that control sex determination are highly variable. In many organisms the presence of sex chromosomes is associated with female or male development. Although certain groups possess stable and conserved sex chromosomes, others exhibit rapid sex chromosome evolution, including transitions between male and female heterogamety, and turnover in the chromosome pair recruited to determine sex. These turnover events have important consequences for multiple facets of evolution, as sex chromosomes are predicted to play a central role in adaptation, sexual dimorphism, and speciation. However, our understanding of the processes driving the formation and turnover of sex chromosome systems is limited, in part because we lack a complete understanding of interspecific variation in the mechanisms by which sex is determined. New bioinformatic methods are making it possible to identify and characterize sex chromosomes in a diverse array of non-model species, rapidly filling in the numerous gaps in our knowledge of sex chromosome systems across the tree of life. In turn, this growing data set is facilitating and fueling efforts to address many of the unanswered questions in sex chromosome evolution. Here, we synthesize the available bioinformatic approaches to produce a guide for characterizing sex chromosome system and identity simultaneously across clades of organisms. Furthermore, we survey our current understanding of the processes driving sex chromosome turnover, and highlight important avenues for future research.
Affiliation(s)
- Daniela H. Palmer
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
| | - Thea F. Rogers
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
| | - Rebecca Dean
- Department of Genetics, Evolution and Environment, University College London, London, UK
| | - Alison E. Wright
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
| |
|
30
|
Alonge M, Soyk S, Ramakrishnan S, Wang X, Goodwin S, Sedlazeck FJ, Lippman ZB, Schatz MC. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol 2019; 20:224. [PMID: 31661016 PMCID: PMC6816165 DOI: 10.1186/s13059-019-1829-6] [Citation(s) in RCA: 403] [Impact Index Per Article: 67.2] [Received: 02/01/2019] [Accepted: 09/19/2019] [Indexed: 01/10/2023] Open
Abstract
We present RaGOO, a reference-guided contig ordering and orienting tool that leverages the speed and sensitivity of Minimap2 to accurately achieve chromosome-scale assemblies in minutes. After the pseudomolecules are constructed, RaGOO identifies structural variants, including those spanning sequencing gaps. We show that RaGOO accurately orders and orients 3 de novo tomato genome assemblies, including the widely used M82 reference cultivar. We then demonstrate the scalability and utility of RaGOO with a pan-genome analysis of 103 Arabidopsis thaliana accessions by examining the structural variants detected in the newly assembled pseudomolecules. RaGOO is available open source at https://github.com/malonge/RaGOO .
Affiliation(s)
- Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Sebastian Soyk
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | | | - Xingang Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Cold Spring Harbor Laboratory, Howard Hughes Medical Institute, Cold Spring Harbor, NY, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA.
| |
|
31
|
Extreme heterogeneity in sex chromosome differentiation and dosage compensation in livebearers. Proc Natl Acad Sci U S A 2019; 116:19031-19036. [PMID: 31484763 PMCID: PMC6754558 DOI: 10.1073/pnas.1905298116] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Indexed: 12/21/2022] Open
Abstract
Once recombination is halted between the X and Y chromosomes, sex chromosomes begin to differentiate and transition to heteromorphism. While there is a remarkable variation across clades in the degree of sex chromosome divergence, far less is known about the variation in sex chromosome differentiation within clades. Here, we combined whole-genome and transcriptome sequencing data to characterize the structure and conservation of sex chromosome systems across Poeciliidae, the livebearing clade that includes guppies. We found that the Poecilia reticulata XY system is much older than previously thought, being shared not only with its sister species, Poecilia wingei, but also with Poecilia picta, which diverged roughly 20 million years ago. Despite the shared ancestry, we uncovered an extreme heterogeneity across these species in the proportion of the sex chromosome with suppressed recombination, and the degree of Y chromosome decay. The sex chromosomes in P. reticulata and P. wingei are largely homomorphic, with recombination in the former persisting over a substantial fraction. However, the sex chromosomes in P. picta are completely nonrecombining and strikingly heteromorphic. Remarkably, the profound degradation of the ancestral Y chromosome in P. picta is counterbalanced by the evolution of functional chromosome-wide dosage compensation in this species, which has not been previously observed in teleost fish. Our results offer important insight into the initial stages of sex chromosome evolution and dosage compensation.
|
32
|
Cleary A, Ramaraj T, Kahanda I, Mudge J, Mumey B. Exploring Frequented Regions in Pan-Genomic Graphs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1424-1435. [PMID: 30106690 DOI: 10.1109/tcbb.2018.2864564] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
We consider the problem of identifying regions within a pan-genome De Bruijn graph that are traversed by many sequence paths. We define such regions and the subpaths that traverse them as frequented regions (FRs). In this work, we formalize the FR problem and describe an efficient algorithm for finding FRs. Subsequently, we propose some applications of FRs based on machine-learning and pan-genome graph simplification. We demonstrate the effectiveness of these applications using data sets for the organisms Staphylococcus aureus (bacterium) and Saccharomyces cerevisiae (yeast). We corroborate the biological relevance of FRs such as identifying introgressions in yeast that aid in alcohol tolerance, and show that FRs are useful for classification of yeast strains by industrial use and visualizing pan-genomic space.
|
33
|
Song G, Lee J, Kim J, Kang S, Lee H, Kwon D, Lee D, Lang GI, Cherry JM, Kim J. Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies. PLoS One 2019; 14:e0221858. [PMID: 31454399 PMCID: PMC6711525 DOI: 10.1371/journal.pone.0221858] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/10/2018] [Accepted: 08/18/2019] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Genomic data have become major resources to understand complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later. However, there is no fully automated computational approach in experimental evolution studies to establish the chromosome-level genome assembly using unique features of sequencing data. METHODS AND RESULTS In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), to build chromosome-level genome sequence assemblies by generating and combining multiple initial assemblies using three de novo assemblers from short-read sequencing data. We significantly improved the continuity and accuracy of the genome assembly using a large collection of sequencing data and hybrid assembly approaches. We validated our pipeline by generating chromosome-level assemblies of yeast strains W303 and SK1, and compared our results with assemblies built using long-read sequencing and various assembly evaluation metrics. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b, and three commonly used fungal strains: Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74, for which long-read sequencing data are not yet available. Finally, we examined the effect of IMAP parameters, such as reference and resolution, on the quality of the final assembly of the yeast strains W303 and SK1. CONCLUSIONS We developed a cost-effective pipeline to generate chromosome-level sequence assemblies using only short-read sequencing data. 
Our pipeline combines the strengths of reference-guided and meta-assembly approaches. Our pipeline is available online at http://github.com/jkimlab/IMAP including a Docker image, as well as a Perl script, to help users install the IMAP package, including several prerequisite programs. Users can use IMAP to easily build the chromosome-level assembly for the genome of their interest.
Affiliation(s)
- Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan, South Korea
| | - Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, South Korea
| | - Juyeon Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, South Korea
| | - Seokwoo Kang
- School of Computer Science and Engineering, Pusan National University, Busan, South Korea
| | - Hoyong Lee
- School of Computer Science and Engineering, Pusan National University, Busan, South Korea
| | - Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, South Korea
| | - Daehwan Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, South Korea
| | - Gregory I. Lang
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, United States of America
| | - J. Michael Cherry
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, South Korea
| |
|
34
|
A Multireference-Based Whole Genome Assembly for the Obligate Ant-Following Antbird, Rhegmatorhina melanosticta (Thamnophilidae). DIVERSITY-BASEL 2019. [DOI: 10.3390/d11090144] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
Abstract
Current generation high-throughput sequencing technology has facilitated the generation of more genomic-scale data than ever before, thus greatly improving our understanding of avian biology across a range of disciplines. Recent developments in linked-read sequencing (Chromium 10×) and reference-based whole-genome assembly offer an exciting prospect of more accessible chromosome-level genome sequencing in the near future. We sequenced and assembled a genome of the Hairy-crested Antbird (Rhegmatorhina melanosticta), which represents the first publicly available genome for any antbird (Thamnophilidae). Our objectives were to (1) assemble scaffolds to chromosome level based on multiple reference genomes, and report on differences relative to other genomes, (2) assess genome completeness and compare content to other related genomes, and (3) assess the suitability of linked-read sequencing technology for future studies in comparative phylogenomics and population genomics studies. Our R. melanosticta assembly was both highly contiguous (de novo scaffold N50 = 3.3 Mb, reference based N50 = 53.3 Mb) and relatively complete (contained close to 90% of evolutionarily conserved single-copy avian genes and known tetrapod ultraconserved elements). The high contiguity and completeness of this assembly enabled the genome to be successfully mapped to the chromosome level, which uncovered a consistent structural difference between R. melanosticta and other avian genomes. Our results are consistent with the observation that avian genomes are structurally conserved. Additionally, our results demonstrate the utility of linked-read sequencing for non-model genomics. Finally, we demonstrate the value of our R. melanosticta genome for future researchers by mapping reduced representation sequencing data, and by accurately reconstructing the phylogenetic relationships among a sample of thamnophilid species.
|
35
|
Nascimento LC, Yanagui K, Jose J, Camargo ELO, Grassi MCB, Cunha CP, Bressiani JA, Carvalho GMA, Carvalho CR, Prado PF, Mieczkowski P, Pereira GAG, Carazzolle MF. Unraveling the complex genome of Saccharum spontaneum using Polyploid Gene Assembler. DNA Res 2019; 26:205-216. [PMID: 30768175 PMCID: PMC6589550 DOI: 10.1093/dnares/dsz001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/22/2018] [Accepted: 01/21/2019] [Indexed: 12/01/2022] Open
Abstract
The Polyploid Gene Assembler (PGA), developed and tested in this study, represents a new strategy to perform gene-space assembly from complex genomes using low coverage DNA sequencing. The pipeline integrates reference-assisted loci and de novo assembly strategies to construct high-quality sequences focused on gene content. Pipeline validation was conducted with wheat (Triticum aestivum), a hexaploid species, using barley (Hordeum vulgare) as reference, that resulted in the identification of more than 90% of genes and several new genes. Moreover, PGA was used to assemble gene content in Saccharum spontaneum species, a parental lineage for hybrid sugarcane cultivars. Saccharum spontaneum gene sequence obtained was used to reference-guided transcriptome analysis of six different tissues. A total of 39,234 genes were identified, 60.4% clustered into known grass gene families. Thirty-seven gene families were expanded when compared with other grasses, three of them highlighted by the number of gene copies potentially involved in initial development and stress response. In addition, 3,108 promoters (many showing tissue specificity) were identified in this work. In summary, PGA can reconstruct high-quality gene sequences from polyploid genomes, as shown for wheat and S. spontaneum species, and it is more efficient than conventional genome assemblers using low coverage DNA sequencing.
Affiliation(s)
- Leandro Costa Nascimento
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil; Laboratório Central de Tecnologias de Alto Desempenho (LaCTAD), Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Karina Yanagui
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Juliana Jose
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Eduardo L O Camargo
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil; Biocelere Agroindustrial Ltda, GranBio Investimentos S.A., Campinas, SP, Brazil
| | - Maria Carolina B Grassi
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Camila P Cunha
- Laboratório Nacional de Ciência e Tecnologia do Bioetanol (CTBE), Centro Nacional de Pesquisas em Energia e Materiais (CNPEM), Campinas, SP, Brazil
| | | | - Guilherme M A Carvalho
- Laboratório de Citogenética e Citometria, Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Carlos Roberto Carvalho
- Laboratório de Citogenética e Citometria, Departamento de Biologia Geral, Universidade Federal de Viçosa, Viçosa, MG, Brazil
| | - Paula F Prado
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Piotr Mieczkowski
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Gonçalo A G Pereira
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| | - Marcelo F Carazzolle
- Laboratório de Genômica e bioEnergia (LGE), Departamento de Genética, Evolução, Microbiologia e Imunologia, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, SP, Brazil
| |
|
36
|
Kwon D, Lee J, Kim J. GMASS: a novel measure for genome assembly structural similarity. BMC Bioinformatics 2019; 20:147. [PMID: 30885117 PMCID: PMC6423833 DOI: 10.1186/s12859-019-2710-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/26/2018] [Accepted: 03/03/2019] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Thanks to recent advances in next-generation sequencing (NGS) technologies, a large amount of genomic data, in the form of short DNA sequences known as reads, has been accumulating. Diverse assemblers have been developed to generate high-quality de novo assemblies from NGS reads, but their outputs differ widely because of algorithmic differences. However, there are no properly structured measures of the similarity or difference between assemblies. RESULTS We developed a new measure, called the GMASS score, for comparing two genome assemblies in terms of their structure. The GMASS score was developed based on the distribution pattern of the number and coverage of similar regions between a pair of assemblies. The new measure was able to show structural similarity between assemblies when evaluated on simulated assembly datasets. Applying the GMASS score to assemblies in recently published benchmark datasets showed the divergent performance of current assemblers as well as the score's ability to compare assemblies. CONCLUSION The GMASS score is a novel measure for representing structural similarity between two assemblies. It will contribute to the understanding of assembly output and to the development of de novo assemblers.
Affiliation(s)
- Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
| | - Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, South Korea.
| |
|
37
|
Abstract
Affordable, high-throughput DNA sequencing has accelerated the pace of genome assembly over the past decade. Genome assemblies from high-throughput, short-read sequencing, however, are often not as contiguous as the first generation of genome assemblies. Whereas early genome assembly projects were often aided by clone maps or other mapping data, many current assembly projects forego these scaffolding data and only assemble genomes into smaller segments. Recently, new technologies have been invented that allow chromosome-scale assembly at a lower cost and faster speed than traditional methods. Here, we give an overview of the problem of chromosome-scale assembly and traditional methods for tackling this problem. We then review new technologies for chromosome-scale assembly and recent genome projects that used these technologies to create highly contiguous genome assemblies at low cost.
Affiliation(s)
- Edward S. Rice
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
| | - Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
| |
|
38
|
Ruvinskiy D, Larkin DM, Farré M. A Near Chromosome Assembly of the Dromedary Camel Genome. Front Genet 2019; 10:32. [PMID: 30804979 PMCID: PMC6371769 DOI: 10.3389/fgene.2019.00032] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/29/2018] [Accepted: 01/17/2019] [Indexed: 01/11/2023] Open
Abstract
The dromedary camel is an economically and socially important species of livestock in many parts of the world, being used for transport and the production of milk and meat. Much like cattle and horses, the camel may be found in industrial farming conditions as well as used in sporting. Camel racing is a multi-million dollar industry, with some specimens being valued at upward of 9.5 million USD. Despite its apparent value to humans, the dromedary camel is a neglected species in genomics. While cattle and other domesticated species have had much attention in terms of genome assembly, the camel has only been assembled to scaffold level, which does not give a clear indication of the order or chromosomal location of sequenced fragments. In this study, the Reference Assistant Chromosome Assembly (RACA) algorithm was implemented to use read-pair information of camel scaffolds, aligned with the cattle and human genomes in order to organize and orient these scaffolds in a near-chromosome level assembly. This method generated 72 large size fragments (N50 54.36 Mb). These predicted chromosome fragments (PCFs) were then compared with comparative maps of camel and cytogenetic map of alpaca chromosomes, allowing us to further upgrade the assembly. This dromedary camel assembly will be an invaluable tool to verify future camel assemblies generated with chromatin conformation or/and long read technologies. This study provides the first near-chromosome assembly of the dromedary camel, thus adding this economically important species to a growing pool of knowledge regarding the genome structure of domesticated livestock.
Affiliation(s)
- Daniil Ruvinskiy
- Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, United Kingdom
| | - Denis M Larkin
- Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, United Kingdom; The Federal Research Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Marta Farré
- Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, United Kingdom; School of Biosciences, University of Kent, Canterbury, United Kingdom
| |
|
39
|
Farré M, Li Q, Zhou Y, Damas J, Chemnick LG, Kim J, Ryder OA, Ma J, Zhang G, Larkin DM, Lewin HA. A near-chromosome-scale genome assembly of the gemsbok (Oryx gazella): an iconic antelope of the Kalahari desert. Gigascience 2019; 8:5289690. [PMID: 30649288 PMCID: PMC6351727 DOI: 10.1093/gigascience/giy162] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 10/15/2018] [Accepted: 12/12/2018] [Indexed: 12/22/2022] Open
Abstract
Background The gemsbok (Oryx gazella) is one of the largest antelopes in Africa. Gemsbok are heterothermic and thus highly adapted to live in the desert, changing their feeding behavior when faced with extreme drought and heat. A high-quality genome sequence of this species will assist efforts to elucidate these and other important traits of gemsbok and facilitate research on conservation efforts. Findings Using 180 Gbp of Illumina paired-end and mate-pair reads, a 2.9 Gbp assembly with scaffold N50 of 1.48 Mbp was generated using SOAPdenovo. Scaffolds were extended using Chicago library sequencing, which yielded an additional 114.7 Gbp of DNA sequence. The HiRise assembly using SOAPdenovo + Chicago library sequencing produced a scaffold N50 of 47 Mbp and a final genome size of 2.9 Gbp, representing 90.6% of the estimated genome size and including 93.2% of expected genes according to Benchmarking Universal Single-Copy Orthologs analysis. The Reference-Assisted Chromosome Assembly tool was used to generate a final set of 47 predicted chromosome fragments with N50 of 86.25 Mbp and containing 93.8% of expected genes. A total of 23,125 protein-coding genes and 1.14 Gbp of repetitive sequences were annotated using de novo and homology-based predictions. Conclusions Our results provide the first high-quality, chromosome-scale genome sequence assembly for gemsbok, which will be a valuable resource for studying adaptive evolution of this species and other ruminants.
Affiliation(s)
- Marta Farré
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, UK
| | - Qiye Li
- State Key Laboratory of Genetic Resources and Department of Comparative Biomedical Sciences Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; China National Genebank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China
| | - Yang Zhou
- China National Genebank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark
| | - Joana Damas
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, UK
| | - Leona G Chemnick
- Institute for Conservation Research, San Diego Zoo, Escondido, California, USA
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul 05029, South Korea
| | - Oliver A Ryder
- Institute for Conservation Research, San Diego Zoo, Escondido, California, USA
| | - Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, USA
| | - Guojie Zhang
- State Key Laboratory of Genetic Resources and Department of Comparative Biomedical Sciences Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; China National Genebank, BGI-Shenzhen, Dapeng New District, Shenzhen 518120, China; Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark
| | - Denis M Larkin
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, UK
| | - Harris A Lewin
- The UC Davis Genome Center, Department of Evolution and Ecology, College of Biological Sciences, and the Department of Reproduction and Population Health, School of Veterinary Medicine, University of California, Davis, USA
| |
|
40
|
Takahashi Y, Sakai H, Yoshitsu Y, Muto C, Anai T, Pandiyan M, Senthil N, Tomooka N, Naito K. Domesticating Vigna Stipulacea: A Potential Legume Crop With Broad Resistance to Biotic Stresses. Frontiers in Plant Science 2019; 10:1607. [PMID: 31867036 PMCID: PMC6909428 DOI: 10.3389/fpls.2019.01607] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/30/2019] [Accepted: 11/15/2019] [Indexed: 05/03/2023]
Abstract
Though crossing wild relatives to modern cultivars is a usual means to introduce alleles of stress tolerance, an alternative is de novo domesticating wild species that are already tolerant to various kinds of stresses. As a test case, we chose Vigna stipulacea Kuntze, which has fast growth, short vegetative stage, and broad resistance to pests and diseases. We developed an ethyl methanesulfonate-mutagenized population and obtained three mutants with reduced seed dormancy and one with reduced pod shattering. We crossed one of the mutants of less seed dormancy to the wild type and confirmed that the phenotype was inherited in a Mendelian manner. De novo assembly of V. stipulacea genome, and the following resequencing of the F2 progenies successfully identified a Single Nucleotide Polymorphism (SNP) associated with seed dormancy. By crossing and pyramiding the mutant phenotypes, we will be able to turn V. stipulacea into a crop which is yet primitive but can be cultivated without pesticides.
Affiliation(s)
| | | | - Yuki Yoshitsu
- Kenpoku Agricultural Institute, Iwate Agricultural Research Center, Iwate, Japan
| | - Chiaki Muto
- Genetic Resources Center, NARO, Tsukuba, Japan
| | - Toyoaki Anai
- Department of Agriculture, Saga University, Saga, Japan
| | - Muthaiyan Pandiyan
- Agricultural College and Research Institute, Tamil Nadu Agricultural University, Thanjavur, India
| | - Natesan Senthil
- Agricultural College and Research Institute, Tamil Nadu Agricultural University, Madurai, India
| | | | - Ken Naito
- Genetic Resources Center, NARO, Tsukuba, Japan
- *Correspondence: Ken Naito,
| |
|
41
|
Zhao X, Luo M, Li Z, Zhong P, Cheng Y, Lai F, Wang X, Min J, Bai M, Yang Y, Cheng H, Zhou R. Chromosome-scale assembly of the Monopterus genome. Gigascience 2018; 7:4982940. [PMID: 29688346 PMCID: PMC5946948 DOI: 10.1093/gigascience/giy046] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 08/20/2017] [Accepted: 04/16/2018] [Indexed: 01/10/2023] Open
Abstract
Background The teleost fish Monopterus albus is emerging as a new model for biological studies due to its natural sex transition and small genome, in addition to its enormous economic and potential medical value. However, no genomic information for the Monopterus is currently available. Findings Here, we sequenced and de novo assembled the genome of M. albus and report the de novo chromosome assembly by FISH walking assisted by conserved synteny (Cafs). Using Cafs, 328 scaffolds were assembled into 12 chromosomes, which covered genomic sequences of 555 Mb, accounting for 81.3% of the sequences assembled in scaffolds (∼689 Mb). A total of 18,660 genes were mapped on the chromosomes and showed a nonrandom distribution along chromosomes. Conclusions We report the first reference genome of the Monopterus and provide an efficient Cafs strategy for a de novo chromosome-level assembly of the Monopterus genome, which provides a valuable resource, not only for further studies in genetics, evolution, and development, particularly sex determination, but also for breed improvement of the species.
Affiliation(s)
- Xueya Zhao
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Majing Luo
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Zhigang Li
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Pei Zhong
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Yibin Cheng
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Fengling Lai
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Xin Wang
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| | - Jiumeng Min
- BGI Genomics, BGI-Shenzhen, Shenzhen 518083, P. R. China
| | - Mingzhou Bai
- BGI Genomics, BGI-Shenzhen, Shenzhen 518083, P. R. China
| | - Yulan Yang
- BGI Genomics, BGI-Shenzhen, Shenzhen 518083, P. R. China
| | - Hanhua Cheng
- BGI Genomics, BGI-Shenzhen, Shenzhen 518083, P. R. China
| | - Rongjia Zhou
- Hubei Key Laboratory of Cell Homeostasis, Laboratory of Molecular and Developmental Genetics, College of Life Sciences, Wuhan University, Wuhan 430072, P. R. China
| |
|
42
|
Grau JH, Hackl T, Koepfli KP, Hofreiter M. Improving draft genome contiguity with reference-derived in silico mate-pair libraries. Gigascience 2018; 7:4980916. [PMID: 29688527 PMCID: PMC5967465 DOI: 10.1093/gigascience/giy029] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/26/2017] [Accepted: 03/20/2018] [Indexed: 11/29/2022] Open
Abstract
Background Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. Findings In order to improve genome contiguity, we have developed Cross-Species Scaffolding—a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. Conclusions We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
Affiliation(s)
- José Horacio Grau
- Museum für Naturkunde Berlin, Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin, Invalidenstraße 43, 10115 Berlin, Germany
| | - Thomas Hackl
- Massachusetts Institute of Technology, Department of Civil and Environmental Engineering, 15 Vassar Street, Cambridge, MA, 02139, USA
| | - Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, National Zoological Park, 3001 Connecticut Avenue NW, Washington, D.C. 20008, USA; Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, Sredniy Prospekt 41A, St. Petersburg, 199004, Russia
| | - Michael Hofreiter
- Faculty of Mathematics and Life Sciences, Institute of Biochemistry and Biology, Unit of General Zoology-Evolutionary Adaptive Genomics, University of Potsdam, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany
| |
|
43
|
Kolmogorov M, Armstrong J, Raney BJ, Streeter I, Dunn M, Yang F, Odom D, Flicek P, Keane TM, Thybert D, Paten B, Pham S. Chromosome assembly of large and complex genomes using multiple references. Genome Res 2018; 28:1720-1732. [PMID: 30341161 PMCID: PMC6211643 DOI: 10.1101/gr.236273.118] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 03/06/2018] [Accepted: 09/24/2018] [Indexed: 11/25/2022]
Abstract
Despite the rapid development of sequencing technologies, the assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout 2, a reference-assisted assembly tool that works for large and complex genomes. By taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout 2 infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. By using Ragout 2, we transformed NGS assemblies of 16 laboratory mouse strains into sets of complete chromosomes, leaving <5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long Pacific Biosciences (PacBio) reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. We applied Ragout 2 to the Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared with other genomes from the Muridae family. Chromosome painting maps confirmed most large-scale rearrangements that Ragout 2 detected. We applied Ragout 2 to improve draft sequences of three ape genomes that have recently been published. Ragout 2 transformed three sets of contigs (generated using PacBio reads only) into chromosome-scale assemblies with accuracy comparable to chromosome assemblies generated in the original study using BioNano maps, Hi-C, BAC clones, and FISH.
Affiliation(s)
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego, California 92093, USA
| | - Joel Armstrong
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
| | - Brian J Raney
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ian Streeter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
| | - Matthew Dunn
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
| | - Fengtang Yang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
| | - Duncan Odom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Cancer Research UK Cambridge Institute, University of Cambridge, CB2 0RE Cambridge, United Kingdom
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
| | - Thomas M Keane
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- School of Life Sciences, University of Nottingham, Nottingham NG7 2NR, United Kingdom
| | - David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Earlham Institute, Norwich Research Park, Norwich NR4 7UG, United Kingdom
| | - Benedict Paten
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
| | - Son Pham
- BioTuring Incorporated, San Diego, California 92121, USA
| |
|
44
|
O’Connor RE, Farré M, Joseph S, Damas J, Kiazim L, Jennings R, Bennett S, Slack EA, Allanson E, Larkin DM, Griffin DK. Chromosome-level assembly reveals extensive rearrangement in saker falcon and budgerigar, but not ostrich, genomes. Genome Biol 2018; 19:171. [PMID: 30355328 PMCID: PMC6201548 DOI: 10.1186/s13059-018-1550-x] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 01/12/2018] [Accepted: 09/24/2018] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND The number of de novo genome sequence assemblies is increasing exponentially; however, relatively few contain one scaffold/contig per chromosome. Such assemblies are essential for studies of genotype-to-phenotype association, gross genomic evolution, and speciation. Inter-species differences can arise from chromosomal changes fixed during evolution, and we previously hypothesized that a higher fraction of elements under negative selection contributed to avian-specific phenotypes and avian genome organization stability. The objective of this study is to generate chromosome-level assemblies of three avian species (saker falcon, budgerigar, and ostrich) previously reported as karyotypically rearranged compared to most birds. We also test the hypothesis that the density of conserved non-coding elements is associated with the positions of evolutionary breakpoint regions. RESULTS We used reference-assisted chromosome assembly, PCR, and lab-based molecular approaches, to generate chromosome-level assemblies of the three species. We mapped inter- and intrachromosomal changes from the avian ancestor, finding no interchromosomal rearrangements in the ostrich genome, despite it being previously described as chromosomally rearranged. We found that the average density of conserved non-coding elements in evolutionary breakpoint regions is significantly reduced. Fission evolutionary breakpoint regions have the lowest conserved non-coding element density, and intrachromosomal evolutionary breakpoint regions have the highest. CONCLUSIONS The tools used here can generate inexpensive, efficient chromosome-level assemblies, with > 80% assigned to chromosomes, which is comparable to genomes assembled using high-density physical or genetic mapping. Moreover, conserved non-coding elements are important factors in defining where rearrangements, especially interchromosomal, are fixed during evolution without deleterious effects.
Affiliation(s)
| | - Marta Farré
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, UK
| | - Sunitha Joseph
- School of Biosciences, University of Kent, Canterbury, UK
| | - Joana Damas
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, UK
| | - Lucas Kiazim
- School of Biosciences, University of Kent, Canterbury, UK
| | | | - Sophie Bennett
- School of Biosciences, University of Kent, Canterbury, UK
| | - Eden A Slack
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, UK
| | - Emily Allanson
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, UK
| | - Denis M Larkin
- Department of Comparative Biomedical Sciences, Royal Veterinary College, University of London, London, UK
| | | |
|
45
|
Chromosome Level Genome Assembly and Comparative Genomics between Three Falcon Species Reveals an Unusual Pattern of Genome Organisation. DIVERSITY-BASEL 2018. [DOI: 10.3390/d10040113] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 01/06/2023]
Abstract
Whole genome assemblies are crucial for understanding a wide range of aspects of falcon biology, including morphology, ecology, and physiology, and are thus essential for their care and conservation. A key aspect of the genome of any species is its karyotype, which can then be linked to the whole genome sequence to generate a so-called chromosome-level assembly. Chromosome-level assemblies are essential for marker assisted selection and genotype-phenotype correlations in breeding regimes, as well as determining patterns of gross genomic evolution. To date, only two falcon species have been sequenced and neither initially were assembled to the chromosome level. Falcons have atypical avian karyotypes with fewer chromosomes than other birds, presumably brought about by wholesale fusion. To date, however, published chromosome preparations are of poor quality, few chromosomes have been distinguished and standard ideograms have not been made. The purposes of this study were to generate analyzable karyotypes and ideograms of peregrine, saker, and gyr falcons, report on our recent generation of chromosome level sequence assemblies of peregrine and saker falcons, and for the first time, sequence the gyr falcon genome. Finally, we aimed to generate comparative genomic data between all three species and the reference chicken genome. Results revealed a diploid number of 2n = 50 for peregrine falcon and 2n = 52 for saker and gyr through high quality banded chromosomes. Standard ideograms that are generated here helped to map predicted chromosomal fragments (PCFs) from the genome sequences directly to chromosomes and thus generate chromosome level sequence assemblies for peregrine and saker falcons. Whole genome sequencing was successful in gyr falcon, but read depth and coverage was not sufficient to generate a chromosome level assembly. Nonetheless, comparative genomics revealed no differences in genome organization between gyr and saker falcons. When compared to peregrine falcon, saker/gyr differed by one interchromosomal and seven intrachromosomal rearrangements (a fusion plus seven inversions), whereas peregrine and saker/gyr differ from the reference chicken genome by 14/13 fusions (11 microchromosomal) and six fissions. The chromosomal differences between the species could potentially provide the basis of a screening test for hybrid animals.
|
46
|
Abstract
Increasing our understanding of Earth's biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet's organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project's goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.
|
47
|
Turner I, Garimella KV, Iqbal Z, McVean G. Integrating long-range connectivity information into de Bruijn graphs. Bioinformatics 2018; 34:2556-2565. [PMID: 29554215 PMCID: PMC6061703 DOI: 10.1093/bioinformatics/bty157] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 06/08/2017] [Revised: 11/25/2017] [Accepted: 03/14/2018] [Indexed: 12/27/2022] Open
Abstract
Motivation The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. Results We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Availability and implementation Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary information Supplementary data are available at Bioinformatics online.
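Editorial illustration (not part of the cited abstract): the limitation described above, that a plain de Bruijn graph discards long-range read information, can be seen in a toy Python sketch. The function name `build_de_bruijn` and the two example reads are hypothetical, and this is not the McCortex or LdBG implementation; it only builds an unannotated graph of (k-1)-mers.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two hypothetical reads that share the k-mer "CGT" but differ on both sides.
reads = ["ACGTT", "CCGTA"]
graph = build_de_bruijn(reads, k=3)

# Nodes with more than one successor are branch points in the graph.
branching = {node: sorted(nxt) for node, nxt in graph.items() if len(nxt) > 1}
print(branching)
```

Node `GT` ends up with two successors, and the graph alone no longer records that the path entering through `AC` continued to `TT` while the path entering through `CC` continued to `TA`. That pairing is the kind of read-level connectivity the abstract says the Linked de Bruijn Graph stores as annotations.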
Affiliation(s)
- Isaac Turner
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Kiran V Garimella
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| | - Zamin Iqbal
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
| | - Gil McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
| |
|
48
|
Rando HM, Farré M, Robson MP, Won NB, Johnson JL, Buch R, Bastounes ER, Xiang X, Feng S, Liu S, Xiong Z, Kim J, Zhang G, Trut LN, Larkin DM, Kukekova AV. Construction of Red Fox Chromosomal Fragments from the Short-Read Genome Assembly. Genes (Basel) 2018; 9:E308. [PMID: 29925783 PMCID: PMC6027122 DOI: 10.3390/genes9060308] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/17/2018] [Revised: 05/19/2018] [Accepted: 06/04/2018] [Indexed: 01/08/2023] Open
Abstract
The genome of a red fox (Vulpes vulpes) was recently sequenced and assembled using next-generation sequencing (NGS). The assembly is of high quality, with 94X coverage and a scaffold N50 of 11.8 Mbp, but is split into 676,878 scaffolds, some of which are likely to contain assembly errors. Fragmentation and misassembly hinder accurate gene prediction and downstream analysis such as the identification of loci under selection. Therefore, assembly of the genome into chromosome-scale fragments was an important step towards developing this genomic model. Scaffolds from the assembly were aligned to the dog reference genome and compared to the alignment of an outgroup genome (cat) against the dog to identify syntenic sequences among species. The program Reference-Assisted Chromosome Assembly (RACA) then integrated the comparative alignment with the mapping of the raw sequencing reads generated during assembly against the fox scaffolds. The 128 sequence fragments RACA assembled were compared to the fox meiotic linkage map to guide the construction of 40 chromosomal fragments. This computational approach to assembly was facilitated by prior research in comparative mammalian genomics, and the continued improvement of the red fox genome can in turn offer insight into canid and carnivore chromosome evolution. This assembly is also necessary for advancing genetic research in foxes and other canids.
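Editorial illustration (not part of the cited abstract): the abstract reports a high scaffold N50 alongside a very large scaffold count. A minimal Python sketch of the usual N50 definition shows how both can hold at once. The function name `n50` and the scaffold lengths are made up for illustration and are not data from the fox assembly.

```python
def n50(lengths):
    """Return the length L such that scaffolds of length >= L hold at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of lengths")


# Hypothetical assembly: two long scaffolds plus many very short ones.
compact = [10, 8, 5, 3, 2, 2]
fragmented = [10, 8] + [1] * 6

print(n50(compact), n50(fragmented))
```

Both toy assemblies give the same N50 even though the second has more pieces, because N50 is driven by where the bulk of the sequence sits rather than by the number of scaffolds.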
Affiliation(s)
- Halie M Rando
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
- Department of Animal Science, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Marta Farré
- Department of Comparative Biomedical Science, Royal Veterinary College, London NW1 0TU, UK.
| | - Michael P Robson
- Department of Computer Science, College of Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Naomi B Won
- Department of Animal Science, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Jennifer L Johnson
- Department of Animal Science, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Ronak Buch
- Department of Computer Science, College of Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Estelle R Bastounes
- Department of Animal Science, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Xueyan Xiang
- China National Genebank, BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
| | - Shaohong Feng
- China National Genebank, BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
| | - Shiping Liu
- China National Genebank, BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
| | - Zijun Xiong
- China National Genebank, BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
| | - Jaebum Kim
- Department of Stem Cell and Regenerative Biology, Konkuk University, Seoul 05029, Korea.
| | - Guojie Zhang
- China National Genebank, BGI-Shenzhen, Shenzhen 518083, Guangdong, China.
- Section for Ecology and Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China.
| | - Lyudmila N Trut
- Institute of Cytology and Genetics of the Russian Academy of Sciences, Novosibirsk 630090, Russia.
| | - Denis M Larkin
- Department of Comparative Biomedical Science, Royal Veterinary College, London NW1 0TU, UK.
| | - Anna V Kukekova
- Department of Animal Science, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| |
|
49
|
Anselmetti Y, Duchemin W, Tannier E, Chauve C, Bérard S. Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes. BMC Genomics 2018; 19:96. [PMID: 29764366 PMCID: PMC5954271 DOI: 10.1186/s12864-018-4466-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/02/2022] Open
Abstract
Background Genome rearrangements carry valuable information for phylogenetic inference or the elucidation of molecular mechanisms of adaptation. However, the detection of genome rearrangements is often hampered by current deficiencies in data and methods: Genomes obtained from short sequence reads have generally very fragmented assemblies, and comparing multiple gene orders generally leads to computationally intractable algorithmic questions. Results We present a computational method, ADseq, which, by combining ancestral gene order reconstruction, comparative scaffolding and de novo scaffolding methods, overcomes these two caveats. ADseq provides simultaneously improved assemblies and ancestral genomes, with statistical supports on all local features. Compared to previous comparative methods, it runs in polynomial time, it samples solutions in a probabilistic space, and it can handle a significantly larger gene complement from the considered extant genomes, with complex histories including gene duplications and losses. We use ADseq to provide improved assemblies and a genome history made of duplications, losses, gene translocations, rearrangements, of 18 complete Anopheles genomes, including several important malaria vectors. We also provide additional support for a differentiated mode of evolution of the sex chromosome and of the autosomes in these mosquito genomes. Conclusions We demonstrate the method’s ability to improve extant assemblies accurately through a procedure simulating realistic assembly fragmentation. We study a debated issue regarding the phylogeny of the Gambiae complex group of Anopheles genomes in the light of the evolution of chromosomal rearrangements, suggesting that the phylogenetic signal they carry can differ from the phylogenetic signal carried by gene sequences, more prone to introgression.
Affiliation(s)
- Yoann Anselmetti
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France; Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France
| | - Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France; INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Eric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558, 43 Boulevard du 11 novembre 1918, Villeurbanne cedex, 69622, France; INRIA Grenoble - Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot-Saint-Martin, 38330, France
| | - Cedric Chauve
- Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, V5A1S6, BC, Canada
| | - Sèverine Bérard
- ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France.
| |
|
50
|
Huang YX, Li L, Yang L, Zhang Y. Technique of laser chromosome welding for chromosome repair and artificial chromosome creation. BIOMEDICAL OPTICS EXPRESS 2018; 9:1783-1794. [PMID: 29675319 PMCID: PMC5905923 DOI: 10.1364/boe.9.001783] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/22/2018] [Revised: 03/05/2018] [Accepted: 03/06/2018] [Indexed: 06/08/2023]
Abstract
Here we report a technique of laser chromosome welding that uses a violet pulse laser micro-beam for welding. The technique can integrate any size of a desired chromosome fragment into recipient chromosomes by combining with other techniques of laser chromosome manipulation such as chromosome cutting, moving, and stretching. We demonstrated that our method could perform chromosomal modifications with high precision, speed and ease of use in the absence of restriction enzymes, DNA ligases and DNA polymerases. Unlike the conventional methods such as de novo artificial chromosome synthesis, our method has no limitation on the size of the inserted chromosome fragment. The inserted DNA size can be precisely defined and the processed chromosome can retain its intrinsic structure and integrity. Therefore, our technique provides a high quality alternative approach to directed genetic recombination, and can be used for chromosomal repair, removal of defects and artificial chromosome creation. The technique may also have applicability on the manipulation and extension of large pieces of synthetic DNA.
|