1
Cicherski A, Lisiecka A, Dojer N. AlfaPang: alignment free algorithm for pangenome graph construction. Algorithms Mol Biol 2025; 20:7. [PMID: 40375333 DOI: 10.1186/s13015-025-00277-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Accepted: 04/09/2025] [Indexed: 05/18/2025] [Open Access]
Abstract
The success of pangenome-based approaches to genomics analysis depends largely on the existence of efficient methods for constructing pangenome graphs that are applicable to large genome collections. In the current paper we present AlfaPang, a new pangenome graph building algorithm. AlfaPang is based on a novel alignment-free approach that allows pangenome graphs to be constructed using significantly fewer computational resources than state-of-the-art tools. The code of AlfaPang is freely available at https://github.com/AdamCicherski/AlfaPang.
Affiliation(s)
- Adam Cicherski
  - Institute of Informatics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland.
- Anna Lisiecka
  - Institute of Informatics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland.
- Norbert Dojer
  - Institute of Informatics, University of Warsaw, Banacha 2, 02-097, Warsaw, Poland.
2
Vrček L, Bresson X, Laurent T, Schmitz M, Kawaguchi K, Šikić M. Geometric deep learning framework for de novo genome assembly. Genome Res 2025; 35:839-849. [PMID: 39472021 PMCID: PMC12047240 DOI: 10.1101/gr.279307.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 10/18/2024] [Indexed: 03/16/2025]
Abstract
The critical stage of every de novo genome assembler is identifying paths in assembly graphs that correspond to the reconstructed genomic sequences. The existing algorithmic methods struggle with this, primarily due to repetitive regions causing complex graph tangles, leading to fragmented assemblies. Here, we introduce GNNome, a framework for path identification based on geometric deep learning that enables training models on assembly graphs without relying on existing assembly strategies. By leveraging only the symmetries inherent to the problem, GNNome reconstructs assemblies from PacBio HiFi reads with contiguity and quality comparable to those of the state-of-the-art tools across several species. With every new genome assembled telomere-to-telomere, the amount of reliable training data at our disposal increases. Combining the straightforward generation of abundant simulated data for diverse genomic structures with the AI approach makes the proposed framework a plausible cornerstone for future work on reconstructing complex genomes with different degrees of ploidy and aneuploidy. To facilitate such developments, we make the framework and the best-performing model publicly available, provided as a tool that can directly be used to assemble new haploid genomes.
Affiliation(s)
- Lovro Vrček
  - Genome Institute of Singapore, A*STAR, Singapore 138672
  - Faculty of Electrical Engineering and Computing, University of Zagreb, 10000, Zagreb, Croatia
- Xavier Bresson
  - School of Computing, National University of Singapore, Singapore 117417
- Thomas Laurent
  - Department of Mathematics, Loyola Marymount University, Los Angeles, California 90045, USA
- Martin Schmitz
  - Genome Institute of Singapore, A*STAR, Singapore 138672
  - School of Computing, National University of Singapore, Singapore 117417
- Kenji Kawaguchi
  - School of Computing, National University of Singapore, Singapore 117417
- Mile Šikić
  - Genome Institute of Singapore, A*STAR, Singapore 138672
  - Faculty of Electrical Engineering and Computing, University of Zagreb, 10000, Zagreb, Croatia
3
Wagner J, Olson ND, McDaniel J, Harris L, Pinto BJ, Jáspez D, Muñoz-Barrera A, Rubio-Rodríguez LA, Lorenzo-Salazar JM, Flores C, Sahraeian SME, Narzisi G, Byrska-Bishop M, Evani US, Xiao C, Lake JA, Fontana P, Greenberg C, Freed D, Mootor MFE, Boutros PC, Murray L, Shafin K, Carroll A, Sedlazeck FJ, Wilson M, Zook JM. Small variant benchmark from a complete assembly of X and Y chromosomes. Nat Commun 2025; 16:497. [PMID: 39779690 PMCID: PMC11711550 DOI: 10.1038/s41467-024-55710-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 12/19/2024] [Indexed: 01/11/2025] [Open Access]
Abstract
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers along with research and clinical laboratories to evaluate variant detection on male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate the benchmark set reliably identifies errors in challenging genomic regions and across short and long read callsets. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges benchmarking variants in long homopolymers and tandem repeats, complex gene conversions, copy number variable gene arrays, and human satellites.
Affiliation(s)
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Nathan D Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Lindsay Harris
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA
- Brendan J Pinto
  - Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ 85281 USA
  - Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA
- David Jáspez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Adrián Muñoz-Barrera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Luis A Rubio-Rodríguez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- José M Lorenzo-Salazar
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
- Carlos Flores
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Granadilla de Abona, Spain
  - CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, Madrid, Spain
  - Research Unit, Hospital Universitario Nuestra Señora de Candelaria, Instituto de Investigación Sanitaria de Canarias, Santa Cruz de Tenerife, Spain
  - Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, Las Palmas de Gran Canaria, Spain
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Peter Fontana
  - Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Dr. Mailstop 8940, Gaithersburg, MD, USA
- Craig Greenberg
  - Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Dr. Mailstop 8940, Gaithersburg, MD, USA
- Paul C Boutros
  - Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA
- Kishwar Shafin
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Andrew Carroll
  - Google Inc, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
- Fritz J Sedlazeck
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Melissa Wilson
  - Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Justin M Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA.
4
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
  - Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  - Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
  - The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
  - Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
5
Kaj I, Mugal CF, Müller-Widmann R. A Wright-Fisher graph model and the impact of directional selection on genetic variation. Theor Popul Biol 2024; 159:13-24. [PMID: 39019334 DOI: 10.1016/j.tpb.2024.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 07/06/2024] [Accepted: 07/12/2024] [Indexed: 07/19/2024]
Abstract
We introduce a multi-allele Wright-Fisher model with mutation and selection such that allele frequencies at a single locus are traced by the path of a hybrid jump-diffusion process. The state space of the process is given by the vertices and edges of a topological graph, i.e. edges are unit intervals. Vertices represent monomorphic population states and positions on the edges mark the biallelic proportions of ancestral and derived alleles during polymorphic segments. In this setting, mutations can only occur at monomorphic loci. We derive the stationary distribution in mutation-selection-drift equilibrium and obtain the expected allele frequency spectrum under large population size scaling. For the extended model with multiple independent loci we derive rigorous upper bounds for a wide class of associated measures of genetic variation. Within this framework we present mathematically precise arguments to conclude that the presence of directional selection reduces the magnitude of genetic variation, as constrained by the bounds for neutral evolution.
Affiliation(s)
- Ingemar Kaj
  - Department of Mathematics, Uppsala University, Uppsala, Sweden.
- Carina F Mugal
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
  - Laboratory of Biometry and Evolutionary Biology, University of Lyon 1, UMR CNRS 5558, Villeurbanne, France
6
Šimková H, Câmara AS, Mascher M. Hi-C techniques: from genome assemblies to transcription regulation. J Exp Bot 2024; 75:5357-5365. [PMID: 38430521 DOI: 10.1093/jxb/erae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 02/28/2024] [Indexed: 03/04/2024]
Abstract
The invention of chromosome conformation capture (3C) techniques, in particular the key method Hi-C providing genome-wide information about chromatin contacts, revolutionized the way we study the three-dimensional organization of the nuclear genome and how it affects transcription, replication, and DNA repair. Because the frequency of chromatin contacts between pairs of genomic segments predictably relates to the distance in the linear genome, the information obtained by Hi-C has also proved useful for scaffolding genomic sequences. Here, we review recent improvements in experimental procedures of Hi-C and its various derivatives, such as Micro-C, HiChIP, and Capture Hi-C. We assess the advantages and limitations of the techniques, and present examples of their use in recent plant studies. We also report on progress in the development of computational tools used in assembling genome sequences.
Affiliation(s)
- Hana Šimková
  - Institute of Experimental Botany of the Czech Academy of Sciences, Slechtitelu 31, CZ-779 00 Olomouc, Czech Republic
- Amanda Souza Câmara
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, Gatersleben, D-06466 Seeland, Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, Gatersleben, D-06466 Seeland, Germany
7
Liu C, Wu P, Wu X, Zhao X, Chen F, Cheng X, Zhu H, Wang O, Xu M. AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline. Front Genet 2024; 15:1421565. [PMID: 39130747 PMCID: PMC11310137 DOI: 10.3389/fgene.2024.1421565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024] [Open Access]
Abstract
Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct the complete haplotypes due to high sequencing error rates of long reads and limited capturing efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating both contiguous and accurate diploid genomes. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Affiliation(s)
- Chao Liu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Pei Wu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Xue Wu
  - BGI Research, Shenzhen, China
- Hongmei Zhu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Ou Wang
  - BGI Research, Shenzhen, China
- Mengyang Xu
  - BGI Research, Shenzhen, China
  - BGI Research, Qingdao, China
8
Yu Y, Chen H. Human pangenome: far-reaching implications in precision medicine. Front Med 2024; 18:403-409. [PMID: 38157192 DOI: 10.1007/s11684-023-1039-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 10/15/2023] [Indexed: 01/03/2024]
Affiliation(s)
- Yingyan Yu
  - Department of General Surgery of Ruijin Hospital, Shanghai Institute of Digestive Surgery, and Shanghai Key Laboratory for Gastric Neoplasms, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Hongzhuan Chen
  - Shuguang Lab for Future Health, Shanghai Frontier Science Center of TCM Chemical Biology, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China.
9
Cochetel N, Minio A, Guarracino A, Garcia JF, Figueroa-Balderas R, Massonnet M, Kasuga T, Londo JP, Garrison E, Gaut BS, Cantu D. A super-pangenome of the North American wild grape species. Genome Biol 2023; 24:290. [PMID: 38111050 PMCID: PMC10729490 DOI: 10.1186/s13059-023-03133-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 07/14/2023] [Accepted: 11/30/2023] [Indexed: 12/20/2023] [Open Access]
Abstract
BACKGROUND Capturing the genetic diversity of wild relatives is crucial for improving crops because wild species are valuable sources of agronomic traits that are essential to enhance the sustainability and adaptability of domesticated cultivars. Genetic diversity across a genus can be captured in super-pangenomes, which provide a framework for interpreting genomic variations. RESULTS Here we report the sequencing, assembly, and annotation of nine wild North American grape genomes, which are phased and scaffolded at chromosome scale. We generate a reference-unbiased super-pangenome using pairwise whole-genome alignment methods, revealing the extent of the genomic diversity among wild grape species from sequence to gene level. The pangenome graph captures genomic variation between haplotypes within a species and across the different species, and it accurately assesses the similarity of hybrids to their parents. The species selected to build the pangenome are a great representation of the genus, as illustrated by capturing known allelic variants in the sex-determining region and for Pierce's disease resistance loci. Using pangenome-wide association analysis, we demonstrate the utility of the super-pangenome by effectively mapping short reads from genus-wide samples and identifying loci associated with salt tolerance in natural populations of grapes. CONCLUSIONS This study highlights how a reference-unbiased super-pangenome can reveal the genetic basis of adaptive traits from wild relatives and accelerate crop breeding research.
Affiliation(s)
- Noé Cochetel
  - Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
- Andrea Minio
  - Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Human Technopole, Milan, Italy
- Jadran F Garcia
  - Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
- Mélanie Massonnet
  - Department of Viticulture and Enology, University of California Davis, Davis, CA, USA
- Takao Kasuga
  - Crops Pathology and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Davis, CA, USA
- Jason P Londo
  - Horticulture Section, School of Integrative Plant Science, Cornell AgriTech, Cornell University, Geneva, NY, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Brandon S Gaut
  - Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, CA, USA
- Dario Cantu
  - Department of Viticulture and Enology, University of California Davis, Davis, CA, USA.
  - Genome Center, University of California Davis, Davis, CA, USA.
10
Santangelo JS, Battlay P, Hendrickson BT, Kuo WH, Olsen KM, Kooyers NJ, Johnson MTJ, Hodgins KA, Ness RW. Haplotype-Resolved, Chromosome-Level Assembly of White Clover (Trifolium repens L., Fabaceae). Genome Biol Evol 2023; 15:evad146. [PMID: 37542471 PMCID: PMC10433932 DOI: 10.1093/gbe/evad146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 07/24/2023] [Accepted: 07/29/2023] [Indexed: 08/07/2023] [Open Access]
Abstract
White clover (Trifolium repens L.; Fabaceae) is an important forage and cover crop in agricultural pastures around the world and is increasingly used in evolutionary ecology and genetics to understand the genetic basis of adaptation. Historically, improvements in white clover breeding practices and assessments of genetic variation in nature have been hampered by a lack of high-quality genomic resources for this species, owing in part to its high heterozygosity and allotetraploid hybrid origin. Here, we use PacBio HiFi and chromosome conformation capture (Omni-C) technologies to generate a chromosome-level, haplotype-resolved genome assembly for white clover totaling 998 Mbp (scaffold N50 = 59.3 Mbp) and 1 Gbp (scaffold N50 = 58.6 Mbp) for haplotypes 1 and 2, respectively, with each haplotype arranged into 16 chromosomes (8 per subgenome). We additionally provide a functionally annotated haploid mapping assembly (968 Mbp, scaffold N50 = 59.9 Mbp), which drastically improves on the existing reference assembly in both contiguity and assembly accuracy. We annotated 78,174 protein-coding genes, resulting in protein BUSCO completeness scores of 99.6% and 99.3% against the embryophyta_odb10 and fabales_odb10 lineage datasets, respectively.
Affiliation(s)
- James S Santangelo
  - Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Paul Battlay
  - School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
- Wen-Hsi Kuo
  - Department of Biology, Washington University in St. Louis, St. Louis, Missouri, USA
- Kenneth M Olsen
  - Department of Biology, Washington University in St. Louis, St. Louis, Missouri, USA
- Nicholas J Kooyers
  - Department of Biology, University of Louisiana, Lafayette, Louisiana, USA
- Marc T J Johnson
  - Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Kathryn A Hodgins
  - School of Biological Sciences, Monash University, Melbourne, Victoria, Australia
- Rob W Ness
  - Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
11
Secomandi S, Gallo GR, Sozzoni M, Iannucci A, Galati E, Abueg L, Balacco J, Caprioli M, Chow W, Ciofi C, Collins J, Fedrigo O, Ferretti L, Fungtammasan A, Haase B, Howe K, Kwak W, Lombardo G, Masterson P, Messina G, Møller AP, Mountcastle J, Mousseau TA, Ferrer Obiol J, Olivieri A, Rhie A, Rubolini D, Saclier M, Stanyon R, Stucki D, Thibaud-Nissen F, Torrance J, Torroni A, Weber K, Ambrosini R, Bonisoli-Alquati A, Jarvis ED, Gianfranceschi L, Formenti G. A chromosome-level reference genome and pangenome for barn swallow population genomics. Cell Rep 2023; 42:111992. [PMID: 36662619 PMCID: PMC10044405 DOI: 10.1016/j.celrep.2023.111992] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/09/2022] [Revised: 07/20/2022] [Accepted: 01/04/2023] [Indexed: 01/20/2023] [Open Access]
Abstract
Insights into the evolution of non-model organisms are limited by the lack of reference genomes of high accuracy, completeness, and contiguity. Here, we present a chromosome-level, karyotype-validated reference genome and pangenome for the barn swallow (Hirundo rustica). We complement these resources with a reference-free multialignment of the reference genome with other bird genomes and with the most comprehensive catalog of genetic markers for the barn swallow. We identify potentially conserved and accelerated genes using the multialignment and estimate genome-wide linkage disequilibrium using the catalog. We use the pangenome to infer core and accessory genes and to detect variants using it as a reference. Overall, these resources will foster population genomics studies in the barn swallow, enable detection of candidate genes in comparative genomics studies, and help reduce bias toward a single reference genome.
Affiliation(s)
- Simona Secomandi
  - Department of Biosciences, University of Milan, Milan, Italy
  - Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Guido R Gallo
  - Department of Biosciences, University of Milan, Milan, Italy
- Alessio Iannucci
  - Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
- Elena Galati
  - Department of Biosciences, University of Milan, Milan, Italy
- Linelle Abueg
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Jennifer Balacco
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Manuela Caprioli
  - Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
- Claudio Ciofi
  - Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Luca Ferretti
  - Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
- Bettina Haase
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Woori Kwak
  - Department of Medical and Biological Sciences, The Catholic University of Korea, Bucheon 14662, Korea
- Gianluca Lombardo
  - Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Anders P Møller
  - Ecologie Systématique Evolution, Université Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Orsay Cedex, France
- Timothy A Mousseau
  - Department of Biological Sciences, University of South Carolina, Columbia, SC 29208, USA
- Joan Ferrer Obiol
  - Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
- Anna Olivieri
  - Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Diego Rubolini
  - Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
- Roscoe Stanyon
  - Department of Biology, University of Florence, Sesto Fiorentino (FI), Italy
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Antonio Torroni
  - Department of Biology and Biotechnology "L. Spallanzani", University of Pavia, Pavia, Italy
- Roberto Ambrosini
  - Department of Environmental Sciences and Policy, University of Milan, Milan, Italy
- Andrea Bonisoli-Alquati
  - Department of Biological Sciences, California State Polytechnic University - Pomona, Pomona, CA, USA
- Erich D Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA.
12
Marone MP, Singh HC, Pozniak CJ, Mascher M. A technical guide to TRITEX, a computational pipeline for chromosome-scale sequence assembly of plant genomes. Plant Methods 2022; 18:128. [PMID: 36461065 PMCID: PMC9719158 DOI: 10.1186/s13007-022-00964-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/13/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND As complete and accurate genome sequences are becoming easier to obtain, more researchers wish to get one or more of them to support their research endeavors. Reliable and well-documented sequence assembly workflows find use in reference or pangenome projects. RESULTS We describe modifications to the TRITEX genome assembly workflow motivated by the rise of fast and easy long-read contig assembly of inbred plant genomes and the routine deployment of the toolchains in pangenome projects. New features include the use as surrogates of or complements to dense genetic maps and the introduction of user-editable tables to make the curation of contig placements easier and more intuitive. CONCLUSION Even maximally contiguous sequence assemblies of the telomere-to-telomere sort, and to a yet greater extent, the fragmented kind require validation, correction, and comparison to reference standards. As pangenomics is burgeoning, these tasks are bound to become more widespread and TRITEX is one tool to get them done. This technical guide is supported by a step-by-step computational tutorial accessible under https://tritexassembly.bitbucket.io/. The TRITEX source code is hosted under this URL: https://bitbucket.org/tritexassembly.
Affiliation(s)
- Marina Püpke Marone
  - Leibniz-Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
  - Department of Genetics, Evolution, Microbiology and Immunology, University of Campinas, Campinas, Brazil
- Harmeet Chawla Singh
  - Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, S7N 5A8, Canada
  - Department of Plant Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada
- Curtis J Pozniak
  - Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, S7N 5A8, Canada
- Martin Mascher
  - Leibniz-Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany.
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
13
Chen Y, Miao Y, Bai W, Lin K, Pang E. Characteristics and potential functional effects of long insertions in Asian butternuts. BMC Genomics 2022; 23:732. [PMID: 36307757 PMCID: PMC9617325 DOI: 10.1186/s12864-022-08961-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/17/2022] [Indexed: 11/10/2022] [Open Access]
Abstract
Background
Structural variants (SVs) play important roles in adaptive evolution and species diversification. In plants especially, many phenotypes related to environmental response have been found to be associated with SVs. Despite the prevalence and significance of SVs, long insertions remain poorly detected and studied in all but model species.
Results
We used whole-genome resequencing of paired reads from 80 Asian butternuts to detect long insertions and further analyse their characteristics and potential functional effects. By combining mapping-based and de novo assembly-based methods, we obtained a pangenome of multiple related species representing higher taxonomic groups. We obtained 89,312 distinct contigs totaling 147,773,999 base pairs (bp) of new sequences, of which 347 were putative long insertions placed in the reference genome. Most of the putative long insertions appeared in multiple species; in contrast, only 62 putative long insertions appeared in one species, which may be involved in the response to the environment. A total of 65 putative long insertions fell into 61 distinct protein-coding genes involved in plant development, and 105 putative long insertions fell upstream of 106 distinct protein-coding genes involved in cellular respiration. In total, 3,367 genes were annotated in 2,606 contigs. We propose PLAINS (https://github.com/CMB-BNU/PLAINS.git), a streamlined, comprehensive pipeline for the prediction and analysis of long insertions using whole-genome resequencing.
Conclusions
Our study lays down an important foundation for further whole-genome long insertion studies, allowing the investigation of their effects by experiments.