1
van Westerhoven A, Fokkens L, Wissink K, Kema GJ, Rep M, Seidl M. Reference-free identification and pangenome analysis of accessory chromosomes in a major fungal plant pathogen. NAR Genom Bioinform 2025; 7:lqaf034. [PMID: 40176926 PMCID: PMC11963757 DOI: 10.1093/nargab/lqaf034] [Received: 12/12/2024] [Revised: 02/19/2025] [Accepted: 03/14/2025] [Indexed: 04/05/2025]
Abstract
Accessory chromosomes, found in some but not all individuals of a species, play an important role in pathogenicity and host specificity in fungal plant pathogens. However, their variability complicates reference-based analysis, especially when these chromosomes are missing in the reference genome. Pangenome variation graphs offer a reference-free alternative for studying these chromosomes. Here, we constructed a pangenome variation graph for 73 diverse Fusarium oxysporum genomes, a major fungal plant pathogen with a compartmentalized genome that includes conserved core as well as variable accessory chromosomes. To obtain insights into accessory chromosome dynamics, we first constructed a chromosome similarity network using all-vs-all similarity mapping. We identified eleven core chromosomes conserved across all strains and a substantial number of highly variable accessory chromosomes. Some of these accessory chromosomes are host-specific and likely play a role in determining host range. Using a k-mer based approach, we further identified the presence of these accessory chromosomes in all available (581) F. oxysporum assemblies and corroborated the occurrence of host-specific accessory chromosomes. To further analyze the evolution of chromosomes in F. oxysporum, we constructed a pangenome variation graph per group of homologous chromosomes. This reveals that accessory chromosomes are composed of different stretches of accessory regions, and possibly rearrangements between accessory regions gave rise to these mosaic accessory chromosomes. Furthermore, we show that accessory chromosomes are likely horizontally transferred in natural populations. Our findings demonstrate that a pangenome variation graph is a powerful approach to elucidate the evolutionary dynamics of accessory chromosomes in F. oxysporum, which is not only a useful resource for Fusarium but also provides a framework for similar analyses in other species containing accessory chromosomes.
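The k-mer based presence test mentioned in this abstract can be illustrated with a containment score: the fraction of a chromosome's k-mers that also occur in an assembly. The abstract does not describe the authors' actual procedure, so the following is only a minimal sketch under stated assumptions; the function names, the default `k=21`, and the use of canonical k-mers are all hypothetical choices made for illustration.

```python
def canonical_kmers(seq, k):
    """Return the set of canonical k-mers of seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both strands map to the same key.
    K-mers containing characters other than A, C, G, T are skipped.
    """
    complement = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            revcomp = kmer.translate(complement)[::-1]
            kmers.add(min(kmer, revcomp))
    return kmers


def containment(chrom_seq, assembly_seqs, k=21):
    """Fraction of the chromosome's canonical k-mers found in the assembly.

    chrom_seq: sequence of one (hypothetical) accessory chromosome.
    assembly_seqs: iterable of contig sequences from one assembly.
    Returns 0.0 when the chromosome is shorter than k.
    """
    chrom = canonical_kmers(chrom_seq, k)
    if not chrom:
        return 0.0
    assembly = set()
    for contig in assembly_seqs:
        assembly |= canonical_kmers(contig, k)
    return len(chrom & assembly) / len(chrom)
```

In such a scheme, a chromosome would be called present in an assembly when the score exceeds a chosen cutoff; the cutoff itself would have to be calibrated on assemblies with known chromosome content.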
Affiliation(s)
- Anouk C van Westerhoven
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3583CH, Utrecht, the Netherlands
  - Laboratory of Phytopathology, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands
- Like Fokkens
  - Laboratory of Phytopathology, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands
- Kyran Wissink
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3583CH, Utrecht, the Netherlands
- Gert H J Kema
  - Laboratory of Phytopathology, Wageningen University & Research, Droevendaalsesteeg 1, 6708PB, Wageningen, the Netherlands
- Martijn Rep
  - Molecular Plant Pathology, Swammerdam Institute of Life Sciences, University of Amsterdam, 1090GE, Amsterdam, the Netherlands
- Michael F Seidl
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3583CH, Utrecht, the Netherlands
2
Gao Y, Yang L, Kuhn K, Li W, Zanton G, Bowman M, Zhao P, Zhou Y, Fang L, Cole JB, Rosen BD, Ma L, Li C, Baldwin RL, Van Tassell CP, Zhang Z, Smith TPL, Liu GE. Long read and preliminary pangenome analyses reveal breed-specific structural variations and novel sequences in Holstein and Jersey cattle. J Adv Res 2025:S2090-1232(25)00258-9. [PMID: 40258473 DOI: 10.1016/j.jare.2025.04.014] [Received: 10/13/2024] [Revised: 04/06/2025] [Accepted: 04/10/2025] [Indexed: 04/23/2025]
Abstract
INTRODUCTION: Most SV studies in livestock rely on short-read sequencing, posing challenges in accurately characterizing large genomic variants due to their limited read length.
OBJECTIVES: Our goal is to reveal structural variation and novel sequences specific to Holstein and Jersey cattle breeds using long-read and pan-genome analyses.
METHODS: We sequenced 20 Holsteins and 8 Jersey cattle using PacBio HiFi to 20×, and integrated five read-based and one assembly-based SV caller to determine SVs.
RESULTS: We assembled the 28 genomes averaging 3.25 Gb with a contig N50 of 69.36 Mb and using the ARS-UCD1.2 reference, we acquired Holstein/Jersey SV catalogs with 74,068/54,689 events spanning 202/135 Mb (7.43%/4.97% of the genome). SVs were enriched in less conserved, non-coding, and non-regulatory regions. Comparing Holsteins with differing feed efficiency (FE), SVs unique to high FE were linked to energy metabolism and olfactory receptors, while those specific to low FE were associated with material transport. We constructed Holstein/Jersey pangenome graphs with 148,598/105,875 nodes and 208,891/147,990 edges, representing 47,028/37,137 biallelic and multi-allelic events, and 63.75/42.34 Mb of novel sequence. We observed SV count saturation with 20 Holsteins, while adding Jerseys significantly increased the SV count, highlighting breed-specific SV events.
CONCLUSION: Our long-read data and SV catalogs are valuable resources, revealing that the cattle genome is more complex than previously thought.
Affiliation(s)
- Yahui Gao
  - State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China; Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Liu Yang
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Kristen Kuhn
  - USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- Wenli Li
  - US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Geoffrey Zanton
  - US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Mary Bowman
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Pengju Zhao
  - Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China.
- Yang Zhou
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
- Lingzhao Fang
  - Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
- John B Cole
  - Council on Dairy Cattle Breeding, 4201 Northview Dr, Bowie, MD 20716, USA; Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program, and the Genetics Institute, University of Florida, Gainesville, FL 32611-0910, USA; Department of Animal Science, North Carolina State University, Raleigh, NC 27695-7621, USA.
- Benjamin D Rosen
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Li Ma
  - Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Congjun Li
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Ransom L Baldwin
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Curtis P Van Tassell
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Zhe Zhang
  - State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
- Timothy P L Smith
  - USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- George E Liu
  - Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
3
Vicedomini R, Andreace F, Dufresne Y, Chikhi R, Duitama González C. MUSET: set of utilities for constructing abundance unitig matrices from sequencing data. Bioinformatics 2025; 41:btaf054. [PMID: 39898792 PMCID: PMC11897428 DOI: 10.1093/bioinformatics/btaf054] [Received: 08/12/2024] [Revised: 12/20/2024] [Accepted: 01/30/2025] [Indexed: 02/04/2025]
Abstract
SUMMARY: MUSET is a novel set of utilities designed to efficiently construct abundance unitig matrices from sequencing data. Unitig matrices extend the concept of k-mer matrices by merging overlapping k-mers that unambiguously belong to the same sequence. MUSET addresses the limitations of current software by integrating k-mer counting and unitig extraction to generate unitig matrices containing abundance values, as opposed to only presence-absence in previous tools. These matrices preserve variations between samples while reducing disk space and the number of rows compared to k-mer matrices. We evaluated MUSET's performance using datasets derived from a 618-GB collection of ancient oral sequencing samples, producing a filtered unitig matrix that records abundances in <10 h and 20 GB memory.
AVAILABILITY AND IMPLEMENTATION: MUSET is open source and publicly available under the AGPL-3.0 licence in GitHub at https://github.com/CamilaDuitama/muset. Source code is implemented in C++ and provided with kmat_tools, a collection of tools for processing k-mer matrices. Version v0.5.1 is available on Zenodo with DOI 10.5281/zenodo.14164801.
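The idea of merging overlapping k-mers that unambiguously belong to the same sequence can be sketched in a few lines of Python. This is not MUSET's implementation (which is written in C++) and makes simplifying assumptions: k-mers are treated on the forward strand only, and the per-unitig abundance is taken as the mean count of its k-mers, which is one plausible summary rather than the tool's documented behaviour. The names `build_unitigs` and `unitig_abundance` are hypothetical.

```python
def build_unitigs(kmers):
    """Compact a set of equal-length k-mers into unitigs (forward strand only).

    Two k-mers are merged when the first has exactly one successor and that
    successor has exactly one predecessor, i.e. the overlap is unambiguous.
    """
    kmers = set(kmers)

    def successors(km):
        return [km[1:] + b for b in "ACGT" if km[1:] + b in kmers]

    def predecessors(km):
        return [b + km[:-1] for b in "ACGT" if b + km[:-1] in kmers]

    def extend_right(km):
        succ = successors(km)
        if len(succ) == 1 and len(predecessors(succ[0])) == 1:
            return succ[0]
        return None

    def extend_left(km):
        pred = predecessors(km)
        if len(pred) == 1 and len(successors(pred[0])) == 1:
            return pred[0]
        return None

    seen = set()
    unitigs = []
    for km in sorted(kmers):
        if km in seen:
            continue
        # Walk left to find the first k-mer of this unitig; the visited
        # set stops the walk if the path is a cycle.
        start, visited = km, {km}
        while True:
            prev = extend_left(start)
            if prev is None or prev in visited:
                break
            start = prev
            visited.add(prev)
        # Walk right from the start, appending one base per merged k-mer.
        unitig, current = start, start
        seen.add(start)
        while True:
            nxt = extend_right(current)
            if nxt is None or nxt in seen:
                break
            unitig += nxt[-1]
            seen.add(nxt)
            current = nxt
        unitigs.append(unitig)
    return sorted(unitigs)


def unitig_abundance(unitig, counts, k):
    """Mean count of the unitig's k-mers in one sample (assumed summary)."""
    kms = [unitig[i:i + k] for i in range(len(unitig) - k + 1)]
    return sum(counts.get(km, 0) for km in kms) / len(kms)
```

For example, the 3-mers `ACG`, `CGT`, `GTT`, `CGG` and `GGA` branch after `ACG`, so they compact into three unitigs (`ACG`, `CGGA`, `CGTT`) instead of five rows, which is the row reduction the abstract refers to.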
Affiliation(s)
- Riccardo Vicedomini
  - GenScale, Université de Rennes, Inria RBA, CNRS UMR 6074, F-35000 Rennes, France
- Francesco Andreace
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics Unit, F-75015 Paris, France
  - Sorbonne Université, Collège Doctoral, F-75005 Paris, France
- Yoann Dufresne
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics Unit, F-75015 Paris, France
  - Sorbonne Université, Collège Doctoral, F-75005 Paris, France
- Rayan Chikhi
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics Unit, F-75015 Paris, France
- Camila Duitama González
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics Unit, F-75015 Paris, France
4
Edwards SV, Fang B, Khost D, Kolyfetis GE, Cheek RG, DeRaad DA, Chen N, Fitzpatrick JW, McCormack JE, Funk WC, Ghalambor CK, Garrison E, Guarracino A, Li H, Sackton TB. Comparative population pangenomes reveal unexpected complexity and fitness effects of structural variants. bioRxiv 2025:2025.02.11.637762. [PMID: 39990470 PMCID: PMC11844517 DOI: 10.1101/2025.02.11.637762] [Indexed: 02/25/2025]
Abstract
Structural variants (SVs) are widespread in vertebrate genomes, yet their evolutionary dynamics remain poorly understood. Using 45 long-read de novo genome assemblies and pangenome tools, we analyze SVs within three closely related species of North American jays (Aphelocoma, scrub-jays) displaying a 60-fold range in effective population size. We find rapid evolution of genome architecture, including ~100 Mb variation in genome size driven by dynamic satellite landscapes with unexpectedly long (> 10 kb) repeat units and widespread variation in gene content, influencing gene expression. SVs exhibit slightly deleterious dynamics modulated by variant length and population size, with strong evidence of adaptive fixation only in large populations. Our results demonstrate how population size shapes the distribution of SVs and the importance of pangenomes to characterizing genomic diversity.
Affiliation(s)
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
  - Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Bohao Fang
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
  - Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Danielle Khost
  - Informatics Group, Harvard University, 52 Oxford St, Cambridge, MA, 02138, USA
- George E Kolyfetis
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Rebecca G Cheek
  - Department of Biology, Graduate Degree Program in Ecology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523, USA
- Devon A DeRaad
  - Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd, Los Angeles, CA, 90041, USA
- Nancy Chen
  - Department of Biology, University of Rochester, 477 Hutchison Hall, Box 270211, Rochester, NY, 14627, USA
- John W Fitzpatrick
  - Cornell Lab of Ornithology, Cornell University, 159 Sapsucker Woods Rd, Ithaca, NY, 14850, USA
- John E. McCormack
  - Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd, Los Angeles, CA, 90041, USA
- W. Chris Funk
  - Department of Biology, Graduate Degree Program in Ecology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523, USA
- Cameron K Ghalambor
  - Department of Biology, Norwegian University of Science and Technology, Høgskoleringen 5, Realfagbygget D1-137, Trondheim, 7491, Norway
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S. Manassas Street, Memphis, TN, 38163, USA
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S. Manassas Street, Memphis, TN, 38163, USA
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Mailstop: CLSB 11007, Boston, MA, 02215
- Timothy B Sackton
  - Informatics Group, Harvard University, 52 Oxford St, Cambridge, MA, 02138, USA
5
Ren J, Kou W, Xu Y, Lu M, Gong M, Zhang X, Liu Z, Li H, Yang Q, Shah AM, Zhu F, Hou Z, Xu N, Jiang Y, Wang F. Pan-genome analyses add ∼1000 genes to the "complete" genome assembly of chicken. J Genet Genomics 2025; 52:116-119. [PMID: 39510408 DOI: 10.1016/j.jgg.2024.10.009] [Received: 01/16/2024] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/15/2024]
Affiliation(s)
- Jilong Ren
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Wenyan Kou
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yuan Xu
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Meixuan Lu
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Mian Gong
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xinmiao Zhang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhenyu Liu
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hengkuan Li
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Qimeng Yang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Ali Mujtaba Shah
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China
- Feng Zhu
  - Frontiers Science Center for Molecular Design Breeding (MOE) & College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Zhuocheng Hou
  - Frontiers Science Center for Molecular Design Breeding (MOE) & College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Naiyi Xu
  - College of Animal Science and Technology, Southwest University, Chongqing 400715, China; Chongqing Key Laboratory of Herbivore Science, Chongqing 400715, China
- Yu Jiang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China; Key Laboratory of Livestock Biology, Northwest A&F University, Yangling, Shaanxi 712100, China.
- Fei Wang
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100, China.
6
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
  - Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  - Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
  - The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
  - Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
7
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
The fields of Metagenomics and Metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
8
Jayakodi M, Lu Q, Pidon H, Rabanus-Wallace MT, Bayer M, Lux T, Guo Y, Jaegle B, Badea A, Bekele W, Brar GS, Braune K, Bunk B, Chalmers KJ, Chapman B, Jørgensen ME, Feng JW, Feser M, Fiebig A, Gundlach H, Guo W, Haberer G, Hansson M, Himmelbach A, Hoffie I, Hoffie RE, Hu H, Isobe S, König P, Kale SM, Kamal N, Keeble-Gagnère G, Keller B, Knauft M, Koppolu R, Krattinger SG, Kumlehn J, Langridge P, Li C, Marone MP, Maurer A, Mayer KFX, Melzer M, Muehlbauer GJ, Murozuka E, Padmarasu S, Perovic D, Pillen K, Pin PA, Pozniak CJ, Ramsay L, Pedas PR, Rutten T, Sakuma S, Sato K, Schüler D, Schmutzer T, Scholz U, Schreiber M, Shirasawa K, Simpson C, Skadhauge B, Spannagl M, Steffenson BJ, Thomsen HC, Tibbits JF, Nielsen MTS, Trautewig C, Vequaud D, Voss C, Wang P, Waugh R, Westcott S, Rasmussen MW, Zhang R, Zhang XQ, Wicker T, Dockter C, Mascher M, Stein N. Structural variation in the pangenome of wild and domesticated barley. Nature 2024; 636:654-662. [PMID: 39537924 PMCID: PMC11655362 DOI: 10.1038/s41586-024-08187-1] [Received: 10/09/2023] [Accepted: 10/09/2024] [Indexed: 11/16/2024]
Abstract
Pangenomes are collections of annotated genome sequences of multiple individuals of a species [1]. The structural variants uncovered by these datasets are a major asset to genetic analysis in crop plants [2]. Here we report a pangenome of barley comprising long-read sequence assemblies of 76 wild and domesticated genomes and short-read sequence data of 1,315 genotypes. An expanded catalogue of sequence variation in the crop includes structurally complex loci that are rich in gene copy number variation. To demonstrate the utility of the pangenome, we focus on four loci involved in disease resistance, plant architecture, nutrient release and trichome development. Novel allelic variation at a powdery mildew resistance locus and population-specific copy number gains in a regulator of vegetative branching were found. Expansion of a family of starch-cleaving enzymes in elite malting barleys was linked to shifts in enzymatic activity in micro-malting trials. Deletion of an enhancer motif is likely to change the developmental trajectory of the hairy appendages on barley grains. Our findings indicate that allelic diversity at structurally complex loci may have helped crop plants to adapt to new selective regimes in agricultural ecosystems.
Affiliation(s)
- Murukarthick Jayakodi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - Department of Soil and Crop Sciences, Texas A&M AgriLife Research-Dallas, Dallas, TX, USA
- Qiongxian Lu
  - Carlsberg Research Laboratory, Copenhagen, Denmark
- Hélène Pidon
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - IPSiM, University of Montpellier, CNRS, INRAE, Institut Agro, Montpellier, France
- Thomas Lux
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Yu Guo
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Benjamin Jaegle
  - Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Ana Badea
  - Brandon Research and Development Centre, Agriculture et Agri-Food Canada, Brandon, Manitoba, Canada
- Wubishet Bekele
  - Ottawa Research and Development Centre, Agriculture et Agri-Food Canada, Ottawa, Ontario, Canada
- Gurcharn S Brar
  - Faculty of Land and Food Systems, The University of British Columbia, Vancouver, British Columbia, Canada
  - Faculty of Agricultural, Life and Environmental Sciences (ALES), University of Alberta, Edmonton, Alberta, Canada
- Boyke Bunk
  - DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany
- Kenneth J Chalmers
  - School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia, Australia
- Brett Chapman
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
- Jia-Wu Feng
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Manuel Feser
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Anne Fiebig
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Heidrun Gundlach
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Georg Haberer
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Mats Hansson
  - Department of Biology, Lund University, Lund, Sweden
- Axel Himmelbach
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Iris Hoffie
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Robert E Hoffie
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Haifei Hu
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Patrick König
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Sandip M Kale
  - Carlsberg Research Laboratory, Copenhagen, Denmark
  - Department of Agroecology, Aarhus University, Slagelse, Denmark
- Nadia Kamal
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Gabriel Keeble-Gagnère
  - Agriculture Victoria, Department of Jobs, Precincts and Regions, Agribio, La Trobe University, Bundoora, Victoria, Australia
- Beat Keller
  - Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
- Manuela Knauft
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Ravi Koppolu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Simon G Krattinger
  - Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Jochen Kumlehn
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Peter Langridge
  - School of Agriculture, Food and Wine, University of Adelaide, Urrbrae, South Australia, Australia
- Chengdao Li
  - Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
  - Department of Primary Industry and Regional Development, Government of Western Australia, Perth, Western Australia, Australia
  - College of Agriculture, Yangtze University, Jingzhou, China
- Marina P Marone
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Andreas Maurer
  - Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
- Klaus F X Mayer
  - PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University Munich, Freising, Germany
- Michael Melzer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Gary J Muehlbauer
  - Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Sudharsan Padmarasu
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Dragan Perovic
  - Institute for Resistance Research and Stress Tolerance, Julius Kuehn-Institute (JKI), Federal Research Centre for Cultivated Plants, Quedlinburg, Germany
| | - Klaus Pillen
- Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
| | | | - Curtis J Pozniak
- Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
| | | | | | - Twan Rutten
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Shun Sakuma
- Faculty of Agriculture, Tottori University, Tottori, Japan
| | - Kazuhiro Sato
- Kazusa DNA Research Institute, Kisarazu, Japan
- Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan
| | - Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | - Thomas Schmutzer
- Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
| | - Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | | | | | | | | | - Manuel Spannagl
- PGSB-Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
| | - Brian J Steffenson
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, USA
| | | | - Josquin F Tibbits
- Agriculture Victoria, Department of Jobs, Precincts and Regions, Agribio, La Trobe University, Bundoora, Victoria, Australia
| | | | - Corinna Trautewig
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
| | | | - Cynthia Voss
- Carlsberg Research Laboratory, Copenhagen, Denmark
| | - Penghao Wang
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
| | - Robbie Waugh
- The James Hutton Institute, Dundee, UK
- School of Life Sciences, University of Dundee, Dundee, UK
| | - Sharon Westcott
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
| | | | | | - Xiao-Qi Zhang
- Western Crop Genetics Alliance, Food Futures Institute/School of Agriculture, Murdoch University, Murdoch, Western Australia, Australia
| | - Thomas Wicker
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland.
| | | | - Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
- Institute of Agricultural and Nutritional Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany.
9
Fang B, Edwards SV. Fitness consequences of structural variation inferred from a House Finch pangenome. Proc Natl Acad Sci U S A 2024; 121:e2409943121. [PMID: 39531493 PMCID: PMC11588099 DOI: 10.1073/pnas.2409943121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 10/03/2024] [Indexed: 11/16/2024] Open
Abstract
Genomic structural variants (SVs) play a crucial role in adaptive evolution, yet their average fitness effects and characterization with pangenome tools are understudied in wild animal populations. We constructed a pangenome for House Finches (Haemorhous mexicanus), a model for studies of host-pathogen coevolution, using long-read sequence data on 16 individuals (32 de novo-assembled haplotypes) and one outgroup. We identified 887,118 SVs larger than 50 base pairs, mostly (60%) involving repetitive elements, with reduced SV diversity in the eastern US as a result of its introduction by humans. The distribution of fitness effects of genome-wide SVs was estimated using maximum likelihood approaches and revealed that SVs in both coding and noncoding regions were on average more deleterious than smaller indels or single nucleotide polymorphisms. The reference-free pangenome facilitated identification of a > 10-My-old, 11-megabase-long pericentric inversion on chromosome 1. We found that the genotype frequencies of the inversion, estimated from 135 birds widely sampled temporally and geographically, increased steadily over the 25 y since House Finches were first exposed to the bacterial pathogen Mycoplasma gallisepticum and showed signatures of balancing selection, capturing genes related to immunity and telomerase activity. We also observed shorter telomeres in populations with a greater number of years exposure to Mycoplasma. Our study illustrates the utility of long-read sequencing and pangenome methods for understanding wild animal populations, estimating fitness effects of genome-wide SVs, and advancing our understanding of adaptive evolution through structural variation.
Affiliation(s)
- Bohao Fang
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
  - Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
- Scott V. Edwards
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
  - Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138
10
Chikhi R, Dufresne Y, Medvedev P. Constructing and personalizing population pangenome graphs. Nat Methods 2024; 21:1980-1981. [PMID: 39433877 PMCID: PMC11962983 DOI: 10.1038/s41592-024-02402-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Pangenome graphs signify a new frontier in genome representation. Recent advances in constructing and personalizing them mark progress in this area.
Affiliation(s)
- Rayan Chikhi
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics Unit, Paris, France
- Yoann Dufresne
  - Institut Pasteur, Université Paris Cité, Sequence Bioinformatics Unit, Paris, France
- Paul Medvedev
  - The Pennsylvania State University, State College, PA, USA
11
Heumos S, Heuer ML, Hanssen F, Heumos L, Guarracino A, Heringer P, Ehmele P, Prins P, Garrison E, Nahnsen S. Cluster-efficient pangenome graph construction with nf-core/pangenome. Bioinformatics 2024; 40:btae609. [PMID: 39400346 PMCID: PMC11568064 DOI: 10.1093/bioinformatics/btae609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 09/16/2024] [Accepted: 10/10/2024] [Indexed: 10/15/2024] Open
Abstract
MOTIVATION: Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. However, current construction methods often introduce biases, excluding complex sequences or relying on references. The PanGenome Graph Builder (PGGB) addresses these issues. To date, though, there is no state-of-the-art pipeline allowing for easy deployment, efficient and dynamic use of available resources, and scalable usage at the same time. RESULTS: To overcome these limitations, we present nf-core/pangenome, a reference-unbiased approach implemented in Nextflow following nf-core's best practices. Leveraging biocontainers ensures portability and seamless deployment in High-Performance Computing (HPC) environments. Unlike PGGB, nf-core/pangenome distributes alignments across cluster nodes, enabling scalability. Demonstrating its efficiency, we constructed pangenome graphs for 1000 human chromosome 19 haplotypes and 2146 Escherichia coli sequences, achieving a two to threefold speedup compared to PGGB without increasing greenhouse gas emissions. AVAILABILITY AND IMPLEMENTATION: nf-core/pangenome is released under the MIT open-source license, available on GitHub and Zenodo, with documentation accessible at https://nf-co.re/pangenome/docs/usage.
Affiliation(s)
- Simon Heumos
  - Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, 72076, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
  - M3 Research Center, University Hospital Tübingen, Tübingen, 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, 72076, Germany
- Michael L Heuer
  - University of California, Berkeley, Berkeley, CA 94720, United States
- Friederike Hanssen
  - Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, 72076, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
  - M3 Research Center, University Hospital Tübingen, Tübingen, 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, 72076, Germany
- Lukas Heumos
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Munich, 85764, Germany
  - Comprehensive Pneumology Center with the CPC-M bioArchive, Helmholtz Zentrum Munich, Member of the German Center for Lung Research (DZL), Munich, 81377, Germany
  - TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, 81377, Germany
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, United States
  - Human Technopole, Milan 20157, Italy
- Peter Heringer
  - Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, 72076, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
  - M3 Research Center, University Hospital Tübingen, Tübingen, 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, 72076, Germany
- Philipp Ehmele
  - Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Munich, 85764, Germany
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, United States
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, United States
- Sven Nahnsen
  - Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, 72076, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, 72076, Germany
  - M3 Research Center, University Hospital Tübingen, Tübingen, 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, 72076, Germany
12
Kaur H, Shannon LM, Samac DA. A stepwise guide for pangenome development in crop plants: an alfalfa (Medicago sativa) case study. BMC Genomics 2024; 25:1022. [PMID: 39482604 PMCID: PMC11526573 DOI: 10.1186/s12864-024-10931-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 10/21/2024] [Indexed: 11/03/2024] Open
Abstract
BACKGROUND: The concept of pangenomics and the importance of structural variants is gaining recognition within the plant genomics community. Due to advancements in sequencing and computational technology, it has become feasible to sequence the entire genome of numerous individuals of a single species at a reasonable cost. Pangenomes have been constructed for many major diploid crops, including rice, maize, soybean, sorghum, pearl millet, peas, sunflower, grapes, and mustards. However, pangenomes for polyploid species are relatively scarce and are available in only few crops including wheat, cotton, rapeseed, and potatoes. MAIN BODY: In this review, we explore the various methods used in crop pangenome development, discussing the challenges and implications of these techniques based on insights from published pangenome studies. We offer a systematic guide and discuss the tools available for constructing a pangenome and conducting downstream analyses. Alfalfa, a highly heterozygous, cross pollinated and autotetraploid forage crop species, is used as an example to discuss the concerns and challenges offered by polyploid crop species. We conducted a comparative analysis using linear and graph-based methods by constructing an alfalfa graph pangenome using three publicly available genome assemblies. To illustrate the intricacies captured by pangenome graphs for a complex crop genome, we used five different gene sequences and aligned them against the three graph-based pangenomes. The comparison of the three graph pangenome methods reveals notable variations in the genomic variation captured by each pipeline. CONCLUSION: Pangenome resources are proving invaluable by offering insights into core and dispensable genes, novel gene discovery, and genome-wide patterns of variation. Developing user-friendly online portals for linear pangenome visualization has made these resources accessible to the broader scientific and breeding community. However, challenges remain with graph-based pangenomes including compatibility with other tools, extraction of sequence for regions of interest, and visualization of genetic variation captured in pangenome graphs. These issues necessitate further refinement of tools and pipelines to effectively address the complexities of polyploid, highly heterozygous, and cross-pollinated species.
Affiliation(s)
- Harpreet Kaur
  - Department of Horticultural Science, University of Minnesota, St. Paul, MN, 55108, USA
- Laura M Shannon
  - Department of Horticultural Science, University of Minnesota, St. Paul, MN, 55108, USA
- Deborah A Samac
  - USDA-ARS, Plant Science Research Unit, St. Paul, MN, 55108, USA
13
Ndiaye M, Prieto-Baños S, Fitzgerald LM, Yazdizadeh Kharrazi A, Oreshkov S, Dessimoz C, Sedlazeck FJ, Glover N, Majidian S. When less is more: sketching with minimizers in genomics. Genome Biol 2024; 25:270. [PMID: 39402664 PMCID: PMC11472564 DOI: 10.1186/s13059-024-03414-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 10/01/2024] [Indexed: 10/19/2024] Open
Abstract
The exponential increase in sequencing data calls for conceptual and computational advances to extract useful biological insights. One such advance, minimizers, allows for reducing the quantity of data handled while maintaining some of its key properties. We provide a basic introduction to minimizers, cover recent methodological developments, and review the diverse applications of minimizers to analyze genomic data, including de novo genome assembly, metagenomics, read alignment, read correction, and pangenomes. We also touch on alternative data sketching techniques including universal hitting sets, syncmers, or strobemers. Minimizers and their alternatives have rapidly become indispensable tools for handling vast amounts of data.
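To make the idea in this abstract concrete, here is a minimal Python sketch of window minimizers: from every window of `w` consecutive k-mers, only the smallest k-mer is kept, so fewer k-mers need to be stored. The function name `window_minimizers`, the parameters `k` and `w`, and the use of plain lexicographic order are assumptions made for illustration; they are not taken from the review, and practical tools typically order k-mers by a hash and handle reverse complements.

```python
def window_minimizers(seq: str, k: int, w: int) -> list[tuple[int, str]]:
    """Return sorted (position, k-mer) pairs selected as window minimizers.

    Illustrative assumption: k-mers are compared lexicographically and ties
    are broken by taking the leftmost k-mer in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window over w consecutive k-mers and keep the smallest one.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # min() returns the first minimal index, i.e. the leftmost on ties.
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return sorted(selected)


if __name__ == "__main__":
    # Neighbouring windows often share a minimizer, which is why the
    # selected set is smaller than the full list of k-mers.
    print(window_minimizers("ACGTACGT", k=3, w=3))
```

In this toy input, six k-mers are reduced to three selected positions because overlapping windows agree on the same minimizer.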
Affiliation(s)
- Malick Ndiaye
  - Department of Fundamental Microbiology, UNIL, Lausanne, Switzerland
- Silvia Prieto-Baños
  - Department of Computational Biology, UNIL, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sergey Oreshkov
  - Department of Endocrinology, Diabetology, Metabolism, CHUV, Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Computational Biology, UNIL, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Natasha Glover
  - Department of Computational Biology, UNIL, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Sina Majidian
  - Department of Computational Biology, UNIL, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
14
Matthews CA, Watson-Haigh NS, Burton RA, Sheppard AE. A gentle introduction to pangenomics. Brief Bioinform 2024; 25:bbae588. [PMID: 39552065 PMCID: PMC11570541 DOI: 10.1093/bib/bbae588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 09/12/2024] [Accepted: 11/01/2024] [Indexed: 11/19/2024] Open
Abstract
Pangenomes have emerged in response to limitations associated with traditional linear reference genomes. In contrast to a traditional reference that is (usually) assembled from a single individual, pangenomes aim to represent all of the genomic variation found in a group of organisms. The term 'pangenome' is currently used to describe multiple different types of genomic information, and limited language is available to differentiate between them. This is frustrating for researchers working in the field and confusing for researchers new to the field. Here, we provide an introduction to pangenomics relevant to both prokaryotic and eukaryotic organisms and propose a formalization of the language used to describe pangenomes (see the Glossary) to improve the specificity of discussion in the field.
Affiliation(s)
- Chelsea A Matthews
  - School of Agriculture, Food and Wine, Waite Campus, University of Adelaide, Urrbrae, South Australia 5064, Australia
- Nathan S Watson-Haigh
  - Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Victoria 3000, Australia
  - South Australian Genomics Centre, SAHMRI, North Terrace, Adelaide, South Australia 5000, Australia
  - Alkahest Inc., San Carlos, CA 94070, United States
- Rachel A Burton
  - School of Agriculture, Food and Wine, Waite Campus, University of Adelaide, Urrbrae, South Australia 5064, Australia
- Anna E Sheppard
  - School of Biological Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia
15
Schreiber M, Jayakodi M, Stein N, Mascher M. Plant pangenomes for crop improvement, biodiversity and evolution. Nat Rev Genet 2024; 25:563-577. [PMID: 38378816 PMCID: PMC7616794 DOI: 10.1038/s41576-024-00691-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/14/2023] [Indexed: 02/22/2024]
Abstract
Plant genome sequences catalogue genes and the genetic elements that regulate their expression. Such inventories further research aims as diverse as mapping the molecular basis of trait diversity in domesticated plants or inquiries into the origin of evolutionary innovations in flowering plants millions of years ago. The transformative technological progress of DNA sequencing in the past two decades has enabled researchers to sequence ever more genomes with greater ease. Pangenomes - complete sequences of multiple individuals of a species or higher taxonomic unit - have now entered the geneticists' toolkit. The genomes of crop plants and their wild relatives are being studied with translational applications in breeding in mind. But pangenomes are applicable also in ecological and evolutionary studies, as they help classify and monitor biodiversity across the tree of life, deepen our understanding of how plant species diverged and show how plants adapt to changing environments or new selection pressures exerted by human beings.
Affiliation(s)
- Mona Schreiber
  - Department of Biology, University of Marburg, Marburg, Germany
- Murukarthick Jayakodi
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
16
Martayan I, Cazaux B, Limasset A, Marchet C. Conway-Bromage-Lyndon (CBL): an exact, dynamic representation of k-mer sets. Bioinformatics 2024; 40:i48-i57. [PMID: 38940123 PMCID: PMC11211824 DOI: 10.1093/bioinformatics/btae217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY: In this article, we introduce the Conway-Bromage-Lyndon (CBL) structure, a compressed, dynamic and exact method for representing k-mer sets. Originating from Conway and Bromage's concept, CBL innovatively employs the smallest cyclic rotations of k-mers, akin to Lyndon words, to leverage lexicographic redundancies. In order to support dynamic operations and set operations, we propose a dynamic bit vector structure that draws a parallel with Elias-Fano's scheme. This structure is encapsulated in a Rust library, demonstrating a balanced blend of construction efficiency, cache locality, and compression. Our findings suggest that CBL outperforms existing dynamic k-mer set methods. Unique to this work, CBL stands out as the only known exact k-mer structure offering in-place set operations. Its different combined abilities position it as a flexible Swiss knife structure for k-mer set management. AVAILABILITY AND IMPLEMENTATION: https://github.com/imartayan/CBL.
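The "smallest cyclic rotation" mentioned in this abstract can be illustrated with a short Python sketch. It only shows the concept of picking the lexicographically smallest rotation of a k-mer; it does not reproduce the CBL data structure or its Rust library, and the function name `smallest_rotation` is hypothetical.

```python
def smallest_rotation(kmer: str) -> str:
    """Return the lexicographically smallest cyclic rotation of a k-mer.

    Illustrative assumption: a simple O(k^2) scan over all rotations, which
    is enough for short k-mers; faster linear-time algorithms exist.
    """
    if not kmer:
        return kmer
    k = len(kmer)
    doubled = kmer + kmer  # every rotation is a length-k substring of this
    return min(doubled[i:i + k] for i in range(k))


if __name__ == "__main__":
    # All rotations of the same k-mer map to one shared representative.
    print(smallest_rotation("GATTACA"), smallest_rotation("TTACAGA"))
```

Because every rotation of a k-mer maps to the same representative, rotations can be grouped under one key, which is the kind of lexicographic redundancy the abstract alludes to.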
Affiliation(s)
- Igor Martayan
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, F-59000, France
- Bastien Cazaux
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, F-59000, France
- Antoine Limasset
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, F-59000, France
- Camille Marchet
  - Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, F-59000, France