1
Abstract
A single reference genome does not fully capture species diversity. By contrast, a pangenome incorporates multiple genomes to capture the entire set of nonredundant genes in a given species, along with its genome diversity. New sequencing technologies enable researchers to produce multiple high-quality genome sequences and catalog diverse genetic variations with better precision. Pangenomic studies have detected structural variants in plant genomes, dissected the genetic architecture of agronomic traits, and helped unravel the molecular underpinnings and evolutionary origins of plant phenotypes. The pangenome concept has further evolved into the so-called super-pangenome, which includes wild relatives within a genus or clade, and has shifted toward graph-based reference systems. Nevertheless, building pangenomes and representing complex structural variants remain challenging in many crops. Standardized computing pipelines and common data structures are needed to compare and interpret pangenomes. The growing body of plant pangenomics data requires new algorithms, large data storage capacity, and training to help researchers and breeders take advantage of newly discovered genes and genetic variants.
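The core/dispensable partition implied by this definition of a pangenome can be illustrated with a minimal Python sketch. It assumes genes have already been clustered into shared identifiers across genomes (the genome and gene names are hypothetical) and simply counts how many genomes carry each gene: genes found in every genome are core, all others are dispensable, and dispensable genes found in only one genome are private. Real pangenome pipelines add an orthology-clustering step and often use frequency thresholds rather than strict presence in all genomes.

```python
from typing import Dict, Set


def classify_pangenome(genes_by_genome: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene presence/absence data into pan, core, dispensable and private sets."""
    n_genomes = len(genes_by_genome)
    # Count how many genomes carry each (already clustered) gene identifier
    carriers: Dict[str, int] = {}
    for genes in genes_by_genome.values():
        for gene in genes:
            carriers[gene] = carriers.get(gene, 0) + 1
    pan = set(carriers)  # nonredundant gene set across all genomes
    core = {g for g, c in carriers.items() if c == n_genomes}
    dispensable = pan - core
    # Private genes are the dispensable genes seen in exactly one genome
    private = {g for g in dispensable if carriers[g] == 1}
    return {"pan": pan, "core": core, "dispensable": dispensable, "private": private}


# Hypothetical presence/absence data for three genomes
example = {
    "genome_A": {"g1", "g2", "g3"},
    "genome_B": {"g1", "g2"},
    "genome_C": {"g1", "g4"},
}
print(classify_pangenome(example)["core"])  # {'g1'}
```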
Affiliation(s)
- Murukarthick Jayakodi
- Department of Soil and Crop Sciences, Texas A&M University, College Station, Texas, USA
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, Texas, USA
- Hyeonah Shim
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
- Martin Mascher
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Leipzig, Germany
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Seeland, Germany
2
Xin H, Strickland LW, Hamilton JP, Trusky JK, Fang C, Butler NM, Douches DS, Buell CR, Jiang J. Jan and mini-Jan, a model system for potato functional genomics. Plant Biotechnology Journal 2025; 23:1243-1256. [PMID: 39846980 PMCID: PMC11933877 DOI: 10.1111/pbi.14582] [Received: 10/04/2024] [Revised: 12/28/2024] [Accepted: 01/02/2025] [Indexed: 01/24/2025]
Abstract
Potato (Solanum tuberosum) is the third-most important food crop in the world. Although the potato genome has been fully sequenced, functional genomics research of potato lags behind that of other major food crops, largely due to the lack of a model experimental potato line. Here, we present a diploid potato line, 'Jan,' which possesses all essential characteristics for facile functional genomics studies. Jan exhibits a high level of homozygosity after seven generations of self-pollination. Jan is vigorous, highly fertile and produces tubers with outstanding traits. Additionally, it demonstrates high regeneration rates and excellent transformation efficiencies. We generated a chromosome-scale genome assembly for Jan, annotated its genes and identified syntelogs relative to the potato reference genome assembly DMv6.1 to facilitate functional genomics. To miniaturize plant architecture, we developed two 'mini-Jan' lines with compact and dwarf plant stature through CRISPR/Cas9-mediated mutagenesis targeting the Dwarf and Erecta genes involved in growth. One mini-Jan mutant, mini-JanE, is fully fertile and will permit higher-throughput studies in limited growth chamber and greenhouse space. Thus, Jan and mini-Jan offer a robust model system that can be leveraged for gene editing and functional genomics research in potato.
Affiliation(s)
- Haoyang Xin
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- John P. Hamilton
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA
- Department of Crop and Soil Sciences, University of Georgia, Athens, GA, USA
- Jacob K. Trusky
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Chao Fang
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Present address: Yazhouwan National Laboratory, Sanya, China
- Nathaniel M. Butler
- Department of Horticulture, University of Wisconsin-Madison, Madison, WI, USA
- United States Department of Agriculture-Agricultural Research Service, Vegetable Crops Research Unit, Madison, WI, USA
- David S. Douches
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, MI, USA
- Michigan State University AgBioResearch, East Lansing, MI, USA
- C. Robin Buell
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA
- Department of Crop and Soil Sciences, University of Georgia, Athens, GA, USA
- Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, GA, USA
- The Plant Center, University of Georgia, Athens, GA, USA
- Jiming Jiang
- Department of Plant Biology, Michigan State University, East Lansing, MI, USA
- Michigan State University AgBioResearch, East Lansing, MI, USA
- Department of Horticulture, Michigan State University, East Lansing, MI, USA
3
Roberts MD, Davis O, Josephs EB, Williamson RJ. K-mer-based Approaches to Bridging Pangenomics and Population Genetics. Mol Biol Evol 2025; 42:msaf047. [PMID: 40111256 PMCID: PMC11925024 DOI: 10.1093/molbev/msaf047] [Received: 09/18/2024] [Revised: 01/10/2025] [Accepted: 02/04/2025] [Indexed: 03/12/2025]
Abstract
Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes can be challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that k-mers are a very useful but underutilized tool for bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of k-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different k-mer-based measures of genetic variation behave in population genetic simulations according to the choice of k, depth of sequencing coverage, and degree of data compression. Overall, we find that k-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity (π) up to values of about π = 0.025 (R² = 0.97) for neutrally evolving populations. For populations with even more variation, using shorter k-mers maintains this scalability up to at least π = 0.1. Furthermore, in our simulated populations, k-mer dissimilarity values can be reliably approximated from counting Bloom filters, highlighting a potential avenue for decreasing the memory burden of k-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using k-mers.
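The claim that k-mer dissimilarity can be approximated from counting Bloom filters can be made concrete with a small Python sketch. It assumes a Bray-Curtis-style dissimilarity on raw k-mer counts, which is only one possible measure and not necessarily one used by Roberts et al.; the filter size, the three BLAKE2b-derived hash positions, and all function names are illustrative assumptions. Because the two filters share the same size and hash functions, their counter arrays can be compared directly without storing the k-mers themselves, and since hash collisions can only merge counts, this estimate never exceeds the exact value.

```python
import hashlib
from collections import Counter
from typing import Iterator, List


def kmer_counts(seq: str, k: int) -> Counter:
    # Count every overlapping substring of length k (no reverse-complement merging)
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bray_curtis(a: Counter, b: Counter) -> float:
    # 0 for identical k-mer count spectra, 1 when no k-mer is shared
    keys = set(a) | set(b)
    num = sum(abs(a[x] - b[x]) for x in keys)
    den = sum(a[x] + b[x] for x in keys)
    return num / den if den else 0.0


class CountingBloomFilter:
    """Fixed-size array of counters indexed by several hash functions."""

    def __init__(self, size: int = 1 << 16, n_hashes: int = 3) -> None:
        self.size = size
        self.n_hashes = n_hashes
        self.counters: List[int] = [0] * size

    def _positions(self, item: str) -> Iterator[int]:
        # One BLAKE2b digest per hash index, so positions are reproducible across runs
        for h in range(self.n_hashes):
            digest = hashlib.blake2b(f"{h}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.size

    def add(self, item: str, count: int = 1) -> None:
        for pos in self._positions(item):
            self.counters[pos] += count

    def estimate(self, item: str) -> int:
        # Minimum over the item's counters; collisions can only inflate it
        return min(self.counters[pos] for pos in self._positions(item))


def build_filter(seq: str, k: int) -> CountingBloomFilter:
    cbf = CountingBloomFilter()
    for kmer, count in kmer_counts(seq, k).items():
        cbf.add(kmer, count)
    return cbf


def filter_dissimilarity(fa: CountingBloomFilter, fb: CountingBloomFilter) -> float:
    # Bray-Curtis on the counter arrays; both filters must share size and hashes
    num = sum(abs(x - y) for x, y in zip(fa.counters, fb.counters))
    den = sum(x + y for x, y in zip(fa.counters, fb.counters))
    return num / den if den else 0.0


exact = bray_curtis(kmer_counts("ACGTACGT", 3), kmer_counts("ACGTTCGT", 3))
approx = filter_dissimilarity(build_filter("ACGTACGT", 3), build_filter("ACGTTCGT", 3))
print(exact, approx)  # exact is 0.5; approx is at most 0.5
```

Memory here is fixed by the filter size rather than by the number of distinct k-mers; a smaller filter saves more memory but collides more often, pulling the estimate further below the exact value.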
Affiliation(s)
- Miles D Roberts
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Olivia Davis
- Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
- Plant Resilience Institute, Michigan State University, East Lansing, MI 48824, USA
- Robert J Williamson
- Department of Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Department of Biology and Biomedical Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
4
Hwang S, Brown NK, Ahmed OY, Jenike KM, Kovaka S, Schatz MC, Langmead B. Mem-based pangenome indexing for k-mer queries. Algorithms Mol Biol 2025; 20:3. [PMID: 40025556 PMCID: PMC11871630 DOI: 10.1186/s13015-025-00272-y] [Received: 10/30/2024] [Accepted: 02/13/2025] [Indexed: 03/04/2025]
Abstract
Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO supports both membership queries, which test k-mer presence/absence, and conservation queries, which count the number of genomes containing each k-mer in a window. MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution, with our decile-resolution Human Pangenome Reference Consortium (HPRC) index reaching 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen locus in 13.89 s, 2.5× faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
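What the two query types return can be illustrated with a deliberately naive Python baseline. This is not MEMO's MEM-based index: it builds one hash set of k-mers per genome for a fixed k, which is exactly the k-length dependence the abstract says MEMO avoids, and it would not scale to human haplotypes. The haplotype names and sequences are hypothetical, and only the forward strand is considered.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    # All distinct substrings of length k (forward strand only)
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def membership_query(window: str, genome: str, k: int) -> List[bool]:
    """For each k-mer in the window, report whether it occurs in one genome."""
    present = kmer_set(genome, k)
    return [window[i:i + k] in present for i in range(len(window) - k + 1)]


def conservation_query(window: str, genomes: Dict[str, str], k: int) -> List[int]:
    """For each k-mer in the window, count how many genomes contain it."""
    # The per-genome sets are rebuilt for every k, unlike a single MEMO index
    sets = [kmer_set(seq, k) for seq in genomes.values()]
    return [sum(window[i:i + k] in s for s in sets)
            for i in range(len(window) - k + 1)]


# Hypothetical haplotypes; real pangenome haplotypes are megabases long
haplotypes = {"h1": "ACGTACGTAA", "h2": "ACGTTTTTAA", "h3": "GGGGACGTCC"}
print(conservation_query("ACGTA", haplotypes, 4))  # [3, 1]
print(membership_query("ACGTA", haplotypes["h2"], 4))  # [True, False]
```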
Affiliation(s)
- Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
- Nathaniel K Brown
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Omar Y Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Katharine M Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
5
Michael TP. Can a plant biologist fix a thermostat? The New Phytologist 2025; 245:1403-1410. [PMID: 39748179 PMCID: PMC11754934 DOI: 10.1111/nph.20382] [Received: 05/29/2024] [Accepted: 11/23/2024] [Indexed: 01/04/2025]
Abstract
The shift to reductionist biology at the dawn of the genome era yielded a 'parts list' of plant genes and a nascent understanding of complex biological processes. Today, with the genomics era in full swing, advances in high-definition genomics have enabled precise temporal and spatial analyses of biological systems down to the single-cell level. These insights, coupled with artificial intelligence-driven in silico design, are propelling the development of the first synthetic plants. By integrating reductionist and systems approaches, researchers are reimagining plants not only as sources of food, fiber, and fuel but also as 'environmental thermostats' capable of mitigating the impacts of a changing climate.
Affiliation(s)
- Todd P. Michael
- Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037-100210, USA
6
Stack GM, Quade MA, Wilkerson DG, Monserrate LA, Bentz PC, Carey SB, Grimwood J, Toth JA, Crawford S, Harkess A, Smart LB. Comparison of Recombination Rate, Reference Bias, and Unique Pangenomic Haplotypes in Cannabis sativa Using Seven De Novo Genome Assemblies. Int J Mol Sci 2025; 26:1165. [PMID: 39940933 PMCID: PMC11818205 DOI: 10.3390/ijms26031165] [Received: 11/20/2024] [Revised: 01/20/2025] [Accepted: 01/27/2025] [Indexed: 02/16/2025]
Abstract
Genomic characterization of Cannabis sativa has accelerated rapidly in the last decade as sequencing costs have decreased and public and private interest in the species has increased. Here, we present seven new chromosome-level haplotype-phased genomes of C. sativa. All of these genotypes were alive at the time of publication, and several have numerous years of associated phenotype data. We performed a k-mer-based pangenome analysis to contextualize these assemblies among more than 200 existing assemblies. This allowed us to identify unique haplotypes and genomic diversity among Cannabis sativa genotypes. We leveraged linkage maps constructed from F2 progeny of two of the assembled genotypes to characterize the recombination rate across the genome, revealing strongly periphery-biased recombination. Lastly, we realigned a bulk segregant analysis dataset for the major-effect flowering locus Early1 to several of the new assemblies to evaluate the impact of reference bias on the mapping results and to narrow the locus to a smaller region of the chromosome. These new assemblies, combined with the continued propagation of the genotypes, will contribute to the growing body of genomic resources for C. sativa to accelerate future research efforts.
Affiliation(s)
- George M. Stack
- Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Michael A. Quade
- Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Dustin G. Wilkerson
- Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Luis A. Monserrate
- Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Philip C. Bentz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Sarah B. Carey
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jane Grimwood
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jacob A. Toth
- Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
- Alex Harkess
- HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Lawrence B. Smart
- Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456, USA
7
Xin H, Strickland LW, Hamilton JP, Trusky JK, Fang C, Butler NM, Douches DS, Buell CR, Jiang J. Jan and mini-Jan, a model system for potato functional genomics. bioRxiv 2024:2024.12.10.627817. [PMID: 39713299 PMCID: PMC11661178 DOI: 10.1101/2024.12.10.627817] [Indexed: 12/24/2024]
Abstract
Potato (Solanum tuberosum) is the third most important food crop in the world. Although the potato genome has been fully sequenced, functional genomics research of potato lags relative to other major food crops due primarily to the lack of a model experimental potato line. Here, we present a diploid potato line, 'Jan', which possesses all essential characteristics for facile functional genomics studies. Jan has a high level of homozygosity after seven generations of self-pollination. Jan is vigorous and highly fertile with outstanding tuber traits, high regeneration rates, and excellent transformation efficiencies. We generated a chromosome-scale genome assembly for Jan, annotated genes, and identified syntelogs relative to the potato reference genome assembly DMv6.1 to facilitate functional genomics. To miniaturize plant architecture, we developed two "mini-Jan" lines with compact and dwarf plant stature using CRISPR/Cas9-mediated mutagenesis targeting the Dwarf and Erecta genes related to growth. Mini-Jan mutants are fully fertile and will permit higher-throughput studies in limited growth chamber and greenhouse space. Thus, Jan and mini-Jan provide an outstanding model system that can be leveraged for gene editing and functional genomics research in potato.
Affiliation(s)
- Haoyang Xin
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Luke W. Strickland
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- John P. Hamilton
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia 30602, USA
- Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia 30602, USA
- Jacob K. Trusky
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Chao Fang
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Nathaniel M. Butler
- Department of Horticulture, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- United States Department of Agriculture-Agricultural Research Service, Vegetable Crops Research Unit, Madison, Wisconsin 53706, USA
- David S. Douches
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, East Lansing, Michigan 48824, USA
- Michigan State University AgBioResearch, East Lansing, Michigan 48824, USA
- C. Robin Buell
- Center for Applied Genetic Technologies, University of Georgia, Athens, Georgia 30602, USA
- Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia 30602, USA
- Institute of Plant Breeding, Genetics and Genomics, University of Georgia, Athens, Georgia 30602, USA
- The Plant Center, University of Georgia, Athens, Georgia 30602, USA
- Jiming Jiang
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824, USA
- Michigan State University AgBioResearch, East Lansing, Michigan 48824, USA
- Department of Horticulture, Michigan State University, East Lansing, Michigan 48824, USA
8
Bouhouch Y, Aggad D, Richet N, Rehman S, Al-Jaboobi M, Kehel Z, Esmaeel Q, Hafidi M, Jacquard C, Sanchez L. Early Detection of Both Pyrenophora teres f. teres and f. maculata in Asymptomatic Barley Leaves Using Digital Droplet PCR (ddPCR). Int J Mol Sci 2024; 25:11980. [PMID: 39596050 PMCID: PMC11593351 DOI: 10.3390/ijms252211980] [Received: 09/30/2024] [Revised: 10/28/2024] [Accepted: 11/01/2024] [Indexed: 11/28/2024]
Abstract
Efficient early pathogen detection, before symptoms appear, is crucial for optimizing disease management. In barley, the fungal pathogen Pyrenophora teres is the causative agent of net blotch disease, which exists in two forms: P. teres f. sp. teres (Ptt), which causes the net form of net blotch (NTNB), and P. teres f. sp. maculata (Ptm), which causes the spot form of net blotch (STNB). In this study, we developed primers and a TaqMan probe to detect both Ptt and Ptm. A comprehensive k-mer-based analysis was performed across a collection of P. teres genomes to identify conserved regions with potential as universal genetic markers. These regions were then analyzed for their prevalence and copy number across diverse Moroccan P. teres strains, using both a k-mer analysis for sequence identification and a phylogenetic assessment to establish genetic relatedness. The designed primer-probe set was successfully validated through qPCR, and early disease detection, prior to symptom development, was achieved using ddPCR. The k-mer analysis performed across the available P. teres genomes suggests that these sequences could serve as universal markers for P. teres, transcending environmental variation.
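The search for k-mers conserved across a genome collection, together with their copy number, can be sketched in Python. This is an illustrative set-intersection approach under stated assumptions, not the authors' pipeline: strain names and sequences are hypothetical, reverse complements are not merged into canonical k-mers, and "conserved" here means present in every input genome. The (min, max) copy number per k-mer hints at whether a candidate marker is single- or multi-copy across strains.

```python
from collections import Counter
from typing import Dict, Tuple


def conserved_kmers(genomes: Dict[str, str], k: int) -> Dict[str, Tuple[int, int]]:
    """Return k-mers present in every genome with their (min, max) copy number."""
    # Per-genome k-mer counts on the forward strand only
    counts = [Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
              for seq in genomes.values()]
    if not counts:
        return {}
    shared = set(counts[0])
    for c in counts[1:]:
        shared &= set(c)  # keep only k-mers also seen in this genome
    return {kmer: (min(c[kmer] for c in counts), max(c[kmer] for c in counts))
            for kmer in sorted(shared)}


# Hypothetical strain sequences; real inputs would be whole genome assemblies
strains = {"s1": "AACCGGTTAACCGG", "s2": "TTAACCGGCA", "s3": "GAACCGGT"}
print(conserved_kmers(strains, 5))  # {'AACCG': (1, 2), 'ACCGG': (1, 2)}
```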
Affiliation(s)
- Yassine Bouhouch
- INRAE, RIBP, Université de Reims Champagne-Ardenne, USC 1488, BP 1039 Reims, France
- Plateformes Technologiques URCATech, Plateau MOBICYTE, Université de Reims Champagne-Ardenne, BP 1039 Reims, France
- Dina Aggad
- Plateformes Technologiques URCATech, Plateau MOBICYTE, Université de Reims Champagne-Ardenne, BP 1039 Reims, France
- Nicolas Richet
- INRAE, RIBP, Université de Reims Champagne-Ardenne, USC 1488, BP 1039 Reims, France
- Sajid Rehman
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, Rabat BP 6202, Morocco
- Muamar Al-Jaboobi
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, Rabat BP 6202, Morocco
- Zakaria Kehel
- Biodiversity and Crop Improvement Program, International Center for Agricultural Research in the Dry Areas, Rabat BP 6202, Morocco
- Qassim Esmaeel
- INRAE, RIBP, Université de Reims Champagne-Ardenne, USC 1488, BP 1039 Reims, France
- Majida Hafidi
- Laboratoire de Biotechnologie Végétale et de Biologie Moléculaire, Faculté des Sciences, Université Moulay Ismail, Zitoune, Meknès BP 11201, Morocco
- Cédric Jacquard
- INRAE, RIBP, Université de Reims Champagne-Ardenne, USC 1488, BP 1039 Reims, France
- Lisa Sanchez
- INRAE, RIBP, Université de Reims Champagne-Ardenne, USC 1488, BP 1039 Reims, France
9
Longo GC, Minich JJ, Allsing N, James K, Adams-Herrmann ES, Larson W, Hartwick N, Duong T, Muhling B, Michael TP, Craig MT. Crossing the Pacific: Genomics Reveals the Presence of Japanese Sardine (Sardinops melanosticta) in the California Current Large Marine Ecosystem. Mol Ecol 2024; 33:e17561. [PMID: 39440436 DOI: 10.1111/mec.17561] [Received: 05/25/2024] [Accepted: 09/12/2024] [Indexed: 10/25/2024]
Abstract
Recent increases in the frequency and intensity of warm water anomalies and marine heatwaves have led to shifts in species ranges and assemblages. Genomic tools can be instrumental in detecting such shifts. In the early stages of a project assessing population genetic structure in Pacific Sardine (Sardinops sagax), we detected the presence of Japanese Sardine (Sardinops melanosticta) along the west coast of North America for the first time. We assembled a high-quality, chromosome-scale reference genome of the Pacific Sardine and generated low-coverage whole-genome sequencing (lcWGS) data for 345 sardines collected in the California Current Large Marine Ecosystem (CCLME) in 2021 and 2022. Fifty individuals sampled in 2022 were identified as Japanese Sardine based on strong differentiation observed in lcWGS SNP and full mitogenome data. Although we detected a single case of mitochondrial introgression, we did not observe evidence for recent hybridization events. These findings change our understanding of Sardinops spp. distribution and dispersal in the Pacific and highlight the importance of long-term monitoring programs.
Affiliation(s)
- Gary C Longo
- National Marine Fisheries Service, Southwest Fisheries Science Center, Ocean Associates, Inc., Under Contract to the National Oceanic and Atmospheric Administration, La Jolla, California, USA
- Jeremiah J Minich
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA
- Nicholas Allsing
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA
- Kelsey James
- National Marine Fisheries Service, Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration, La Jolla, California, USA
- Ella S Adams-Herrmann
- National Marine Fisheries Service, Southwest Fisheries Science Center, Ocean Associates, Inc., Under Contract to the National Oceanic and Atmospheric Administration, La Jolla, California, USA
- University of San Diego, San Diego, California, USA
- University of Central Florida, Department of Biology, Orlando, FL, USA
- Wes Larson
- National Oceanic and Atmospheric Administration, National Marine Fisheries Service, Alaska Fisheries Science Center, Auke Bay Laboratories, Juneau, Alaska, USA
- Nolan Hartwick
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA
- Tiffany Duong
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA
- Barbara Muhling
- National Marine Fisheries Service, Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration, La Jolla, California, USA
- Institute of Marine Sciences Fisheries Collaborative Program, University of California, Santa Cruz, Santa Cruz, California, USA
- Todd P Michael
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA
- Matthew T Craig
- National Marine Fisheries Service, Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration, La Jolla, California, USA
10
Sangar S, Kolage P, Chunarkar-Patil P. Species annotation using a k-mer based KNN model. Bioinformation 2024; 20:986-989. [PMID: 39917243 PMCID: PMC11795478 DOI: 10.6026/973206300200986] [Received: 09/01/2024] [Revised: 09/30/2024] [Accepted: 09/30/2024] [Indexed: 02/09/2025]
Abstract
Bacterial identification is a critical process in microbiology, clinical diagnostics, environmental monitoring, and food safety. Machine learning holds great promise for improving bacterial identification by increasing accuracy, speed, and scalability. However, challenges such as data dependency, model interpretability, and computational demands must be addressed to fully realize its potential. A k-mer-based bacterial identification algorithm is an attempt to address these issues. Sequence matching is performed using the k-nearest neighbors (KNN) technique. The workflow includes feature extraction, dataset preparation, classifier training, and label prediction based on the similarity of k-mer frequency distributions. The algorithm's performance was assessed with metrics such as F1 score and precision, and it achieved an accuracy of 93%.
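A k-mer-frequency KNN classifier of the kind described can be sketched in Python. The abstract does not state the k-mer length, distance metric, or number of neighbors, so the choices here (k = 3, Euclidean distance on relative k-mer frequencies, three neighbors) and the toy labelled sequences are assumptions. Note the two different "k"s: `k` is the k-mer length and `n_neighbors` is the KNN neighbor count.

```python
from collections import Counter
from itertools import product
from math import dist
from typing import List, Tuple


def kmer_profile(seq: str, k: int = 3) -> List[float]:
    # Relative frequency of each of the 4**k DNA k-mers, in a fixed order
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[v] for v in vocab) or 1  # k-mers with non-ACGT characters are ignored
    return [counts[v] / total for v in vocab]


def knn_predict(train: List[Tuple[str, str]], query: str,
                k: int = 3, n_neighbors: int = 3) -> str:
    """Label a query by majority vote of its nearest training profiles (Euclidean)."""
    q = kmer_profile(query, k)
    ranked = sorted(train, key=lambda item: dist(kmer_profile(item[0], k), q))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical labelled reference sequences
train = [
    ("GCGCGCGCGC", "taxon_A"), ("CGCGCGCGCG", "taxon_A"), ("GCGCGCGCGCGC", "taxon_A"),
    ("ATATATATAT", "taxon_B"), ("TATATATATA", "taxon_B"), ("ATATATATATAT", "taxon_B"),
]
print(knn_predict(train, "CGCGCGCGCGCG"))  # taxon_A
```

This sketch recomputes every training profile for each query; a practical version would precompute the profiles once and could swap in a different distance (for example cosine) without changing the voting step.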
Affiliation(s)
- Srushti Sangar
- Department of Bioinformatics, Rajiv Gandhi Institute of IT and Biotechnology, Bharati Vidyapeeth (Deemed to be University), Pune, Maharashtra, India
- Prathamesh Kolage
- Department of Bioinformatics, Rajiv Gandhi Institute of IT and Biotechnology, Bharati Vidyapeeth (Deemed to be University), Pune, Maharashtra, India
- Pritee Chunarkar-Patil
- Department of Bioinformatics, Rajiv Gandhi Institute of IT and Biotechnology, Bharati Vidyapeeth (Deemed to be University), Pune, Maharashtra, India
11
Barragan AC, Latorre SM, Malmgren A, Harant A, Win J, Sugihara Y, Burbano HA, Kamoun S, Langner T. Multiple Horizontal Mini-chromosome Transfers Drive Genome Evolution of Clonal Blast Fungus Lineages. Mol Biol Evol 2024; 41:msae164. [PMID: 39107250 PMCID: PMC11346369 DOI: 10.1093/molbev/msae164] [Received: 02/13/2024] [Revised: 07/02/2024] [Accepted: 07/31/2024] [Indexed: 08/09/2024]
Abstract
Crop disease pandemics are often driven by clonal lineages of plant pathogens that reproduce asexually. How these clonal pathogens continuously adapt to their hosts despite harboring limited genetic variation, and in the absence of sexual recombination, remains elusive. Here, we reveal multiple instances of horizontal chromosome transfer within pandemic clonal lineages of the blast fungus Magnaporthe (syn. Pyricularia) oryzae. We identified a horizontally transferred 1.2 Mb accessory mini-chromosome that is remarkably conserved between M. oryzae isolates from both the rice blast fungus lineage and the lineage infecting Indian goosegrass (Eleusine indica), a wild grass that often grows in the proximity of cultivated cereal crops. Furthermore, we show that this mini-chromosome was horizontally acquired by clonal rice blast isolates through at least nine distinct transfer events over the past three centuries. These findings establish horizontal mini-chromosome transfer as a mechanism facilitating genetic exchange among different host-associated blast fungus lineages. We propose that blast fungus populations infecting wild grasses act as genetic reservoirs that drive genome evolution of pandemic clonal lineages that afflict cereal crops.
Affiliation(s)
- Ana Cristina Barragan
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
- Sergio M Latorre
- Department of Genetics, Evolution and Environment, Centre for Life's Origins and Evolution, University College London, London, UK
- Angus Malmgren
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
- Adeline Harant
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
- Joe Win
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
- Yu Sugihara
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
- Hernán A Burbano
- Department of Genetics, Evolution and Environment, Centre for Life's Origins and Evolution, University College London, London, UK
- Sophien Kamoun
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
- Thorsten Langner
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich, UK
12
Backman T, Latorre SM, Symeonidi E, Muszyński A, Bleak E, Eads L, Martinez-Koury PI, Som S, Hawks A, Gloss AD, Belnap DM, Manuel AM, Deutschbauer AM, Bergelson J, Azadi P, Burbano HA, Karasov TL. A phage tail-like bacteriocin suppresses competitors in metapopulations of pathogenic bacteria. Science 2024; 384:eado0713. [PMID: 38870284 PMCID: PMC11404688 DOI: 10.1126/science.ado0713] [Received: 01/16/2024] [Accepted: 04/24/2024] [Indexed: 06/15/2024]
Abstract
Bacteria can repurpose their own bacteriophage viruses (phage) to kill competing bacteria. Phage-derived elements are frequently strain specific in their killing activity, although there is limited evidence that this specificity drives bacterial population dynamics. Here, we identified intact phage and their derived elements in a metapopulation of wild plant-associated Pseudomonas genomes. We discovered that the most abundant viral cluster encodes a phage remnant resembling a phage tail called a tailocin, which bacteria have co-opted to kill bacterial competitors. Each pathogenic Pseudomonas strain carries one of a few distinct tailocin variants that target the variable polysaccharides in the outer membrane of co-occurring pathogenic Pseudomonas strains. Analysis of herbarium samples from the past 170 years revealed that the same tailocin and bacterial receptor variants have persisted in Pseudomonas populations. These results suggest that tailocin genetic diversity can be mined to develop targeted "tailocin cocktails" for microbial control.
Affiliation(s)
- Talia Backman
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Sergio M. Latorre
- Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Research Group for Ancient Genomics and Evolution, Department of Molecular Biology, Max Planck Institute for Biology, 72076 Tübingen, Germany
- Efthymia Symeonidi
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Artur Muszyński
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
- Ella Bleak
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Lauren Eads
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Sarita Som
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Aubrey Hawks
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Andrew D. Gloss
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- David M. Belnap
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
- Department of Biochemistry, University of Utah, Salt Lake City, UT 84112, USA
- Allison M. Manuel
- Mass Spectrometry and Proteomics Core, The University of Utah, Salt Lake City, UT 84112, USA
- Adam M. Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Joy Bergelson
- Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Parastoo Azadi
- Complex Carbohydrate Research Center, University of Georgia, Athens, GA 30602, USA
- Hernán A. Burbano
- Centre for Life’s Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Research Group for Ancient Genomics and Evolution, Department of Molecular Biology, Max Planck Institute for Biology, 72076 Tübingen, Germany
- Talia L. Karasov
- School of Biological Sciences, University of Utah, Salt Lake City, UT 84112, USA
13
Shi G, Dai Y, Zhou D, Chen M, Zhang J, Bi Y, Liu S, Wu Q. An alignment- and reference-free strategy using k-mer present pattern for population genomic analyses. Mycology 2024; 16:309-323. [PMID: 40083414 PMCID: PMC11899203 DOI: 10.1080/21501203.2024.2358868] [Received: 12/17/2023] [Accepted: 05/17/2024] [Indexed: 03/16/2025]
Abstract
Pangenomes are replacing single reference genomes to capture all variants within a species or clade, but their analysis predominantly relies on graph-based methods that require multiple high-quality genomes and computationally intensive multiple-genome alignments. K-mer decomposition is an alternative to graph-based pangenomes. However, how to use k-mers directly for population genetic analyses remains unclear. Here, we developed a novel strategy that uses variation in k-mer counts across genomes for population analyses. To test the effectiveness of this method, we compared it directly with the SNP-based method in analyses of population structure and genetic diversity of 267 Saccharomyces cerevisiae strains, using two simulated datasets and one real sequencing dataset. The population structure identified with k-mers recapitulates that obtained using SNPs, indicating the effectiveness of the k-mer-based approach, and the higher genetic diversity estimated from the real dataset suggests that k-mers capture more genetic variants than SNPs. Based on k-mer frequency, we identified not only SNPs but also some insertions/deletions and horizontal gene transfer (HGT) fragments associated with the adaptive evolution of S. cerevisiae. Our study establishes a framework for an alignment- and reference-free (ARF) method in population genetic analyses, whose advantages should be most pronounced in species without a complete genome or in highly diverged species.
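To make the alignment- and reference-free idea concrete, here is a minimal sketch, assuming raw reads per strain as input: it counts canonical k-mers, keeps those seen at least `min_count` times (a hypothetical error-filtering threshold), builds a binary presence/absence matrix across strains, and computes pairwise Jaccard distances that could feed a clustering or PCA of population structure. The function names are illustrative, and the Jaccard distance is a swapped-in choice; this is not the authors' pipeline or their k-mer count statistic.

```python
from collections import Counter
from itertools import combinations

# Translation table used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers over all reads, skipping k-mers with non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def presence_matrix(samples: dict[str, list[str]], k: int, min_count: int = 1):
    """Binary k-mer presence/absence matrix: one row per sample, one column per k-mer.

    min_count is a hypothetical threshold for discarding likely sequencing errors.
    """
    present = {}
    for name, reads in samples.items():
        counts = count_kmers(reads, k)
        present[name] = {km for km, c in counts.items() if c >= min_count}
    columns = sorted(set().union(*present.values()))
    rows = {name: [int(km in kms) for km in columns] for name, kms in present.items()}
    return columns, rows


def jaccard_distances(rows: dict[str, list[int]]) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard distance between samples' presence vectors."""
    dists = {}
    for a, b in combinations(rows, 2):
        both = sum(x & y for x, y in zip(rows[a], rows[b]))
        either = sum(x | y for x, y in zip(rows[a], rows[b]))
        dists[(a, b)] = 1.0 - both / either if either else 0.0
    return dists
```

For example, with reads `{"s1": ["ACGTACGT"], "s2": ["ACGTACGT"], "s3": ["TTTTGGGG"]}` and `k=3`, strains `s1` and `s2` share all canonical 3-mers (distance 0.0), while `s3` shares none with them (distance 1.0). Working with canonical k-mers means reads from either strand contribute to the same column, which matters when no reference orientation is available.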
Affiliation(s)
- Guohui Shi
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- Yi Dai
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Science, University of the Chinese Academy of Sciences, Beijing, China
- Da Zhou
- School of Mathematical Sciences, Xiamen University, Xiamen, China
- Mengmeng Chen
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Science, University of the Chinese Academy of Sciences, Beijing, China
- Jiaqi Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
- College of Life Science, University of the Chinese Academy of Sciences, Beijing, China
- Yilong Bi
- School of Mathematical Sciences, Xiamen University, Xiamen, China
- Shuai Liu
- College of Life Science, University of the Chinese Academy of Sciences, Beijing, China
- Qi Wu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China
14
Hwang S, Brown NK, Ahmed OY, Jenike KM, Kovaka S, Schatz MC, Langmead B. MEM-based pangenome indexing for k-mer queries. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.20.595044. [PMID: 38826299 PMCID: PMC11142109 DOI: 10.1101/2024.05.20.595044] [Indexed: 06/04/2024]
Abstract
Pangenomes are growing in number and size, thanks to the prevalence of high-quality long-read assemblies. However, current methods for studying sequence composition and conservation within pangenomes have limitations. Methods based on graph pangenomes require a computationally expensive multiple-alignment step, which can leave out some variation. Indexes based on k-mers and de Bruijn graphs are limited to answering questions at a specific substring length k. We present Maximal Exact Match Ordered (MEMO), a pangenome indexing method based on maximal exact matches (MEMs) between sequences. A single MEMO index can handle arbitrary-length queries over pangenomic windows. MEMO supports both membership queries, which test k-mer presence/absence, and conservation queries, which count the number of genomes containing each k-mer in a window. MEMO's index for a pangenome of 89 human autosomal haplotypes fits in 2.04 GB, 8.8× smaller than a comparable KMC3 index and 11.4× smaller than a PanKmer index. MEMO indexes can be made smaller by sacrificing some counting resolution; our decile-resolution index of the Human Pangenome Reference Consortium (HPRC) haplotypes reaches 0.67 GB. MEMO can conduct a conservation query for 31-mers over the human leukocyte antigen (HLA) locus in 13.89 seconds, 2.5× faster than other approaches. MEMO's small index size, lack of k-mer length dependence, and efficient queries make it a flexible tool for studying and visualizing substring conservation in pangenomes.
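Why can exact matches remove the dependence on k? A k-mer starting at position i of a pivot sequence occurs in a genome exactly when the longest prefix of the pivot's suffix at i that also occurs in that genome (its matching statistic, from which MEMs can be read off) has length at least k. The minimal sketch below illustrates that idea: one array of match lengths per genome answers membership and conservation queries for every k. It computes matching statistics naively with substring searches and is not MEMO's compressed MEM-based index; the function names are hypothetical.

```python
def matching_statistics(pivot: str, genome: str) -> list[int]:
    """ms[i] = length of the longest prefix of pivot[i:] that occurs in genome.

    Naive computation for illustration only; real indexes avoid this cost.
    """
    ms = []
    for i in range(len(pivot)):
        length = 0
        # Extend the match one character at a time while it still occurs in genome.
        while i + length < len(pivot) and pivot[i:i + length + 1] in genome:
            length += 1
        ms.append(length)
    return ms


def membership(ms: list[int], k: int) -> list[bool]:
    """Membership query: is each pivot k-mer (by start position) present in this genome?"""
    return [ms[i] >= k for i in range(len(ms) - k + 1)]


def conservation_counts(pivot: str, genomes: list[str], k: int) -> list[int]:
    """Conservation query: number of genomes containing each pivot k-mer."""
    per_genome = [membership(matching_statistics(pivot, g), k) for g in genomes]
    return [sum(column) for column in zip(*per_genome)]
```

With pivot `"ACGTACGT"` and genomes `["ACGTACGT", "ACGTTTTT", "GGGGACGT"]`, the same matching-statistics arrays give conservation counts `[3, 1, 1, 1, 3]` for k = 4 and `[3, 3, 3, 1, 3, 3, 3]` for k = 2, with no per-k rebuilding.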
Affiliation(s)
- Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore MD, USA
- Nathaniel K. Brown
- Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Omar Y. Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
- Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore MD, USA
15
Wang X, Li P, Wang R, Gao X. PseUpred-ELPSO Is an Ensemble Learning Predictor with Particle Swarm Optimizer for Improving the Prediction of RNA Pseudouridine Sites. BIOLOGY 2024; 13:248. [PMID: 38666860 PMCID: PMC11048358 DOI: 10.3390/biology13040248] [Received: 03/03/2024] [Revised: 03/27/2024] [Accepted: 04/01/2024] [Indexed: 04/28/2024]
Abstract
RNA pseudouridine modification occurs in many RNA types across many species and plays a significant role in regulating biological processes. To understand the functional mechanisms of RNA pseudouridine sites, accurate identification of these sites in RNA sequences is essential. Although several fast and inexpensive computational methods have been proposed, improving recognition accuracy and generalization remains a challenge. This study proposes a novel ensemble predictor, PseUpred-ELPSO, for improved RNA pseudouridine site prediction. After analyzing the nucleotide composition preferences of RNA pseudouridine site sequences, two feature representations were selected and fed into a stacking ensemble framework. Using five tree-based machine learning classifiers as base classifiers, 30-dimensional RNA profiles were then constructed to represent RNA sequences, and the particle swarm optimization (PSO) algorithm was used to search for weights on these RNA profiles to further enhance the representation. A logistic regression classifier served as the meta-classifier to produce the final predictions. Compared with state-of-the-art predictors, PseUpred-ELPSO performs better in both cross-validation and independent testing. Based on PseUpred-ELPSO, a free and easy-to-use web server has been established, which can serve as a powerful tool for pseudouridine site identification.
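The PSO step searches for weights on the RNA profiles. As a minimal sketch of that kind of search, the code below implements a basic global-best particle swarm optimizer that minimizes any objective over a box. The objective `weighted_profile_log_loss` is a hypothetical stand-in (log loss of a logistic score on weighted profile features); the paper's actual fitness function, swarm settings, and profile construction are not given in the abstract, and the inertia and acceleration constants used here are common textbook defaults, not the authors' values.

```python
import math
import random
from typing import Callable, Sequence


def pso(objective: Callable[[Sequence[float]], float], dim: int,
        bounds: tuple[float, float] = (-5.0, 5.0), n_particles: int = 30,
        n_iters: int = 200, inertia: float = 0.7298, c_cog: float = 1.49618,
        c_soc: float = 1.49618, seed: int = 0) -> tuple[list[float], float]:
    """Minimize objective over [lo, hi]^dim with a global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    span = hi - lo
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-span, span) * 0.1 for _ in range(dim)] for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cog * r1 * (pbest[i][d] - pos[i][d])
                             + c_soc * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def weighted_profile_log_loss(weights: Sequence[float],
                              profiles: list[list[float]], labels: list[int]) -> float:
    """Hypothetical objective: mean log loss of sigmoid(weights . profile)."""
    total = 0.0
    for x, y in zip(profiles, labels):
        z = sum(w * v for w, v in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)
```

For instance, with toy two-column profiles `[[0.9, 0.2], [0.8, 0.3], [0.1, 0.7], [0.2, 0.9]]` and labels `[1, 1, 0, 0]`, calling `pso(lambda w: weighted_profile_log_loss(w, profiles, labels), dim=2)` returns weights whose loss falls below the uninformative value log 2 obtained with zero weights. PSO is derivative-free, which is one reason it suits weight searches where the fitness (for example, cross-validated accuracy of a stacked model) is not differentiable.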
Affiliation(s)
- Xiao Wang
- School of Computer Science and Technology, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Henan Provincial Key Laboratory of Data Intelligence for Food Safety, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Pengfei Li
- School of Computer Science and Technology, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Rong Wang
- School of Electronic Information, Zhengzhou University of Light Industry, No. 136, Science Avenue, Zhengzhou 450002, China
- Xu Gao
- National Supercomputing Center in Zhengzhou, School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China