1
Gao Y, Yang L, Kuhn K, Li W, Zanton G, Bowman M, Zhao P, Zhou Y, Fang L, Cole JB, Rosen BD, Ma L, Li C, Baldwin RL, Van Tassell CP, Zhang Z, Smith TPL, Liu GE. Long read and preliminary pangenome analyses reveal breed-specific structural variations and novel sequences in Holstein and Jersey cattle. J Adv Res 2025:S2090-1232(25)00258-9. [PMID: 40258473 DOI: 10.1016/j.jare.2025.04.014] [Received: 10/13/2024] [Revised: 04/06/2025] [Accepted: 04/10/2025] [Indexed: 04/23/2025]
Abstract
INTRODUCTION: Most SV studies in livestock rely on short-read sequencing, posing challenges in accurately characterizing large genomic variants due to their limited read length.
OBJECTIVES: Our goal is to reveal structural variation and novel sequences specific to Holstein and Jersey cattle breeds using long-read and pan-genome analyses.
METHODS: We sequenced 20 Holsteins and 8 Jersey cattle using PacBio HiFi to 20×, and integrated five read-based and one assembly-based SV caller to determine SVs.
RESULTS: We assembled the 28 genomes averaging 3.25 Gb with a contig N50 of 69.36 Mb and using the ARS-UCD1.2 reference, we acquired Holstein/Jersey SV catalogs with 74,068/54,689 events spanning 202/135 Mb (7.43%/4.97% of the genome). SVs were enriched in less conserved, non-coding, and non-regulatory regions. Comparing Holsteins with differing feed efficiency (FE), SVs unique to high FE were linked to energy metabolism and olfactory receptors, while those specific to low FE were associated with material transport. We constructed Holstein/Jersey pangenome graphs with 148,598/105,875 nodes and 208,891/147,990 edges, representing 47,028/37,137 biallelic and multi-allelic events, and 63.75/42.34 Mb of novel sequence. We observed SV count saturation with 20 Holsteins, while adding Jerseys significantly increased the SV count, highlighting breed-specific SV events.
CONCLUSION: Our long-read data and SV catalogs are valuable resources, revealing that the cattle genome is more complex than previously thought.
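For readers unfamiliar with the contiguity statistic quoted in this abstract: a contig N50 is the length L such that contigs of length L or longer together cover at least half of the assembly. The sketch below is only an illustration of that definition; the function name `n50` and the contig lengths are hypothetical and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths (arbitrary units), not data from the study.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 80 + 70 = 150 covers half of the 300 total, so N50 is 70
```

A higher N50 means fewer, longer contigs, which is why it is commonly reported alongside total assembly size.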
Affiliation(s)
- Yahui Gao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China; Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Liu Yang
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Kristen Kuhn
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- Wenli Li
- US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Geoffrey Zanton
- US Dairy Forage Research Center, USDA-ARS, Madison, WI, USA.
- Mary Bowman
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Pengju Zhao
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya 572000, China.
- Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China.
- Lingzhao Fang
- Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark.
- John B Cole
- Council on Dairy Cattle Breeding, 4201 Northview Dr, Bowie, MD 20716, USA; Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program, and the Genetics Institute, University of Florida, Gainesville, FL 32611-0910, USA; Department of Animal Science, North Carolina State University, Raleigh, NC 27695-7621, USA.
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA.
- Congjun Li
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Ransom L Baldwin
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
- Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
- Timothy P L Smith
- USDA, ARS, U.S. Meat Animal Research Center (USMARC), Clay Center, NE, USA.
- George E Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD 20705, USA.
2
Milia S, Leonard AS, Mapel XM, Bernal Ulloa SM, Drögemüller C, Pausch H. Taurine pangenome uncovers a segmental duplication upstream of KIT associated with depigmentation in white-headed cattle. Genome Res 2025; 35:1041-1052. [PMID: 39694857 PMCID: PMC12047182 DOI: 10.1101/gr.279064.124] [Received: 02/02/2024] [Accepted: 12/02/2024] [Indexed: 12/20/2024]
Abstract
Cattle have been selectively bred for coat color, spotting, and depigmentation patterns. The assumed autosomal dominant inherited genetic variants underlying the characteristic white head of Fleckvieh, Simmental, and Hereford cattle have not been identified yet, although the contribution of structural variation upstream of the KIT gene has been proposed. Here, we construct a graph pangenome from 24 haplotype assemblies representing seven taurine cattle breeds to identify and characterize the white-head-associated locus for the first time based on long-read sequencing data and pangenome analyses. We introduce a pangenome-wide association mapping approach that examines assembly path similarities within the graph to reveal an association between two most likely serial alleles of a complex structural variant (SV) 66 kb upstream of KIT and facial depigmentation. The complex SV contains a variable number of tandemly duplicated 14.3 kb repeats, consisting of LTRs, LINEs, and other repetitive elements, leading to misleading alignments of short and long reads when using a linear reference. We align 250 short-read sequencing samples spanning 15 cattle breeds to the pangenome graph, further validating that the alleles of the SV segregate with head depigmentation. We estimate an increased count of repeats in Hereford relative to Simmental and other white-headed cattle breeds from the graph alignment coverage, suggesting a large under-assembly in the current Hereford-based cattle reference genome, which had fewer copies. Our work shows that exploiting assembly path similarities within graph pangenomes can reveal trait-associated complex SVs.
Affiliation(s)
- Sotiria Milia
- Animal Genomics, ETH Zurich, Zurich 8092, Switzerland
- Cord Drögemüller
- Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern 3012, Switzerland
- Hubert Pausch
- Animal Genomics, ETH Zurich, Zurich 8092, Switzerland
3
Azam S, Sahu A, Pandey NK, Neupane M, Van Tassell CP, Rosen BD, Gandham RK, Rath SN, Majumdar SS. Constructing a draft Indian cattle pangenome using short-read sequencing. Commun Biol 2025; 8:605. [PMID: 40223124 PMCID: PMC11994783 DOI: 10.1038/s42003-025-07978-0] [Received: 09/20/2024] [Accepted: 03/21/2025] [Indexed: 04/15/2025]
Abstract
Indian desi cattle, known for their adaptability and phenotypic diversity, represent a valuable genetic resource. However, a single reference genome often fails to capture the full extent of their genetic variation. To address this, we construct a pangenome for desi cattle by identifying and characterizing non-reference novel sequences (NRNS). We sequence 68 genomes from seven breeds, generating 48.35 billion short reads. Using the PanGenome Analysis (PanGA) pipeline, we identify 13,065 NRNS (~41 Mbp), with substantial variation across the population. Most NRNS were unique to desi cattle, with minimal overlap (4.1%) with the Chinese indicine pangenome. Approximately 40% of NRNS exhibited ancestral origins within the Bos genus and were enriched in genic regions, suggesting functional roles. These sequences are linked to quantitative trait loci for traits such as milk production. The pangenome approach enhances read mapping accuracy, reduces spurious single nucleotide polymorphism calls, and uncovers novel genetic variants, offering a deeper understanding of desi cattle genomics.
Affiliation(s)
- Sarwar Azam
- National Institute of Animal Biotechnology, Hyderabad, India
- Indian Institute of Technology Hyderabad, Sangareddy, India
- Abhisek Sahu
- National Institute of Animal Biotechnology, Hyderabad, India
- Mahesh Neupane
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD, USA
4
Xu S, Akhatayeva Z, Liu J, Feng X, Yu Y, Badaoui B, Esmailizadeh A, Kantanen J, Amills M, Lenstra JA, Johansson AM, Coltman DW, Liu GE, Curik I, Orozco-terWengel P, Paiva SR, Zinovieva NA, Zhang L, Yang J, Liu Z, Wang Y, Yu Y, Li M. Genetic advancements and future directions in ruminant livestock breeding: from reference genomes to multiomics innovations. Sci China Life Sci 2025; 68:934-960. [PMID: 39609363 DOI: 10.1007/s11427-024-2744-4] [Received: 07/19/2024] [Accepted: 09/24/2024] [Indexed: 11/30/2024]
Abstract
Ruminant livestock provide a rich source of products, such as meat, milk, and wool, and play a critical role in global food security and nutrition. Over the past few decades, genomic studies of ruminant livestock have provided valuable insights into their domestication and the genetic basis of economically important traits, facilitating the breeding of elite varieties. In this review, we summarize the main advancements for domestic ruminants in reference genome assemblies, population genomics, and the identification of functional genes or variants for phenotypic traits. These traits include meat and carcass quality, reproduction, milk production, feed efficiency, wool and cashmere yield, horn development, tail type, coat color, environmental adaptation, and disease resistance. Functional genomic research is entering a new era with the advancements of graphical pangenomics and telomere-to-telomere (T2T) gap-free genome assembly. These advancements promise to improve our understanding of domestication and the molecular mechanisms underlying economically important traits in ruminant livestock. Finally, we provide new perspectives and future directions for genomic research on ruminant genomes. We suggest how ever-increasing multiomics datasets will facilitate future studies and molecular breeding in livestock, including the potential to uncover novel genetic mechanisms underlying phenotypic traits, to enable more accurate genomic prediction models, and to accelerate genetic improvement programs.
Affiliation(s)
- Songsong Xu
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Zhanerke Akhatayeva
- Institute of Grassland Research, Chinese Academy of Agricultural Sciences, Hohhot, 010010, China
- Jiaxin Liu
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Xueyan Feng
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Yi Yu
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Bouabid Badaoui
- Laboratory of Biodiversity, Ecology and Genome, Department of Biology, Faculty of Sciences Rabat, Mohammed V University, Rabat, 10106, Morocco
- Ali Esmailizadeh
- Department of Animal Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, 76169-133, Iran
- Juha Kantanen
- Production Systems, Natural Resources Institute Finland (Luke), Jokioinen, FI-31600, Finland
- Marcel Amills
- Department of Animal Genetics, Center for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, 08193, Spain
- Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra, 08193, Spain
- Johannes A Lenstra
- Faculty of Veterinary Medicine, Utrecht University, Utrecht, 3584, The Netherlands
- Anna M Johansson
- Department of Animal Breeding and Genetics, Faculty of Veterinary Medicine and Animal Science, Swedish University of Agricultural Sciences, Uppsala, 75007, Sweden
- David W Coltman
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, T6G 2E9, Canada
- Department of Biology, Western University, London, Ontario, N6A 5B7, Canada
- George E Liu
- Animal Genomics and Improvement Laboratory, BARC, USDA-ARS, Beltsville, MD, 20705, USA
- Ino Curik
- Department of Animal Science, Faculty of Agriculture, University of Zagreb, Zagreb, 10000, Croatia
- Institute of Animal Sciences, Hungarian University of Agriculture and Life Sciences (MATE), Kaposvár, 7400, Hungary
- Samuel R Paiva
- Embrapa Genetic Resources and Biotechnology, Laboratory of Animal Genetics, Brasília, Federal District, 70770917, Brazil
- Natalia A Zinovieva
- L.K. Ernst Federal Science Center for Animal Husbandry, Moscow Region, Podolsk, 142132, Russian Federation
- Linwei Zhang
- Department of Neurology, China-Japan Friendship Hospital, Beijing, 100029, China
- Ji Yang
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Zhihong Liu
- College of Animal Science, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Yachun Wang
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ying Yu
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Menghua Li
- Frontiers Science Center for Molecular Design Breeding (MOE); State Key Laboratory of Animal Biotech Breeding; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
- Sanya Institute of China Agricultural University, Sanya, 572024, China.
5
Edwards SV, Fang B, Khost D, Kolyfetis GE, Cheek RG, DeRaad DA, Chen N, Fitzpatrick JW, McCormack JE, Funk WC, Ghalambor CK, Garrison E, Guarracino A, Li H, Sackton TB. Comparative population pangenomes reveal unexpected complexity and fitness effects of structural variants. bioRxiv 2025:2025.02.11.637762. [PMID: 39990470 PMCID: PMC11844517 DOI: 10.1101/2025.02.11.637762] [Indexed: 02/25/2025]
Abstract
Structural variants (SVs) are widespread in vertebrate genomes, yet their evolutionary dynamics remain poorly understood. Using 45 long-read de novo genome assemblies and pangenome tools, we analyze SVs within three closely related species of North American jays (Aphelocoma, scrub-jays) displaying a 60-fold range in effective population size. We find rapid evolution of genome architecture, including ~100 Mb variation in genome size driven by dynamic satellite landscapes with unexpectedly long (> 10 kb) repeat units and widespread variation in gene content, influencing gene expression. SVs exhibit slightly deleterious dynamics modulated by variant length and population size, with strong evidence of adaptive fixation only in large populations. Our results demonstrate how population size shapes the distribution of SVs and the importance of pangenomes to characterizing genomic diversity.
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Bohao Fang
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Museum of Comparative Zoology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Danielle Khost
- Informatics Group, Harvard University, 52 Oxford St, Cambridge, MA, 02138, USA
- George E Kolyfetis
- Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA, 02138, USA
- Rebecca G Cheek
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523, USA
- Devon A DeRaad
- Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd, Los Angeles, CA, 90041, USA
- Nancy Chen
- Department of Biology, University of Rochester, 477 Hutchison Hall, Box 270211, Rochester, NY, 14627, USA
- John W Fitzpatrick
- Cornell Lab of Ornithology, Cornell University, 159 Sapsucker Woods Rd, Ithaca, NY, 14850, USA
- John E. McCormack
- Moore Laboratory of Zoology, Occidental College, 1600 Campus Rd, Los Angeles, CA, 90041, USA
- W. Chris Funk
- Department of Biology, Graduate Degree Program in Ecology, Colorado State University, 1878 Campus Delivery, Fort Collins, CO, 80523, USA
- Cameron K Ghalambor
- Department of Biology, Norwegian University of Science and Technology, Høgskoleringen 5, Realfagbygget D1-137, Trondheim, 7491, Norway
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S. Manassas Street, Memphis, TN, 38163, USA
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, 71 S. Manassas Street, Memphis, TN, 38163, USA
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, 450 Brookline Ave, Mailstop: CLSB 11007, Boston, MA, 02215
- Timothy B Sackton
- Informatics Group, Harvard University, 52 Oxford St, Cambridge, MA, 02138, USA
6
Ruperao P, Rangan P, Shah T, Sharma V, Rathore A, Mayes S, Pandey MK. Developing pangenomes for large and complex plant genomes and their representation formats. J Adv Res 2025:S2090-1232(25)00071-2. [PMID: 39894347 DOI: 10.1016/j.jare.2025.01.052] [Received: 03/25/2024] [Revised: 01/27/2025] [Accepted: 01/27/2025] [Indexed: 02/04/2025]
Abstract
BACKGROUND: The development of pangenomes has revolutionized genomic studies by capturing the complete genetic diversity within a species. Pangenome assembly integrates data from multiple individuals to construct a comprehensive genomic landscape, revealing both core and accessory genomic elements. This approach enables the identification of novel genes, structural variations, and gene presence-absence variations, providing insights into species evolution, adaptation, and trait variation. Representing pangenomes requires innovative visualization formats that effectively convey the complex genomic structures and variations.
AIM: This review delves into contemporary methodologies and recent advancements in constructing pangenomes, particularly in plant genomes. It examines the structure of pangenome representation, including format comparison, conversion, visualization techniques, and their implications for enhancing crop improvement strategies.
KEY SCIENTIFIC CONCEPTS OF REVIEW: Earlier comparative studies have illuminated novel gene sequences, copy number variations, and presence-absence variations across diverse crop species. The concept of a pan-genome, which captures multiple genetic variations from a broad spectrum of genotypes, offers a holistic perspective of a species' genetic makeup. However, constructing a pan-genome for plants with larger genomes poses challenges, including managing vast genome sequence data and comprehending the genetic variations within the germplasm. To address these challenges, researchers have explored cost-effective alternatives to encapsulate species diversity in a single assembly known as a pangenome. This involves reducing the volume of genome sequences while focusing on genetic variations. With the growing prominence of the pan-genome concept in plant genomics, several software tools have emerged to facilitate pangenome construction. This review sheds light on developing and utilizing software tools tailored for constructing pan-genomes in plants. It also discusses representation formats suitable for downstream analyses, offering valuable insights into the genetic landscape and evolutionary dynamics of plant species. In summary, this review underscores the significance of pan-genome construction and representation formats in resolving the genetic architecture of plants, particularly those with complex genomes. It provides a comprehensive overview of recent advancements, aiding in exploring and understanding plant genetic diversity.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India.
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi, India; Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Australia
- Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi, Kenya
- Vinay Sharma
- Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Abhishek Rathore
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Manish K Pandey
- Center of Excellence in Genomics and Systems Biology (CEGSB) and Center for Pre-Breeding Research (CPBR), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India.
7
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
- Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
- Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
- Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
8
Vorbrugg S, Bezrukov I, Bao Z, Weigel D. Gretl-variation GRaph Evaluation TooLkit. Bioinformatics 2024; 41:btae755. [PMID: 39719064 PMCID: PMC11729725 DOI: 10.1093/bioinformatics/btae755] [Received: 02/24/2024] [Revised: 11/15/2024] [Accepted: 12/21/2024] [Indexed: 12/26/2024]
Abstract
MOTIVATION: As genome graphs are powerful data structures for representing the genetic diversity within populations, they can help identify genomic variations that traditional linear references miss, but their complexity and size makes the analysis of genome graphs challenging. We sought to develop a genome graph analysis tool that helps these analyses to become more accessible by addressing the limitations of existing tools. Specifically, we improve scalability and user-friendliness, and we provide many new statistics tailored to variation graphs for graph evaluation, including sample-specific features.
RESULTS: We developed an efficient, comprehensive, and integrated tool, gretl, to analyze genome graphs and gain insights into their structure and composition by providing a wide range of statistics. gretl can be utilized to evaluate different graphs, compare the output of graph construction pipelines with different parameters, as well as perform an in-depth analysis of individual graphs, including sample-specific analysis. With the assistance of gretl, novel patterns of genetic variation and potential regions of interest can be identified, for later, more detailed inspection. We demonstrate that gretl outperforms other tools in terms of speed, particularly for larger genome graphs.
AVAILABILITY AND IMPLEMENTATION: Commented Rust source code and documentation is available under MIT license at https://github.com/MoinSebi/gretl together with Python scripts and step-by-step usage examples. The package is available at Bioconda for easy installation.
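To give a concrete sense of what a basic graph statistic looks like, the sketch below counts nodes, edges, and total node sequence length from a tiny graph written in GFA v1, a common text format for genome graphs. This is not gretl's code or output; the function name `gfa_stats` and the three-node example graph are hypothetical and exist only for illustration.

```python
def gfa_stats(gfa_text):
    """Count segments (nodes), links (edges), and total bases in GFA v1 text."""
    nodes = 0
    edges = 0
    total_bp = 0
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # Segment line: S <name> <sequence>; "*" means sequence not stored.
            nodes += 1
            if fields[2] != "*":
                total_bp += len(fields[2])
        elif fields[0] == "L":
            # Link line: L <from> <orient> <to> <orient> <overlap>
            edges += 1
    return {"nodes": nodes, "edges": edges, "total_bp": total_bp}


# Hypothetical graph: node 2 is an insertion that the direct link 1 -> 3 skips.
example = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
])
print(gfa_stats(example))  # {'nodes': 3, 'edges': 3, 'total_bp': 8}
```

Dedicated tools report far richer, sample-aware statistics; plain counts like these are only the simplest starting point for comparing graphs.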
Affiliation(s)
- Sebastian Vorbrugg
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Ilja Bezrukov
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Zhigui Bao
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
9
Bortoluzzi C, Mapel XM, Neuenschwander S, Janett F, Pausch H, Leonard AS. Genome assembly of wisent (Bison bonasus) uncovers a deletion that likely inactivates the THRSP gene. Commun Biol 2024; 7:1580. [PMID: 39604663 PMCID: PMC11603333 DOI: 10.1038/s42003-024-07295-y] [Received: 04/21/2024] [Accepted: 11/19/2024] [Indexed: 11/29/2024]
Abstract
The wisent (Bison bonasus) is Europe's largest land mammal. We produced a HiFi read-based wisent assembly with a contig N50 value of 91 Mb containing 99.7% of the highly conserved single copy mammalian genes which improves contiguity a thousand-fold over an existing assembly. Extended runs of homozygosity in the wisent genome compromised the separation of the HiFi reads into parental-specific read sets, which resulted in inferior haplotype assemblies. A bovine super-pangenome built with assemblies from wisent, bison, gaur, yak, taurine and indicine cattle identified a 1580 bp deletion removing the protein-coding sequence of THRSP encoding thyroid hormone-responsive protein from the wisent and bison genomes. Analysis of 725 sequenced samples across the Bovinae subfamily showed that the deletion is fixed in both Bison species but absent in Bos and Bubalus. The THRSP transcript is abundant in adipose, fat, liver, muscle, and mammary gland tissue of Bos and Bubalus, but absent in bison. This indicates that the deletion likely inactivates THRSP in bison. We show that super-pangenomes can reveal potentially trait-associated variation across phylogenies, but also demonstrate that haplotype assemblies from species that went through population bottlenecks warrant scrutiny, as they may have accumulated long runs of homozygosity that complicate phasing.
Affiliation(s)
- Fredi Janett
- Clinic of Reproductive Medicine, University of Zurich, Zurich, Switzerland
10
Li X, Zhu K, Zhen Y. A versatile pipeline to identify convergently lost ancestral conserved fragments associated with convergent evolution of vocal learning. Brief Bioinform 2024; 26:bbae614. [PMID: 39581870 PMCID: PMC11586126 DOI: 10.1093/bib/bbae614] [Received: 04/30/2024] [Revised: 10/10/2024] [Accepted: 11/12/2024] [Indexed: 11/26/2024]
Abstract
Molecular convergence in convergently evolved lineages provides valuable insights into the shared genetic basis of converged phenotypes. However, most methods are limited to coding regions, overlooking the potential contribution of regulatory regions. We focused on the independently evolved vocal learning ability in multiple avian lineages, and developed a whole-genome-alignment-free approach to identify genome-wide Convergently Lost Ancestral Conserved fragments (CLACs) in these lineages, encompassing noncoding regions. We discovered 2711 CLACs that are overrepresented in noncoding regions. Proximal genes of these CLACs exhibit significant enrichment in neurological pathways, including glutamate receptor signaling pathway and axon guidance pathway. Moreover, their expression is highly enriched in brain tissues associated with speech formation. Notably, several have known functions in speech and language learning, including ROBO family, SLIT2, GRIN1, and GRIN2B. Additionally, we found significantly enriched motifs in noncoding CLACs, which match binding motifs of transcriptional factors involved in neurogenesis and gene expression regulation in brain. Furthermore, we discovered 19 candidate genes that harbor CLACs in both human and multiple avian vocal learning lineages, suggesting their potential contribution to the independent evolution of vocal learning in both birds and humans.
Affiliation(s)
- Xiaoyi Li
  - School of Life Sciences, Fudan University, 220 Handan Road, Yangpu District, Shanghai 200433, China
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences and Research Center for Industries of the Future, Westlake University, 600 Dunyu Road, Xihu District, Hangzhou, Zhejiang 310030, China
  - Westlake Laboratory of Life Sciences and Biomedicine, 600 Dunyu Road, Xihu District, Hangzhou, Zhejiang 310030, China
- Kangli Zhu
  - Westlake Laboratory of Life Sciences and Biomedicine, 600 Dunyu Road, Xihu District, Hangzhou, Zhejiang 310030, China
- Ying Zhen
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences and Research Center for Industries of the Future, Westlake University, 600 Dunyu Road, Xihu District, Hangzhou, Zhejiang 310030, China
  - Westlake Laboratory of Life Sciences and Biomedicine, 600 Dunyu Road, Xihu District, Hangzhou, Zhejiang 310030, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Xihu District, Hangzhou, Zhejiang 310024, China
11
Avila Cartes J, Bonizzoni P, Ciccolella S, Della Vedova G, Denti L. PangeBlocks: customized construction of pangenome graphs via maximal blocks. BMC Bioinformatics 2024; 25:344. [PMID: 39497039 PMCID: PMC11533710 DOI: 10.1186/s12859-024-05958-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Accepted: 10/16/2024] [Indexed: 11/06/2024] Open
Abstract
BACKGROUND The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph with some heuristics, without assuming some explicit optimization criteria. Thus it is unclear how a specific optimization criterion affects the graph topology and downstream analysis, like read mapping and variant calling. RESULTS In this paper, by leveraging the notion of maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks called Minimum Weighted Block Cover (MWBC). Then we propose an Integer Linear Programming (ILP) formulation for the MWBC problem that allows us to study the most natural objective functions for building a graph. We provide an implementation of the ILP approach for solving the MWBC and we evaluate it on SARS-CoV-2 complete genomes, showing how different objective functions lead to pangenome graphs that have different properties, hinting that the specific downstream task can drive the graph construction phase. CONCLUSION We show that a customized construction of a pangenome graph based on selecting objective functions has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA where the user can guide the construction.
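As a rough illustration of what an exact-cover ILP over MSA blocks can look like, the generic form below may help. The notation is an assumption made here for illustration and is not taken from the paper: $\mathcal{B}$ is a set of candidate blocks, $w_b$ is a weight assigned to block $b$, and $x_b$ indicates whether $b$ is selected. Each cell $(r, c)$ of the alignment (row $r$, column $c$) must be covered by exactly one selected block.

```latex
\begin{align*}
\min_{x} \quad & \sum_{b \in \mathcal{B}} w_b \, x_b \\
\text{s.t.} \quad & \sum_{b \in \mathcal{B} \,:\, (r,c) \in b} x_b = 1
    && \forall (r,c) \in \mathrm{MSA}, \\
& x_b \in \{0, 1\} && \forall b \in \mathcal{B}.
\end{align*}
```

In this sketch, setting $w_b = 1$ for every block would minimize the number of selected blocks, while other weightings would favor other graph properties; the objective functions actually studied are those described in the paper itself.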
Affiliation(s)
- Jorge Avila Cartes
  - Department of Informatics, Systems, and Communications, University of Milano - Bicocca, Viale Sarca, 20126, Milano, Italy
- Paola Bonizzoni
  - Department of Informatics, Systems, and Communications, University of Milano - Bicocca, Viale Sarca, 20126, Milano, Italy
- Simone Ciccolella
  - Department of Informatics, Systems, and Communications, University of Milano - Bicocca, Viale Sarca, 20126, Milano, Italy
- Gianluca Della Vedova
  - Department of Informatics, Systems, and Communications, University of Milano - Bicocca, Viale Sarca, 20126, Milano, Italy
- Luca Denti
  - Department of Informatics, Systems, and Communications, University of Milano - Bicocca, Viale Sarca, 20126, Milano, Italy
  - Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Mlynská dolina F1, Bratislava, 84248, Slovakia
12
Garrison E, Guarracino A, Heumos S, Villani F, Bao Z, Tattini L, Hagmann J, Vorbrugg S, Marco-Sola S, Kubica C, Ashbrook DG, Thorell K, Rusholme-Pilcher RL, Liti G, Rudbeck E, Golicz AA, Nahnsen S, Yang Z, Mwaniki MN, Nobrega FL, Wu Y, Chen H, de Ligt J, Sudmant PH, Huang S, Weigel D, Soranzo N, Colonna V, Williams RW, Prins P. Building pangenome graphs. Nat Methods 2024; 21:2008-2012. [PMID: 39433878 DOI: 10.1038/s41592-024-02430-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/30/2023] [Accepted: 08/26/2024] [Indexed: 10/23/2024]
Abstract
Pangenome graphs can represent all variation between multiple reference genomes, but current approaches to build them exclude complex sequences or are based upon a single reference. In response, we developed the PanGenome Graph Builder, a pipeline for constructing pangenome graphs without bias or exclusion. The PanGenome Graph Builder uses all-to-all alignments to build a variation graph in which we can identify variation, measure conservation, detect recombination events and infer phylogenetic relationships.
Affiliation(s)
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Human Technopole, Milan, Italy
- Simon Heumos
  - Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany
  - Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany
  - M3 Research Center, University Hospital Tübingen, Tübingen, Germany
- Flavia Villani
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Zhigui Bao
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Lorenzo Tattini
  - Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
  - Data Science Department, EURECOM, Biot, France
- Sebastian Vorbrugg
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Santiago Marco-Sola
  - Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
  - Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, Spain
- Christian Kubica
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- David G Ashbrook
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Kaisa Thorell
  - Chemistry and Molecular Biology, Faculty of Science, University of Gothenburg, Gothenburg, Sweden
- Gianni Liti
  - Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
- Emilio Rudbeck
  - Clinical Genomics Gothenburg, Bioinformatics and Data Centre, University of Gothenburg, Gothenburg, Sweden
- Agnieszka A Golicz
  - Department of Plant Breeding, Justus Liebig University Giessen, Giessen, Germany
- Sven Nahnsen
  - Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Tübingen, Germany
  - Biomedical Data Science, Dept. of Computer Science, University of Tübingen, Tübingen, Germany
  - M3 Research Center, University Hospital Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), Eberhard-Karls University of Tübingen, Tübingen, Germany
- Zuyu Yang
  - The Institute of Environmental Science and Research, Wellington, New Zealand
- Franklin L Nobrega
  - School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Yi Wu
  - School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Hao Chen
  - Department of Pharmacology, Addiction Science and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Joep de Ligt
  - Hartwig Medical Foundation, Amsterdam, the Netherlands
- Peter H Sudmant
  - Department of Integrative Biology, University of California Berkeley, Berkeley, CA, USA
- Sanwen Huang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University Tübingen, Tübingen, Germany
- Nicole Soranzo
  - Human Technopole, Milan, Italy
  - Wellcome Sanger Institute, Genome Campus, Hinxton, UK
  - National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, UK
  - Department of Haematology, Cambridge Biomedical Campus, Cambridge, UK
  - British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Vincenza Colonna
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Robert W Williams
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
13
Chen G, Shi G, Dai Y, Zhao R, Wu Q. Graph-Based Pan-Genome Reveals the Pattern of Deleterious Mutations during the Domestication of Saccharomyces cerevisiae. J Fungi (Basel) 2024; 10:575. [PMID: 39194902 DOI: 10.3390/jof10080575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 08/08/2024] [Accepted: 08/10/2024] [Indexed: 08/29/2024] Open
Abstract
The "cost of domestication" hypothesis suggests that the domestication of wild species increases the number, frequency, and/or proportion of deleterious genetic variants, potentially reducing their fitness in the wild. While extensively studied in domesticated species, this phenomenon remains understudied in fungi. Here, we used Saccharomyces cerevisiae, the world's oldest domesticated fungus, as a model to investigate the genomic characteristics of deleterious variants arising from fungal domestication. Employing a graph-based pan-genome approach, we identified 1,297,761 single nucleotide polymorphisms (SNPs), 278,147 insertion/deletion events (indels; <30 bp), and 19,967 non-redundant structural variants (SVs; ≥30 bp) across 687 S. cerevisiae isolates. Comparing these variants with synonymous SNPs (sSNPs) as neutral controls, we found that the majority of the derived nonsynonymous SNPs (nSNPs), indels, and SVs were deleterious. Heterozygosity was positively correlated with the impact of deleterious SNPs, suggesting a role of genetic diversity in mitigating their effects. The domesticated isolates exhibited a higher additive burden of deleterious SNPs (dSNPs) than the wild isolates, but a lower burden of indels and SVs. Moreover, the domesticated S. cerevisiae showed reduced rates of adaptive evolution relative to the wild S. cerevisiae. In summary, deleterious variants tend to be heterozygous, which may mitigate their harmful effects, but they also constrain breeding potential. Addressing deleterious alleles and minimizing the genetic load are crucial considerations for future S. cerevisiae breeding efforts.
Affiliation(s)
- Guotao Chen
  - State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
  - University of the Chinese Academy of Sciences, Beijing 100049, China
- Guohui Shi
  - State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Yi Dai
  - State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Ruilin Zhao
  - State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Qi Wu
  - State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
14
L Rocha J, Lou RN, Sudmant PH. Structural variation in humans and our primate kin in the era of telomere-to-telomere genomes and pangenomics. Curr Opin Genet Dev 2024; 87:102233. [PMID: 39042999 PMCID: PMC11695101 DOI: 10.1016/j.gde.2024.102233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/25/2024]
Abstract
Structural variants (SVs) account for the majority of base pair differences both within and between primate species. However, our understanding of inter- and intra-species SV has been historically hampered by the quality of draft primate genomes and the absence of genome resources for key taxa. Recently, advances in long-read sequencing and genome assembly have begun to radically reshape our understanding of SVs. Two landmark achievements include the publication of a human telomere-to-telomere (T2T) genome as well as the development of the first human pangenome reference. In this review, we first look back to the major works laying the foundation for these projects. We then examine the ways in which T2T genome assemblies and pangenomes are transforming our understanding of and approach to primate SV. Finally, we discuss what the future of primate SV research may look like in the era of T2T genomes and pangenomics.
Affiliation(s)
- Joana L Rocha
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@joanocha
- Runyang N Lou
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, USA. https://twitter.com/@NicolasLou10
- Peter H Sudmant
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, USA; Center for Computational Biology, University of California, Berkeley, Berkeley, USA
15
Gao Z, Lu Y, Chong Y, Li M, Hong J, Wu J, Wu D, Xi D, Deng W. Beef Cattle Genome Project: Advances in Genome Sequencing, Assembly, and Functional Genes Discovery. Int J Mol Sci 2024; 25:7147. [PMID: 39000250 PMCID: PMC11240973 DOI: 10.3390/ijms25137147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 06/23/2024] [Accepted: 06/26/2024] [Indexed: 07/16/2024] Open
Abstract
Beef is a major global source of protein, playing an essential role in the human diet. The worldwide production and consumption of beef continue to rise, reflecting a significant trend. However, despite the critical importance of beef cattle resources in agriculture, the diversity of cattle breeds faces severe challenges, with many breeds at risk of extinction. The initiation of the Beef Cattle Genome Project is crucial. By constructing a high-precision functional annotation map of their genome, it becomes possible to analyze the genetic mechanisms underlying important traits in beef cattle, laying a solid foundation for breeding more efficient and productive cattle breeds. This review details advances in genome sequencing and assembly technologies, iterative upgrades of the beef cattle reference genome, and its application in pan-genome research. Additionally, it summarizes relevant studies on the discovery of functional genes associated with key traits in beef cattle, such as growth, meat quality, reproduction, polled traits, disease resistance, and environmental adaptability. Finally, the review explores the potential of telomere-to-telomere (T2T) genome assembly, structural variations (SVs), and multi-omics techniques in future beef cattle genetic breeding. These advancements collectively offer promising avenues for enhancing beef cattle breeding and improving genetic traits.
Affiliation(s)
- Zhendong Gao
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Ying Lu
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Yuqing Chong
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Mengfei Li
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Jieyun Hong
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Jiao Wu
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Dongwang Wu
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Dongmei Xi
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
- Weidong Deng
  - Yunnan Provincial Key Laboratory of Animal Nutrition and Feed, Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming 650201, China
  - State Key Laboratory for Conservation and Utilization of Bio-Resource in Yunnan, Kunming 650201, China
16
Kalleberg J, Rissman J, Schnabel RD. Overcoming Limitations to Deep Learning in Domesticated Animals with TrioTrain. bioRxiv 2024:2024.04.15.589602. [PMID: 38659907 PMCID: PMC11042298 DOI: 10.1101/2024.04.15.589602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Variant calling across diverse species remains challenging as most bioinformatics tools default to assumptions based on human genomes. DeepVariant (DV) excels without joint genotyping while offering fewer implementation barriers. However, the growing appeal of a "universal" algorithm has magnified the unknown impacts when used with non-human genomes. Here, we use bovine genomes to assess the limits of human-genome-trained models in other species. We introduce the first multi-species DV model that achieves a lower Mendelian Inheritance Error (MIE) rate during single-sample genotyping. Our novel approach, TrioTrain, automates extending DV for species without Genome In A Bottle (GIAB) resources and uses region shuffling to mitigate barriers for SLURM-based clusters. To offset imperfect truth labels for animal genomes, we remove Mendelian discordant variants before training, where models are tuned to genotype the offspring correctly. With TrioTrain, we use cattle, yak, and bison trios to build 30 model iterations across five phases. We observe remarkable performance across phases when testing the GIAB human trios with a mean SNP F1 score >0.990. In HG002, our phase 4 bovine model identifies more variants at a lower MIE rate than DeepTrio. In bovine F1-hybrid genomes, our model substantially reduces inheritance errors with a mean MIE rate of 0.03 percent. Although constrained by imperfect labels, we find that multi-species, trio-based training produces a robust variant calling model. Our research demonstrates that exclusively training with human genomes restricts the application of deep-learning approaches for comparative genomics.
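The idea of removing Mendelian discordant variants from a trio can be illustrated with a minimal sketch. This is not the TrioTrain implementation: the function names, the dictionary layout of a call, and the choice of denominator for the error rate are all assumptions made for illustration, and only unphased diploid autosomal genotypes are handled.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's unphased diploid genotype can be formed
    by taking one allele from the father and one from the mother."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def filter_discordant(trio_calls):
    """Keep only the sites whose trio genotypes are Mendelian consistent."""
    return [
        call for call in trio_calls
        if is_mendelian_consistent(call["child"], call["father"], call["mother"])
    ]


def mie_rate(trio_calls):
    """Fraction of sites with a Mendelian inheritance error.
    Assumption: the denominator is every site in the input list."""
    if not trio_calls:
        return 0.0
    errors = sum(
        not is_mendelian_consistent(c["child"], c["father"], c["mother"])
        for c in trio_calls
    )
    return errors / len(trio_calls)


# Hypothetical genotypes: 0 = reference allele, 1 = alternate allele.
calls = [
    {"child": (0, 1), "father": (0, 0), "mother": (1, 1)},  # consistent
    {"child": (1, 1), "father": (0, 0), "mother": (0, 1)},  # discordant
    {"child": (0, 0), "father": (0, 1), "mother": (0, 1)},  # consistent
    {"child": (0, 1), "father": (0, 0), "mother": (0, 0)},  # discordant
]

kept = filter_discordant(calls)
print(len(kept), mie_rate(calls))
```

A real pipeline would also need to handle missing genotypes, sex chromosomes, and multi-allelic sites, which this sketch ignores.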
Affiliation(s)
- Jenna Kalleberg
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Jacob Rissman
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Robert D Schnabel
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
  - University of Missouri, Genetics Area Program, Columbia, MO, 65201 USA
17
Leonard AS, Mapel XM, Pausch H. Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle. Genome Res 2024; 34:300-309. [PMID: 38355307 PMCID: PMC10984387 DOI: 10.1101/gr.278267.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 02/01/2024] [Indexed: 02/16/2024]
Abstract
Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, and so the genomic variation is often called from short-read alignments, which are unable to comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.
Affiliation(s)
- Xena M Mapel
  - Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
- Hubert Pausch
  - Animal Genomics, ETH Zurich, 8092 Zurich, Switzerland
18
Wang J, Liu J, Guo Z. Natural uORF variation in plants. Trends Plant Sci 2024; 29:290-302. [PMID: 37640640 DOI: 10.1016/j.tplants.2023.07.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 02/28/2023] [Revised: 07/04/2023] [Accepted: 07/19/2023] [Indexed: 08/31/2023]
Abstract
Taking advantage of natural variation promotes our understanding of phenotypic diversity and trait evolution, ultimately accelerating plant breeding, in which the identification of causal variations is critical. To date, sequence variations in the coding region and transcription-level polymorphisms caused by variations in the promoter have been prioritized. An upstream open reading frame (uORF) in the 5' untranslated region (5' UTR) regulates gene expression at the post-transcriptional or translational level. In recent years, studies have demonstrated that natural uORF variations shape phenotypic diversity. This opinion article highlights recent research and speculates on future directions for natural uORF variation in plants.
Affiliation(s)
- Jiangen Wang
  - Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Juhong Liu
  - Fuzhou Institute for Data Technology Co., Ltd., Fuzhou 350207, China
- Zilong Guo
  - Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
19
Miao J, Wei X, Cao C, Sun J, Xu Y, Zhang Z, Wang Q, Pan Y, Wang Z. Pig pangenome graph reveals functional features of non-reference sequences. J Anim Sci Biotechnol 2024; 15:32. [PMID: 38389084 PMCID: PMC10882747 DOI: 10.1186/s40104-023-00984-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 12/22/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND The reliance on a solitary linear reference genome has imposed a significant constraint on our comprehensive understanding of genetic variation in animals. This constraint is particularly pronounced for non-reference sequences (NRSs), which have not been extensively studied. RESULTS In this study, we constructed a pig pangenome graph using 21 pig assemblies and identified 23,831 NRSs with a total length of 105 Mb. Our findings revealed that NRSs were more prevalent in breeds exhibiting greater genetic divergence from the reference genome. Furthermore, we observed that NRSs were rarely found within coding sequences, while NRS insertions were enriched in immune-related Gene Ontology terms. Notably, our investigation also unveiled a close association between novel genes and the immune capacity of pigs. We observed substantial differences in terms of frequencies of NRSs between Eastern and Western pigs, and the heat-resistant pigs exhibited a substantial number of NRS insertions in an 11.6 Mb interval on chromosome X. Additionally, we discovered a 665 bp insertion in the fourth intron of the TNFRSF19 gene that may be associated with the ability of heat tolerance in Southern Chinese pigs. CONCLUSIONS Our findings demonstrate the potential of a graph genome approach to reveal important functional features of NRSs in pig populations.
Affiliation(s)
- Jian Miao
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Xingyu Wei
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Caiyun Cao
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Jiabao Sun
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Yuejin Xu
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Zhe Zhang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Qishan Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - Yazhou Bay Science and Technology City, Hainan Institute of Zhejiang University, Yazhou District, Building 11, Yongyou Industrial Park, Sanya, 572025, Hainan, China
- Yuchun Pan
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - Yazhou Bay Science and Technology City, Hainan Institute of Zhejiang University, Yazhou District, Building 11, Yongyou Industrial Park, Sanya, 572025, Hainan, China
- Zhen Wang
  - College of Animal Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China
20
Groza C, Schwendinger-Schreck C, Cheung WA, Farrow EG, Thiffault I, Lake J, Rizzo WB, Evrony G, Curran T, Bourque G, Pastinen T. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat Commun 2024; 15:657. [PMID: 38253606 PMCID: PMC10803329 DOI: 10.1038/s41467-024-44980-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 06/06/2023] [Accepted: 01/10/2024] [Indexed: 01/24/2024] Open
Abstract
Rare DNA alterations that cause heritable diseases are only partially resolvable by clinical next-generation sequencing due to the difficulty of detecting structural variation (SV) in all genomic contexts. Long-read, high fidelity genome sequencing (HiFi-GS) detects SVs with increased sensitivity and enables assembling personal and graph genomes. We leverage standard reference genomes, public assemblies (n = 94) and a large collection of HiFi-GS data from a rare disease program (Genomic Answers for Kids, GA4K, n = 574 assemblies) to build a graph genome representing a unified SV callset in GA4K, identify common variation and prioritize SVs that are more likely to cause genetic disease (MAF < 0.01). Using graphs, we obtain a higher level of reproducibility than the standard reference approach. We observe over 200,000 SV alleles unique to GA4K, including nearly 1000 rare variants that impact coding sequence. With improved specificity for rare SVs, we isolate 30 candidate SVs in phenotypically prioritized genes, including known disease SVs. We isolate a novel diagnostic SV in KMT2E, demonstrating use of personal assemblies coupled with pangenome graphs for rare disease genomics. The community may interrogate our pangenome with additional assemblies to discover new SVs within the allele frequency spectrum relevant to genetic diseases.
Collapse
Affiliation(s)
- Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, QC, Canada
- Warren A Cheung
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
- Emily G Farrow
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
- Isabelle Thiffault
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
- William B Rizzo
- Child Health Research Institute, Department of Pediatrics, Nebraska Medical Center, Omaha, NE, USA
- Gilad Evrony
- Center for Human Genetics and Genomics, Department of Pediatrics, Neuroscience & Physiology, New York University Grossman School of Medicine, New York, NY, USA
- Tom Curran
- Children's Mercy Research Institute, Kansas City, MO, USA
- Guillaume Bourque
- Canadian Center for Computational Genomics, McGill University, Montréal, QC, Canada.
- Department of Human Genetics, McGill University, Montréal, QC, Canada.
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan.
- Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada.
- Tomi Pastinen
- Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA.
21
Mapel XM, Kadri NK, Leonard AS, He Q, Lloret-Villas A, Bhati M, Hiltpold M, Pausch H. Molecular quantitative trait loci in reproductive tissues impact male fertility in cattle. Nat Commun 2024; 15:674. [PMID: 38253538 PMCID: PMC10803364 DOI: 10.1038/s41467-024-44935-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/15/2023] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
Breeding bulls are well suited to investigate inherited variation in male fertility because they are genotyped and their reproductive success is monitored through semen analyses and thousands of artificial inseminations. However, functional data from relevant tissues are lacking in cattle, which prevents fine-mapping fertility-associated genomic regions. Here, we characterize gene expression and splicing variation in testis, epididymis, and vas deferens transcriptomes of 118 mature bulls and conduct association tests between 414,667 molecular phenotypes and 21,501,032 genome-wide variants to identify 41,156 regulatory loci. We show broad consensus in tissue-specific and tissue-enriched gene expression between the three bovine tissues and their human and murine counterparts. Expression- and splicing-mediating variants are more than three times as frequent in testis as in epididymis and vas deferens, highlighting the transcriptional complexity of testis. Finally, we identify genes (WDR19, SPATA16, KCTD19, ZDHHC1) and molecular phenotypes that are associated with quantitative variation in male fertility through transcriptome-wide association and colocalization analyses.
Affiliation(s)
- Xena Marie Mapel
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland
- Naveen Kumar Kadri
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland
- Alexander S Leonard
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland
- Qiongyu He
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland
- Meenu Bhati
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland
- Roslin Institute, The University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Maya Hiltpold
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland
- GenPhySE, Université de Toulouse, INRAE, ENVT, 31326, Castanet Tolosan, France
- Hubert Pausch
- Animal Genomics, ETH Zurich, Universitatstrasse 2, 8092, Zurich, Switzerland.
22
Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023; 21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023]
Abstract
BACKGROUND The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Affiliation(s)
- Edward S Rice
- Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- Antton Alberdi
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- James Alfieri
- Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
- Giridhar Athrey
- Department of Poultry Science, Texas A&M University, College Station, TX, USA
- Jennifer R Balacco
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Philippe Bardou
- Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
- Heath Blackmon
- Department of Biology, Texas A&M University, College Station, TX, USA
- Mathieu Charles
- University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
- Hans H Cheng
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laurent A F Frantz
- Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
- M Thomas P Gilbert
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Cari J Hearn
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Christophe Klopp
- Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
- Sofia Marcos
- Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
- Luohao Xu
- Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
- Wesley C Warren
- Department of Animal Sciences, University of Missouri, Columbia, MO, USA.
23
Jang J, Jung J, Lee YH, Lee S, Baik M, Kim H. Chromosome-level genome assembly of Korean native cattle and pangenome graph of 14 Bos taurus assemblies. Sci Data 2023; 10:560. [PMID: 37612339 PMCID: PMC10447506 DOI: 10.1038/s41597-023-02453-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/06/2023] [Accepted: 08/08/2023] [Indexed: 08/25/2023]
Abstract
This study presents the first chromosome-level genome assembly of Hanwoo, an indigenous Korean breed of Bos taurus taurus. This is the first genome assembly of an Asian taurus breed. We also constructed a pangenome graph of 14 B. taurus genome assemblies. The contig N50 was over 55 Mb, the scaffold N50 was over 89 Mb, and the genome completeness was 95.8%, as estimated by BUSCO using the mammalian set, indicating a high-quality assembly. 48.7% of the genome comprised various repetitive elements, including DNAs, tandem repeats, long interspersed nuclear elements, and simple repeats. A total of 27,314 protein-coding genes were identified, including 25,302 proteins with inferred gene names and 2,012 unknown proteins. The pangenome graph of 14 B. taurus autosomes revealed 528.47 Mb of non-reference regions in total and 61.87 Mb of Hanwoo-specific regions. Our Hanwoo assembly and pangenome graph provide valuable resources for studying B. taurus populations.
Affiliation(s)
- Jisung Jang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Jaehoon Jung
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Young Ho Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Sanghyun Lee
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Myunggi Baik
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, Korea
- Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, Korea.
24
Sinha D, Maurya AK, Abdi G, Majeed M, Agarwal R, Mukherjee R, Ganguly S, Aziz R, Bhatia M, Majgaonkar A, Seal S, Das M, Banerjee S, Chowdhury S, Adeyemi SB, Chen JT. Integrated Genomic Selection for Accelerating Breeding Programs of Climate-Smart Cereals. Genes (Basel) 2023; 14:1484. [PMID: 37510388 PMCID: PMC10380062 DOI: 10.3390/genes14071484] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/27/2023] [Revised: 07/14/2023] [Accepted: 07/18/2023] [Indexed: 07/30/2023]
Abstract
Rapidly rising population and climate changes are two critical issues that require immediate action to achieve sustainable development goals. The rising population is posing increased demand for food, thereby pushing for an acceleration in agricultural production. Furthermore, increased anthropogenic activities have resulted in environmental pollution such as water pollution and soil degradation as well as alterations in the composition and concentration of environmental gases. These changes are affecting not only biodiversity loss but also affecting the physio-biochemical processes of crop plants, resulting in a stress-induced decline in crop yield. To overcome such problems and ensure the supply of food material, consistent efforts are being made to develop strategies and techniques to increase crop yield and to enhance tolerance toward climate-induced stress. Plant breeding evolved after domestication and initially remained dependent on phenotype-based selection for crop improvement. But it has grown through cytological and biochemical methods, and the newer contemporary methods are based on DNA-marker-based strategies that help in the selection of agronomically useful traits. These are now supported by high-end molecular biology tools like PCR, high-throughput genotyping and phenotyping, data from crop morpho-physiology, statistical tools, bioinformatics, and machine learning. After establishing its worth in animal breeding, genomic selection (GS), an improved variant of marker-assisted selection (MAS), has made its way into crop-breeding programs as a powerful selection tool. To develop novel breeding programs as well as innovative marker-based models for genetic evaluation, GS makes use of molecular genetic markers. GS can amend complex traits like yield as well as shorten the breeding period, making it advantageous over pedigree breeding and marker-assisted selection (MAS). 
It reduces the time and resources that are required for plant breeding while allowing for an increased genetic gain of complex attributes. It has been taken to new heights by integrating innovative and advanced technologies such as speed breeding, machine learning, and environmental/weather data to further harness the GS potential, an approach known as integrated genomic selection (IGS). This review highlights the IGS strategies, procedures, integrated approaches, and associated emerging issues, with a special emphasis on cereal crops. In this domain, efforts have been taken to highlight the potential of this cutting-edge innovation to develop climate-smart crops that can endure abiotic stresses with the motive of keeping production and quality at par with the global food demand.
Affiliation(s)
- Dwaipayan Sinha
- Department of Botany, Government General Degree College, Mohanpur 721436, India
- Arun Kumar Maurya
- Department of Botany, Multanimal Modi College, Modinagar, Ghaziabad 201204, India
- Gholamreza Abdi
- Department of Biotechnology, Persian Gulf Research Institute, Persian Gulf University, Bushehr 75169, Iran
- Muhammad Majeed
- Department of Botany, University of Gujrat, Punjab 50700, Pakistan
- Rachna Agarwal
- Applied Genomics Section, Bhabha Atomic Research Centre, Mumbai 400085, India
- Rashmi Mukherjee
- Research Center for Natural and Applied Sciences, Department of Botany (UG & PG), Raja Narendralal Khan Women's College, Gope Palace, Midnapur 721102, India
- Sharmistha Ganguly
- Department of Dravyaguna, Institute of Post Graduate Ayurvedic Education and Research, Kolkata 700009, India
- Robina Aziz
- Department of Botany, Government, College Women University, Sialkot 51310, Pakistan
- Manika Bhatia
- TERI School of Advanced Studies, New Delhi 110070, India
- Aqsa Majgaonkar
- Department of Botany, St. Xavier's College (Autonomous), Mumbai 400001, India
- Sanchita Seal
- Department of Botany, Polba Mahavidyalaya, Polba 712148, India
- Moumita Das
- V. Sivaram Research Foundation, Bangalore 560040, India
- Swastika Banerjee
- Department of Botany, Kairali College of +3 Science, Champua, Keonjhar 758041, India
- Shahana Chowdhury
- Department of Biotechnology, Faculty of Engineering Sciences, German University Bangladesh, TNT Road, Telipara, Chandona Chowrasta, Gazipur 1702, Bangladesh
- Sherif Babatunde Adeyemi
- Ethnobotany/Phytomedicine Laboratory, Department of Plant Biology, Faculty of Life Sciences, University of Ilorin, Ilorin P.M.B 1515, Nigeria
- Jen-Tsung Chen
- Department of Life Sciences, National University of Kaohsiung, Kaohsiung 811, Taiwan