1
Quah FX, Almeida MV, Blumer M, Yuan CU, Fischer B, See K, Jackson B, Zatha R, Rusuwa B, Turner GF, Santos ME, Svardal H, Hemberg M, Durbin R, Miska E. Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements. Genome Res 2025; 35:1094-1107. [PMID: 40210437 PMCID: PMC12047535 DOI: 10.1101/gr.279674.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2024] [Accepted: 02/06/2025] [Indexed: 04/12/2025]
Abstract
Pangenome methods have the potential to uncover hitherto undiscovered sequences missing from established reference genomes, making them useful to study evolutionary and speciation processes in diverse organisms. The cichlid fishes of the East African Rift Lakes represent one of nature's most phenotypically diverse vertebrate radiations, but single-nucleotide polymorphism (SNP)-based studies have revealed little sequence difference, with 0.1%-0.25% pairwise divergence between Lake Malawi species. These were based on aligning short reads to a single linear reference genome and ignored the contribution of larger-scale structural variants (SVs). We constructed a pangenome graph that integrates six new and two existing long-read genome assemblies of Lake Malawi haplochromine cichlids. This graph intuitively represents complex and nested variation between the genomes and reveals that the SV landscape is dominated by large insertions, many exclusive to individual assemblies. The graph incorporates a substantial amount of extra sequence across seven species, the total size of which is 33.1% longer than that of a single cichlid genome. Approximately 4.73% to 9.86% of the assembly lengths are estimated as interspecies structural variation between cichlids, suggesting substantial genomic diversity underappreciated in SNP studies. Although coding regions remain highly conserved, our analysis uncovers a significant proportion of SV sequences as transposable element (TE) insertions, especially DNA, LINE, and LTR TEs. These findings underscore that the cichlid genome is shaped both by small-nucleotide mutations and large, TE-derived sequence alterations, both of which merit study to understand their interplay in cichlid evolution.
Affiliation(s)
- Fu Xiang Quah
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Moritz Blumer
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Chengwei Ulrika Yuan
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Bettina Fischer
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Kirsten See
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
- Ben Jackson
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Richard Zatha
  - Department of Biological Sciences, University of Malawi, P.O. Box 280, Zomba, Malawi
- Bosco Rusuwa
  - Department of Biological Sciences, University of Malawi, P.O. Box 280, Zomba, Malawi
- George F Turner
  - School of Environmental and Natural Sciences, Bangor University, Bangor, Gwynedd LL57 2TH, United Kingdom
- M Emília Santos
  - Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
- Hannes Svardal
  - Department of Biology, University of Antwerp, 2610 Wilrijk, Belgium
- Martin Hemberg
  - The Gene Lay Institute of Immunology and Inflammation, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
- Eric Miska
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, United Kingdom
2
Mahmoud M, Agustinho DP, Sedlazeck FJ. A Hitchhiker's Guide to long-read genomic analysis. Genome Res 2025; 35:545-558. [PMID: 40228901 PMCID: PMC12047252 DOI: 10.1101/gr.279975.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025]
Abstract
Over the past decade, long-read sequencing has evolved into a pivotal technology for uncovering the hidden and complex regions of the genome. Significant cost efficiency, scalability, and accuracy advancements have driven this evolution. Concurrently, novel analytical methods have emerged to harness the full potential of long reads. These advancements have enabled milestones such as the first fully completed human genome, enhanced identification and understanding of complex genomic variants, and deeper insights into the interplay between epigenetics and genomic variation. This mini-review provides a comprehensive overview of the latest developments in long-read DNA sequencing analysis, encompassing reference-based and de novo assembly approaches. We explore the entire workflow, from initial data processing to variant calling and annotation, focusing on how these methods improve our ability to interpret a wide array of genomic variants. Additionally, we discuss the current challenges, limitations, and future directions in the field, offering a detailed examination of the state-of-the-art bioinformatics methods for long-read sequencing.
Affiliation(s)
- Medhat Mahmoud
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
3
Peng Y, Mao K, Li H, Ping J, Zhu J, Liu X, Zhang Z, Jin M, Wu C, Wang N, Yesaya A, Wilson K, Xiao Y. Extreme genetic signatures of local adaptation in a notorious rice pest, Chilo suppressalis. Natl Sci Rev 2025; 12:nwae221. [PMID: 39949366 PMCID: PMC11823119 DOI: 10.1093/nsr/nwae221] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/21/2023] [Revised: 04/13/2024] [Accepted: 05/23/2024] [Indexed: 02/16/2025] Open
Abstract
Climatic variation stands as a significant driving force behind genetic differentiation and the evolution of adaptive traits. Chilo (C.) suppressalis, commonly known as the rice stem borer, is a highly destructive pest that crucially harms rice production. The lack of natural population genomics data has hindered a more thorough understanding of its climate adaptation, particularly the genetic basis underlying adaptive traits. To overcome this obstacle, our study employed completely resequenced genomes of 384 individuals to explore the population structure, demographic history, and gene flow of C. suppressalis in China. This study observed that its gene flow occurred asymmetrically, moving from central populations to peripheral populations. Using genome-wide selection scans and genotype-environment association studies, we identified potential loci that may be associated with climatic adaptation. The most robust signal was found to be associated with cold tolerance, linked to a homeobox gene, goosecoid (GSC), whose expression level was significantly different in low and high latitudes. Moreover, downregulating the expression of this gene by RNAi enhances its cold tolerance phenotypes. Our findings have uncovered and delved into the genetic foundation of the ability of C. suppressalis to adapt to its environment. This is essential in ensuring the continued effectiveness and sustainability of novel control techniques.
Affiliation(s)
- Yan Peng
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Kaikai Mao
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  - Guangxi Key Laboratory of Agro-Environment and Agric-Products Safety, College of Agriculture, Guangxi University, Nanning 530004, China
- Hongran Li
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Junfen Ping
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  - School of Life Sciences, Henan University, Kaifeng 475004, China
  - Shenzhen Research Institute of Henan University, Shenzhen 518000, China
- Jingyun Zhu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xinye Liu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Zhuting Zhang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Minghui Jin
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Chao Wu
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Nan Wang
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Alexander Yesaya
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Kenneth Wilson
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  - Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YW, UK
- Yutao Xiao
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Gene Editing Technologies (Hainan), Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
4
Fawthrop R, Cerca J, Pacheco G, Sætre GP, Scordato ESC, Ravinet M, Rowe M. Understanding human-commensalism through an ecological and evolutionary framework. Trends Ecol Evol 2025; 40:159-169. [PMID: 39542789 DOI: 10.1016/j.tree.2024.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 11/17/2024]
Abstract
Human-commensalism has been intuitively characterised as an interspecific interaction whereby non-human individuals benefit from tight associations with anthropogenic environments. However, a clear definition of human-commensalism, rooted within an ecological and evolutionary framework, has yet to be proposed. Here, we define human-commensalism as a population-level dependence on anthropogenic resources, associated with genetic differentiation from the ancestral, non-commensal form. Such a definition helps us to understand the origins of human-commensalism and the pace and form of adaptation to anthropogenic niches, and may enable the prediction of future evolution in an increasingly human-modified world. Our discussion encourages greater consideration of the spatial and temporal complexity in anthropogenic niches, promoting a nuanced consideration of human-commensal populations when formulating research questions.
Affiliation(s)
- Ruth Fawthrop
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), 6700 AB, Wageningen, The Netherlands
  - Groningen Institute for Evolutionary Life Sciences (GELIFES), University of Groningen, 9747 AG, Groningen, The Netherlands
- José Cerca
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo 0316, Norway
- George Pacheco
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo 0316, Norway
- Glenn-Peter Sætre
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo 0316, Norway
- Elizabeth S C Scordato
  - Department of Biological Sciences, California State Polytechnic University, Pomona, CA, USA
- Mark Ravinet
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo 0316, Norway
- Melissah Rowe
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), 6700 AB, Wageningen, The Netherlands
5
Secomandi S, Gallo GR, Rossi R, Rodríguez Fernandes C, Jarvis ED, Bonisoli-Alquati A, Gianfranceschi L, Formenti G. Pangenome graphs and their applications in biodiversity genomics. Nat Genet 2025; 57:13-26. [PMID: 39779953 DOI: 10.1038/s41588-024-02029-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 11/08/2024] [Indexed: 01/11/2025]
Abstract
Complete datasets of genetic variants are key to biodiversity genomic studies. Long-read sequencing technologies allow the routine assembly of highly contiguous, haplotype-resolved reference genomes. However, even when complete, reference genomes from a single individual may bias downstream analyses and fail to adequately represent genetic diversity within a population or species. Pangenome graphs assembled from aligned collections of high-quality genomes can overcome representation bias by integrating sequence information from multiple genomes from the same population, species or genus into a single reference. Here, we review the available tools and data structures to build, visualize and manipulate pangenome graphs while providing practical examples and discussing their applications in biodiversity and conservation genomics across the tree of life.
Affiliation(s)
- Simona Secomandi
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
- Riccardo Rossi
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Carlos Rodríguez Fernandes
  - Centre for Ecology, Evolution and Environmental Changes (CE3C) and CHANGE, Global Change and Sustainability Institute, Departamento de Biologia Animal, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
  - Faculdade de Psicologia, Universidade de Lisboa, Lisboa, Portugal
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY, USA
  - The Vertebrate Genome Laboratory, New York, NY, USA
- Andrea Bonisoli-Alquati
  - Department of Biological Sciences, California State Polytechnic University, Pomona, Pomona, CA, USA
6
Chan YF, Lu CW, Kuo HC, Hung CM. A chromosome-level genome assembly of the Asian house martin implies potential genes associated with the feathered-foot trait. G3 (Bethesda) 2024; 14:jkae077. [PMID: 38607414 PMCID: PMC11152083 DOI: 10.1093/g3journal/jkae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 03/04/2024] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
The presence of feathers is a vital characteristic among birds, yet most modern birds had no feather on their feet. The discoveries of feathers on the hind limbs of basal birds and dinosaurs have sparked an interest in the evolutionary origin and genetic mechanism of feathered feet. However, the majority of studies investigating the genes associated with this trait focused on domestic populations. Understanding the genetic mechanism underpinned feathered-foot development in wild birds is still in its infancy. Here, we assembled a chromosome-level genome of the Asian house martin (Delichon dasypus) using the long-read High Fidelity sequencing approach to initiate the search for genes associated with its feathered feet. We employed the whole-genome alignment of D. dasypus with other swallow species to identify high-SNP regions and chromosomal inversions in the D. dasypus genome. After filtering out variations unrelated to D. dasypus evolution, we found six genes related to feather development near the high-SNP regions. We also detected three feather development genes in chromosomal inversions between the Asian house martin and the barn swallow genomes. We discussed their association with the wingless/integrated (WNT), bone morphogenetic protein, and fibroblast growth factor pathways and their potential roles in feathered-foot development. Future studies are encouraged to utilize the D. dasypus genome to explore the evolutionary process of the feathered-foot trait in avian species. This endeavor will shed light on the evolutionary path of feathers in birds.
Affiliation(s)
- Yuan-Fu Chan
  - Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chia-Wei Lu
  - Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
- Hao-Chih Kuo
  - Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chih-Ming Hung
  - Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
7
Recuerda M, Campagna L. How structural variants shape avian phenotypes: Lessons from model systems. Mol Ecol 2024; 33:e17364. [PMID: 38651830 DOI: 10.1111/mec.17364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 04/04/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]
Abstract
Despite receiving significant recent attention, the relevance of structural variation (SV) in driving phenotypic diversity remains understudied, although recent advances in long-read sequencing, bioinformatics and pangenomic approaches have enhanced SV detection. We review the role of SVs in shaping phenotypes in avian model systems, and identify some general patterns in SV type, length and their associated traits. We found that most of the avian SVs so far identified are short indels in chickens, which are frequently associated with changes in body weight and plumage colouration. Overall, we found that relatively short SVs are more frequently detected, likely due to a combination of their prevalence compared to large SVs, and a detection bias, stemming primarily from the widespread use of short-read sequencing and associated analytical methods. SVs most commonly involve non-coding regions, especially introns, and when patterns of inheritance were reported, SVs associated primarily with dominant discrete traits. We summarise several examples of phenotypic convergence across different species, mediated by different SVs in the same or different genes and different types of changes in the same gene that can lead to various phenotypes. Complex rearrangements and supergenes, which can simultaneously affect and link several genes, tend to have pleiotropic phenotypic effects. Additionally, SVs commonly co-occur with single-nucleotide polymorphisms, highlighting the need to consider all types of genetic changes to understand the basis of phenotypic traits. We end by summarising expectations for when long-read technologies become commonly implemented in non-model birds, likely leading to an increase in SV discovery and characterisation. The growing interest in this subject suggests an increase in our understanding of the phenotypic effects of SVs in upcoming years.
Affiliation(s)
- María Recuerda
  - Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, New York, USA
- Leonardo Campagna
  - Fuller Evolutionary Biology Program, Cornell Lab of Ornithology, Ithaca, New York, USA
  - Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York, USA
8
Sebastianelli M, Lukhele SM, Secomandi S, de Souza SG, Haase B, Moysi M, Nikiforou C, Hutfluss A, Mountcastle J, Balacco J, Pelan S, Chow W, Fedrigo O, Downs CT, Monadjem A, Dingemanse NJ, Jarvis ED, Brelsford A, vonHoldt BM, Kirschel ANG. A genomic basis of vocal rhythm in birds. Nat Commun 2024; 15:3095. [PMID: 38653976 DOI: 10.1038/s41467-024-47305-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 03/22/2024] [Indexed: 04/25/2024] Open
Abstract
Vocal rhythm plays a fundamental role in sexual selection and species recognition in birds, but little is known of its genetic basis due to the confounding effect of vocal learning in model systems. Uncovering its genetic basis could facilitate identifying genes potentially important in speciation. Here we investigate the genomic underpinnings of rhythm in vocal non-learning Pogoniulus tinkerbirds using 135 individual whole genomes distributed across a southern African hybrid zone. We find rhythm speed is associated with two genes that are also known to affect human speech, Neurexin-1 and Coenzyme Q8A. Models leveraging ancestry reveal these candidate loci also impact rhythmic stability, a trait linked with motor performance which is an indicator of quality. Character displacement in rhythmic stability suggests possible reinforcement against hybridization, supported by evidence of asymmetric assortative mating in the species producing faster, more stable rhythms. Because rhythm is omnipresent in animal communication, candidate genes identified here may shape vocal rhythm across birds and other vertebrates.
Affiliation(s)
- Matteo Sebastianelli
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, 751 23, Uppsala, Sweden
- Sifiso M Lukhele
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Simona Secomandi
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Stacey G de Souza
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Bettina Haase
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Michaella Moysi
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Christos Nikiforou
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
- Alexander Hutfluss
  - Behavioural Ecology, Faculty of Biology, LMU Munich (LMU), 82152, Planegg-Martinsried, Germany
- Jennifer Balacco
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Colleen T Downs
  - Centre for Functional Biodiversity, School of Life Sciences, University of KwaZulu-Natal, Pietermaritzburg, 3209, South Africa
- Ara Monadjem
  - Department of Biological Sciences, University of Eswatini, Kwaluseni, Eswatini
  - Mammal Research Institute, Department of Zoology & Entomology, University of Pretoria, Private Bag 20, Hatfield, 0028, Pretoria, South Africa
- Niels J Dingemanse
  - Behavioural Ecology, Faculty of Biology, LMU Munich (LMU), 82152, Planegg-Martinsried, Germany
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Alan Brelsford
  - Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA, 92521, USA
- Bridgett M vonHoldt
  - Department of Ecology & Evolutionary Biology, Princeton University, Princeton, NJ, 08544, USA
- Alexander N G Kirschel
  - Department of Biological Sciences, University of Cyprus, PO Box 20537, Nicosia, 1678, Cyprus
9
Kalleberg J, Rissman J, Schnabel RD. Overcoming Limitations to Deep Learning in Domesticated Animals with TrioTrain. bioRxiv 2024:2024.04.15.589602. [PMID: 38659907 PMCID: PMC11042298 DOI: 10.1101/2024.04.15.589602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Variant calling across diverse species remains challenging as most bioinformatics tools default to assumptions based on human genomes. DeepVariant (DV) excels without joint genotyping while offering fewer implementation barriers. However, the growing appeal of a "universal" algorithm has magnified the unknown impacts when used with non-human genomes. Here, we use bovine genomes to assess the limits of human-genome-trained models in other species. We introduce the first multi-species DV model that achieves a lower Mendelian Inheritance Error (MIE) rate during single-sample genotyping. Our novel approach, TrioTrain, automates extending DV for species without Genome In A Bottle (GIAB) resources and uses region shuffling to mitigate barriers for SLURM-based clusters. To offset imperfect truth labels for animal genomes, we remove Mendelian discordant variants before training, where models are tuned to genotype the offspring correctly. With TrioTrain, we use cattle, yak, and bison trios to build 30 model iterations across five phases. We observe remarkable performance across phases when testing the GIAB human trios with a mean SNP F1 score >0.990. In HG002, our phase 4 bovine model identifies more variants at a lower MIE rate than DeepTrio. In bovine F1-hybrid genomes, our model substantially reduces inheritance errors with a mean MIE rate of 0.03 percent. Although constrained by imperfect labels, we find that multi-species, trio-based training produces a robust variant calling model. Our research demonstrates that exclusively training with human genomes restricts the application of deep-learning approaches for comparative genomics.
Affiliation(s)
- Jenna Kalleberg
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Jacob Rissman
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Robert D Schnabel
  - University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
  - University of Missouri, Genetics Area Program, Columbia, MO, 65201 USA
10
Bukhman YV, Meyer S, Chu LF, Abueg L, Antosiewicz-Bourget J, Balacco J, Brecht M, Dinatale E, Fedrigo O, Formenti G, Fungtammasan A, Giri SJ, Hiller M, Howe K, Kihara D, Mamott D, Mountcastle J, Pelan S, Rabbani K, Sims Y, Tracey A, Wood JMD, Jarvis ED, Thomson JA, Chaisson MJP, Stewart R. Chromosome level genome assembly of the Etruscan shrew Suncus etruscus. Sci Data 2024; 11:176. [PMID: 38326333 PMCID: PMC10850158 DOI: 10.1038/s41597-024-03011-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024] Open
Abstract
Suncus etruscus is one of the world's smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew's small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
Collapse
Affiliation(s)
- Yury V Bukhman
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
- Susanne Meyer
  - Neuroscience Research Institute, University of California - Santa Barbara, 494 UCEN Rd, Isla Vista, CA, 93117, USA
- Li-Fang Chu
  - Department of Comparative Biology and Experimental Medicine, University of Calgary, 2500 University Drive NW, Calgary, Alberta, T2N 1N4, Canada
- Linelle Abueg
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Jennifer Balacco
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Michael Brecht
  - BCCN/Humboldt University Berlin, Philippstr, 13 House 6, 10115, Berlin, Germany
- Erica Dinatale
  - Max Planck Institute for Biology Tübingen, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Giulio Formenti
  - Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
- Swagarika Jaharlal Giri
  - Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA
- Michael Hiller
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Daisuke Kihara
  - Department of Computer Science, Purdue University, 249 S. Martin Jischke Dr, West Lafayette, IN, 47907, USA
  - Department of Biological Sciences, Purdue University, 249 S. Martin Jischke Dr., West Lafayette, IN, 47907, USA
- Daniel Mamott
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
- Jacquelyn Mountcastle
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
- Sarah Pelan
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Keon Rabbani
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
- Ying Sims
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University/HHMI, 1230 York Avenue, New York, NY, 10065, USA
- James A Thomson
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
  - Department of Molecular, Cellular and Developmental Biology, University of California Santa Barbara, Santa Barbara, CA, 93106, USA
  - Department of Cell and Regenerative Biology, University of Wisconsin School of Medicine and Public Health, Madison, WI, 53726, USA
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way RRI 408, Los Angeles, CA, 90089, USA
- Ron Stewart
  - Regenerative Biology, Morgridge Institute for Research, 330 N. Orchard St., Madison, WI, 53715, USA
11
O’Connor RE, Kretschmer R, Romanov MN, Griffin DK. A Bird's-Eye View of Chromosomic Evolution in the Class Aves. Cells 2024; 13:310. [PMID: 38391923 PMCID: PMC10886771 DOI: 10.3390/cells13040310] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/05/2023] [Revised: 01/27/2024] [Accepted: 02/05/2024] [Indexed: 02/24/2024] Open
Abstract
Birds (Aves) are the most speciose of terrestrial vertebrates, displaying Class-specific characteristics yet incredible external phenotypic diversity. Critical to agriculture and as model organisms, birds have adapted to many habitats. The only extant examples of dinosaurs, birds emerged ~150 mya and >10% are currently threatened with extinction. This review is a comprehensive overview of avian genome ("chromosomic") organization research based mostly on chromosome painting and BAC-based studies. We discuss traditional and contemporary tools for reliably generating chromosome-level assemblies and analyzing multiple species at a higher resolution and wider phylogenetic distance than previously possible. These results permit more detailed investigations into inter- and intrachromosomal rearrangements, providing unique insights into evolution and speciation mechanisms. The 'signature' avian karyotype likely arose ~250 mya and remained largely unchanged in most groups including extinct dinosaurs. Exceptions include Psittaciformes, Falconiformes, Caprimulgiformes, Cuculiformes, Suliformes, occasional Passeriformes, Ciconiiformes, and Pelecaniformes. The reasons for this remarkable conservation may be the greater diploid chromosome number generating variation (the driver of natural selection) through a greater possible combination of gametes and/or an increase in recombination rate. A deeper understanding of avian genomic structure permits the exploration of fundamental biological questions pertaining to the role of evolutionary breakpoint regions and homologous synteny blocks.
Affiliation(s)
- Rebecca E. O’Connor
  - School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK
- Rafael Kretschmer
  - Departamento de Ecologia, Zoologia e Genética, Instituto de Biologia, Campus Universitário Capão do Leão, Universidade Federal de Pelotas, Pelotas 96010-900, RS, Brazil
- Michael N. Romanov
  - School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK
  - L. K. Ernst Federal Research Centre for Animal Husbandry, Dubrovitsy, 142132 Podolsk, Moscow Oblast, Russia
- Darren K. Griffin
  - School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK
12
Rice ES, Alberdi A, Alfieri J, Athrey G, Balacco JR, Bardou P, Blackmon H, Charles M, Cheng HH, Fedrigo O, Fiddaman SR, Formenti G, Frantz LAF, Gilbert MTP, Hearn CJ, Jarvis ED, Klopp C, Marcos S, Mason AS, Velez-Irizarry D, Xu L, Warren WC. A pangenome graph reference of 30 chicken genomes allows genotyping of large and complex structural variants. BMC Biol 2023; 21:267. [PMID: 37993882 PMCID: PMC10664547 DOI: 10.1186/s12915-023-01758-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/14/2023] [Accepted: 11/02/2023] [Indexed: 11/24/2023] Open
Abstract
BACKGROUND: The red junglefowl, the wild outgroup of domestic chickens, has historically served as a reference for genomic studies of domestic chickens. These studies have provided insight into the etiology of traits of commercial importance. However, the use of a single reference genome does not capture diversity present among modern breeds, many of which have accumulated molecular changes due to drift and selection. While reference-based resequencing is well-suited to cataloging simple variants such as single-nucleotide changes and short insertions and deletions, it is mostly inadequate to discover more complex structural variation in the genome. METHODS: We present a pangenome for the domestic chicken consisting of thirty assemblies of chickens from different breeds and research lines. RESULTS: We demonstrate how this pangenome can be used to catalog structural variants present in modern breeds and untangle complex nested variation. We show that alignment of short reads from 100 diverse wild and domestic chickens to this pangenome reduces reference bias by 38%, which affects downstream genotyping results. This approach also allows for the accurate genotyping of a large and complex pair of structural variants at the K feathering locus using short reads, which would not be possible using a linear reference. CONCLUSIONS: We expect that this new paradigm of genomic reference will allow better pinpointing of exact mutations responsible for specific phenotypes, which will in turn be necessary for breeding chickens that meet new sustainability criteria and are resilient to quickly evolving pathogen threats.
Affiliation(s)
- Edward S Rice
  - Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
  - Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
- Antton Alberdi
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- James Alfieri
  - Department of Ecology & Evolutionary Biology, Texas A&M University, College Station, TX, USA
- Giridhar Athrey
  - Department of Poultry Science, Texas A&M University, College Station, TX, USA
- Jennifer R Balacco
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Philippe Bardou
  - Sigenae, GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, 31326, France
- Heath Blackmon
  - Department of Biology, Texas A&M University, College Station, TX, USA
- Mathieu Charles
  - University Paris-Saclay, INRAE, AgroParisTech, GABI, Sigenae, Jouy-en-Josas, France
- Hans H Cheng
  - Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Laurent A F Frantz
  - Faculty of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, E1 4DQ, UK
- M Thomas P Gilbert
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
- Cari J Hearn
  - Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Erich D Jarvis
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - The Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Christophe Klopp
  - Sigenae, Genotoul Bioinfo, MIAT UR875, INRAE, Castanet Tolosan, France
- Sofia Marcos
  - Center for Evolutionary Hologenomics, Globe Institute, University of Copenhagen (UCPH), Copenhagen, Denmark
  - Applied Genomics and Bioinformatics, University of the Basque Country (UPV/EHU), Leioa, Bilbao, Spain
- Luohao Xu
  - Key Laboratory of Freshwater Fish Reproduction and Development (Ministry of Education), Key Laboratory of Aquatic Science of Chongqing, School of Life Sciences, Southwest University, Chongqing, 400715, China
- Wesley C Warren
  - Department of Animal Sciences, University of Missouri, Columbia, MO, USA
13
Saco A, Rey-Campos M, Gallardo-Escárate C, Gerdol M, Novoa B, Figueras A. Gene presence/absence variation in Mytilus galloprovincialis and its implications in gene expression and adaptation. iScience 2023; 26:107827. [PMID: 37744033 PMCID: PMC10514466 DOI: 10.1016/j.isci.2023.107827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 07/12/2023] [Accepted: 09/01/2023] [Indexed: 09/26/2023] Open
Abstract
Presence/absence variation (PAV) is a well-known phenomenon in prokaryotes that was described for the first time in bivalves in 2020 in Mytilus galloprovincialis. The objective of the present study was to further our understanding of the PAV phenomenon in mussel biology. The distribution of PAV was studied in a mussel chromosome-level genome assembly, revealing a widespread distribution but with hotspots of dispensability. Special attention was given to the effect of PAV in gene expression, since dispensable genes were found to be inherently subject to distortions due to their sparse distribution among individuals. Furthermore, the high expression and strong tissue specificity of some dispensable genes, such as myticins, strongly supported their biological relevance. The significant differences in the repertoire of dispensable genes associated with two geographically distinct populations suggest that PAV is involved in local adaptation. Overall, the PAV phenomenon would provide a key selective advantage at the population level.
Affiliation(s)
- Amaro Saco
  - Institute of Marine Research, Spanish National Research Council, Vigo, Spain
- Magalí Rey-Campos
  - Institute of Marine Research, Spanish National Research Council, Vigo, Spain
- Marco Gerdol
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Beatriz Novoa
  - Institute of Marine Research, Spanish National Research Council, Vigo, Spain
- Antonio Figueras
  - Institute of Marine Research, Spanish National Research Council, Vigo, Spain