1
Carter JK, Kimball RT, Funk ER, Kane NC, Schield DR, Spellman GM, Safran RJ. Estimating phylogenies from genomes: A beginners review of commonly used genomic data in vertebrate phylogenomics. J Hered 2023; 114:1-13. [PMID: 36808491 DOI: 10.1093/jhered/esac061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 10/26/2022] [Indexed: 02/20/2023]
Abstract
Despite the increasing feasibility of sequencing whole genomes from diverse taxa, a persistent problem in phylogenomics is the selection of appropriate genetic markers or loci for a given taxonomic group or research question. In this review, we aim to streamline the decision-making process when selecting specific markers to use in phylogenomic studies by introducing commonly used types of genomic markers, their evolutionary characteristics, and their associated uses in phylogenomics. Specifically, we review the utilities of ultraconserved elements (including flanking regions), anchored hybrid enrichment loci, conserved nonexonic elements, untranslated regions, introns, exons, mitochondrial DNA, single nucleotide polymorphisms, and anonymous regions (nonspecific regions that are evenly or randomly distributed across the genome). These various genomic elements and regions differ in their substitution rates, likelihood of neutrality or of being strongly linked to loci under selection, and mode of inheritance, each of which is an important consideration in phylogenomic reconstruction. These features may give each type of marker important advantages and disadvantages depending on the biological question, number of taxa sampled, evolutionary timescale, cost effectiveness, and analytical methods used. We provide a concise outline as a resource to efficiently consider key aspects of each type of genetic marker. There are many factors to consider when designing phylogenomic studies, and this review may serve as a primer when weighing options between multiple potential phylogenomic markers.
Affiliation(s)
- Javan K Carter
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, United States
- Genomics, Bioinformatics, and Translational Research Center, Research Triangle Institute International, RTP, NC, United States
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, FL, United States
- Erik R Funk
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, United States
- Department of Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA, United States
- Nolan C Kane
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, United States
- Drew R Schield
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, United States
- Garth M Spellman
- Department of Zoology, Denver Museum of Nature and Science, Denver, CO, United States
- Rebecca J Safran
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, United States
2
Card DC, Jennings WB, Edwards SV. Genome Evolution and the Future of Phylogenomics of Non-Avian Reptiles. Animals (Basel) 2023; 13. [PMID: 36766360 DOI: 10.3390/ani13030471] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/16/2022] [Revised: 01/13/2023] [Accepted: 01/15/2023] [Indexed: 02/01/2023]
Abstract
Non-avian reptiles comprise a large proportion of amniote vertebrate diversity, with squamate reptiles (lizards and snakes) recently overtaking birds as the most species-rich tetrapod radiation. Despite displaying an extraordinary diversity of phenotypic and genomic traits, genomic resources in non-avian reptiles have accumulated more slowly than they have in mammals and birds, the remaining amniotes. Here we review the remarkable natural history of non-avian reptiles, with a focus on the physical traits, genomic characteristics, and sequence compositional patterns that comprise key axes of variation across amniotes. We argue that the high evolutionary diversity of non-avian reptiles can fuel a new generation of whole-genome phylogenomic analyses. A survey of phylogenetic investigations in non-avian reptiles shows that sequence capture-based approaches are the most commonly used, with studies of markers known as ultraconserved elements (UCEs) especially well represented. However, many other types of markers exist and are increasingly being mined from genome assemblies in silico, including some with greater information potential than UCEs for certain investigations. We discuss the importance of high-quality genomic resources and methods for bioinformatically extracting a range of marker sets from genome assemblies. Finally, we encourage herpetologists working in genomics, genetics, evolutionary biology, and other fields to work collectively towards building genomic resources for non-avian reptiles, especially squamates, that rival those already in place for mammals and birds. Overall, the development of this cross-amniote phylogenomic tree of life will help illuminate interesting dimensions of biodiversity across non-avian reptiles and broader amniotes.
3
Gawehns F, Postuma M, Van Antro M, Nunn A, Sepers B, Fatma S, van Gurp TP, Wagemaker NCAM, Mateman AC, Milanovic-Ivanovic S, Grosse I, van Oers K, Vergeer P, Verhoeven KJF. epiGBS2: Improvements and evaluation of highly multiplexed, epiGBS-based reduced representation bisulfite sequencing. Mol Ecol Resour 2022; 22:2087-2104. [PMID: 35178872 PMCID: PMC9311447 DOI: 10.1111/1755-0998.13597] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2021] [Revised: 02/04/2022] [Accepted: 02/09/2022] [Indexed: 11/28/2022]
Abstract
Several reduced‐representation bisulfite sequencing methods have been developed in recent years to determine cytosine methylation de novo in nonmodel species. Here, we present epiGBS2, a laboratory protocol based on epiGBS with a revised and user‐friendly bioinformatics pipeline for a wide range of species with or without a reference genome. epiGBS2 is cost‐ and time‐efficient and the computational workflow is designed in a user‐friendly and reproducible manner. The library protocol allows a flexible choice of restriction enzymes and a double digest. The bioinformatics pipeline was integrated into the snakemake workflow management system, which makes the pipeline easy to execute and modular, and parameter settings for important computational steps flexible. We implemented bismark for alignment and methylation analysis and we preprocessed alignment files by double masking to enable single nucleotide polymorphism calling with freebayes (epifreebayes). The performance of several critical steps in epiGBS2 was evaluated against baseline data sets from Arabidopsis thaliana and great tit (Parus major), which confirmed its overall good performance. We provide a detailed description of the laboratory protocol and an extensive manual of the bioinformatics pipeline, which is publicly accessible on github (https://github.com/nioo-knaw/epiGBS2) and zenodo (https://doi.org/10.5281/zenodo.4764652).
Affiliation(s)
- Fleur Gawehns
- Netherlands Institute of Ecology (NIOO-KNAW), Bioinformatics Unit, Wageningen, the Netherlands
- Maarten Postuma
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, the Netherlands
- Wageningen University & Research (WUR), Plant Ecology and Nature Conservation Group, Wageningen, the Netherlands
- Morgane Van Antro
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, the Netherlands
- Adam Nunn
- ecSeq Bioinformatics GmbH, Leipzig, Germany
- Universität Leipzig, Institut für Informatik, Leipzig, Germany
- Bernice Sepers
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Animal Ecology, Wageningen, the Netherlands
- Wageningen University & Research (WUR), Behavioural Ecology Group, Wageningen, the Netherlands
- Samar Fatma
- Martin Luther University Halle-Wittenberg, Institute of Computer Science, Halle, Germany
- Thomas P van Gurp
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, the Netherlands
- A Christa Mateman
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Animal Ecology, Wageningen, the Netherlands
- Slavica Milanovic-Ivanovic
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, the Netherlands
- Ivo Grosse
- Martin Luther University Halle-Wittenberg, Institute of Computer Science, Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Kees van Oers
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Animal Ecology, Wageningen, the Netherlands
- Wageningen University & Research (WUR), Behavioural Ecology Group, Wageningen, the Netherlands
- Philippine Vergeer
- Wageningen University & Research (WUR), Plant Ecology and Nature Conservation Group, Wageningen, the Netherlands
- Koen J F Verhoeven
- Netherlands Institute of Ecology (NIOO-KNAW), Department of Terrestrial Ecology, Wageningen, the Netherlands
4
Ahrens CW, Jordan R, Bragg J, Harrison PA, Hopley T, Bothwell H, Murray K, Steane DA, Whale JW, Byrne M, Andrew R, Rymer PD. Regarding the F-word: The effects of data filtering on inferred genotype-environment associations. Mol Ecol Resour 2021; 21:1460-1474. [PMID: 33565725 DOI: 10.1111/1755-0998.13351] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/16/2020] [Revised: 02/01/2021] [Accepted: 02/05/2021] [Indexed: 01/05/2023]
Abstract
Genotype-environment association (GEA) methods have become part of the standard landscape genomics toolkit, yet we know little about how best to filter genotyping-by-sequencing data to provide robust inferences for environmental adaptation. In many cases, default filtering thresholds for minor allele frequency and missing data are applied regardless of sample size, having unknown impacts on the results, negatively affecting management strategies. Here, we investigate the effects of filtering on GEA results and the potential implications for assessment of adaptation to environment. We use empirical and simulated data sets derived from two widespread tree species to assess the effects of filtering on GEA outputs. Critically, we find that the level of filtering of missing data and minor allele frequency affect the identification of true positives. Even slight adjustments to these thresholds can change the rate of true positive detection. Using conservative thresholds for missing data and minor allele frequency substantially reduces the size of the data set, lessening the power to detect adaptive variants (i.e., simulated true positives) with strong and weak strengths of selection. Regardless, strength of selection was a good predictor of GEA detection, but even some SNPs under strong selection went undetected. False positive rates varied depending on the species and GEA method, and filtering significantly impacted the predictions of adaptive capacity in downstream analyses. We make several recommendations regarding filtering for GEA methods. Ultimately, there is no filtering panacea, but some choices are better than others, depending on the study system, availability of genomic resources, and desired objectives.
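The abstract turns on two per-SNP filters, missing data and minor allele frequency (MAF). The Python sketch below shows how the two thresholds act on a small genotype table. The 0/1/2 genotype coding, the function names and the default thresholds are illustrative assumptions, not the authors' code or their recommended values.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of individuals with no genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """MAF computed from the called (non-missing) diploid genotypes only."""
    called = [c for c in calls if c is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    snps: Dict[str, List[Genotype]],
    max_missing: float = 0.2,  # placeholder threshold
    min_maf: float = 0.05,     # placeholder threshold
) -> List[str]:
    """Return the IDs of SNPs that pass both the missing-data and MAF filters."""
    return [
        snp_id
        for snp_id, calls in snps.items()
        if missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    ]


snps = {
    "snp1": [0, 1, 2, 0, None],     # 20% missing, MAF 0.375
    "snp2": [0, 0, 0, 0, 0],        # monomorphic, MAF 0
    "snp3": [None, None, 1, 1, 0],  # 40% missing
}
print(filter_snps(snps))                                # ['snp1']
print(filter_snps(snps, max_missing=0.5, min_maf=0.0))  # ['snp1', 'snp2', 'snp3']
```

Relaxing both thresholds in the second call retains all three loci. This is a toy version of the abstract's point that threshold choice changes the data set passed on to a GEA analysis.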
Affiliation(s)
- Collin W Ahrens
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, Australia
- Jason Bragg
- Research Centre for Ecosystem Resilience, Australian Institute of Botanical Science, The Royal Botanic Garden, Sydney, NSW, Australia
- Peter A Harrison
- School of Natural Sciences and Australian Research Council Training Centre for Forest Value, University of Tasmania, Hobart, Tas., Australia
- Tara Hopley
- Department of Biodiversity, Conservation and Attractions, Biodiversity and Conservation Science, Perth, WA, Australia
- Kevin Murray
- Australian National University, Acton, ACT, Australia
- Dorothy A Steane
- CSIRO Land & Water, Hobart, Tas., Australia
- School of Natural Sciences and Australian Research Council Training Centre for Forest Value, University of Tasmania, Hobart, Tas., Australia
- John W Whale
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, Australia
- Margaret Byrne
- Department of Biodiversity, Conservation and Attractions, Biodiversity and Conservation Science, Perth, WA, Australia
- Rose Andrew
- School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Paul D Rymer
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, Australia
5
Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC with comparable methods and discuss important aspects of each, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
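The three locus-selection criteria named in this abstract (single exons, a substitution rate faster than RAG1, and length above 1,500 bp, while avoiding paralogs) can be expressed as a simple filter. In the Python sketch below, the record type, its fields and all numeric values are hypothetical placeholders. Paralogy avoidance is reduced to a single-copy check, and how rate and copy number would actually be estimated is outside the sketch. This is not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateExon:
    """Hypothetical per-exon summary; the fields are illustrative only."""
    name: str
    length_bp: int
    rate: float  # relative substitution rate, in the same units as the RAG1 estimate
    copies: int  # hits per genome; 1 means no paralogs were detected


def select_long_fast_exons(
    exons: List[CandidateExon], rag1_rate: float, min_length_bp: int = 1500
) -> List[str]:
    """Keep single-copy exons longer than min_length_bp that evolve faster than RAG1."""
    return [
        e.name
        for e in exons
        if e.length_bp > min_length_bp and e.rate > rag1_rate and e.copies == 1
    ]


candidates = [
    CandidateExon("exonA", 2100, 0.12, 1),  # passes all three criteria
    CandidateExon("exonB", 900, 0.20, 1),   # too short
    CandidateExon("exonC", 3000, 0.05, 1),  # slower than RAG1
    CandidateExon("exonD", 1800, 0.15, 3),  # possible paralogs
]
print(select_long_fast_exons(candidates, rag1_rate=0.08))  # ['exonA']
```

Comparing against a single RAG1 rate keeps the rate criterion relative, as the abstract phrases it, so the filter does not depend on any absolute rate units.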
Affiliation(s)
- Benjamin R Karin
- Department of Biology, Villanova University, Villanova, PA
- Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA
- Tony Gamble
- Department of Biological Sciences, Marquette University, Milwaukee, WI
- Milwaukee Public Museum, Milwaukee, WI
- Bell Museum of Natural History, University of Minnesota, St. Paul, MN
- Todd R Jackman
- Department of Biology, Villanova University, Villanova, PA
6
Varinli H, Statham AL, Clark SJ, Molloy PL, Ross JP. COBRA-Seq: Sensitive and Quantitative Methylome Profiling. Genes (Basel) 2015; 6:1140-63. [PMID: 26512698 PMCID: PMC4690032 DOI: 10.3390/genes6041140] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 08/10/2015] [Revised: 09/22/2015] [Accepted: 09/24/2015] [Indexed: 12/15/2022]
Abstract
Combined Bisulfite Restriction Analysis (COBRA) quantifies DNA methylation at a specific locus. It does so via digestion of PCR amplicons produced from bisulfite-treated DNA, using a restriction enzyme that contains a cytosine within its recognition sequence, such as TaqI. Here, we introduce COBRA-seq, a genome-wide reduced methylome method that requires minimal DNA input (0.1-1.0 µg) and can either use PCR or linear amplification to amplify the sequencing library. Variants of COBRA-seq can be used to explore CpG-depleted as well as CpG-rich regions in vertebrate DNA. The choice of enzyme influences enrichment for specific genomic features, such as CpG-rich promoters and CpG islands, or enrichment for less CpG dense regions such as enhancers. COBRA-seq coupled with linear amplification has the additional advantage of reduced PCR bias by producing full length fragments at high abundance. Unlike other reduced representation methylome methods, COBRA-seq has great flexibility in the choice of enzyme and can be multiplexed and tuned, to reduce sequencing costs and to interrogate different numbers of sites. Moreover, COBRA-seq is applicable to non-model organisms without a reference genome and compatible with the investigation of non-CpG methylation by using restriction enzymes containing CpA, CpT, and CpC in their recognition site.
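The COBRA principle in the first two sentences of this abstract can be illustrated in silico. After bisulfite conversion, a TaqI site (TCGA) survives only where its CpG cytosine was methylated, so the fraction of amplicons that TaqI can cut tracks methylation at that site. The Python sketch below models a single strand only, assumes complete conversion of every unmethylated cytosine, and uses hypothetical function names and a toy sequence.

```python
import re
from typing import List, Set

TAQI_SITE = "TCGA"  # TaqI recognition sequence, which contains one CpG


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simplified single-strand model of bisulfite treatment plus PCR:
    every C reads as T unless its 0-based position is in `methylated`."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def taqi_sites(seq: str) -> List[int]:
    """0-based start positions of TaqI recognition sites in `seq`."""
    return [m.start() for m in re.finditer(TAQI_SITE, seq)]


reference = "AATCGAGGTCGACC"  # toy sequence with TaqI sites at positions 2 and 8
print(taqi_sites(reference))                            # [2, 8]
print(taqi_sites(bisulfite_convert(reference, {3})))    # [2]: only the methylated CpG keeps its site
print(taqi_sites(bisulfite_convert(reference, set())))  # []: unmethylated sites are lost
```

The same logic extends to any enzyme whose recognition sequence contains a cytosine, which is why, per the abstract, the choice of enzyme determines which sites are interrogated.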
Affiliation(s)
- Hilal Varinli
- CSIRO Food and Nutrition Flagship, North Ryde, New South Wales 1670, Australia.
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, New South Wales 2010, Australia.
- Department of Biological Sciences, Macquarie University, North Ryde, New South Wales 2109, Australia.
- Aaron L Statham
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, New South Wales 2010, Australia.
- Susan J Clark
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Darlinghurst, New South Wales 2010, Australia.
- Vincent's Clinical School, Faculty of Medicine, UNSW, New South Wales 2010, Australia.
- Peter L Molloy
- CSIRO Food and Nutrition Flagship, North Ryde, New South Wales 1670, Australia.
- Jason P Ross
- CSIRO Food and Nutrition Flagship, North Ryde, New South Wales 1670, Australia.