1
Bernot JP, Owen CL, Wolfe JM, Meland K, Olesen J, Crandall KA. Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling. Mol Biol Evol 2023; 40:msad175. [PMID: 37552897 PMCID: PMC10414812 DOI: 10.1093/molbev/msad175] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/30/2022] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023] Open
Abstract
The clade Pancrustacea, comprising crustaceans and hexapods, is the most diverse group of animals on earth, containing over 80% of animal species and half of animal biomass. It has been the subject of several recent phylogenomic analyses, yet relationships within Pancrustacea show a notable lack of stability. Here, the phylogeny is estimated with expanded taxon sampling, particularly of malacostracans. We show small changes in taxon sampling have large impacts on phylogenetic estimation. By analyzing identical orthologs between two slightly different taxon sets, we show that the differences in the resulting topologies are due primarily to the effects of taxon sampling on the phylogenetic reconstruction method. We compare trees resulting from our phylogenomic analyses with those from the literature to explore the large tree space of pancrustacean phylogenetic hypotheses and find that statistical topology tests reject the previously published trees in favor of the maximum likelihood trees produced here. Our results reject several clades including Caridoida, Eucarida, Multicrustacea, Vericrustacea, and Syncarida. Notably, we find Copepoda nested within Allotriocarida with high support and recover a novel relationship between decapods, euphausiids, and syncarids that we refer to as the Syneucarida. With denser taxon sampling, we find Stomatopoda sister to this latter clade, which we collectively name Stomatocarida, dividing Malacostraca into three clades: Leptostraca, Peracarida, and Stomatocarida. A new Bayesian divergence time estimation is conducted using 13 vetted fossils. We review our results in the context of other pancrustacean phylogenetic hypotheses and highlight 15 key taxa to sample in future studies.
Affiliation(s)
- James P Bernot
  - Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Christopher L Owen
  - Systematic Entomology Laboratory, USDA-ARS, c/o National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Joanna M Wolfe
  - Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kenneth Meland
  - Department of Biology, University of Bergen, Bergen, Norway
- Jørgen Olesen
  - Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Keith A Crandall
  - Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
2
Owen CL, Marshall DC, Wade EJ, Meister R, Goemans G, Kunte K, Moulds M, Hill K, Villet M, Pham TH, Kortyna M, Lemmon EM, Lemmon AR, Simon C. Detecting and removing sample contamination in phylogenomic data: an example and its implications for Cicadidae phylogeny (Insecta: Hemiptera). Syst Biol 2022; 71:1504-1523. [PMID: 35708660 DOI: 10.1093/sysbio/syac043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/08/2021] [Revised: 05/23/2022] [Accepted: 06/07/2022] [Indexed: 11/13/2022] Open
Abstract
Contamination of a genetic sample with DNA from one or more non-target species is a continuing concern of molecular phylogenetic studies, both Sanger sequencing studies and Next-Generation Sequencing (NGS) studies. We developed an automated pipeline for identifying and excluding likely cross-contaminated loci based on detection of bimodal distributions of patristic distances across gene trees. When the contamination occurs between samples within a dataset, comparisons between a contaminated sample and its contaminant taxon will yield bimodal distributions with one peak close to zero patristic distance. This new method does not rely on a priori knowledge of taxon relatedness nor does it determine the cause(s) of the contamination. Exclusion of putatively contaminated loci from a dataset generated for the insect family Cicadidae showed that these sequences were affecting some topological patterns and branch supports, although the effects were sometimes subtle, with some contamination-influenced relationships exhibiting strong bootstrap support. Long tip branches and outlier values for one anchored phylogenomic pipeline statistic (AvgNHomologs) were correlated with the presence of contamination. While the AHE markers used here, which target hemipteroid taxa, proved effective in resolving deep and shallow level Cicadidae relationships in aggregate, individual markers contained inadequate phylogenetic signal, in part probably due to short length. The cleaned dataset, consisting of 429 loci, from 90 genera representing 44 of 56 current Cicadidae tribes, supported three of the four sampled Cicadidae subfamilies in concatenated-matrix maximum likelihood (ML) and multispecies coalescent-based species tree analyses, with the fourth subfamily weakly supported in the ML trees. No well-supported patterns from previous family-level Sanger sequencing studies of Cicadidae phylogeny were contradicted.
One taxon (Aragualna plenalinea) did not fall with its current subfamily in the genetic tree, and this genus and its tribe Aragualnini is reclassified to Tibicininae following morphological re-examination. Only subtle differences were observed in trees after removal of loci for which divergent base frequencies were detected. Greater success may be achieved by increased taxon sampling and developing a probe set targeting a more recent common ancestor and longer loci. Searches for contamination are an essential step in phylogenomic analyses of all kinds and our pipeline is an effective solution.
Affiliation(s)
- Christopher L Owen
  - Systematic Entomology Laboratory, USDA-ARS, c/o National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- David C Marshall
  - Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Elizabeth J Wade
  - Dept. of Natural Science and Mathematics, Curry College, Milton, MA 02186, USA
- Russ Meister
  - Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Geert Goemans
  - Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Krushnamegh Kunte
  - National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore 560 065, India
- Max Moulds
  - Australian Museum Research Institute, 1 William Street, Sydney, NSW 2010, Australia
- Kathy Hill
  - Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- M Villet
  - Dept. of Biology, Rhodes University, Grahamstown 6140, South Africa
- Thai-Hong Pham
  - Mientrung Institute for Scientific Research, Vietnam Academy of Science and Technology, Hue, Vietnam
  - Vietnam National Museum of Nature and Graduate School of Science and Technology, Vietnam Academy of Science and Technology, Hanoi, Vietnam
- Michelle Kortyna
  - Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, USA
- Emily Moriarty Lemmon
  - Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL 32306, USA
- Alan R Lemmon
  - Department of Scientific Computing, Florida State University, 400 Dirac Science Library, Tallahassee, FL 32306, USA
- Chris Simon
  - Dept. of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
3
Hughes LC, Ortí G, Saad H, Li C, White WT, Baldwin CC, Crandall KA, Arcila D, Betancur-R R. Exon probe sets and bioinformatics pipelines for all levels of fish phylogenomics. Mol Ecol Resour 2020; 21:816-833. [PMID: 33084200 DOI: 10.1111/1755-0998.13287] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/20/2020] [Accepted: 10/09/2020] [Indexed: 11/28/2022]
Abstract
Exon markers have a long history of use in phylogenetics of ray-finned fishes, the most diverse clade of vertebrates with more than 35,000 species. As the number of published genomes increases, it has become easier to test exons and other genetic markers for signals of ancient duplication events and filter out paralogues that can mislead phylogenetic analysis. We present seven new probe sets for current target-capture phylogenomic protocols that capture 1,104 exons explicitly filtered for paralogues using gene trees. These seven probe sets span the diversity of teleost fishes, including four sets that target five hyperdiverse percomorph clades which together comprise ca. 17,000 species (Carangaria, Ovalentaria, Eupercaria, and Syngnatharia + Pelagiaria combined). We additionally included probes to capture legacy nuclear exons and mitochondrial markers that have been commonly used in fish phylogenetics (despite some exons being flagged for paralogues) to facilitate integration of old and new molecular phylogenetic matrices. We tested these probes experimentally for 56 fish species (eight species per probe set) and merged new exon-capture sequence data into an existing data matrix of 1,104 exons and 300 ray-finned fish species. We provide an optimized bioinformatics pipeline to assemble exon capture data from raw reads to alignments for downstream analysis. We show that legacy loci with known paralogues are at risk of assembling duplicated sequences with target-capture, but we also assembled many useful orthologous sequences that can be integrated with many PCR-generated matrices. These probe sets are a valuable resource for advancing fish phylogenomics because targeted exons can easily be extracted from increasingly available whole genome and transcriptome data sets, and also may be integrated with existing PCR-based exon and mitochondrial data.
Affiliation(s)
- Lily C Hughes
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
  - Computational Biology Institute, Milken Institute of Public Health, George Washington University, Washington, DC, USA
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Guillermo Ortí
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Hadeel Saad
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
- Chenhong Li
  - College of Fisheries and Life Sciences, Shanghai Ocean University, Shanghai, China
- William T White
  - CSIRO Australian National Fish Collection, National Research Collections of Australia, Hobart, TAS, Australia
- Carole C Baldwin
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Keith A Crandall
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
  - Computational Biology Institute, Milken Institute of Public Health, George Washington University, Washington, DC, USA
- Dahiana Arcila
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
  - Sam Noble Oklahoma Museum of Natural History, Norman, OK, USA
  - Department of Biology, University of Oklahoma, Norman, OK, USA