1
Sweet AD, Doña J, Johnson KP. Biogeographic History of Pigeons and Doves Drives the Origin and Diversification of Their Parasitic Body Lice. Syst Biol 2025; 74:198-214. [PMID: 39037176 DOI: 10.1093/sysbio/syae038] [Received: 01/12/2024] [Revised: 07/03/2024] [Accepted: 07/20/2024] [Indexed: 07/23/2024]
Abstract
Despite their extensive diversity and ecological importance, the history of diversification for most groups of parasitic organisms remains relatively understudied. Elucidating broad macroevolutionary patterns of parasites is challenging, often limited by the availability of samples, genetic resources, and knowledge about ecological relationships with their hosts. In this study, we explore the macroevolutionary history of parasites by focusing on parasitic body lice from doves. Building on extensive knowledge of ecological relationships and previous phylogenomic studies of their avian hosts, we tested specific questions about the evolutionary origins of the body lice of doves, leveraging whole genome data sets for phylogenomics. Specifically, we sequenced whole genomes from 68 samples of dove body lice, including representatives of all body louse genera from 51 host taxa. From these data, we assembled > 2300 nuclear genes to estimate dated phylogenetic relationships among body lice and several outgroup taxa. The resulting phylogeny of body lice was well supported, although some branches had conflicting signals across the genome. We then reconstructed ancestral biogeographic ranges of body lice and compared the body louse phylogeny to the phylogeny of doves, and also to a previously published phylogeny of the wing lice of doves. Divergence estimates placed the origin of body lice in the late Oligocene. Body lice likely originated in Australasia and dispersed with their hosts during the early Miocene, with subsequent codivergence and host switching throughout the world. Notably, this evolutionary history is very similar to that of dove wing lice, despite the stronger dispersal capabilities of wing lice compared to body lice. Our results highlight the central role of the biogeographic history of host organisms in driving the evolutionary history of their parasites across time and geographic space.
Affiliation(s)
- Andrew D Sweet
  - Department of Biological Sciences, Arkansas State University, PO Box 599, State University, AR 72467, USA
- Jorge Doña
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, 1816 South Oak St., Champaign, IL 61820, USA
  - Departamento de Zoología, Universidad de Granada, Avenida de la Fuente Nueva S/N, Granada 18071, Spain
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, 1816 South Oak St., Champaign, IL 61820, USA
2
He R, Wang S, Li Q, Wang Z, Mei Y, Li F. Phylogenomic analysis and molecular identification of true fruit flies. Front Genet 2024; 15:1414074. [PMID: 38974385 PMCID: PMC11224437 DOI: 10.3389/fgene.2024.1414074] [Received: 04/08/2024] [Accepted: 05/30/2024] [Indexed: 07/09/2024]
Abstract
The family Tephritidae in the order Diptera, known as true fruit flies, includes agriculturally important insect pests. However, the phylogenetic relationships of true fruit flies remain controversial. Moreover, rapid identification of important invasive true fruit flies is essential for plant quarantine but is still challenging. To this end, we sequenced the genomes of 16 true fruit fly species at coverage of 47-228×. Together with the previously reported genomes of nine species, we reconstructed phylogenetic trees of the Tephritidae using benchmarking universal single-copy ortholog (BUSCO), ultraconserved element (UCE) and anchored hybrid enrichment (AHE) gene sets, respectively. The resulting trees of the 50% taxon-occupancy dataset for each marker type were generally congruent at 88% of nodes for both concatenation and coalescent analyses. At the subfamily level, both Dacinae and Trypetinae are monophyletic. At the species level, Bactrocera dorsalis is more closely related to Bactrocera latifrons than to Bactrocera tryoni. This is inconsistent with previous conclusions based on mitochondrial genes but consistent with recent studies based on nuclear data. By analyzing these genome data, we screened ten pairs of species-specific primers for molecular identification of ten invasive fruit flies, which were validated by PCR. In summary, our work provides draft genome data of 16 true fruit fly species, addressing the long-standing taxonomic controversies and providing species-specific primers for molecular identification of invasive fruit flies.
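The "50% taxon-occupancy dataset" mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that occupancy means the fraction of sampled taxa in which a locus was recovered and that loci at or above the threshold are kept; the function name `filter_by_occupancy`, the locus names, and the toy data are hypothetical and are not taken from the paper.

```python
def filter_by_occupancy(loci: dict[str, set[str]], n_taxa: int,
                        min_occupancy: float = 0.5) -> dict[str, set[str]]:
    """Keep loci recovered in at least `min_occupancy` of the sampled taxa."""
    return {name: taxa for name, taxa in loci.items()
            if len(taxa) / n_taxa >= min_occupancy}


# Hypothetical toy data: locus name -> taxa in which the locus was recovered.
loci = {
    "locus_a": {"sp1", "sp2", "sp3", "sp4"},  # occupancy 4/4
    "locus_b": {"sp1", "sp2"},                # occupancy 2/4, kept at the 50% threshold
    "locus_c": {"sp3"},                       # occupancy 1/4, dropped
}
kept = filter_by_occupancy(loci, n_taxa=4)
print(sorted(kept))  # ['locus_a', 'locus_b']
```

Raising `min_occupancy` trades the number of retained loci for less missing data per locus.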
Affiliation(s)
- Rong He
  - State Key Laboratory of Rice Biology and Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Shuping Wang
  - Technical Centre for Animal, Plant and Food Inspection and Quarantine, Shanghai Customs, Shanghai, China
- Qiang Li
  - State Key Laboratory of Rice Biology and Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Zuoqi Wang
  - State Key Laboratory of Rice Biology and Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Yang Mei
  - State Key Laboratory of Rice Biology and Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
- Fei Li
  - State Key Laboratory of Rice Biology and Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, China
3
Boyd BM, James I, Johnson KP, Weiss RB, Bush SE, Clayton DH, Dale C. Stochasticity, determinism, and contingency shape genome evolution of endosymbiotic bacteria. Nat Commun 2024; 15:4571. [PMID: 38811551 PMCID: PMC11137140 DOI: 10.1038/s41467-024-48784-2] [Received: 07/06/2023] [Accepted: 05/10/2024] [Indexed: 05/31/2024]
Abstract
Evolution results from the interaction of stochastic and deterministic processes that create a web of historical contingency, shaping gene content and organismal function. To understand the scope of this interaction, we examine the relative contributions of stochasticity, determinism, and contingency in shaping gene inactivation in 34 lineages of endosymbiotic bacteria, Sodalis, found in parasitic lice, Columbicola, that are independently undergoing genome degeneration. Here we show that the process of genome degeneration in this system is largely deterministic: genes involved in amino acid biosynthesis are lost while those involved in providing B-vitamins to the host are retained. In contrast, many genes encoding redundant functions, including components of the respiratory chain and DNA repair pathways, are subject to stochastic loss, yielding historical contingencies that constrain subsequent losses. Thus, while selection results in functional convergence between symbiont lineages, stochastic mutations initiate distinct evolutionary trajectories, generating diverse gene inventories that lack the functional redundancy typically found in free-living relatives.
Affiliation(s)
- Bret M Boyd
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, US
- Ian James
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, US
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL, US
- Robert B Weiss
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, US
- Sarah E Bush
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, US
- Dale H Clayton
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, US
- Colin Dale
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, US
4
Rachtman E, Sarmashghi S, Bafna V, Mirarab S. Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling. Cell Syst 2022; 13:817-829.e3. [PMID: 36265468 PMCID: PMC9589918 DOI: 10.1016/j.cels.2022.06.007] [Received: 11/24/2021] [Revised: 03/14/2022] [Accepted: 06/28/2022] [Indexed: 01/26/2023]
Abstract
Computing the distance between two genomes without alignments or even access to assemblies enables many downstream analyses. However, alignment-free methods, including in the fast-growing field of genome skimming, are hampered by a significant methodological gap. While accurate methods (many k-mer-based) for assembly-free distance calculation exist, measuring the uncertainty of estimated distances has not been sufficiently studied. In this paper, we show that bootstrapping, the standard non-parametric method of measuring estimator uncertainty, is not accurate for k-mer-based methods that rely on k-mer frequency profiles. Instead, we propose using subsampling (with no replacement) in combination with a correction step to reduce the variance of the inferred distribution. We show that the distribution of distances using our procedure matches the true uncertainty of the estimator. The resulting phylogenetic support values effectively differentiate between correct and incorrect branches and identify controversial branches that change across alignment-free and alignment-based phylogenies reported in the literature.
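The idea of subsampling without replacement can be sketched in Python as below. This is only an illustration under stated assumptions: a plain Jaccard distance on k-mer sets stands in for the k-mer-based estimators the abstract refers to, and the correction step is omitted because the abstract does not describe it. The function names, the toy reads, and the parameters `fraction` and `replicates` are hypothetical.

```python
import random


def kmers(reads: list[str], k: int) -> set[str]:
    """Set of k-mers observed in a collection of reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def subsample_distances(reads_x: list[str], reads_y: list[str], k: int,
                        fraction: float, replicates: int, seed: int = 0) -> list[float]:
    """Distance estimates from repeated subsamples drawn WITHOUT replacement."""
    rng = random.Random(seed)
    n_x = max(1, int(len(reads_x) * fraction))
    n_y = max(1, int(len(reads_y) * fraction))
    distances = []
    for _ in range(replicates):
        sub_x = rng.sample(reads_x, n_x)  # random.sample never repeats an item
        sub_y = rng.sample(reads_y, n_y)
        distances.append(jaccard_distance(kmers(sub_x, k), kmers(sub_y, k)))
    return distances


# Hypothetical toy reads for two samples.
reads_x = ["ACGTAC", "GTACGT", "TTGACA", "CAGTCA"]
reads_y = ["ACGTAC", "GTACGA", "TTGACA", "CAGTTA"]
dists = subsample_distances(reads_x, reads_y, k=3, fraction=0.5, replicates=20)
```

A bootstrap would instead resample with replacement (for example with `random.choices`), so some reads would be duplicated and others left out of each replicate.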
Affiliation(s)
- Eleonora Rachtman
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, San Diego, CA 92093, USA
- Shahab Sarmashghi
  - Department of Electrical and Computer Engineering, UC San Diego, San Diego, CA 92093, USA
- Vineet Bafna
  - Department of Computer Science and Engineering, UC San Diego, San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, San Diego, CA 92093, USA
5
Boyd BM, Nguyen NP, Allen JM, Waterhouse RM, Vo KB, Sweet AD, Clayton DH, Bush SE, Shapiro MD, Johnson KP. Long-distance dispersal of pigeons and doves generated new ecological opportunities for host-switching and adaptive radiation by their parasites. Proc Biol Sci 2022; 289:20220042. [PMID: 35259992 PMCID: PMC8905168 DOI: 10.1098/rspb.2022.0042] [Indexed: 01/03/2023]
Abstract
Adaptive radiation is an important mechanism of organismal diversification and can be triggered by new ecological opportunities. Although poorly studied in this regard, parasites are an ideal group in which to study adaptive radiations because of their close associations with host species. Both experimental and comparative studies suggest that the ectoparasitic wing lice of pigeons and doves have adaptively radiated, leading to differences in body size and overall coloration. Here, we show that long-distance dispersal by dove hosts was central to parasite diversification because it provided new ecological opportunities for parasites to speciate after host-switching. We further show that among extant parasite lineages host-switching decreased over time, with cospeciation becoming the more dominant mode of parasite speciation. Taken together, our results suggest that host dispersal, followed by host-switching, provided novel ecological opportunities that facilitated adaptive radiation by parasites.
Affiliation(s)
- Bret M Boyd
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- Nam-Phuong Nguyen
  - Department of Computer Science, University of Illinois, Champaign, IL, USA
- Julie M Allen
  - Department of Biology, University of Nevada Reno, Reno, NV, USA
- Robert M Waterhouse
  - Department of Ecology and Evolution, University of Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Kyle B Vo
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- Andrew D Sweet
  - Department of Biological Sciences, Arkansas State University, Jonesboro, AR, USA
- Dale H Clayton
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Sarah E Bush
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Michael D Shapiro
  - School of Biological Sciences, University of Utah, Salt Lake City, UT, USA
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL, USA
6
Jacob Machado D, Portella de Luna Marques F, Jiménez-Ferbans L, Grant T. An empirical test of the relationship between the bootstrap and likelihood ratio support in maximum likelihood phylogenetic analysis. Cladistics 2021; 38:392-401. [PMID: 34932221 DOI: 10.1111/cla.12496] [Accepted: 11/15/2021] [Indexed: 11/27/2022]
Abstract
In maximum likelihood (ML), the support for a clade can be calculated directly as the likelihood ratio (LR) or log-likelihood difference (S, LLD) of the best trees with and without the clade of interest. However, bootstrap (BS) clade frequencies are more pervasive in ML phylogenetics and are almost universally interpreted as measuring support. In addition to theoretical arguments against that interpretation, BS has several undesirable attributes for a support measure. For example, it does not vary in proportion to optimality or identify clades that are rejected by the evidence, and it can be overestimated due to missing data. Nevertheless, if BS is a reliable predictor of S, then it might be an efficient indirect method of measuring support, an attractive possibility given the speed of many BS implementations. To assess the relationship between S and BS, we analyzed 106 empirical datasets retrieved from TreeBASE. Also, to evaluate the degree to which S and BS are affected by the number of replicates during suboptimal tree searches for S and pseudoreplicates during BS estimation, we randomly selected 5 of the 106 datasets and analyzed them using variable numbers of replicates and pseudoreplicates, respectively. The correlation between S and BS was extremely weak in the datasets we analyzed. Increasing the number of replicates during tree search decreased the estimated values of S for most clades, but the magnitude of change was small. In contrast, although increasing pseudoreplicates affected BS values for only approximately 40% of clades, values both increased and decreased, and they did so at much greater magnitudes. Increasing replicates/pseudoreplicates affected the rank order of clades in each tree for both S and BS. Our findings show decisively that BS is not an efficient indirect method of measuring support and suggest that even quite superficial searches to calculate S provide better estimates of support.
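The support measure described in the first sentence of the abstract can be written out as follows. The notation is an assumption introduced here for illustration, not the paper's: \(\hat{T}_{c}\) is the best tree found that contains clade \(c\), and \(\hat{T}_{\neg c}\) is the best tree found that lacks it.

```latex
S = \ln L(\hat{T}_{c}) - \ln L(\hat{T}_{\neg c}),
\qquad
\mathrm{LR} = \frac{L(\hat{T}_{c})}{L(\hat{T}_{\neg c})} = e^{S}
```

As a purely hypothetical worked example, if \(\ln L(\hat{T}_{c}) = -12345.6\) and \(\ln L(\hat{T}_{\neg c}) = -12350.1\), then \(S = -12345.6 - (-12350.1) = 4.5\). Because \(\hat{T}_{\neg c}\) comes from a heuristic search, finding a better tree without the clade can only raise \(\ln L(\hat{T}_{\neg c})\) and so lower S, which is consistent with the abstract's observation that more search replicates decreased S for most clades.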
Affiliation(s)
- Denis Jacob Machado
  - Programa Inter-unidades de Pós-graduação em Bioinformática, Universidade de São Paulo, Rua do Matão 1010 São Paulo, SP 05508-090, Brazil
  - Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, 9331 Robert D. Snyder Rd, Charlotte, NC 28223, USA
- Fernando Portella de Luna Marques
  - Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, Tv. 14, 101 - Butantã, São Paulo, SP, 05508-090, Brazil
- Larry Jiménez-Ferbans
  - Facultad de Ciencias Básicas, Universidad del Magdalena, Carrera 32 No 22-08, Santa Marta D.T.C.H., Magdalena 470004, Colombia
- Taran Grant
  - Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, Tv. 14, 101 - Butantã, São Paulo, SP, 05508-090, Brazil
7
Bush SE, Gustafsson DR, Tkach VV, Clayton DH. A MISIDENTIFICATION CRISIS PLAGUES SPECIMEN-BASED RESEARCH: A CASE FOR GUIDELINES WITH A RECENT EXAMPLE (ALI ET AL., 2020). J Parasitol 2021; 107:262-266. [PMID: 33780971 DOI: 10.1645/21-4] [Indexed: 12/15/2022]
Abstract
A recent paper in this journal concerning parasites of rock pigeons (Columba livia) published by Ali and colleagues exemplifies a growing trend of misidentified parasites in the literature, despite increased online resources that should help facilitate accurate identification. In the Ali et al. paper, a pigeon louse in the genus Columbicola (Phthiraptera: Ischnocera) is misidentified as Menopon gallinae, which is a parasite of chickens (Gallus gallus) and their relatives; moreover, this louse is from an entirely different suborder of lice (Phthiraptera: Amblycera). Another louse is misidentified as Goniodes dissimilis, another parasite of chickens and junglefowl. In addition, photographs of cestodes from pigeons in the same paper are not sufficient to confirm identification. Misidentifications are fueled, in part, by increasing pressure to publish coupled with a decrease in taxonomic expertise. We consider the downstream consequences of misidentification and suggest guidelines for authors, reviewers, and editors that could help to improve the reliability of specimen-based research.
Affiliation(s)
- Sarah E Bush
  - School of Biological Sciences, University of Utah, 257 S. 1400 E., Salt Lake City, Utah 84112
- Daniel R Gustafsson
  - Institute of Zoology, Guangdong Academy of Sciences, Guangzhou 510260, Guangdong, China
- Vasyl V Tkach
  - Department of Biology, University of North Dakota, Grand Forks, North Dakota 58202
- Dale H Clayton
  - School of Biological Sciences, University of Utah, 257 S. 1400 E., Salt Lake City, Utah 84112
8
Alickovic L, Johnson KP, Boyd BM. The reduced genome of a heritable symbiont from an ectoparasitic feather feeding louse. BMC Ecol Evol 2021; 21:108. [PMID: 34078265 PMCID: PMC8173840 DOI: 10.1186/s12862-021-01840-7] [Received: 04/20/2021] [Accepted: 05/23/2021] [Indexed: 11/10/2022]
Abstract
Background: Feather feeding lice are abundant and diverse ectoparasites that complete their entire life cycle on an avian host. The principal or sole source of nutrition for these lice is feathers. Feathers appear to lack four amino acids that the lice would require to complete development and reproduce. Several insect groups have acquired heritable and intracellular bacteria that can synthesize metabolites absent in an insect’s diet, allowing insects to feed exclusively on nutrient-poor resources. Multiple species of feather feeding lice have been shown to harbor heritable and intracellular bacteria. We expected that these bacteria augment the louse’s diet with amino acids and facilitated the evolution of these diverse and specialized parasites. Heritable symbionts of insects often have small genomes that contain a minimal set of genes needed to maintain essential cell functions and synthesize metabolites absent in the host insect’s diet. Therefore, we expected the genome of a bacterial endosymbiont in feather lice would be small, but encode pathways for biosynthesis of amino acids.
Results: We sequenced the genome of a bacterial symbiont from a feather feeding louse (Columbicola wolffhuegeli) that parasitizes the Pied Imperial Pigeon (Ducula bicolor) and used its genome to predict metabolism of amino acids based on the presence or absence of genes. We found that this bacterial symbiont has a small genome, similar to the genomes of heritable symbionts described in other insect groups. However, we failed to identify many of the genes that we expected would support metabolism of amino acids in the symbiont genome. We also evaluated other gene pathways and features of the highly reduced genome of this symbiotic bacterium.
Conclusions: Based on the data collected in this study, it does not appear that this bacterial symbiont can synthesize amino acids needed to complement the diet of a feather feeding louse. Our results raise additional questions about the biology of feather chewing lice and the roles of symbiotic bacteria in evolution of diverse avian parasites.
Affiliation(s)
- Leila Alickovic
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, 1000 W. Cary St., Suite 111, Richmond, VA, 23284-2030, USA
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL, USA
- Bret M Boyd
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, 1000 W. Cary St., Suite 111, Richmond, VA, 23284-2030, USA
9
Doña J, Sweet AD, Johnson KP. Comparing rates of introgression in parasitic feather lice with differing dispersal capabilities. Commun Biol 2020; 3:610. [PMID: 33097824 PMCID: PMC7584577 DOI: 10.1038/s42003-020-01345-x] [Received: 12/09/2019] [Accepted: 09/30/2020] [Indexed: 12/14/2022]
Abstract
Organisms vary in their dispersal abilities, and these differences can have important biological consequences, such as impacting the likelihood of hybridization events. However, there is still much to learn about the factors influencing hybridization, and specifically how dispersal ability affects the opportunities for hybridization. Here, using the ecological replicate system of dove wing and body lice (Insecta: Phthiraptera), we show that species with higher dispersal abilities exhibited increased genomic signatures of introgression. Specifically, we found a higher proportion of introgressed genomic reads and more reticulated phylogenetic networks in wing lice, the louse group with higher dispersal abilities. Our results are consistent with the hypothesis that differences in dispersal ability might drive the extent of introgression through hybridization. Jorge Doña, Andrew Sweet and Kevin Johnson find that dove lice species with higher dispersal abilities have stronger genomic signatures of introgression. By using sequence data from multiple species of both wing and body lice from the same species of hosts, the authors are able to control for nearly all factors besides dispersal ability, demonstrating the power of this study system.
Affiliation(s)
- Jorge Doña
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, 1816 S. Oak St., Champaign, IL, 61820, USA
  - Departamento de Biología Animal, Universidad de Granada, 18001, Granada, Spain
- Andrew D Sweet
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, 1816 S. Oak St., Champaign, IL, 61820, USA
  - Department of Entomology, Purdue University, 901 W. State St., West Lafayette, IN, 47907, USA
- Kevin P Johnson
  - Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, 1816 S. Oak St., Champaign, IL, 61820, USA
10
Bohmann K, Mirarab S, Bafna V, Gilbert MTP. Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification. Mol Ecol 2020; 29:2521-2534. [PMID: 32542933 PMCID: PMC7496323 DOI: 10.1111/mec.15507] [Received: 02/26/2020] [Revised: 06/03/2020] [Accepted: 06/05/2020] [Indexed: 02/06/2023]
Abstract
Genetic tools are increasingly used to identify and discriminate between species. One key transition in this process was the recognition of the potential of the ca 658 bp fragment of the organelle cytochrome c oxidase I (COI) as a barcode region, which revolutionized animal bioidentification and led, among other things, to the instigation of the Barcode of Life Database (BOLD), which currently contains barcodes from >7.9 million specimens. Following this discovery, suggestions for other organellar regions and markers, and the primers with which to amplify them, have been continuously proposed. Most recently, the field has taken the leap from PCR-based generation of DNA references into shotgun sequencing-based "genome skimming" alternatives, with the ultimate goal of assembling organellar reference genomes. Unfortunately, in genome skimming approaches, much of the nuclear genome (as much as 99% of the sequence data) is discarded, which is not only wasteful, but can also limit the power of discrimination at, or below, the species level. Here, we advocate that the full shotgun sequence data can be used to assign an identity (that we term for convenience its "DNA-mark") for both voucher and query samples, without requiring any computationally intensive pretreatment (e.g. assembly) of reads. We argue that if reference databases are populated with such "DNA-marks," it will enable future DNA-based taxonomic identification to complement, or even replace, PCR of barcodes with genome skimming, and we discuss how such methodology ultimately could enable identification to population, or even individual, level.
Affiliation(s)
- Kristine Bohmann
  - Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego, CA, USA
- Vineet Bafna
  - Department of Computer Science and Engineering, University of California, San Diego, CA, USA
- M. Thomas P. Gilbert
  - Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
  - Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
  - NTNU University Museum, Trondheim, Norway
11
Abstract
Motivation: Consider a simple computational problem. The inputs are (i) the set of mixed reads generated from a sample that combines two organisms and (ii) separate sets of reads for several reference genomes of known origins. The goal is to find the two organisms that constitute the mixed sample. When constituents are absent from the reference set, we seek to phylogenetically position them with respect to the underlying tree of the reference species. This simple yet fundamental problem (which we call phylogenetic double-placement) has enjoyed surprisingly little attention in the literature. As genome skimming (low-pass sequencing of genomes at low coverage, precluding assembly) becomes more prevalent, this problem finds wide-ranging applications in areas as varied as biodiversity research, food production and provenance, and evolutionary reconstruction.
Results: We introduce a model that relates distances between a mixed sample and reference species to the distances between constituents and reference species. Our model is based on Jaccard indices computed between each sample represented as k-mer sets. The model, built on several assumptions and approximations, allows us to formalize the phylogenetic double-placement problem as a non-convex optimization problem that decomposes mixture distances and performs phylogenetic placement simultaneously. Using a variety of techniques, we are able to solve this optimization problem numerically. We test the resulting method, called MIxed Sample Analysis tool (MISA), on a varied set of simulated and biological datasets. Despite all the assumptions used, the method performs remarkably well in practice.
Availability and implementation: The software and data are available at https://github.com/balabanmetin/misa and https://github.com/balabanmetin/misa-data.
Supplementary information: Supplementary data are available at Bioinformatics online.
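The Jaccard index on k-mer sets that the abstract builds on can be illustrated with a short Python sketch. It is not the MISA model or its optimization; it only shows the basic quantity. Treating a mixed sample as the union of its constituents' k-mer sets is a simplifying assumption made here, and the sequences and function names are hypothetical.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (no reverse-complement handling)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy sequences standing in for two organisms.
organism_a = kmer_set("ACGTACGTGA", 4)
organism_b = kmer_set("ACGTACGTCA", 4)
print(jaccard(organism_a, organism_b))  # 0.5

# Assumption: a mixed sample contains every k-mer found in either constituent.
mixed = organism_a | organism_b
print(jaccard(mixed, organism_a))  # 0.75
```

In the toy data each organism has six distinct 4-mers, four of which are shared, so the union has eight k-mers and the mixed sample shares six of its eight k-mers with either constituent.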
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Department, University of California San Diego, San Diego, CA 92093, USA
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California San Diego, San Diego, CA 92093, USA
12
Balaban M, Sarmashghi S, Mirarab S. APPLES: Scalable Distance-Based Phylogenetic Placement with or without Alignments. Syst Biol 2020; 69:566-578. [PMID: 31545363 PMCID: PMC7164367 DOI: 10.1093/sysbio/syz063] [Received: 02/26/2019] [Revised: 09/05/2019] [Accepted: 09/10/2019] [Indexed: 11/14/2022]
Abstract
Placing a new species on an existing phylogeny has increasing relevance to several applications. Placement can be used to update phylogenies in a scalable fashion and can help identify unknown query samples using (meta-)barcoding, skimming, or metagenomic data. Maximum likelihood (ML) methods of phylogenetic placement exist, but these methods are not scalable to reference trees with many thousands of leaves, limiting their ability to enjoy benefits of dense taxon sampling in modern reference libraries. They also rely on assembled sequences for the reference set and aligned sequences for the query. Thus, ML methods cannot analyze data sets where the reference consists of unassembled reads, a scenario relevant to emerging applications of genome skimming for sample identification. We introduce APPLES, a distance-based method for phylogenetic placement. Compared to ML, APPLES is an order of magnitude faster and more memory efficient, and unlike ML, it is able to place on large backbone trees (tested for up to 200,000 leaves). We show that using dense references improves accuracy substantially so that APPLES on dense trees is more accurate than ML on sparser trees, where it can run. Finally, APPLES can accurately identify samples without assembled reference or aligned queries using k-mer-based distances, a scenario that ML cannot handle. APPLES is available publicly at github.com/balabanmetin/apples.
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, CA 92093, USA
- Shahab Sarmashghi
  - Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
13
Sweet AD, Johnson KP, Cameron SL. Mitochondrial genomes of Columbicola feather lice are highly fragmented, indicating repeated evolution of minicircle-type genomes in parasitic lice. PeerJ 2020; 8:e8759. [PMID: 32231878 PMCID: PMC7098387 DOI: 10.7717/peerj.8759] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/13/2019] [Accepted: 02/16/2020] [Indexed: 01/21/2023]
Abstract
Most animals have a conserved mitochondrial genome structure composed of a single chromosome. However, some organisms have their mitochondrial genes separated on several smaller circular or linear chromosomes. Highly fragmented circular chromosomes (“minicircles”) are especially prevalent in parasitic lice (Insecta: Phthiraptera), with 16 species known to have between nine and 20 mitochondrial minicircles per genome. All of these species belong to the same clade (mammalian lice), suggesting a single origin of drastic fragmentation. Nevertheless, other work indicates a lesser degree of fragmentation (2–3 chromosomes/genome) is present in some avian feather lice (Ischnocera: Philopteridae). In this study, we tested for minicircles in four species of the feather louse genus Columbicola (Philopteridae). Using whole genome shotgun sequence data, we applied three different bioinformatic approaches for assembling the Columbicola mitochondrial genome. We further confirmed these approaches by assembling the mitochondrial genome of Pediculus humanus from shotgun sequencing reads, a species known to have minicircles. Columbicola spp. genomes are highly fragmented into 15–17 minicircles between ∼1,100 and ∼3,100 bp in length, with 1–4 genes per minicircle. Subsequent annotation of the minicircles indicated that tRNA arrangements of minicircles varied substantially between species. These mitochondrial minicircles for species of Columbicola represent the first feather lice (Philopteridae) for which minicircles have been found in a full mitochondrial genome assembly. Combined with recent phylogenetic studies of parasitic lice, our results provide strong evidence that highly fragmented mitochondrial genomes, which are otherwise rare across the Tree of Life, evolved multiple times within parasitic lice.
Affiliation(s)
- Andrew D Sweet: Department of Entomology, Purdue University, West Lafayette, IN, United States of America
- Kevin P Johnson: Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL, United States of America
- Stephen L Cameron: Department of Entomology, Purdue University, West Lafayette, IN, United States of America
14
Light JE, Harper SE, Johnson KP, Demastes JW, Spradling TA. Development and Characterization of 12 Novel Polymorphic Microsatellite Loci for the Mammal Chewing Louse Geomydoecus aurei (Insecta: Phthiraptera) and a Comparison of Next-Generation Sequencing Approaches for Use in Parasitology. J Parasitol 2017; 104:89-95. [PMID: 28985160 DOI: 10.1645/17-130] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Next-generation sequencing methodologies open the door for evolutionary studies of wildlife parasites. We used 2 next-generation sequencing approaches to discover microsatellite loci in the pocket gopher chewing louse Geomydoecus aurei for use in population genetic studies. In one approach, we sequenced a library enriched for microsatellite loci; in the other approach, we mined microsatellites from genomic sequences. Following microsatellite discovery, promising loci were tested for amplification and polymorphism in 390 louse individuals from 13 pocket gopher hosts. In total, 12 loci were selected for analysis (6 from each methodology), none of which exhibited evidence of null alleles or heterozygote deficiencies. These 12 loci showed adequate genetic diversity for population-level analyses, with 3-9 alleles per locus and an average expected heterozygosity (H_E) per locus ranging from 0.32 to 0.70. Analysis of Molecular Variance (AMOVA) indicated that genetic variation among infrapopulations accounts for a low, but significant, percentage of the overall genetic variation, and individual louse infrapopulations showed F_ST values that were significantly different from zero in the majority of pairwise infrapopulation comparisons, despite all 13 infrapopulations being taken from the same locality. Therefore, these 12 polymorphic markers will be useful at the infrapopulation and population levels for future studies involving G. aurei. This study shows that next-generation sequencing methodologies can be used to efficiently obtain data for a variety of evolutionary questions.
Affiliation(s)
- J E Light: Department of Wildlife and Fisheries Sciences, Texas A&M University, 534 John Kimbrough Blvd., College Station, Texas 77843
- S E Harper: Department of Wildlife and Fisheries Sciences, Texas A&M University, 534 John Kimbrough Blvd., College Station, Texas 77843
- K P Johnson: Department of Wildlife and Fisheries Sciences, Texas A&M University, 534 John Kimbrough Blvd., College Station, Texas 77843
- J W Demastes: Department of Wildlife and Fisheries Sciences, Texas A&M University, 534 John Kimbrough Blvd., College Station, Texas 77843
- T A Spradling: Department of Wildlife and Fisheries Sciences, Texas A&M University, 534 John Kimbrough Blvd., College Station, Texas 77843