1. Burda K, Konczal M. Validation of machine learning approach for direct mutation rate estimation. Mol Ecol Resour 2023; 23:1757-1771. [PMID: 37486035 DOI: 10.1111/1755-0998.13841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 06/16/2023] [Accepted: 07/05/2023] [Indexed: 07/25/2023]
Abstract
Mutations are the primary source of all genetic variation. Knowledge about their rates is critical for any evolutionary genetic analyses, but for a long time, that knowledge has remained elusive and indirectly inferred. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. The analyses are, however, challenging due to high rate of false positives and no consensus regarding standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning approach for such a task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring, followed by screening their genomes for de novo mutations. The initial large number of candidate de novo mutations was hard-filtered to remove false-positive results. These results were compared with mutation rate estimated with a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. Estimated here the guppy mutation rate is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low estimated rates in other teleost fishes. We discuss potential explanations for such a pattern, as well as future utility and limitations of machine learning approaches.
Affiliation(s)
- Katarzyna Burda
- Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
- Mateusz Konczal
- Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland

2. Bayer PE. Skim-Based Genotyping by Sequencing Using a Double Haploid Population to Call SNPs, Infer Gene Conversions, and Improve Genome Assemblies. Methods Mol Biol 2022; 2443:405-413. [PMID: 35037217 DOI: 10.1007/978-1-0716-2067-0_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Genotyping by sequencing (GBS) is an emerging technology to rapidly call an abundance of single nucleotide polymorphisms (SNPs) using genome sequencing technology. Several different methodologies and approaches have recently been established, most of these relying on a specific preparation of data. Here we describe our GBS pipeline, which uses high coverage reads from two parents and low coverage reads from their double haploid offspring to call SNPs on a large scale. The upside of this approach is the high resolution and scalability of the method.
Affiliation(s)
- Philipp Emanuel Bayer
- School of Biological Sciences, University of Western Australia, Perth, WA, Australia.

3. Farrer RA. HaplotypeTools: a toolkit for accurately identifying recombination and recombinant genotypes. BMC Bioinformatics 2021; 22:560. [PMID: 34809571 PMCID: PMC8607637 DOI: 10.1186/s12859-021-04473-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Accepted: 11/10/2021] [Indexed: 11/17/2022] Open
Abstract
Background Identifying haplotypes is central to sequence analysis in diploid or polyploid genomes. Despite this, there remains a lack of research and tools designed for physical phasing and its downstream analysis. Results HaplotypeTools is a new toolset to phase variant sites using VCF and BAM files and to analyse phased VCFs. Phasing is achieved via the identification of reads overlapping ≥ 2 heterozygous positions and then extended by additional reads, a process that can be parallelized across a computer cluster. HaplotypeTools includes various utility scripts for downstream analysis including crossover detection and phylogenetic placement of haplotypes to other lineages or species. HaplotypeTools was assessed for accuracy against WhatsHap using simulated short and long reads, demonstrating higher accuracy, albeit with reduced haplotype length. HaplotypeTools was also tested on real Illumina data to determine the ancestry of hybrid fungal isolate Batrachochytrium dendrobatidis (Bd) SA-EC3, finding 80% of haplotypes across the genome phylogenetically cluster with parental lineages BdGPL (39%) and BdCAPE (41%), indicating those are the parental lineages. Finally, ~ 99% of phasing was conserved between overlapping phase groups between SA-EC3 and either parental lineage, indicating mitotic gene conversion/parasexuality as the mechanism of recombination for this hybrid isolate. HaplotypeTools is open source and freely available from https://github.com/rhysf/HaplotypeTools under the MIT License. Conclusions HaplotypeTools is a powerful resource for analyzing hybrid or recombinant diploid or polyploid genomes and identifying parental ancestry for sub-genomic regions.
Affiliation(s)
- Rhys A Farrer
- Medical Research Council Centre for Medical Mycology at the University of Exeter, Exeter, UK.
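
The HaplotypeTools abstract above says phasing starts from reads that overlap at least 2 heterozygous positions. The Python sketch below illustrates only that selection criterion. It is not the HaplotypeTools implementation: the function names, the inclusive coordinate convention, and the toy reads are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def het_sites_in_read(read_start, read_end, sorted_het_positions):
    """Return the heterozygous positions covered by one read.

    Coordinates are treated as inclusive (an assumption of this sketch).
    """
    lo = bisect_left(sorted_het_positions, read_start)
    hi = bisect_right(sorted_het_positions, read_end)
    return sorted_het_positions[lo:hi]


def phase_informative_reads(reads, het_positions, min_sites=2):
    """Keep only reads that span at least `min_sites` heterozygous positions.

    `reads` maps a read name to a (start, end) tuple. Only such reads can
    link alleles at different sites, which is what read-based phasing needs.
    """
    sorted_het = sorted(het_positions)
    informative = {}
    for name, (start, end) in reads.items():
        sites = het_sites_in_read(start, end, sorted_het)
        if len(sites) >= min_sites:
            informative[name] = sites
    return informative


# Hypothetical toy data, not taken from the paper.
het_positions = [120, 180, 400, 460]
reads = {
    "r1": (100, 200),  # covers 120 and 180
    "r2": (150, 250),  # covers 180 only
    "r3": (390, 470),  # covers 400 and 460
    "r4": (300, 350),  # covers no heterozygous site
}
informative = phase_informative_reads(reads, het_positions)
```

In this toy input only `r1` and `r3` qualify. Extending phase groups with additional reads, as the abstract describes, is a separate step that the sketch does not attempt.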

4. Paula DP. Next-Generation Sequencing and Its Impacts on Entomological Research in Ecology and Evolution. Neotropical Entomology 2021; 50:679-696. [PMID: 34374956 DOI: 10.1007/s13744-021-00895-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/17/2020] [Accepted: 07/06/2021] [Indexed: 06/13/2023]
Abstract
The advent of NGS-based methods has been profoundly transforming entomological research. Through continual development and improvement of different methods and sequencing platforms, NGS has promoted mass elucidation of partial or whole genetic materials associated with beneficial insects, pests (of agriculture, forestry and animal, and human health), and species of conservation concern, helping to unravel ecological and evolutionary mechanisms and characterizing survival, trophic interactions, and dispersal. It is shifting the scale of biodiversity and environmental analyses from individuals and biodiversity indicator species to the large-scale study of communities and ecosystems using bulk samples of species or a mixed "soup" of environmental DNA. As the NGS-based methods have become more affordable, complexity demystified, and specificity and sensitivity proven, their use in entomological research has spread widely. This article presents several examples on how NGS-based methods have been used in entomology to provide incentives to apply them when appropriate and to open our minds to the expected advances in entomology that are yet to come.

5. Qi H, Li L, Zhang G. Construction of a chromosome-level genome and variation map for the Pacific oyster Crassostrea gigas. Mol Ecol Resour 2021; 21:1670-1685. [PMID: 33655634 DOI: 10.1111/1755-0998.13368] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/30/2020] [Revised: 02/17/2021] [Accepted: 02/23/2021] [Indexed: 12/11/2022]
Abstract
The Pacific oyster (Crassostrea gigas) is a widely distributed marine bivalve of great ecological and economic importance. In this study, we provide a high-quality chromosome-level genome assembled using Pacific Bioscience long reads and Hi-C-based and linkage-map-based scaffolding technologies and a high-resolution variation map constructed using large-scale resequencing analysis. The 586.8 Mb genome consists of 10 pseudochromosome sequences ranging from 38.6 to 78.9 Mb, containing 301 contigs with an N50 size of 3.1 Mb. A total of 30,078 protein-coding genes were predicted, of which 22,757 (75.7%) were high-reliability annotations supported by a homologous match to a curated protein in the SWISS-PROT database or transcript expression. Although a medium level of repeat components (57.2%) was detected, the genomic content of the segmental duplications reached 26.2%, which is the highest among the reported genomes. By whole genome resequencing analysis of 495 Pacific oysters, a comprehensive variation map was built, comprised of 4.78 million single nucleotide polymorphisms, 0.60 million short insertions and deletions, and 49,333 copy number variation regions. The structural variations can lead to an average interindividual genomic divergence of 0.21, indicating their crucial role in shaping the Pacific oyster genome diversity. The large amount of mosaic distributed repeat elements, small variations, and copy number variations indicate that the Pacific oyster is a diploid organism with an extremely high genomic complexity at the intra- and interindividual level. The genome and variation maps can improve our understanding of oyster genome diversity and enrich the resources for oyster molecular evolution, comparative genomics, and genetic research.
Affiliation(s)
- Haigang Qi
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China
- National and Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, China
- Li Li
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
- Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China
- Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- National and Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, China
- Guofan Zhang
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
- Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
- Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China
- National and Local Joint Engineering Laboratory of Ecological Mariculture, Qingdao, China

6. Valiente-Mullor C, Beamud B, Ansari I, Francés-Cuesta C, García-González N, Mejía L, Ruiz-Hueso P, González-Candelas F. One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads. PLoS Comput Biol 2021; 17:e1008678. [PMID: 33503026 PMCID: PMC7870062 DOI: 10.1371/journal.pcbi.1008678] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 04/27/2020] [Revised: 02/08/2021] [Accepted: 01/05/2021] [Indexed: 12/17/2022] Open
Abstract
Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended. Mapping consists in the alignment of reads (i.e., DNA fragments) obtained through high-throughput genome sequencing to a previously assembled reference sequence. It is a common practice in genomic studies to use a single reference for mapping, usually the ‘reference genome’ of a species—a high-quality assembly. However, the selection of an optimal reference is hindered by intrinsic intra-species genetic variability, particularly in bacteria. 
It is known that genetic differences between the reference genome and the read sequences may produce incorrect alignments during mapping. Eventually, these errors could lead to misidentification of variants and biased reconstruction of phylogenetic trees (which reflect ancestry between different bacterial lineages). To our knowledge, this is the first work to systematically examine the effect of different references for mapping on the inference of tree topology as well as the impact on recombination and natural selection inferences. Furthermore, the novelty of this work relies on a procedure that guarantees that we are evaluating only the effect of the reference. This effect has proved to be pervasive in the five bacterial species that we have studied and, in some cases, alterations in phylogenetic trees could lead to incorrect epidemiological inferences. Hence, the use of different reference genomes may be prescriptive to assess the potential biases of mapping.
Affiliation(s)
- Carlos Valiente-Mullor
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Beatriz Beamud
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Iván Ansari
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Carlos Francés-Cuesta
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Neris García-González
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Lorena Mejía
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Instituto de Microbiología, Colegio de Ciencias Biológicas y Ambientales, Universidad San Francisco de Quito, Quito, Ecuador
- Paula Ruiz-Hueso
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- Fernando González-Candelas
- Joint Research Unit “Infection and Public Health” FISABIO-University of Valencia, Institute for Integrative Systems Biology (I2SysBio), Valencia, Spain
- CIBER in Epidemiology and Public Health, Valencia, Spain

7.
Abstract
Read alignment is the central step of many analytic pipelines that perform variant calling. To reduce error, it is common practice to pre-process raw sequencing reads to remove low-quality bases and residual adapter contamination, a procedure collectively known as ‘trimming’. Trimming is widely assumed to increase the accuracy of variant calling, although there are relatively few systematic evaluations of its effects and no clear consensus on its efficacy. As sequencing datasets increase both in number and size, it is worthwhile reappraising computational operations of ambiguous benefit, particularly when the scope of many analyses now routinely incorporates thousands of samples, increasing the time and cost required. Using a curated set of 17 Gram-negative bacterial genomes, this study initially evaluated the impact of four read-trimming utilities (Atropos, fastp, Trim Galore and Trimmomatic), each used with a range of stringencies, on the accuracy and completeness of three bacterial SNP-calling pipelines. It was found that read trimming made only small, and statistically insignificant, increases in SNP-calling accuracy even when using the highest-performing pre-processor in this study, fastp. To extend these findings, >6500 publicly archived sequencing datasets from Escherichia coli, Mycobacterium tuberculosis and Staphylococcus aureus were re-analysed using a common analytic pipeline. Of the approximately 125 million SNPs and 1.25 million indels called across all samples, the same bases were called in 98.8 and 91.9 % of cases, respectively, irrespective of whether raw reads or trimmed reads were used. Nevertheless, the proportion of mixed calls (i.e. calls where <100 % of the reads support the variant allele; considered a proxy of false positives) was significantly reduced after trimming, which suggests that while trimming rarely alters the set of variant bases, it can affect the proportion of reads supporting each call. 
It was concluded that read quality- and adapter-trimming add relatively little value to a SNP-calling pipeline and may only be necessary if small differences in the absolute number of SNP calls, or the false call rate, are critical. Broadly similar conclusions can be drawn about the utility of trimming to an indel-calling pipeline. Read trimming remains routinely performed prior to variant calling likely out of concern that doing otherwise would typically have negative consequences. While historically this may have been the case, the data in this study suggests that read trimming is not always a practical necessity.
Affiliation(s)
- Stephen J Bush
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
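
The abstract of entry 7 defines a "mixed call" as a call where fewer than 100 % of the reads support the variant allele, and treats the proportion of such calls as a proxy for false positives. A minimal Python sketch of that metric follows. The `VariantCall` record, its field names, and every read count are hypothetical, chosen only to show how the proportion can fall after trimming while the set of called positions stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    position: int
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # all reads covering the site


def is_mixed(call):
    """True when fewer than 100 % of the reads support the variant allele."""
    if call.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return call.alt_reads < call.total_reads


def mixed_call_proportion(calls):
    """Fraction of calls that are mixed; 0.0 for an empty call set."""
    calls = list(calls)
    if not calls:
        return 0.0
    return sum(is_mixed(c) for c in calls) / len(calls)


# Hypothetical calls at the same four positions, before and after trimming.
raw_calls = [
    VariantCall(101, 30, 30),
    VariantCall(250, 27, 30),   # mixed
    VariantCall(977, 12, 40),   # mixed
    VariantCall(1500, 18, 18),
]
trimmed_calls = [
    VariantCall(101, 28, 28),
    VariantCall(250, 26, 26),   # no longer mixed
    VariantCall(977, 11, 35),   # still mixed
    VariantCall(1500, 17, 17),
]

raw_fraction = mixed_call_proportion(raw_calls)
trimmed_fraction = mixed_call_proportion(trimmed_calls)
```

With these made-up counts the mixed proportion drops from 0.5 to 0.25 although the same four positions are called, which mirrors the pattern the abstract reports without reproducing any of its data.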

8. Bush SJ, Foster D, Eyre DW, Clark EL, De Maio N, Shaw LP, Stoesser N, Peto TEA, Crook DW, Walker AS. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines. Gigascience 2020; 9:giaa007. [PMID: 32025702 PMCID: PMC7002876 DOI: 10.1093/gigascience/giaa007] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 05/28/2019] [Revised: 12/02/2019] [Accepted: 01/15/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. RESULTS We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis. CONCLUSIONS The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
Affiliation(s)
- Stephen J Bush
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Dona Foster
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- David W Eyre
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Emily L Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SH, UK
- Liam P Shaw
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Nicole Stoesser
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Tim E A Peto
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Derrick W Crook
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- A Sarah Walker
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
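
Entry 8 ranks SNP-calling pipelines by sensitivity and precision. As a reminder of how those two figures are usually derived from a truth set, here is a small Python sketch. Representing a SNP as a `(position, alternate_base)` tuple and scoring by exact match are assumptions of this example; the study's own scoring rules are not given in the abstract.

```python
def sensitivity_precision(called, truth):
    """Compare a set of called SNPs with a truth set.

    sensitivity = TP / (TP + FN): share of true SNPs that were recovered
    precision   = TP / (TP + FP): share of called SNPs that are true
    Each ratio is reported as 0.0 when its denominator is zero.
    """
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    return sensitivity, precision


# Hypothetical SNPs as (position, alternate base) tuples.
truth_snps = {(10, "A"), (55, "T"), (90, "G"), (130, "C")}
called_snps = {(10, "A"), (55, "T"), (90, "C"), (200, "G")}

sens, prec = sensitivity_precision(called_snps, truth_snps)
```

Here two of the four true SNPs are recovered and two of the four calls are correct, so both values are 0.5. The call at position 90 counts as both a false positive and a false negative because the alternate base differs from the truth set.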

9. Chang LY, Toghiani S, Hay EH, Aggrey SE, Rekaya R. A Weighted Genomic Relationship Matrix Based on Fixation Index (FST) Prioritized SNPs for Genomic Selection. Genes (Basel) 2019; 10:genes10110922. [PMID: 31726712 PMCID: PMC6895924 DOI: 10.3390/genes10110922] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/22/2019] [Revised: 11/06/2019] [Accepted: 11/08/2019] [Indexed: 12/30/2022] Open
Abstract
A dramatic increase in the density of marker panels has been expected to increase the accuracy of genomic selection (GS), unfortunately, little to no improvement has been observed. By including all variants in the association model, the dimensionality of the problem should be dramatically increased, and it could undoubtedly reduce the statistical power. Using all Single nucleotide polymorphisms (SNPs) to compute the genomic relationship matrix (G) does not necessarily increase accuracy as the additive relationships can be accurately estimated using a much smaller number of markers. Due to these limitations, variant prioritization has become a necessity to improve accuracy. The fixation index (FST) as a measure of population differentiation has been used to identify genome segments and variants under selection pressure. Using prioritized variants has increased the accuracy of GS. Additionally, FST can be used to weight the relative contribution of prioritized SNPs in computing G. In this study, relative weights based on FST scores were developed and incorporated into the calculation of G and their impact on the estimation of variance components and accuracy was assessed. The results showed that prioritizing SNPs based on their FST scores resulted in an increase in the genetic similarity between training and validation animals and improved the accuracy of GS by more than 5%.
Affiliation(s)
- Ling-Yun Chang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- ABS Global, Inc., DeForest, WI 53532, USA
- Sajjad Toghiani
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT 59301, USA
- El Hamidi Hay
- USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT 59301, USA
- Samuel E. Aggrey
- Department of Poultry Science, University of Georgia, Athens, GA 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Romdhane Rekaya
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA

10. Hui W, Yang Y, Wu G, Wang Y, Zaky Zayed M, Chen X. Differential gene expression analyses related to fruit yield of Jatropha curcas L. using RNA-seq. Biotechnol Biotec Eq 2018. [DOI: 10.1080/13102818.2018.1507757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022] Open
Affiliation(s)
- Wenkai Hui
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, P.R. China
- National Engineering Laboratory for Forest Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P.R. China
- Yuantong Yang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, P.R. China
- Guojiang Wu
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, P.R. China
- Yi Wang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, P.R. China
- Mohamed Zaky Zayed
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, P.R. China
- Forestry and Wood Technology Department, Faculty of Agriculture (EL-Shatby), Alexandria University, Alexandria, Egypt
- Xiaoyang Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangdong Key Laboratory for Innovative Development and Utilization of Forest Plant Germplasm, College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou, P.R. China
- National Engineering Laboratory for Forest Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, P.R. China

11. Tiley GP, Kimball RT, Braun EL, Burleigh JG. Comparison of the Chinese bamboo partridge and red Junglefowl genome sequences highlights the importance of demography in genome evolution. BMC Genomics 2018; 19:336. [PMID: 29739321 PMCID: PMC5941490 DOI: 10.1186/s12864-018-4711-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/15/2017] [Accepted: 04/23/2018] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Recent large-scale whole genome sequencing efforts in birds have elucidated broad patterns of avian phylogeny and genome evolution. However, despite the great interest in economically important phasianids like Gallus gallus (Red Junglefowl, the progenitor of the chicken), we know little about the genomes of closely related species. Gallus gallus is highly sexually dichromatic and polygynous, but its sister genus, Bambusicola, is smaller, sexually monomorphic, and monogamous with biparental care. We sequenced the genome of Bambusicola thoracicus (Chinese Bamboo Partridge) using a single insert library to test hypotheses about genome evolution in galliforms. Selection acting at the phenotypic level could result in more evidence of positive selection in the Gallus genome than in Bambusicola. However, the historical range size of Bambusicola was likely smaller than Gallus, and demographic effects could lead to higher rates of nonsynonymous substitution in Bambusicola than in Gallus. RESULTS We generated a genome assembly suitable for evolutionary analyses. We examined the impact of selection on coding regions by examining shifts in the average nonsynonymous to synonymous rate ratio (dN/dS) and the proportion of sites subject to episodic positive selection. We observed elevated dN/dS in Bambusicola relative to Gallus, which is consistent with our hypothesis that demographic effects may be important drivers of genome evolution in Bambusicola. We also demonstrated that alignment error can greatly inflate estimates of the number of genes that experienced episodic positive selection and heterogeneity in dN/dS. However, overall patterns of molecular evolution were robust to alignment uncertainty. Bambusicola thoracicus has higher estimates of heterozygosity than Gallus gallus, possibly due to migration events over the past 100,000 years. 
CONCLUSIONS Our results emphasized the importance of demographic processes in generating the patterns of variation between Bambusicola and Gallus. We also demonstrated that genome assemblies generated using a single library can provide valuable insights into avian evolutionary history and found that it is important to account for alignment uncertainty in evolutionary inferences from draft genomes.
Affiliation(s)
- G P Tiley
  - Department of Biology, University of Florida, Gainesville, FL, 32611, USA; Department of Biology, Duke University, Durham, NC, 27708, USA.
- R T Kimball
  - Department of Biology, University of Florida, Gainesville, FL, 32611, USA
- E L Braun
  - Department of Biology, University of Florida, Gainesville, FL, 32611, USA
- J G Burleigh
  - Department of Biology, University of Florida, Gainesville, FL, 32611, USA
|
12
|
Bradic M, Warring SD, Tooley GE, Scheid P, Secor WE, Land KM, Huang PJ, Chen TW, Lee CC, Tang P, Sullivan SA, Carlton JM. Genetic Indicators of Drug Resistance in the Highly Repetitive Genome of Trichomonas vaginalis. Genome Biol Evol 2018. [PMID: 28633446 PMCID: PMC5522705 DOI: 10.1093/gbe/evx110] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 12/18/2022] Open
Abstract
Trichomonas vaginalis, the most common nonviral sexually transmitted parasite, causes ∼283 million trichomoniasis infections annually and is associated with pregnancy complications and increased risk of HIV-1 acquisition. The antimicrobial drug metronidazole is used for treatment, but in a fraction of clinical cases, the parasites can become resistant to this drug. We undertook sequencing of multiple clinical isolates and lab derived lines to identify genetic markers and mechanisms of metronidazole resistance. Reduced representation genome sequencing of ∼100 T. vaginalis clinical isolates identified 3,923 SNP markers and presence of a bipartite population structure. Linkage disequilibrium was found to decay rapidly, suggesting genome-wide recombination and the feasibility of genetic association studies in the parasite. We identified 72 SNPs associated with metronidazole resistance, and a comparison of SNPs within several lab-derived resistant lines revealed an overlap with the clinically resistant isolates. We identified SNPs in genes for which no function has yet been assigned, as well as in functionally-characterized genes relevant to drug resistance (e.g., pyruvate:ferredoxin oxidoreductase). Transcription profiles of resistant strains showed common changes in genes involved in drug activation (e.g., flavin reductase), accumulation (e.g., multidrug resistance pump), and detoxification (e.g., nitroreductase). Finally, we identified convergent genetic changes in lab-derived resistant lines of Tritrichomonas foetus, a distantly related species that causes venereal disease in cattle. Shared genetic changes within and between T. vaginalis and Tr. foetus parasites suggest conservation of the pathways through which adaptation has occurred. These findings extend our knowledge of drug resistance in the parasite, providing a panel of markers that can be used as a diagnostic tool.
Affiliation(s)
- Martina Bradic
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York
- Sally D Warring
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York
- Grace E Tooley
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York
- Paul Scheid
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York
- William E Secor
  - Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA
- Kirkwood M Land
  - Department of Biological Sciences, University of the Pacific, Stockton, CA
- Po-Jung Huang
  - Bioinformatics Center/Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Ting-Wen Chen
  - Bioinformatics Center/Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Chi-Ching Lee
  - Bioinformatics Center/Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Petrus Tang
  - Bioinformatics Center/Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Steven A Sullivan
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York
- Jane M Carlton
  - Department of Biology, Center for Genomics and Systems Biology, New York University, New York
|
13
|
Farrer RA, Fisher MC. Describing Genomic and Epigenomic Traits Underpinning Emerging Fungal Pathogens. ADVANCES IN GENETICS 2017; 100:73-140. [PMID: 29153405 DOI: 10.1016/bs.adgen.2017.09.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
An unprecedented number of pathogenic fungi are emerging and causing disease in animals and plants, putting the resilience of wild and managed ecosystems in jeopardy. While the past decades have seen an increase in the number of pathogenic fungi, they have also seen the birth of new big data technologies and analytical approaches to tackle these emerging pathogens. We review how the linked fields of genomics and epigenomics are transforming our ability to address the challenge of emerging fungal pathogens. We explore the methodologies and bioinformatic toolkits that currently exist to rapidly analyze the genomes of unknown fungi, then discuss how these data can be used to address key questions that shed light on their epidemiology. We show how genomic approaches are leading a revolution into our understanding of emerging fungal diseases and speculate on future approaches that will transform our ability to tackle this increasingly important class of emerging pathogens.
|
14
|
Wu SH, Schwartz RS, Winter DJ, Conrad DF, Cartwright RA. Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373 PMCID: PMC5860108 DOI: 10.1093/bioinformatics/btx133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/19/2016] [Revised: 01/22/2017] [Accepted: 03/07/2017] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. RESULTS We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was comprised of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT cartwright@asu.edu. SUPPLEMENTARY INFORMATION Supplementary data is available at Bioinformatics online.
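As an illustration of the model class named in this abstract (not the authors' implementation, which is available at the repository cited above), the following minimal Python sketch evaluates a two-component Dirichlet-multinomial mixture for the read counts at a single site and reports the posterior weight of the first ("major") component. The function names, the two-category counts (reference and non-reference reads), and every parameter value are assumptions chosen for the example.

```python
from math import exp, lgamma, log


def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient n! plus the ratio Gamma(a) / Gamma(n + a).
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return out


def mixture_logpmf(counts, weights, alphas):
    """Log-probability under a weighted mixture (log-sum-exp for stability)."""
    terms = [log(w) + dm_logpmf(counts, al) for w, al in zip(weights, alphas)]
    m = max(terms)
    return m + log(sum(exp(t - m) for t in terms))


def major_posterior(counts, weights, alphas):
    """Posterior probability that a site was generated by component 0."""
    total = mixture_logpmf(counts, weights, alphas)
    return exp(log(weights[0]) + dm_logpmf(counts, alphas[0]) - total)


# Hypothetical parameters: component 0 is tight around 50/50 (low dispersion),
# component 1 is overdispersed and skewed towards the reference allele.
weights = [0.9, 0.1]
alphas = [[50.0, 50.0], [2.0, 1.0]]

for site_counts in ([48, 52], [95, 5]):
    print(site_counts, round(major_posterior(site_counts, weights, alphas), 4))
```

A filter in the spirit of the abstract would keep only sites whose posterior for the major component exceeds a chosen threshold; that threshold is likewise an assumption and is not taken from the paper.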
Affiliation(s)
- Steven H Wu
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Rachel S Schwartz
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
  - Department of Biological Sciences, The University of Rhode Island, Kingston, RI, USA
- David J Winter
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Donald F Conrad
  - Department of Genetics, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, USA
- Reed A Cartwright
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
15
|
Farrer RA, Martel A, Verbrugghe E, Abouelleil A, Ducatelle R, Longcore JE, James TY, Pasmans F, Fisher MC, Cuomo CA. Genomic innovations linked to infection strategies across emerging pathogenic chytrid fungi. Nat Commun 2017; 8:14742. [PMID: 28322291 PMCID: PMC5364385 DOI: 10.1038/ncomms14742] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 07/13/2016] [Accepted: 01/26/2017] [Indexed: 11/09/2022] Open
Abstract
To understand the evolutionary pathways that lead to emerging infections of vertebrates, here we explore the genomic innovations that allow free-living chytrid fungi to adapt to and colonize amphibian hosts. Sequencing and comparing the genomes of two pathogenic species of Batrachochytrium to those of close saprophytic relatives reveals that pathogenicity is associated with remarkable expansions of protease and cell wall gene families, while divergent infection strategies are linked to radiations of lineage-specific gene families. By comparing the host–pathogen response to infection for both pathogens, we illuminate the traits that underpin a strikingly different immune response within a shared host species. Our results show that, despite commonalities that promote infection, specific gene-family radiations contribute to distinct infection strategies. The breadth and evolutionary novelty of candidate virulence factors that we discover underscores the urgent need to halt the advance of pathogenic chytrids and prevent incipient loss of biodiversity. Batrachochytrium dendrobatidis and B. salamandrivorans are both important pathogens of amphibians, but they differ in their host ranges, infection strategies, and host immune responses. Here, Farrer and colleagues compare their genomes and transcriptomes to identify the genetic basis of these differences.
Affiliation(s)
- Rhys A Farrer
  - Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W2 1PG, UK
- An Martel
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, B-9820 Merelbeke, Belgium
- Elin Verbrugghe
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, B-9820 Merelbeke, Belgium
- Amr Abouelleil
  - Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Richard Ducatelle
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, B-9820 Merelbeke, Belgium
- Joyce E Longcore
  - School of Biology and Ecology, University of Maine, Orono, Maine 04469, USA
- Timothy Y James
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Frank Pasmans
  - Department of Pathology, Bacteriology and Avian Diseases, Faculty of Veterinary Medicine, Ghent University, Salisburylaan 133, B-9820 Merelbeke, Belgium
- Matthew C Fisher
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W2 1PG, UK
- Christina A Cuomo
  - Genome Sequencing and Analysis Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
|
16
|
Farrer RA, Voelz K, Henk DA, Johnston SA, Fisher MC, May RC, Cuomo CA. Microevolutionary traits and comparative population genomics of the emerging pathogenic fungus Cryptococcus gattii. Philos Trans R Soc Lond B Biol Sci 2016; 371:20160021. [PMID: 28080992 PMCID: PMC5095545 DOI: 10.1098/rstb.2016.0021] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Accepted: 05/04/2016] [Indexed: 01/15/2023] Open
Abstract
Emerging fungal pathogens cause an expanding burden of disease across the animal kingdom, including a rise in morbidity and mortality in humans. Yet, we currently have only a limited repertoire of available therapeutic interventions. A greater understanding of the mechanisms of fungal virulence and of the emergence of hypervirulence within species is therefore needed for new treatments and mitigation efforts. For example, over the past decade, an unusual lineage of Cryptococcus gattii, which was first detected on Vancouver Island, has spread to the Canadian mainland and the Pacific Northwest infecting otherwise healthy individuals. The molecular changes that led to the development of this hypervirulent cryptococcal lineage remain unclear. To explore this, we traced the history of similar microevolutionary events that can lead to changes in host range and pathogenicity. Here, we detail fine-resolution mapping of genetic differences between two highly related Cryptococcus gattii VGIIc isolates that differ in their virulence traits (phagocytosis, vomocytosis, macrophage death, mitochondrial tubularization and intracellular proliferation). We identified a small number of single site variants within coding regions that potentially contribute to variations in virulence. We then extended our methods across multiple lineages of C. gattii to study how selection is acting on key virulence genes within different lineages. This article is part of the themed issue 'Tackling emerging fungal threats to animal health, food security and ecosystem resilience'.
Affiliation(s)
- Rhys A Farrer
  - Genome Sequencing and Analysis Program, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W2 1PG, UK
- Kerstin Voelz
  - Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospitals Birmingham NHS Foundation Trust, Queen Elizabeth Hospital Birmingham, Birmingham B15 2TH, UK
- Daniel A Henk
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W2 1PG, UK
- Simon A Johnston
  - Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK
- Matthew C Fisher
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London W2 1PG, UK
- Robin C May
  - Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK
  - NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospitals Birmingham NHS Foundation Trust, Queen Elizabeth Hospital Birmingham, Birmingham B15 2TH, UK
- Christina A Cuomo
  - Genome Sequencing and Analysis Program, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
17
|
Muñoz JF, Farrer RA, Desjardins CA, Gallo JE, Sykes S, Sakthikumar S, Misas E, Whiston EA, Bagagli E, Soares CMA, Teixeira MDM, Taylor JW, Clay OK, McEwen JG, Cuomo CA. Genome Diversity, Recombination, and Virulence across the Major Lineages of Paracoccidioides. mSphere 2016; 1:e00213-16. [PMID: 27704050 PMCID: PMC5040785 DOI: 10.1128/msphere.00213-16] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 07/28/2016] [Accepted: 09/06/2016] [Indexed: 12/29/2022] Open
Abstract
The Paracoccidioides genus includes two species of thermally dimorphic fungi that cause paracoccidioidomycosis, a neglected health-threatening human systemic mycosis endemic to Latin America. To examine the genome evolution and the diversity of Paracoccidioides spp., we conducted whole-genome sequencing of 31 isolates representing the phylogenetic, geographic, and ecological breadth of the genus. These samples included clinical, environmental and laboratory reference strains of the S1, PS2, PS3, and PS4 lineages of P. brasiliensis and also isolates of Paracoccidioides lutzii species. We completed the first annotated genome assemblies for the PS3 and PS4 lineages and found that gene order was highly conserved across the major lineages, with only a few chromosomal rearrangements. Comparing whole-genome assemblies of the major lineages with single-nucleotide polymorphisms (SNPs) predicted from the remaining 26 isolates, we identified a deep split of the S1 lineage into two clades we named S1a and S1b. We found evidence for greater genetic exchange between the S1b lineage and all other lineages; this may reflect the broad geographic range of S1b, which is often sympatric with the remaining, largely geographically isolated lineages. In addition, we found evidence of positive selection for the GP43 and PGA1 antigen genes and genes coding for other secreted proteins and proteases and lineage-specific loss-of-function mutations in cell wall and protease genes; these together may contribute to virulence and host immune response variation among natural isolates of Paracoccidioides spp. These insights into the recent evolutionary events highlight important differences between the lineages that could impact the distribution, pathogenicity, and ecology of Paracoccidioides. 
IMPORTANCE Characterization of genetic differences between lineages of the dimorphic human-pathogenic fungus Paracoccidioides can identify changes linked to important phenotypes and guide the development of new diagnostics and treatments. In this article, we compared genomes of 31 diverse isolates representing the major lineages of Paracoccidioides spp. and completed the first annotated genome sequences for the PS3 and PS4 lineages. We analyzed the population structure and characterized the genetic diversity among the lineages of Paracoccidioides, including a deep split of S1 into two lineages (S1a and S1b), and differentiated S1b, associated with most clinical cases, as the more highly recombining and diverse lineage. In addition, we found patterns of positive selection in surface proteins and secreted enzymes among the lineages, suggesting diversifying mechanisms of pathogenicity and adaptation across this species complex. These genetic differences suggest associations with the geographic range, pathogenicity, and ecological niches of Paracoccidioides lineages.
Affiliation(s)
- José F. Muñoz
  - Cellular and Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia
  - Institute of Biology, Universidad de Antioquia, Medellín, Colombia
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Rhys A. Farrer
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Juan E. Gallo
  - Cellular and Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia
  - Doctoral Program in Biomedical Sciences, Universidad del Rosario, Bogotá, Colombia
- Sean Sykes
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Elizabeth Misas
  - Cellular and Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia
  - Institute of Biology, Universidad de Antioquia, Medellín, Colombia
- Emily A. Whiston
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California, USA
- Eduardo Bagagli
  - Instituto de Biociências, Universidade Estadual Paulista, Botucatu, São Paulo, Brazil
- Celia M. A. Soares
  - Laboratório de Biología Molecular, Instituto de Ciências Biológicas, ICBII, Goiânia, Brazil
- Marcus de M. Teixeira
  - Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Distrito Federal, Brazil
  - Division of Pathogen Genomics, Translational Genomics Research Institute North, Flagstaff, Arizona, USA
- John W. Taylor
  - Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, California, USA
- Oliver K. Clay
  - Cellular and Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia
  - School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Colombia
- Juan G. McEwen
  - Cellular and Molecular Biology Unit, Corporación para Investigaciones Biológicas, Medellín, Colombia
  - School of Medicine, Universidad de Antioquia, Medellín, Colombia
|
18
|
Shifman AR, Johnson RM, Wilhelm BT. Cascade: an RNA-seq visualization tool for cancer genomics. BMC Genomics 2016; 17:75. [PMID: 26810393 PMCID: PMC4727405 DOI: 10.1186/s12864-016-2389-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/22/2015] [Accepted: 01/11/2016] [Indexed: 12/20/2022] Open
Abstract
Background Cancer genomics projects are producing ever-increasing amounts of rich and diverse data from patient samples. The ability to easily visualize this data in an integrated and intuitive way is limited by the software currently available. As a result, users typically must use several different tools to view the different data types for their cohort, making it difficult to have a simple unified view of their data. Results Here we present Cascade, a novel web based tool for the intuitive 3D visualization of RNA-seq data from cancer genomics experiments. The Cascade viewer allows multiple data types (e.g. mutation, gene expression, alternative splicing frequency) to be simultaneously displayed, allowing a simplified view of the data in a way that is tuneable based on user specified parameters. The main webpage of Cascade provides a primary view of user data which is overlaid onto known biological pathways that are either predefined or added by users. A space-saving menu for data selection and parameter adjustment allows users to access an underlying MySQL database and customize the features presented in the main view. Conclusions There is currently a pressing need for new software tools to allow researchers to easily explore large cancer genomics datasets and generate hypotheses. Cascade represents a simple yet intuitive interface for data visualization that is both scalable and customizable.
Affiliation(s)
- Aaron R Shifman
  - Laboratory for high throughput genomics, Institute for Research in Immunology and Cancer, University of Montreal, Montreal, QC, Canada.
- Radia M Johnson
  - Laboratory for high throughput genomics, Institute for Research in Immunology and Cancer, University of Montreal, Montreal, QC, Canada.
- Brian T Wilhelm
  - Laboratory for high throughput genomics, Institute for Research in Immunology and Cancer, University of Montreal, Montreal, QC, Canada.
|
19
|
Bayer PE. Skim-Based Genotyping by Sequencing Using a Double Haploid Population to Call SNPs, Infer Gene Conversions, and Improve Genome Assemblies. Methods Mol Biol 2016; 1374:285-292. [PMID: 26519413 DOI: 10.1007/978-1-4939-3167-5_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Genotyping by sequencing (GBS) is an emerging technology to rapidly call an abundance of Single Nucleotide Polymorphisms (SNPs) using genome sequencing technology. Several different methodologies and approaches have recently been established, most of these relying on a specific preparation of data. Here we describe our GBS-pipeline, which uses high coverage reads from two parents and low coverage reads from their double haploid offspring to call SNPs on a large scale. The upside of this approach is the high resolution and scalability of the method.
Affiliation(s)
- Philipp Emanuel Bayer
  - School of Plant Biology, University of Western Australia, 35 Stirling Highway, Crawley, Perth, WA, 6009, Australia.
|
20
|
Pightling AW, Petronella N, Pagotto F. Choice of reference-guided sequence assembler and SNP caller for analysis of Listeria monocytogenes short-read sequence data greatly influences rates of error. BMC Res Notes 2015; 8:748. [PMID: 26643440 PMCID: PMC4672502 DOI: 10.1186/s13104-015-1689-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 11/06/2015] [Accepted: 11/11/2015] [Indexed: 02/07/2023] Open
Abstract
Background The influences that different programs and conditions have on error rates of single-nucleotide polymorphism (SNP) analyses are poorly understood. Using Illumina short-read sequence data generated from Listeria monocytogenes strain HPB5622, we assessed the performance of four SNP callers (BCFtools, FreeBayes, UnifiedGenotyper, VarScan) under a variety of conditions, including: (1) a range of sequencing coverages; (2) use of four popular reference-guided assemblers (Burrows-Wheeler Aligner, Novoalign, MOSAIK, SMALT); (3) with and without read quality trimming and filtering; and (4) use of different reference sequences. Results At 8-fold coverage the proportions of true positive calls ranged from 0.22 to 25.00 % when reads were aligned to a nearly identical reference (0.000096 % distant). Calls made when reads were aligned to a non-identical reference (0.85 % distant) were from 92.54 to 98.88 % accurate. At 79-fold coverage accuracies ranged from 3.95 to 20.00 % with the nearly identical reference and 93.80–98.75 % with the non-identical reference. Read preprocessing significantly changed the numbers of false positive calls made, from a 65.24 % decrease to a 54.55 % increase. Conclusions The combinations of reference-guided sequence assemblers and SNP callers greatly influenced not only the numbers of true and false positive sites but also the proportions of true positive calls relative to the total numbers of calls made. Furthermore, the efficacy of different assembler and caller combinations changed dramatically with the different conditions tested. Researchers should consider whether identifying the greatest numbers of true positive sites, reducing the numbers of false positive calls, or achieving the highest accuracies are desired.
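The accuracy measure used in this abstract, the proportion of true positive calls relative to the total number of calls made, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the example positions are hypothetical, and it assumes that calls and known variant sites are compared by genomic position alone.

```python
def call_accuracy(called_sites, true_sites):
    """Return (true positives, false positives, % of calls that are true)."""
    called = set(called_sites)
    truth = set(true_sites)
    tp = len(called & truth)   # called sites that are real variants
    fp = len(called - truth)   # called sites that are not real variants
    total = tp + fp
    accuracy = 100.0 * tp / total if total else float("nan")
    return tp, fp, accuracy


# Hypothetical positions: four calls, two of which match known variant sites.
tp, fp, accuracy = call_accuracy({10, 20, 30, 40}, {10, 20, 99})
print(tp, fp, accuracy)
```

Because the denominator is the number of calls made, a caller run against a nearly identical reference (few real differences) can score very low on this measure even when it produces only a handful of false positives.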
Affiliation(s)
- Arthur W Pightling
  - Office of Analytics and Outreach, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD, 20740, USA.
- Nicholas Petronella
  - Biostatistics and Modelling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Products and Food Branch, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, ON, K1A 0K9, Canada.
- Franco Pagotto
  - Listeriosis Reference Service for Canada, Microbiology Research Division, Bureau of Microbial Hazards, Food Directorate, Health Products and Food Branch, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, ON, K1A 0K9, Canada.
|
21
|
Ribeiro A, Golicz A, Hackett CA, Milne I, Stephen G, Marshall D, Flavell AJ, Bayer M. An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome. BMC Bioinformatics 2015; 16:382. [PMID: 26558718 PMCID: PMC4642669 DOI: 10.1186/s12859-015-0801-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 07/08/2015] [Accepted: 10/29/2015] [Indexed: 12/30/2022] Open
Abstract
Background Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling — quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive. Results The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases. Conclusions The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. 
Between-factor interactions make simple recommendations difficult for a SNP discovery pipeline but the quality of the reference sequence is clearly of paramount importance. Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from e.g. a non-model organism in its early stages of genomic exploration.
Collapse
Affiliation(s)
- Antonio Ribeiro
- The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK; Division of Plant Sciences, University of Dundee at JHI, Invergowrie, Dundee, DD2 5DA, Scotland, UK.
- Agnieszka Golicz
- School of Agriculture and Food Sciences, University of Queensland, Brisbane, Queensland, 4072, Australia; Australian Centre for Plant Functional Genomics and School of Agriculture and Food Sciences, University of Queensland, Brisbane, Queensland, 4072, Australia.
- Iain Milne
- The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK.
- Gordon Stephen
- The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK.
- David Marshall
- The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK.
- Andrew J Flavell
- Division of Plant Sciences, University of Dundee at JHI, Invergowrie, Dundee, DD2 5DA, Scotland, UK.
- Micha Bayer
- The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK.
22
Beal MA, Gagné R, Williams A, Marchetti F, Yauk CL. Characterizing Benzo[a]pyrene-induced lacZ mutation spectrum in transgenic mice using next-generation sequencing. BMC Genomics 2015; 16:812. [PMID: 26481219 PMCID: PMC4617527 DOI: 10.1186/s12864-015-2004-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 02/22/2015] [Accepted: 10/03/2015] [Indexed: 11/25/2022]
Abstract
BACKGROUND The transgenic rodent mutation reporter assay provides an efficient approach to identify mutagenic agents in vivo. A major advantage of this assay is that mutant reporter transgenes can be sequenced to provide information on the mode of action of a mutagen and to identify clonally expanded mutations. However, conventional DNA sequence analysis is laborious and expensive for long transgenes, such as lacZ (3096 bp), and is not normally implemented in routine screening. METHODS We developed a high-throughput next-generation sequencing (NGS) approach to simultaneously sequence large numbers of barcoded mutant lacZ transgenes from different animals. We collected 3872 mutants derived from the bone marrow DNA of six Muta™Mouse males exposed to the well-established mutagen benzo[a]pyrene (BaP) and six solvent-exposed controls. Mutants within animal samples were pooled, barcoded, and then sequenced using NGS. RESULTS We identified 1652 mutant sequences from 1006 independent mutations that underwent clonal expansion. This deep sequencing analysis of mutation spectrum demonstrated that BaP causes primarily guanine transversions (e.g. G:C → T:A), which is highly consistent with previous studies employing Sanger sequencing. Furthermore, we identified novel mutational hotspots in the lacZ transgene that were previously uncharacterized by Sanger sequencing. Deep sequencing also allowed for an unprecedented ability to correct for clonal expansion events, improving the sensitivity of the mutation reporter assay by 50 %. CONCLUSION These results demonstrate that the high-throughput nature and reduced costs offered by NGS provide a sensitive and fast approach for elucidating and comparing mutagenic mechanisms of various agents among tissues and enabling improved evaluation of genotoxins.
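The clonal-expansion correction mentioned in this abstract can be pictured as collapsing identical mutations recovered from the same animal into a single independent event. The sketch below is only an illustration under assumptions that are not taken from the paper: the record layout `(animal, position, ref, alt)`, the function name `collapse_clonal`, and the toy data are all hypothetical.

```python
from collections import Counter

def collapse_clonal(mutants):
    """Count independent mutations per animal.

    `mutants` is an iterable of (animal, position, ref, alt) tuples, one per
    sequenced mutant (hypothetical layout). Identical mutations seen more than
    once in the same animal are treated as one clonally expanded event.
    """
    # Number of sequenced copies of each distinct (animal, mutation) record.
    copies = Counter(mutants)
    # Each distinct record counts once towards its animal's independent total.
    independent = Counter(animal for animal, *_ in copies)
    return independent, copies

# Toy data (hypothetical): four mutant sequences from two animals.
toy = [
    ("m1", 1187, "G", "T"),
    ("m1", 1187, "G", "T"),  # same animal, same change: clonal copy
    ("m1", 2374, "G", "T"),
    ("m2", 1187, "G", "T"),  # same change in another animal: independent
]
independent, copies = collapse_clonal(toy)
print(dict(independent))  # {'m1': 2, 'm2': 1}
```

Counting within each animal, rather than across the pooled data, keeps recurrent mutations at the same site in different animals as separate events, which is what allows hotspots to be distinguished from clonal copies.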
Affiliation(s)
- Marc A Beal
- Carleton University, Ottawa, ON, K1S 5B6, Canada.
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Rémi Gagné
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Andrew Williams
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Francesco Marchetti
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
- Carole L Yauk
- Environmental Health Science and Research Bureau, Healthy Environments and Consumer Safety Branch, Health Canada, Ottawa, ON, K1A 0K9, Canada.
23
Arthur JW, Cheung FSG, Reichardt JKV. Single nucleotide differences (SNDs) continue to contaminate the dbSNP database with consequences for human genomics and health. Hum Mutat 2015; 36:196-9. [PMID: 25421747 DOI: 10.1002/humu.22735] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/25/2014] [Accepted: 11/17/2014] [Indexed: 01/31/2023]
Abstract
It has been established that up to 8.3% of the biallelic coding SNPs present in dbSNP are actually artefactual polymorphism-like errors, previously termed single nucleotide differences, or SNDs. In this study, a previous analysis of SNPs in dbSNP was extended and updated to examine how the incidence of SNDs has changed over an intervening five-year period. The incidence of SNDs was found to be lower than in the previous analysis at 2.2% of all biallelic SNPs. There was only a modest reduction in the percentage of SNDs in the original set of biallelic coding SNPs tested. This suggests that the overall reduction in the incidence of SNDs over the intervening five-year period is related to an improvement in SNP detection methods and more rigorous curation, rather than efforts to ameliorate the presence of SNDs. We note that SNDs contaminating dbSNP may lead to erroneous conclusions on human conditions.
Affiliation(s)
- Jonathan W Arthur
- Children's Medical Research Institute, University of Sydney, Westmead, New South Wales, Australia
24
Abstract
Cryptococcus gattii is a fungal pathogen of humans, causing pulmonary infections in otherwise healthy hosts. To characterize genomic variation among the four major lineages of C. gattii (VGI, -II, -III, and -IV), we generated, annotated, and compared 16 de novo genome assemblies, including the first for the rarely isolated lineages VGIII and VGIV. By identifying syntenic regions across assemblies, we found 15 structural rearrangements, which were almost exclusive to the VGI-III-IV lineages. Using synteny to inform orthology prediction, we identified a core set of 87% of C. gattii genes present as single copies in all four lineages. Remarkably, 737 genes are variably inherited across lineages and are overrepresented for response to oxidative stress, mitochondrial import, and metal binding and transport. Specifically, VGI has an expanded set of iron-binding genes thought to be important to the virulence of Cryptococcus, while VGII has expansions in the stress-related heat shock proteins relative to the other lineages. We also characterized genes uniquely absent in each lineage, including a copper transporter absent from VGIV, which influences Cryptococcus survival during pulmonary infection and the onset of meningoencephalitis. Through inclusion of population-level data for an additional 37 isolates, we identified a new transcontinental clonal group that we name VGIIx, mitochondrial recombination between VGII and VGIII, and positive selection of multidrug transporters and the iron-sulfur protein aconitase along multiple branches of the phylogenetic tree. Our results suggest that gene expansion or contraction and positive selection have introduced substantial variation with links to mechanisms of pathogenicity across this species complex. The genetic differences between phenotypically different pathogens provide clues to the underlying mechanisms of those traits and can lead to new drug targets and improved treatments for those diseases. 
In this paper, we compare 16 genomes belonging to four highly differentiated lineages of Cryptococcus gattii, which cause pulmonary infections in otherwise healthy humans and other animals. Half of these lineages have not had their genomes previously assembled and annotated. We identified 15 ancestral rearrangements in the genome and over 700 genes that are unique to one or more lineages, many of which are associated with virulence. In addition, we found evidence for recent transcontinental spread, mitochondrial genetic exchange, and positive selection in multidrug transporters. Our results suggest that gene expansion/contraction and positive selection are diversifying the mechanisms of pathogenicity across this species complex.
25
Dell'Acqua M, Zuccolo A, Tuna M, Gianfranceschi L, Pè ME. Targeting environmental adaptation in the monocot model Brachypodium distachyon: a multi-faceted approach. BMC Genomics 2014; 15:801. [PMID: 25236859 PMCID: PMC4177692 DOI: 10.1186/1471-2164-15-801] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 07/29/2014] [Accepted: 09/04/2014] [Indexed: 12/22/2022]
Abstract
BACKGROUND The local environment plays a major role in the spatial distribution of plant populations. Natural plant populations have an extremely poor displacing capacity, so their continued survival in a given environment depends on how well they adapt to local pedoclimatic conditions. Genomic tools can be used to identify adaptive traits at a DNA level and to further our understanding of evolutionary processes. Here we report the use of genotyping-by-sequencing on local groups of the sequenced monocot model species Brachypodium distachyon. Exploiting population genetics, landscape genomics and genome wide association studies, we evaluate the role of B. distachyon as a natural probe for identifying genomic loci involved in environmental adaptation. RESULTS Brachypodium distachyon individuals were sampled in nine locations with different ecologies and characterized with 16,697 SNPs. Variations in sequencing depth showed consistent patterns at 8,072 genomic bins, which were significantly enriched in transposable elements. We investigated the structuration and diversity of this collection, and exploited climatic data to identify loci with adaptive significance through i) two different approaches for genome wide association analyses considering climatic variation, ii) an outlier loci approach, and iii) a canonical correlation analysis on differentially sequenced bins. A linkage disequilibrium-corrected Bonferroni method was applied to filter associations. The two association methods jointly identified a set of 15 genes significantly related to environmental adaptation. The outlier loci approach revealed that 5.7% of the loci analysed were under selection. The canonical correlation analysis showed that the distribution of some differentially sequenced regions was associated with environmental variation. CONCLUSIONS We show that the multi-faceted approach used here targeted different components of B. distachyon adaptive variation, and may lead to the discovery of genes related to environmental adaptation in natural populations. Its application to a model species with a fully sequenced genome is a modular strategy that enables the stratification of biological material and thus improves our knowledge of the functional loci determining adaptation in near-crop species. When coupled with population genetics and measures of genomic structuration, methods coming from genome wide association studies may lead to the exploitation of model species as natural probes to identify loci related to environmental adaptation.
Affiliation(s)
- Mario Enrico Pè
- Institute of Life Sciences, Scuola Superiore Sant'Anna, Pisa, Italy.
26
Pightling AW, Petronella N, Pagotto F. Choice of reference sequence and assembler for alignment of Listeria monocytogenes short-read sequence data greatly influences rates of error in SNP analyses. PLoS One 2014; 9:e104579. [PMID: 25144537 PMCID: PMC4140716 DOI: 10.1371/journal.pone.0104579] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 04/15/2014] [Accepted: 07/14/2014] [Indexed: 01/06/2023]
Abstract
The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. 
In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results.
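Benchmarks of this kind score each condition by comparing the called SNPs with the differences known to exist in the simulated data. A minimal sketch of that comparison follows, assuming SNPs are represented as `(position, alt_base)` pairs; the function name `score_calls` and the toy sets are hypothetical and not from the paper.

```python
def score_calls(true_snps, called_snps):
    """Compare called SNPs with a known truth set.

    Both arguments are sets of (position, alt_base) pairs (assumed layout).
    Returns counts of true positives, false positives and false negatives,
    plus sensitivity and precision (None when the denominator is zero).
    """
    tp = len(true_snps & called_snps)   # called and real
    fp = len(called_snps - true_snps)   # called but not real
    fn = len(true_snps - called_snps)   # real but missed
    sensitivity = tp / (tp + fn) if tp + fn else None
    precision = tp / (tp + fp) if tp + fp else None
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}

# Toy data (hypothetical): four true SNPs, four calls, two of them correct.
truth = {(101, "A"), (250, "T"), (777, "G"), (900, "C")}
calls = {(101, "A"), (250, "T"), (777, "C"), (1200, "G")}
result = score_calls(truth, calls)
print(result)
```

A call at the right position with the wrong base (position 777 above) counts as both a false positive and a false negative under this definition; other scoring conventions are possible.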
Affiliation(s)
- Arthur W. Pightling
- Listeriosis Reference Service for Canada, Research Division, Bureau of Microbial Hazards, Food Directorate, Health Products and Food Branch, Health Canada, Ottawa, Ontario, Canada
- Nicholas Petronella
- Biostatistics and Modelling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Products and Food Branch, Health Canada, Ottawa, Ontario, Canada
- Franco Pagotto
- Listeriosis Reference Service for Canada, Research Division, Bureau of Microbial Hazards, Food Directorate, Health Products and Food Branch, Health Canada, Ottawa, Ontario, Canada
27
Voelz K, Ma H, Phadke S, Byrnes EJ, Zhu P, Mueller O, Farrer RA, Henk DA, Lewit Y, Hsueh YP, Fisher MC, Idnurm A, Heitman J, May RC. Transmission of hypervirulence traits via sexual reproduction within and between lineages of the human fungal pathogen Cryptococcus gattii. PLoS Genet 2013; 9:e1003771. [PMID: 24039607 PMCID: PMC3764205 DOI: 10.1371/journal.pgen.1003771] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 01/23/2013] [Accepted: 07/22/2013] [Indexed: 01/11/2023]
Abstract
Since 1999 a lineage of the pathogen Cryptococcus gattii has been infecting humans and other animals in Canada and the Pacific Northwest of the USA. It is now the largest outbreak of a life-threatening fungal infection in a healthy population in recorded history. The high virulence of outbreak strains is closely linked to the ability of the pathogen to undergo rapid mitochondrial tubularisation and proliferation following engulfment by host phagocytes. Most outbreaks spread by geographic expansion across suitable niches, but it is known that genetic re-assortment and hybridisation can also lead to rapid range and host expansion. In the context of C. gattii, however, the likelihood of virulence traits associated with the outbreak lineages spreading to other lineages via genetic exchange is currently unknown. Here we address this question by conducting outgroup crosses between distantly related C. gattii lineages (VGII and VGIII) and ingroup crosses between isolates from the same molecular type (VGII). Systematic phenotypic characterisation shows that virulence traits are transmitted to outgroups infrequently, but readily inherited during ingroup crosses. In addition, we observed higher levels of biparental (as opposed to uniparental) mitochondrial inheritance during VGII ingroup sexual mating in this species and provide evidence for mitochondrial recombination following mating. Taken together, our data suggest that hypervirulence can spread among the C. gattii lineages VGII and VGIII, potentially creating novel hypervirulent genotypes, and that current models of uniparental mitochondrial inheritance in the Cryptococcus genus may not be universal. How infections spread within the human population is an important question in forecasting potential epidemics. One way to investigate potential mechanisms is to test experimentally whether combinations of genes that confer high virulence are able to spread to less-virulent lineages. 
Here, we address this question in a fungal pathogen that is causing an outbreak of meningitis in healthy humans in Canada and the Pacific Northwest. We demonstrate that virulence traits are easily transmitted between closely related pathogenic strains, but are more difficult to transmit to more distant lineages. In addition, we show that a paradigm of organelle inheritance, namely that mitochondria are inherited uniparentally from the a mating type, is altered in the R265α outbreak strain such that it transmits its mitochondrial genome to 25–30% of its progeny. This biparental inheritance likely contributes to increased mitochondrial recombination. Taken together, our data suggest that virulence traits may be relatively mobile within this species and that current models of mitochondrial inheritance may require revising.
Affiliation(s)
- Kerstin Voelz
- Institute of Microbiology and Infection & School of Biosciences, University of Birmingham, Birmingham, United Kingdom
- The National Institute of Health Research Surgical Reconstruction and Microbiology Research Centre, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
- Hansong Ma
- Institute of Microbiology and Infection & School of Biosciences, University of Birmingham, Birmingham, United Kingdom
- Sujal Phadke
- Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States of America
- Edmond J. Byrnes
- Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States of America
- Pinkuan Zhu
- School of Biological Sciences, University of Missouri, Kansas City, Missouri, United States of America
- Olaf Mueller
- Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States of America
- Rhys A. Farrer
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Daniel A. Henk
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Yonathan Lewit
- Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States of America
- Yen-Ping Hsueh
- Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States of America
- Matthew C. Fisher
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Alexander Idnurm
- School of Biological Sciences, University of Missouri, Kansas City, Missouri, United States of America
- Joseph Heitman
- Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina, United States of America
- Robin C. May
- Institute of Microbiology and Infection & School of Biosciences, University of Birmingham, Birmingham, United Kingdom
- The National Institute of Health Research Surgical Reconstruction and Microbiology Research Centre, Queen Elizabeth Hospital Birmingham, Birmingham, United Kingdom
28
Chromosomal copy number variation, selection and uneven rates of recombination reveal cryptic genome diversity linked to pathogenicity. PLoS Genet 2013; 9:e1003703. [PMID: 23966879 PMCID: PMC3744429 DOI: 10.1371/journal.pgen.1003703] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Received: 12/20/2012] [Accepted: 06/21/2013] [Indexed: 11/19/2022]
Abstract
Pathogenic fungi constitute a growing threat to both plant and animal species on a global scale. Despite a clonal mode of reproduction dominating the population genetic structure of many fungi, putatively asexual species are known to adapt rapidly when confronted by efforts to control their growth and transmission. However, the mechanisms by which adaptive diversity is generated across a clonal background are often poorly understood. We sequenced a global panel of the emergent amphibian pathogen, Batrachochytrium dendrobatidis (Bd), to high depth and characterized rapidly changing features of its genome that we believe hold the key to the worldwide success of this organism. Our analyses show three processes that contribute to the generation of de novo diversity. Firstly, we show that the majority of wild isolates manifest chromosomal copy number variation that changes over short timescales. Secondly, we show that cryptic recombination occurs within all lineages of Bd, leading to large regions of the genome being in linkage equilibrium, and is preferentially associated with classes of genes of known importance for virulence in other pathosystems. Finally, we show that these classes of genes are under directional selection, and that this has predominantly targeted the Global Panzootic Lineage (BdGPL). Our analyses show that Bd manifests an unusually dynamic genome that may have been shaped by its association with the amphibian host. The rates of variation that we document likely explain the high levels of phenotypic variability that have been reported for Bd, and suggest that the dynamic genome of this pathogen has contributed to its success across multiple biomes and host-species. Pathogenic fungi constitute a growing threat to both plant and animal species on a global scale. However, many features of the fungal genome that enable them to successfully adapt to infect diverse hosts and ecological niches remain cryptic, especially for newly evolved emerging lineages.
In this paper, we report three novel features of genome diversity linked to pathogenicity in the emerging amphibian pathogen, Batrachochytrium dendrobatidis (Bd). Firstly, we identified widespread chromosome copy number variation (CCNV) across our lineages, with individual isolates harboring between 2 and 5 copies of each chromosome and rapid rates of CCNV occurring in culture. In addition, by using in vitro divergence of replicate lines of Bd, we showed that changes in ploidy can occur within as few as 40 generations. Secondly, we identified uneven rates of recombination across the genomes and lineages, revealing hot spots in known classes of virulence factors. Finally, we identified significant evidence of diversifying selection across the secretome of Bd, and showed that selection also targets putative virulence factors. These findings add to our knowledge of genome-dynamicity and modes of evolution manifested by eukaryote microbial pathogens, and may explain the varied phenotypic responses observed in Bd.
29
Population-Sequencing as a Biomarker for Sample Characterization. J Biomark 2013; 2013:861823. [PMID: 26317024 PMCID: PMC4437355 DOI: 10.1155/2013/861823] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/31/2013] [Accepted: 10/10/2013] [Indexed: 11/27/2022]
Abstract
Sequencing is accepted as the “gold” standard for genetic analysis and continues to be used as a validation and reference tool. The idea of using sequence analysis directly for sample characterization has been met with skepticism. However, herein, utility of direct use of sequencing to identify multiple genomes present in samples is presented and reviewed. All samples and “pure” isolates are populations of genomes. Population-Sequencing is the use of probabilistic matching tools in combination with large volumes of sequence information to identify genomes present, based on DNA analysis across entire genomes to determine genome assignments, to calculate confidence scores of major and minor genome content. Accurate genome identification from mixtures without culture purification steps can achieve phylogenetic classification by direct analysis of millions of DNA fragments. Genome sequencing data of mixtures can function as biomarkers for use to interrogate genetic content of samples and to establish a sample profile, inclusive of major and minor genome components, drill down to identify rare SNP and mutation events, compare relatedness of genetic content between samples, profile-to-profile, and provide a probabilistic or statistical scoring confidence for sample characterization and attribution. The application of Population-Sequencing will facilitate sample characterization and genome identification strategies.