1
What is new in FungiDB: a web-based bioinformatics platform for omics-scale data analysis for fungal and oomycete species. Genetics 2024; 227:iyae035. [PMID: 38529759 DOI: 10.1093/genetics/iyae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/15/2024] [Indexed: 03/27/2024]
Abstract
FungiDB (https://fungidb.org) serves as a valuable online resource that seamlessly integrates genomic and related large-scale data for a wide range of fungal and oomycete species. As an integral part of the VEuPathDB Bioinformatics Resource Center (https://veupathdb.org), FungiDB continually integrates both published and unpublished data addressing various aspects of fungal biology. Established in early 2011, the database has evolved to support 674 datasets. The datasets include over 300 genomes spanning various taxa (e.g. Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Mucoromycota, as well as Albuginales, Peronosporales, Pythiales, and Saprolegniales). In addition to genomic assemblies and annotation, over 300 extra datasets encompassing diverse information, such as expression and variation data, are also available. The resource also provides an intuitive web-based interface, facilitating comprehensive approaches to data mining and visualization. Users can test their hypotheses and navigate through omics-scale datasets using a built-in search strategy system. Moreover, FungiDB offers capabilities for private data analysis via the integrated VEuPathDB Galaxy platform. FungiDB also permits genome improvements by capturing expert knowledge through the User Comments system and the Apollo genome annotation editor for structural and functional gene curation. FungiDB facilitates data exploration and analysis and contributes to advancing research efforts by capturing expert knowledge for fungal and oomycete species.
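The search strategy system in the FungiDB abstract builds results step by step. As a minimal illustration, assume that each search step returns a set of gene IDs and that two steps can be combined by intersection, union or subtraction. The gene IDs, the `OPERATIONS` table and the `combine` helper are hypothetical and do not call any FungiDB or VEuPathDB service.

```python
from typing import Callable, Dict, Set

# Hypothetical result sets from two search steps (IDs are illustrative only).
upregulated: Set[str] = {"GENE_001", "GENE_002", "GENE_003", "GENE_004"}
secreted: Set[str] = {"GENE_002", "GENE_004", "GENE_005"}

# Assumed set semantics for combining two steps of a strategy.
OPERATIONS: Dict[str, Callable[[Set[str], Set[str]], Set[str]]] = {
    "intersect": lambda a, b: a & b,
    "union": lambda a, b: a | b,
    "minus": lambda a, b: a - b,
}


def combine(left: Set[str], right: Set[str], op: str) -> Set[str]:
    """Combine the gene ID sets of two search steps with the named operation."""
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op}")
    return OPERATIONS[op](left, right)


if __name__ == "__main__":
    # Genes that are both upregulated and predicted to be secreted.
    print(sorted(combine(upregulated, secreted, "intersect")))
```

Chaining further steps is just repeated calls to `combine` on the previous result, which mirrors how a multi-step strategy narrows or widens a gene list.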

2
In silico prediction of candidate gene targets for the management of African cassava whitefly (Bemisia tabaci, SSA1-SG1), a key vector of viruses causing cassava brown streak disease. PeerJ 2024; 12:e16949. [PMID: 38410806 PMCID: PMC10896082 DOI: 10.7717/peerj.16949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 01/24/2024] [Indexed: 02/28/2024]
Abstract
Whiteflies (Bemisia tabaci sensu lato) have a wide host range and are globally important agricultural pests. In Sub-Saharan Africa, they vector viruses that cause two ongoing disease epidemics: cassava brown streak disease and cassava mosaic virus disease. These two diseases threaten food security for more than 800 million people in Sub-Saharan Africa. Efforts are ongoing to identify target genes for the development of novel management options against the whitefly populations that vector these devastating viral diseases affecting cassava production in Sub-Saharan Africa. This study aimed to identify genes that mediate osmoregulation and symbiosis functions within cassava whitefly gut and bacteriocytes and evaluate their potential as key gene targets for novel whitefly control strategies. The gene expression profiles of dissected guts, bacteriocytes and whole bodies were compared by RNAseq analysis to identify genes with significantly enriched expression in the gut and bacteriocytes. Phylogenetic analyses identified three candidate osmoregulation gene targets: two α-glucosidases, SUC 1 and SUC 2 with predicted function in sugar transformations that reduce osmotic pressure in the gut; and a water-specific aquaporin (AQP1) mediating water cycling from the distal to the proximal end of the gut. Expression of the genes in the gut was enriched 23.67-, 26.54- and 22.30-fold, respectively. Genome-wide metabolic reconstruction coupled with constraint-based modeling revealed four genes (argH, lysA, BCAT & dapB) within the bacteriocytes as potential targets for the management of cassava whiteflies. These genes were selected based on their role and essentiality within the different essential amino acid biosynthesis pathways. 
A demonstration of candidate osmoregulation and symbiosis gene targets in other species of the Bemisia tabaci species complex that are orthologs of the empirically validated osmoregulation genes highlights the latter as promising gene targets for the control of cassava whitefly pests by in planta RNA interference.
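In its simplest form, a fold enrichment such as the 23.67-fold gut enrichment reported above is the ratio of normalised expression in one tissue to that in a reference sample. The sketch below assumes TPM-like normalised values, uses the whole body as the reference and adds a pseudocount to avoid division by zero. The expression values and the pseudocount are hypothetical, and the study's own figures come from its RNAseq analysis, whose normalisation and statistics are not reproduced here.

```python
import math
from typing import Dict


def fold_enrichment(tissue: float, reference: float, pseudocount: float = 1.0) -> float:
    """Ratio of tissue to reference expression; the pseudocount avoids division by zero."""
    return (tissue + pseudocount) / (reference + pseudocount)


# Hypothetical normalised expression values (e.g. TPM) for one gene.
expression: Dict[str, float] = {"gut": 479.0, "whole_body": 19.0}

fe = fold_enrichment(expression["gut"], expression["whole_body"])
log2_fc = math.log2(fe)
print(f"fold enrichment = {fe:.2f}, log2 = {log2_fc:.2f}")
```

Reporting the log2 value alongside the raw ratio is common because log2 fold changes are symmetric around zero for up- and down-regulation.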

3
Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023]
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.

4
VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center in 2023. Nucleic Acids Res 2024; 52:D808-D816. [PMID: 37953350 PMCID: PMC10767879 DOI: 10.1093/nar/gkad1003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/09/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023]
Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.
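OrthoMCL starts from all-against-all protein similarity searches and then clusters sequences with the Markov Cluster algorithm. A much simpler ingredient used by many orthology pipelines is the reciprocal best hit (RBH): genes a and b are paired when each is the other's highest-scoring match. The sketch below finds RBH pairs from precomputed similarity scores. It is not OrthoMCL, and the gene names and scores are hypothetical.

```python
from typing import Dict, List, Tuple

# scores[query][subject] = similarity score (e.g. a bit score); values are hypothetical.
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return the highest-scoring subject for every query that has at least one hit."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pair gene a with gene b when each is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


a_vs_b: Scores = {
    "A1": {"B1": 250.0, "B2": 90.0},
    "A2": {"B2": 180.0, "B1": 60.0},
    "A3": {"B2": 120.0},  # best hit is B2, but B2 prefers A2
}
b_vs_a: Scores = {
    "B1": {"A1": 248.0, "A2": 58.0},
    "B2": {"A2": 181.0, "A3": 119.0},
}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Gene A3 shows why reciprocity matters: its best hit is B2, but B2's best hit is A2, so A3 is left unpaired (in a full pipeline it might instead be treated as a paralog).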

5
VEuPathDB: the eukaryotic pathogen, vector and host bioinformatics resource center. Nucleic Acids Res 2022; 50:D898-D911. [PMID: 34718728 PMCID: PMC8728164 DOI: 10.1093/nar/gkab929] [Citation(s) in RCA: 186] [Impact Index Per Article: 93.0] [Received: 09/15/2020] [Revised: 09/21/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022]
Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and Galaxy workspace where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, and improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.

6
Ensembl Genomes 2020-enabling non-vertebrate genomic research. Nucleic Acids Res 2020; 48:D689-D695. [PMID: 31598706 PMCID: PMC6943047 DOI: 10.1093/nar/gkz890] [Citation(s) in RCA: 283] [Impact Index Per Article: 70.8] [Received: 09/14/2019] [Revised: 09/29/2019] [Accepted: 10/02/2019] [Indexed: 12/28/2022]
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.

7
Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biol 2020; 18:1. [PMID: 31898513 PMCID: PMC6939337 DOI: 10.1186/s12915-019-0728-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022]
Abstract
Background: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies.
Results: We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi.
Conclusions: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
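The core idea behind synteny-based adjacency prediction can be sketched in a few lines. If the gene at one end of a draft scaffold and the gene at an end of another scaffold have orthologs that are neighbours in a reference genome, the two scaffolds are predicted to be adjacent. This toy version uses hypothetical scaffolds, a hypothetical ortholog mapping and a single reference. It ignores orientation ambiguity, conflicting evidence between references and consensus building, all of which real methods must handle, and it is not any of the three methods evaluated in the study.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

End = Tuple[str, str]  # (scaffold name, "head" or "tail")


def scaffold_ends(scaffolds: Dict[str, List[str]]) -> Dict[End, str]:
    """Map each scaffold end to the gene sitting at that end."""
    ends: Dict[End, str] = {}
    for name, genes in scaffolds.items():
        ends[(name, "head")] = genes[0]
        ends[(name, "tail")] = genes[-1]
    return ends


def predict_adjacencies(
    scaffolds: Dict[str, List[str]],
    orthologs: Dict[str, str],
    reference_order: List[str],
) -> Set[Tuple[End, End]]:
    """Predict adjacencies whose end genes have neighbouring orthologs in the reference."""
    position = {gene: i for i, gene in enumerate(reference_order)}
    ends = scaffold_ends(scaffolds)
    adjacencies: Set[Tuple[End, End]] = set()
    for (end_a, gene_a), (end_b, gene_b) in combinations(ends.items(), 2):
        if end_a[0] == end_b[0]:
            continue  # both ends belong to the same scaffold
        ref_a, ref_b = orthologs.get(gene_a), orthologs.get(gene_b)
        if ref_a in position and ref_b in position and abs(position[ref_a] - position[ref_b]) == 1:
            pair = (end_a, end_b) if end_a < end_b else (end_b, end_a)
            adjacencies.add(pair)
    return adjacencies


# Hypothetical draft scaffolds (genes in order), ortholog mapping and reference gene order.
scaffolds = {"s1": ["g1", "g2"], "s2": ["g3", "g4"], "s3": ["g5"]}
orthologs = {"g1": "r1", "g2": "r2", "g3": "r3", "g4": "r4", "g5": "r9"}
reference_order = ["r1", "r2", "r3", "r4", "r5", "r9"]

print(predict_adjacencies(scaffolds, orthologs, reference_order))
```

Here the tail of `s1` (ortholog `r2`) and the head of `s2` (ortholog `r3`) are neighbours in the reference, so they form the only predicted adjacency.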

8
Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. Nucleic Acids Res 2018; 46:D802-D808. [PMID: 29092050 PMCID: PMC5753204 DOI: 10.1093/nar/gkx1011] [Citation(s) in RCA: 306] [Impact Index Per Article: 51.0] [Received: 09/18/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 02/06/2023]
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.

9
Ensembl Genomes 2016: more genomes, more complexity. Nucleic Acids Res 2015; 44:D574-80. [PMID: 26578574 PMCID: PMC4702859 DOI: 10.1093/nar/gkv1209] [Citation(s) in RCA: 431] [Impact Index Per Article: 47.9] [Received: 10/08/2015] [Accepted: 10/27/2015] [Indexed: 12/14/2022]
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.

10
Association mapping by pooled sequencing identifies TOLL 11 as a protective factor against Plasmodium falciparum in Anopheles gambiae. BMC Genomics 2015; 16:779. [PMID: 26462916 PMCID: PMC4603968 DOI: 10.1186/s12864-015-2009-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/23/2015] [Accepted: 10/03/2015] [Indexed: 11/16/2022]
Abstract
Background: The genome-wide association study (GWAS) techniques that have been used for genetic mapping in other organisms have not been successfully applied to mosquitoes, which have genetic characteristics of high nucleotide diversity, low linkage disequilibrium, and complex population stratification that render population-based GWAS essentially unfeasible at realistic sample size and marker density.
Methods: We designed a novel mapping strategy for the mosquito system that combines the power of linkage mapping with the resolution afforded by genetic association. We established founder colonies from West Africa, controlled for diversity, linkage disequilibrium and population stratification. Colonies were challenged by feeding on the infectious stage of the human malaria parasite, Plasmodium falciparum; mosquitoes were phenotyped for parasite load, and DNA pools for phenotypically similar mosquitoes were Illumina sequenced. Phenotype-genotype mapping was carried out in two stages, coarse and fine.
Results: In the first mapping stage, pooled sequences were analysed genome-wide for intervals displaying relative reduction in diversity between phenotype pools, and candidate genomic loci were identified for influence upon parasite infection levels. In the second mapping stage, focused genotyping of SNPs from the first mapping stage was carried out in unpooled individual mosquitoes and replicates. The second stage confirmed significant SNPs in a locus encoding two Toll-family proteins. RNAi-mediated gene silencing and infection challenge revealed that TOLL 11 protects mosquitoes against P. falciparum infection.
Conclusions: We present an efficient and cost-effective method for genetic mapping using natural variation segregating in defined recent Anopheles founder colonies, and demonstrate its applicability for mapping in a complex non-model genome. This approach is a practical and preferred alternative to population-based GWAS for first-pass mapping of phenotypes in Anopheles. This design should facilitate mapping of other traits involved in physiology, epidemiology, and behaviour.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2009-z) contains supplementary material, which is available to authorized users.
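The first mapping stage searched for intervals showing a relative reduction in diversity between phenotype pools. One simple way to express such a reduction is to average the expected heterozygosity 2p(1 - p) over the SNPs in a window for each pool and then take the ratio between pools. This assumes allele frequencies have already been estimated per SNP from the pooled reads. The pool names, frequencies and ratio statistic are illustrative assumptions, not the statistic used in the study.

```python
from statistics import mean
from typing import List


def window_heterozygosity(freqs: List[float]) -> float:
    """Mean expected heterozygosity 2p(1-p) over the SNPs in one window."""
    return mean(2 * p * (1 - p) for p in freqs)


def diversity_ratio(pool_a: List[float], pool_b: List[float]) -> float:
    """Window heterozygosity of pool A relative to pool B; values well below 1 flag reduced diversity in A."""
    return window_heterozygosity(pool_a) / window_heterozygosity(pool_b)


# Hypothetical allele frequencies at the same four SNPs in two phenotype pools.
low_infection_pool = [0.95, 0.90, 0.98, 0.92]
high_infection_pool = [0.50, 0.45, 0.60, 0.55]

ratio = diversity_ratio(low_infection_pool, high_infection_pool)
print(f"diversity ratio = {ratio:.3f}")
```

In practice the ratio would be computed in sliding windows along the genome and outlying windows carried forward as candidate loci. Pool-seq frequency estimates also need corrections for sequencing depth and pool size, and this sketch omits them.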

11
VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res 2014; 43:D707-13. [PMID: 25510499 PMCID: PMC4383932 DOI: 10.1093/nar/gku1117] [Citation(s) in RCA: 433] [Impact Index Per Article: 43.3] [Indexed: 11/24/2022]
Abstract
VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/.

12
Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science 2014; 347:1258522. [PMID: 25554792 DOI: 10.1126/science.1258522] [Citation(s) in RCA: 362] [Impact Index Per Article: 36.2] [Indexed: 11/02/2022]
Abstract
Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts.

13
Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi. Genome Biol 2014; 15:459. [PMID: 25244985 PMCID: PMC4195908 DOI: 10.1186/s13059-014-0459-2] [Citation(s) in RCA: 83] [Impact Index Per Article: 8.3] [Received: 04/27/2014] [Accepted: 09/03/2014] [Indexed: 12/24/2022]
Abstract
Background: Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range.
Results: Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism.
Conclusions: The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-014-0459-2) contains supplementary material, which is available to authorized users.

14
Adaptive introgression between Anopheles sibling species eliminates a major genomic island but not reproductive isolation. Nat Commun 2014; 5:4248. [PMID: 24963649 PMCID: PMC4086683 DOI: 10.1038/ncomms5248] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Received: 02/22/2014] [Accepted: 05/28/2014] [Indexed: 11/16/2022]
Abstract
Adaptive introgression can provide novel genetic variation to fuel rapid evolutionary responses, though it may be counterbalanced by potential for detrimental disruption of the recipient genomic background. We examine the extent and impact of recent introgression of a strongly selected insecticide-resistance mutation (Vgsc-1014F) located within one of two exceptionally large genomic islands of divergence separating the Anopheles gambiae species pair. Here we show that transfer of the Vgsc mutation results in homogenization of the entire genomic island region (~1.5% of the genome) between species. Despite this massive disruption, introgression is clearly adaptive with a dramatic rise in frequency of Vgsc-1014F and no discernable impact on subsequent reproductive isolation between species. Our results show (1) how resilience of genomes to massive introgression can permit rapid adaptive response to anthropogenic selection and (2) that even extreme prominence of genomic islands of divergence can be an unreliable indicator of importance in speciation.

Highly divergent genomic islands segregate between a species pair of the mosquito, Anopheles gambiae. Here Clarkson et al. show that loss of one of the largest islands, driven by adaptive introgression of an insecticide-resistance mutation, had no impact on reproductive isolation.

15
Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Res 2014; 42:D546-52. [PMID: 24163254 PMCID: PMC3965094 DOI: 10.1093/nar/gkt979] [Citation(s) in RCA: 180] [Impact Index Per Article: 18.0] [Received: 09/13/2013] [Accepted: 10/01/2013] [Indexed: 12/20/2022]
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.

16
Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet 2013; 45:648-55. [PMID: 23624527 PMCID: PMC3807790 DOI: 10.1038/ng.2624] [Citation(s) in RCA: 340] [Impact Index Per Article: 30.9] [Received: 11/30/2012] [Accepted: 04/04/2013] [Indexed: 11/09/2022]
Abstract
We describe an analysis of genome variation in 825 P. falciparum samples from Asia and Africa that identifies an unusual pattern of parasite population structure at the epicenter of artemisinin resistance in western Cambodia. Within this relatively small geographic area, we have discovered several distinct but apparently sympatric parasite subpopulations with extremely high levels of genetic differentiation. Of particular interest are three subpopulations, all associated with clinical resistance to artemisinin, which have skewed allele frequency spectra and high levels of haplotype homozygosity, indicative of founder effects and recent population expansion. We provide a catalog of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations, including codon variants in transporter proteins and DNA mismatch repair proteins. These data provide a population-level genetic framework for investigating the biological origins of artemisinin resistance and for defining molecular markers to assist in its elimination.
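Genetic differentiation between subpopulations, as described above, is commonly quantified per SNP with an F_ST statistic. The sketch below uses the simple (H_T - H_S)/H_T form, with the subpopulations weighted equally and no sample-size correction. The allele frequencies are hypothetical, and the estimator used in the study may differ (sample-size-corrected estimators such as Weir and Cockerham's are widely used).

```python
from typing import Sequence


def expected_het(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic SNP."""
    return 2 * p * (1 - p)


def fst(freqs: Sequence[float]) -> float:
    """Per-SNP F_ST = (H_T - H_S) / H_T with subpopulations weighted equally."""
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)  # mean within-subpopulation heterozygosity
    p_bar = sum(freqs) / len(freqs)                          # pooled allele frequency
    h_t = expected_het(p_bar)                                # total heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t           # monomorphic SNPs get 0


# Hypothetical alternate-allele frequencies in two subpopulations.
print(f"{fst([0.9, 0.1]):.3f}")  # strongly differentiated SNP
print(f"{fst([0.5, 0.5]):.3f}")  # undifferentiated SNP
```

A catalogue of highly differentiated SNPs can then be drawn from the upper tail of the per-SNP F_ST distribution.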

17
Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 2012; 487:375-9. [PMID: 22722859 PMCID: PMC3738909 DOI: 10.1038/nature11174] [Citation(s) in RCA: 384] [Impact Index Per Article: 32.0] [Received: 12/24/2010] [Accepted: 04/30/2012] [Indexed: 02/02/2023]
Abstract
Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance. Here we describe methods for the large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic single nucleotide polymorphisms that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for the exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.
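The within-host diversity metric described above compares the diversity of an individual infection with that of the local parasite population. A minimal sketch of a metric of this kind, often written F_WS = 1 - H_W/H_S, is shown below. It assumes that within-sample allele frequencies (estimated from read counts) and population allele frequencies are already available for matched SNPs. The plain averaging over SNPs is a simplification, and all data are hypothetical.

```python
from typing import List


def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic SNP."""
    return 2 * p * (1 - p)


def f_ws(within_host: List[float], population: List[float]) -> float:
    """F_WS = 1 - H_W / H_S, with heterozygosity averaged over matched SNPs.

    within_host: non-reference allele frequencies inside one infection.
    population:  non-reference allele frequencies in the local parasite population.
    """
    if len(within_host) != len(population):
        raise ValueError("SNP lists must be the same length")
    h_w = sum(heterozygosity(p) for p in within_host) / len(within_host)
    h_s = sum(heterozygosity(p) for p in population) / len(population)
    return 1 - h_w / h_s


population_freqs = [0.3, 0.5, 0.2, 0.4]
clonal_infection = [0.0, 1.0, 0.0, 1.0]  # a single genotype: no within-host diversity
mixed_infection = [0.3, 0.5, 0.2, 0.4]   # mirrors the population's diversity

print(f_ws(clonal_infection, population_freqs), f_ws(mixed_infection, population_freqs))
```

Values near 1 indicate a near-clonal infection, typical of inbred, low-transmission settings; values near 0 indicate that the infection is about as diverse as the population.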
18
An In-Solution Hybridisation Method for the Isolation of Pathogen DNA from Human DNA-rich Clinical Samples for Analysis by NGS. ACTA ACUST UNITED AC 2012; 5. [PMID: 24273626 DOI: 10.2174/1875693x01205010018] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Studies on DNA from pathogenic organisms within clinical samples are often complicated by the presence of large amounts of host (e.g., human) DNA. Isolation of pathogen DNA from these samples would improve the efficiency of next-generation sequencing (NGS) and pathogen identification. Here we describe a solution-based hybridisation method for isolation of pathogen DNA from a mixed population. This straightforward and inexpensive technique uses probes made from whole-genome DNA and off-the-shelf reagents. In this study, Escherichia coli DNA was successfully enriched from a mixture of E. coli and human DNA. After enrichment, genome coverage following NGS was significantly higher, and the evenness of coverage and GC content were unaffected. This technique was also applied to samples containing a mixture of human and Plasmodium falciparum DNA. The P. falciparum genome is particularly difficult to sequence due to its high AT content (80.6%) and repetitive nature. Post-enrichment, a bias in the recovered DNA was observed, with poorer representation of the AT-rich non-coding regions. This uneven coverage was also observed in pre-enrichment samples, but to a lesser degree. Despite the coverage bias in enriched samples, SNP (single-nucleotide polymorphism) calling in coding regions was unaffected, and the majority of samples had over 90% of their coding region covered at 5× depth. This technique shows significant promise as an effective method to enrich pathogen DNA from samples with heavy human contamination, particularly when applied to GC-neutral genomes.
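The abstract quotes an AT content of 80.6% for P. falciparum and describes poorer coverage of AT-rich regions. As a minimal sketch of how such figures are typically computed, the code below counts A and T over unambiguous bases and also gives a sliding-window profile that could be compared against coverage. Ignoring N and other IUPAC ambiguity codes is an assumption of this sketch, and the function names, window and step values are hypothetical, not taken from the study.

```python
def at_content(seq: str) -> float:
    """Fraction of A/T among unambiguous bases (A, C, G, T), case-insensitive.

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return at / (at + gc)


def sliding_at_content(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """AT fraction per window as (0-based start, fraction) pairs.

    Windows containing no A/C/G/T at all (e.g. runs of N) are skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window].upper()
        if any(base in "ACGT" for base in chunk):
            out.append((start, at_content(chunk)))
    return out


if __name__ == "__main__":
    print(at_content("ATATATGCNN"))  # prints 0.75
    print(sliding_at_content("AAAAGGGG", window=4, step=2))
    # prints [(0, 1.0), (2, 0.5), (4, 0.0)]
```

Excluding ambiguous bases from the denominator keeps assembly gaps from artificially lowering the AT fraction; counting over the full length instead would be an equally valid convention if stated.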
19
Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data. PLoS One 2012; 7:e32891. [PMID: 22393456 PMCID: PMC3290604 DOI: 10.1371/journal.pone.0032891] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Received: 01/11/2012] [Accepted: 02/07/2012] [Indexed: 11/19/2022]
Abstract
The composition of multi-clonal malarial infections, and the epidemiological factors that shape their diversity, remain poorly understood. Traditionally, within-host diversity has been defined in terms of the multiplicity of infection (MOI) derived by PCR-based genotyping. Massively parallel, single-molecule sequencing technologies now enable individual read counts to be derived on genome-wide datasets, facilitating the development of new statistical approaches to describe within-host diversity. In this class of measures, the FWS metric characterizes within-host diversity and its relationship to population-level diversity. Using P. falciparum field isolates from patients in West Africa, we explore the relationship between the traditional MOI and FWS approaches. FWS statistics were derived from read count data at 86,158 SNPs in 64 samples sequenced on the Illumina GA platform. MOI estimates were derived by PCR at the msp-1 and msp-2 loci. Significant correlations were observed between the two measures, particularly with the msp-1 locus (P = 5.92 × 10⁻⁵). The FWS metric should be more robust than the PCR-based approach owing to reduced sensitivity to potential locus-specific artifacts. Furthermore, the FWS metric captures information on a range of parameters that influence out-crossing risk, including the number of clones (MOI), their relative proportions and their genetic divergence. This approach should provide novel insights into the factors that correlate with, and shape, within-host diversity.
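The FWS metric above relates within-host heterozygosity (H_W, from a sample's read counts) to population heterozygosity (H_S) at the same SNPs, in the form `FWS = 1 - H_W / H_S`. Published descriptions estimate the ratio across many SNPs, typically after binning SNPs by population allele frequency. The sketch below is a simplified assumption, not the authors' method: it skips the binning and takes the ratio as the least-squares slope of H_W on H_S through the origin over individual SNPs. Function names and toy counts are hypothetical.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Heterozygosity at a biallelic site with allele frequency p: 1 - (p^2 + q^2)."""
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def fws(ref_counts: Sequence[int], alt_counts: Sequence[int],
        pop_alt_freqs: Sequence[float]) -> float:
    """Simplified FWS = 1 - slope of H_W regressed on H_S through the origin.

    H_W uses the within-sample allele frequency from read counts; H_S uses the
    population allele frequency at the same SNP. No frequency binning is done.
    """
    if not (len(ref_counts) == len(alt_counts) == len(pop_alt_freqs)):
        raise ValueError("inputs must have the same length")
    num = 0.0  # running sum of H_S * H_W
    den = 0.0  # running sum of H_S ** 2
    for ref, alt, p_pop in zip(ref_counts, alt_counts, pop_alt_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # no reads: within-sample frequency is undefined
        h_w = expected_heterozygosity(alt / depth)
        h_s = expected_heterozygosity(p_pop)
        num += h_s * h_w
        den += h_s * h_s
    if den == 0.0:
        raise ValueError("no SNP is polymorphic in the population")
    return 1.0 - num / den


if __name__ == "__main__":
    # Hypothetical read counts at three SNPs, with population allele frequencies.
    pop = [0.5, 0.3, 0.2]
    clonal = fws([10, 0, 5], [0, 8, 0], pop)  # every SNP fixed within the sample
    mixed = fws([6, 7, 9], [4, 3, 1], pop)    # within-sample mixtures of alleles
    print(clonal)  # prints 1.0
    print(round(mixed, 3))
```

Values near 1 indicate an essentially clonal infection; lower values indicate that the sample carries more of the population's diversity, as expected for a multi-clonal infection.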
20
Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes. BMC Genomics 2012; 13:1. [PMID: 22214261 PMCID: PMC3312816 DOI: 10.1186/1471-2164-13-1] [Citation(s) in RCA: 268] [Impact Index Per Article: 22.3] [Received: 08/26/2011] [Accepted: 01/03/2012] [Indexed: 01/08/2023]
Abstract
BACKGROUND Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge for the currently available NGS platforms. The genomes of some important pathogenic organisms, like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content), display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low-quantity starting material and tolerant of sequences with extremely high AT content. RESULTS We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of the sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates. CONCLUSION We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of both extremes of base composition. This development will greatly benefit the sequencing of clinical samples, which often require amplification due to the low mass of DNA starting material.
21
An optimized microarray platform for assaying genomic variation in Plasmodium falciparum field populations. Genome Biol 2011; 12:R35. [PMID: 21477297 PMCID: PMC3218861 DOI: 10.1186/gb-2011-12-4-r35] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 01/06/2011] [Revised: 03/04/2011] [Accepted: 04/08/2011] [Indexed: 11/13/2022]
Abstract
We present an optimized probe design for copy number variation (CNV) and SNP genotyping in the Plasmodium falciparum genome. We demonstrate that variable length and isothermal probes are superior to static length probes. We show that sample preparation and hybridization conditions mitigate the effects of host DNA contamination in field samples. The microarray and workflow presented can be used to identify CNVs and SNPs with 95% accuracy in a single hybridization, in field samples containing up to 92% human DNA contamination.
22
Abstract
Summary: Array-based comparative genomic hybridization (CGH) technology is used to discover and validate genomic structural variation, including copy number variants, insertions, deletions and other structural variants (SVs). The visualization and summarization of array CGH data outputs, potentially across many samples, is an important process in the identification and analysis of SVs. We have developed a software tool for SV analysis using data from array CGH technologies, which is also amenable to short-read sequence data. Availability and implementation: SnoopCGH is written in Java and is available from http://snoopcgh.sourceforge.net/ Contact: jg10@sanger.ac.uk; tc5@sanger.ac.uk
23
Integrated outcrop, drill core, borehole and seismic stratigraphic architecture of a cyclothemic, shallow-marine depositional system, Wanganui Basin, New Zealand. J R Soc N Z 2005. [DOI: 10.1080/03014223.2005.9517778] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/19/2022]
24
Towards high resolution maps of the mouse and human genomes—a facility for ordering markers to 0.1 cM resolution. Hum Mol Genet 1994. [DOI: 10.1093/hmg/3.4.621] [Citation(s) in RCA: 130] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022]
25
Partial sequence data from three evolutionarily conserved loci from the proximal short arm of the human X chromosome; assignment of DXF34S1 to Xp11.21-cen. Cytogenet Cell Genet 1993; 62:153-5. [PMID: 8428516 DOI: 10.1159/000133460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 01/30/2023]
Abstract
DNA sequence data have been obtained from three clones derived from the human X chromosome that contain evolutionarily conserved sequences. Primers have been designed that enable these loci to be defined as sequence-tagged sites (STSs). The assignment of one of the loci, DXF34S1, has been refined to Xp11.21-cen, thus limiting the novel pericentromeric segment of homology defined by this locus to the extreme proximal region of Xp.
26
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 13 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
27
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 26 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
28
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 6 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
29
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 18 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
30
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 2 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
31
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 15 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
32
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 14 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
33
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 20 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
34
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 22 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
35
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 11 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
36
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 16 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
37
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 23 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
38
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 3 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317215] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/19/2022]
39
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 12 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
40
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 8 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
41
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 5 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
42
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 24 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/19/2022]
43
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 7 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
44
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 9 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
45
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 17 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
46
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms. Cytogenet Genome Res 1991. [DOI: 10.1159/000133727] [Citation(s) in RCA: 69] [Impact Index Per Article: 2.1] [Indexed: 11/19/2022]
47
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 19 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
48
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 10 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
49
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 21 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
50
Report of the DNA committee and catalogues of cloned and mapped genes, markers formatted for PCR and DNA polymorphisms (Part 25 of 27). Cytogenet Genome Res 1991. [DOI: 10.1159/000317238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]