251
|
Soares SC, Trost E, Ramos RTJ, Carneiro AR, Santos AR, Pinto AC, Barbosa E, Aburjaile F, Ali A, Diniz CAA, Hassan SS, Fiaux K, Guimarães LC, Bakhtiar SM, Pereira U, Almeida SS, Abreu VAC, Rocha FS, Dorella FA, Miyoshi A, Silva A, Azevedo V, Tauch A. Genome sequence of Corynebacterium pseudotuberculosis biovar equi strain 258 and prediction of antigenic targets to improve biotechnological vaccine production. J Biotechnol 2012. [PMID: 23201561 DOI: 10.1016/j.jbiotec.2012.11.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 02/03/2023]
Abstract
Corynebacterium pseudotuberculosis is the causative agent of several veterinary diseases in a broad range of economically important hosts, which can vary from caseous lymphadenitis in sheep and goats (biovar ovis) to ulcerative lymphangitis in cattle and horses (biovar equi). Existing vaccines against C. pseudotuberculosis are mainly intended for small ruminants and, even in these hosts, they still present remarkable limitations. In this study, we present the complete genome sequence of C. pseudotuberculosis biovar equi strain 258, isolated from a horse with ulcerative lymphangitis. The genome has a total size of 2,314,404 bp and contains 2088 predicted protein-coding regions. Using in silico analysis, eleven pathogenicity islands were detected in the genome sequence of C. pseudotuberculosis 258. The application of a reverse vaccinology strategy identified 49 putative antigenic proteins, which can be used as candidate vaccine targets in future works.
Affiliation(s)
- Siomar C Soares
- CLIB Graduate Cluster Industrial Biotechnology, Centrum für Biotechnologie, Universität Bielefeld, 33615 Bielefeld, Germany.
|
252
|
Etienne KA, Gillece J, Hilsabeck R, Schupp JM, Colman R, Lockhart SR, Gade L, Thompson EH, Sutton DA, Neblett-Fanfair R, Park BJ, Turabelidze G, Keim P, Brandt ME, Deak E, Engelthaler DM. Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011. PLoS One 2012; 7:e49989. [PMID: 23209631 PMCID: PMC3507928 DOI: 10.1371/journal.pone.0049989] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 05/22/2012] [Accepted: 10/19/2012] [Indexed: 11/19/2022]
Abstract
Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011 a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces.
Affiliation(s)
- Kizee A Etienne
- Centers for Disease Control and Prevention, Atlanta, GA, USA.
|
253
|
Dubois A, Carrere S, Raymond O, Pouvreau B, Cottret L, Roccia A, Onesto JP, Sakr S, Atanassova R, Baudino S, Foucher F, Le Bris M, Gouzy J, Bendahmane M. Transcriptome database resource and gene expression atlas for the rose. BMC Genomics 2012; 13:638. [PMID: 23164410 PMCID: PMC3518227 DOI: 10.1186/1471-2164-13-638] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 07/25/2012] [Accepted: 11/06/2012] [Indexed: 01/23/2023]
Abstract
BACKGROUND For centuries roses have been selected based on a number of traits. Little information exists on the genetic and molecular basis that contributes to these traits, mainly because information on expressed genes for this economically important ornamental plant is scarce. RESULTS Here, we used a combination of Illumina and 454 sequencing technologies to generate information on Rosa sp. transcripts using RNA from various tissues and in response to biotic and abiotic stresses. A total of 80714 transcript clusters were identified and 76611 peptides were predicted, of which 20997 were clustered into 13900 protein families. BLASTp hits in closely related Rosaceae species revealed that about half of the predicted peptides in the strawberry and peach genomes have orthologs in the Rosa dataset. Digital expression was obtained using RNA samples from organs at different developmental stages and under different stress conditions. qPCR validated the digital expression data for a selection of 23 genes with high or low expression levels. Comparative gene expression analyses between the different tissues and organs allowed the identification of clusters that are highly enriched in given tissues or under particular conditions, demonstrating the usefulness of the digital gene expression analysis. A web interface, ROSAseq, was created that allows data interrogation by BLAST, subsequent analysis of DNA clusters and access to thorough transcript annotation including best BLAST matches on Fragaria vesca, Prunus persica and Arabidopsis. The rose peptides dataset was used to create the ROSAcyc resource pathway database that allows access to the putative genes and enzymatic pathways. CONCLUSIONS The study provides useful information on Rosa expressed genes, with thorough annotation and an overview of expression patterns for transcripts with good accuracy.
Affiliation(s)
- Annick Dubois
- Reproduction et Développement des Plantes UMR INRA-CNRS- Université Lyon 1-ENSL, Ecole Normale Supérieure, 46 allée d'Italie, Lyon Cedex 07 69364, France
|
254
|
Human disease isolates of serotype M4 and M22 group A streptococcus lack genes required for hyaluronic acid capsule biosynthesis. mBio 2012; 3:e00413-12. [PMID: 23131832 PMCID: PMC3487777 DOI: 10.1128/mbio.00413-12] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 11/20/2022]
Abstract
Group A streptococcus (GAS) causes human pharyngitis and invasive infections and frequently colonizes individuals asymptomatically. Many lines of evidence generated over decades have shown that the hyaluronic acid capsule is a major virulence factor contributing to these infections. While conducting a whole-genome analysis of the in vivo molecular genetic changes that occur in GAS during longitudinal human pharyngeal interaction, we discovered that serotypes M4 and M22 GAS strains lack the hasABC genes necessary for hyaluronic acid capsule biosynthesis. Using targeted PCR, we found that all 491 temporally and geographically diverse disease isolates of these two serotypes studied lack the hasABC genes. Consistent with the lack of capsule synthesis genes, none of the strains produced detectable hyaluronic acid. Despite the lack of a hyaluronic acid capsule, all strains tested multiplied extensively ex vivo in human blood. Thus, counter to the prevailing concept in GAS pathogenesis research, strains of these two serotypes do not require hyaluronic acid to colonize the upper respiratory tract or cause abundant mucosal or invasive human infections. We speculate that serotype M4 and M22 GAS have alternative, compensatory mechanisms that promote virulence. A century of study of the antiphagocytic hyaluronic acid capsule made by group A streptococcus has led to the concept that it is a major virulence factor contributing to human pharyngeal and invasive infections. However, the discovery that some strains that cause abundant human infections lack hyaluronic acid biosynthetic genes and fail to produce this capsule provides a new stimulus for research designed to understand the group A streptococcus factors contributing to pharyngeal infection and invasive disease episodes.
|
255
|
Gullapalli RR, Desai KV, Santana-Santos L, Kant JA, Becich MJ. Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics. J Pathol Inform 2012; 3:40. [PMID: 23248761 PMCID: PMC3519097 DOI: 10.4103/2153-3539.103013] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 03/15/2012] [Accepted: 07/19/2012] [Indexed: 11/25/2022]
Abstract
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized.[7] We also review the issue of physician education which also is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS sequencing are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Affiliation(s)
- Rama R Gullapalli
- Department of Pathology, University of Pittsburgh Medical Centre, A701, Scaife Hall, 3550 Terrace Street, Pittsburgh, PA
|
256
|
|
257
|
Ramos RTJ, Carneiro AR, Azevedo V, Schneider MP, Barh D, Silva A. Simplifier: a web tool to eliminate redundant NGS contigs. Bioinformation 2012; 8:996-9. [PMID: 23275695 PMCID: PMC3524941 DOI: 10.6026/97320630008996] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/22/2012] [Accepted: 08/28/2012] [Indexed: 01/31/2023]
Abstract
Modern genomic sequencing technologies produce a large amount of data with reduced cost per base; however, this data consists of short reads. This reduction in the size of the reads, compared to those obtained with previous methodologies, presents new challenges, including a need for efficient algorithms for the assembly of genomes from short reads and for resolving repetitions. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, stand-alone software that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Application of Simplifier to data generated by assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library by 17.47% and the fragment library by 23.91%. Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required for finalization of genome assembly in tests with data from prokaryotic organisms. AVAILABILITY Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun JDK 6 or higher.
Affiliation(s)
- Vasco Azevedo
- Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, WB-721172, India
- Artur Silva
- Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil
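The Simplifier abstract above reports two quantities: the number of redundant contigs removed and the change in N50. As a rough illustration only, the Python sketch below drops contigs that occur verbatim (on either strand) inside a longer contig and computes N50. The abstract does not describe Simplifier's actual algorithm, so the containment rule, the function names and the toy contigs are all assumptions made for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def remove_contained(contigs):
    """Drop contigs found verbatim, on either strand, inside a longer kept contig.

    Hypothetical redundancy rule for illustration; not Simplifier's method.
    """
    kept = []
    # Visit longest first so a containing contig is kept before its substrings.
    for contig in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept


def n50(contigs):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


# Toy data (made up): one exact duplicate, one forward-strand substring,
# and one reverse-strand substring of the longest contig.
contigs = ["ACGTACGTGG", "CGTACG", "CCACGT", "TTTTGGGA", "ACGTACGTGG"]
filtered = remove_contained(contigs)
reduction_pct = 100 * (len(contigs) - len(filtered)) / len(contigs)
```

On real assemblies a quadratic substring scan like this would be far too slow; an index-based or alignment-based comparison would be needed.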
|
258
|
Casseb SMM, Cardoso JF, Ramos R, Carneiro A, Nunes M, Vasconcelos PFC, Silva A. Optimization of dengue virus genome assembling using GSFLX 454 pyrosequencing data: evaluation of assembling strategies. GENETICS AND MOLECULAR RESEARCH 2012; 11:3688-95. [PMID: 22930429 DOI: 10.4238/2012.august.17.6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/03/2022]
Abstract
Assembling genomes without a reference is currently one of the most important challenges for bioinformaticians worldwide in the effort to characterize new organisms. The current study used two dengue virus type 4 (DENV-4) strains recently isolated in Brazil, whose genomes were sequenced by pyrosequencing on the GSFLX 454 sequencer (Roche, Life Science). The GSFLX 454 data were used to test different genome assembly strategies. We describe a pipeline that recovered more than 96% of the sequenced genome in a single run and could be helpful for further assembly attempts of other DENV genomes, as well as other RNA virus-like genomes.
Affiliation(s)
- S M M Casseb
- Departamento de Arbovirologia e Febres Hemorrágicas, Instituto Evandro Chagas, Ananindeua, PA, Brasil.
|
259
|
Reddy JS, Kumar R, Watt JM, Lawrence ML, Burgess SC, Nanduri B. Transcriptome profile of a bovine respiratory disease pathogen: Mannheimia haemolytica PHL213. BMC Bioinformatics 2012; 13 Suppl 15:S4. [PMID: 23046475 PMCID: PMC3439734 DOI: 10.1186/1471-2105-13-s15-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
Abstract
Background Computational methods for structural gene annotation have propelled gene discovery but face certain drawbacks with regards to prokaryotic genome annotation. Identification of transcriptional start sites, demarcating overlapping gene boundaries, and identifying regulatory elements such as small RNA are not accurate using these approaches. In this study, we re-visit the structural annotation of Mannheimia haemolytica PHL213, a bovine respiratory disease pathogen. M. haemolytica is one of the causative agents of bovine respiratory disease that results in about $3 billion annual losses to the cattle industry. We used RNA-Seq and analyzed the data using freely-available computational methods and resources. The aim was to identify previously unannotated regions of the genome using RNA-Seq based expression profile to complement the existing annotation of this pathogen. Results Using the Illumina Genome Analyzer, we generated 9,055,826 reads (average length ~76 bp) and aligned them to the reference genome using Bowtie. The transcribed regions were analyzed using SAMTOOLS and custom Perl scripts in conjunction with BLAST searches and available gene annotation information. The single nucleotide resolution map enabled the identification of 14 novel protein coding regions as well as 44 potential novel sRNA. The basal transcription profile revealed that 2,506 of the 2,837 annotated regions were expressed in vitro, at 95.25% coverage, representing all broad functional gene categories in the genome. The expression profile also helped identify 518 potential operon structures involving 1,086 co-expressed pairs. We also identified 11 proteins with mutated/alternate start codons. Conclusions The application of RNA-Seq based transcriptome profiling to structural gene annotation helped correct existing annotation errors and identify potential novel protein coding regions and sRNA. 
We used computational tools to predict regulatory elements such as promoters and terminators associated with the novel expressed regions for further characterization of these novel functional elements. Our study complements the existing structural annotation of Mannheimia haemolytica PHL213 based on experimental evidence. Given the role of sRNA in virulence gene regulation and stress response, potential novel sRNA described in this study can form the framework for future studies to determine the role of sRNA, if any, in M. haemolytica pathogenesis.
Affiliation(s)
- Joseph S Reddy
- College of Veterinary Medicine, Mississippi State University, Mississippi State, MS 39762, USA
|
260
|
Genotypic and phenotypic evaluation of the evolution of high-level daptomycin nonsusceptibility in vancomycin-resistant Enterococcus faecium. Antimicrob Agents Chemother 2012; 56:6051-3. [PMID: 22948885 DOI: 10.1128/aac.01318-12] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.9] [Indexed: 01/11/2023]
Abstract
Whole-genome sequencing and cell membrane studies of three clonal Enterococcus faecium strains with daptomycin MICs of 4, 32, and 192 μg/ml were performed, revealing nonsynonymous single nucleotide variants in eight open reading frames, including those predicted to encode a phosphoenolpyruvate-dependent, mannose-specific phosphotransferase system, cardiolipin synthetase, and EzrA. Membrane studies revealed a higher net surface charge among the daptomycin-nonsusceptible isolates and increased septum formation in the isolate with a daptomycin MIC of 192 μg/ml.
|
261
|
Rodrigues FA, Marcolino-Gomes J, de Fátima Corrêa Carvalho J, do Nascimento LC, Neumaier N, Farias JRB, Carazzolle MF, Marcelino FC, Nepomuceno AL. Subtractive libraries for prospecting differentially expressed genes in the soybean under water deficit. Genet Mol Biol 2012; 35:304-14. [PMID: 22802715 PMCID: PMC3392882 DOI: 10.1590/s1415-47572012000200011] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 12/30/2022]
Abstract
Soybean has a wide range of applications in the industry and, due to its crop potential, its improvement is widely desirable. During drought conditions, soybean crops suffer significant losses in productivity. Therefore, understanding the responses of the soybean under this stress is an effective way of targeting crop improvement techniques. In this study, we employed the Suppressive Subtractive Hybridization (SSH) technique to investigate differentially expressed genes under water deficit conditions. Embrapa 48 and BR 16 soybean lines, known as drought-tolerant and -sensitive, respectively, were grown hydroponically and subjected to different short-term periods of stress by withholding the nutrient solution. Using this approach, we have identified genes expressed during the early response to water deficit in roots and leaves. These genes were compared among the lines to assess probable differences in the plant transcriptomes. In general, similar biochemical processes were predominant in both cultivars; however, there were more considerable differences between roots and leaves of Embrapa 48. Moreover, we present here a fast, clean and straightforward method to obtain drought-stressed root tissues and a large enriched collection of transcripts expressed by soybean plants under water deficit that can be useful for further studies towards the understanding of plant responses to stress.
|
262
|
Complete genome sequence of Corynebacterium pseudotuberculosis strain Cp267, isolated from a llama. J Bacteriol 2012; 194:3567-8. [PMID: 22689248 DOI: 10.1128/jb.00461-12] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 12/13/2022]
Abstract
In this work we report the genome of Corynebacterium pseudotuberculosis strain 267, isolated from a llama. This pathogen is of great veterinary and economic importance, as it is the cause of caseous lymphadenitis in several livestock species around the world and causes significant losses due to the high cost of treatment.
|
263
|
Vyverman M, De Baets B, Fack V, Dawyndt P. Prospects and limitations of full-text index structures in genome analysis. Nucleic Acids Res 2012; 40:6993-7015. [PMID: 22584621 PMCID: PMC3424560 DOI: 10.1093/nar/gks408] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 01/30/2012] [Revised: 04/16/2012] [Accepted: 04/19/2012] [Indexed: 11/21/2022]
Abstract
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
Affiliation(s)
- Michaël Vyverman
- Department of Applied Mathematics and Computer Science, Ghent University, Building S9, 281 Krijgslaan, Belgium.
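To make the notion of a full-text index structure from the abstract above concrete, here is a minimal Python sketch of one of the simplest members of that family, a suffix array, queried by binary search. It is illustrative only: the construction is the naive sort-all-suffixes approach rather than any of the space- or time-efficient variants the review surveys, and the function names and example text are assumptions.

```python
def build_suffix_array(text):
    """Sort suffix start positions lexicographically (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of `pattern`, found by binary search on `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "GATTACATTA"  # toy sequence, made up for the example
sa = build_suffix_array(text)
hits = find_occurrences(text, sa, "ATTA")
```

Once the array is built, each query needs only a logarithmic number of string comparisons, which is the kind of memory-time trade-off the review compares across index structures.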
|
264
|
Borić M, Danevčič T, Stopar D. Viscosity dictates metabolic activity of Vibrio ruber. Front Microbiol 2012; 3:255. [PMID: 22826705 PMCID: PMC3399222 DOI: 10.3389/fmicb.2012.00255] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/26/2012] [Accepted: 06/29/2012] [Indexed: 11/13/2022]
Abstract
Little is known about the metabolic activity of bacteria when the viscosity of their environment changes. In this work, bacterial metabolic activity in media with viscosity ranging from 0.8 to 29.4 mPa·s was studied. Viscosities up to 2.4 mPa·s did not affect the metabolic activity of Vibrio ruber. On the other hand, at 29.4 mPa·s respiration rate and total dehydrogenase activity increased 8- and 4-fold, respectively. The activity of glucose-6-phosphate dehydrogenase (GPD) increased up to 13-fold at higher viscosities. However, intensified metabolic activity did not result in a faster growth rate. Increased viscosity delayed the onset as well as the duration of biosynthesis of prodigiosin. As an adaptation to a viscous environment, V. ruber increased metabolic flux through the pentose phosphate pathway and reduced synthesis of a secondary metabolite. In addition, V. ruber was able to modify the viscosity of its environment.
Affiliation(s)
- David Stopar
- Chair of Microbiology, Biotechnical Faculty, Department of Food Science and Technology, University of Ljubljana, Ljubljana, Slovenia
|
265
|
Liu GE, Bickhart DM. Copy number variation in the cattle genome. Funct Integr Genomics 2012; 12:609-24. [DOI: 10.1007/s10142-012-0289-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 04/02/2012] [Revised: 06/13/2012] [Accepted: 06/20/2012] [Indexed: 11/29/2022]
|
266
|
Review of general algorithmic features for genome assemblers for next generation sequencers. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 10:58-73. [PMID: 22768980 PMCID: PMC5054208 DOI: 10.1016/j.gpb.2012.05.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 01/05/2011] [Accepted: 10/26/2011] [Indexed: 01/09/2023]
Abstract
In the realm of bioinformatics and computational biology, the most rudimentary data upon which all the analysis is built is the sequence data of genes, proteins and RNA. The sequence data of the entire genome is the solution to the genome assembly problem. The scope of this contribution is to provide an overview of the art of problem-solving applied within the domain of genome assembly for next-generation sequencing (NGS) platforms. This article discusses the major genome assemblers that were proposed in the literature during the past decade by outlining their basic working principles. It is intended to act as a qualitative, not a quantitative, tutorial for all those working on genome assemblers for the next generation of sequencers. We discuss the theoretical aspects of various genome assemblers, identifying their working schemes. We also briefly discuss the direction in which the area is headed, along with core issues of software simplicity.
|
267
|
Conway T, Wazny J, Bromage A, Zobel J, Beresford-Smith B. Gossamer--a resource-efficient de novo assembler. Bioinformatics 2012; 28:1937-8. [PMID: 22611131 DOI: 10.1093/bioinformatics/bts297] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION The de novo assembly of short read high-throughput sequencing data poses significant computational challenges. The volume of data is huge; the reads are tiny compared to the underlying sequence, and there are significant numbers of sequencing errors. There are numerous software packages that allow users to assemble short reads, but most are either limited to relatively small genomes (e.g. bacteria) or require large computing infrastructure or employ greedy algorithms and thus often do not yield high-quality results. RESULTS We have developed Gossamer, an implementation of the de Bruijn approach to assembly that requires close to the theoretical minimum of memory, but still allows efficient processing. Our results show that it is space efficient and produces high-quality assemblies. AVAILABILITY Gossamer is available for non-commercial use from http://www.genomics.csse.unimelb.edu.au/product-gossamer.php.
Affiliation(s)
- Thomas Conway
- NICTA Victoria Research Laboratory, Department of Computing and Information Systems, The University of Melbourne, Parkville, Victoria 3010, Australia.
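The Gossamer abstract refers to "the de Bruijn approach to assembly". A minimal Python sketch of that general approach follows, assuming error-free reads and a plain dictionary as the graph. It does not reproduce Gossamer's memory-efficient representation, which the abstract does not detail; the function names and toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a path from `start` while the current node has exactly one successor.

    Simplification: only out-degree is checked, so this is not a full
    unitig construction.
    """
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


reads = ["ACGTT", "CGTTG", "GTTGA"]  # toy overlapping reads, made up
graph = de_bruijn_edges(reads, k=4)
contig = walk_unambiguous(graph, "ACG")
```

A hash-based dictionary like this is exactly the kind of memory-hungry structure that resource-efficient assemblers aim to avoid; it is used here only because it keeps the idea readable.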
|
268
|
Gonnella G, Kurtz S. Readjoiner: a fast and memory efficient string graph-based sequence assembler. BMC Bioinformatics 2012; 13:82. [PMID: 22559072 PMCID: PMC3507659 DOI: 10.1186/1471-2105-13-82] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 11/16/2011] [Accepted: 03/02/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Ongoing improvements in throughput of the next-generation sequencing technologies challenge the current generation of de novo sequence assemblers. Most recent sequence assemblers are based on the construction of a de Bruijn graph. An alternative framework of growing interest is the assembly string graph, not necessitating a division of the reads into k-mers, but requiring fast algorithms for the computation of suffix-prefix matches among all pairs of reads. RESULTS Here we present efficient methods for the construction of a string graph from a set of sequencing reads. Our approach employs suffix sorting and scanning methods to compute suffix-prefix matches. Transitive edges are recognized and eliminated early in the process and the graph is efficiently constructed including irreducible edges only. CONCLUSIONS Our suffix-prefix match determination and string graph construction algorithms have been implemented in the software package Readjoiner. Comparison with existing string graph-based assemblers shows that Readjoiner is faster and more space efficient. Readjoiner is available at http://www.zbh.uni-hamburg.de/readjoiner.
Affiliation(s)
- Giorgio Gonnella
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
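The two concepts at the heart of the Readjoiner abstract, suffix-prefix matches between reads and the removal of transitive edges, can be sketched in a few lines of Python. This is a naive all-pairs version for illustration: Readjoiner itself uses suffix sorting and scanning, which is not shown here, and the transitivity test below ignores overlap-length consistency. All names and reads are assumptions.

```python
def longest_overlap(a, b, min_len):
    """Longest proper suffix of `a` that equals a prefix of `b` (0 if < min_len).

    Assumes min_len >= 1.
    """
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len):
    """Edges (i, j) -> overlap length for every ordered pair of distinct reads."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                n = longest_overlap(a, b, min_len)
                if n:
                    edges[(i, j)] = n
    return edges


def remove_transitive(edges):
    """Drop edge i->k whenever some j gives edges i->j and j->k.

    Simplified test: a real string graph also checks that the overlaps
    are consistent with one another.
    """
    nodes = {n for edge in edges for n in edge}
    return {
        (i, k): n
        for (i, k), n in edges.items()
        if not any((i, j) in edges and (j, k) in edges for j in nodes)
    }


reads = ["ACGTAC", "GTACGG", "ACGGTT"]  # toy reads tiling ACGTACGGTT
edges = overlap_graph(reads, min_len=2)
irreducible = remove_transitive(edges)
```

In this toy case the short overlap between the first and third read is implied by the two longer overlaps through the middle read, so only the irreducible edges remain.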
|
269
|
Genomic characterization of the conditionally dispensable chromosome in Alternaria arborescens provides evidence for horizontal gene transfer. BMC Genomics 2012; 13:171. [PMID: 22559316 PMCID: PMC3443068 DOI: 10.1186/1471-2164-13-171] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 09/20/2011] [Accepted: 03/08/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Fungal plant pathogens cause serious agricultural losses worldwide. Alternaria arborescens is a major pathogen of tomato, with its virulence determined by the presence of a conditionally dispensable chromosome (CDC) carrying host-specific toxin genes. Genes encoding these toxins are well-studied, however the genomic content and organization of the CDC is not known. RESULTS To gain a richer understanding of the molecular determinants of virulence and the evolution of pathogenicity, we performed whole genome sequencing of A. arborescens. Here we present the de-novo assembly of the CDC and its predicted gene content. Also presented is hybridization data validating the CDC assembly. Predicted genes were functionally annotated through BLAST. Gene ontology terms were assigned, and conserved domains were identified. Differences in nucleotide usage were found between CDC genes and those on the essential chromosome (EC), including GC3-content, codon usage bias, and repeat region load. Genes carrying PKS and NRPS domains were identified in clusters on the CDC and evidence supporting the origin of the CDC through horizontal transfer from an unrelated fungus was found. CONCLUSIONS We provide evidence supporting the hypothesis that the CDC in A. arborescens was acquired through horizontal transfer, likely from an unrelated fungus. We also identified several predicted CDC genes under positive selection that may serve as candidate virulence factors.
|
270
|
Cantacessi C, Campbell BE, Gasser RB. Key strongylid nematodes of animals: Impact of next-generation transcriptomics on systems biology and biotechnology. Biotechnol Adv 2012; 30:469-88. [DOI: 10.1016/j.biotechadv.2011.08.016] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 07/19/2011] [Revised: 08/09/2011] [Accepted: 08/19/2011] [Indexed: 10/17/2022]
|
271
|
Tao X, Gu YH, Wang HY, Zheng W, Li X, Zhao CW, Zhang YZ. Digital gene expression analysis based on integrated de novo transcriptome assembly of sweet potato [Ipomoea batatas (L.) Lam]. PLoS One 2012; 7:e36234. [PMID: 22558397 PMCID: PMC3338685 DOI: 10.1371/journal.pone.0036234] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Received: 09/06/2011] [Accepted: 03/29/2012] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Sweet potato (Ipomoea batatas L. [Lam.]) ranks among the top six most important food crops in the world. It is widely grown throughout the world with high and stable yield, strong adaptability, rich nutrient content, and multiple uses. However, little is known about the molecular biology of this important non-model organism because of the lack of genomic resources. Hence, studies based on high-throughput sequencing technologies are needed to obtain a comprehensive and integrated genomic resource and a better understanding of gene expression patterns in different tissues and at various developmental stages. METHODOLOGY/PRINCIPAL FINDINGS Illumina paired-end (PE) RNA sequencing was performed and generated 48.7 million 75 bp PE reads. These reads were assembled de novo into 128,052 transcripts (≥ 100 bp), corresponding to 41.1 million base pairs, using a combined assembly strategy. Transcripts were annotated with Blast2GO: 51,763 transcripts had BLASTX hits, of which 39,677 have GO terms and 14,117 have EC numbers associated with 147 KEGG pathways. Furthermore, transcriptome differences among seven tissues were analyzed using Illumina digital gene expression (DGE) tag profiling, and numerous differentially and specifically expressed transcripts were identified. Moreover, the expression characteristics of genes involved in viral genomes, starch metabolism, and potential stress tolerance and insect resistance were also identified. CONCLUSIONS/SIGNIFICANCE The combined de novo transcriptome assembly strategy can be applied to other organisms whose reference genomes are not available. The data provided here represent the most comprehensive and integrated genomic resources for cloning and identifying genes of interest in sweet potato. Characterization of the sweet potato transcriptome provides an effective tool for better understanding the molecular mechanisms of cellular processes, including development of leaves and storage roots, tissue-specific gene expression, and potential biotic and abiotic stress responses in sweet potato.
Affiliation(s)
- Yi-Zheng Zhang
- Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, Center for Functional Genomics and Bioinformatics, College of Life Sciences, Sichuan University, Chengdu, Sichuan, People's Republic of China
|
272
|
Chen CC, Lin WD, Chang YJ, Chen CL, Ho JM. Enhancing de novo transcriptome assembly by incorporating multiple overlap sizes. ISRN Bioinformatics 2012; 2012:816402. [PMID: 25969752 PMCID: PMC4417554 DOI: 10.5402/2012/816402] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/20/2011] [Accepted: 02/09/2012] [Indexed: 11/23/2022]
Abstract
Background. The emergence of next-generation sequencing platforms has given rise to a new generation of assembly algorithms. Compared with Sanger sequencing data, next-generation sequence data present shorter reads, higher coverage depth, and different error profiles. These features pose new challenges for de novo transcriptome assembly.
Methodology. To explore the influence of these features on assembly algorithms, we studied the relationship between read overlap size, coverage depth, and error rate using simulated data. Based on this relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real mouse transcriptome dataset. The simulation tool and evaluation tool are freely available as open source.
Significance. Euler-mix is a straightforward pipeline that focuses on handling the variation in coverage depth of short-read datasets. The experimental results show that Euler-mix improves the performance of de novo transcriptome assembly.
Affiliation(s)
- Chien-Chih Chen
- Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
- Wen-Dar Lin
- Institute of Plant and Microbial Biology, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
- Yu-Jung Chang
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
- Chuen-Liang Chen
- Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan
- Jan-Ming Ho
- Institute of Information Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 115, Taiwan
|
273
|
Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 2012; 28:1420-8. [PMID: 22495754 DOI: 10.1093/bioinformatics/bts174] [Citation(s) in RCA: 2083] [Impact Index Per Article: 160.2] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION Next-generation sequencing allows us to sequence reads from a microbial environment using single-cell sequencing or metagenomic sequencing technologies. However, both technologies suffer from the problem that the sequencing depths of different regions of a genome, or of genomes from different species, are highly uneven. Most existing genome assemblers assume that sequencing depths are even, and they fail to construct correct long contigs. RESULTS We introduce the IDBA-UD algorithm, which is based on the de Bruijn graph approach, for assembling reads from single-cell sequencing or metagenomic sequencing technologies with uneven sequencing depths. Several non-trivial techniques have been employed to tackle the problems. Instead of using a simple threshold, we use multiple depth-relative thresholds to remove erroneous k-mers in both low-depth and high-depth regions. The technique of local assembly with paired-end information is used to solve the branch problem of low-depth short repeat regions. To speed up the process, an error correction step is conducted to correct reads of high-depth regions that can be aligned to high-confidence contigs. Comparison of the performances of IDBA-UD and existing assemblers (Velvet, Velvet-SC, SOAPdenovo and Meta-IDBA) for different datasets shows that IDBA-UD can reconstruct longer contigs with higher accuracy. AVAILABILITY The IDBA-UD toolkit is available at our website http://www.cs.hku.hk/~alse/idba_ud
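To make the contrast between a fixed threshold and a depth-relative threshold concrete, the toy sketch below filters k-mers by comparing each count with the most abundant k-mer that shares its (k-1)-base prefix or suffix. This is an illustration of the general idea only, not IDBA-UD's actual procedure; the function names, the neighbour definition, and the `ratio` value are all assumptions.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_depth_relative(counts, ratio=0.2):
    """Keep a k-mer only if its count is at least `ratio` times the
    highest count among k-mers sharing its prefix or suffix.

    A rare k-mer next to a very abundant alternative is treated as a
    likely error, while an equally rare k-mer in a uniformly low-depth
    region is kept. (Hypothetical rule, for illustration only.)
    """
    best_by_prefix = defaultdict(int)
    best_by_suffix = defaultdict(int)
    for kmer, c in counts.items():
        best_by_prefix[kmer[:-1]] = max(best_by_prefix[kmer[:-1]], c)
        best_by_suffix[kmer[1:]] = max(best_by_suffix[kmer[1:]], c)
    kept = {}
    for kmer, c in counts.items():
        local_depth = max(best_by_prefix[kmer[:-1]], best_by_suffix[kmer[1:]])
        if c >= ratio * local_depth:
            kept[kmer] = c
    return kept


if __name__ == "__main__":
    # "ACGA" competes with the abundant "ACGT"; "TTGC" sits in a low-depth region.
    counts = {"ACGT": 100, "ACGA": 3, "TTGC": 3}
    print(sorted(filter_depth_relative(counts)))
```

In this example a fixed cutoff cannot separate "ACGA" from "TTGC" because both have a count of 3, whereas the relative rule removes only the k-mer that is rare compared with its local neighbourhood.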
Affiliation(s)
- Yu Peng
- Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong
|
274
|
Abstract
In recent years, the amount of plant whole-genome sequencing data has increased rapidly, and whole-genome sequencing has also been widely performed in woody plants. However, whole-genome sequencing of woody plants faces a set of obstacles, including large genomes, complex genome structure, limitations in assembly, annotation, and functional analysis, and restricted research funding. Therefore, to improve the efficiency of whole-genome sequencing in woody plants, the progress and shortcomings of this field should be analyzed. The three generations of sequencing technologies (i.e., Sanger sequencing, sequencing by synthesis, and single-molecule sequencing) were compared in our study. The review of progress focused mainly on whole-genome sequencing in four woody plants (Populus, grapevine, papaya, and apple), and the application of the sequencing results was also analyzed. The future of whole-genome sequencing research in woody plants, including material selection, establishment of genetic and physical maps, selection of sequencing technology, bioinformatic analysis, and application of sequencing results, is discussed.
|
275
|
Cantacessi C, Campbell BE, Jex AR, Young ND, Hall RS, Ranganathan S, Gasser RB. Bioinformatics meets parasitology. Parasite Immunol 2012; 34:265-75. [DOI: 10.1111/j.1365-3024.2011.01304.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 01/25/2023]
|
276
|
Bohle HM, Gabaldón T. Selection of marker genes using whole-genome DNA polymorphism analysis. Evol Bioinform Online 2012; 8:161-9. [PMID: 22474405 PMCID: PMC3315472 DOI: 10.4137/ebo.s8989] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022] Open
Abstract
Molecular markers serve to assign individual samples to specific groups. Such markers should be easily identified and have a high discrimination power, being highly conserved within groups while showing sufficient variability between the groups that are to be distinguished. The availability of a large number of complete genomic sequences now enables the informed selection of genes as molecular markers based on the observed patterns of variability. We derived a new scoring system that is based on observed DNA polymorphic differences and uses Bayes' theorem as adapted by Wilcox. For validation, we applied this system to the problem of identifying individual species within a prokaryotic (Vibrio) and a eukaryotic (Diphyllobothrium) genus. The top-scoring candidate genes, chromosome segregation ATPase and ATPase subunit 6, showed better discrimination power in Vibrio and Diphyllobothrium, respectively, than standard molecular markers (recA, dnaJ and atpA for Vibrio, and 18S rRNA, ITS and COX1 for Diphyllobothrium).
Affiliation(s)
- Harry M Bohle
- Bioinformática, Universidad Internacional de Andalucía, Málaga, Spain
|
277
|
Genome sequence comparison of two United States live attenuated vaccines of infectious laryngotracheitis virus (ILTV). Virus Genes 2012; 44:470-4. [DOI: 10.1007/s11262-012-0728-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 09/23/2011] [Accepted: 02/14/2012] [Indexed: 10/28/2022]
|
278
|
Lassen KS, Schultz H, Heegaard NHH, He M. A novel DNAseq program for enhanced analysis of Illumina GAII data: a case study on antibody complementarity-determining regions. N Biotechnol 2012; 29:271-8. [PMID: 22155428 DOI: 10.1016/j.nbt.2011.11.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/24/2011] [Revised: 11/09/2011] [Accepted: 11/25/2011] [Indexed: 11/16/2022]
Abstract
High-throughput DNA sequencing technologies are increasingly becoming powerful systems for the comprehensive analysis of variations in whole genomes or various DNA libraries. As they are capable of producing massive collections of short sequences with varying lengths, a major challenge is how to turn these reads into biologically meaningful information. The first stage is to assemble the short reads into longer sequences through an in silico process. However, currently available programs allow only the assembly of abundant sequences, which results in the loss of highly variable (or rare) sequences or creates artefact assemblies. In this paper, we describe a novel program (DNAseq) that is capable of assembling highly variable sequences and displaying them directly for phylogenetic analysis. In addition, this program is Microsoft Windows-based and runs on a standard PC with 700 MB of RAM for general use. We have applied it to analyse a human naive single-chain antibody (scFv) library, comprehensively revealing the diversity of antibody variable complementarity-determining regions (CDRs) and their families. Although only an scFv library was exemplified here, we envisage that this program could be applicable to other genome libraries.
Affiliation(s)
- Klaus S Lassen
- Department of Clinical Biochemistry and Immunology, Statens Serum Institut, Artillerivej 5, 2300 Copenhagen S, Denmark.
|
279
|
Gasser RB, Cantacessi C. Heartworm genomics: unprecedented opportunities for fundamental molecular insights and new intervention strategies. Top Companion Anim Med 2012; 26:193-9. [PMID: 22152607 DOI: 10.1053/j.tcam.2011.09.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Vector-borne diseases, including canine heartworm disease (CHWD), are of major socioeconomic and canine health importance worldwide. Although many studies have provided insights into CHWD, to date there has been limited study of fundamental molecular aspects of Dirofilaria immitis itself, its relationship with the canine host, its vectors, as well as the potential of drug resistance to emerge, using advanced -omic technologies. This article takes a prospective view of the benefits that advanced -omics technologies will have toward understanding D. immitis and CHWD. Tackling key biological questions using these technologies will provide a "systems biology" context and could lead to radically new intervention and management strategies against heartworm.
Affiliation(s)
- Robin B Gasser
- Faculty of Veterinary Science, The University of Melbourne, Parkville, Victoria 3010, Australia.
|
280
|
Physical and Linkage Maps for Drosophila serrata, a Model Species for Studies of Clinal Adaptation and Sexual Selection. G3: Genes, Genomes, Genetics 2012; 2:287-97. [PMID: 22384407 PMCID: PMC3284336 DOI: 10.1534/g3.111.001354] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 10/05/2011] [Accepted: 12/12/2011] [Indexed: 11/28/2022]
Abstract
Drosophila serrata is a member of the montium group, which contains more than 98 species and until recently was considered a subgroup within the melanogaster group. This Drosophila species is an emerging model system for evolutionary quantitative genetics and has been used in studies of species borders, clinal variation and sexual selection. Despite the importance of D. serrata as a model for evolutionary research, our poor understanding of its genome remains a significant limitation. Here, we provide a first-generation gene-based linkage map and a physical map for this species. Consistent with previous studies of other drosophilids we observed strong conservation of genes within chromosome arms homologous with D. melanogaster but major differences in within-arm synteny. These resources will be a useful complement to ongoing genome sequencing efforts and QTL mapping studies in this species.
|
281
|
Tripathy S, Jiang RHY. Massively parallel sequencing technology in pathogenic microbes. Methods Mol Biol 2012; 835:271-94. [PMID: 22183660 DOI: 10.1007/978-1-61779-501-5_17] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/19/2022]
Abstract
Next-Generation Sequencing (NGS) methods have revolutionized various aspects of genomics, including transcriptome analysis. Digital expression analysis is set to replace analog, microarray-based expression analysis because of its cost-effectiveness, reproducibility, accuracy, and speed. The last 2 years have seen a surge in the development of statistical methods and software tools for analysis and visualization of NGS data. Large amounts of NGS data are available for pathogenic fungi and oomycetes. As the analysis results accumulate, they are bringing about a paradigm shift in the understanding of host-pathogen interactions, with the discovery of new transcripts, splice variants, mutations, regulatory elements, and epigenetic controls. Here we describe the core technology of the new sequencing platforms, the methodology of data analysis, and different aspects of applications.
Affiliation(s)
- Sucheta Tripathy
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA.
|
282
|
Lee HC, Lai K, Lorenc MT, Imelfort M, Duran C, Edwards D. Bioinformatics tools and databases for analysis of next-generation sequence data. Brief Funct Genomics 2011; 11:12-24. [DOI: 10.1093/bfgp/elr037] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 01/24/2023] Open
|
283
|
Barrero RA, Chapman B, Yang Y, Moolhuijzen P, Keeble-Gagnère G, Zhang N, Tang Q, Bellgard MI, Qiu D. De novo assembly of Euphorbia fischeriana root transcriptome identifies prostratin pathway related genes. BMC Genomics 2011; 12:600. [PMID: 22151917 PMCID: PMC3273484 DOI: 10.1186/1471-2164-12-600] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 05/16/2011] [Accepted: 12/13/2011] [Indexed: 11/17/2022] Open
Abstract
Background Euphorbia fischeriana is an important medicinal plant found in Northeast China. The plant roots contain many medicinal compounds, including 12-deoxyphorbol-13-acetate, commonly known as prostratin, a phorbol ester from the tigliane diterpene series. Prostratin is a protein kinase C activator and is effective in the treatment of Human Immunodeficiency Virus (HIV) by acting as a latent HIV activator. Latent HIV is currently the biggest limitation for viral eradication. The aim of this study was to sequence, assemble and annotate the E. fischeriana transcriptome to better understand the potential biochemical pathways leading to the synthesis of prostratin and other related diterpene compounds. Results In this study we used a high-throughput RNA-seq approach to sequence the root transcriptome of E. fischeriana. We assembled 18,180 transcripts; the majority of these encoded proteins, and only 17 transcripts corresponded to known RNA genes. Interestingly, we identified 5,956 protein-coding transcripts with high similarity (≥ 75%) to Ricinus communis, a close relative of E. fischeriana. We also evaluated the conservation of E. fischeriana genes against EST datasets from the Euphorbiaceae family, which included R. communis, Hevea brasiliensis and Euphorbia esula. We identified a core set of 1,145 gene clusters conserved in all four species and 1,487 E. fischeriana paralogous genes. Furthermore, we screened E. fischeriana transcripts against an in-house reference database for genes implicated in the biosynthesis of upstream precursors to prostratin. This identified 24 and 9 candidate transcripts involved in the terpenoid and diterpenoid biosynthesis pathways, respectively. The majority of the candidate genes in these pathways presented relatively low expression levels except for 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS) and isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS), which are required for multiple downstream pathways including synthesis of casbene, a proposed precursor to prostratin. Conclusion The resources generated in this study provide new insights into the upstream pathways to the synthesis of prostratin and will likely facilitate functional studies aiming to produce larger quantities of this compound for HIV research and/or treatment of patients.
Affiliation(s)
- Roberto A Barrero
- Centre for Comparative Genomics, Murdoch University, WA 6150, Australia
|
284
|
Abeel T, Van Parys T, Saeys Y, Galagan J, Van de Peer Y. GenomeView: a next-generation genome browser. Nucleic Acids Res 2011; 40:e12. [PMID: 22102585 PMCID: PMC3258165 DOI: 10.1093/nar/gkr995] [Citation(s) in RCA: 114] [Impact Index Per Article: 8.1] [Indexed: 12/18/2022] Open
Abstract
Due to ongoing advances in sequencing technologies, billions of nucleotide sequences are now produced on a daily basis. A major challenge is to visualize these data for further downstream analysis. To this end, we present GenomeView, a stand-alone genome browser specifically designed to visualize and manipulate a multitude of genomics data. GenomeView enables users to dynamically browse high volumes of aligned short-read data, with dynamic navigation and semantic zooming, from the whole genome level to the single nucleotide. At the same time, the tool enables visualization of whole genome alignments of dozens of genomes relative to a reference sequence. GenomeView is unique in its capability to interactively handle huge data sets consisting of tens of aligned genomes, thousands of annotation features and millions of mapped short reads both as viewer and editor. GenomeView is freely available as an open source software package.
Affiliation(s)
- Thomas Abeel
- Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium.
|
285
|
Gui J, Patel IR. Recent advances in molecular technologies and their application in pathogen detection in foods with particular reference to Yersinia. J Pathog 2011; 2011:310135. [PMID: 22567329 PMCID: PMC3335726 DOI: 10.4061/2011/310135] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/22/2011] [Accepted: 09/08/2011] [Indexed: 12/20/2022] Open
Abstract
Yersinia enterocolitica is an important zoonotic pathogen that can cause yersiniosis in humans and animals. Food has been suggested to be the main source of yersiniosis. It is critical for researchers to be able to detect Yersinia or any other foodborne pathogen with increased sensitivity and specificity, as well as in real time, in the case of a foodborne disease outbreak. Conventional detection methods are known to be labor intensive, time consuming, or expensive. On the other hand, more sensitive molecular-based detection methods like next generation sequencing, microarray, and many others are capable of providing faster results. DNA testing is now possible on a single molecule, and high-throughput analysis allows multiple detection reactions to be performed at once, thus allowing a range of characteristics to be rapidly and simultaneously determined. Despite better detection efficiencies, results derived using molecular biology methods can be affected by the various food matrices. With the improvements in sample preparation, data analysis, and testing procedures, molecular detection techniques will likely continue to simplify and increase the speed of detection while simultaneously improving the sensitivity and specificity for tracking pathogens in food matrices.
Affiliation(s)
- Jin Gui
- College of Management and Technology, Walden University, 155 Fifth Avenue South, Minneapolis, MN 55401, USA
- Isha R. Patel
- Division of Molecular Biology, Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, 8301 Muirkirk Road, MOD 1 Facility, Laurel, MD 20708, USA
|
286
|
Connecting genotype to phenotype in the era of high-throughput sequencing. Biochim Biophys Acta Gen Subj 2011; 1810:967-77. [PMID: 21421023 DOI: 10.1016/j.bbagen.2011.03.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 10/10/2010] [Revised: 02/17/2011] [Accepted: 03/13/2011] [Indexed: 12/25/2022]
|
287
|
Drinnenberg IA, Fink GR, Bartel DP. Compatibility with killer explains the rise of RNAi-deficient fungi. Science 2011; 333:1592. [PMID: 21921191 DOI: 10.1126/science.1209575] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Indexed: 11/02/2022]
Abstract
The RNA interference (RNAi) pathway is found in most eukaryotic lineages but curiously is absent in others, including that of Saccharomyces cerevisiae. We show that reconstituting RNAi in S. cerevisiae causes loss of a beneficial double-stranded RNA virus known as killer virus. Incompatibility between RNAi and killer viruses extends to other fungal species in that RNAi is absent in all species known to possess double-stranded RNA killer viruses, whereas killer viruses are absent in closely related species that retained RNAi. Thus, the advantage imparted by acquiring and retaining killer viruses explains the persistence of RNAi-deficient species during fungal evolution.
Affiliation(s)
- Ines A Drinnenberg
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
|
288
|
Tang H, Yao Y, Zhang D, Meng X, Wang L, Yu H, Ma L, Xu P. A novel NADH-dependent and FAD-containing hydroxylase is crucial for nicotine degradation by Pseudomonas putida. J Biol Chem 2011; 286:39179-87. [PMID: 21949128 DOI: 10.1074/jbc.m111.283929] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/06/2022] Open
Abstract
Nicotine, the main alkaloid produced by Nicotiana tabacum and other Solanaceae, is very toxic and may be a leading toxicant causing preventable disease and death, with the rise in global tobacco consumption. Several different microbial pathways of nicotine metabolism have been reported: Arthrobacter uses the pyridine pathway, and Pseudomonas, like mammals, uses the pyrrolidine pathway. We identified and characterized a novel 6-hydroxy-3-succinoyl-pyridine (HSP) hydroxylase (HspB) using enzyme purification, peptide sequencing, and sequencing of the Pseudomonas putida S16 genome. The HSP hydroxylase has no known orthologs and converts HSP to 2,5-dihydroxy-pyridine and succinic semialdehyde, using NADH. ¹⁸O₂ labeling experiments provided direct evidence for the incorporation of oxygen from O₂ into 2,5-dihydroxy-pyridine. The hspB gene deletion showed that this enzyme is essential for nicotine degradation, and site-directed mutagenesis identified an FAD-binding domain. This study demonstrates the importance of the newly discovered enzyme HspB, which is crucial for nicotine degradation by the Pseudomonas strain.
Affiliation(s)
- Hongzhi Tang
- State Key Laboratory of Microbial Metabolism & School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
|
289
|
Chitsaz H, Yee-Greenbaum JL, Tesler G, Lombardo MJ, Dupont CL, Badger JH, Novotny M, Rusch DB, Fraser LJ, Gormley NA, Schulz-Trieglaff O, Smith GP, Evers DJ, Pevzner PA, Lasken RS. Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nat Biotechnol 2011; 29:915-21. [PMID: 21926975 PMCID: PMC3558281 DOI: 10.1038/nbt.1966] [Citation(s) in RCA: 161] [Impact Index Per Article: 11.5] [Received: 12/21/2010] [Accepted: 08/09/2011] [Indexed: 11/09/2022]
Abstract
Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
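The "progressively increasing coverage cutoff" mentioned above can be illustrated with a toy contig graph: at each step, contigs below the current cutoff are removed, unambiguous paths are merged (averaging coverage by length), and only then is the cutoff raised. The sketch below is a simplified illustration under assumed data structures (a dict of `name -> (length, coverage)` and a set of directed edges), not the authors' implementation.

```python
def merge_linear_paths(nodes, edges):
    """Merge u -> v whenever v is u's only successor and u is v's only
    predecessor. The merged contig keeps u's name and takes the
    length-weighted average coverage."""
    merged = True
    while merged:
        merged = False
        for u, v in list(edges):
            successors = [b for a, b in edges if a == u]
            predecessors = [a for a, b in edges if b == v]
            if u != v and successors == [v] and predecessors == [u]:
                len_u, cov_u = nodes[u]
                len_v, cov_v = nodes.pop(v)
                total = len_u + len_v
                nodes[u] = (total, (len_u * cov_u + len_v * cov_v) / total)
                edges.remove((u, v))
                for a, b in list(edges):      # re-attach v's outgoing edges to u
                    if a == v:
                        edges.remove((a, b))
                        edges.add((u, b))
                merged = True
                break
    return nodes, edges


def remove_below(nodes, edges, cutoff):
    """Drop contigs whose coverage is below `cutoff`, plus their edges."""
    low = {n for n, (_, cov) in nodes.items() if cov < cutoff}
    nodes = {n: v for n, v in nodes.items() if n not in low}
    edges = {(a, b) for a, b in edges if a not in low and b not in low}
    return nodes, edges


def progressive_cutoff(nodes, edges, max_cutoff):
    """Raise the cutoff one step at a time, merging after every step."""
    nodes, edges = dict(nodes), set(edges)
    for cutoff in range(1, max_cutoff + 1):
        nodes, edges = remove_below(nodes, edges, cutoff)
        nodes, edges = merge_linear_paths(nodes, edges)
    return nodes


if __name__ == "__main__":
    # B is a genuine low-coverage stretch; E is a short erroneous branch.
    contigs = {"A": (100, 30), "B": (100, 3), "C": (100, 30), "E": (10, 1)}
    links = {("A", "B"), ("B", "C"), ("A", "E")}
    print(progressive_cutoff(contigs, links, max_cutoff=5))
    print(remove_below(contigs, links, 5)[0])   # single fixed cutoff, for contrast
```

In this toy case the low-coverage contig B is merged with its high-coverage neighbours before the cutoff reaches its coverage, so it survives, whereas applying a cutoff of 5 in one pass would delete B and leave A and C disconnected.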
Affiliation(s)
- Hamidreza Chitsaz
- Department of Computer Science, University of California, La Jolla, CA, USA
|
290
|
Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HOK, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res 2011; 21:2224-41. [PMID: 21926179 DOI: 10.1101/gr.126599.111] [Citation(s) in RCA: 324] [Impact Index Per Article: 23.1] [Indexed: 12/21/2022]
Abstract
Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
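The abstract lists contiguity among the properties assessed but does not name specific metrics. As an illustrative example only, the sketch below computes N50, a widely used contiguity statistic: the length L such that contigs of length L or longer contain at least half of the assembled bases. The function name and input format are assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembly size.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Total is 54; the three longest contigs (10 + 9 + 8 = 27) reach half of it.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

N50 rewards long contigs regardless of correctness, which is one reason a benchmark would pair contiguity with separate accuracy and structure assessments.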
Affiliation(s)
- Dent Earl
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
|
291
|
Earl D, Bradnam K, St John J, Darling A, Lin D, Fass J, Yu HOK, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning Z, Haimel M, Simpson JT, Fonseca NA, Birol İ, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia F, Luo R, Li Z, Xie Y, Liu B, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin S, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang X, DeRisi JL, Caccamo M, Li Y, Jaffe DB, Green RE, Haussler D, Korf I, Paten B. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Res 2011. [PMID: 21926179 DOI: 10.1101/gr.126599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
|
292
|
Haiminen N, Kuhn DN, Parida L, Rigoutsos I. Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results. PLoS One 2011; 6:e24182. [PMID: 21915294 PMCID: PMC3168497 DOI: 10.1371/journal.pone.0024182] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2011] [Accepted: 08/01/2011] [Indexed: 12/19/2022] Open
Abstract
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the identity of the used assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.
Affiliation(s)
- Niina Haiminen
- Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
- * E-mail: (NH); (IR)
- David N. Kuhn
- Subtropical Horticulture Research Station, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Miami, Florida, United States of America
- Laxmi Parida
- Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
- Isidore Rigoutsos
- Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
- * E-mail: (NH); (IR)
|
293
|
Chapman JA, Ho I, Sunkara S, Luo S, Schroth GP, Rokhsar DS. Meraculous: de novo genome assembly with short paired-end reads. PLoS One 2011; 6:e23501. [PMID: 21876754 PMCID: PMC3158087 DOI: 10.1371/journal.pone.0023501] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2010] [Accepted: 07/19/2011] [Indexed: 11/18/2022] Open
Abstract
We describe a new algorithm, meraculous, for whole genome assembly of deep paired-end short reads, and apply it to the assembly of a dataset of paired 75-bp Illumina reads derived from the 15.4 megabase genome of the haploid yeast Pichia stipitis. More than 95% of the genome is recovered, with no errors; half the assembled sequence is in contigs longer than 101 kilobases and in scaffolds longer than 269 kilobases. Incorporating fosmid ends recovers entire chromosomes. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (deBruijn) graph of oligonucleotides with unique high quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. A novel memory-efficient hashing scheme is introduced. The resulting contigs are ordered and oriented using paired reads separated by ∼280 bp or ∼3.2 kbp, and many gaps between contigs can be closed using paired-end placements. Practical issues with the dataset are described, and prospects for assembling larger genomes are discussed.
Affiliation(s)
- Jarrod A Chapman
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America.
|
294
|
Klein JD, Ossowski S, Schneeberger K, Weigel D, Huson DH. LOCAS--a low coverage assembly tool for resequencing projects. PLoS One 2011; 6:e23455. [PMID: 21858125 PMCID: PMC3156226 DOI: 10.1371/journal.pone.0023455] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2011] [Accepted: 07/18/2011] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations such as novel insertions or highly diverged sequence from low coverage NGS data are still lacking. RESULTS We present LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes using a mismatch sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner while it performs de novo assemblies of insertions and highly polymorphic target regions subsequent to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same amount of long insertions as state-of-the-art NGS assemblers, LOCAS showed the best results regarding contig size, error rate and runtime. CONCLUSION LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.
|
295
|
Wright AM, Beres SB, Consamus EN, Long SW, Flores AR, Barrios R, Richter GS, Oh SY, Garufi G, Maier H, Drews AL, Stockbauer KE, Cernoch P, Schneewind O, Olsen RJ, Musser JM. Rapidly progressive, fatal, inhalation anthrax-like infection in a human: case report, pathogen genome sequencing, pathology, and coordinated response. Arch Pathol Lab Med 2011; 135:1447-59. [PMID: 21882964 DOI: 10.5858/2011-0362-sair.1] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
CONTEXT Ten years ago a bioterrorism event involving Bacillus anthracis spores captured the nation's interest, stimulated extensive new research on this pathogen, and heightened concern about illegitimate release of infectious agents. Sporadic reports have described rare, fulminant, and sometimes fatal cases of pneumonia in humans and nonhuman primates caused by strains of Bacillus cereus, a species closely related to Bacillus anthracis. OBJECTIVES To describe and investigate a case of rapidly progressive, fatal, anthrax-like pneumonia and the overwhelming infection caused by a Bacillus species of uncertain provenance in a patient residing in rural Texas. DESIGN We characterized the genome of the causative strain within days of its recovery from antemortem cultures using next-generation sequencing and performed immunohistochemistry on tissues obtained at autopsy with antibodies directed against virulence proteins of B anthracis and B cereus. RESULTS We discovered that the infection was caused by a previously unknown strain of B cereus that was closely related to, but genetically distinct from, B anthracis. The strain contains a plasmid similar to pXO1, a genetic element encoding anthrax toxin and other known virulence factors. Immunohistochemistry demonstrated that several homologs of B anthracis virulence proteins were made in infected tissues, likely contributing to the patient's death. CONCLUSIONS Rapid genome sequence analysis permitted us to genetically define this strain, rule out the likelihood of bioterrorism, and contribute effectively to the institutional response to this event. Our experience strongly reinforced the critical value of deploying a well-integrated, anatomic, clinical, and genomic strategy to respond rapidly to a potential emerging, infectious threat to public health.
Affiliation(s)
- Angela M Wright
- Department of Pathology and Laboratory Medicine, The Methodist Hospital System, Houston, Texas, USA
|
296
|
Sequencing and validation of the genome of a Campylobacter concisus reveals intra-species diversity. PLoS One 2011; 6:e22170. [PMID: 21829448 PMCID: PMC3146479 DOI: 10.1371/journal.pone.0022170] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2011] [Accepted: 06/16/2011] [Indexed: 01/31/2023] Open
Abstract
Campylobacter concisus is an emerging pathogen of the human gastrointestinal tract. Its role in different diseases remains a subject of debate; this may be due to strain to strain genetic variation. Here, we sequence and analyze the genome of a C. concisus from a biopsy of a child with Crohn's disease (UNSWCD); the second such genome for this species. A 1.8 Mb genome was assembled with paired-end reads from a next-generation sequencer. This genome is smaller than the 2.1 Mb C. concisus reference BAA-1457. While 1593 genes were conserved across UNSWCD and BAA-1457, 138 genes from UNSWCD and 281 from BAA-1457 were unique when compared against the other. To further validate the genome assembly and annotation, comprehensive shotgun proteomics was performed. This confirmed 78% of open reading frames in UNSWCD and, importantly, provided evidence of expression for 217 proteins previously defined as 'hypothetical' in Campylobacter. Substantial functional differences were observed between the UNSWCD and the reference strain. Enrichment analysis revealed differences in membrane proteins, response to stimulus, molecular transport and electron carriers. Synteny maps for the 281 genes not present in UNSWCD identified seven functionally associated gene clusters. These included one associated with the CRISPR family and another which encoded multiple restriction endonucleases; these genes are all involved in resistance to phage attack. Many of the observed differences are consistent with UNSWCD having adapted to greater surface interaction with host cells, as opposed to BAA-1457 which may prefer a free-living environment.
|
297
|
Lienard J, Croxatto A, Prod'hom G, Greub G. Estrella lausannensis, a new star in the Chlamydiales order. Microbes Infect 2011; 13:1232-41. [PMID: 21816232 DOI: 10.1016/j.micinf.2011.07.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2011] [Revised: 07/07/2011] [Accepted: 07/07/2011] [Indexed: 11/17/2022]
Abstract
Originally, the Chlamydiales order was represented by a single family, the Chlamydiaceae, composed of several pathogens, such as Chlamydia trachomatis, Chlamydia pneumoniae, Chlamydia psittaci and Chlamydia abortus. Recently, 6 new families of Chlamydia-related bacteria have been added to the Chlamydiales order. Most of these obligate intracellular bacteria are able to replicate in free-living amoebae. Amoebal co-culture may be used to selectively isolate amoeba-resisting bacteria. In a previous work, this method allowed the discovery of strain CRIB 30 from an environmental water sample. Based on its 16S rRNA gene sequence similarity with Criblamydia sequanensis, strain CRIB 30 was considered a new member of the Criblamydiaceae family. In the present work, phylogenetic analyses of the genes gyrA, gyrB, rpoA, rpoB, secY, topA and 23S rRNA as well as MALDI-TOF MS confirmed the taxonomic classification of strain CRIB 30. Morphological examination revealed peculiar star-shaped elementary bodies (EBs) similar to those of C. sequanensis. Therefore, this new strain was called "Estrella lausannensis". Finally, E. lausannensis showed a large amoebal host range and a very efficient replication rate in Acanthamoeba species. Furthermore, E. lausannensis is the first member of the Chlamydiales order to grow successfully in the genetically tractable Dictyostelium discoideum, which opens new perspectives in the study of chlamydial biology.
MESH Headings
- Acanthamoeba/microbiology
- Amoeba/microbiology
- Chlamydiales/classification
- Chlamydiales/genetics
- Chlamydiales/growth & development
- Chlamydiales/isolation & purification
- Coculture Techniques
- DNA, Bacterial/analysis
- DNA, Bacterial/genetics
- Dictyostelium/microbiology
- Genes, rRNA/genetics
- Microscopy, Fluorescence
- Phylogeny
- RNA, Ribosomal, 16S/analysis
- RNA, Ribosomal, 16S/genetics
- Sequence Analysis, DNA
- Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
Affiliation(s)
- Julia Lienard
- Institute of Microbiology, Faculty of Biology and Medicine, University of Lausanne, Lausanne, Switzerland
|
298
|
Dutilh BE, Jurgelenaite R, Szklarczyk R, van Hijum SAFT, Harhangi HR, Schmid M, de Wild B, Françoijs KJ, Stunnenberg HG, Strous M, Jetten MSM, Op den Camp HJM, Huynen MA. FACIL: Fast and Accurate Genetic Code Inference and Logo. Bioinformatics 2011; 27:1929-33. [PMID: 21653513 PMCID: PMC3129529 DOI: 10.1093/bioinformatics/btr316] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2011] [Revised: 05/17/2011] [Accepted: 05/18/2011] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The intensification of DNA sequencing will increasingly unveil uncharacterized species with potential alternative genetic codes. A total of 0.65% of the DNA sequences currently in Genbank encode their proteins with a variant genetic code, and these exceptions occur in many unrelated taxa. RESULTS We introduce FACIL (Fast and Accurate genetic Code Inference and Logo), a fast and reliable tool to evaluate nucleic acid sequences for their genetic code that detects alternative codes even in species distantly related to known organisms. To illustrate this, we apply FACIL to a set of mitochondrial genomic contigs of Globobulimina pseudospinescens. This foraminifer does not have any sequenced close relative in the databases, yet we infer its alternative genetic code with high confidence values. Results are intuitively visualized in a Genetic Code Logo. AVAILABILITY AND IMPLEMENTATION FACIL is available as a web-based service at http://www.cmbi.ru.nl/FACIL/ and as a stand-alone program.
Affiliation(s)
- Bas E Dutilh
- Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
|
299
|
Baltrus DA, Nishimura MT, Romanchuk A, Chang JH, Mukhtar MS, Cherkis K, Roach J, Grant SR, Jones CD, Dangl JL. Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates. PLoS Pathog 2011; 7:e1002132. [PMID: 21799664 PMCID: PMC3136466 DOI: 10.1371/journal.ppat.1002132] [Citation(s) in RCA: 303] [Impact Index Per Article: 21.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2011] [Accepted: 05/06/2011] [Indexed: 11/18/2022] Open
Abstract
Closely related pathogens may differ dramatically in host range, but the molecular, genetic, and evolutionary basis for these differences remains unclear. In many Gram-negative bacteria, including the phytopathogen Pseudomonas syringae, type III effectors (TTEs) are essential for pathogenicity, instrumental in structuring host range, and exhibit wide diversity between strains. To capture the dynamic nature of virulence gene repertoires across P. syringae, we screened 11 diverse strains for novel TTE families and coupled this nearly saturating screen with the sequencing and assembly of 14 phylogenetically diverse isolates from a broad collection of diseased host plants. TTE repertoires vary dramatically in size and content across all P. syringae clades; surprisingly few TTEs are conserved and present in all strains. Those that are likely provide basal requirements for pathogenicity. We demonstrate that functional divergence within one conserved locus, hopM1, leads to dramatic differences in pathogenicity, and we demonstrate that phylogenetics-informed mutagenesis can be used to identify functionally critical residues of TTEs. The dynamism of the TTE repertoire is mirrored by diversity in pathways affecting the synthesis of secreted phytotoxins, highlighting the likely role of both types of virulence factors in determination of host range. We used these 14 draft genome sequences, plus five additional genome sequences previously reported, to identify the core genome for P. syringae and we compared this core to that of two closely related non-pathogenic pseudomonad species. These data revealed the recent acquisition of a 1 Mb megaplasmid by a sub-clade of cucumber pathogens. This megaplasmid encodes a type IV secretion system and a diverse set of unknown proteins, which dramatically increases both the genomic content of these strains and the pan-genome of the species.
Breakthroughs in genomics have unleashed a new suite of tools for studying the genetic bases of phenotypic differences across diverse bacterial isolates. Here, we analyze 19 genomes of P. syringae, a pathogen of many crop species, to reveal the genetic changes underlying differences in virulence across host plants ranging from rice to maple trees. Surprisingly, a pair of strains diverged dramatically via the acquisition of a 1 Mb megaplasmid, which constitutes roughly 14% of the genome. Novel plasmids and horizontal genetic exchange have contributed extensively to species-wide diversification. Type III effector proteins are essential for pathogenicity, exhibit wide diversity between strains and are present in distinct higher-level patterns across the species. Furthermore, we use sequence comparisons within an evolutionary context to identify functional changes in multiple virulence genes. Overall, our data provide a unique overview of evolutionary pressures within P. syringae and an important resource for the phytopathogen research community.
Affiliation(s)
- David A. Baltrus
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Marc T. Nishimura
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Artur Romanchuk
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jeff H. Chang
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- M. Shahid Mukhtar
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Karen Cherkis
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Jeff Roach
- Research Computing Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Sarah R. Grant
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Corbin D. Jones
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- * E-mail: (CDJ, computational queries); (JLD, biological queries)
- Jeffery L. Dangl
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- * E-mail: (CDJ, computational queries); (JLD, biological queries)
|
300
|
Combinations of macrolide resistance determinants in field isolates of Mannheimia haemolytica and Pasteurella multocida. Antimicrob Agents Chemother 2011; 55:4128-33. [PMID: 21709086 DOI: 10.1128/aac.00450-11] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
Respiratory tract infections in cattle are commonly associated with the bacterial pathogens Mannheimia haemolytica and Pasteurella multocida. These infections can generally be successfully treated in the field with one of several groups of antibiotics, including macrolides. A few recent isolates of these species exhibit resistance to veterinary macrolides with phenotypes that fall into three distinct classes. The first class has type I macrolide, lincosamide, and streptogramin B antibiotic resistance and, consistent with this, the 23S rRNA nucleotide A2058 is monomethylated by the enzyme product of the erm(42) gene. The second class shows no lincosamide resistance and lacks erm(42) and concomitant 23S rRNA methylation. Sequencing of the genome of a representative strain from this class, P. multocida 3361, revealed macrolide efflux and phosphotransferase genes [respectively termed msr(E) and mph(E)] that are arranged in tandem and presumably expressed from the same promoter. The third class exhibits the most marked drug phenotype, with high resistance to all of the macrolides tested, and possesses all three resistance determinants. The combinations of erm(42), msr(E), and mph(E) are chromosomally encoded and intermingled with other exogenous genes, many of which appear to have been transferred from other members of the Pasteurellaceae. The presence of some of the exogenous genes explains recent reports of resistance to additional drug classes. We have expressed recombinant versions of the erm(42), msr(E), and mph(E) genes within an isogenic Escherichia coli background to assess their individual contributions to resistance. Our findings indicate what types of compounds might have driven the selection for these resistance determinants.
|