101
|
Blankers T, Oh KP, Bombarely A, Shaw KL. The Genomic Architecture of a Rapid Island Radiation: Recombination Rate Variation, Chromosome Structure, and Genome Assembly of the Hawaiian Cricket Laupala. Genetics 2018; 209:1329-1344. [PMID: 29875253 PMCID: PMC6063224 DOI: 10.1534/genetics.118.300894] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 03/07/2018] [Accepted: 06/03/2018] [Indexed: 12/30/2022]
Abstract
Phenotypic evolution and speciation depend on recombination in many ways. Within populations, recombination can promote adaptation by bringing together favorable mutations and decoupling beneficial and deleterious alleles. As populations diverge, crossing over can give rise to maladapted recombinants and impede or reverse diversification. Suppressed recombination due to genomic rearrangements, modifier alleles, and intrinsic chromosomal properties may offer a shield against maladaptive gene flow eroding coadapted gene complexes. Both theoretical and empirical results support this relationship. However, little is known about this relationship in the context of behavioral isolation, where coevolving signals and preferences are the major hybridization barrier. Here we examine the genomic architecture of recently diverged, sexually isolated Hawaiian swordtail crickets (Laupala). We assemble a de novo genome and generate three dense linkage maps from interspecies crosses. In line with expectations based on the species' recent divergence and successful interbreeding in the laboratory, the linkage maps are highly collinear and show no evidence for large-scale chromosomal rearrangements. Next, the maps were used to anchor the assembly to pseudomolecules and estimate recombination rates across the genome to test the hypothesis that loci involved in behavioral isolation (song and preference divergence) are in regions of low interspecific recombination. Contrary to our expectations, the genomic region where a male song and female preference QTL colocalize is not associated with particularly low recombination rates. This study provides important novel genomic resources for an emerging evolutionary genetics model system and suggests that trait-preference coevolution is not necessarily facilitated by locally suppressed recombination.
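As a rough illustration of the recombination-rate estimation mentioned in this abstract: once linkage-map markers are anchored to physical positions, one common approach is to divide the genetic distance between adjacent markers by their physical distance, giving a rate in cM/Mb. The sketch below assumes that approach. The function name, the marker tuples, and the example values are hypothetical and are not taken from the paper, whose actual method may differ.

```python
def recombination_rates(markers):
    """Estimate recombination rate (cM/Mb) between adjacent markers.

    markers: list of (physical_position_bp, genetic_position_cM) tuples,
    sorted by physical position along one pseudomolecule (assumed input).
    """
    rates = []
    for (bp0, cm0), (bp1, cm1) in zip(markers, markers[1:]):
        span_mb = (bp1 - bp0) / 1e6  # physical distance in megabases
        if span_mb <= 0:
            raise ValueError("markers must be sorted with distinct positions")
        rates.append((cm1 - cm0) / span_mb)  # genetic / physical distance
    return rates


# Hypothetical markers: (bp, cM)
example = [(0, 0.0), (1_000_000, 2.0), (3_000_000, 3.0)]
print(recombination_rates(example))  # [2.0, 0.5]
```

Intervals with low values in such a list would be candidate regions of suppressed recombination; in practice, rates are usually smoothed over windows because adjacent-marker estimates are noisy.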
Affiliation(s)
- Thomas Blankers
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, New York 14853
- Kevin P Oh
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, New York 14853
- Aureliano Bombarely
  - Department of Horticulture, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
- Kerry L Shaw
  - Department of Neurobiology and Behavior, Cornell University, Ithaca, New York 14853
|
102
|
Barrett C, Huang FW, Reidys CM. Sequence-structure relations of biopolymers. Bioinformatics 2018; 33:382-389. [PMID: 28171628 DOI: 10.1093/bioinformatics/btw621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/06/2015] [Revised: 05/16/2016] [Accepted: 09/26/2016] [Indexed: 12/12/2022]
Abstract
Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded ‘patterns’ in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence–structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments. Availability and Implementation: The source code is freely available at http://staff.vbi.vt.edu/fenixh/Sampler.zip Contact: duckcr@vbi.vt.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher Barrett
  - Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
- Fenix W Huang
  - Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
- Christian M Reidys
  - Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
|
103
|
Chen Q, Lan C, Zhao L, Wang J, Chen B, Chen YPP. Recent advances in sequence assembly: principles and applications. Brief Funct Genomics 2018; 16:361-378. [PMID: 28453648 DOI: 10.1093/bfgp/elx006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/20/2022]
Abstract
The application of advanced sequencing technologies and the rapid growth of various sequence data have led to increasing interest in DNA sequence assembly. However, repeats and polymorphism occur frequently in genomes, and each of these has different impacts on assembly. Further, many new applications for sequencing, such as metagenomics regarding multiple species, have emerged in recent years. These not only give rise to higher complexity but also prevent short-read assembly in an efficient way. This article reviews the theoretical foundations that underlie current mapping-based assembly and de novo-based assembly, and highlights the key issues and feasible solutions that need to be considered. It focuses on how individual processes, such as optimal k-mer determination and error correction in assembly, rely on intelligent strategies or high-performance computation. We also survey primary algorithms/software and offer a discussion on the emerging challenges in assembly.
|
104
|
The Genetics of a Behavioral Speciation Phenotype in an Island System. Genes (Basel) 2018; 9:genes9070346. [PMID: 29996514 PMCID: PMC6070818 DOI: 10.3390/genes9070346] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/25/2018] [Revised: 07/03/2018] [Accepted: 07/03/2018] [Indexed: 12/30/2022]
Abstract
Mating behavior divergence can make significant contributions to reproductive isolation and speciation in various biogeographic contexts. However, whether the genetic architecture underlying mating behavior divergence is related to the biogeographic history and the tempo and mode of speciation remains poorly understood. Here, we use quantitative trait locus (QTL) mapping to infer the number, distribution, and effect size of mating song rhythm variations in the crickets Laupala eukolea and Laupala cerasina, which occur on different islands (Maui and Hawaii). We then compare these results with a similar study of an independently evolving species pair that diverged within the same island. Finally, we annotate the L. cerasina transcriptome and test whether the QTL fall in functionally enriched genomic regions. We document a polygenic architecture behind the song rhythm divergence in the inter-island species pair that is remarkably similar to that previously found for an intra-island species pair in the same genus. Importantly, the QTL regions were significantly enriched for potential homologs of the genes involved in pathways that may be modulating the cricket song rhythm. These clusters of loci could constrain the spatial genomic distribution of the genetic variation underlying the cricket song variation and harbor several candidate genes that merit further study.
|
105
|
Sancho R, Cantalapiedra CP, López-Alvarez D, Gordon SP, Vogel JP, Catalán P, Contreras-Moreira B. Comparative plastome genomics and phylogenomics of Brachypodium: flowering time signatures, introgression and recombination in recently diverged ecotypes. The New Phytologist 2018; 218:1631-1644. [PMID: 29206296 DOI: 10.1111/nph.14926] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 10/10/2016] [Accepted: 03/03/2017] [Indexed: 05/24/2023]
Abstract
Few pan-genomic studies have been conducted in plants, and none of them have focused on the intraspecific diversity and evolution of their plastid genomes. We address this issue in Brachypodium distachyon and its close relatives B. stacei and B. hybridum, for which a large genomic data set has been compiled. We analyze inter- and intraspecific plastid comparative genomics and phylogenomic relationships within a family-wide framework. Major indel differences were detected between Brachypodium plastomes. Within B. distachyon, we detected two main lineages, a mostly Extremely Delayed Flowering (EDF+) clade and a mostly Spanish (S+) - Turkish (T+) clade, plus nine chloroplast capture and two plastid DNA (ptDNA) introgression and micro-recombination events. Early Oligocene (30.9 million yr ago (Ma)) and Late Miocene (10.1 Ma) divergence times were inferred for the respective stem and crown nodes of Brachypodium and a very recent Mid-Pleistocene (0.9 Ma) time for the B. distachyon split. Flowering time variation is a main factor driving rapid intraspecific divergence in B. distachyon, although it is counterbalanced by repeated introgression between previously isolated lineages. Swapping of plastomes between the three different genomic groups, EDF+, T+, S+, probably resulted from random backcrossing followed by stabilization through selection pressure.
Affiliation(s)
- Rubén Sancho
  - Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain
  - Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Saragossa, Spain
- Carlos P Cantalapiedra
  - Department of Genetics and Plant Breeding, Estación Experimental de Aula Dei-Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
- Diana López-Alvarez
  - Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain
- Sean P Gordon
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
- John P Vogel
  - DOE Joint Genome Institute, Walnut Creek, CA, 94598, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
- Pilar Catalán
  - Department of Agricultural and Environmental Sciences, High Polytechnic School of Huesca, University of Zaragoza, Huesca, Spain
  - Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Saragossa, Spain
- Bruno Contreras-Moreira
  - Grupo de Bioquímica, Biofísica y Biología Computacional (BIFI, UNIZAR), Unidad Asociada al CSIC, Saragossa, Spain
  - Department of Genetics and Plant Breeding, Estación Experimental de Aula Dei-Consejo Superior de Investigaciones Científicas, Zaragoza, Spain
  - Fundación ARAID, Zaragoza, Spain
|
106
|
Outbreak of Invasive Wound Mucormycosis in a Burn Unit Due to Multiple Strains of Mucor circinelloides f. circinelloides Resolved by Whole-Genome Sequencing. mBio 2018; 9:mBio.00573-18. [PMID: 29691339 PMCID: PMC5915733 DOI: 10.1128/mbio.00573-18] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Indexed: 01/06/2023]
Abstract
Mucorales are ubiquitous environmental molds responsible for mucormycosis in diabetic, immunocompromised, and severely burned patients. Small outbreaks of invasive wound mucormycosis (IWM) have already been reported in burn units without extensive microbiological investigations. We faced an outbreak of IWM in our center and investigated the clinical isolates with whole-genome sequencing (WGS) analysis. We analyzed M. circinelloides isolates from patients in our burn unit (BU1, Hôpital Saint-Louis, Paris, France) together with nonoutbreak isolates from Burn Unit 2 (BU2, Paris area) and from France over a 2-year period (2013 to 2015). A total of 21 isolates, including 14 isolates from six BU1 patients, were analyzed by WGS. Phylogenetic classification based on de novo assembly and assembly-free approaches showed that the clinical isolates clustered in four highly divergent clades. Clade 1 contained at least one of the strains from the six epidemiologically linked BU1 patients. The clinical isolates were specific to each patient. Two patients were infected with more than two strains from different clades, suggesting that an environmental reservoir of clonally unrelated isolates was the source of contamination. Only two patients from BU1 shared one strain, which could correspond to direct transmission or contamination with the same environmental source. In conclusion, WGS of several isolates per patient coupled with precise epidemiological data revealed a complex situation combining potential cross-transmission between patients and multiple contaminations with a heterogeneous pool of strains from a cryptic environmental reservoir.
Invasive wound mucormycosis (IWM) is a severe infection due to environmental molds belonging to the order Mucorales. Severely burned patients are particularly at risk for IWM. Here, we used whole-genome sequencing (WGS) analysis to resolve an outbreak of IWM due to Mucor circinelloides that occurred in our hospital (BU1). We sequenced 21 clinical isolates, including 14 from BU1 and 7 unrelated isolates, and compared them to the reference genome (1006PhL). This analysis revealed that the outbreak was mainly due to multiple strains that seemed patient specific, suggesting that the patients were more likely infected from a pool of diverse strains from the environment rather than from direct transmission among them. This study revealed the complexity of a Mucorales outbreak in the setting of IWM in burn patients, which has been highlighted based on WGS combined with careful sampling.
|
107
|
Draft Genome Sequence of the Fish Pathogen Flavobacterium columnare Genomovar III Strain PH-97028 (=CIP 109753). Genome Announcements 2018; 6:6/14/e00222-18. [PMID: 29622616 PMCID: PMC5887026 DOI: 10.1128/genomea.00222-18] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
Abstract
Flavobacterium columnare strain PH-97028 (=CIP 109753) is a genomovar III reference strain that was isolated from a diseased Ayu fish in Japan. We report here the analysis of the first available genomovar III sequence of this species to aid in identification, epidemiological tracking, and virulence studies.
|
108
|
Arboleya S, Bottacini F, O'Connell-Motherway M, Ryan CA, Ross RP, van Sinderen D, Stanton C. Gene-trait matching across the Bifidobacterium longum pan-genome reveals considerable diversity in carbohydrate catabolism among human infant strains. BMC Genomics 2018; 19:33. [PMID: 29310579 PMCID: PMC5759876 DOI: 10.1186/s12864-017-4388-9] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 08/04/2017] [Accepted: 12/15/2017] [Indexed: 12/15/2022]
Abstract
Background: Bifidobacterium longum is a common member of the human gut microbiota and is frequently present at high numbers in the gut microbiota of humans throughout life, thus indicative of a close symbiotic host-microbe relationship. Different mechanisms may be responsible for the high competitiveness of this taxon in its human host to allow stable establishment in the complex and dynamic intestinal microbiota environment. The objective of this study was to assess the genetic and metabolic diversity in a set of 20 B. longum strains, most of which had previously been isolated from infants, by performing whole genome sequencing and comparative analysis, and to analyse their carbohydrate utilization abilities using a gene-trait matching approach. Results: We analysed their pan-genome and their phylogenetic relatedness. All strains clustered in the B. longum ssp. longum phylogenetic subgroup, except for one individual strain which was found to cluster in the B. longum ssp. suis phylogenetic group. The examined strains exhibit genomic diversity, while they also varied in their sugar utilization profiles. This allowed us to perform a gene-trait matching exercise enabling the identification of five gene clusters involved in the utilization of xylo-oligosaccharides, arabinan, arabinoxylan, galactan and fucosyllactose, the latter of which is an abundant human milk oligosaccharide (HMO). Conclusions: The results showed high diversity in terms of genes and predicted glycosyl-hydrolases, as well as the ability to metabolize a large range of sugars. Moreover, we corroborate the capability of B. longum ssp. longum to metabolise HMOs. Ultimately, their intraspecific genomic diversity and the ability to consume a wide assortment of carbohydrates, ranging from plant-derived carbohydrates to HMOs, may provide an explanation for the competitive advantage and persistence of B. longum in the human gut microbiome.
Electronic supplementary material: The online version of this article (10.1186/s12864-017-4388-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Silvia Arboleya
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - Teagasc Food Research Centre, Moorepark, Fermoy, Co. Cork, Ireland
  - Instituto de Productos Lácteos de Asturias (IPLA-CSIC), Paseo Río Linares, Villaviciosa, Asturias, Spain
- Francesca Bottacini
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - School of Microbiology, University College Cork, Cork, Ireland
- Mary O'Connell-Motherway
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - School of Microbiology, University College Cork, Cork, Ireland
- C Anthony Ryan
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - Department of Neonatology, Cork University Maternity Hospital, Cork, Ireland
- R Paul Ross
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - Teagasc Food Research Centre, Moorepark, Fermoy, Co. Cork, Ireland
  - School of Microbiology, University College Cork, Cork, Ireland
- Douwe van Sinderen
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - School of Microbiology, University College Cork, Cork, Ireland
- Catherine Stanton
  - APC Microbiome Institute, University College Cork, Cork, Ireland
  - Teagasc Food Research Centre, Moorepark, Fermoy, Co. Cork, Ireland
|
109
|
Milla L, van Nieukerken EJ, Vijverberg R, Doorenweerd C, Wilcox SA, Halsey M, Young DA, Jones TM, Kallies A, Hilton DJ. A preliminary molecular phylogeny of shield-bearer moths (Lepidoptera: Adeloidea: Heliozelidae) highlights rich undescribed diversity. Mol Phylogenet Evol 2017; 120:129-143. [PMID: 29229488 DOI: 10.1016/j.ympev.2017.12.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/04/2017] [Revised: 11/24/2017] [Accepted: 12/04/2017] [Indexed: 11/25/2022]
Abstract
Heliozelidae are a widespread, evolutionarily early diverging family of small, day-flying monotrysian moths, for which a comprehensive phylogeny is lacking. We generated the first molecular phylogeny of the family using DNA sequences of two mitochondrial genes (COI and COII) and two nuclear genes (H3 and 28S) from 130 Heliozelidae specimens, including eight of the twelve known genera: Antispila, Antispilina, Coptodisca, Heliozela, Holocacista, Hoplophanes, Pseliastis, and Tyriozela. Our results provide strong support for five major Heliozelidae clades: (i) a large widespread clade containing the leaf-mining genera Antispilina, Coptodisca and Holocacista and some species of Antispila, (ii) a clade containing most of the described Antispila, (iii) a clade containing the leaf-mining genus Heliozela and the monotypic genus Tyriozela, (iv) an Australian clade containing Pseliastis and (v) an Australian clade containing Hoplophanes. Each clade includes several new species and potentially new genera. Collectively, our data uncover a rich and undescribed diversity that appears to be especially prevalent in Australia. Our work highlights the need for a major taxonomic revision of the family and for generating a robust molecular phylogeny using multi-gene approaches in order to resolve the relationships among clades.
Affiliation(s)
- Liz Milla
  - School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Ruben Vijverberg
  - Naturalis Biodiversity Center, PO Box 9517, 2300 RA Leiden, The Netherlands
- Camiel Doorenweerd
  - Naturalis Biodiversity Center, PO Box 9517, 2300 RA Leiden, The Netherlands
- Stephen A Wilcox
  - School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Mike Halsey
  - Faculty of Health and Life Sciences, Oxford Brookes University, England, UK
- David A Young
  - D'Estrees Entomology & Science Services, Kingscote 5223, Australia
- Therésa M Jones
  - School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Axel Kallies
  - School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia
- Douglas J Hilton
  - School of BioSciences, The University of Melbourne, Parkville, Victoria 3010, Australia
|
110
|
Shimizu T, Tanizawa Y, Mochizuki T, Nagasaki H, Yoshioka T, Toyoda A, Fujiyama A, Kaminuma E, Nakamura Y. Draft Sequencing of the Heterozygous Diploid Genome of Satsuma (Citrus unshiu Marc.) Using a Hybrid Assembly Approach. Front Genet 2017; 8:180. [PMID: 29259619 PMCID: PMC5723288 DOI: 10.3389/fgene.2017.00180] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/07/2017] [Accepted: 11/06/2017] [Indexed: 12/19/2022]
Abstract
Satsuma (Citrus unshiu Marc.) is one of the most abundantly produced mandarin varieties of citrus, known for its seedless fruit production and as a breeding parent of citrus. De novo assembly of the heterozygous diploid genome of Satsuma ("Miyagawa Wase") was conducted by a hybrid assembly approach using short-read sequences, three mate-pair libraries, and a long-read sequence of PacBio by the PLATANUS assembler. The assembled sequence, with a total size of 359.7 Mb and an N50 length of 386,404 bp, consisted of 20,876 scaffolds. Pseudomolecules of Satsuma constructed by aligning the scaffolds to three genetic maps showed genome-wide synteny to the genomes of Clementine, pummelo, and sweet orange. Gene prediction by modeling with MAKER-P proposed 29,024 genes and 37,970 mRNAs; additionally, gene prediction analysis found candidates for novel genes in several biosynthesis pathways for gibberellin and violaxanthin catabolism. BUSCO scores for the assembled scaffolds and predicted transcripts, and another analysis by BAC end sequence mapping, indicated that the consistency of the assembled genome was close to that of the haploid Clementine, pummelo, and sweet orange genomes. The numbers of repeat elements and long terminal repeat retrotransposons were comparable to those of the seven citrus genomes; this suggested no significant failure in the assembly at the repeat regions. A resequencing application using the assembled sequence confirmed that both kunenbo-A and Satsuma are offspring of Kishu, and Satsuma is a back-crossed offspring of Kishu. These results illustrated the performance of the hybrid assembly approach and its ability to construct an accurate heterozygous diploid genome.
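For readers unfamiliar with the N50 statistic quoted in this abstract, a minimal sketch of its usual definition follows: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least 50% of the assembly
            return length


# Toy example: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why the abstract also reports BUSCO scores and BAC end mapping.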
Affiliation(s)
- Tokurou Shimizu
  - Division of Citrus Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Shimizu, Japan
- Yasuhiro Tanizawa
  - Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Takako Mochizuki
  - Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Hideki Nagasaki
  - Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Terutaka Yoshioka
  - Division of Citrus Research, Institute of Fruit Tree and Tea Science, National Agriculture and Food Research Organization, Shimizu, Japan
- Atsushi Toyoda
  - Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Asao Fujiyama
  - Comparative Genomics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Eli Kaminuma
  - Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Yasukazu Nakamura
  - Genome Informatics Laboratory, Center for Information Biology, National Institute of Genetics, Mishima, Japan
|
111
|
Huang YT, Huang YW. An efficient error correction algorithm using FM-index. BMC Bioinformatics 2017; 18:524. [PMID: 29179672 PMCID: PMC5704532 DOI: 10.1186/s12859-017-1940-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/09/2017] [Accepted: 11/14/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: High-throughput sequencing offers higher throughput and lower cost for sequencing a genome. However, sequencing errors, including mismatches and indels, may be produced during sequencing. Because errors may reduce the accuracy of subsequent de novo assembly, error correction is necessary prior to assembly. However, existing correction methods still face trade-offs among correction power, accuracy, and speed. RESULTS: We develop a novel overlap-based error correction algorithm using FM-index (called FMOE). FMOE first identifies overlapping reads by aligning a query read simultaneously against multiple reads compressed by FM-index. Subsequently, sequencing errors are corrected by k-mer voting from overlapping reads only. The experimental results indicate that FMOE has the highest correction power with comparable accuracy and speed. Our algorithm performs better on long-read than on short-read datasets when compared with others. The assembly results indicate that each algorithm has its own strengths and weaknesses, whereas FMOE is good for long or good-quality reads. CONCLUSIONS: FMOE is freely available at https://github.com/ythuang0522/FMOC.
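The general idea of correcting a read by k-mer voting from its overlapping reads can be sketched as follows. This is a simplified illustration and not the FMOE implementation: it uses a plain in-memory k-mer counter instead of an FM-index, assumes the overlapping reads have already been identified, and handles substitutions only. All function names and parameter defaults are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def covering_kmers(seq, pos, k):
    """Return the k-mers of seq that include position pos."""
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    return [seq[s:s + k] for s in range(first, last + 1)]


def correct_read(query, overlapping_reads, k=4, min_support=2):
    """Correct substitutions in query using k-mer votes from overlapping reads."""
    counts = count_kmers(overlapping_reads, k)
    read = query
    for pos in range(len(read)):
        windows = covering_kmers(read, pos, k)
        if all(counts[w] >= min_support for w in windows):
            continue  # every k-mer over this base is well supported
        best, best_score = read, sum(counts[w] for w in windows)
        for base in "ACGT":
            candidate = read[:pos] + base + read[pos + 1:]
            score = sum(counts[w] for w in covering_kmers(candidate, pos, k))
            if score > best_score:  # keep the base with the most votes
                best, best_score = candidate, score
        read = best
    return read


truth = "ACGTTGCAAGCT"
print(correct_read("ACGTTACAAGCT", [truth] * 3))  # ACGTTGCAAGCT
```

Restricting the vote to overlapping reads, as the abstract describes, keeps k-mers from unrelated repeats elsewhere in the genome out of the count; the sketch mimics this only by taking the overlapping reads as an input.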
Affiliation(s)
- Yao-Ting Huang
  - Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
- Yu-Wen Huang
  - Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
|
112
|
Beres SB, Olsen RJ, Ojeda Saavedra M, Ure R, Reynolds A, Lindsay DSJ, Smith AJ, Musser JM. Genome sequence analysis of emm89 Streptococcus pyogenes strains causing infections in Scotland, 2010-2016. J Med Microbiol 2017; 66:1765-1773. [PMID: 29099690 PMCID: PMC5845742 DOI: 10.1099/jmm.0.000622] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 02/06/2023]
Abstract
Purpose: Strains of type emm89 Streptococcus pyogenes have recently increased in frequency as a cause of human infections in several countries in Europe and North America. This increase has been linked by molecular epidemiology with the emergence of a new genetically distinct clone, designated clade 3. We sought to extend our understanding of this epidemic behavior by the genetic characterization of type emm89 strains responsible in recent years for an increased frequency of infections in Scotland. Methodology: We sequenced the genomes of a retrospective cohort of 122 emm89 strains recovered from patients with invasive and noninvasive infections throughout Scotland during 2010 to 2016. Results: All but one of the 122 emm89 infection isolates are of the recently emerged epidemic clade 3 clonal lineage. The Scotland isolates are closely related to and not genetically distinct from recent emm89 strains from England; they constitute a single genetic population. Conclusions: The clade 3 clone causes virtually all contemporary emm89 infections in Scotland. These findings add Scotland to a growing list of countries of Europe and North America where, by whole genome sequencing, emm89 clade 3 strains have been demonstrated to be the cause of an ongoing epidemic of invasive infections and to be genetically related due to descent from a recent common progenitor.
Affiliation(s)
- Stephen B Beres
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, and Houston Methodist Hospital, Houston, TX 77030, USA
| | - Randall J Olsen
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, and Houston Methodist Hospital, Houston, TX 77030, USA.,Departments of Pathology and Laboratory Medicine and Microbiology and Immunology, Weill Cornell Medical College, NY 10021, USA
| | - Matthew Ojeda Saavedra
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, and Houston Methodist Hospital, Houston, TX 77030, USA
| | - Roisin Ure
- Scottish Haemophilus Legionella Meningococcus Pneumococcus Reference Laboratory, New Lister Building, Glasgow, G31 2ER, Scotland, UK
| | - Arlene Reynolds
- Scottish Haemophilus Legionella Meningococcus Pneumococcus Reference Laboratory, New Lister Building, Glasgow, G31 2ER, Scotland, UK
| | - Diane S J Lindsay
- Scottish Haemophilus Legionella Meningococcus Pneumococcus Reference Laboratory, New Lister Building, Glasgow, G31 2ER, Scotland, UK
| | - Andrew J Smith
- Scottish Haemophilus Legionella Meningococcus Pneumococcus Reference Laboratory, New Lister Building, Glasgow, G31 2ER, Scotland, UK.,College of Medical, Veterinary and Life Sciences, Glasgow Dental Hospital and School, University of Glasgow, 378 Sauchiehall Street, Glasgow, G2 3JZ, Scotland, UK
| | - James M Musser
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, and Houston Methodist Hospital, Houston, TX 77030, USA.,Departments of Pathology and Laboratory Medicine and Microbiology and Immunology, Weill Cornell Medical College, NY 10021, USA
|
113
|
Xu C, Zhang R, Sun G, Gleason ML. Comparative Genome Analysis Reveals Adaptation to the Ectophytic Lifestyle of Sooty Blotch and Flyspeck Fungi. Genome Biol Evol 2017; 9:3137-3151. [PMID: 29126189 PMCID: PMC5737583 DOI: 10.1093/gbe/evx229] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Accepted: 11/05/2017] [Indexed: 01/04/2023]
Abstract
Sooty blotch and flyspeck (SBFS) fungi are a distinctive group of plant pathogens which, although phylogenetically diverse, occupy an exclusively surface-dwelling niche. They cause economic losses by superficially blemishing the fruit of several tree crops, principally apple, in moist temperate regions worldwide. In this study, we performed genome-wide comparative analyses separately within three pairs of species of ascomycete pathogens; each pair contained an SBFS species as well as a closely related but plant-penetrating parasite (PPP) species. Our results showed that all three of the SBFS pathogens had significantly smaller genome sizes, gene numbers and repeat ratios than their counterpart PPPs. The pathogenicity-related genes encoding MFS transporters, secreted proteins (mainly effectors and peptidases), plant cell wall degrading enzymes, and secondary metabolism enzymes were also drastically reduced in the SBFS fungi compared with their PPP relatives. We hypothesize that the above differences in genome composition are due largely to different levels of acquisition, loss, expansion, and contraction of gene families and emergence of orphan genes. Furthermore, results suggested that horizontal gene transfer may have played a role, although limited, in the divergent evolutionary paths of SBFS pathogens and PPPs; repeat-induced point mutation could have inhibited the propagation of transposable elements and expansion of gene families in the SBFS group, given that this mechanism is stronger in the SBFS fungi than in their PPP relatives. These results substantially broaden understanding of evolutionary mechanisms of adaptation of fungi to the epicuticular niche of plants.
Affiliation(s)
- Chao Xu
- State Key Laboratory of Crop Stress Biology for Arid Areas and Department of Plant Pathology, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- Department of Plant Pathology, College of Plant Protection, Henan Agricultural University, Zhengzhou, Henan, China
| | - Rong Zhang
- State Key Laboratory of Crop Stress Biology for Arid Areas and Department of Plant Pathology, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
| | - Guangyu Sun
- State Key Laboratory of Crop Stress Biology for Arid Areas and Department of Plant Pathology, College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
| | - Mark L Gleason
- Department of Plant Pathology and Microbiology, Iowa State University, Ames, Iowa, USA
|
114
|
Savel D, LaFramboise T, Grama A, Koyuturk M. Pluribus-Exploring the Limits of Error Correction Using a Suffix Tree. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1378-1388. [PMID: 27362987 PMCID: PMC5754272 DOI: 10.1109/tcbb.2016.2586060] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Next generation sequencing technologies enable efficient and cost-effective genome sequencing. However, sequencing errors increase the complexity of the de novo assembly process and reduce the quality of the assembled sequences. Many error correction techniques utilizing substring frequencies have been developed to mitigate this effect. In this paper, we present a novel and effective method called Pluribus for correcting sequencing errors using a generalized suffix trie. Pluribus utilizes multiple manifestations of an error in the trie to accurately identify errors and suggest corrections. We show that Pluribus produces the fewest false positives across a diverse set of real sequencing datasets when compared to other methods. Furthermore, Pluribus can be used in conjunction with other contemporary error correction methods to achieve higher levels of accuracy than either tool alone. These increases in error correction accuracy are also realized in the quality of the contigs that are generated during assembly. We explore in depth the behavior of Pluribus to explain the observed improvement in accuracy and assembly performance. Pluribus is freely available at http://compbio.case.edu/pluribus/.
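The substring-frequency idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the Pluribus algorithm (which relies on a generalized suffix trie); it assumes plain fixed-length k-mer counting, and the function names, the toy reads, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(reads, k, min_count):
    """Return k-mers seen fewer than min_count times.

    Rare k-mers are candidates for containing a sequencing error,
    because a true genomic k-mer is expected to recur across
    overlapping reads. The threshold is an illustrative assumption.
    """
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n < min_count}


# Toy data: three identical reads and one read with a T->A change at position 3.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
suspects = weak_kmers(reads, k=4, min_count=2)
print(sorted(suspects))
```

In this toy input, every k-mer overlapping the changed base occurs only once, so all of them fall below the threshold, which is the kind of "multiple manifestations" signal that frequency-based correctors exploit.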
|
115
|
Squeakr: an exact and approximate k-mer counting system. Bioinformatics 2017; 34:568-575. [DOI: 10.1093/bioinformatics/btx636] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 03/29/2017] [Accepted: 10/06/2017] [Indexed: 11/14/2022]
|
116
|
Moura A, Tourdjman M, Leclercq A, Hamelin E, Laurent E, Fredriksen N, Van Cauteren D, Bracq-Dieye H, Thouvenot P, Vales G, Tessaud-Rita N, Maury MM, Alexandru A, Criscuolo A, Quevillon E, Donguy MP, Enouf V, de Valk H, Brisse S, Lecuit M. Real-Time Whole-Genome Sequencing for Surveillance of Listeria monocytogenes, France. Emerg Infect Dis 2017. [PMID: 28643628 PMCID: PMC5572858 DOI: 10.3201/eid2309.170336] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Indexed: 11/19/2022]
Abstract
During 2015–2016, we evaluated the performance of whole-genome sequencing (WGS) as a routine typing tool. Its added value for microbiological and epidemiologic surveillance of listeriosis was compared with that for pulsed-field gel electrophoresis (PFGE), the current standard method. A total of 2,743 Listeria monocytogenes isolates collected as part of routine surveillance were characterized in parallel by PFGE and core genome multilocus sequence typing (cgMLST) extracted from WGS. We investigated PFGE and cgMLST clusters containing human isolates. Discrimination of isolates was significantly higher by cgMLST than by PFGE (p<0.001). cgMLST discriminated unrelated isolates that shared identical PFGE profiles and phylogenetically closely related isolates with distinct PFGE profiles. This procedure also refined epidemiologic investigations to include only phylogenetically closely related isolates, improved source identification, and facilitated epidemiologic investigations, enabling identification of more outbreaks at earlier stages. WGS-based typing should replace PFGE as the primary typing method for L. monocytogenes.
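To make the cgMLST comparison described above more concrete, here is a small sketch of how two isolates' allele profiles might be compared. It assumes each profile is a mapping from locus name to allele number, with `None` for an uncalled locus; the locus names, the profile values, and the choice to ignore uncalled loci are hypothetical, and no clustering threshold from the study is implied.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci where both isolates have a called allele and the alleles differ.

    Loci missing from either profile, or uncalled (None) in either,
    are skipped rather than counted as differences.
    """
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical three-locus profiles; real cgMLST schemes use far more loci.
isolate_1 = {"locus_1": 1, "locus_2": 4, "locus_3": 7}
isolate_2 = {"locus_1": 1, "locus_2": 5, "locus_3": None}

distance = allelic_distance(isolate_1, isolate_2)
print(distance)
```

Because the distance counts individual allele mismatches across many loci, it can separate isolates that share a single banding pattern under a coarser method, which is consistent with the higher discrimination reported for cgMLST.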
|
117
|
Ochoa A, Onorato DP, Fitak RR, Roelke-Parker ME, Culver M. Evolutionary and Functional Mitogenomics Associated With the Genetic Restoration of the Florida Panther. J Hered 2017; 108:449-455. [PMID: 28204600 DOI: 10.1093/jhered/esx015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/02/2016] [Accepted: 02/14/2017] [Indexed: 01/02/2023]
Abstract
Florida panthers are endangered pumas that currently persist in reduced patches of habitat in South Florida, USA. We performed mitogenome reference-based assemblies for most parental lines of the admixed Florida panthers that resulted from the introduction of female Texas pumas into South Florida in 1995. With the addition of 2 puma mitogenomes, we characterized 174 single nucleotide polymorphisms (SNPs) across 12 individuals. We defined 5 haplotypes (Pco1-Pco5), one of which (Pco1) had a geographic origin exclusive to Costa Rica and Panama and was possibly introduced into the Everglades National Park, Florida, prior to 1995. Haplotype Pco2 was native to Florida. Haplotypes Pco3 and Pco4 were exclusive to Texas, whereas haplotype Pco5 had an undetermined geographic origin. Phylogenetic inference suggests that haplotypes Pco1-Pco4 diverged ~202000 (95% HPDI = 83000-345000) years ago and that haplotypes Pco2-Pco4 diverged ~61000 (95% HPDI = 9000-127000) years ago. These results are congruent with a south-to-north continental expansion and with a recent North American colonization by pumas. Furthermore, pumas may have migrated from Texas to Florida no earlier than ~44000 (95% HPDI = 2000-98000) years ago. Synonymous mutations presented a greater mean substitution rate than other mitochondrial functional regions: nonsynonymous mutations, tRNAs, rRNAs, and control region. Similarly, all protein-coding genes were under predominant negative selection constraints. We directly and indirectly assessed the presence of potential deleterious SNPs in the ND2 and ND5 genes in Florida panthers prior to and as a consequence of the introduction of Texas pumas. Screenings for such variants are recommended in extant Florida panthers.
Affiliation(s)
- Alexander Ochoa
- School of Natural Resources and the Environment, University of Arizona, Tucson, AZ 85721
- David P Onorato
- Fish and Wildlife Research Institute, Florida Fish and Wildlife Conservation Commission, Naples, FL 34114
- Robert R Fitak
- Department of Biology, Duke University, Durham, NC 27708
- Melody E Roelke-Parker
- Frederick National Laboratory of Cancer Research, Leidos Biomedical Research, Inc., Bethesda, MD 20892
- Melanie Culver
- School of Natural Resources and the Environment, University of Arizona, Tucson, AZ 85721; US Geological Survey, Arizona Cooperative Fish and Wildlife Research Unit, Tucson, AZ 85721
|
118
|
Sanchez-Larrayoz AF, Elhosseiny NM, Chevrette MG, Fu Y, Giunta P, Spallanzani RG, Ravi K, Pier GB, Lory S, Maira-Litrán T. Complexity of Complement Resistance Factors Expressed by Acinetobacter baumannii Needed for Survival in Human Serum. The Journal of Immunology 2017; 199:2803-2814. [PMID: 28855313 DOI: 10.4049/jimmunol.1700877] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 06/16/2017] [Accepted: 08/07/2017] [Indexed: 11/19/2022]
Abstract
Acinetobacter baumannii is a bacterial pathogen with increasing impact in healthcare settings, due in part to this organism's resistance to many antimicrobial agents, with pneumonia and bacteremia as the most common manifestations of disease. A significant proportion of clinically relevant A. baumannii strains are resistant to killing by normal human serum (NHS), an observation supported in this study by showing that 12 out of 15 genetically diverse strains of A. baumannii are resistant to NHS killing. To expand our understanding of the genetic basis of A. baumannii serum resistance, a transposon (Tn) sequencing (Tn-seq) approach was used to identify genes contributing to this trait. An ordered Tn library in strain AB5075 with insertions in every nonessential gene was subjected to selection in NHS. We identified 50 genes essential for the survival of A. baumannii in NHS, including already known serum resistance factors, and many novel genes not previously associated with serum resistance. This latter group included the maintenance of lipid asymmetry genetic pathway as a key determinant in protecting A. baumannii from the bactericidal activity of NHS via the alternative complement pathway. Follow-up studies validated the role of eight additional genes identified by Tn-seq in A. baumannii resistance to killing by NHS but not by normal mouse serum, highlighting the human species specificity of A. baumannii serum resistance. The identification of a large number of genes essential for serum resistance in A. baumannii indicates the degree of complexity needed for this phenotype, which might reflect a general pattern that pathogens rely on to cause serious infections.
Affiliation(s)
- Amaro F Sanchez-Larrayoz
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115; and
| | - Noha M Elhosseiny
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115
| | - Marc G Chevrette
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115; and
| | - Yang Fu
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115
| | - Peter Giunta
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115; and
| | - Raúl G Spallanzani
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115
| | - Keerthikka Ravi
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115; and
| | - Gerald B Pier
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115; and
| | - Stephen Lory
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115
| | - Tomás Maira-Litrán
- Division of Infectious Diseases, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115; and
|
119
|
Nicholson AC, Gulvik CA, Whitney AM, Humrighouse BW, Graziano J, Emery B, Bell M, Loparev V, Juieng P, Gartin J, Bizet C, Clermont D, Criscuolo A, Brisse S, McQuiston JR. Revisiting the taxonomy of the genus Elizabethkingia using whole-genome sequencing, optical mapping, and MALDI-TOF, along with proposal of three novel Elizabethkingia species: Elizabethkingia bruuniana sp. nov., Elizabethkingia ursingii sp. nov., and Elizabethkingia occulta sp. nov. Antonie van Leeuwenhoek 2017; 111:55-72. [PMID: 28856455 DOI: 10.1007/s10482-017-0926-3] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 03/02/2017] [Accepted: 08/07/2017] [Indexed: 10/19/2022]
Abstract
The genus Elizabethkingia is genetically heterogeneous, and the phenotypic similarities between recognized species pose challenges in correct identification of clinically derived isolates. In addition to the type species Elizabethkingia meningoseptica, and more recently proposed Elizabethkingia miricola, Elizabethkingia anophelis and Elizabethkingia endophytica, four genomospecies have long been recognized. By comparing historic DNA-DNA hybridization results with whole genome sequences, optical maps, and MALDI-TOF mass spectra on a large and diverse set of strains, we propose a comprehensive taxonomic revision of this genus. Genomospecies 1 and 2 contain the type strains E. anophelis and E. miricola, respectively. Genomospecies 3 and 4 are herein proposed as novel species named as Elizabethkingia bruuniana sp. nov. (type strain, G0146T = DSM 2975T = CCUG 69503T = CIP 111191T) and Elizabethkingia ursingii sp. nov. (type strain, G4122T = DSM 2974T = CCUG 69496T = CIP 111192T), respectively. Finally, the new species Elizabethkingia occulta sp. nov. (type strain G4070T = DSM 2976T = CCUG 69505T = CIP 111193T), is proposed.
Affiliation(s)
- Ainsley C Nicholson
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA.
| | - Christopher A Gulvik
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Anne M Whitney
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Ben W Humrighouse
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - James Graziano
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Brian Emery
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Melissa Bell
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Vladimir Loparev
- Division of Scientific Resources, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Phalasy Juieng
- Division of Scientific Resources, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Jarrett Gartin
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Chantal Bizet
- Microbiology Department, Institut Pasteur, Collection de L'Institut Pasteur (CIP), Paris, France
| | - Dominique Clermont
- Microbiology Department, Institut Pasteur, Collection de L'Institut Pasteur (CIP), Paris, France
| | - Alexis Criscuolo
- Institut Pasteur - Bioinformatics and Biostatistics Hub - C3BI, USR 3756 IP CNRS, Paris, France
| | - Sylvain Brisse
- Microbial Evolutionary Genomics, Institut Pasteur, Paris, France.,CNRS, UMR 3525, Paris, France.,Institut Pasteur, Biodiversity and Epidemiology of Bacterial Pathogens, Paris, France
| | - John R McQuiston
- Special Bacteriology Reference Laboratory, Bacterial Special Pathogens Branch, Division of High Consequence Pathogens and Pathology, Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
|
120
|
Hurtado-Ortiz R, Nazimoudine A, Criscuolo A, Hugon P, Mornico D, Brisse S, Bizet C, Clermont D. Psychrobacter pasteurii and Psychrobacter piechaudii sp. nov., two novel species within the genus Psychrobacter. Int J Syst Evol Microbiol 2017; 67:3192-3197. [PMID: 28840795 DOI: 10.1099/ijsem.0.002065] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Six Gram-negative, non-motile, non-spore-forming, non-pigmented, oxidase- and catalase-positive bacterial strains were deposited in 1972, in the Collection of the Institut Pasteur (CIP), Paris, France. The strains, previously identified as members of the genus Moraxella on the basis of their phenotypic and biochemical characteristics, were placed within the genus Psychrobacter based on the results from comparative 16S rRNA gene sequence studies. Their closest phylogenetic relatives were Psychrobacter sanguinis CIP 110993T, Psychrobacter phenylpyruvicus CIP 82.27T and Psychrobacter lutiphocae CIP 110018T. The DNA G+C contents were between 42.1 and 42.7 mol%. The predominant fatty acids were C18 : 1ω9c, C16 : 0, C12 : 0 3-OH, and C18 : 0. Average nucleotide identity between the six strains and their closest phylogenetic relatives, as well as their phenotypic characteristics, supported the assignment of these strains to two novel species within the genus Psychrobacter. The proposed names for these strains are Psychrobacter pasteurii sp. nov., for which the type strain is A1019T (=CIP 110853T=CECT 9184T), and Psychrobacter piechaudii sp. nov., for which the type strain is 1232T (=CIP110854T=CECT 9185T).
Affiliation(s)
- Raquel Hurtado-Ortiz
- CRBIP-Centre de Ressources Biologiques, Institut Pasteur, Paris, France.,CIP-Collection of Institut Pasteur, Institut Pasteur, Paris, France
| | | | - Alexis Criscuolo
- Hub Bioinformatique et Biostatistique - C3BI, USR 3756 IP CNRS - Institut Pasteur, Paris, France
| | - Perrine Hugon
- Microbial Evolutionary Genomics, Institut Pasteur, Paris, France
| | - Damien Mornico
- Hub Bioinformatique et Biostatistique - C3BI, USR 3756 IP CNRS - Institut Pasteur, Paris, France
| | - Sylvain Brisse
- Microbial Evolutionary Genomics, Institut Pasteur, Paris, France.,Centre National de la Recherche Scientifique (CNRS), UMR 3525, Paris, France.,Molecular Prevention and Therapy of Human Diseases, Institut Pasteur, Paris, France
| | - Chantal Bizet
- CIP-Collection of Institut Pasteur, Institut Pasteur, Paris, France.,CRBIP-Centre de Ressources Biologiques, Institut Pasteur, Paris, France
|
121
|
Evaluation of the impact of Illumina error correction tools on de novo genome assembly. BMC Bioinformatics 2017; 18:374. [PMID: 28821237 PMCID: PMC5563063 DOI: 10.1186/s12859-017-1784-8] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/21/2017] [Accepted: 08/11/2017] [Indexed: 01/20/2023]
Abstract
BACKGROUND Recently, many standalone applications have been proposed to correct sequencing errors in Illumina data. The key idea is that downstream analysis tools such as de novo genome assemblers benefit from a reduced error rate in the input data. Surprisingly, a systematic validation of this assumption using state-of-the-art assembly methods is lacking, even for recently published methods. RESULTS For twelve recent Illumina error correction tools (EC tools) we evaluated both their ability to correct sequencing errors and their ability to improve de novo genome assembly in terms of contig size and accuracy. CONCLUSIONS We confirm that most EC tools reduce the number of errors in sequencing data without introducing many new errors. However, we found that many EC tools suffer from poor performance in certain sequence contexts such as regions with low coverage or regions that contain short repeated or low-complexity sequences. Reads overlapping such regions are often ill-corrected in an inconsistent manner, leading to breakpoints in the resulting assemblies that are not present in assemblies obtained from uncorrected data. Resolving this systematic flaw in future EC tools could greatly improve the applicability of such tools.
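The abstract evaluates assemblies "in terms of contig size" without naming a metric. One widely used contig-size summary is N50, sketched below as an assumption about the kind of measure involved rather than a statement of what the study computed; the function name and the example contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    Returns 0 for an empty assembly.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths for two assemblies of the same genome.
uncorrected = [100, 80, 50, 30, 20]
fragmented = [60, 40, 40, 40, 30, 30, 20, 20]

print(n50(uncorrected), n50(fragmented))
```

An assembly broken at more breakpoints has more, shorter contigs and therefore a lower N50 for the same total length, which is why inconsistent read correction in repeats or low-coverage regions can show up in this kind of summary.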
|
122
|
Malhotra R, Jha M, Poss M, Acharya R. A random forest classifier for detecting rare variants in NGS data from viral populations. Comput Struct Biotechnol J 2017; 15:388-395. [PMID: 28819548 PMCID: PMC5548337 DOI: 10.1016/j.csbj.2017.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/14/2017] [Revised: 07/01/2017] [Accepted: 07/03/2017] [Indexed: 11/28/2022]
Abstract
We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of k-mers of varying lengths from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies k-mers as erroneous or as rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of k-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that k-mers of a given size constitute a frame. We evaluate MultiRes on simulated and real viral population datasets, which consist of many low-frequency variants, and compare it to the error detection methods used in correction tools known in the literature. MultiRes makes 4 to 500 times fewer false-positive k-mer predictions than other methods, which is essential for accurate estimation of viral population diversity and for de novo assembly. It has high recall of the true k-mers, comparable to other error correction methods. MultiRes also has greater than 95% recall for detecting single nucleotide polymorphisms (SNPs) and fewer false-positive SNPs, while detecting a higher number of rare variants than other variant calling methods for viral populations. The software is freely available on GitHub at https://github.com/raunaq-m/MultiRes.
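As a rough illustration of using k-mer counts at several lengths as classifier input, the sketch below builds a small feature vector for one k-mer. It is not the MultiRes feature definition: the choice of features (the count of the k-mer's prefix at each size), the function names, and the toy reads are assumptions made here for illustration, and the classifier training step is omitted.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multi_size_features(kmer, reads, sizes):
    """Build one feature per size: the count of the k-mer's prefix of that length.

    Each size must be no longer than the k-mer itself. Unseen
    substrings get a count of 0. This feature layout is hypothetical.
    """
    tables = {k: kmer_counts(reads, k) for k in sizes}
    return [tables[k][kmer[:k]] for k in sizes]


# Toy reads: two identical reads and one carrying a single-base difference.
reads = ["ACGTAC", "ACGTAC", "ACGAAC"]
features = multi_size_features("ACGA", reads, sizes=(2, 3, 4))
print(features)
```

Vectors like this, one per k-mer, could then be passed with labels to any random forest implementation; short substrings shared with the common sequence keep high counts while the full-length k-mer is rare, which is the kind of contrast a classifier can learn from.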
Affiliation(s)
- Raunaq Malhotra
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Manjari Jha
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA
| | - Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Raj Acharya
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
|
123
|
Dlugosz M, Deorowicz S. RECKONER: read error corrector based on KMC. Bioinformatics 2017; 33:1086-1089. [PMID: 28062451 DOI: 10.1093/bioinformatics/btw746] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/10/2016] [Accepted: 11/24/2016] [Indexed: 11/12/2022]
Abstract
Summary: Sequencing errors in data produced by next-generation sequencers affect the quality of downstream analyses. The accuracy of those analyses can be improved by error-correcting the sequencing reads. We introduce a new correction algorithm capable of processing high-error-rate data from eukaryotic genomes of close to 500 Mbp using less than 4 GB of RAM in about 35 min on a 16-core computer. Availability and Implementation: The program is freely available at http://sun.aei.polsl.pl/REFRESH/reckoner. Contact: sebastian.deorowicz@polsl.pl. Supplementary information: Supplementary data are available at Bioinformatics online.
|
124
|
Draft Genome Sequences of Listeria monocytogenes, Isolated from Fresh Leaf Vegetables in Owerri City, Nigeria. Genome Announcements 2017; 5:5/22/e00354-17. [PMID: 28572306 PMCID: PMC5454189 DOI: 10.1128/genomea.00354-17] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
Abstract
Here, we report the draft genome sequences of three Listeria monocytogenes isolates from fresh leaves collected in Nigeria, belonging to sequence types ST5 and ST155 (sublineages SL5 and SL155, respectively).
|
125
|
Evolutionary dynamics and genomic features of the Elizabethkingia anophelis 2015 to 2016 Wisconsin outbreak strain. Nat Commun 2017; 8:15483. [PMID: 28537263 PMCID: PMC5458099 DOI: 10.1038/ncomms15483] [Citation(s) in RCA: 121] [Impact Index Per Article: 15.1] [Received: 10/18/2016] [Accepted: 03/30/2017] [Indexed: 11/26/2022]
Abstract
An atypically large outbreak of Elizabethkingia anophelis infections occurred in Wisconsin. Here we show that it was caused by a single strain with thirteen characteristic genomic regions. Strikingly, the outbreak isolates show an accelerated evolutionary rate and an atypical mutational spectrum. Six phylogenetic sub-clusters with distinctive temporal and geographic dynamics are revealed, and their last common ancestor existed approximately one year before the first recognized human infection. Unlike other E. anophelis, the outbreak strain had a disrupted DNA repair mutY gene caused by insertion of an integrative and conjugative element. This genomic change probably contributed to the high evolutionary rate of the outbreak strain and may have increased its adaptability, as many mutations in protein-coding genes occurred during the outbreak. This unique discovery of an outbreak caused by a naturally occurring mutator bacterial pathogen provides a dramatic example of the potential impact of pathogen evolutionary dynamics on infectious disease epidemiology. Elizabethkingia anophelis is an emerging pathogen of high antimicrobial resistance. Perrin and colleagues sequenced isolates of a 2015/2016 E. anophelis outbreak in Wisconsin and found substantial genetic diversity, accelerated evolutionary rate and a disruptive mutation in the DNA repair gene mutY.
|
126
|
Population Genomic Analysis of 1,777 Extended-Spectrum Beta-Lactamase-Producing Klebsiella pneumoniae Isolates, Houston, Texas: Unexpected Abundance of Clonal Group 307. mBio 2017; 8:mBio.00489-17. [PMID: 28512093 PMCID: PMC5433097 DOI: 10.1128/mbio.00489-17] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Indexed: 11/20/2022]
Abstract
Klebsiella pneumoniae is a major human pathogen responsible for high morbidity and mortality rates. The emergence and spread of strains resistant to multiple antimicrobial agents and documented large nosocomial outbreaks are especially concerning. To develop new therapeutic strategies for K. pneumoniae, it is imperative to understand the population genomic structure of strains causing human infections. To address this knowledge gap, we sequenced the genomes of 1,777 extended-spectrum beta-lactamase-producing K. pneumoniae strains cultured from patients in the 2,000-bed Houston Methodist Hospital system between September 2011 and May 2015, representing a comprehensive, population-based strain sample. Strains of largely uncharacterized clonal group 307 (CG307) caused more infections than those of well-studied epidemic CG258. Strains varied markedly in gene content and had an extensive array of small and very large plasmids, often containing antimicrobial resistance genes. Some patients with multiple strains cultured over time were infected with genetically distinct clones. We identified 15 strains expressing the New Delhi metallo-beta-lactamase 1 (NDM-1) enzyme that confers broad resistance to nearly all beta-lactam antibiotics. Transcriptome sequencing analysis of 10 phylogenetically diverse strains showed that the global transcriptome of each strain was unique and highly variable. Experimental mouse infection provided new information about immunological parameters of host-pathogen interaction. We exploited the large data set to develop whole-genome sequence-based classifiers that accurately predict clinical antimicrobial resistance for 12 of the 16 antibiotics tested. 
We conclude that analysis of large, comprehensive, population-based strain samples can assist understanding of the molecular diversity of these organisms and contribute to enhanced translational research. IMPORTANCE Klebsiella pneumoniae causes human infections that are increasingly difficult to treat because many strains are resistant to multiple antibiotics. Clonal group 258 (CG258) organisms have caused outbreaks in health care settings worldwide. Using a comprehensive population-based sample of extended-spectrum beta-lactamase (ESBL)-producing K. pneumoniae strains, we show that a relatively uncommon clonal type, CG307, caused the plurality of ESBL-producing K. pneumoniae infections in our patients. We discovered that CG307 strains have been abundant in Houston for many years. As assessed by experimental mouse infection, CG307 strains were as virulent as pandemic CG258 strains. Our results may portend the emergence of an especially successful clonal group of antibiotic-resistant K. pneumoniae.
|
127
|
Schmidt B, Hildebrandt A. Next-generation sequencing: big data meets high performance computing. Drug Discov Today 2017; 22:712-717. [DOI: 10.1016/j.drudis.2017.01.014] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 08/17/2016] [Revised: 12/16/2016] [Accepted: 01/25/2017] [Indexed: 12/17/2022]
|
128
|
Zhao L, Chen Q, Li W, Jiang P, Wong L, Li J. MapReduce for accurate error correction of next-generation sequencing data. Bioinformatics 2017; 33:3844-3851. [PMID: 28205674 DOI: 10.1093/bioinformatics/btx089] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/01/2016] [Accepted: 02/14/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- Liang Zhao
- School of Computing and Electronic Information, Guangxi University, Nanning, China
- Taihe Hospital, Hubei University of Medicine, Hubei, China
| | - Qingfeng Chen
- School of Computing and Electronic Information, Guangxi University, Nanning, China
| | - Wencui Li
- Taihe Hospital, Hubei University of Medicine, Hubei, China
| | - Peng Jiang
- School of Computing and Electronic Information, Guangxi University, Nanning, China
| | - Limsoon Wong
- School of Computing, National University of Singapore, Singapore, Singapore
| | - Jinyan Li
- Advanced Analytics Institute and Centre for Health Technologies, University of Technology Sydney, Broadway, NSW, Australia
| |
|
129
|
Saraka D, Savin C, Kouassi S, Cissé B, Koffi E, Cabanel N, Brémont S, Faye-Kette H, Dosso M, Carniel E. Yersinia enterocolitica, a Neglected Cause of Human Enteric Infections in Côte d'Ivoire. PLoS Negl Trop Dis 2017; 11:e0005216. [PMID: 28081123 PMCID: PMC5230755 DOI: 10.1371/journal.pntd.0005216] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 08/05/2016] [Accepted: 11/30/2016] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Enteropathogenic Yersinia circulate in the pig reservoir and are the third bacterial cause of human gastrointestinal infections in Europe. In West Africa, reports of human yersiniosis are rare. This study was conducted to determine whether pathogenic Yersinia are circulating in pig farms and are responsible for human infections in the Abidjan District. METHODOLOGY/PRINCIPAL FINDINGS From June 2012 to December 2013, pig feces were collected monthly in 41 swine farms of the Abidjan district. Of the 781 samples collected, 19 Yersinia strains were isolated in 3 farms: 7 non-pathogenic Yersinia intermedia and 12 pathogenic Yersinia enterocolitica bioserotype 4/O:3. Farm animals other than pigs and wild animals were not found infected. Furthermore, 2 Y. enterocolitica 4/O:3 strains were isolated from 426 fecal samples of patients with digestive disorders. All 14 Y. enterocolitica strains shared the same PFGE and MLVA profile, indicating their close genetic relationship. However, while 6 of them displayed the usual phage type VIII, the other 8 had the highly infrequent phage type XI. Whole genome sequencing and SNP analysis of individual colonies revealed that phage type XI strains had unusually high rates of mutations. These strains displayed a hypermutator phenotype that was attributable to a large deletion in the mutS gene involved in DNA mismatch repair. CONCLUSIONS/SIGNIFICANCE This study demonstrates that pathogenic Y. enterocolitica circulate in the pig reservoir in Côte d'Ivoire and cause human infections with a prevalence comparable to that of many developed countries. The paucity of reports of yersiniosis in West Africa is most likely attributable to a lack of active detection rather than to an absence of the microorganism. The identification of hypermutator strains in pigs and humans is of concern as these strains can rapidly acquire selective advantages that may increase their fitness, pathogenicity or resistance to commonly used treatments.
Affiliation(s)
- Daniel Saraka
- Environnement and Health department, Institut Pasteur, Abidjan, Côte d'Ivoire
| | - Cyril Savin
- Yersinia Research Unit and National Reference Laboratory, Institut Pasteur, Paris, France
| | - Stephane Kouassi
- Environnement and Health department, Institut Pasteur, Abidjan, Côte d'Ivoire
| | - Bakary Cissé
- Environnement and Health department, Institut Pasteur, Abidjan, Côte d'Ivoire
| | - Eugène Koffi
- Environnement and Health department, Institut Pasteur, Abidjan, Côte d'Ivoire
| | - Nicolas Cabanel
- Yersinia Research Unit and National Reference Laboratory, Institut Pasteur, Paris, France
| | - Sylvie Brémont
- Yersinia Research Unit and National Reference Laboratory, Institut Pasteur, Paris, France
| | - Hortense Faye-Kette
- Bacteriology and Virology department, Institut Pasteur, Abidjan, Côte d'Ivoire
| | - Mireille Dosso
- Bacteriology and Virology department, Institut Pasteur, Abidjan, Côte d'Ivoire
| | - Elisabeth Carniel
- Yersinia Research Unit and National Reference Laboratory, Institut Pasteur, Paris, France
| |
|
130
|
Cantalapiedra CP, García-Pereira MJ, Gracia MP, Igartua E, Casas AM, Contreras-Moreira B. Large Differences in Gene Expression Responses to Drought and Heat Stress between Elite Barley Cultivar Scarlett and a Spanish Landrace. FRONTIERS IN PLANT SCIENCE 2017; 8:647. [PMID: 28507554 PMCID: PMC5410667 DOI: 10.3389/fpls.2017.00647] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/17/2016] [Accepted: 04/10/2017] [Indexed: 05/05/2023]
Abstract
Drought causes important losses in crop production every season. Improvement for drought tolerance could take advantage of the diversity held in germplasm collections, much of which has not been incorporated yet into modern breeding. Spanish landraces constitute a promising resource for barley breeding, as they were widely grown until last century and still show good yielding ability under stress. Here, we study the transcriptome expression landscape in two genotypes, an outstanding Spanish landrace-derived inbred line (SBCC073) and a modern cultivar (Scarlett). Gene expression of adult plants after prolonged stresses, either drought or drought combined with heat, was monitored. Transcriptome of mature leaves presented little changes under severe drought, whereas abundant gene expression changes were observed under combined mild drought and heat. Developing inflorescences of SBCC073 exhibited mostly unaltered gene expression, whereas numerous changes were found in the same tissues for Scarlett. Genotypic differences in physiological traits and gene expression patterns confirmed the different behavior of landrace SBCC073 and cultivar Scarlett under abiotic stress, suggesting that they responded to stress following different strategies. A comparison with related studies in barley, addressing gene expression responses to drought, revealed common biological processes, but moderate agreement regarding individual differentially expressed transcripts. Special emphasis was put in the search of co-expressed genes and underlying common regulatory motifs. Overall, 11 transcription factors were identified, and one of them matched cis-regulatory motifs discovered upstream of co-expressed genes involved in those responses.
Affiliation(s)
- Carlos P. Cantalapiedra
- Department of Genetics and Plant Production, Estación Experimental de Aula Dei (CSIC), Zaragoza, Spain
| | - María J. García-Pereira
- Department of Genetics and Plant Production, Estación Experimental de Aula Dei (CSIC), Zaragoza, Spain
| | - María P. Gracia
- Department of Genetics and Plant Production, Estación Experimental de Aula Dei (CSIC), Zaragoza, Spain
| | - Ernesto Igartua
- Department of Genetics and Plant Production, Estación Experimental de Aula Dei (CSIC), Zaragoza, Spain
| | - Ana M. Casas
- Department of Genetics and Plant Production, Estación Experimental de Aula Dei (CSIC), Zaragoza, Spain
| | - Bruno Contreras-Moreira
- Department of Genetics and Plant Production, Estación Experimental de Aula Dei (CSIC), Zaragoza, Spain
- Fundación ARAID, Zaragoza, Spain
- *Correspondence: Bruno Contreras-Moreira
| |
|
131
|
Contreras-Moreira B, Cantalapiedra CP, García-Pereira MJ, Gordon SP, Vogel JP, Igartua E, Casas AM, Vinuesa P. Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species. FRONTIERS IN PLANT SCIENCE 2017; 8:184. [PMID: 28261241 PMCID: PMC5306281 DOI: 10.3389/fpls.2017.00184] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 10/24/2016] [Accepted: 01/30/2017] [Indexed: 05/22/2023]
Abstract
The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth, and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to under-estimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful to explore germplasm diversity.
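The core/accessory rule stated in the abstract (core if detected in all accessions, accessory when absent in some) can be sketched in a few lines. This is only an illustration of the classification rule, not the GET_HOMOLOGUES-EST implementation; the function name `classify_clusters`, the cluster identifiers, and the accession labels are all hypothetical.

```python
from typing import Dict, Iterable, Set


def classify_clusters(
    clusters: Dict[str, Set[str]], accessions: Iterable[str]
) -> Dict[str, str]:
    """Label each sequence cluster as 'core' or 'accessory'.

    `clusters` maps a (hypothetical) cluster id to the set of accessions
    in which at least one member sequence was detected.
    """
    all_accessions = set(accessions)
    return {
        cluster_id: "core" if all_accessions <= members else "accessory"
        for cluster_id, members in clusters.items()
    }


# Toy input: three accessions, two clusters (made-up data).
accessions = ["acc1", "acc2", "acc3"]
clusters = {
    "cluster_a": {"acc1", "acc2", "acc3"},  # present everywhere
    "cluster_b": {"acc1", "acc3"},          # missing from acc2
}
labels = classify_clusters(clusters, accessions)
```

Here `cluster_a` is labelled core and `cluster_b` accessory. Real pipelines typically also apply identity and coverage thresholds before deciding that a sequence is "detected", which this sketch leaves out.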
Affiliation(s)
- Bruno Contreras-Moreira
- Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas, Zaragoza, Spain; Fundación ARAID, Zaragoza, Spain
| | - Carlos P Cantalapiedra
- Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas Zaragoza, Spain
| | - María J García-Pereira
- Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas Zaragoza, Spain
| | | | - John P Vogel
- DOE Joint Genome Institute, Walnut Creek CA, USA
| | - Ernesto Igartua
- Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas Zaragoza, Spain
| | - Ana M Casas
- Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas Zaragoza, Spain
| | - Pablo Vinuesa
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México Cuernavaca, Mexico
| |
|
132
|
Genomic Landscape of Intrahost Variation in Group A Streptococcus: Repeated and Abundant Mutational Inactivation of the fabT Gene Encoding a Regulator of Fatty Acid Synthesis. Infect Immun 2016; 84:3268-3281. [PMID: 27600505 DOI: 10.1128/iai.00608-16] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/15/2016] [Accepted: 08/08/2016] [Indexed: 01/03/2023] Open
Abstract
To obtain new information about Streptococcus pyogenes intrahost genetic variation during invasive infection, we sequenced the genomes of 2,954 serotype M1 strains recovered from a nonhuman primate experimental model of necrotizing fasciitis. A total of 644 strains (21.8%) acquired polymorphisms relative to the input parental strain. The fabT gene, encoding a transcriptional regulator of fatty acid biosynthesis genes, contained 54.5% of these changes. The great majority of polymorphisms were predicted to deleteriously alter FabT function. Transcriptome-sequencing (RNA-seq) analysis of a wild-type strain and an isogenic fabT deletion mutant strain found that between 3.7 and 28.5% of the S. pyogenes transcripts were differentially expressed, depending on the growth temperature (35°C or 40°C) and growth phase (mid-exponential or stationary phase). Genes implicated in fatty acid synthesis and lipid metabolism were significantly upregulated in the fabT deletion mutant strain. FabT also directly or indirectly regulated central carbon metabolism genes, including pyruvate hub enzymes and fermentation pathways and virulence genes. Deletion of fabT decreased virulence in a nonhuman primate model of necrotizing fasciitis. In addition, the fabT deletion strain had significantly decreased survival in human whole blood and during phagocytic interaction with polymorphonuclear leukocytes ex vivo. We conclude that FabT mutant progeny arise during infection, constitute a metabolically distinct subpopulation, and are less virulent in the experimental models used here.
|
133
|
Gallone B, Steensels J, Prahl T, Soriaga L, Saels V, Herrera-Malaver B, Merlevede A, Roncoroni M, Voordeckers K, Miraglia L, Teiling C, Steffy B, Taylor M, Schwartz A, Richardson T, White C, Baele G, Maere S, Verstrepen KJ. Domestication and Divergence of Saccharomyces cerevisiae Beer Yeasts. Cell 2016; 166:1397-1410.e16. [PMID: 27610566 PMCID: PMC5018251 DOI: 10.1016/j.cell.2016.08.020] [Citation(s) in RCA: 427] [Impact Index Per Article: 47.4] [Received: 03/24/2016] [Revised: 06/08/2016] [Accepted: 08/08/2016] [Indexed: 12/04/2022]
Abstract
Whereas domestication of livestock, pets, and crops is well documented, it is still unclear to what extent microbes associated with the production of food have also undergone human selection and where the plethora of industrial strains originates from. Here, we present the genomes and phenomes of 157 industrial Saccharomyces cerevisiae yeasts. Our analyses reveal that today’s industrial yeasts can be divided into five sublineages that are genetically and phenotypically separated from wild strains and originate from only a few ancestors through complex patterns of domestication and local divergence. Large-scale phenotyping and genome analysis further show strong industry-specific selection for stress tolerance, sugar utilization, and flavor production, while the sexual cycle and other phenotypes related to survival in nature show decay, particularly in beer yeasts. Together, these results shed light on the origins, evolutionary history, and phenotypic diversity of industrial yeasts and provide a resource for further selection of superior strains.
We sequenced and phenotyped 157 S. cerevisiae yeasts. Present-day industrial yeasts originate from only a few domesticated ancestors. Beer yeasts show strong genetic and phenotypic hallmarks of domestication. Domestication of industrial yeasts predates microbe discovery.
Affiliation(s)
- Brigida Gallone
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium; Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
| | - Jan Steensels
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
| | - Troels Prahl
- White Labs, 9495 Candida Street, San Diego, CA 92126, USA
| | - Leah Soriaga
- Synthetic Genomics, 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Veerle Saels
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
| | - Beatriz Herrera-Malaver
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
| | - Adriaan Merlevede
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
| | - Miguel Roncoroni
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
| | - Karin Voordeckers
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium
| | - Loren Miraglia
- Encinitas Brewing Science, 141 Rodney Avenue, Encinitas, CA 92024, USA
| | | | - Brian Steffy
- Illumina, 5200 Illumina Way, San Diego, CA 92122, USA
| | - Maryann Taylor
- Biological & Popular Culture (BioPop), 2205 Faraday Avenue, Suite E, Carlsbad, CA 92008, USA
| | - Ariel Schwartz
- Synthetic Genomics, 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
| | - Toby Richardson
- Synthetic Genomics, 11149 North Torrey Pines Road, La Jolla, CA 92037, USA
| | | | - Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, 3000 Leuven, Belgium
| | - Steven Maere
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium.
| | - Kevin J Verstrepen
- Laboratory for Genetics and Genomics, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, 3001 Leuven, Belgium; Laboratory for Systems Biology, VIB, Bio-Incubator, Gaston Geenslaan 1, 3001 Leuven, Belgium.
| |
|
134
|
From next-generation resequencing reads to a high-quality variant data set. Heredity (Edinb) 2016; 118:111-124. [PMID: 27759079 DOI: 10.1038/hdy.2016.102] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 04/24/2016] [Revised: 09/03/2016] [Accepted: 09/06/2016] [Indexed: 12/11/2022] Open
Abstract
Sequencing has revolutionized biology by permitting the analysis of genomic variation at an unprecedented resolution. High-throughput sequencing is fast and inexpensive, making it accessible for a wide range of research topics. However, the produced data contain subtle but complex types of errors, biases and uncertainties that impose several statistical and computational challenges to the reliable detection of variants. To tap the full potential of high-throughput sequencing, a thorough understanding of the data produced as well as the available methodologies is required. Here, I review several commonly used methods for generating and processing next-generation resequencing data, discuss the influence of errors and biases together with their resulting implications for downstream analyses and provide general guidelines and recommendations for producing high-quality single-nucleotide polymorphism data sets from raw reads by highlighting several sophisticated reference-based methods representing the current state of the art.
|
135
|
Latronico F, Nasser W, Puhakainen K, Ollgren J, Hyyryläinen HL, Beres SB, Lyytikäinen O, Jalava J, Musser JM, Vuopio J. Genomic Characteristics Behind the Spread of Bacteremic Group A Streptococcus Type emm89 in Finland, 2004-2014. J Infect Dis 2016; 214:1987-1995. [PMID: 27707808 PMCID: PMC5142090 DOI: 10.1093/infdis/jiw468] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/06/2016] [Accepted: 09/27/2016] [Indexed: 12/20/2022] Open
Abstract
Background. Many countries worldwide have reported increasing numbers of emm89 group A Streptococcus (GAS) infections during the last decade. Pathogen genetic factors linked to this increase need assessment. Methods. We investigated epidemiological characteristics of emm89 GAS bacteremic infections, including 7-day and 30-day case-fatality rates, in Finland during 2004–2014 and linked them to whole-genome sequencing data obtained from corresponding strains. The Fisher exact test and exact logistic regression were used to compare differences between bacteremic infections due to emm89 GAS belonging to different genetic clades and subclades. Results. Out of 1928 cases of GAS bacteremic infection, 278 were caused by emm89 GAS. We identified 2 genetically distinct clades, arbitrarily designated clade 2 and clade 3. Both clades were present during 2004–2008, but clade 3 increased rapidly from 2009 onward. Six subclades (designated subclades A–F) were identified within clade 3, based on phylogenetic core genome analysis. The case-fatality rate differed significantly between subclades (P < .05), with subclade D having the highest 30-day estimated case-fatality rate (19% vs 3%–14%). Conclusions. A new emm89 clone, clade 3, emerged in 2009 and spread rapidly in Finland. Patients infected with certain subclades of clade 3 were significantly more likely to die. A specific polymerase chain reaction assay was developed to follow the spread of subclade D in 2015.
Affiliation(s)
- Francesca Latronico
- Department of Infectious Diseases, National Institute for Health and Welfare, Helsinki; European Programme for Public Health Microbiology Training, European Centre for Disease Prevention and Control, Stockholm, Sweden
| | - Waleed Nasser
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, Texas
| | - Kai Puhakainen
- Department of Infectious Diseases, National Institute for Health and Welfare, Helsinki; Department of Medical Microbiology and Immunology, University of Turku, Finland
| | - Jukka Ollgren
- Department of Infectious Diseases, National Institute for Health and Welfare, Helsinki
| | | | - Stephen B Beres
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, Texas
| | - Outi Lyytikäinen
- Department of Infectious Diseases, National Institute for Health and Welfare, Helsinki
| | - Jari Jalava
- Department of Infectious Diseases, National Institute for Health and Welfare, Helsinki
| | - James M Musser
- Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, Texas
| | - Jaana Vuopio
- Department of Infectious Diseases, National Institute for Health and Welfare, Helsinki; Department of Medical Microbiology and Immunology, University of Turku, Finland
| |
|
136
|
Akogwu I, Wang N, Zhang C, Gong P. A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis. Hum Genomics 2016; 10 Suppl 2:20. [PMID: 27461106 PMCID: PMC4965716 DOI: 10.1186/s40246-016-0068-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Innumerable opportunities for new genomic research have been stimulated by advancement in high-throughput next-generation sequencing (NGS). However, the pitfall of NGS data abundance is the difficulty of distinguishing true biological variants from sequence error alterations during downstream analysis. Many error correction methods have been developed to correct erroneous NGS reads before further analysis, but independent evaluation of the impact of such dataset features as read length, genome size, and coverage depth on their performance is lacking. This comparative study aims to investigate the strengths, weaknesses, and limitations of some of the newest k-spectrum-based methods and to provide recommendations for users in selecting suitable methods with respect to specific NGS datasets. METHODS Six k-spectrum-based methods, i.e., Reptile, Musket, Bless, Bloocoo, Lighter, and Trowel, were compared using six simulated sets of paired-end Illumina sequencing data. These NGS datasets varied in coverage depth (10× to 120×), read length (36 to 100 bp), and genome size (4.6 to 143 MB). Error Correction Evaluation Toolkit (ECET) was employed to derive a suite of metrics (i.e., true positives, false positives, false negatives, recall, precision, gain, and F-score) for assessing the correction quality of each method. RESULTS Results from computational experiments indicate that Musket had the best overall performance across the spectra of examined variants reflected in the six datasets. The lowest accuracy of Musket (F-score = 0.81) occurred on a dataset with a medium read length (56 bp), a medium coverage (50×), and a small-sized genome (5.4 MB). The other five methods underperformed (F-score < 0.80) and/or failed to process one or more datasets. CONCLUSIONS This study demonstrates that various factors such as coverage depth, read length, and genome size may influence performance of individual k-spectrum-based error correction methods. 
Thus, care must be taken in choosing appropriate methods for error correction of specific NGS datasets. Based on our comparative study, we recommend Musket as the top choice because of its consistently superior performance across all six testing datasets. Further extensive studies are warranted to assess these methods using experimental datasets generated by NGS platforms (e.g., 454, SOLiD, and Ion Torrent) under more diversified parameter settings (k-mer values and edit distances) and to compare them against other non-k-spectrum-based classes of error correction methods.
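The abstract names recall, precision, gain, and F-score without defining them. The sketch below uses the definitions commonly found in the error-correction literature; these formulas are an assumption, since the abstract itself does not confirm how ECET computes them. The function name `correction_metrics` and the example counts are hypothetical.

```python
from typing import Dict


def correction_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute correction-quality metrics from raw counts.

    tp: erroneous bases correctly fixed
    fp: correct bases wrongly changed
    fn: erroneous bases left uncorrected
    Assumed (commonly used) definitions, not taken from the abstract:
      recall    = TP / (TP + FN)
      precision = TP / (TP + FP)
      gain      = (TP - FP) / (TP + FN)
      F-score   = 2 * precision * recall / (precision + recall)
    """
    errors = tp + fn      # all true errors in the reads
    changes = tp + fp     # all edits the tool made
    recall = tp / errors if errors else 0.0
    precision = tp / changes if changes else 0.0
    gain = (tp - fp) / errors if errors else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "gain": gain, "f_score": f_score}


# Made-up counts for illustration only.
metrics = correction_metrics(tp=80, fp=10, fn=20)
```

Under these definitions, gain can go negative when a tool introduces more errors than it removes, which is why it is often reported alongside the F-score.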
Affiliation(s)
- Isaac Akogwu
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Nan Wang
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Chaoyang Zhang
- School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA
| | - Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, 39180, USA.
| |
|
137
|
Mamun AA, Pal S, Rajasekaran S. KCMBT: a k-mer Counter based on Multiple Burst Trees. Bioinformatics 2016; 32:2783-90. [PMID: 27283950 DOI: 10.1093/bioinformatics/btw345] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 01/25/2016] [Accepted: 05/25/2016] [Indexed: 01/30/2023] Open
Abstract
MOTIVATION A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection and many other related applications use a k-mer counter as a building block. Very fast and efficient algorithms are necessary to count k-mers in large data sets to be useful in such applications. RESULTS We propose a novel trie-based algorithm for this k-mer counting problem. We compare our devised algorithm k-mer Counter based on Multiple Burst Trees (KCMBT) with all available well-known algorithms. Our experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm KMC2 for a human genome dataset. As another example, our algorithm is around six times faster than Jellyfish2. Overall, KCMBT is 20-30% faster than KMC2 on five benchmark data sets when both the algorithms were run using multiple threads. AVAILABILITY AND IMPLEMENTATION KCMBT is freely available on GitHub: (https://github.com/abdullah009/kcmbt_mt). CONTACT rajasek@engr.uconn.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
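To make the k-mer counting problem concrete, here is a minimal baseline that counts k-length substrings with a hash map. It is deliberately not the trie-based KCMBT algorithm, only the simplest correct counter for comparison; the function name `count_kmers` and the toy sequence are hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Return the frequency of every k-length substring of `sequence`.

    Plain hash-map baseline for illustration; KCMBT itself uses burst tries.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # There are len(seq) - k + 1 windows; none if the sequence is shorter than k.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy sequence (made up): windows are ACG, CGT, GTA, TAC, ACG, CGT.
counts = count_kmers("ACGTACGT", 3)
```

Production counters usually also merge each k-mer with its reverse complement and avoid materialising every substring as a separate string, both of which this sketch skips.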
Affiliation(s)
- Abdullah-Al Mamun
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Soumitra Pal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Sanguthevar Rajasekaran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
| |
|
138
|
Liu Y, Hankeln T, Schmidt B. Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:592-598. [PMID: 27295644 DOI: 10.1109/tcbb.2015.2430314] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. After gaining the BWT, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has been evaluated using two sequences generated from two human genome assemblies: the Ensembl Homo sapiens assembly and the human reference genome. Our performance comparison to FMD-index and Bwt-disk reveals that on 12 CPU cores, ParaBWT runs up to 2.2× faster than FMD-index and up to 99.0× faster than Bwt-disk. BWT construction algorithms for very long genomic sequences are time consuming and (due to their incremental nature) inherently difficult to parallelize. Thus, their parallelization is challenging and even relatively small speedups like the ones of our method over FMD-index are of high importance to research. ParaBWT is written in C++, and is freely available at http://parabwt.sourceforge.net.
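For readers unfamiliar with the two structures, the relationship between a suffix array and the BWT can be shown with a naive construction. This sketch sorts suffixes directly and derives the BWT from the suffix array, which is the opposite order from ParaBWT (BWT first, then suffix array) and far too slow and memory-hungry for genome-scale input. The function names and the `$` sentinel convention are assumptions made for illustration.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: List[int]) -> str:
    """BWT character for each sorted suffix is the character preceding it.

    For the suffix starting at 0, text[-1] wraps around to the sentinel.
    """
    return "".join(text[i - 1] for i in sa)


# '$' is appended as a sentinel that sorts before A/C/G/T and lowercase letters.
text = "banana" + "$"
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)
```

For `banana$` this yields the suffix array `[6, 5, 3, 1, 0, 4, 2]` and the BWT `annb$aa`.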
|
139
|
Heo Y, Ramachandran A, Hwu WM, Ma J, Chen D. BLESS 2: accurate, memory-efficient and fast error correction method. Bioinformatics 2016; 32:2369-71. [PMID: 27153708 DOI: 10.1093/bioinformatics/btw146] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/25/2015] [Accepted: 03/12/2016] [Indexed: 11/14/2022]
Abstract
The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. AVAILABILITY AND IMPLEMENTATION: Freely available at https://sourceforge.net/projects/bless-ec. CONTACT: dchen@illinois.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yun Heo
  - Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Anand Ramachandran
  - Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Wen-Mei Hwu
  - Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Jian Ma
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Deming Chen
  - Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
140
|
Madoui MA, Dossat C, d'Agata L, van Oeveren J, van der Vossen E, Aury JM. MaGuS: a tool for quality assessment and scaffolding of genome assemblies with Whole Genome Profiling™ Data. BMC Bioinformatics 2016; 17:115. [PMID: 26936254 PMCID: PMC4776351 DOI: 10.1186/s12859-016-0969-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/19/2015] [Accepted: 02/23/2016] [Indexed: 12/20/2022] Open
Abstract
Background: Scaffolding is an essential step in the genome assembly process. Current methods based on large fragment paired-end reads or long reads allow an increase in contiguity but often lack consistency in repetitive regions, resulting in fragmented assemblies. Here, we describe a novel tool to link assemblies to a genome map to aid complex genome reconstruction by detecting assembly errors and allowing scaffold ordering and anchoring. Results: We present MaGuS (map-guided scaffolding), a modular tool that uses a draft genome assembly, a Whole Genome Profiling™ (WGP) map, and high-throughput paired-end sequencing data to estimate the quality and to enhance the contiguity of an assembly. We generated several assemblies of the Arabidopsis genome using different scaffolding programs and applied MaGuS to select the best assembly using quality metrics. Then, we used MaGuS to perform map-guided scaffolding to increase contiguity by creating new scaffold links in low-covered and highly repetitive regions where other commonly used scaffolding methods lack consistency. Conclusions: MaGuS is a powerful reference-free evaluator of assembly quality and a WGP map-guided scaffolder that is freely available at https://github.com/institut-de-genomique/MaGuS. Its use can be extended to other high-throughput sequencing data (e.g., long-read data) and also to other map data (e.g., genetic maps) to improve the quality and the contiguity of large and complex genome assemblies. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0969-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mohammed-Amin Madoui
  - CEA, DSV, Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, CP5706, 91057, Evry, France.
- Carole Dossat
  - CEA, DSV, Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, CP5706, 91057, Evry, France.
- Léo d'Agata
  - CEA, DSV, Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, CP5706, 91057, Evry, France.
- Jan van Oeveren
  - Keygene NV, Agro Business Park 90, 6708 PW, Wageningen, The Netherlands.
- Jean-Marc Aury
  - CEA, DSV, Institut de Génomique, Genoscope, 2 rue Gaston Crémieux, CP5706, 91057, Evry, France.
|
141
|
Sameith K, Roscito JG, Hiller M. Iterative error correction of long sequencing reads maximizes accuracy and improves contig assembly. Brief Bioinform 2016; 18:1-8. [PMID: 26868358 PMCID: PMC5221426 DOI: 10.1093/bib/bbw003] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/22/2015] [Revised: 01/02/2016] [Indexed: 11/13/2022] Open
Abstract
Next-generation sequencers such as Illumina can now produce reads up to 300 bp with high throughput, which is attractive for genome assembly. A first step in genome assembly is to computationally correct sequencing errors. However, correcting all errors in these longer reads is challenging. Here, we show that reads with remaining errors after correction often overlap repeats, where short erroneous k-mers occur in other copies of the repeat. We developed an iterative error correction pipeline that runs the previously published String Graph Assembler (SGA) in multiple rounds of k-mer-based correction with an increasing k-mer size, followed by a final round of overlap-based correction. By combining the advantages of small and large k-mers, this approach corrects more errors in repeats and minimizes the total amount of erroneous reads. We show that higher read accuracy increases contig lengths two to three times. We provide SGA-Iteratively Correcting Errors (https://github.com/hillerlab/IterativeErrorCorrection/) that implements iterative error correction by using modules from SGA.
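The multi-round idea in this abstract (k-mer-based correction repeated with an increasing k-mer size) can be sketched as a simple loop. The code below is a toy, greedy single-substitution corrector written for illustration only: it is not the SGA algorithm, it omits the final overlap-based round, and the function names, the `min_count` threshold and the example reads are all assumptions.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurring in the read set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def correct_read(read: str, counts: Counter, k: int, min_count: int) -> str:
    """Greedily pick the single substitution that leaves the fewest weak k-mers."""
    def weak(seq: str) -> int:
        return sum(counts[seq[i:i + k]] < min_count
                   for i in range(len(seq) - k + 1))

    best, best_weak = read, weak(read)
    if best_weak == 0:
        return read  # every k-mer is already trusted
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            n_weak = weak(candidate)
            if n_weak < best_weak:
                best, best_weak = candidate, n_weak
    return best


def iterative_correct(reads: list[str], ks: list[int], min_count: int) -> list[str]:
    """Run one correction round per k, recounting k-mers before each round."""
    for k in ks:
        counts = kmer_counts(reads, k)
        reads = [correct_read(r, counts, k, min_count) for r in reads]
    return reads


# Five error-free copies plus one read with a substitution at position 7.
reads = ["ACGTACGGTC"] * 5 + ["ACGTACGCTC"]
corrected = iterative_correct(reads, ks=[3, 5], min_count=2)
```

Recounting k-mers at the start of each round means later rounds with larger k operate on the partially corrected reads, which is the property the abstract attributes to the iterative scheme.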
Affiliation(s)
- Katrin Sameith
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Juliana G Roscito
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
  - Corresponding author. Michael Hiller. Max Planck Institute of Molecular Cell Biology and Genetics & Max Planck Institute for the Physics of Complex Systems, 01307 Dresden, Germany. E-mail:
|
142
|
Tong L, Yang C, Wu PY, Wang MD. Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls. IEEE-EMBS International Conference on Biomedical and Health Informatics 2016; 2016:74-77. [PMID: 27532064 DOI: 10.1109/bhi.2016.7455838] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
Sequencing errors are a major issue for several next-generation sequencing-based applications such as de novo assembly and single nucleotide polymorphism detection. Several error-correction methods have been developed to improve raw data quality. However, error-correction performance is hard to evaluate because of the lack of a ground truth. In this study, we propose a novel approach that uses ERCC RNA spike-in controls as the ground truth to facilitate error-correction performance evaluation. After aligning raw and corrected RNA-seq data, we characterized the quality of reads by three metrics: mismatch patterns (e.g., the substitution rate of A to C) of reads aligned with one mismatch, mismatch patterns of reads aligned with two mismatches, and the percentage increase of reads aligned to the reference. We observed that the mismatch patterns for reads aligned with one mismatch are significantly correlated between ERCC spike-ins and real RNA samples. Based on these observations, we conclude that ERCC spike-ins can serve as ground truths for error correction beyond their previous applications for validation of dynamic range and fold-change response. Also, the mismatch patterns for ERCC reads aligned with one mismatch can serve as a novel and reliable metric to evaluate the performance of error-correction tools.
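The first metric in this abstract, the mismatch pattern of reads aligned with exactly one mismatch, can be illustrated with a short sketch. It assumes ungapped alignments supplied as (reference segment, read) pairs of equal length and labels each substitution as reference base to read base; the function name, the input format and the toy data are assumptions, not the authors' pipeline.

```python
from collections import Counter


def one_mismatch_patterns(alignments: list[tuple[str, str]]) -> dict[str, float]:
    """Return the relative frequency of each substitution type (e.g. 'A>C')
    among reads that differ from the reference at exactly one position."""
    patterns = Counter()
    for ref, read in alignments:
        diffs = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(diffs) == 1:  # keep only reads aligned with one mismatch
            ref_base, read_base = diffs[0]
            patterns[f"{ref_base}>{read_base}"] += 1
    total = sum(patterns.values())
    return {p: n / total for p, n in patterns.items()} if total else {}


# Toy alignments: two A>C reads, one G>T read, one perfect read,
# and one read with two mismatches (ignored by this metric).
pairs = [
    ("ACGT", "CCGT"),
    ("ACGT", "ACGT"),
    ("ACGT", "ACTT"),
    ("AAGT", "CAGT"),
    ("ACGT", "TTGT"),
]
rates = one_mismatch_patterns(pairs)
```

Computing the same table for ERCC spike-in reads and for sample reads would give the two profiles whose correlation the abstract reports.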
Affiliation(s)
- Li Tong
  - Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Cheng Yang
  - Dept. of Biomedical Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing, P.R. China 100871
- Po-Yen Wu
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- May D Wang
  - Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
|
143
|
Acemel RD, Tena JJ, Irastorza-Azcarate I, Marlétaz F, Gómez-Marín C, de la Calle-Mustienes E, Bertrand S, Diaz SG, Aldea D, Aury JM, Mangenot S, Holland PWH, Devos DP, Maeso I, Escrivá H, Gómez-Skarmeta JL. A single three-dimensional chromatin compartment in amphioxus indicates a stepwise evolution of vertebrate Hox bimodal regulation. Nat Genet 2016; 48:336-41. [PMID: 26829752 DOI: 10.1038/ng.3497] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 10/30/2015] [Accepted: 12/30/2015] [Indexed: 12/19/2022]
Abstract
The HoxA and HoxD gene clusters of jawed vertebrates are organized into bipartite three-dimensional chromatin structures that separate long-range regulatory inputs coming from the anterior and posterior Hox-neighboring regions. This architecture is instrumental in allowing vertebrate Hox genes to pattern disparate parts of the body, including limbs. Almost nothing is known about how these three-dimensional topologies originated. Here we perform extensive 4C-seq profiling of the Hox cluster in embryos of amphioxus, an invertebrate chordate. We find that, in contrast to the architecture in vertebrates, the amphioxus Hox cluster is organized into a single chromatin interaction domain that includes long-range contacts mostly from the anterior side, bringing distant cis-regulatory elements into contact with Hox genes. We infer that the vertebrate Hox bipartite regulatory system is an evolutionary novelty generated by combining ancient long-range regulatory contacts from DNA in the anterior Hox neighborhood with new regulatory inputs from the posterior side.
Affiliation(s)
- Rafael D Acemel
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Juan J Tena
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Ibai Irastorza-Azcarate
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Carlos Gómez-Marín
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Elisa de la Calle-Mustienes
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Stéphanie Bertrand
  - Université Pierre et Marie Curie Université Paris 6, CNRS, UMR 7232, Biologie Integrative des Organismes Marins (BIOM), Observatoire Océanologique de Banyuls-sur-Mer, Banyuls-sur-Mer, France
- Sergio G Diaz
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Daniel Aldea
  - Université Pierre et Marie Curie Université Paris 6, CNRS, UMR 7232, Biologie Integrative des Organismes Marins (BIOM), Observatoire Océanologique de Banyuls-sur-Mer, Banyuls-sur-Mer, France
- Jean-Marc Aury
  - Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, Evry, France
- Sophie Mangenot
  - Commissariat à l'Energie Atomique (CEA), Institut de Génomique (IG), Genoscope, Evry, France
- Damien P Devos
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Ignacio Maeso
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Hector Escrivá
  - Université Pierre et Marie Curie Université Paris 6, CNRS, UMR 7232, Biologie Integrative des Organismes Marins (BIOM), Observatoire Océanologique de Banyuls-sur-Mer, Banyuls-sur-Mer, France
- José Luis Gómez-Skarmeta
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
|
144
|
Kozma R, Melsted P, Magnússon KP, Höglund J. Looking into the past - the reaction of three grouse species to climate change over the last million years using whole genome sequences. Mol Ecol 2016; 25:570-80. [PMID: 26607571 DOI: 10.1111/mec.13496] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 02/04/2015] [Revised: 11/19/2015] [Accepted: 11/20/2015] [Indexed: 01/08/2023]
Abstract
Tracking past population fluctuations can give insight into current levels of genetic variation present within species. Analysing population dynamics over larger timescales can be aligned to known climatic changes to determine the response of species to varying environments. Here, we applied the Pairwise Sequentially Markovian Coalescent (PSMC) model to infer past population dynamics of three widespread grouse species: black grouse, willow grouse and rock ptarmigan. This allowed the tracking of the effective population size (Ne) of all three species beyond 1 Mya, revealing that (i) early Pleistocene cooling (~2.5 Mya) caused an increase in the willow grouse and rock ptarmigan populations, (ii) the mid-Brunhes event (~430 kya) and following climatic oscillations decreased the Ne of willow grouse and rock ptarmigan, but increased the Ne of black grouse and (iii) all three species reacted differently to the last glacial maximum (LGM) - black grouse increased prior to it, rock ptarmigan experienced a severe bottleneck and willow grouse was maintained at large population size. We postulate that the varying PSMC signal throughout the LGM depicts only the local history of the species. Nevertheless, the large population fluctuations in willow grouse and rock ptarmigan indicate that both species are opportunistic breeders while black grouse tracks the climatic changes more slowly and is maintained at lower Ne. Our results highlight the usefulness of the PSMC approach in investigating species' reaction to climate change in the deep past, but also that caution should be taken in drawing general conclusions about the recent past.
Affiliation(s)
- Radoslav Kozma
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, Uppsala, SE-75236, Sweden
- Páll Melsted
  - Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavik, 107, Iceland
  - deCODE Genetics/Amgen, Reykjavik, Iceland
- Kristinn P Magnússon
  - The Icelandic Institute of Natural History, Borgir v. Nordurslod, Akureyri, 600, Iceland
  - Department of Natural Resource Sciences, University of Akureyri, Borgir vid Nordurslod, Akureyri, 600, Iceland
  - Biomedical Center, University of Iceland, Vatnsmýrarvegur 16, Reykjavik, 101, Iceland
- Jacob Höglund
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, Uppsala, SE-75236, Sweden
|
145
|
Alic AS, Ruzafa D, Dopazo J, Blanquer I. Objective review of de novo stand-alone error correction methods for NGS data. Wiley Interdisciplinary Reviews: Computational Molecular Science 2016. [DOI: 10.1002/wcms.1239] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/14/2022]
Affiliation(s)
- Andy S. Alic
  - Institute of Instrumentation for Molecular Imaging (I3M); Universitat Politècnica de València; València Spain
- David Ruzafa
  - Departamento de Química Física e Instituto de Biotecnología, Facultad de Ciencias; Universidad de Granada; Granada Spain
- Joaquin Dopazo
  - Department of Computational Genomics; Príncipe Felipe Research Centre (CIPF); Valencia Spain
  - CIBER de Enfermedades Raras (CIBERER); Valencia Spain
  - Functional Genomics Node (INB) at CIPF; Valencia Spain
- Ignacio Blanquer
  - Institute of Instrumentation for Molecular Imaging (I3M); Universitat Politècnica de València; València Spain
  - Biomedical Imaging Research Group GIBI 2; Polytechnic University Hospital La Fe; Valencia Spain
|
146
|
Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction. Brief Bioinform 2016; 17:154-79. [PMID: 26026159 PMCID: PMC4719071 DOI: 10.1093/bib/bbv029] [Citation(s) in RCA: 190] [Impact Index Per Article: 21.1] [Received: 03/03/2015] [Revised: 04/09/2015] [Indexed: 12/23/2022] Open
Abstract
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.
|
147
|
Abstract
BACKGROUND: Continued advances in next generation short-read sequencing technologies are increasing throughput and read lengths, while driving down error rates. Taking advantage of the high coverage sampling used in many applications, several error correction algorithms have been developed to improve data quality further. However, correcting errors in high coverage sequence data requires significant computing resources. METHODS: We propose a different approach to handle erroneous sequence data. Presently, error rates of high-throughput platforms such as the Illumina HiSeq are within 1%. Moreover, the errors are not uniformly distributed in all reads, and a large percentage of reads are indeed error-free. The ability to predict such perfect reads can significantly impact the run-time complexity of applications. We present a simple and fast k-spectrum analysis-based method to identify error-free reads. The filtration process to identify and weed out erroneous reads can be customized at several levels of stringency depending upon the needs of the downstream application. RESULTS: Our experiments show that if around 80% of the reads in a dataset are perfect, then our method retains almost 99.9% of them with a precision of more than 90%. Though filtering out reads identified as erroneous by our method reduces the average coverage by about 7%, we found that the remaining reads provide as uniform a coverage as the original dataset. We demonstrate the effectiveness of our approach on an example downstream application: we show that when an error correction algorithm, Reptile, which relies on collectively analyzing the reads in a dataset to identify and correct erroneous bases, instead uses reads predicted to be perfect by our method to correct the other reads, the overall accuracy improves further by up to 10%.
CONCLUSIONS: Thanks to continuous technological improvements, the coverage and accuracy of reads from dominant sequencing platforms have now reached an extent where we can envision just filtering out reads with errors, thus making error correction less important. Our algorithm is a first attempt to propose and demonstrate this new paradigm. Moreover, our demonstration is applicable to any error correction algorithm as a downstream application; this in turn gives a new class of error-correcting algorithms as a by-product.
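A k-spectrum filter of the kind described in this abstract can be sketched in a few lines: count all k-mers in the dataset, then predict a read to be error-free only if every one of its k-mers is seen often enough. This is a simplified illustration under assumed parameters (`k`, and `min_count` as a single stringency knob), not the authors' published method, and the function names are hypothetical.

```python
from collections import Counter


def kmer_spectrum(reads: list[str], k: int) -> Counter:
    """Count every k-mer across the whole read set."""
    spectrum = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            spectrum[read[i:i + k]] += 1
    return spectrum


def predicted_perfect(read: str, spectrum: Counter, k: int, min_count: int) -> bool:
    """A read is predicted error-free if all of its k-mers reach min_count."""
    return all(spectrum[read[i:i + k]] >= min_count
               for i in range(len(read) - k + 1))


def filter_reads(reads: list[str], k: int, min_count: int) -> tuple[list[str], list[str]]:
    """Split reads into (kept, flagged); raising min_count makes the filter stricter."""
    spectrum = kmer_spectrum(reads, k)
    kept, flagged = [], []
    for read in reads:
        (kept if predicted_perfect(read, spectrum, k, min_count) else flagged).append(read)
    return kept, flagged


# Five error-free copies plus one read carrying a single substitution.
reads = ["ACGTACGGTC"] * 5 + ["ACGTACGCTC"]
kept, flagged = filter_reads(reads, k=4, min_count=2)
```

With these toy inputs the erroneous read is the only one containing k-mers seen once, so it is the only read flagged.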
|
148
|
Abstract
Background: In highly parallel next-generation sequencing (NGS) techniques, millions to billions of short reads are produced from a genomic sequence in a single run. Due to the limitations of NGS technologies, there can be errors in the reads. The error rate of the reads can be reduced by trimming and by correcting the erroneous bases of the reads. This helps to achieve high-quality data, and the computational complexity of many biological applications is greatly reduced if the reads are first corrected. We have developed a novel error correction algorithm called EC and compared it with four other state-of-the-art algorithms using both real and simulated sequencing reads. Results: We have done extensive and rigorous experiments that reveal that EC is indeed an effective, scalable, and efficient error correction tool. The real reads that we have employed in our performance evaluation are Illumina-generated short reads of various lengths. The six experimental datasets we have utilized are taken from the Sequence Read Archive (SRA) at NCBI. The simulated reads are obtained by picking substrings from random positions of reference genomes. To introduce errors, some of the bases of the simulated reads are changed to other bases with some probabilities. Conclusions: Error correction is a vital problem in biology, especially for NGS data. In this paper we present a novel algorithm, called Error Corrector (EC), for correcting substitution errors in biological sequencing reads. We plan to investigate the possibility of employing the techniques introduced in this research paper to handle insertion and deletion errors also. Software availability: The implementation is freely available for non-commercial purposes. It can be downloaded from: http://engr.uconn.edu/~rajasek/EC.zip.
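The read simulation described in this abstract (substrings drawn from random positions of a reference, then bases changed with some probability) can be sketched as follows. The uniform per-base `error_rate`, the function name and the toy genome are assumptions for illustration; the abstract does not state the probabilities the authors used.

```python
import random


def simulate_reads(genome: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> list[tuple[int, str, str]]:
    """Return (start, true_read, noisy_read) triples.

    Each read is a substring taken at a random start position; every base is
    then replaced by a different base with probability `error_rate`
    (substitution errors only, no insertions or deletions).
    """
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        true_read = genome[start:start + read_len]
        noisy = []
        for base in true_read:
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            noisy.append(base)
        reads.append((start, true_read, "".join(noisy)))
    return reads


genome = "ACGT" * 25  # toy 100 bp reference (illustrative)
clean = simulate_reads(genome, n_reads=20, read_len=10, error_rate=0.0)
noisy = simulate_reads(genome, n_reads=20, read_len=10, error_rate=1.0)
```

Keeping the true read next to the noisy one gives the ground truth needed to score a corrector on simulated data.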
|
149
|
Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. Gigascience 2015; 4:48. [PMID: 26500767 PMCID: PMC4615873 DOI: 10.1186/s13742-015-0089-y] [Citation(s) in RCA: 329] [Impact Index Per Article: 32.9] [Received: 06/01/2015] [Accepted: 10/09/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. FINDINGS: We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. CONCLUSIONS: Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
|
150
|
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. Database: The Journal of Biological Databases and Curation 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations have set a compelling landmark in life science and changed the direction of research in clinical oncology through their capacity to help diagnose and treat cancer. The aim of our portal, Comprehensive Resources for Cancer NGS Data Analysis (CRCDA), is to provide a collection of different NGS tools and pipelines under diverse classes, together with cancer pathways and databases and, furthermore, literature information from PubMed. The literature data were restricted to the 18 most common cancer types worldwide, such as breast cancer and colon cancer. For convenience, NGS cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide out-of-the-box solutions for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by using (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is in searching for and using the right tool for the right application. The objective of the work is to collect the tools in each category that are available at various places and to arrange the tools and other data in a simple and user-friendly manner so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available in cancer for NGS data analysis. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
- Ramesh Kumar Gopal
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
|