501
Host genome integration and giant virus-induced reactivation of the virophage mavirus. Nature 2017; 540:288-291. [PMID: 27929021 DOI: 10.1038/nature20593] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 06/08/2016] [Accepted: 11/02/2016] [Indexed: 11/08/2022]
Abstract
Endogenous viral elements are increasingly found in eukaryotic genomes, yet little is known about their origins, dynamics, or function. Here we provide a compelling example of a DNA virus that readily integrates into a eukaryotic genome where it acts as an inducible antiviral defence system. We found that the virophage mavirus, a parasite of the giant Cafeteria roenbergensis virus (CroV), integrates at multiple sites within the nuclear genome of the marine protozoan Cafeteria roenbergensis. The endogenous mavirus is structurally and genetically similar to eukaryotic DNA transposons and endogenous viruses of the Maverick/Polinton family. Provirophage genes are not constitutively expressed, but are specifically activated by superinfection with CroV, which induces the production of infectious mavirus particles. Virophages can inhibit the replication of mimivirus-like giant viruses and an anti-viral protective effect of provirophages on their hosts has been hypothesized. We find that provirophage-carrying cells are not directly protected from CroV; however, lysis of these cells releases infectious mavirus particles that are then able to suppress CroV replication and enhance host survival during subsequent rounds of infection. The microbial host-parasite interaction described here involves an altruistic aspect and suggests that giant-virus-induced activation of provirophages might be ecologically relevant in natural protist populations.

502
Delayed Otolith Development Does Not Impair Vestibular Circuit Formation in Zebrafish. J Assoc Res Otolaryngol 2017; 18:415-425. [PMID: 28332011 DOI: 10.1007/s10162-017-0617-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 11/29/2016] [Accepted: 02/21/2017] [Indexed: 10/19/2022]
Abstract
What is the role of normally patterned sensory signaling in the development of vestibular circuits? For technical reasons, including the difficulty of depriving animals of vestibular inputs, this has been a challenging question to address. Here we take advantage of a vestibular-deficient zebrafish mutant, rock solo AN66, to examine whether normal sensory input is required for the formation of vestibular-driven postural circuitry. We show that the rock solo AN66 mutant carries a splice-site mutation in the secreted glycoprotein otogelin (otog), which we confirm through both whole-genome sequencing and complementation with an otog early-termination mutant. Using confocal microscopy, we find that elements of postural circuits are anatomically normal in rock solo AN66 mutants, including hair cells, vestibular ganglion neurons, and vestibulospinal neurons. Surprisingly, the balance and postural deficits that are readily apparent in younger larvae disappear around 2 weeks of age. We demonstrate that this behavioral recovery follows the delayed development of the anterior (utricular) otolith, which appears around 14 days post-fertilization (dpf), compared with 1 dpf in wild-type (WT) fish. These findings indicate that utricular signaling is not required for normal structural development of the inner ear and vestibular nucleus neurons. Furthermore, despite the otolith's developmental delay until well after postural behaviors normally appear, downstream circuits can drive righting reflexes within ∼1-2 days of its arrival, indicating that vestibular circuit wiring is not impaired by a delay in patterned activity. The functional recovery of postural behaviors may shed light on why humans with mutations in otog exhibit only subclinical vestibular deficits.

503
TreeToReads - a pipeline for simulating raw reads from phylogenies. BMC Bioinformatics 2017; 18:178. [PMID: 28320310 PMCID: PMC5359950 DOI: 10.1186/s12859-017-1592-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/11/2016] [Accepted: 03/10/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Using phylogenomic analysis tools to track pathogens has become standard practice in academia, public health agencies, and large industries. Starting from the same raw genomic read data, several different approaches are used to infer phylogenetic trees. These include many different SNP pipelines, wgMLST approaches, k-mer algorithms, whole-genome alignment and others; each has advantages and disadvantages: some have been extensively validated, some are faster, and some have higher resolution. A few of these approaches are well integrated into the regulatory process of US federal agencies (e.g. the FDA's SNP pipeline for tracking foodborne pathogens). However, despite extensive validation on benchmark datasets and comparison with other pipelines, we lack methods for fully exploring how the values of the many parameters in each pipeline affect whether the correct phylogenetic tree is recovered. RESULTS To resolve this problem, we offer a program, TreeToReads, which generates raw read data from mutated genomes simulated under a known phylogeny. This simulation pipeline allows direct comparison of simulated and observed data in a controlled environment. At each step of these simulations, researchers can vary parameters of interest (e.g., input tree topology, amount of sequence divergence, rate of indels, read coverage, distance of the reference genome) to assess their effects on correctly calling SNPs and reconstructing an accurate tree. CONCLUSIONS Such critical assessments of the accuracy and robustness of analytical pipelines are essential to progress in both research and applied settings.
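The core simulation idea, mutating a genome along the branches of a known tree and then sampling reads from each tip, can be sketched in a few lines of Python. This is a toy illustration, not TreeToReads' implementation: the tree, the branch lengths (given as substitution counts), the random reference, the read length and the coverage are all hypothetical; only substitutions are simulated (no indels); and reads are error-free.

```python
import random

BASES = "ACGT"

def mutate(seq: str, n_subs: int, rng: random.Random) -> str:
    """Apply n_subs substitutions at distinct random positions (no indels)."""
    s = list(seq)
    for pos in rng.sample(range(len(s)), n_subs):
        s[pos] = rng.choice([b for b in BASES if b != s[pos]])
    return "".join(s)

def simulate_tips(genome: str, tree: dict, node: str, rng: random.Random) -> dict:
    """Walk a hypothetical tree {node: [(child, n_subs), ...]} from `node`,
    mutating along each branch; return {tip_name: genome} for the leaves."""
    children = tree.get(node, [])
    if not children:
        return {node: genome}
    tips = {}
    for child, n_subs in children:
        tips.update(simulate_tips(mutate(genome, n_subs, rng), tree, child, rng))
    return tips

def sample_reads(genome: str, read_len: int, coverage: float, rng: random.Random) -> list:
    """Draw error-free reads uniformly so total bases are about coverage * genome length."""
    n_reads = int(coverage * len(genome) / read_len)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

if __name__ == "__main__":
    rng = random.Random(42)
    reference = "".join(rng.choice(BASES) for _ in range(1000))
    # Hypothetical tree ((A,B),C); branch lengths are substitution counts.
    tree = {"root": [("AB", 5), ("C", 20)], "AB": [("A", 3), ("B", 4)]}
    genomes = simulate_tips(reference, tree, "root", rng)
    reads = {tip: sample_reads(g, 100, 10.0, rng) for tip, g in genomes.items()}
    for tip, g in sorted(genomes.items()):
        diffs = sum(a != b for a, b in zip(reference, g))
        print(tip, "differences from reference:", diffs, "reads:", len(reads[tip]))
```

Because the true tree is known, SNPs called from such reads, and the tree rebuilt from them, can be scored directly against the simulation; that comparison is the point of the approach described above.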

504
Zojer M, Schuster LN, Schulz F, Pfundner A, Horn M, Rattei T. Variant profiling of evolving prokaryotic populations. PeerJ 2017; 5:e2997. [PMID: 28224054 PMCID: PMC5316281 DOI: 10.7717/peerj.2997] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 09/13/2016] [Accepted: 01/17/2017] [Indexed: 12/30/2022]
Abstract
Genomic heterogeneity of bacterial species is observed and studied in evolution experiments and clinical diagnostics, and occurs as micro-diversity in natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the short sequencing reads currently in use. Recent advances in NGS technologies have improved speed and coverage and thus allow deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low-frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants, even at low frequencies. To predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by taking the union of different tools, albeit at the price of more false positives. We identified possible reasons for false predictions and used this knowledge to improve accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that the best precision was achieved by using an intersection of at least two tools per variant, i.e. requiring each variant to be called by at least two tools. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for routine use in evolution experiments and clinical diagnostics. The detected variants are reported as frequencies in a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap.
To make this workflow available to a broad community, we implemented VarCap on a Galaxy web server, which is accessible at http://galaxy.csb.univie.ac.at.
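The consensus-plus-filter rule described above (accept a variant only when at least two callers report it and its frequency is at least 2%) can be sketched as follows. This is an illustration rather than VarCap's code: the (chrom, pos, ref, alt) variant key, the caller names, and averaging the frequencies reported by different callers are assumptions, and the other post-filters mentioned (coverage, genomic context, co-localization) are omitted.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumed variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]

def consensus_variants(calls_by_tool: Dict[str, Dict[Variant, float]],
                       min_tools: int = 2,
                       min_freq: float = 0.02) -> Dict[Variant, float]:
    """Keep variants reported by at least `min_tools` callers whose mean
    reported allele frequency is at least `min_freq`; return variant -> mean frequency."""
    freqs = defaultdict(list)
    for calls in calls_by_tool.values():
        for variant, freq in calls.items():
            freqs[variant].append(freq)
    kept = {}
    for variant, fs in freqs.items():
        mean_freq = sum(fs) / len(fs)
        if len(fs) >= min_tools and mean_freq >= min_freq:
            kept[variant] = mean_freq
    return kept

if __name__ == "__main__":
    # Hypothetical calls from two callers.
    calls = {
        "caller_a": {("chr1", 100, "A", "G"): 0.10, ("chr1", 250, "C", "T"): 0.015},
        "caller_b": {("chr1", 100, "A", "G"): 0.12, ("chr1", 250, "C", "T"): 0.018,
                     ("chr1", 400, "G", "A"): 0.30},
    }
    print(sorted(consensus_variants(calls)))  # [('chr1', 100, 'A', 'G')]
```

Here chr1:250 is seen by both callers but falls below the 2% floor, and chr1:400 is seen by only one caller, so both are dropped; raising `min_tools` trades sensitivity for precision in the way the abstract describes.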
Affiliation(s)
- Markus Zojer
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
- Lisa N Schuster
- Department of Microbiology and Ecosystems Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
- Frederik Schulz
- DOE Joint Genome Institute, Lawrence Berkeley National Lab, Walnut Creek, CA, United States
- Alexander Pfundner
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
- Matthias Horn
- Department of Microbiology and Ecosystems Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
- Thomas Rattei
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria

505
Gymrek M. A genomic view of short tandem repeats. Curr Opin Genet Dev 2017; 44:9-16. [PMID: 28213161 DOI: 10.1016/j.gde.2017.01.012] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Received: 10/31/2016] [Accepted: 01/30/2017] [Indexed: 12/31/2022]
Abstract
Short tandem repeats (STRs) are some of the fastest-mutating loci in the genome. Tools for accurately profiling STRs from high-throughput sequencing data have enabled genome-wide interrogation of more than a million STRs across hundreds of individuals. These catalogs have revealed that STRs are highly multiallelic and may contribute more de novo mutations than any other variant class. Recent studies have leveraged these catalogs to show that STRs play a widespread role in regulating gene expression and other molecular phenotypes. These analyses suggest that STRs are an underappreciated but rich reservoir of variation that likely makes significant contributions to Mendelian diseases, complex traits, and cancer.
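To make the object concrete: an STR is a short motif repeated in tandem, and motif lengths of 1-6 bp are a common working definition. The sketch below reports perfect repeats using one regular expression per period. The period range and the three-copy minimum are illustrative thresholds, the non-overlapping scan can miss some overlapping repeats, and real STR genotypers additionally handle impure repeats and read-level evidence, which this does not.

```python
import re
from typing import List, Tuple

def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit ('AT' yes, 'ATAT' no)."""
    return (motif + motif).find(motif, 1) == len(motif)

def find_perfect_strs(seq: str, max_period: int = 6,
                      min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    For each period p, the regex matches a p-mer followed by at least
    min_copies - 1 further copies of itself. Non-primitive motifs are
    skipped because they are already reported at their shorter period."""
    seq = seq.upper()
    hits = []
    for period in range(1, max_period + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (period, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // period))
    return sorted(hits)

if __name__ == "__main__":
    print(find_perfect_strs("GGCACACACATTTTTG"))  # [(2, 'CA', 4), (10, 'T', 5)]
```

The repeat count (here 4 copies of CA, 5 of T) is the quantity that differs between STR alleles, which is why these loci can be so highly multiallelic.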
Affiliation(s)
- Melissa Gymrek
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA.

506
Liu NQ, ter Huurne M, Nguyen LN, Peng T, Wang SY, Studd JB, Joshi O, Ongen H, Bramsen JB, Yan J, Andersen CL, Taipale J, Dermitzakis ET, Houlston RS, Hubner NC, Stunnenberg HG. The non-coding variant rs1800734 enhances DCLK3 expression through long-range interaction and promotes colorectal cancer progression. Nat Commun 2017; 8:14418. [PMID: 28195176 PMCID: PMC5316867 DOI: 10.1038/ncomms14418] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 07/19/2016] [Accepted: 12/28/2016] [Indexed: 01/02/2023]
Abstract
Genome-wide association studies have identified a large number of non-coding risk variants for colorectal cancer (CRC). To date, most of these variants have not been functionally studied. Identifying allele-specific transcription factor (TF) binding is important for understanding the regulatory consequences of such variants. A recently developed proteome-wide analysis of disease-associated SNPs (PWAS) enables unbiased identification of TF-DNA interactions. Here we perform a large-scale PWAS study to comprehensively characterize the TF-binding landscape associated with CRC, which identifies 731 instances of allele-specific TF binding at 116 CRC risk loci. This screen identifies the A-allele of rs1800734 within the promoter region of MLH1 as perturbing the binding of TFAP4 and consequently increasing DCLK3 expression through a long-range interaction, which promotes cancer malignancy by enhancing the expression of genes related to epithelial-to-mesenchymal transition.
Affiliation(s)
- Ning Qing Liu
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- Menno ter Huurne
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- Luan N. Nguyen
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- Tianran Peng
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- Shuang-Yin Wang
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- James B. Studd
- Division of Genetics and Epidemiology, Institute of Cancer Research, 15 Cotswold Road, Sutton, SM2 5NG Surrey, UK
- Onkar Joshi
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- Halit Ongen
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva 1211, Switzerland
- Jesper B Bramsen
- Department of Molecular Medicine, Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, DK-8200 Aarhus, Denmark
- Jian Yan
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Ludwig Institute for Cancer Research, 9500 Gilman Drive, La Jolla, California 92093, USA
- Claus L. Andersen
- Department of Molecular Medicine, Aarhus University Hospital, Palle Juul-Jensens Boulevard 99, DK-8200 Aarhus, Denmark
- Jussi Taipale
- Division of Functional Genomics and Systems Biology, Department of Medical Biochemistry and Biophysics, Karolinska Institutet, SE 141 83 Stockholm, Sweden
- Emmanouil T. Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva 1211, Switzerland
- Richard S. Houlston
- Division of Genetics and Epidemiology, Institute of Cancer Research, 15 Cotswold Road, Sutton, SM2 5NG Surrey, UK
- Nina C. Hubner
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands
- Hendrik G. Stunnenberg
- Faculty of Science, Department of Molecular Biology, Radboud University, RIMLS, PO BOX 9101, 6500HB Nijmegen, The Netherlands

507
Tørresen OK, Star B, Jentoft S, Reinar WB, Grove H, Miller JR, Walenz BP, Knight J, Ekholm JM, Peluso P, Edvardsen RB, Tooming-Klunderud A, Skage M, Lien S, Jakobsen KS, Nederbragt AJ. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod. BMC Genomics 2017; 18:95. [PMID: 28100185 PMCID: PMC5241972 DOI: 10.1186/s12864-016-3448-x] [Citation(s) in RCA: 115] [Impact Index Per Article: 14.4] [Received: 07/27/2016] [Accepted: 12/20/2016] [Indexed: 01/06/2023]
Abstract
BACKGROUND The first Atlantic cod (Gadus morhua) genome assembly, published in 2011, was one of the early genome assemblies based exclusively on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies for complex genomes, although many of these are fragmented, with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. RESULTS By combining data from Illumina, 454 and the longer-read PacBio sequencing technologies, and by integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap bases has been reduced fifteen-fold. Compared with other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. CONCLUSIONS The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or near genes indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.
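Assembly contiguity and gap content are commonly summarized as N50 and as the fraction of N bases in the assembly. The abstract does not say which contiguity statistic underlies the fifty-fold figure, so N50 here is an illustrative choice; the function names and the toy scaffolds are hypothetical.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input

def gap_fraction(seqs: Iterable[str]) -> float:
    """Fraction of assembly bases that are gap characters (N or n)."""
    total = gaps = 0
    for s in seqs:
        total += len(s)
        gaps += s.upper().count("N")
    return gaps / total if total else 0.0

if __name__ == "__main__":
    # Toy scaffolds: one long scaffold containing a 10-N gap, plus two short ones.
    scaffolds = ["ACGT" * 25 + "N" * 10 + "ACGT" * 15, "ACGTACGTNN", "ACGT" * 5]
    print(n50(len(s) for s in scaffolds))  # 170
    print(gap_fraction(scaffolds))         # 0.06
```

Closing gaps inside tandem repeats, as described above, both joins sequences (raising N50) and removes N bases (lowering the gap fraction), which is why the two figures tend to improve together.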
Affiliation(s)
- Ole K. Tørresen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Department of Natural Sciences, University of Agder, Kristiansand, NO-4604 Norway
- William B. Reinar
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Harald Grove
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432 Norway
- Jason R. Miller
- J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, 20850 MD USA
- Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, 20892 MD USA
- James Knight
- Yale School of Medicine, Yale University, New Haven, 06520 CT USA
- Ave Tooming-Klunderud
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Morten Skage
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Sigbjørn Lien
- Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NO-1432 Norway
- Kjetill S. Jakobsen
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Alexander J. Nederbragt
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, NO-0316 Norway
- Biomedical Informatics Research Group, Department of Informatics, University of Oslo, Oslo, NO-0316 Norway

508
Imielinski M, Guo G, Meyerson M. Insertions and Deletions Target Lineage-Defining Genes in Human Cancers. Cell 2017; 168:460-472.e14. [PMID: 28089356 DOI: 10.1016/j.cell.2016.12.025] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Received: 05/13/2016] [Revised: 10/25/2016] [Accepted: 12/16/2016] [Indexed: 01/21/2023]
Abstract
Certain cell types function as factories, secreting large quantities of one or more proteins that are central to the physiology of the respective organ. Examples include surfactant proteins in lung alveoli, albumin in liver parenchyma, and lipase in the stomach lining. Whole-genome sequencing analysis of lung adenocarcinomas revealed noncoding somatic mutational hotspots near VMP1/MIR21 and indel hotspots in surfactant protein genes (SFTPA1, SFTPB, and SFTPC). Extrapolation to other solid cancers demonstrated highly recurrent and tumor-type-specific indel hotspots targeting the noncoding regions of highly expressed genes defining certain secretory cellular lineages: albumin (ALB) in liver carcinoma, gastric lipase (LIPF) in stomach carcinoma, and thyroglobulin (TG) in thyroid carcinoma. The sequence contexts of indels targeting lineage-defining genes were significantly enriched in the AATAATD DNA motif and specific chromatin contexts, including H3K27ac and H3K36me3. Our findings illuminate a prevalent and hitherto unrecognized mutational process linking cellular lineage and cancer.
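In the AATAATD motif, D is the IUPAC ambiguity code for A, G or T ("not C"). Scanning sequences for a degenerate motif like this amounts to expanding the IUPAC codes into a regular expression and searching both strands, as in the sketch below. It only counts occurrences; it is not the study's enrichment analysis, and assessing enrichment would additionally require comparison with a background set of sequences.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each code (e.g. D = not C pairs with H = not G).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def iupac_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())

def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_motif(seq: str, motif: str = "AATAATD") -> int:
    """Count possibly overlapping matches of an IUPAC motif on both strands."""
    seq = seq.upper()
    # A set avoids double counting when the motif is its own reverse complement.
    patterns = {iupac_to_regex(motif), iupac_to_regex(reverse_complement(motif))}
    # The zero-width lookahead lets successive matches overlap.
    return sum(len(re.findall("(?=%s)" % p, seq)) for p in patterns)

if __name__ == "__main__":
    print(iupac_to_regex("AATAATD"))      # AATAAT[AGT]
    print(count_motif("GGAATAATGCC"))     # 1
```

On the reverse strand the same motif reads HATTATT, i.e. [ACT]ATTATT on the forward sequence, so both patterns are searched.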
Affiliation(s)
- Marcin Imielinski
- Department of Pathology and Laboratory Medicine, Englander Institute for Precision Medicine, Institute for Computational Biomedicine, and Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; New York Genome Center, New York, NY 10013, USA.
- Guangwu Guo
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA
- Matthew Meyerson
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Department of Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02215, USA.

509
Ho PW, Swinnen S, Duitama J, Nevoigt E. The sole introduction of two single-point mutations establishes glycerol utilization in Saccharomyces cerevisiae CEN.PK derivatives. Biotechnol Biofuels 2017; 10:10. [PMID: 28053667 PMCID: PMC5209837 DOI: 10.1186/s13068-016-0696-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 10/08/2016] [Accepted: 12/23/2016] [Indexed: 06/06/2023]
Abstract
BACKGROUND Glycerol is an abundant by-product of biodiesel production and has several advantages as a substrate in biotechnological applications. Unfortunately, the popular production host Saccharomyces cerevisiae can barely metabolize glycerol naturally. RESULTS In this study, two evolved derivatives of the strain CEN.PK113-1A that were able to grow in synthetic glycerol medium were created (strains PW-1 and PW-2). Their growth performance on glycerol was compared with that of the previously published evolved CEN.PK113-7D derivative JL1. Because JL1 showed a higher maximum specific growth rate on glycerol (0.164 h⁻¹, compared with 0.119 h⁻¹ for PW-1 and 0.127 h⁻¹ for PW-2), its genomic DNA was subjected to whole-genome resequencing. Two point mutations in the coding sequences of the genes UBR2 and GUT1 were identified as crucial for growth in synthetic glycerol medium and were subsequently verified by reverse engineering of the wild-type strain CEN.PK113-7D. The growth rate of the resulting reverse-engineered strain was 0.130 h⁻¹. Sanger sequencing of the GUT1 and UBR2 alleles of the evolved strains PW-1 and PW-2 also revealed single-point mutations in these two genes, and both mutations were likewise shown to be crucial and sufficient for obtaining a maximum specific growth rate on glycerol of ~0.120 h⁻¹. CONCLUSIONS The current work confirmed the importance of UBR2 and GUT1 as targets for establishing glycerol utilization in strains of the CEN.PK family. In addition, it shows that a growth rate on glycerol of 0.130 h⁻¹ can be established in reverse-engineered CEN.PK strains solely by replacing a single amino acid in the coding sequences of both Ubr2 and Gut1.
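The rates above are maximum specific growth rates, i.e. the slope of ln(biomass) against time during exponential growth; a rate μ corresponds to a doubling time of ln 2 / μ, about 4.2 h for 0.164 h⁻¹ and about 5.3 h for 0.130 h⁻¹. The sketch below estimates μmax as the steepest least-squares slope of ln(OD) over a sliding window of readings. OD readings are assumed as the growth measure and the window size is an assumption; the abstract does not state how the rates were measured, and the readings in the usage example are synthetic.

```python
import math
from typing import Sequence

def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def max_specific_growth_rate(times_h: Sequence[float], od: Sequence[float],
                             window: int = 5) -> float:
    """Steepest slope of ln(OD) against time (per hour) over `window` consecutive readings."""
    if len(od) < window:
        raise ValueError("need at least `window` readings")
    ln_od = [math.log(v) for v in od]
    return max(ols_slope(times_h[i:i + window], ln_od[i:i + window])
               for i in range(len(od) - window + 1))

def doubling_time(mu_per_h: float) -> float:
    """Doubling time in hours for a specific growth rate mu (h^-1): ln 2 / mu."""
    return math.log(2) / mu_per_h

if __name__ == "__main__":
    # Synthetic readings: a 6 h lag, then exponential growth at 0.164 h^-1.
    times = list(range(0, 31, 2))
    od = [0.05 * math.exp(0.164 * max(0, t - 6)) for t in times]
    print(round(max_specific_growth_rate(times, od), 3))  # 0.164
    print(round(doubling_time(0.164), 2))                 # 4.23
```

Taking the steepest window rather than a fit over all points keeps the lag phase from dragging the estimate down.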
Affiliation(s)
- Ping-Wei Ho
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
- Steve Swinnen
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany
- Jorge Duitama
- Systems and Computing Engineering Department, Universidad de los Andes, Cra 1 Este No 19A-40, Bogotá, Colombia
- Elke Nevoigt
- Department of Life Sciences and Chemistry, Jacobs University Bremen gGmbH, Campus Ring 1, 28759 Bremen, Germany

510
High-Throughput Resequencing of Maize Landraces at Genomic Regions Associated with Flowering Time. PLoS One 2017; 12:e0168910. [PMID: 28045987 PMCID: PMC5207663 DOI: 10.1371/journal.pone.0168910] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/10/2016] [Accepted: 12/08/2016] [Indexed: 12/17/2022]
Abstract
Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequencing is a cost-effective strategy to obtain sequence information for loci of interest across many individuals, providing a less expensive approach to evaluating sequence variation at the population scale. Here we evaluate amplicon-based enrichment coupled with semiconductor sequencing on a validation set consisting of three maize inbred lines, two hybrids and 19 landrace accessions. We report the use of a multiplexed panel of 319 PCR assays that target 20 candidate loci associated with photoperiod sensitivity in maize while requiring 25 ng or less of starting DNA per sample. Enriched regions had an average on-target sequence read depth of 105 with 98% of the sequence data mapping to the maize ‘B73’ reference and 80% of the reads mapping to the target interval. Sequence reads were aligned to B73 and 1,486 and 1,244 variants were called using SAMtools and GATK, respectively. Of the variants called by both SAMtools and GATK, 30% were not previously reported in maize. Due to the high sequence read depth, heterozygote genotypes could be called with at least 92.5% accuracy in hybrid materials using GATK. The genetic data are congruent with previous reports of high total genetic diversity and substantial population differentiation among maize landraces. In conclusion, semiconductor sequencing of highly multiplexed PCR reactions is a cost-effective strategy for resequencing targeted genomic loci in diverse maize materials.
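One way to score heterozygote calls in an F1 hybrid is to use sites where its two inbred parents are homozygous for different alleles, since those sites should all be heterozygous in the hybrid. The sketch below computes that fraction from VCF-style GT strings. The abstract does not describe how the 92.5% figure was computed, so this parent-based truth set is an assumption, and the site keys and genotype dictionaries are hypothetical.

```python
from typing import Dict, List

def alleles(gt: str) -> List[str]:
    """Split a VCF GT string such as '0/1' or '1|1' into allele indices."""
    return gt.replace("|", "/").split("/")

def is_called(gt: str) -> bool:
    a = alleles(gt)
    return len(a) == 2 and "." not in a

def is_hom(gt: str) -> bool:
    return is_called(gt) and alleles(gt)[0] == alleles(gt)[1]

def is_het(gt: str) -> bool:
    return is_called(gt) and alleles(gt)[0] != alleles(gt)[1]

def het_call_accuracy(parent1: Dict[str, str], parent2: Dict[str, str],
                      hybrid: Dict[str, str]) -> float:
    """Fraction of expected-heterozygous sites called heterozygous in the hybrid.

    Expected-heterozygous sites are those where both parents are homozygous
    but for different alleles; sites absent from the hybrid count as misses."""
    expected = [s for s in parent1
                if s in parent2 and is_hom(parent1[s]) and is_hom(parent2[s])
                and alleles(parent1[s])[0] != alleles(parent2[s])[0]]
    if not expected:
        return float("nan")
    hits = sum(1 for s in expected if is_het(hybrid.get(s, "./.")))
    return hits / len(expected)

if __name__ == "__main__":
    p1 = {"s1": "0/0", "s2": "0/0", "s3": "1/1", "s4": "0/0"}
    p2 = {"s1": "1/1", "s2": "0/0", "s3": "0/0", "s4": "1|1"}
    f1 = {"s1": "0/1", "s2": "0/0", "s3": "1/1", "s4": "0|1"}
    # s1, s3 and s4 are expected heterozygous; s3 is miscalled as homozygous.
    print(het_call_accuracy(p1, p2, f1))
```

High read depth matters here because a heterozygous site sampled by few reads can easily show only one allele and be miscalled as homozygous, as s3 is in the toy data.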

511
Hofmann AL, Behr J, Singer J, Kuipers J, Beisel C, Schraml P, Moch H, Beerenwinkel N. Detailed simulation of cancer exome sequencing data reveals differences and common limitations of variant callers. BMC Bioinformatics 2017; 18:8. [PMID: 28049408 PMCID: PMC5209852 DOI: 10.1186/s12859-016-1417-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 07/13/2016] [Accepted: 12/10/2016] [Indexed: 12/30/2022]
Abstract
Background Next-generation sequencing of matched tumor and normal biopsy pairs has become a technology of paramount importance for precision cancer treatment. Sequencing costs have dropped tremendously, allowing the sequencing of the whole exome of tumors for just a fraction of the total treatment costs. However, clinicians and scientists cannot take full advantage of the generated data because the accuracy of analysis pipelines is limited. This particularly concerns the reliable identification of subclonal mutations in a cancer tissue sample with very low frequencies, which may be clinically relevant. Results Using simulations based on kidney tumor data, we compared the performance of nine state-of-the-art variant callers, namely deepSNV, GATK HaplotypeCaller, GATK UnifiedGenotyper, JointSNVMix2, MuTect, SAMtools, SiNVICT, SomaticSniper, and VarScan2. The comparison was done as a function of variant allele frequencies and coverage. Our analysis revealed that deepSNV and JointSNVMix2 perform very well, especially in the low-frequency range. We attributed false positive and false negative calls of the nine tools to specific error sources and assigned them to processing steps of the pipeline. All of these errors can be expected to occur in real data sets. We found that modifying certain steps of the pipeline or parameters of the tools can lead to substantial improvements in performance. Furthermore, a novel integration strategy that combines the ranks of the variants yielded the best performance. More precisely, the rank-combination of deepSNV, JointSNVMix2, MuTect, SiNVICT and VarScan2 reached a sensitivity of 78% when fixing the precision at 90%, and outperformed all individual tools, where the maximum sensitivity was 71% with the same precision. Conclusions The choice of well-performing tools for alignment and variant calling is crucial for the correct interpretation of exome sequencing data obtained from mixed samples, and common pipelines are suboptimal. 
We were able to relate the substantial differences observed in performance to the underlying statistical models of the tools, and to pinpoint the error sources of false positive and false negative calls. These findings might inspire new software developments that improve exome sequencing pipelines and further the field of precision cancer treatment.
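The rank-combination strategy and the "sensitivity at a fixed precision" metric can be illustrated with a short sketch. The exact combination rule is not given in the abstract, so this version is an assumption: each caller ranks candidates by a confidence score (higher is better), a candidate a caller did not report is placed just below that caller's last call, candidates are ordered by their summed rank, and the best sensitivity is taken over all top-k cut-offs whose precision is at least 0.9. Caller names and scores are hypothetical.

```python
from typing import Dict, List, Set

def combined_ranking(scores_by_tool: Dict[str, Dict[str, float]]) -> List[str]:
    """Order candidate variants by their rank summed over all callers.

    Rank 1 is a caller's highest score; a candidate the caller did not
    report receives that caller's worst rank plus one."""
    candidates = set()
    for scores in scores_by_tool.values():
        candidates.update(scores)
    rank_sum = {c: 0 for c in candidates}
    for scores in scores_by_tool.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        rank = {c: i + 1 for i, c in enumerate(ordered)}
        missing = len(ordered) + 1
        for c in candidates:
            rank_sum[c] += rank.get(c, missing)
    # Summed rank orders candidates the same way as mean rank; ties broken by name.
    return sorted(candidates, key=lambda c: (rank_sum[c], c))

def sensitivity_at_precision(ranking: List[str], truth: Set[str],
                             min_precision: float = 0.9) -> float:
    """Highest sensitivity (recall) over top-k cut-offs with precision >= min_precision."""
    best, true_pos = 0.0, 0
    for k, candidate in enumerate(ranking, start=1):
        if candidate in truth:
            true_pos += 1
        if true_pos / k >= min_precision:
            best = max(best, true_pos / len(truth))
    return best

if __name__ == "__main__":
    scores = {
        "caller_a": {"v1": 0.9, "v2": 0.8, "v3": 0.1},
        "caller_b": {"v1": 0.7, "v3": 0.6, "v4": 0.5},
    }
    ranking = combined_ranking(scores)
    print(ranking)                                             # ['v1', 'v3', 'v2', 'v4']
    print(sensitivity_at_precision(ranking, truth={"v1", "v3"}))  # 1.0
```

Working with ranks rather than raw scores sidesteps the fact that different callers report scores on incomparable scales.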
Affiliation(s)
- Ariane L Hofmann
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Jonas Behr
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Jochen Singer
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland
- Christian Beisel
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland
- Peter Schraml
- Institute for Surgical Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
- Holger Moch
- Institute for Surgical Pathology, University Hospital Zurich, Schmelzbergstrasse 12, 8091 Zurich, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058 Basel, Switzerland; Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058 Basel, Switzerland

512
Miga KH. The Promises and Challenges of Genomic Studies of Human Centromeres. Prog Mol Subcell Biol 2017; 56:285-304. [PMID: 28840242 DOI: 10.1007/978-3-319-58592-5_12] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/22/2022]
Abstract
Human centromeres are genomic regions that act as sites of kinetochore assembly to ensure proper chromosome segregation during mitosis and meiosis. Although the biological importance of centromeres for genome stability and, ultimately, cell viability is well understood, the complete sequence content and organization of these multi-megabase-sized regions remain unknown. The lack of a high-resolution reference assembly hinders standard bioinformatics protocols, and as a result, sequence-based studies of human centromeres lag far behind the advances made for the non-repetitive sequences of the human genome. In this chapter, I introduce what is known about the genomic organization of the highly repetitive regions spanning human centromeres, and discuss the challenges these sequences pose for assembly, alignment, and data interpretation. Overcoming these obstacles is expected to usher in a new era for centromere genomics, which will offer new discoveries in basic cell biology and human biomedical research.
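A toy Python sketch can illustrate why highly repetitive sequence confounds alignment. It reports the fraction of positions whose k-mer occurs exactly once in a sequence; the k-mer stands in for a short read or alignment seed. The sequences and the choice of k are arbitrary assumptions, not real centromeric data.

```python
from collections import Counter


def unique_kmer_fraction(seq: str, k: int) -> float:
    """Fraction of k-mer start positions whose k-mer occurs exactly once in seq.

    A position whose k-mer is not unique cannot be placed unambiguously using
    that k-mer alone, which is the core problem in tandemly repeated regions.
    """
    if not 0 < k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / len(kmers)


tandem_repeat = "ACGTTGCA" * 20  # toy perfect tandem repeat with period 8
print(unique_kmer_fraction(tandem_repeat, 5))  # prints: 0.0
print(unique_kmer_fraction("ACGGT", 2))        # prints: 1.0
```

Real repeat arrays are not perfectly periodic. Using longer k-mers (longer reads) tends to recover more unique positions, because any k-mer that is unique remains unique when it is extended.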
Affiliation(s)
- Karen H Miga
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, USA.
513
Yu LX, Zheng P, Bhamidimarri S, Liu XP, Main D. The Impact of Genotyping-by-Sequencing Pipelines on SNP Discovery and Identification of Markers Associated with Verticillium Wilt Resistance in Autotetraploid Alfalfa (Medicago sativa L.). Front Plant Sci 2017; 8:89. [PMID: 28223988 PMCID: PMC5293825 DOI: 10.3389/fpls.2017.00089] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 09/20/2016] [Accepted: 01/16/2017] [Indexed: 05/08/2023]
Abstract
Verticillium wilt (VW) is a soilborne disease that causes severe yield loss in alfalfa. To identify molecular markers associated with VW resistance, we combined a genome-wide association study (GWAS) with high-throughput genotyping by sequencing (GBS) to identify loci associated with VW resistance in an F1 full-sib alfalfa population. Phenotyping was performed by manually inoculating cloned plants of each individual with the pathogen, and disease severity was scored on a standard scale. Genotyping was done by GBS, followed by genotype calling with three bioinformatics pipelines: the TASSEL-GBS pipeline (TASSEL), the Universal Network Enabled Analysis Kit (UNEAK), and the haplotype-based FreeBayes pipeline (FreeBayes). The resulting numbers of SNPs, marker density, minor allele frequency (MAF), and heterozygosity were compared among the pipelines. The TASSEL pipeline generated the most markers, with the highest density and MAF, whereas the UNEAK pipeline yielded the highest heterozygosity. The FreeBayes pipeline generated tetraploid genotypes but the fewest markers. SNP markers from each pipeline were used independently for marker-trait association, and the markers significantly associated with VW resistance were compared across pipelines. Similar marker loci were found on chromosomes 5, 6, and 7, whereas the pipelines identified different loci on chromosomes 1, 2, 3, and 4. Most significant markers were located on chromosome 6 and were identified by all three pipelines. Several of the identified loci were linked to known genes whose functions are involved in plant resistance to pathogens. Further investigation of these loci and their linked genes would provide insight into the molecular mechanisms of VW resistance in alfalfa.
Functional markers closely linked to the resistance loci would be useful for marker-assisted selection (MAS) to improve alfalfa cultivars with enhanced resistance to the disease.
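The MAF and heterozygosity compared among the pipelines can be computed per marker from allele dosages. The Python sketch below is a minimal illustration. It assumes that genotypes are coded as alternate-allele counts from 0 up to the ploidy (2 for diploid calls, 4 for tetraploid calls), with `None` for missing data. It is not the code used by TASSEL, UNEAK, or FreeBayes.

```python
from typing import Optional, Sequence


def marker_summary(dosages: Sequence[Optional[int]], ploidy: int = 2) -> tuple[float, float]:
    """Return (minor allele frequency, observed heterozygosity) for one marker.

    dosages: alternate-allele count per individual (0..ploidy); None = missing call.
    An individual counts as heterozygous when its dosage is neither 0 nor ploidy.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("marker has no called genotypes")
    alt_freq = sum(called) / (ploidy * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    het = sum(1 for d in called if 0 < d < ploidy) / len(called)
    return maf, het


print(marker_summary([0, 1, 1, 2, None]))            # prints: (0.5, 0.5)
tetra_maf, tetra_het = marker_summary([0, 1, 2, 4, 4], ploidy=4)  # about 0.45 and 0.4
```

Under this simple definition, any partial dosage in a tetraploid (1, 2, or 3 copies) counts as heterozygous. This is one reason heterozygosity values are not directly comparable between diploid and tetraploid calls.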
Affiliation(s)
- Long-Xi Yu
- Plant Germplasm Introduction and Testing Research, United States Department of Agriculture-Agricultural Research Service, Prosser, WA, USA
- *Correspondence: Long-Xi Yu
- Ping Zheng
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Xiang-Ping Liu
- Plant Germplasm Introduction and Testing Research, United States Department of Agriculture-Agricultural Research Service, Prosser, WA, USA
- Dorie Main
- Department of Horticulture, Washington State University, Pullman, WA, USA
514
Leiva-Torres GA, Nebesio N, Vidal SM. Discovery of Variants Underlying Host Susceptibility to Virus Infection Using Whole-Exome Sequencing. Methods Mol Biol 2017; 1656:209-227. [PMID: 28808973 PMCID: PMC7120756 DOI: 10.1007/978-1-4939-7237-1_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
The clinical course of a viral infection differs greatly between individuals. This variation results from a combination of viral, host, and environmental factors. The identification of host genetic factors influencing inter-individual variation in susceptibility to several pathogenic viruses has greatly increased our understanding of the mechanisms and pathways required for immunity. Next-generation sequencing of whole exomes is a powerful tool in biomedical research. In this chapter, we briefly introduce whole-exome sequencing in the context of genetic approaches for identifying host genes that govern susceptibility to viral infections. We then describe general aspects of the workflow for whole-exome sequence analysis, together with the tools and online resources that can be used to identify and annotate variant calls and then prioritize them by their potential association with phenotypes of interest.
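The filter-then-rank step of such a workflow might look like the following Python sketch. The record fields, the 1% population-frequency cutoff, and the scoring column are illustrative assumptions, not the specific tools described in the chapter. A recessive or de novo model, for example, would use different filters.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AnnotatedVariant:
    """Hypothetical annotated variant; real fields come from annotation tools."""
    gene: str
    consequence: str                # Sequence Ontology term, e.g. "stop_gained"
    population_af: Optional[float]  # frequency in a reference panel; None if absent
    damage_score: float             # hypothetical deleteriousness score (higher = worse)


# Illustrative set of protein-altering consequences to keep.
DAMAGING_CONSEQUENCES = frozenset({
    "stop_gained", "frameshift_variant", "splice_donor_variant",
    "splice_acceptor_variant", "missense_variant",
})


def prioritize(variants: Iterable[AnnotatedVariant],
               max_population_af: float = 0.01) -> list[AnnotatedVariant]:
    """Keep rare, protein-altering variants and rank them by predicted damage."""
    kept = [
        v for v in variants
        if v.consequence in DAMAGING_CONSEQUENCES
        and (v.population_af is None or v.population_af <= max_population_af)
    ]
    return sorted(kept, key=lambda v: v.damage_score, reverse=True)


candidates = [
    AnnotatedVariant("GENE1", "missense_variant", 0.20, 25.0),   # too common
    AnnotatedVariant("GENE2", "synonymous_variant", None, 5.0),  # not protein-altering
    AnnotatedVariant("GENE3", "stop_gained", None, 40.0),
    AnnotatedVariant("GENE4", "missense_variant", 0.001, 22.0),
]
print([v.gene for v in prioritize(candidates)])  # prints: ['GENE3', 'GENE4']
```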
Affiliation(s)
- Gabriel A Leiva-Torres
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Research Center on Complex Traits, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
- Nestor Nebesio
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Research Center on Complex Traits, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
- Silvia M Vidal
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill University Research Center on Complex Traits, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
515
Comprehensive population-based genome sequencing provides insight into hematopoietic regulatory mechanisms. Proc Natl Acad Sci U S A 2016; 114:E327-E336. [PMID: 28031487 DOI: 10.1073/pnas.1619052114] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
Genetic variants affecting hematopoiesis can influence commonly measured blood cell traits. To identify factors that affect hematopoiesis, we performed association studies for blood cell traits in the population-based Estonian Biobank using high-coverage whole-genome sequencing (WGS) in 2,284 samples and SNP genotyping in an additional 14,904 samples. Using up to 7,134 samples with available phenotype data, our analyses identified 17 associations across 14 blood cell traits. Integration of WGS-based fine-mapping and complementary epigenomic datasets provided evidence for causal mechanisms at several loci, including a previously undiscovered basophil count-associated locus near the master hematopoietic transcription factor CEBPA. The fine-mapped variant at this basophil count association near CEBPA overlapped an enhancer active in common myeloid progenitors and influenced its activity. In situ perturbation of this enhancer by CRISPR/Cas9 mutagenesis in hematopoietic stem and progenitor cells demonstrated that it is necessary for, and specifically regulates, CEBPA expression during basophil differentiation. We additionally identified basophil count-associated variation at another, more pleiotropic myeloid enhancer near GATA2, highlighting regulatory mechanisms for the ordered expression of master hematopoietic regulators during lineage specification. Our study illustrates how population-based genetic studies can provide key insights into poorly understood cell differentiation processes of considerable physiologic relevance.
516
Phelan J, O’Sullivan DM, Machado D, Ramos J, Whale AS, O’Grady J, Dheda K, Campino S, McNerney R, Viveiros M, Huggett JF, Clark TG. The variability and reproducibility of whole genome sequencing technology for detecting resistance to anti-tuberculous drugs. Genome Med 2016; 8:132. [PMID: 28003022 PMCID: PMC5178084 DOI: 10.1186/s13073-016-0385-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 08/30/2016] [Accepted: 11/30/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The emergence of resistance to anti-tuberculosis drugs is a serious and growing threat to public health. Next-generation sequencing is rapidly gaining traction as a diagnostic tool for investigating drug resistance in Mycobacterium tuberculosis to aid treatment decisions. However, there are few data on the precision of such sequencing for assigning resistance profiles. METHODS: We investigated two sequencing platforms (Illumina MiSeq, Ion Torrent PGM™) and two rapid analytic pipelines (TBProfiler, Mykrobe predictor) using a well-characterised reference strain (H37Rv) and clinical isolates from patients with tuberculosis resistant to up to 13 drugs. Results were compared with phenotypic drug susceptibility testing. To assess analytical robustness, individual DNA samples were subjected to repeated sequencing. RESULTS: The MiSeq and Ion PGM systems accurately predicted drug-resistance profiles, and reproducibility was high between biological and technical sample replicates. Estimated variant error rates were low (MiSeq 1 per 77 kbp, Ion PGM 1 per 41 kbp) and genomic coverage was high (MiSeq 51-fold, Ion PGM 53-fold). MiSeq provided superior coverage in GC-rich regions, which translated into incremental detection of putative genotypic drug-specific resistance, including resistance to para-aminosalicylic acid and pyrazinamide. The TBProfiler bioinformatics pipeline was concordant with reported phenotypic susceptibility for all drugs tested except pyrazinamide and para-aminosalicylic acid, with an overall concordance of 95.3%. With the Mykrobe predictor, concordance with phenotypic testing was 73.6%. CONCLUSIONS: We have demonstrated high comparative reproducibility of two sequencing platforms, and high predictive ability of the TBProfiler mutation library and analytical pipeline, when profiling resistance to first- and second-line anti-tuberculosis drugs.
However, platform-specific variability in coverage of some genome regions may have implications for predicting resistance to specific drugs. These findings could affect future clinical practice and thus deserve further scrutiny in larger studies using updated mutation libraries.
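A Python sketch can show how an overall concordance figure such as the 95.3% or 73.6% above might be derived. As a simplification, it assumes that concordance is the proportion of isolate-drug pairs on which the genotypic prediction and phenotypic DST agree. The paper's exact definition (for example, its handling of intermediate phenotypes) may differ.

```python
import math
from collections.abc import Hashable, Mapping


def concordance_metrics(predicted: Mapping[Hashable, bool],
                        phenotype: Mapping[Hashable, bool]) -> dict[str, float]:
    """Compare genotypic resistance calls with phenotypic DST (True = resistant).

    Keys are hypothetical (isolate, drug) pairs; only pairs present in both
    mappings are scored. Undefined ratios are returned as NaN.
    """
    shared = predicted.keys() & phenotype.keys()
    if not shared:
        raise ValueError("no isolate-drug pairs in common")
    tp = sum(1 for key in shared if predicted[key] and phenotype[key])
    tn = sum(1 for key in shared if not predicted[key] and not phenotype[key])
    fp = sum(1 for key in shared if predicted[key] and not phenotype[key])
    fn = len(shared) - tp - tn - fp

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return {
        "concordance": (tp + tn) / len(shared),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


predicted = {("iso1", "isoniazid"): True, ("iso1", "rifampicin"): True,
             ("iso2", "isoniazid"): False, ("iso2", "pyrazinamide"): False}
phenotype = {("iso1", "isoniazid"): True, ("iso1", "rifampicin"): False,
             ("iso2", "isoniazid"): False, ("iso2", "pyrazinamide"): True}
print(concordance_metrics(predicted, phenotype))
```

Sensitivity and specificity are reported alongside concordance. A single agreement percentage can hide whether errors are missed resistance or false alarms.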
Affiliation(s)
- Jody Phelan
- Department of Pathogen Molecular Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, WC1E 7HT London, UK
- Diana Machado
- Unidade de Microbiologia Médica, Global Health and Tropical Medicine, GHTM, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Lisbon, Portugal
- Jorge Ramos
- Unidade de Microbiologia Médica, Global Health and Tropical Medicine, GHTM, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Lisbon, Portugal
- Alexandra S. Whale
- Molecular Biology, LGC Ltd, Queens Road, Teddington, Middlesex TW11 0LY, UK
- Justin O’Grady
- Norwich Medical School, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK
- Keertan Dheda
- Division of Pulmonary Medicine and UCT Lung Institute, Lung Infection and Immunity Unit, University of Cape Town, Groote Schuur Hospital, Observatory, 7925, Cape Town, South Africa
- Susana Campino
- Department of Pathogen Molecular Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, WC1E 7HT London, UK
- Ruth McNerney
- Division of Pulmonary Medicine and UCT Lung Institute, Lung Infection and Immunity Unit, University of Cape Town, Groote Schuur Hospital, Observatory, 7925, Cape Town, South Africa
- Miguel Viveiros
- Unidade de Microbiologia Médica, Global Health and Tropical Medicine, GHTM, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Lisbon, Portugal
- Jim F. Huggett
- Molecular Biology, LGC Ltd, Queens Road, Teddington, Middlesex TW11 0LY, UK
- School of Biosciences & Medicine, Faculty of Health & Medical Science, University of Surrey, Guildford, GU2 7XH UK
- Taane G. Clark
- Department of Pathogen Molecular Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, WC1E 7HT London, UK
- Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, WC1E 7HT London, UK
517
Chan CH, Octavia S, Sintchenko V, Lan R. SnpFilt: A pipeline for reference-free assembly-based identification of SNPs in bacterial genomes. Comput Biol Chem 2016; 65:178-184. [DOI: 10.1016/j.compbiolchem.2016.09.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 08/31/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]
518
Long H, Winter DJ, Chang AYC, Sung W, Wu SH, Balboa M, Azevedo RBR, Cartwright RA, Lynch M, Zufall RA. Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate Tetrahymena thermophila. Genome Biol Evol 2016; 8:3629-3639. [PMID: 27635054 PMCID: PMC5585995 DOI: 10.1093/gbe/evw223] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Accepted: 09/12/2016] [Indexed: 12/28/2022]
Abstract
Mutation is the ultimate source of all genetic variation and is, therefore, central to evolutionary change. Previous work on Paramecium tetraurelia found an unusually low germline base-substitution mutation rate in this ciliate. Here, we tested the generality of this result among ciliates using Tetrahymena thermophila. We sequenced the genomes of 10 lines of T. thermophila that had each undergone approximately 1,000 generations of mutation accumulation (MA). We applied an existing mutation-calling pipeline and developed a new probabilistic mutation detection approach that directly models the design of an MA experiment and accommodates the noise introduced by mismapped reads. Our probabilistic mutation-calling method provides a straightforward way of estimating the number of sites at which a mutation could have been called had one been present, providing the denominator for our mutation rate calculations. Using these methods, we find that T. thermophila has a germline base-substitution mutation rate of 7.61 × 10⁻¹² per site per cell division, which is consistent with the low base-substitution mutation rate in P. tetraurelia. Over the course of the evolution experiment, genomic exclusion lines derived from the MA lines experienced a fitness decline that cannot be accounted for by germline base-substitution mutations alone, suggesting that other genetic or epigenetic factors must be involved. Because selection can only act to reduce mutation rates based on the "visible" mutational load, asexual reproduction with a transcriptionally silent germline may allow ciliates to evolve extremely low germline mutation rates.
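The rate estimate above is a ratio: observed mutations divided by the number of site-generations at which a mutation could have been detected. In symbols, rate = Σ mᵢ / Σ (nᵢ × gᵢ) over MA lines i, where mᵢ is the number of mutations, nᵢ the number of callable sites, and gᵢ the number of cell divisions. A minimal Python sketch of that arithmetic follows. It assumes per-line counts are available. The numbers in the usage example are hypothetical and are not the study's data.

```python
from typing import Sequence


def mutation_rate(mutations: Sequence[int],
                  callable_sites: Sequence[int],
                  generations: Sequence[float]) -> float:
    """Per-site, per-generation mutation rate pooled across MA lines.

    rate = sum(m_i) / sum(n_i * g_i), where for line i, m_i is the number of
    mutations called, n_i the number of callable sites (the denominator a
    probabilistic caller can estimate), and g_i the number of cell divisions.
    """
    if not len(mutations) == len(callable_sites) == len(generations):
        raise ValueError("inputs must have one entry per line")
    exposure = sum(n * g for n, g in zip(callable_sites, generations))
    if exposure <= 0:
        raise ValueError("total site-generations must be positive")
    return sum(mutations) / exposure


# Hypothetical example: three lines, 10^8 callable sites and 1,000 divisions each.
rate = mutation_rate([1, 0, 2], [100_000_000] * 3, [1_000] * 3)
print(f"{rate:.2e}")  # prints: 1.00e-11
```

With so few mutations per line, the pooled estimate is dominated by Poisson noise. In practice it is usually reported with a confidence interval.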
Affiliation(s)
- Hongan Long
- Department of Biology and Biochemistry, University of Houston, Houston, TX
- Department of Biology, Indiana University, Bloomington, IN
- David J Winter
- The Biodesign Institute, Arizona State University, Tempe, AZ
- Allan Y.-C. Chang
- Department of Biology and Biochemistry, University of Houston, Houston, TX
- Way Sung
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC
- Steven H Wu
- The Biodesign Institute, Arizona State University, Tempe, AZ
- Mariel Balboa
- Department of Biology and Biochemistry, University of Houston, Houston, TX
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ
- School of Life Sciences, Arizona State University, Tempe, AZ
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, IN
- Rebecca A Zufall
- Department of Biology and Biochemistry, University of Houston, Houston, TX
519
Ganna A, Genovese G, Howrigan DP, Byrnes A, Kurki M, Zekavat SM, Whelan CW, Kals M, Nivard MG, Bloemendal A, Bloom JM, Goldstein JI, Poterba T, Seed C, Handsaker RE, Natarajan P, Mägi R, Gage D, Robinson EB, Metspalu A, Salomaa V, Suvisaari J, Purcell SM, Sklar P, Kathiresan S, Daly MJ, McCarroll SA, Sullivan PF, Palotie A, Esko T, Hultman C, Neale BM. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nat Neurosci 2016; 19:1563-1565. [PMID: 27694993 PMCID: PMC5127781 DOI: 10.1038/nn.4404] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 06/12/2016] [Accepted: 09/07/2016] [Indexed: 12/14/2022]
Abstract
Disruptive, damaging ultra-rare variants in highly constrained genes are enriched in individuals with neurodevelopmental disorders. In the general population, this class of variants was associated with a decrease in years of education (YOE). This effect was stronger among highly brain-expressed genes and explained more YOE variance than pathogenic copy number variation but less than common variants. Disruptive, damaging ultra-rare variants in highly constrained genes influence the determinants of YOE in the general population.
Affiliation(s)
- Andrea Ganna
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 171 77, Sweden
- Giulio Genovese
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Daniel P. Howrigan
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Andrea Byrnes
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Mitja Kurki
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki FI-00014, Finland
- Seyedeh M. Zekavat
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Human Genetic Research and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Christopher W. Whelan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Mart Kals
- Estonian Genome Center, University of Tartu, Tartu 51010, Estonia
- Institute of Mathematics and Statistics, University of Tartu, Tartu 50409, Estonia
- Michel G. Nivard
- Department of Biological Psychology, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands
- Alex Bloemendal
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jonathan M. Bloom
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jacqueline I. Goldstein
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Timothy Poterba
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Cotton Seed
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Robert E. Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Pradeep Natarajan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Human Genetic Research and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Reedik Mägi
- Estonian Genome Center, University of Tartu, Tartu 51010, Estonia
- Diane Gage
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Elise B. Robinson
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Andres Metspalu
- Estonian Genome Center, University of Tartu, Tartu 51010, Estonia
- Veikko Salomaa
- Department of Health, THL-National Institute for Health and Welfare, Helsinki FI-00271, Finland
- Jaana Suvisaari
- Department of Health, THL-National Institute for Health and Welfare, Helsinki FI-00271, Finland
- Shaun M. Purcell
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Pamela Sklar
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Sekar Kathiresan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Human Genetic Research and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Mark J. Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Steven A. McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Patrick F. Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 171 77, Sweden
- Departments of Genetics and Psychiatry, University of North Carolina, Chapel Hill, North Carolina 27599-7264, USA
- Aarno Palotie
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Institute for Molecular Medicine Finland, FIMM, University of Helsinki, Helsinki FI-00014, Finland
- Tõnu Esko
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Estonian Genome Center, University of Tartu, Tartu 51010, Estonia
- Christina Hultman
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 171 77, Sweden
- Benjamin M. Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
520
Shafer ABA, Peart CR, Tusso S, Maayan I, Brelsford A, Wheat CW, Wolf JBW. Bioinformatic processing of RAD‐seq data dramatically impacts downstream population genetic inference. Methods Ecol Evol 2016. [DOI: 10.1111/2041-210x.12700] [Citation(s) in RCA: 189] [Impact Index Per Article: 21.0] [Indexed: 12/11/2022]
Affiliation(s)
- Aaron B. A. Shafer
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE‐752 36 Uppsala, Sweden
- Forensic Science and Environmental & Life Sciences, Trent University, 2014 East Bank Dr, K9J 7B8 Peterborough, Canada
- Claire R. Peart
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE‐752 36 Uppsala, Sweden
- Sergio Tusso
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE‐752 36 Uppsala, Sweden
- Inbar Maayan
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE‐752 36 Uppsala, Sweden
- Alan Brelsford
- Department of Ecology and Evolution, University of Lausanne, CH‐1015 Lausanne, Switzerland
- Jochen B. W. Wolf
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE‐752 36 Uppsala, Sweden
- Division of Evolutionary Biology, Faculty of Biology, Ludwig‐Maximilians University of Munich, Grosshaderner Str. 2, 82152 Planegg‐Martinsried, Germany
521
Huddleston J, Chaisson MJP, Steinberg KM, Warren W, Hoekzema K, Gordon D, Graves-Lindsay TA, Munson KM, Kronenberg ZN, Vives L, Peluso P, Boitano M, Chin CS, Korlach J, Wilson RK, Eichler EE. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res 2016; 27:677-685. [PMID: 27895111 PMCID: PMC5411763 DOI: 10.1101/gr.214007.116] [Citation(s) in RCA: 235] [Impact Index Per Article: 26.1] [Received: 08/01/2016] [Accepted: 11/15/2016] [Indexed: 01/07/2023]
Abstract
In an effort to understand the full spectrum of human genetic variation more completely, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. Using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels, resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants were missed by the 1000 Genomes Project analysis, even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid genome differs from the human reference by as much as ∼16 Mbp, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, we show that once the alternate allele is sequence-resolved, 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase the sensitivity of SV detection.
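Genotyping an already sequence-resolved SV in short-read data can be sketched in Python as a simple binomial genotype-likelihood calculation. The sketch assumes that reads have already been assigned to the reference or alternate haplotype, which is the hard part in practice. It also uses a single flat per-read error rate. It is not the authors' genotyping method.

```python
import math


def genotype_known_sv(ref_reads: int, alt_reads: int,
                      error: float = 0.01) -> tuple[str, dict[str, float]]:
    """Call a biallelic genotype from reads supporting the REF or ALT haplotype.

    Each read is treated as an independent draw that supports ALT with
    probability `error` (0/0), 0.5 (0/1), or 1 - error (1/1). The binomial
    coefficient is the same for every genotype, so it is omitted.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    if not 0.0 < error < 0.5:
        raise ValueError("error must be in (0, 0.5)")
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    log_lik = {
        gt: alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }
    best = max(log_lik, key=log_lik.get)
    return best, log_lik


print(genotype_known_sv(ref_reads=10, alt_reads=9)[0])  # prints: 0/1
```

With zero informative reads, all genotypes tie and this sketch returns "0/0". A real genotyper would report a no-call instead.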
Affiliation(s)
- John Huddleston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Karyn Meltz Steinberg
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Wes Warren
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Tina A Graves-Lindsay
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Laura Vives
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Paul Peluso
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Matthew Boitano
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Chen-Shin Chin
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Jonas Korlach
- Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Richard K Wilson
- Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| |
Collapse
|
522
Cai L, Yuan W, Zhang Z, He L, Chou KC. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data. Sci Rep 2016; 6:36540. [PMID: 27874022 PMCID: PMC5118795 DOI: 10.1038/srep36540] [Citation(s) in RCA: 75] [Impact Index Per Article: 8.3] [Received: 04/06/2016] [Accepted: 10/17/2016] [Indexed: 12/26/2022]
Abstract
Four popular somatic single nucleotide variant (SNV) calling methods (VarScan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools showed poor consensus on candidates (only 20% of calls were reported by more than one caller). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rates of dbSNP presence and of calls with high alternative-allele counts in the control, demonstrating their superior sensitivity and accuracy. Combining different callers does increase the reliability of candidates, but narrows the list down to a very limited range of tumor read depth and variant allele frequency. Calling SNVs on UDT-Seq data, which had much higher read depth, discovered additional true-positive variants, despite an even greater growth in false-positive predictions. Our findings not only provide a valuable benchmark for state-of-the-art SNV calling methods, but also shed light on how to achieve more accurate SNV identification in the future.
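The consensus measure described here (the share of calls reported by more than one caller) and the caller-combination strategy can both be expressed as counting over per-caller call sets. This is a minimal Python sketch, assuming each caller's output has already been reduced to a set of (chromosome, position, ref, alt) tuples; the function names and the toy calls are hypothetical and are not the authors' benchmark data.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical SNV key: (chromosome, position, ref allele, alt allele).
Snv = Tuple[str, int, str, str]


def support_counts(calls_by_caller: Dict[str, Set[Snv]]) -> Counter:
    """Count how many callers reported each distinct SNV."""
    counts: Counter = Counter()
    for calls in calls_by_caller.values():
        counts.update(calls)  # a set contributes at most 1 per SNV per caller
    return counts


def multi_hit_fraction(calls_by_caller: Dict[str, Set[Snv]], min_callers: int = 2) -> float:
    """Fraction of distinct SNVs reported by at least `min_callers` callers."""
    counts = support_counts(calls_by_caller)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n >= min_callers) / len(counts)


def consensus_calls(calls_by_caller: Dict[str, Set[Snv]], min_callers: int = 2) -> Set[Snv]:
    """Keep only SNVs supported by at least `min_callers` callers."""
    counts = support_counts(calls_by_caller)
    return {snv for snv, n in counts.items() if n >= min_callers}


# Toy call sets (made-up positions) for four callers.
a, b, c = ("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr2", 50, "A", "G")
d, e = ("chr3", 10, "T", "C"), ("chr4", 70, "G", "T")
calls = {"MuTect2": {a, b}, "Strelka": {a, c}, "VarScan": {d}, "SomaticSniper": {e}}
print(multi_hit_fraction(calls))  # 0.2
print(consensus_calls(calls) == {a})  # True
```

Raising `min_callers` makes the retained list more reliable but shorter, which mirrors the trade-off reported for combining callers.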
Affiliation(s)
- Lei Cai
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China; Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
- Wei Yuan
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China
- Zhou Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China; Institute of Biliary Tract Disease, Xinhua Hospital, Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200092, China
- Lin He
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Key Laboratory of Psychotic Disorders (No.13dz2260500), Shanghai Jiao Tong University, Shanghai, 200030, China; Women's Hospital School Of Medicine Zhejiang University, Hangzhou, 310006, China
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts, 02478, USA; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
523
Abstract
Our understanding of the chronology of human evolution relies on the “molecular clock” provided by the steady accumulation of substitutions on an evolutionary lineage. Recent analyses of human pedigrees have called this understanding into question by revealing unexpectedly low germline mutation rates, which imply that substitutions accrue more slowly than previously believed. Translating mutation rates estimated from pedigrees into substitution rates is not as straightforward as it may seem, however. We dissect the steps involved, emphasizing that dating evolutionary events requires not “a mutation rate” but a precise characterization of how mutations accumulate in development in males and females—knowledge that remains elusive.
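As a point of reference, the naive conversion that this abstract cautions against can be written as follows. The symbols are our own notation, not the authors': $\mu_{\text{gen}}$ is a per-generation, per-base-pair germline mutation rate estimated from pedigrees, $G$ an assumed mean generation time in years, and $d$ the observed per-base-pair neutral divergence between two lineages. Under neutrality the substitution rate equals the mutation rate, and the factor of 2 reflects that both lineages accumulate substitutions after they split:

$$
\mu_{\text{yr}} = \frac{\mu_{\text{gen}}}{G}, \qquad T = \frac{d}{2\,\mu_{\text{yr}}} = \frac{d\,G}{2\,\mu_{\text{gen}}}
$$

Because $T$ is inversely proportional to $\mu_{\text{gen}}$, a lower pedigree-based rate pushes estimated split times further into the past: halving $\mu_{\text{gen}}$ doubles $T$ for fixed $d$ and $G$. The abstract's argument is that a single constant $\mu_{\text{gen}}$ is not enough, because dating depends on how mutations accumulate during development in males and females, so this formula is only a first approximation.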
Affiliation(s)
- Priya Moorjani
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Ziyue Gao
- Howard Hughes Medical Institute & Dept. of Genetics, Stanford University, Stanford, California, United States of America
- Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Department of Systems Biology, Columbia University, New York, New York, United States of America
524
Branham K, Matsui H, Biswas P, Guru AA, Hicks M, Suk JJ, Li H, Jakubosky D, Long T, Telenti A, Nariai N, Heckenlively JR, Frazer KA, Sieving PA, Ayyagari R. Establishing the involvement of the novel gene AGBL5 in retinitis pigmentosa by whole genome sequencing. Physiol Genomics 2016; 48:922-927. [PMID: 27764769 DOI: 10.1152/physiolgenomics.00101.2016] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 09/06/2016] [Accepted: 10/06/2016] [Indexed: 02/06/2023]
Abstract
While more than 250 genes are known to cause inherited retinal degenerations (IRD), the genetic basis of disease remains unknown in nearly 40-50% of families. In this study we sought to identify the underlying cause of IRD in a family by whole genome sequence (WGS) analysis. Clinical characterization, including standard ophthalmic examination, fundus photography, visual field testing, electroretinography, and review of medical and family history, was performed. WGS was performed on affected and unaffected family members using Illumina HiSeq X10. Sequence reads were aligned to hg19 using BWA-MEM, and variant calling was performed with the Genome Analysis Toolkit. The called variants were annotated with SnpEff v4.11, PolyPhen v2.2.2, and CADD v1.3. Copy number variations were called using Genome STRiP (svtoolkit 2.00.1611) and SpeedSeq software. Variants were filtered to detect rare, potentially deleterious variants segregating with disease. Candidate variants were validated by dideoxy sequencing. Clinical evaluation revealed typical adolescent-onset recessive retinitis pigmentosa (arRP) in affected members. WGS identified about 4 million variants in each individual. Two rare, potentially deleterious compound heterozygous variants, p.Arg281Cys and p.Arg487*, were identified in the gene ATP/GTP binding protein like 5 (AGBL5) as likely causal variants. No additional variants in IRD genes that segregated with disease were identified. Mutation analysis confirmed the segregation of these variants with the IRD in the pedigree. Homology models indicated destabilization of AGBL5 due to the p.Arg281Cys change. Our findings establish the involvement of mutations in AGBL5 in RP and validate the WGS variant filtering pipeline we designed.
Affiliation(s)
- Kari Branham
- Kellogg Eye Center, University of Michigan, Ann Arbor, Michigan
- Hiroko Matsui
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California
- Pooja Biswas
- Shiley Eye Institute, University of California San Diego, La Jolla, California
- Aditya A Guru
- Shiley Eye Institute, University of California San Diego, La Jolla, California
- John J Suk
- Shiley Eye Institute, University of California San Diego, La Jolla, California
- He Li
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California
- David Jakubosky
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California
- Tao Long
- Human Longevity Incorporated, San Diego, California
- Naoki Nariai
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California
- Kelly A Frazer
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California; Department of Pediatrics and Rady Children's Hospital, Division of Genome Information Sciences, University of California, San Diego, La Jolla, California
- Paul A Sieving
- National Eye Institute, National Institutes of Health, Bethesda, Maryland
- Radha Ayyagari
- Shiley Eye Institute, University of California San Diego, La Jolla, California
525
Tran Q, Gao S, Phan V. Analysis of optimal alignments unfolds aligners' bias in existing variant profiles. BMC Bioinformatics 2016; 17:349. [PMID: 27766935 PMCID: PMC5073887 DOI: 10.1186/s12859-016-1216-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]
Abstract
Efforts such as the International HapMap Project and the 1000 Genomes Project resulted in a catalog of millions of single-nucleotide and insertion/deletion (INDEL) variants in the human population. Viewed as a reference of existing variants, this resource commonly serves as a gold standard for studying and developing methods to detect genetic variants. Our analysis revealed that this reference contains thousands of INDELs that were constructed in a biased manner. The bias arises when short reads are aligned to reference genomes to detect variants, and it is caused by the existence of many theoretically optimal alignments between the reference genome and reads containing alternative alleles at those INDEL locations. We examined several popular aligners and showed that they could be divided into groups whose alignments yielded INDELs that either agreed strongly or disagreed strongly with the reported INDELs. This finding suggests that the agreement or disagreement between the INDELs called by an aligner and the reported INDELs is merely a result of the arbitrary selection of one of the optimal alignments. Such bias in INDEL calling might seriously influence downstream analyses; our finding therefore suggests that this phenomenon should be addressed further.
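The source of this ambiguity is easy to see for a deletion inside a tandem repeat: removing the repeat unit at any of several offsets yields the same read sequence, so every such placement is an equally optimal alignment. The Python sketch below enumerates those equivalent placements by brute force. It handles simple deletions only, uses a made-up toy reference, and is an illustration rather than the authors' analysis code.

```python
from typing import List


def equivalent_deletion_starts(ref: str, start: int, length: int) -> List[int]:
    """Return every 0-based start at which deleting `length` bases from `ref`
    produces the same sequence as deleting them at `start`."""
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie within the reference")
    target = ref[:start] + ref[start + length:]
    return [
        pos
        for pos in range(len(ref) - length + 1)
        if ref[:pos] + ref[pos + length:] == target
    ]


ref = "TTCAGAGAGTT"  # toy reference with an (AG)n repeat
starts = equivalent_deletion_starts(ref, 3, 2)
print(starts)       # [3, 4, 5, 6, 7]
print(min(starts))  # 3, the left-normalized position
```

A widely used remedy is to report each indel at a canonical position, typically the leftmost one (left-normalization, as performed by tools such as `bcftools norm`), so that call sets produced by different aligners can be compared on equal terms.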
Affiliation(s)
- Quang Tran
- Department of Computer Science, University of Memphis, Memphis, 38152, TN, USA
- Shanshan Gao
- Department of Computer Science, University of Memphis, Memphis, 38152, TN, USA
- Vinhthuy Phan
- Department of Computer Science, University of Memphis, Memphis, 38152, TN, USA
526
Hayano T, Matsui H, Nakaoka H, Ohtake N, Hosomichi K, Suzuki K, Inoue I. Germline Variants of Prostate Cancer in Japanese Families. PLoS One 2016; 11:e0164233. [PMID: 27701467 PMCID: PMC5049788 DOI: 10.1371/journal.pone.0164233] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/01/2016] [Accepted: 09/21/2016] [Indexed: 02/02/2023]
Abstract
Prostate cancer (PC) is the second most common cancer in men. Family history is the major risk factor for PC. Only two PC susceptibility genes, BRCA2 and HOXB13, have been identified. A comprehensive search for germline variants in patients with PC has not been reported in Japanese families. In this study, we conducted exome sequencing followed by Sanger sequencing to identify germline variants responsible for PC in 140 Japanese patients from 66 families. In addition to the known susceptibility genes BRCA2 and HOXB13, we identified TRRAP variants in a mutually exclusive manner in seven large PC families (three or four patients per family). We also found shared variants of BRCA2, HOXB13, and TRRAP in 59 additional small PC families (two patients per family). We identified two deleterious HOXB13 variants (F127C and G132E). Further exploration of the shared variants in the rest of the families revealed deleterious variants of the so-called cancer genes (ATP1A1, BRIP1, FANCA, FGFR3, FLT3, HOXD11, MUTYH, PDGFRA, SMARCA4, and TCF3). The germline variant profile provides new insight into the genetic etiology and heterogeneity of PC among Japanese men.
Affiliation(s)
- Takahide Hayano
- Division of Human Genetics, National Institute of Genetics, Mishima, Japan
- Hiroshi Matsui
- Department of Urology, Gunma University Graduate School of Medicine, Maebashi, Japan
- Hirofumi Nakaoka
- Division of Human Genetics, National Institute of Genetics, Mishima, Japan
- Nobuaki Ohtake
- Department of Urology, Gunma University Graduate School of Medicine, Maebashi, Japan
- Kazuyoshi Hosomichi
- Department of Bioinformatics and Genomics, Graduate School of Medical Sciences, Kanazawa University, Ishikawa, Japan
- Kazuhiro Suzuki
- Department of Urology, Gunma University Graduate School of Medicine, Maebashi, Japan
- Ituro Inoue
- Division of Human Genetics, National Institute of Genetics, Mishima, Japan
527
Tian S, Yan H, Kalmbach M, Slager SL. Impact of post-alignment processing in variant discovery from whole exome data. BMC Bioinformatics 2016; 17:403. [PMID: 27716037 PMCID: PMC5048557 DOI: 10.1186/s12859-016-1279-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/02/2016] [Accepted: 09/26/2016] [Indexed: 01/11/2023]
Abstract
Background: GATK Best Practices workflows are widely used in large-scale sequencing projects and recommend post-alignment processing before variant calling. Two key post-processing steps are the computationally intensive local realignment around known INDELs and base quality score recalibration (BQSR). Both have been shown to reduce erroneous calls; however, the findings are mainly supported by an analytical pipeline that incorporates BWA and GATK UnifiedGenotyper. It is not known whether post-processing offers any benefit, and to what extent, for pipelines implementing other methods, especially given that both mappers and callers are regularly updated. Moreover, because sequencing platforms are upgraded regularly and the new platforms provide better estimates of read quality scores, the need for post-processing is also unknown. Finally, some regions in the human genome show high sequence divergence from the reference genome; it is unclear whether post-processing is beneficial in these regions.
Results: We used both simulated and NA12878 exome data to comprehensively assess the impact of post-processing for five or six popular mappers together with five callers. Focusing on chromosome 6p21.3, a region of high sequence divergence harboring the human leukocyte antigen (HLA) system, we found that local realignment had little or no impact on SNP calling, but increased sensitivity was observed in INDEL calling for the Stampy + GATK UnifiedGenotyper pipeline. No or only a modest effect of local realignment was detected for the three haplotype-based callers, and no evidence of an effect was found for Novoalign. BQSR had a negligible effect on INDEL calling and generally reduced sensitivity for SNP calling, to a degree that depended on caller, coverage and level of divergence. Specifically, for SAMtools and FreeBayes calling in regions with low divergence, BQSR reduced SNP calling sensitivity but improved precision when coverage was insufficient. However, in regions of high divergence (e.g., the HLA region), BQSR reduced the sensitivity of both callers with little gain in precision. For the other three callers, BQSR reduced sensitivity without increasing precision, regardless of coverage and divergence level.
Conclusions: We demonstrated that the gain from post-processing is not universal; rather, it depends on the mapper and caller combination, and the benefit is further influenced by sequencing depth and divergence level. Our analysis highlights the importance of considering these key factors when deciding whether to apply computationally intensive post-processing to Illumina exome data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1279-z) contains supplementary material, which is available to authorized users.
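BQSR rests on a simple idea: compare the quality a sequencer reports for a base with the mismatch rate actually observed for bases given that quality, at sites not expected to be true variants. The Python sketch below recalibrates using the reported quality alone. It is an assumption-laden simplification: GATK's implementation also models covariates such as read group, machine cycle and sequence context, and the +1/+2 pseudocounts and toy data here are our own choices.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def empirical_qualities(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each reported Phred quality to an empirical Phred quality.

    `observations` yields (reported_quality, is_mismatch) pairs for bases at
    positions assumed not to be true variants (e.g. known SNP sites masked).
    """
    totals: Dict[int, int] = defaultdict(int)
    mismatches: Dict[int, int] = defaultdict(int)
    for q, is_mismatch in observations:
        totals[q] += 1
        mismatches[q] += int(is_mismatch)
    recalibrated: Dict[int, float] = {}
    for q, n in totals.items():
        # Pseudocounts keep a bin with zero mismatches at a finite quality.
        error_rate = (mismatches[q] + 1) / (n + 2)
        recalibrated[q] = -10.0 * math.log10(error_rate)
    return recalibrated


# Toy data: 1,000 bases reported at Q30 (expected error 0.001), 9 of them mismatches.
obs = [(30, i < 9) for i in range(1000)]
print(round(empirical_qualities(obs)[30], 1))  # 20.0
```

In the toy data, bases labelled Q30 behave like Q20 bases, so recalibration would lower their quality; the study's finding is that whether such adjustments help downstream calling depends on the mapper, caller, coverage and divergence.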
Affiliation(s)
- Shulan Tian
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Huihuang Yan
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Michael Kalmbach
- Division of Research and Education Support Systems, Department of Information Technology, Mayo Clinic, Rochester, MN, 55905, USA
- Susan L Slager
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
528
Next Generation Sequencing of Pooled Samples: Guideline for Variants' Filtering. Sci Rep 2016; 6:33735. [PMID: 27670852 PMCID: PMC5037392 DOI: 10.1038/srep33735] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 11/04/2015] [Accepted: 08/30/2016] [Indexed: 02/07/2023]
Abstract
Sequencing a large number of individuals, which is often needed for population genetics studies, is still economically challenging despite the falling costs of Next Generation Sequencing (NGS). Pool-seq is a cost- and time-effective alternative in which DNA from several individuals is pooled for sequencing. However, pooling DNA creates new problems and challenges for accurate variant calling and allele frequency (AF) estimation. In particular, sequencing errors are confounded with alleles present at low frequency in the pools, potentially giving rise to false-positive variants. We sequenced 996 individuals in 83 pools (12 individuals/pool) in a targeted re-sequencing experiment. We show that Pool-seq AFs are robust and reliable by comparing them with public variant databases and with in-house SNP-genotyping data of the individual subjects in the pools. Furthermore, we propose a simple filtering guideline for the removal of spurious variants based on the Kolmogorov-Smirnov statistical test. We experimentally validated our filters by comparing Pool-seq to individual sequencing data, showing that the filters remove most of the false variants while retaining the majority of true variants. The proposed guideline is fairly generic and could easily be applied in other Pool-seq experiments.
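The abstract does not state which distributions the Kolmogorov-Smirnov filter compares. Purely as a hypothetical illustration, the Python sketch below assumes one compares the base qualities of reads supporting the alternate allele with those supporting the reference allele, flagging a candidate as suspicious when the two differ significantly. It computes the two-sample KS statistic directly and uses the standard large-sample critical value; for small read counts an exact test, such as SciPy's `scipy.stats.ks_2samp`, would be preferable. The quality values are made up.

```python
import bisect
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: max distance between empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions, so the supremum is reached at a data point.
    for v in set(xs) | set(ys):
        cdf_a = bisect.bisect_right(xs, v) / len(xs)
        cdf_b = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_rejects(a: Sequence[float], b: Sequence[float], alpha: float = 0.05) -> bool:
    """Large-sample KS test: True if the two distributions differ at level alpha."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha = 0.05
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical base qualities of alt- and ref-supporting reads at one site.
alt_quals = [12, 14, 15, 13, 11, 16, 12, 14, 13, 15] * 3
ref_quals = [35, 37, 36, 38, 34, 39, 36, 37, 35, 38] * 3
print(ks_statistic(alt_quals, ref_quals))  # 1.0
print(ks_rejects(alt_quals, ref_quals))    # True
```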
529
Jakaitiene A, Avino M, Guarracino MR. Beta-Binomial Model for the Detection of Rare Mutations in Pooled Next-Generation Sequencing Experiments. J Comput Biol 2016; 24:357-367. [PMID: 27632638 DOI: 10.1089/cmb.2016.0106] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
Despite diminishing costs, next-generation sequencing (NGS) still remains expensive for studies with a large number of individuals. To save costs, pools containing DNA from multiple samples can be sequenced instead. Many software tools are currently available for the detection of single-nucleotide polymorphisms (SNPs); their sensitivity and specificity depend on the model used and the data analyzed, indicating that all of them have room for improvement. We use a beta-binomial model to detect rare mutations in untagged pooled NGS experiments. We propose a multireference framework for pooled data with the ability to identify mutations specific to up to two patients affected by neuromuscular disorders (NMD). We assessed the results by comparison with the Genome Analysis Toolkit (GATK), CRISP, SNVer, and FreeBayes. Our results show that the multireference approach applying the beta-binomial model accurately predicts rare mutations at a fraction of 0.01. Finally, we explored the concordance of mutations between the model and the other software, checking their involvement in any NMD-related gene. We detected seven novel SNPs, for which functional analysis produced enriched terms related to locomotion and musculature.
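A beta-binomial model treats the per-read error rate at a site as itself variable (Beta-distributed), which makes the number of error reads overdispersed compared with a plain binomial. The Python sketch below shows the generic tail test such a model supports: how likely are at least k alternate reads out of n from errors alone? The parameters `ALPHA` and `BETA` and the `p_threshold` cut-off are hypothetical placeholders (in practice the error parameters would be fitted, for example on sites assumed to be invariant), and the authors' multireference framework is not reproduced here. SciPy also provides `scipy.stats.betabinom` if a library implementation is preferred.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(K = k) for K ~ BetaBinomial(n, alpha, beta)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def betabinom_sf(k: int, n: int, alpha: float, beta: float) -> float:
    """P(K >= k): probability of at least k alternate reads from errors alone."""
    return sum(betabinom_pmf(i, n, alpha, beta) for i in range(k, n + 1))


def is_candidate_mutation(alt_reads: int, depth: int, alpha: float, beta: float,
                          p_threshold: float = 1e-6) -> bool:
    """Flag a site when the error-only model makes the alt count very unlikely."""
    return betabinom_sf(alt_reads, depth, alpha, beta) < p_threshold


# Hypothetical error model with mean error rate alpha / (alpha + beta) = 0.001.
ALPHA, BETA = 1.0, 999.0
print(is_candidate_mutation(50, 1000, ALPHA, BETA))  # True
print(is_candidate_mutation(1, 1000, ALPHA, BETA))   # False
```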
Affiliation(s)
- Audrone Jakaitiene
- Bioinformatics and Biostatistics Center, Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Vilnius, Lithuania
- Mariano Avino
- High Performance Computing and Networking Institute, National Research Council, Naples, Italy
- Mario Rosario Guarracino
- High Performance Computing and Networking Institute, National Research Council, Naples, Italy
530
Senís E, Mockenhaupt S, Rupp D, Bauer T, Paramasivam N, Knapp B, Gronych J, Grosse S, Windisch MP, Schmidt F, Theis FJ, Eils R, Lichter P, Schlesner M, Bartenschlager R, Grimm D. TALEN/CRISPR-mediated engineering of a promoterless anti-viral RNAi hairpin into an endogenous miRNA locus. Nucleic Acids Res 2016; 45:e3. [PMID: 27614072 PMCID: PMC5224498 DOI: 10.1093/nar/gkw805] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/24/2016] [Revised: 08/31/2016] [Accepted: 09/04/2016] [Indexed: 12/12/2022]
Abstract
Successful RNAi applications depend on strategies allowing robust and persistent expression of minimal gene silencing triggers without perturbing endogenous gene expression. Here, we propose a novel avenue: integration of a promoterless shmiRNA, i.e. a shRNA embedded in a micro-RNA (miRNA) scaffold, into an engineered genomic miRNA locus. For proof-of-concept, we used TALE or CRISPR/Cas9 nucleases to site-specifically integrate an anti-hepatitis C virus (HCV) shmiRNA into the liver-specific miR-122/hcr locus in hepatoma cells, with the aim of obtaining cellular clones that are genetically protected against HCV infection. Using reporter assays, Northern blotting and qRT-PCR, we confirmed anti-HCV shmiRNA expression as well as miR-122 integrity and functionality in selected cellular progeny. Moreover, we employed a comprehensive battery of PCR, cDNA/miRNA profiling and whole genome sequencing analyses to validate targeted integration of a single shmiRNA molecule at the expected position, and to rule out deleterious effects on the genomes or transcriptomes of the engineered cells. Importantly, a subgenomic HCV replicon and a full-length reporter virus, but not a Dengue virus control, were significantly impaired in the modified cells. Our original combination of DNA engineering and RNAi expression technologies benefits numerous applications, from miRNA, genome and transgenesis research to human gene therapy.
Affiliation(s)
- Elena Senís
- Department of Infectious Diseases, Virology, Heidelberg University Hospital, Cluster of Excellence CellNetworks, Heidelberg, 69120, Germany; BioQuant Center, University of Heidelberg, Heidelberg, 69120, Germany
- Stefan Mockenhaupt
- Department of Infectious Diseases, Virology, Heidelberg University Hospital, Cluster of Excellence CellNetworks, Heidelberg, 69120, Germany; BioQuant Center, University of Heidelberg, Heidelberg, 69120, Germany
- Daniel Rupp
- Department of Infectious Diseases, Molecular Virology, Heidelberg University Hospital, Heidelberg, 69120, Germany; Division of Virus-Associated Carcinogenesis (F170), German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Tobias Bauer
- Division of Theoretical Bioinformatics (B080), German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Nagarajan Paramasivam
- Division of Theoretical Bioinformatics (B080), German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany; Medical Faculty Heidelberg, Heidelberg University, Heidelberg, 69120, Germany
- Bettina Knapp
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764, Germany
- Jan Gronych
- Division of Molecular Genetics (B060), German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, 69120, Germany
- Stefanie Grosse
- Department of Infectious Diseases, Virology, Heidelberg University Hospital, Cluster of Excellence CellNetworks, Heidelberg, 69120, Germany; BioQuant Center, University of Heidelberg, Heidelberg, 69120, Germany
- Marc P Windisch
- Department of Infectious Diseases, Molecular Virology, Heidelberg University Hospital, Heidelberg, 69120, Germany
- Florian Schmidt
- Department of Infectious Diseases, Virology, Heidelberg University Hospital, Cluster of Excellence CellNetworks, Heidelberg, 69120, Germany; BioQuant Center, University of Heidelberg, Heidelberg, 69120, Germany
- Fabian J Theis
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, 85764, Germany; Department of Mathematics, Technische Universität München, Garching, 85748, Germany
- Roland Eils
- BioQuant Center, University of Heidelberg, Heidelberg, 69120, Germany; Division of Theoretical Bioinformatics (B080), German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany; Department for Bioinformatics and Functional Genomics, Institute for Pharmacy and Molecular Biotechnology (IPMB), Heidelberg University, Heidelberg, 69120, Germany
- Peter Lichter
- Division of Molecular Genetics (B060), German Cancer Research Center (DKFZ) and German Cancer Consortium (DKTK), Heidelberg, 69120, Germany
- Matthias Schlesner
- Division of Theoretical Bioinformatics (B080), German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Ralf Bartenschlager
- Department of Infectious Diseases, Molecular Virology, Heidelberg University Hospital, Heidelberg, 69120, Germany; Division of Virus-Associated Carcinogenesis (F170), German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- Dirk Grimm
- Department of Infectious Diseases, Virology, Heidelberg University Hospital, Cluster of Excellence CellNetworks, Heidelberg, 69120, Germany; BioQuant Center, University of Heidelberg, Heidelberg, 69120, Germany
531
Abstract
The number of large-scale genomics projects is increasing due to the availability of affordable high-throughput sequencing (HTS) technologies. The use of HTS for bacterial infectious disease research is attractive because one whole-genome sequencing (WGS) run can replace multiple assays for bacterial typing, molecular epidemiology investigations, and more in-depth pathogenomic studies. The computational resources and bioinformatics expertise required to accommodate and analyze the large amounts of data pose new challenges for researchers embarking on genomics projects for the first time. Here, we present a comprehensive overview of a bacterial genomics project from beginning to end, with a particular focus on the planning and computational requirements for HTS data, and provide a general understanding of the analytical concepts needed to develop a workflow that will meet the objectives and goals of HTS projects.
532
Popitsch N, Schuh A, Taylor JC. ReliableGenome: annotation of genomic regions with high/low variant calling concordance. Bioinformatics 2016; 33:155-160. [PMID: 27605105 PMCID: PMC5903559 DOI: 10.1093/bioinformatics/btw587] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/27/2016] [Revised: 08/12/2016] [Accepted: 09/04/2016] [Indexed: 12/30/2022]
Abstract
Motivation: The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that current practice still suffers from non-negligible numbers of false-positive and false-negative SNV and INDEL calls, which have been shown to be enriched among discordant calls and also in genomic regions with low sequence complexity.
Results: Here, we describe our method ReliableGenome (RG) for partitioning genomes into high- and low-concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than on the actual sequencing data, which manifests as a high recurrence of regions that can or cannot be reliably genotyped by a single method. This enables the application of pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false-positive calls in low-concordance regions, which underlines its usefulness for variant filtering, annotation and prioritization. RG allows resource-intensive algorithms (e.g. consensus calling methods) to be focused on the smaller, discordant share of the genome (20–30%), which might result in increased overall accuracy at reasonable cost. Our method and analysis of discordant calls may further be useful for the development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies/pipelines.
Availability and Implementation: RG was implemented in Java; source code and binaries are freely available for non-commercial use at https://github.com/popitsch/wtchg-rg/.
Supplementary information: Supplementary data are available at Bioinformatics online.
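The core notion of per-region concordance can be illustrated with a much simpler scheme than ReliableGenome itself. The Python sketch below bins calls from several pipelines into fixed-size windows and reports, per window, the fraction of called variants that every pipeline agreed on; the window size, the 0.9 threshold, the call-key format and the toy calls are all hypothetical choices. Unlike RG, it scores only windows that contain calls and neither interpolates concordance for regions without data nor pools evidence across datasets.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Call = Tuple[str, int, str, str]  # hypothetical key: (chrom, pos, ref, alt)
Window = Tuple[str, int]          # (chrom, window index)


def window_concordance(call_sets: List[Set[Call]], window: int = 1000) -> Dict[Window, float]:
    """Per window: fraction of called variants reported by every pipeline."""
    if not call_sets:
        return {}
    union: Set[Call] = set().union(*call_sets)
    shared: Set[Call] = set.intersection(*call_sets)
    total: Dict[Window, int] = defaultdict(int)
    agreed: Dict[Window, int] = defaultdict(int)
    for call in union:
        key = (call[0], call[1] // window)
        total[key] += 1
        if call in shared:
            agreed[key] += 1
    return {key: agreed[key] / n for key, n in total.items()}


def low_concordance_windows(conc: Dict[Window, float], threshold: float = 0.9) -> List[Window]:
    """Windows whose concordance falls below `threshold`, sorted by position."""
    return sorted(key for key, value in conc.items() if value < threshold)


pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 1500, "C", "T")}
pipeline_b = {("chr1", 100, "A", "G"), ("chr1", 1600, "G", "A")}
conc = window_concordance([pipeline_a, pipeline_b])
print(conc[("chr1", 0)], conc[("chr1", 1)])  # 1.0 0.0
print(low_concordance_windows(conc))  # [('chr1', 1)]
```

Windows flagged this way are where, following the abstract's reasoning, more expensive strategies such as consensus calling would be focused.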
Affiliation(s)
- Niko Popitsch
- Wellcome Trust Centre of Human Genetics, University of Oxford, Oxford OX3 7BN, UK; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK
- Anna Schuh
- National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK; Department of Oncology, University of Oxford, Oxford OX3 7DQ, UK
- Jenny C Taylor
- Wellcome Trust Centre of Human Genetics, University of Oxford, Oxford OX3 7BN, UK; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK
533
Tian S, Yan H, Neuhauser C, Slager SL. An analytical workflow for accurate variant discovery in highly divergent regions. BMC Genomics 2016; 17:703. [PMID: 27590916 PMCID: PMC5010666 DOI: 10.1186/s12864-016-3045-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 05/18/2016] [Accepted: 08/25/2016] [Indexed: 02/07/2023]
Abstract
Background Current variant discovery methods often start with the mapping of short reads to a reference genome, yet their performance deteriorates in genomic regions where the reads are highly divergent from the reference sequence. This is particularly problematic for the human leukocyte antigen (HLA) region on chromosome 6p21.3. This region is associated with over 100 diseases, but variant calling there is hindered by the extreme divergence across different haplotypes. Results We simulated reads from chromosome 6 exonic regions over a wide range of sequence divergence and coverage depth. We systematically assessed combinations of five mappers and five callers for their performance on simulated data and on exome-seq data from NA12878, a well-studied individual for whom multiple public call sets have been generated. Among those combinations, the number of known SNPs differed by about 5% in the non-HLA regions of chromosome 6 but by over 20% in the HLA region. Notably, GSNAP mapping combined with GATK UnifiedGenotyper calling identified about 20% more known SNPs than most existing methods without a noticeable loss of specificity, with 100% sensitivity in three highly polymorphic HLA genes examined. Much larger differences were observed among these combinations in INDEL calling from both non-HLA and HLA regions. We obtained similar results with our internal exome-seq data from a cohort of chronic lymphocytic leukemia patients. Conclusions We have established a workflow enabling variant detection, with high sensitivity and specificity, over the full spectrum of divergence seen in the human genome. Comparison with public call sets from NA12878 highlighted the overall superiority of GATK UnifiedGenotyper, followed by GATK HaplotypeCaller and SAMtools, in SNP calling, and of GATK HaplotypeCaller and Platypus in INDEL calling, particularly in regions of high sequence divergence such as the HLA region. 
GSNAP and Novoalign are the ideal mappers in combination with the above callers. We expect that the proposed workflow should be applicable to variant discovery in other highly divergent regions. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3045-z) contains supplementary material, which is available to authorized users.
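To make the comparison of mapper/caller combinations concrete, here is a hedged Python sketch that measures, per genomic region, the fraction of known SNPs a given call set recovers; running it once per combination gives a table comparable in spirit to the one the abstract summarizes. The (chrom, pos, alt) call representation, the half-open region coordinates and the function name are assumptions for illustration; real HLA and non-HLA coordinates would have to come from an annotation source and are not given here.

```python
from __future__ import annotations


def sensitivity_by_region(
    calls: set[tuple[str, int, str]],
    known: set[tuple[str, int, str]],
    regions: dict[str, tuple[str, int, int]],
) -> dict[str, float | None]:
    """Fraction of known SNPs (chrom, pos, alt) recovered by `calls` in each region.

    Regions are (chrom, start, end) with half-open [start, end) coordinates.
    A region containing no known SNPs maps to None instead of dividing by zero.
    """
    result: dict[str, float | None] = {}
    for name, (chrom, start, end) in regions.items():
        truth = {v for v in known if v[0] == chrom and start <= v[1] < end}
        result[name] = len(truth & calls) / len(truth) if truth else None
    return result


if __name__ == "__main__":
    # Toy truth set and call set; region bounds are placeholders, not real HLA coordinates.
    known = {("chr6", 10, "A"), ("chr6", 20, "G"), ("chr6", 110, "T"), ("chr6", 120, "C")}
    calls = {("chr6", 10, "A"), ("chr6", 20, "G"), ("chr6", 110, "T")}
    regions = {"low_divergence": ("chr6", 0, 100), "high_divergence": ("chr6", 100, 200)}
    print(sensitivity_by_region(calls, known, regions))
```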
Affiliation(s)
- Shulan Tian
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Huihuang Yan
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
- Claudia Neuhauser
- Informatics Institute, University of Minnesota, Minneapolis, MN, 55455, USA
- Susan L Slager
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
534
Lim HC, Braun MJ. High-throughput SNP genotyping of historical and modern samples of five bird species via sequence capture of ultraconserved elements. Mol Ecol Resour 2016; 16:1204-23. [DOI: 10.1111/1755-0998.12568] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/25/2015] [Revised: 07/12/2016] [Accepted: 07/15/2016] [Indexed: 11/30/2022]
Affiliation(s)
- Haw Chuan Lim
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
- Michael J. Braun
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC 20560, USA
535
Furi L, Haigh R, Al Jabri ZJH, Morrissey I, Ou HY, León-Sampedro R, Martinez JL, Coque TM, Oggioni MR. Dissemination of Novel Antimicrobial Resistance Mechanisms through the Insertion Sequence Mediated Spread of Metabolic Genes. Front Microbiol 2016; 7:1008. [PMID: 27446047 PMCID: PMC4923244 DOI: 10.3389/fmicb.2016.01008] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 02/27/2016] [Accepted: 06/13/2016] [Indexed: 12/14/2022]
Abstract
The widely used biocide triclosan selectively targets FabI, the NADH-dependent trans-2-enoyl-acyl carrier protein (ACP) reductase, which is also an important target for the development of narrow-spectrum antibiotics. Analysis of triclosan-resistant Staphylococcus aureus isolates had previously shown that in about half of the strains, the mechanism of triclosan resistance consists of a heterologous duplication of the triclosan target gene due to the acquisition of an additional fabI allele derived from Staphylococcus haemolyticus (sh-fabI). In the current work, genome sequencing of 10 of these strains allowed the characterization of two novel composite transposons, TnSha1 and TnSha2, involved in the spread of sh-fabI. TnSha1 harbors one copy of IS1272, whereas TnSha2 is an 11.7 kb plasmid carrying TnSha1, present either as a plasmid or in an integrated form generally flanked by two IS1272 elements. The target and mechanism of integration for IS1272 and TnSha1 are novel and include targeting of DNA secondary structures, generation of blunt-end deletions of the stem-loop and absence of target duplication. Database analyses showed widespread occurrence of these two elements in chromosomes and plasmids, with TnSha1 mainly in S. aureus and TnSha2 mainly in S. haemolyticus and S. epidermidis. The acquisition of resistance by means of insertion sequence-based mobilization and consequent duplication of drug-target metabolic genes, as observed here for sh-fabI, is highly reminiscent of the situation with the ileS2 gene conferring mupirocin resistance and with the dfrA and dfrG genes conferring trimethoprim resistance, both of which are mobilized by IS257. These three examples, which show similar mechanisms and levels of spread of metabolic genes linked to IS elements, highlight the importance of this genetic strategy for the recruitment and rapid distribution of novel resistance mechanisms in staphylococci.
Affiliation(s)
- Leonardo Furi
- Department of Genetics, University of Leicester, Leicester, UK; Dipartimento di Biotecnologie Mediche, Universita di Siena, Siena, Italy
- Richard Haigh
- Department of Genetics, University of Leicester, Leicester, UK
- Hong-Yu Ou
- State Key Laboratory for Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
- Ricardo León-Sampedro
- Departamento de Microbiología, Instituto Ramón y Cajal de Investigación Sanitaria, Hospital Universitario Ramón y Cajal, Madrid, Spain; Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), Spain
- Jose L Martinez
- Departamento de Biotecnología Microbiana, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid, Spain; Unidad de Resistencia a Antibióticos y Virulencia Bacteriana (RYC-Consejo Superior de Investigaciones Científicas), Madrid, Spain
- Teresa M Coque
- Departamento de Microbiología, Instituto Ramón y Cajal de Investigación Sanitaria, Hospital Universitario Ramón y Cajal, Madrid, Spain; Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), Spain; Unidad de Resistencia a Antibióticos y Virulencia Bacteriana (RYC-Consejo Superior de Investigaciones Científicas), Madrid, Spain
- Marco R Oggioni
- Department of Genetics, University of Leicester, Leicester, UK; Dipartimento di Biotecnologie Mediche, Universita di Siena, Siena, Italy
536
Xia LC, Sakshuwong S, Hopmans ES, Bell JM, Grimes SM, Siegmund DO, Ji HP, Zhang NR. A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic Acids Res 2016; 44:e126. [PMID: 27325742 PMCID: PMC5009736 DOI: 10.1093/nar/gkw481] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 12/12/2015] [Accepted: 05/15/2016] [Indexed: 11/14/2022]
Abstract
We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, and an analysis of mid-range size insertions and deletions (<10 kb) for whole-genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end-mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions and deletions that were confirmed by targeted deep resequencing. An R package implementation of SWAN is open source and freely available.
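SWAN's likelihoods are built on Poisson field models that combine several signals; the Python sketch below isolates only the simplest ingredient, a read-depth log-likelihood ratio, to show why a local drop in coverage counts as evidence for a deletion. It assumes that read starts per base follow a Poisson distribution with a known background rate `lam`, and that a deletion scales that rate by `copy_fraction` (0.5 for a heterozygous deletion). For a window of width w holding k read starts, the ratio then reduces to k·ln(f) + λw(1 − f). This is a simplified illustration, not SWAN's scan statistic or its R interface, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def depth_llr(count: int, expected: float, copy_fraction: float = 0.5) -> float:
    """Poisson log-likelihood ratio: deletion (rate * copy_fraction) vs. normal depth.

    log P(k | f*mu) - log P(k | mu) = k*ln(f) - (f*mu - mu) = k*ln(f) + mu*(1 - f)
    """
    if not 0.0 < copy_fraction < 1.0:
        raise ValueError("copy_fraction must lie strictly between 0 and 1")
    return count * math.log(copy_fraction) + expected * (1.0 - copy_fraction)


def scan_deletions(
    read_starts: list[int], lam: float, width: int, copy_fraction: float = 0.5
) -> tuple[int | None, float]:
    """Slide a fixed-width window and return (start, llr) of the best-scoring window."""
    best: tuple[int | None, float] = (None, -math.inf)
    window_sum = sum(read_starts[:width])
    for start in range(len(read_starts) - width + 1):
        if start > 0:
            # Add the base entering the window, drop the base leaving it.
            window_sum += read_starts[start + width - 1] - read_starts[start - 1]
        score = depth_llr(window_sum, lam * width, copy_fraction)
        if score > best[1]:
            best = (start, score)
    return best
```

A positive score favours the reduced-coverage (deletion) model; for example, 50 read starts where 100 are expected gives a ratio of about 15.3, while 100 observed starts gives a negative value.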
Affiliation(s)
- Li C Xia
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Statistics, the Wharton School, University of Pennsylvania, Philadelphia, PA 18014, USA
- Sukolsak Sakshuwong
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Erik S Hopmans
- Stanford Genome Technology Centre, Stanford University, Palo Alto, CA 94304, USA
- John M Bell
- Stanford Genome Technology Centre, Stanford University, Palo Alto, CA 94304, USA
- Susan M Grimes
- Stanford Genome Technology Centre, Stanford University, Palo Alto, CA 94304, USA
- David O Siegmund
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA; Stanford Genome Technology Centre, Stanford University, Palo Alto, CA 94304, USA
- Nancy R Zhang
- Department of Statistics, the Wharton School, University of Pennsylvania, Philadelphia, PA 18014, USA
537
Pedersen BS, Layer RM, Quinlan AR. Vcfanno: fast, flexible annotation of genetic variants. Genome Biol 2016; 17:118. [PMID: 27250555 PMCID: PMC4888505 DOI: 10.1186/s13059-016-0973-5] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Received: 03/01/2016] [Accepted: 05/03/2016] [Indexed: 01/17/2023]
Abstract
The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file. By leveraging a parallel "chromosome sweeping" algorithm, we demonstrate substantial performance gains by annotating ~85,000 variants per second with 50 attributes from 17 commonly used genome annotation resources. Vcfanno is available at https://github.com/brentp/vcfanno under the MIT license.
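The "chromosome sweeping" idea can be illustrated with a single-threaded Python sketch for one chromosome: because both the variant positions and the annotation intervals are sorted, one forward pass maintains a small set of active intervals instead of searching the whole annotation file for each variant. The `Interval` class, the half-open coordinates and the function name are assumptions for illustration; vcfanno's own parallel implementation, its configuration format and its summary operations are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Interval:
    start: int  # 0-based, inclusive
    end: int    # exclusive
    value: str  # annotation attribute to attach to overlapping variants


def sweep_annotate(positions: list[int], intervals: list[Interval]) -> list[list[str]]:
    """Annotate sorted variant positions with values of overlapping sorted intervals.

    Both inputs must be on the same chromosome; `positions` sorted ascending,
    `intervals` sorted by start.
    """
    out: list[list[str]] = []
    active: list[Interval] = []
    i = 0
    for pos in positions:
        # Admit every interval that starts at or before this position.
        while i < len(intervals) and intervals[i].start <= pos:
            active.append(intervals[i])
            i += 1
        # Retire intervals that end at or before this position; since positions
        # are sorted, they cannot overlap any later variant either.
        active = [iv for iv in active if iv.end > pos]
        out.append([iv.value for iv in active])
    return out
```

Each interval enters and leaves the active set at most once, so when overlaps are sparse the pass grows roughly linearly with the number of variants plus intervals, which is what makes a sweep attractive compared with a per-variant search.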
Affiliation(s)
- Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, 84105, USA.
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, 84105, USA.
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84105, USA.
- Ryan M Layer
- Department of Human Genetics, University of Utah, Salt Lake City, UT, 84105, USA
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, 84105, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84105, USA
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, 84105, USA.
- USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT, 84105, USA.
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, 84105, USA.
538
Liddiard K, Ruis B, Takasugi T, Harvey A, Ashelford KE, Hendrickson EA, Baird DM. Sister chromatid telomere fusions, but not NHEJ-mediated inter-chromosomal telomere fusions, occur independently of DNA ligases 3 and 4. Genome Res 2016; 26:588-600. [PMID: 26941250 PMCID: PMC4864465 DOI: 10.1101/gr.200840.115] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 10/13/2015] [Accepted: 03/02/2016] [Indexed: 01/26/2023]
Abstract
Telomeres shorten with each cell division and can ultimately become substrates for nonhomologous end-joining repair, leading to large-scale genomic rearrangements of the kind frequently observed in human cancers. We have characterized more than 1400 telomere fusion events at the single-molecule level, using a combination of high-throughput sequence analysis together with experimentally induced telomeric double-stranded DNA breaks. We show that a single chromosomal dysfunctional telomere can fuse with diverse nontelomeric genomic loci, even in the presence of an otherwise stable genome, and that fusion predominates in coding regions. Fusion frequency was markedly increased in the absence of TP53 checkpoint control and significantly modulated by the cellular capacity for classical, versus alternative, nonhomologous end joining (NHEJ). We observed a striking reduction in inter-chromosomal fusion events in cells lacking DNA ligase 4, in contrast to a remarkably consistent profile of intra-chromosomal fusion in the context of multiple genetic knockouts, including DNA ligase 3 and 4 double-knockouts. We reveal distinct mutational signatures associated with classical NHEJ-mediated inter-chromosomal, as opposed to alternative NHEJ-mediated intra-chromosomal, telomere fusions and evidence for an unanticipated sufficiency of DNA ligase 1 for these intra-chromosomal events. Our findings have implications for mechanisms driving cancer genome evolution.
Affiliation(s)
- Kate Liddiard
- Institute of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, United Kingdom
- Brian Ruis
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Medical School, Minneapolis, Minnesota 55455, USA
- Taylor Takasugi
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Medical School, Minneapolis, Minnesota 55455, USA
- Adam Harvey
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Medical School, Minneapolis, Minnesota 55455, USA
- Kevin E Ashelford
- Institute of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, United Kingdom
- Eric A Hendrickson
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota Medical School, Minneapolis, Minnesota 55455, USA
- Duncan M Baird
- Institute of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, United Kingdom
539
Germline RECQL mutations in high risk Chinese breast cancer patients. Breast Cancer Res Treat 2016; 157:211-215. [PMID: 27125668 DOI: 10.1007/s10549-016-3784-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 03/31/2016] [Accepted: 04/05/2016] [Indexed: 10/21/2022]
Abstract
Recently, RECQL was reported as a new breast cancer susceptibility gene. RECQL belongs to the RECQ DNA helicase family, which unwinds double-stranded DNA and is involved in the DNA replication stress response, telomere maintenance and DNA repair. RECQL-deficient mouse cells are prone to spontaneous chromosomal instability and aneuploidy, suggesting a tumor-suppressive role of RECQL in cancer. In this study, RECQL mutation screening was performed on 1110 breast cancer patients who were negative for BRCA1, BRCA2, TP53 and PTEN mutations and were recruited from March 2007 to June 2015 in the Hong Kong Hereditary and High Risk Breast Cancer Program. Four different pathogenic RECQL mutations were identified in six of the 1110 (0.54%) tested breast cancer patients. The identified mutations include one frame-shift deletion (c.974_977delAAGA), two splice-site mutations (c.394+1G>A, c.867+1G>T) and one nonsense mutation (c.796C>T, p.Gln266Ter). Two of the mutations (c.867+1G>T and p.Gln266Ter) were seen in more than one patient. This study provides evidence for the existence of pathogenic RECQL mutations in Southern Chinese breast cancer patients. The significance of rare RECQL variants for the estimation of breast cancer risk warrants further investigation in larger cohorts of patients and in other ethnic groups.
540
Campino S, Benavente ED, Assefa S, Thompson E, Drought LG, Taylor CJ, Gorvett Z, Carret CK, Flueck C, Ivens AC, Kwiatkowski DP, Alano P, Baker DA, Clark TG. Genomic variation in two gametocyte non-producing Plasmodium falciparum clonal lines. Malar J 2016; 15:229. [PMID: 27098483 PMCID: PMC4839107 DOI: 10.1186/s12936-016-1254-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 01/02/2016] [Accepted: 03/30/2016] [Indexed: 11/10/2022]
Abstract
Background Transmission of the malaria parasite Plasmodium falciparum from humans to the mosquito vector requires differentiation of a sub-population of asexual forms replicating within red blood cells into non-dividing male and female gametocytes. The nature of the molecular mechanism underlying this key differentiation event required for malaria transmission is not fully understood. Methods Whole genome sequencing was used to examine the genomic diversity of the gametocyte non-producing 3D7-derived lines F12 and A4. These lines were used in the recent detection of the PF3D7_1222600 locus (encoding PfAP2-G), which acts as a genetic master switch that triggers gametocyte development. Results The evolutionary changes from the 3D7 parental strain through its derivatives F12 (culture-passage derived cloned line) and A4 (transgenic cloned line) were identified. The genetic differences including the formation of chimeric var genes are presented. Conclusion A genomics resource is provided for the further study of gametocytogenesis or other phenotypes using these parasite lines. Electronic supplementary material The online version of this article (doi:10.1186/s12936-016-1254-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Susana Campino
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK.
- Ernest Diez Benavente
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Samuel Assefa
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Eloise Thompson
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Laura G Drought
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Catherine J Taylor
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Zaria Gorvett
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Celine K Carret
- The European Molecular Biology Organization, Heidelberg, Germany
- Christian Flueck
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Al C Ivens
- Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh, UK
- Dominic P Kwiatkowski
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK; Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Pietro Alano
- Dipartimento di Malattie Infettive, Parassitarie ed Immunomediate, Istituto Superiore di Sanità, Rome, Italy
- David A Baker
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK
- Taane G Clark
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London, UK; Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, UK
541
Gordon D, Huddleston J, Chaisson MJP, Hill CM, Kronenberg ZN, Munson KM, Malig M, Raja A, Fiddes I, Hillier LW, Dunn C, Baker C, Armstrong J, Diekhans M, Paten B, Shendure J, Wilson RK, Haussler D, Chin CS, Eichler EE. Long-read sequence assembly of the gorilla genome. Science 2016; 352:aae0344. [PMID: 27034376 PMCID: PMC4920363 DOI: 10.1126/science.aae0344] [Citation(s) in RCA: 232] [Impact Index Per Article: 25.8] [Received: 12/07/2015] [Accepted: 02/26/2016] [Indexed: 12/24/2022]
Abstract
Accurate sequencing and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequencing technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching the current quality of the human genome.
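The abstract reports contiguity gains of two to three orders of magnitude without naming the statistic; N50 is one commonly used contiguity measure, and the Python sketch below, written under that assumption, shows how it is computed from contig lengths and how a ratio of two N50 values translates into orders of magnitude. The function names are illustrative, and the example values are hypothetical rather than figures from the study.

```python
from __future__ import annotations

import math


def n50(contig_lengths: list[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding of total / 2
            return length
    return lengths[-1]  # not reached: running equals total on the last contig


def orders_of_magnitude(old: float, new: float) -> float:
    """log10 of the improvement factor, e.g. 10 kb -> 10 Mb gives 3.0."""
    return math.log10(new / old)
```

For contigs of 100, 200, 300 and 400 bp (1000 bp total), the two longest contigs already cover 700 bp, so the N50 is 300 bp.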
Affiliation(s)
- David Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- John Huddleston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Christopher M Hill
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Zev N Kronenberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Maika Malig
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Archana Raja
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Ian Fiddes
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- LaDeana W Hillier
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Joel Armstrong
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Benedict Paten
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Richard K Wilson
- McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Chen-Shan Chin
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
542
Sequence Diversity, Intersubgroup Relationships, and Origins of the Mouse Leukemia Gammaretroviruses of Laboratory and Wild Mice. J Virol 2016; 90:4186-98. [PMID: 26865715 DOI: 10.1128/jvi.03186-15] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/18/2015] [Accepted: 02/03/2016] [Indexed: 12/12/2022]
Abstract
Mouse leukemia viruses (MLVs) are found in the common inbred strains of laboratory mice and in the house mouse subspecies of Mus musculus. Receptor usage and envelope (env) sequence variation define three MLV host range subgroups in laboratory mice: ecotropic, polytropic, and xenotropic MLVs (E-, P-, and X-MLVs, respectively). These exogenous MLVs derive from endogenous retroviruses (ERVs) that were acquired by the wild mouse progenitors of laboratory mice about 1 million years ago. We analyzed the genomes of seven MLVs isolated from Eurasian and American wild mice and three previously sequenced MLVs to describe their relationships and identify their possible ERV progenitors. The phylogenetic tree based on the receptor-determining regions of env produced the expected host range clusters, but these clusters are not maintained in trees generated from other virus regions. Colinear alignments of the viral genomes identified segmental homologies to ERVs of different host range subgroups. Six MLVs show close relationships to a small xenotropic ERV subgroup largely confined to the inbred mouse Y chromosome. Variations in env define three E-MLV subtypes, one of which carries duplications of various sizes, sequences, and locations in the proline-rich region of env. Outside the env region, all E-MLVs are related to different nonecotropic MLVs. These results document the diversity in gammaretroviruses isolated from globally distributed Mus subspecies, provide insight into their origins and relationships, and indicate that recombination has had an important role in the evolution of these mutagenic and pathogenic agents. IMPORTANCE Laboratory mice carry mouse leukemia viruses (MLVs) of three host range groups, which were acquired from their wild mouse progenitors. 
We sequenced the complete genomes of seven infectious MLVs isolated from geographically separated Eurasian and American wild mice and compared them with endogenous germ line retroviruses (ERVs) acquired early in house mouse evolution. We did this because the laboratory mouse viruses derive directly from specific ERVs or arise by recombination between different ERVs. The six distinctively different wild mouse viruses appear to be recombinants, often involving different host range subgroups, and most are related to a distinctive, largely Y-chromosome-linked MLV ERV subtype. MLVs with ecotropic host ranges show the greatest variability with extensive inter- and intrasubtype envelope differences and with homologies to other host range subgroups outside the envelope. The sequence diversity among these wild mouse isolates helps define their relationships and origins and emphasizes the importance of recombination in their evolution.
543
Phelan J, Coll F, McNerney R, Ascher DB, Pires DEV, Furnham N, Coeck N, Hill-Cawthorne GA, Nair MB, Mallard K, Ramsay A, Campino S, Hibberd ML, Pain A, Rigouts L, Clark TG. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance. BMC Med 2016; 14:31. [PMID: 27005572 PMCID: PMC4804620 DOI: 10.1186/s12916-016-0575-9] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 12/07/2015] [Accepted: 02/02/2016] [Indexed: 12/21/2022]
Abstract
BACKGROUND Combating the spread of drug-resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. METHODS To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection, sourced from 20 countries on four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. RESULTS The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drug-binding sites. CONCLUSIONS Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. 
These approaches could ultimately lead to the discovery of novel resistance mutations, improving the design of tuberculosis control measures, such as diagnostics, and informing patient management.
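The reported relationship between MIC values and the distance of mutated residues from the drug-binding site can be sketched in two steps: compute each mutated residue's minimum Euclidean distance to the ligand atoms, then correlate those distances with the MICs. The Python below is a hedged sketch; it assumes one representative 3D coordinate per residue and uses Pearson correlation, whereas the study's exact atom selection and correlation method are not stated in the abstract. The function names are hypothetical, and any coordinates passed in would come from a structure file, not from this listing.

```python
from __future__ import annotations

import math


def min_distance_to_ligand(
    residue_xyz: tuple[float, float, float],
    ligand_atoms: list[tuple[float, float, float]],
) -> float:
    """Smallest Euclidean distance (same units as the input, e.g. angstroms) to any ligand atom."""
    return min(math.dist(residue_xyz, atom) for atom in ligand_atoms)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

With distances computed per mutation, `pearson(distances, mic_values)` gives the coefficient; whether MICs should first be log-transformed is a modelling choice the abstract does not settle.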
Affiliation(s)
- Jody Phelan
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Francesc Coll
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Ruth McNerney
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK; University of Cape Town Lung Institute, Lung Infection & Immunity Unit, Old Main Building, Groote Schuur Hospital, Observatory, Cape Town, 7925, South Africa
- David B Ascher
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Douglas E V Pires
- Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Avenida Augusto de Lima 1715, Belo Horizonte, 30190-002, Brazil
- Nick Furnham
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Nele Coeck
- Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Grant A Hill-Cawthorne
- Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Sydney Emerging Infections and Biosecurity Institute and School of Public Health, Sydney Medical School, University of Sydney, Sydney, NSW, 2006, Australia
- Mridul B Nair
- Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Kim Mallard
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Andrew Ramsay
- Special Programme for Research and Training in Tropical Diseases (TDR), World Health Organisation, Geneva, Switzerland
- Susana Campino
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Martin L Hibberd
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Arnab Pain
- Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Leen Rigouts
- Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium.,Department of Biomedical Sciences, Antwerp University, Antwerp, Belgium
| | - Taane G Clark
- Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK. .,Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK. .,Department of Pathogen Molecular Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK.
544
Singh T, Kurki MI, Curtis D, Purcell SM, Crooks L, McRae J, Suvisaari J, Chheda H, Blackwood D, Breen G, Pietiläinen O, Gerety SS, Ayub M, Blyth M, Cole T, Collier D, Coomber EL, Craddock N, Daly MJ, Danesh J, DiForti M, Foster A, Freimer NB, Geschwind D, Johnstone M, Joss S, Kirov G, Körkkö J, Kuismin O, Holmans P, Hultman CM, Iyegbe C, Lönnqvist J, Männikkö M, McCarroll SA, McGuffin P, McIntosh AM, McQuillin A, Moilanen JS, Moore C, Murray RM, Newbury-Ecob R, Ouwehand W, Paunio T, Prigmore E, Rees E, Roberts D, Sambrook J, Sklar P, St Clair D, Veijola J, Walters JTR, Williams H, Sullivan PF, Hurles ME, O'Donovan MC, Palotie A, Owen MJ, Barrett JC. Rare loss-of-function variants in SETD1A are associated with schizophrenia and developmental disorders. Nat Neurosci 2016; 19:571-7. [PMID: 26974950 DOI: 10.1038/nn.4267] [Citation(s) in RCA: 321] [Impact Index Per Article: 35.7] [Received: 01/13/2016] [Accepted: 02/11/2016] [Indexed: 12/17/2022]
Abstract
By analyzing the whole-exome sequences of 4,264 schizophrenia cases, 9,343 controls and 1,077 trios, we identified a genome-wide significant association between rare loss-of-function (LoF) variants in SETD1A and risk for schizophrenia (P = 3.3 × 10⁻⁹). We found only two heterozygous LoF variants in 45,376 exomes from individuals without a neuropsychiatric diagnosis, indicating that SETD1A is substantially depleted of LoF variants in the general population. Seven of the ten individuals with schizophrenia carrying SETD1A LoF variants also had learning difficulties. We further identified four SETD1A LoF carriers among 4,281 children with severe developmental disorders and two more carriers in an independent sample of 5,720 Finnish exomes, both with notable neuropsychiatric phenotypes. Together, our observations indicate that LoF variants in SETD1A cause a range of neurodevelopmental disorders, including schizophrenia. Combining these data with previous common variant evidence, we suggest that epigenetic dysregulation, specifically in the histone H3K4 methylation pathway, is an important mechanism in the pathogenesis of schizophrenia.
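The association above rests on comparing how often LoF carriers occur among cases versus people without a diagnosis. A minimal Python sketch of one standard way to test for such an excess, a one-sided Fisher's exact test on carrier counts, is shown below; this is an illustrative assumption, not necessarily the test the authors used, and the function name and the counts in the example are hypothetical rather than taken from the study.

```python
from math import comb


def burden_test_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for an excess of LoF carriers among cases.

    Under the null hypothesis, carriers are distributed across cases and
    controls at random, so the number of case carriers is hypergeometric.
    Returns P(X >= case_carriers).
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    # Exact integer arithmetic: binomial coefficients for cohorts of tens of
    # thousands are astronomically large but Python ints handle them.
    numerator = sum(
        comb(carriers, x) * comb(total - carriers, n_cases - x)
        for x in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, n_cases)


# Hypothetical counts: 10 carriers among 5,000 cases, 2 among 45,000 controls.
print(burden_test_one_sided(10, 5000, 2, 45000))
```

Exact integer sums are used instead of floating-point hypergeometric terms so that very small P values are not lost to underflow partway through the calculation.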
Affiliation(s)
- Tarjinder Singh
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Mitja I Kurki
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- David Curtis
- University College London Genetics Institute, University College London, London, UK
- Shaun M Purcell
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Lucy Crooks
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Sheffield Diagnostic Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK
- Jeremy McRae
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Jaana Suvisaari
- National Institute for Health and Welfare (THL), Helsinki, Finland
- Himanshu Chheda
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Douglas Blackwood
- Division of Psychiatry, The University of Edinburgh, Royal Edinburgh Hospital, Edinburgh, UK
- Gerome Breen
- Institute of Psychiatry, King's College London, London, UK; NIHR BRC for Mental Health, Institute of Psychiatry and SLaM NHS Trust, King's College London, London, UK
- Olli Pietiläinen
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; National Institute for Health and Welfare (THL), Helsinki, Finland
- Sebastian S Gerety
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Muhammad Ayub
- Division of Developmental Disabilities, Department of Psychiatry, Queen's University, Kingston, Ontario, Canada
- Moira Blyth
- Department of Clinical Genetics, Chapel Allerton Hospital, Chapeltown Road, Leeds, UK
- Trevor Cole
- Birmingham Women's Hospital, Edgbaston, Birmingham, UK
- David Collier
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, London, UK; Lilly Research Laboratories, Eli Lilly & Co. Ltd., Windlesham, Surrey, UK
- Eve L Coomber
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Nick Craddock
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Mark J Daly
- Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- John Danesh
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; NIHR Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; INTERVAL Coordinating Centre, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Marta DiForti
- Institute of Psychiatry, King's College London, London, UK
- Alison Foster
- Clinical Genetics Unit, Birmingham Women's NHS Foundation Trust, Edgbaston, Birmingham, UK
- Nelson B Freimer
- Center for Neurobehavioral Genetics, University of California Los Angeles, Los Angeles, California, USA
- Daniel Geschwind
- UCLA David Geffen School of Medicine, Los Angeles, California, USA
- Mandy Johnstone
- Division of Psychiatry, The University of Edinburgh, Royal Edinburgh Hospital, Edinburgh, UK
- Shelagh Joss
- West of Scotland Genetics Service, South Glasgow University Hospitals, Glasgow, UK
- Georg Kirov
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Jarmo Körkkö
- Center for Intellectual Disability Care, Oulu University Hospital and University of Oulu, Oulu, Finland
- Outi Kuismin
- PEDEGO Research Unit, Medical Research Center Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
- Peter Holmans
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Christina M Hultman
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Conrad Iyegbe
- Institute of Psychiatry, King's College London, London, UK
- Jouko Lönnqvist
- National Institute for Health and Welfare (THL), Helsinki, Finland
- Minna Männikkö
- Center for Life Course Epidemiology and Systems Medicine, University of Oulu, Oulu, Finland
- Steve A McCarroll
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Peter McGuffin
- Institute of Psychiatry, King's College London, London, UK
- Andrew M McIntosh
- Division of Psychiatry, The University of Edinburgh, Royal Edinburgh Hospital, Edinburgh, UK
- Andrew McQuillin
- University College London, Molecular Psychiatry Laboratory, Division of Psychiatry, London, UK
- Jukka S Moilanen
- PEDEGO Research Unit, Medical Research Center Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
- Carmel Moore
- NIHR Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; INTERVAL Coordinating Centre, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Robin M Murray
- Institute of Psychiatry, King's College London, London, UK; NIHR BRC for Mental Health, Institute of Psychiatry and SLaM NHS Trust, King's College London, London, UK
- Ruth Newbury-Ecob
- Department of Clinical Genetics, University Hospitals Bristol NHS Foundation Trust, St Michael's Hospital, Bristol, UK
- Willem Ouwehand
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; NIHR Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Department of Haematology, University of Cambridge, Cambridge, UK; NHS Blood and Transplant, Cambridge, UK
- Tiina Paunio
- National Institute for Health and Welfare (THL), Helsinki, Finland; University of Helsinki, Department of Psychiatry, Helsinki, Finland
- Elena Prigmore
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Elliott Rees
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- David Roberts
- NIHR Blood and Transplant Research Unit in Donor Health and Genomics, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; NHS Blood and Transplant Oxford Centre, John Radcliffe Hospital, Oxford, UK; Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Jennifer Sambrook
- INTERVAL Coordinating Centre, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; Department of Haematology, University of Cambridge, Cambridge, UK
- Pamela Sklar
- Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- David St Clair
- Institute of Medical Sciences, University of Aberdeen, Aberdeen, UK
- Juha Veijola
- Medical Research Center Oulu, Oulu University Hospital and University of Oulu, Oulu, Finland
- James T R Walters
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Hywel Williams
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Patrick F Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Psychiatry, University of North Carolina, Chapel Hill, North Carolina, USA
- Matthew E Hurles
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Michael C O'Donovan
- MRC Centre for Neuropsychiatric Genetics & Genomics, Institute of Psychological Medicine & Clinical Neurosciences, School of Medicine, Cardiff University, Cardiff, UK
- Aarno Palotie
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Michael J Owen
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Jeffrey C Barrett
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
545
Narasimhan VM, Hunt KA, Mason D, Baker CL, Karczewski KJ, Barnes MR, Barnett AH, Bates C, Bellary S, Bockett NA, Giorda K, Griffiths CJ, Hemingway H, Jia Z, Kelly MA, Khawaja HA, Lek M, McCarthy S, McEachan R, O'Donnell-Luria A, Paigen K, Parisinos CA, Sheridan E, Southgate L, Tee L, Thomas M, Xue Y, Schnall-Levin M, Petkov PM, Tyler-Smith C, Maher ER, Trembath RC, MacArthur DG, Wright J, Durbin R, van Heel DA. Health and population effects of rare gene knockouts in adult humans with related parents. Science 2016; 352:474-7. [PMID: 26940866 DOI: 10.1126/science.aac8624] [Citation(s) in RCA: 217] [Impact Index Per Article: 24.1] [Received: 11/12/2015] [Accepted: 02/18/2016] [Indexed: 12/13/2022]
Abstract
Examining complete gene knockouts within a viable organism can inform on gene function. We sequenced the exomes of 3222 British adults of Pakistani heritage with high parental relatedness, discovering 1111 rare-variant homozygous genotypes with predicted loss of function (knockouts) in 781 genes. We observed 13.7% fewer homozygous knockout genotypes than we expected, implying an average load of 1.6 recessive-lethal-equivalent loss-of-function (LOF) variants per adult. When genetic data were linked to the individuals' lifelong health records, we observed no significant relationship between gene knockouts and clinical consultation or prescription rate. In this data set, we identified a healthy PRDM9-knockout mother and performed phased genome sequencing on her, her child, and control individuals. Our results show that meiotic recombination sites are localized away from PRDM9-dependent hotspots. Thus, natural LOF variants inform on essential genetic loci and demonstrate PRDM9 redundancy in humans.
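The 13.7% figure compares observed homozygous knockouts with the number expected given parental relatedness. As an illustration only, the Python sketch below computes that expectation with the standard inbreeding-coefficient formula and then the fractional deficit. It assumes per-individual inbreeding coefficients and allele frequencies are already known; the function names and values are hypothetical, and the sketch does not reproduce the authors' model or their lethal-equivalent estimate.

```python
def expected_homozygotes(inbreeding_coeffs, allele_freqs):
    """Expected number of homozygous rare-allele genotypes, summed over people and variants.

    An individual with inbreeding coefficient F is homozygous for an allele of
    frequency q with probability F*q + (1 - F)*q**2: with probability F the two
    copies are identical by descent, otherwise they are drawn independently.
    """
    return sum(f * q + (1 - f) * q * q for f in inbreeding_coeffs for q in allele_freqs)


def deficit(observed, expected):
    """Fractional shortfall of observed genotypes (0.137 means 13.7% fewer than expected)."""
    return 1 - observed / expected


# Hypothetical cohort: two offspring of first cousins (F = 1/16) and one outbred adult.
f_values = [1 / 16, 1 / 16, 0.0]
freqs = [0.001, 0.002, 0.0005]
expected = expected_homozygotes(f_values, freqs)
print(expected, deficit(0.863 * expected, expected))
```

For rare alleles the F*q term dominates, which is why cohorts with high parental relatedness yield far more complete knockouts than outbred cohorts of the same size.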
Affiliation(s)
- Karen A Hunt
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Dan Mason
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Christopher L Baker
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Michael R Barnes
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Anthony H Barnett
- Diabetes and Endocrine Centre, Heart of England NHS Foundation Trust and University of Birmingham, Birmingham B9 5SS, UK
- Chris Bates
- TPP, Mill House, Troy Road, Leeds LS18 5TN, UK
- Srikanth Bellary
- Aston Research Centre for Healthy Ageing, Aston University, Birmingham B4 7ET, UK
- Nicholas A Bockett
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Kristina Giorda
- 10X Genomics, 7068 Koll Center Parkway, Suite 415, Pleasanton, CA 94566, USA
- Christopher J Griffiths
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Harry Hemingway
- Farr Institute of Health Informatics Research, London NW1 2DA, UK; Institute of Health Informatics, University College London, London NW1 2DA, UK
- Zhilong Jia
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- M Ann Kelly
- School of Clinical and Experimental Medicine, University of Birmingham, Birmingham B15 2TT, UK
- Hajrah A Khawaja
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Shane McCarthy
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Rosie McEachan
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Anne O'Donnell-Luria
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kenneth Paigen
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Constantinos A Parisinos
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Eamonn Sheridan
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Laura Southgate
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Louise Tee
- School of Clinical and Experimental Medicine, University of Birmingham, Birmingham B15 2TT, UK
- Mark Thomas
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Yali Xue
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Petko M Petkov
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Eamonn R Maher
- Department of Medical Genetics, University of Cambridge and National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre, Box 238, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK; Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Richard C Trembath
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK; Faculty of Life Sciences and Medicine, King's College London, London SE1 1UL, UK
- Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Richard Durbin
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- David A van Heel
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
546
Complete Genome Sequence of Bovine Polyomavirus Type 1 from Aborted Cattle, Isolated in Belgium in 2014. Genome Announcements 2016; 4:e01646-15. [PMID: 26941154 PMCID: PMC4777765 DOI: 10.1128/genomea.01646-15] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/25/2022]
Abstract
The complete and fully annotated genome sequence of a bovine polyomavirus type 1 (BPyV/BEL/1/2014) from aborted cattle was assembled from a metagenomics data set. The 4,697-bp circular dsDNA genome contains 6 protein-coding genes. Bovine polyomavirus is unlikely to be causally related to the abortion cases.
547
Hernaez M, Ochoa I, Weissman T. A cluster-based approach to compression of Quality Scores. Proceedings. Data Compression Conference 2016; 2016:261-270. [PMID: 29057318 DOI: 10.1109/dcc.2016.49] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Abstract
Massive amounts of sequencing data are being generated thanks to advances in sequencing technology and a dramatic drop in sequencing cost. Storing and sharing these large data sets has become a major bottleneck in the discovery and analysis of the genetic variants used for medical inference. As such, lossless compression of these data has been proposed. Of the compressed data, more than 70% corresponds to quality scores, which indicate the sequencing machine's reliability when calling a particular base pair. Thus, to further improve compression performance, lossy compression of quality scores is emerging as the natural candidate. Since the data are used for genetic variant discovery, lossy compressors for quality scores are analyzed in terms of their rate-distortion performance as well as their effect on variant callers. Previously proposed algorithms do not perform well under all performance metrics and are hence unsuitable for certain applications. In this work we propose a new lossy compressor that first performs a clustering step by assuming that all quality score sequences come from a mixture of Markov models. It then quantizes the quality scores based on the Markov models, with each quantizer targeting a specific distortion so as to optimize the overall rate-distortion performance. Finally, the quantized values are compressed by an entropy encoder. We demonstrate that the proposed lossy compressor outperforms previously proposed methods under all analyzed distortion metrics. This suggests that the effect of the proposed algorithm on any downstream application will likely be less noticeable than that of previously proposed lossy compressors. Moreover, we analyze how the proposed lossy compressor affects single nucleotide polymorphism (SNP) calling, and show that the variability it introduces in the calls is considerably smaller than the variability that exists between different SNP-calling methodologies.
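The clustering step assumes that each quality-score sequence is drawn from one of several Markov models. A minimal Python sketch of that idea follows, assuming first-order chains over integer Phred scores and hard (k-means-style) assignments; this is an illustrative choice, not necessarily the fitting procedure the authors used, the function names are hypothetical, and the per-cluster quantization and entropy-coding stages are not shown.

```python
import math


def train_markov(seqs, alphabet_size, pseudocount=1.0):
    """First-order transition probabilities P(q_t | q_{t-1}).

    A pseudocount keeps unseen transitions at non-zero probability, so an
    empty cluster falls back to a uniform model.
    """
    counts = [[pseudocount] * alphabet_size for _ in range(alphabet_size)]
    for s in seqs:
        for prev, cur in zip(s, s[1:]):
            counts[prev][cur] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row])
    return probs


def log_likelihood(seq, trans):
    """Log-probability of a sequence's transitions (the first score is not modelled)."""
    return sum(math.log(trans[p][c]) for p, c in zip(seq, seq[1:]))


def cluster_quality_sequences(seqs, k, alphabet_size, max_iters=20):
    """Hard-assignment EM for a mixture of k first-order Markov chains.

    Assumes every sequence is non-empty and contains integers in [0, alphabet_size).
    """
    n = len(seqs)
    # Deterministic start: split sequences into k groups by mean quality.
    order = sorted(range(n), key=lambda i: sum(seqs[i]) / len(seqs[i]))
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n
    models = []
    for _ in range(max_iters):
        # M-step: one transition matrix per cluster.
        models = [train_markov([s for s, lab in zip(seqs, labels) if lab == j], alphabet_size)
                  for j in range(k)]
        # Hard E-step: move each sequence to its most likely cluster.
        new_labels = [max(range(k), key=lambda j: log_likelihood(s, models[j])) for s in seqs]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, models


# Hypothetical Phred reads (scores 0..40): two flat high-quality reads and two noisy ones.
reads = [[38] * 12, [38] * 11 + [37],
         [20, 35, 12, 30, 8, 33, 15, 28, 10, 31, 9, 25],
         [22, 34, 11, 29, 7, 32, 14, 27, 12, 30, 8, 26]]
labels, _ = cluster_quality_sequences(reads, k=2, alphabet_size=41)
print(labels)
```

Once sequences are grouped this way, each cluster's transition statistics can drive its own quantizer, which is the motivation the abstract gives for clustering before quantization.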
Affiliation(s)
- Mikel Hernaez
- Department of Electrical Engineering, Stanford University
- Idoia Ochoa
- Department of Electrical Engineering, Stanford University
548
Ikeda Y, Kiyotani K, Yew PY, Kato T, Tamura K, Yap KL, Nielsen SM, Mester JL, Eng C, Nakamura Y, Grogan RH. Germline PARP4 mutations in patients with primary thyroid and breast cancers. Endocr Relat Cancer 2016; 23:171-9. [PMID: 26699384 PMCID: PMC5152685 DOI: 10.1530/erc-15-0359] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Accepted: 12/23/2015] [Indexed: 12/20/2022]
Abstract
Germline mutations in the PTEN gene, which cause Cowden syndrome, are known to be one of the genetic factors for primary thyroid and breast cancers; however, PTEN mutations are found in only a small subset of research participants with non-syndromic breast and thyroid cancers. In this study, we aimed to identify germline variants that may be related to genetic risk of primary thyroid and breast cancers. Genomic DNA extracted from the peripheral blood of 14 PTEN wild-type (WT) female research participants with primary thyroid and breast cancers was analyzed by whole-exome sequencing. Gene-based case-control association analysis, using data from 406 Europeans in the 1000 Genomes Project database as controls, identified 34 genes possibly associated with the phenotype at P < 1.0 × 10⁻³. Among them, rare variants in the PARP4 gene were detected at significantly high frequency (odds ratio = 5.2; P = 1.0 × 10⁻⁵). The variants G496V and T1170I were found in six of the 14 study participants (43%), whereas their frequency in controls was only 0.5%. Functional analysis using the HCC1143 cell line showed that siRNA knockdown of PARP4 significantly enhanced cell proliferation compared with cells transfected with siControl (P = 0.02). Kaplan-Meier analysis using Gene Expression Omnibus (GEO), European Genome-phenome Archive (EGA) and The Cancer Genome Atlas (TCGA) datasets showed worse relapse-free survival (P < 0.001, hazard ratio 1.27) and overall survival (P = 0.006, hazard ratio 1.41) in a PARP4 low-expression group, suggesting that PARP4 may function as a tumor suppressor. In conclusion, we identified PARP4 as a possible susceptibility gene for primary thyroid and breast cancer.
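The survival comparison above uses Kaplan-Meier curves. As an illustration of how that estimator works, here is a minimal Python sketch; the function name and the example follow-up data are hypothetical, and the log-rank test and hazard-ratio (Cox) estimates that accompany such analyses are not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. relapse or death) was observed, 0 if censored
    Returns a list of (time, S(time)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Subjects censored at t still count as at risk at t, then leave.
        at_risk -= deaths + censored
    return curve


# Hypothetical relapse times (months) for one expression group; 0 marks censoring.
for t, s in kaplan_meier([3, 5, 5, 8, 12, 12, 15], [1, 1, 0, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored subjects contribute to the risk set until they drop out, which is what lets the estimator use incomplete follow-up without biasing survival downward.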
Affiliation(s)
- Yuji Ikeda
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Kazuma Kiyotani
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Poh Yin Yew
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Taigo Kato
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Kenji Tamura
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Kai Lee Yap
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Sarah M Nielsen
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Jessica L Mester
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Charis Eng
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Yusuke Nakamura
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
- Raymon H Grogan
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, Illinois 60637, USA; Genomic Medicine Institute, Cleveland Clinic, Cleveland, Ohio 44195, USA; Department of Genetics and Genome Sciences, Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio 44106, USA; Endocrine Surgery Research Program, Section of General Surgery, Department of Surgery, The University of Chicago, 5841 S Maryland Avenue, Chicago, Illinois 60637, USA
549
Variation analysis to construct Korean-specific exome variation database of pilot scale. BIOCHIP JOURNAL 2016. [DOI: 10.1007/s13206-016-0207-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
550
Sengupta S, Gulukota K, Zhu Y, Ober C, Naughton K, Wentworth-Sheilds W, Ji Y. Ultra-fast local-haplotype variant calling using paired-end DNA-sequencing data reveals somatic mosaicism in tumor and normal blood samples. Nucleic Acids Res 2016; 44:e25. [PMID: 26420835 PMCID: PMC4756850 DOI: 10.1093/nar/gkv953] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/18/2015] [Revised: 09/09/2015] [Accepted: 09/13/2015] [Indexed: 12/30/2022]
Abstract
Somatic mosaicism refers to the existence of somatic mutations in a fraction of somatic cells in a single biological sample. Its importance has mainly been discussed in theory, although experimental work has started to emerge linking somatic mosaicism to disease diagnosis. Through novel statistical modeling of paired-end DNA-sequencing data using blood-derived DNA from healthy donors as well as DNA from tumor samples, we present an ultra-fast computational pipeline, LocHap, that searches for multiple single nucleotide variants (SNVs) scaffolded by the same reads. We refer to scaffolded SNVs as local haplotypes (LH). When an LH exhibits more than two genotypes, we call it a local haplotype variant (LHV). The presence of LHVs is considered evidence of somatic mosaicism because a genetically homogeneous cell population will not harbor LHVs. Applying LocHap to whole-genome and whole-exome sequence data from normal blood and tumor DNA samples, we find widespread LHVs across the genome. Importantly, we find more LHVs in tumor samples than in normal samples, and more in older adults than in younger ones. We confirm the existence of LHVs and somatic mosaicism by validation studies in normal blood samples. LocHap is publicly available at http://www.compgenome.org/lochap.
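The LHV criterion described in the abstract can be illustrated with a minimal Python sketch. It assumes each paired-end fragment has already been reduced to a mapping from SNV position to observed base; the names `Fragment`, `find_lhvs`, and `min_support` are hypothetical, and the support threshold is only a crude stand-in for the statistical modeling the authors describe. This is not LocHap's actual implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: one sequenced fragment (both mates of a read
# pair merged) as a mapping from SNV position to the observed base.
Fragment = Dict[int, str]


def find_lhvs(
    fragments: List[Fragment], min_support: int = 2
) -> Dict[Tuple[int, ...], Counter]:
    """Return local haplotypes (LHs) that show more than two genotypes.

    Naive counting sketch: fragments are grouped by the exact set of SNV
    positions they span (at least two SNVs, i.e. scaffolded by one fragment).
    Within a group, each distinct allele combination is one genotype.
    Genotypes seen in fewer than `min_support` fragments are discarded as
    likely sequencing errors (hypothetical threshold, not a statistical model).
    """
    groups: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for frag in fragments:
        positions = tuple(sorted(frag))
        if len(positions) < 2:
            continue  # a single SNV cannot form a local haplotype
        genotype = tuple(frag[p] for p in positions)
        groups[positions][genotype] += 1

    lhvs: Dict[Tuple[int, ...], Counter] = {}
    for positions, counts in groups.items():
        supported = Counter({g: n for g, n in counts.items() if n >= min_support})
        # A genetically homogeneous diploid cell population carries at most two
        # haplotypes at a locus, so a third supported genotype flags an LHV.
        if len(supported) > 2:
            lhvs[positions] = supported
    return lhvs


# Toy data (invented for illustration): three haplotypes at SNVs 100/130,
# only two at SNVs 500/520, plus single-SNV fragments that are ignored.
fragments: List[Fragment] = (
    [{100: "A", 130: "C"}] * 5
    + [{100: "G", 130: "T"}] * 4
    + [{100: "G", 130: "C"}] * 3
    + [{500: "T", 520: "A"}] * 6
    + [{500: "C", 520: "G"}] * 6
    + [{100: "A"}] * 2
)
print(find_lhvs(fragments))  # only positions (100, 130) are flagged
```

Grouping by the exact set of covered positions keeps the sketch short; a fuller caller would also have to reconcile fragments that span overlapping subsets of the same SNVs and model base-calling error explicitly.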
Affiliation(s)
- Subhajit Sengupta
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Kamalakar Gulukota
- Center for Molecular Medicine, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Yitan Zhu
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Carole Ober
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Katherine Naughton
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Yuan Ji
- Program of Computational Genomics & Medicine, NorthShore University HealthSystem, Evanston, IL 60201, USA; Department of Health Studies, University of Chicago, Chicago, IL 60637, USA