1. Kotnik E, Spies N, Miller C, Li T, Inkman M, Zhang J, Guo L, Maher C, McCourt C, Thaker P, Hagemann A, Mutch D, Powell M, Fuh K. Characterization of primary-metastasis pairs in high-grade serous ovarian cancer with short- and long-term survival. Gynecol Oncol 2020. [DOI: 10.1016/j.ygyno.2020.05.458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

2. Chapman LM, Spies N, Pai P, Lim CS, Carroll A, Narzisi G, Watson CM, Proukakis C, Clarke WE, Nariai N, Dawson E, Jones G, Blankenberg D, Brueffer C, Xiao C, Kolora SRR, Alexander N, Wolujewicz P, Ahmed AE, Smith G, Shehreen S, Wenger AM, Salit M, Zook JM. A crowdsourced set of curated structural variants for the human genome. PLoS Comput Biol 2020; 16:e1007933. [PMID: 32559231 PMCID: PMC7329145 DOI: 10.1371/journal.pcbi.1007933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/19/2019] [Revised: 07/01/2020] [Accepted: 05/07/2020] [Indexed: 11/19/2022] Open
Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, establishing a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
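The concordance and filtering steps described in this abstract can be illustrated with a minimal Python sketch. It is not the SVCurator implementation: the function names, the label strings, and the `min_votes` and `min_agreement` thresholds are all hypothetical, chosen only to show how pairwise agreement between curators and a per-event consensus filter might be computed.

```python
from collections import Counter

def pairwise_concordance(labels_a, labels_b):
    """Fraction of events labeled by both curators on which they agree."""
    shared = [event for event in labels_a if event in labels_b]
    if not shared:
        return None  # no overlap, concordance is undefined
    agree = sum(labels_a[event] == labels_b[event] for event in shared)
    return agree / len(shared)

def consensus_labels(curations, min_votes=3, min_agreement=0.7):
    """Keep events whose most common label reaches min_agreement.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    votes = {}
    for curator_labels in curations.values():
        for event, label in curator_labels.items():
            votes.setdefault(event, []).append(label)
    consensus = {}
    for event, labels in votes.items():
        if len(labels) < min_votes:
            continue  # too few curators reviewed this event
        top_label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            consensus[event] = top_label
    return consensus

# Toy data: three hypothetical curators labeling three putative SVs.
curations = {
    "curator_1": {"sv1": "het_del", "sv2": "hom_ins", "sv3": "het_del"},
    "curator_2": {"sv1": "het_del", "sv2": "hom_ins", "sv3": "hom_ref"},
    "curator_3": {"sv1": "het_del", "sv2": "hom_ins", "sv3": "complex"},
}
high_confidence = consensus_labels(curations)
```

In this toy data the curators disagree on `sv3`, so it is dropped from `high_confidence`, mirroring the way low-concordance events were filtered before the high-confidence label set was produced.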
Affiliation(s)
- Lesley M. Chapman
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- Noah Spies
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
  - The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
  - Departments of Genetics and Pathology, Stanford University, Stanford, California, United States of America
- Patrick Pai
  - University of Maryland - College Park, College Park, Maryland, United States of America
- Chun Shen Lim
  - Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Andrew Carroll
  - DNAnexus Inc, Mountain View, California, United States of America
- Giuseppe Narzisi
  - New York Genome Center, New York, New York, United States of America
- Christopher M. Watson
  - School of Medicine, University of Leeds, Saint James's University Hospital, Leeds, United Kingdom
  - Yorkshire Regional Genetics Service, The Leeds Teaching Hospitals NHS Trust, Saint James's University Hospital, Leeds, United Kingdom
- Christos Proukakis
  - University College London, Institute of Neurology, London, United Kingdom
- Wayne E. Clarke
  - New York Genome Center, New York, New York, United States of America
- Naoki Nariai
  - Illumina, Inc., San Diego, California, United States of America
- Eric Dawson
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, United States of America
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Garan Jones
  - University of Exeter Medical School, Epidemiology and Public Health Group, Barrack Road, Exeter, Devon, United Kingdom
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Christian Brueffer
  - Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Sree Rohit Raj Kolora
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
  - Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
- Noah Alexander
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, United States of America
- Paul Wolujewicz
  - Weill Cornell, Belfer Research Building, New York, New York, United States of America
- Azza E. Ahmed
  - Center for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum and Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan
- Graeme Smith
  - Guy's Hospital and St Thomas's NHS Foundation Trust, Great Maze Pond, London, United Kingdom
- Saadlee Shehreen
  - Department of Genetic Engineering & Biotechnology, University of Dhaka, Bangladesh
- Aaron M. Wenger
  - Pacific Biosciences, Menlo Park, California, United States of America
- Marc Salit
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
  - The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
- Justin M. Zook
  - Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America

3. Zhou B, Ho SS, Greer SU, Spies N, Bell JM, Zhang X, Zhu X, Arthur JG, Byeon S, Pattni R, Saha I, Huang Y, Song G, Perrin D, Wong WH, Ji HP, Abyzov A, Urban AE. Haplotype-resolved and integrated genome analysis of the cancer cell line HepG2. Nucleic Acids Res 2019; 47:3846-3861. [PMID: 30864654 PMCID: PMC6486628 DOI: 10.1093/nar/gkz169] [Citation(s) in RCA: 170] [Impact Index Per Article: 34.0] [Received: 11/30/2018] [Revised: 02/19/2019] [Accepted: 03/01/2019] [Indexed: 12/19/2022] Open
Abstract
HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line’s genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.
Affiliation(s)
- Bo Zhou
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Steve S Ho
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Stephanie U Greer
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Noah Spies
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- John M Bell
  - Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
- Xianglong Zhang
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Xiaowei Zhu
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Joseph G Arthur
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Seunggyu Byeon
  - School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea
- Reenal Pattni
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ishan Saha
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Yiling Huang
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Giltae Song
  - School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea
- Dimitri Perrin
  - Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD 4001, Australia
- Wing H Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA
- Hanlee P Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
- Alexej Abyzov
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA
- Alexander E Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, CA 94305, USA

4. Zhou B, Ho SS, Greer SU, Zhu X, Bell JM, Arthur JG, Spies N, Zhang X, Byeon S, Pattni R, Ben-Efraim N, Haney MS, Haraksingh RR, Song G, Ji HP, Perrin D, Wong WH, Abyzov A, Urban AE. Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562. Genome Res 2019; 29:472-484. [PMID: 30737237 PMCID: PMC6396411 DOI: 10.1101/gr.234948.118] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 01/22/2018] [Accepted: 12/28/2018] [Indexed: 11/24/2022]
Abstract
K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high-resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
Affiliation(s)
- Bo Zhou
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Steve S Ho
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Stephanie U Greer
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Xiaowei Zhu
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- John M Bell
  - Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304, USA
- Joseph G Arthur
  - Department of Statistics, Stanford University, Stanford, California 94305, USA
- Noah Spies
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, USA
  - Genome-Scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Xianglong Zhang
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Seunggyu Byeon
  - School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea
- Reenal Pattni
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Noa Ben-Efraim
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Michael S Haney
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Rajini R Haraksingh
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
- Giltae Song
  - School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea
- Hanlee P Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
  - Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304, USA
- Dimitri Perrin
  - Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD 4001, Australia
- Wing H Wong
  - Department of Statistics, Stanford University, Stanford, California 94305, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California 94305, USA
- Alexej Abyzov
  - Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Alexander E Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA
  - Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, California 94305, USA

5. Zook JM, Catoe D, McDaniel J, Vang L, Spies N, Sidow A, Weng Z, Liu Y, Mason CE, Alexander N, Henaff E, McIntyre AB, Chandramohan D, Chen F, Jaeger E, Moshrefi A, Pham K, Stedman W, Liang T, Saghbini M, Dzakula Z, Hastie A, Cao H, Deikus G, Schadt E, Sebra R, Bashir A, Truty RM, Chang CC, Gulbahce N, Zhao K, Ghosh S, Hyland F, Fu Y, Chaisson M, Xiao C, Trow J, Sherry ST, Zaranek AW, Ball M, Bobe J, Estep P, Church GM, Marks P, Kyriazopoulou-Panagiotopoulou S, Zheng GX, Schnall-Levin M, Ordonez HS, Mudivarti PA, Giorda K, Sheng Y, Rypdal KB, Salit M. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 2016; 3:160025. [PMID: 27271295 PMCID: PMC4896128 DOI: 10.1038/sdata.2016.25] [Citation(s) in RCA: 385] [Impact Index Per Article: 48.1] [Received: 09/09/2015] [Accepted: 03/15/2016] [Indexed: 02/01/2023] Open
Abstract
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Affiliation(s)
- Justin M. Zook
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- David Catoe
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Jennifer McDaniel
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Lindsay Vang
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Noah Spies
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
  - Stanford University, Stanford, California 94305, USA
- Arend Sidow
  - Stanford University, Stanford, California 94305, USA
- Ziming Weng
  - Stanford University, Stanford, California 94305, USA
- Yuling Liu
  - Stanford University, Stanford, California 94305, USA
- Christopher E. Mason
  - Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Noah Alexander
  - Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Elizabeth Henaff
  - Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Alexa B.R. McIntyre
  - Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Dhruva Chandramohan
  - Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Feng Chen
  - Illumina Mission Bay, San Francisco, California 94158, USA
- Erich Jaeger
  - Illumina Mission Bay, San Francisco, California 94158, USA
- Ali Moshrefi
  - Illumina Mission Bay, San Francisco, California 94158, USA
- Khoa Pham
  - BioNano Genomics, San Diego, California 92121, USA
- Alex Hastie
  - BioNano Genomics, San Diego, California 92121, USA
- Han Cao
  - BioNano Genomics, San Diego, California 92121, USA
- Gintaras Deikus
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Eric Schadt
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Robert Sebra
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Ali Bashir
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Natali Gulbahce
  - Complete Genomics Inc., Mountain View, California 94043, USA
- Keyan Zhao
  - Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Srinka Ghosh
  - Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Fiona Hyland
  - Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Yutao Fu
  - Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Mark Chaisson
  - Genome Sciences, University of Washington, Seattle, Washington 98105, USA
- Chunlin Xiao
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
- Jonathan Trow
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
- Stephen T. Sherry
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
- Jason Bobe
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
  - PersonalGenomes.org, Boston, Massachusetts 02115, USA
- Preston Estep
  - PersonalGenomes.org, Boston, Massachusetts 02115, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
- George M. Church
  - PersonalGenomes.org, Boston, Massachusetts 02115, USA
  - Harvard Medical School, Boston, Massachusetts 02115, USA
- Ying Sheng
  - Department of Medical Genetics, Oslo University Hospital, Kirkeveien 166, Bygg 25, Oslo 0450, Norway
- Marc Salit
  - National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
  - Stanford University, Stanford, California 94305, USA

6. Parikh H, Mohiyuddin M, Lam HYK, Iyer H, Chen D, Pratt M, Bartha G, Spies N, Losert W, Zook JM, Salit M. svclassify: a method to establish benchmark structural variant calls. BMC Genomics 2016; 17:64. [PMID: 26772178 PMCID: PMC4715349 DOI: 10.1186/s12864-016-2366-2] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 07/18/2015] [Accepted: 01/05/2016] [Indexed: 01/24/2023] Open
Abstract
Background: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives.
Results: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
Conclusions: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2366-2) contains supplementary material, which is available to authorized users.
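The one-class idea in this abstract (train only on random non-SV regions, then flag candidates whose annotations look abnormal) can be sketched in a few lines of Python. This is a simplified stand-in, not the svclassify model: the two annotations, the toy values, the z-score scoring rule, and the function names are all assumptions made for illustration.

```python
from statistics import mean, stdev

def fit_background(rows):
    """Per-annotation mean and standard deviation from random non-SV regions."""
    columns = list(zip(*rows))
    return [(mean(col), stdev(col)) for col in columns]

def abnormality_score(row, background):
    """Largest absolute z-score of a candidate across all annotations."""
    return max(
        (abs(x - mu) / sd for x, (mu, sd) in zip(row, background) if sd > 0),
        default=0.0,
    )

# Hypothetical annotations per region: (relative coverage, fraction of discordant read pairs).
random_non_sv_regions = [
    (1.00, 0.02),
    (1.10, 0.01),
    (0.90, 0.03),
    (1.05, 0.02),
    (0.95, 0.02),
]
background = fit_background(random_non_sv_regions)

candidate_deletion = (0.50, 0.30)  # half coverage, many discordant pairs
candidate_normal = (1.02, 0.02)    # looks like the rest of the genome

deletion_score = abnormality_score(candidate_deletion, background)
normal_score = abnormality_score(candidate_normal, background)
```

A candidate with a high score differs strongly from the non-SV background and is a more plausible SV; a candidate with a low score resembles the background and would be treated as questionable. Any cutoff applied to these scores would need to be chosen empirically.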
Affiliation(s)
- Hemang Parikh
  - Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA
  - Dakota Consulting Inc., 1110 Bonifant Street, Suite 310, Silver Spring, MD, 20910, USA
- Hugo Y K Lam
  - Bina Technologies, Roche Sequencing, Redwood City, CA, 94065, USA
- Hariharan Iyer
  - Statistical Engineering Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Desu Chen
  - Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD, 20742, USA
- Mark Pratt
  - Personalis Inc., 1350 Willow Road, Suite 202, Menlo Park, CA, 94025, USA
- Gabor Bartha
  - Personalis Inc., 1350 Willow Road, Suite 202, Menlo Park, CA, 94025, USA
- Noah Spies
  - Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA
  - Department of Pathology, Stanford University, Stanford, CA, USA
- Wolfgang Losert
  - Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, MD, 20742, USA
- Justin M Zook
  - Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA
- Marc Salit
  - Genome-Scale Measurements Group, Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8313, Gaithersburg, MD, 20899, USA
  - Bioengineering Department, Stanford University, Stanford, CA, USA

7. Spies N, Zook JM, Salit M, Sidow A. svviz: a read viewer for validating structural variants. Bioinformatics 2015; 31:3994-6. [PMID: 26286809 DOI: 10.1093/bioinformatics/btv478] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 05/15/2015] [Accepted: 08/10/2015] [Indexed: 12/14/2022] Open
Abstract
Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele, compared with the single reference genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations and manual refinement of breakpoints. svviz supports data from most modern sequencing platforms.
Availability and implementation: svviz is implemented in Python and freely available from http://svviz.github.io/.
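The allele-assignment step described above (realign each read to the reference allele and to the putative variant allele, then keep the better match) can be illustrated with a small self-contained Python sketch. It does not use or reproduce svviz's code: the Smith-Waterman scoring parameters, the `min_margin` cutoff, the function names, and the toy sequences are all assumptions.

```python
def local_alignment_score(read, target, match=2, mismatch=-3, gap=-4):
    """Smith-Waterman local alignment score (score only, linear gap penalty)."""
    prev = [0] * (len(target) + 1)
    best = 0
    for r in read:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if r == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def assign_read(read, ref_allele, alt_allele, min_margin=4):
    """Label a read 'ref', 'alt', or 'ambiguous' by comparing alignment scores."""
    ref_score = local_alignment_score(read, ref_allele)
    alt_score = local_alignment_score(read, alt_allele)
    if alt_score - ref_score >= min_margin:
        return "alt"
    if ref_score - alt_score >= min_margin:
        return "ref"
    return "ambiguous"

# Toy deletion: the alt allele lacks the middle segment of the reference.
left_flank = "ACGTACGGTCAT"
deleted = "TTTTTTTTTTGGGGGGGGGG"
right_flank = "CCATGACCTGAA"
ref_allele = left_flank + deleted + right_flank
alt_allele = left_flank + right_flank

junction_read = "CGGTCATCCATGAC"  # spans the deletion junction
interior_read = "GTCATTTTTTT"     # runs into the deleted segment
flank_read = "ACGTACGG"           # lies entirely in the shared left flank

labels = [assign_read(r, ref_allele, alt_allele)
          for r in (junction_read, interior_read, flank_read)]
```

Reads that lie wholly in sequence shared by both alleles score equally against each and stay ambiguous, which is why only reads overlapping the breakpoints are informative about the variant.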
Affiliation(s)
- Noah Spies
- Department of Genetics, Stanford University, Department of Pathology, Stanford University, Genome Scale Measurements Group, National Institute of Standards and Technology, Stanford, CA, USA
- Justin M Zook
- Genome Scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Marc Salit
- Genome Scale Measurements Group, National Institute of Standards and Technology, Stanford, CA, USA
- Arend Sidow
- Department of Genetics, Stanford University, Department of Pathology, Stanford University
|
8
|
Spies N, Smith CL, Rodriguez JM, Baker JC, Batzoglou S, Sidow A. Constraint and divergence of global gene expression in the mammalian embryo. eLife 2015; 4:e05538. [PMID: 25871848 PMCID: PMC4417935 DOI: 10.7554/elife.05538] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/09/2014] [Accepted: 04/13/2015] [Indexed: 11/18/2022] Open
Abstract
The effects of genetic variation on gene regulation in the developing mammalian embryo remain largely unexplored. To globally quantify these effects, we crossed two divergent mouse strains and asked how genotype of the mother or of the embryo drives gene expression phenotype genomewide. Embryonic expression of 331 genes depends on the genotype of the mother. Embryonic genotype controls allele-specific expression of 1594 genes and a highly overlapping set of cis-expression quantitative trait loci (eQTL). A marked paucity of trans-eQTL suggests that the widespread expression differences do not propagate through the embryonic gene regulatory network. The cis-eQTL genes exhibit lower-than-average evolutionary conservation and are depleted for developmental regulators, consistent with purifying selection acting on expression phenotype of pattern formation genes. The widespread effect of maternal and embryonic genotype in conjunction with the purifying selection we uncovered suggests that embryogenesis is an important and understudied reservoir of phenotypic variation. DOI: http://dx.doi.org/10.7554/eLife.05538.001
The way that the embryo of a mammal, such as a mouse or a human, develops from a fertilized egg is a complicated process that relies on controlling: which genes are active; when these genes activate; and for how long they are active. In broad terms, there are four ways that this control can be achieved: First, inside the sperm or egg, genes can be marked with small chemical tags that flag these genes to be activated (or remain inactive) after fertilization, depending on whether the modification was made by the father (in the sperm) or the mother (in the egg); this process is known as ‘imprinting’. Second, the mother can alter the gene activity in her offspring via the placenta; this process is known as ‘maternal effect’. Third, instructions encoded within the embryo's DNA can directly control if, and when, a nearby gene becomes activated; this is known as ‘cis-regulation’. Finally, similar instructions can also control genes that are situated elsewhere in the embryo's DNA through indirect mechanisms; this is known as ‘trans-regulation’. Now, Spies, Smith et al. have investigated these four processes in the offspring of two different strains of mice, one originally from Europe and the other from Southeast Asia. The two strains were crossbred and the resulting embryos were analyzed to see which of the four processes affected gene activity. This analysis revealed 31 imprinted genes and 331 genes that exhibited a maternal effect, which sometimes changed gene activity by as much as 52%. Spies, Smith et al. also found over a thousand DNA instructions in the embryo's DNA that could directly influence the activity of nearby genes, but fewer instructions that could indirectly control genes that were further away. These results suggest that direct control of genes, which affects only the genes closest to the DNA instruction, can vary a lot between individual embryos of the same species. However, indirect control of embryonically active genes, which affects many genes across the genome at the same time, appears much more tightly constrained by evolutionary forces. Which genes in the mother are responsible for the molecular signals that drive the maternal effect is an important question for future work, with implications for the genetic basis of embryonic development and disease. DOI: http://dx.doi.org/10.7554/eLife.05538.002
Affiliation(s)
- Noah Spies
- Department of Pathology, Stanford University School of Medicine, Stanford, United States
- Cheryl L Smith
- Department of Pathology, Stanford University School of Medicine, Stanford, United States
- Jesse M Rodriguez
- Department of Computer Science, Stanford University, Stanford, United States
- Julie C Baker
- Department of Genetics, Stanford University School of Medicine, Stanford, United States
- Serafim Batzoglou
- Department of Computer Science, Stanford University, Stanford, United States
- Arend Sidow
- Department of Pathology, Stanford University School of Medicine, Stanford, United States
|
9
|
Weng Z, Spies N, Zhu SX, Newburger DE, Kashef-Haghighi D, Batzoglou S, Sidow A, West RB. Cell-lineage heterogeneity and driver mutation recurrence in pre-invasive breast neoplasia. Genome Med 2015; 7:28. [PMID: 25918554 PMCID: PMC4410742 DOI: 10.1186/s13073-015-0146-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/21/2014] [Accepted: 02/26/2015] [Indexed: 12/12/2022] Open
Abstract
Background: All cells in an individual are related to one another by a bifurcating lineage tree, in which each node is an ancestral cell that divided into two, each branch connects two nodes, and the root is the zygote. When a somatic mutation occurs in an ancestral cell, all its descendants carry the mutation, which can then serve as a lineage marker for the phylogenetic reconstruction of tumor progression. Using this concept, we investigate cell lineage relationships and genetic heterogeneity of pre-invasive neoplasias compared to invasive carcinomas. Methods: We deeply sequenced over a thousand phylogenetically informative somatic variants in 66 morphologically independent samples from six patients that represent a spectrum of normal, early neoplasia, carcinoma in situ, and invasive carcinoma. For each patient, we obtained a highly resolved lineage tree that establishes the phylogenetic relationships among the pre-invasive lesions and with the invasive carcinoma. Results: The trees reveal lineage heterogeneity of pre-invasive lesions, both within the same lesion, and between histologically similar ones. On the basis of the lineage trees, we identified a large number of independent recurrences of PIK3CA H1047 mutations in separate lesions in four of the six patients, often separate from the diagnostic carcinoma. Conclusions: Our analyses demonstrate that multi-sample phylogenetic inference provides insights on the origin of driver mutations, lineage heterogeneity of neoplastic proliferations, and the relationship of genomically aberrant neoplasias with the primary tumors. PIK3CA driver mutations may be comparatively benign inducers of cellular proliferation. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0146-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ziming Weng
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305 USA
- Noah Spies
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305 USA
- Shirley X Zhu
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA
- Daniel E Newburger
- Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305 USA
- Serafim Batzoglou
- Department of Computer Science, Stanford University, Stanford, CA 94305 USA
- Arend Sidow
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305 USA
- Robert B West
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305 USA
|
10
|
Abstract
Evolutionary mechanisms in cancer progression give tumors their individuality. Cancer evolution is different from organismal evolution, however, and we discuss where concepts from evolutionary genetics are useful or limited in facilitating an understanding of cancer. Based on these concepts we construct and apply the simplest plausible model of tumor growth and progression. Simulations using this simple model illustrate the importance of stochastic events early in tumorigenesis, highlight the dominance of exponential growth over linear growth and differentiation, and explain the clonal substructure of tumors.
Affiliation(s)
- Arend Sidow
- Departments of Pathology and of Genetics, Stanford University, Stanford, CA 94305, USA.
- Noah Spies
- Departments of Pathology and of Genetics, Stanford University, Stanford, CA 94305, USA
|
11
|
Spies N, Burge CB, Bartel DP. 3' UTR-isoform choice has limited influence on the stability and translational efficiency of most mRNAs in mouse fibroblasts. Genome Res 2013; 23:2078-90. [PMID: 24072873 PMCID: PMC3847777 DOI: 10.1101/gr.156919.113] [Citation(s) in RCA: 147] [Impact Index Per Article: 13.4] [Indexed: 11/24/2022]
Abstract
Variation in protein output across the genome is controlled at several levels, but the relative contributions of different regulatory mechanisms remain poorly understood. Here, we obtained global measurements of decay and translation rates for mRNAs with alternative 3′ untranslated regions (3′ UTRs) in murine 3T3 cells. Distal tandem isoforms had slightly but significantly lower mRNA stability and greater translational efficiency than proximal isoforms on average. The diversity of alternative 3′ UTRs also enabled inference and evaluation of both positively and negatively acting cis-regulatory elements. The 3′ UTR elements with the greatest implied influence were microRNA complementary sites, which were associated with repression of 32% and 4% at the stability and translational levels, respectively. Nonetheless, both the decay and translation rates were highly correlated for proximal and distal 3′ UTR isoforms from the same genes, implying that in 3T3 cells, alternative 3′ UTR sequences play a surprisingly small regulatory role compared to other mRNA regions.
Affiliation(s)
- Noah Spies
- Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
|
12
|
Chiang HR, Schoenfeld LW, Ruby JG, Auyeung VC, Spies N, Baek D, Johnston WK, Russ C, Luo S, Babiarz JE, Blelloch R, Schroth GP, Nusbaum C, Bartel DP. Mammalian microRNAs: experimental evaluation of novel and previously annotated genes. Genes Dev 2010; 24:992-1009. [PMID: 20413612 DOI: 10.1101/gad.1884710] [Citation(s) in RCA: 610] [Impact Index Per Article: 43.6] [Indexed: 12/21/2022]
Abstract
MicroRNAs (miRNAs) are small regulatory RNAs that derive from distinctive hairpin transcripts. To learn more about the miRNAs of mammals, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns. Analysis of these sequences confirmed 398 annotated miRNA genes and identified 108 novel miRNA genes. More than 150 previously annotated miRNAs and hundreds of candidates failed to yield sequenced RNAs with miRNA-like features. Ectopically expressing these previously proposed miRNA hairpins also did not yield small RNAs, whereas ectopically expressing the confirmed and newly identified hairpins usually did yield small RNAs with the classical miRNA features, including dependence on the Drosha endonuclease for processing. These experiments, which suggest that previous estimates of conserved mammalian miRNAs were inflated, provide a substantially revised list of confidently identified murine miRNAs from which to infer the general features of mammalian miRNAs. Our analyses also revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan precursor miRNA (pre-miRNA), consequential 5' heterogeneity, newly identified instances of miRNA editing, and evidence for widespread pre-miRNA uridylation reminiscent of miRNA regulation by Lin28.
Affiliation(s)
- H Rosaria Chiang
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
|
13
|
Spies N, Nielsen CB, Padgett RA, Burge CB. Biased chromatin signatures around polyadenylation sites and exons. Mol Cell 2009; 36:245-54. [PMID: 19854133 DOI: 10.1016/j.molcel.2009.10.008] [Citation(s) in RCA: 295] [Impact Index Per Article: 19.7] [Received: 08/12/2009] [Revised: 10/06/2009] [Accepted: 10/08/2009] [Indexed: 12/29/2022]
Abstract
Core RNA-processing reactions in eukaryotic cells occur cotranscriptionally in a chromatin context, but the relationship between chromatin structure and pre-mRNA processing is poorly understood. We observed strong nucleosome depletion around human polyadenylation sites (PAS) and nucleosome enrichment just downstream of PAS. In genes with multiple alternative PAS, higher downstream nucleosome affinity was associated with higher PAS usage, independently of known PAS motifs that function at the RNA level. Conversely, exons were associated with distinct peaks in nucleosome density. Exons flanked by long introns or weak splice sites exhibited stronger nucleosome enrichment, and incorporation of nucleosome density data improved splicing simulation accuracy. Certain histone modifications, including H3K36me3 and H3K27me2, were specifically enriched on exons, suggesting active marking of exon locations at the chromatin level. Together, these findings provide evidence for extensive functional connections between chromatin structure and RNA processing.
Affiliation(s)
- Noah Spies
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
|
14
|
Bühler M, Spies N, Bartel DP, Moazed D. TRAMP-mediated RNA surveillance prevents spurious entry of RNAs into the Schizosaccharomyces pombe siRNA pathway. Nat Struct Mol Biol 2008; 15:1015-23. [PMID: 18776903 PMCID: PMC3240669 DOI: 10.1038/nsmb.1481] [Citation(s) in RCA: 147] [Impact Index Per Article: 9.2] [Received: 04/07/2008] [Accepted: 07/24/2008] [Indexed: 12/27/2022]
Abstract
In the fission yeast Schizosaccharomyces pombe, the RNA interference (RNAi) machinery is required to generate small interfering RNAs (siRNAs) that mediate heterochromatic gene silencing. Efficient silencing also requires the TRAMP complex, which contains the noncanonical Cid14 poly(A) polymerase and targets aberrant RNAs for degradation. Here we use high-throughput sequencing to analyze Argonaute-associated small RNAs (sRNAs) in both the presence and absence of Cid14. Most sRNAs in fission yeast start with a 5′ uracil, and we argue these are loaded most efficiently into Argonaute. In wild-type cells most sRNAs match to repeated regions of the genome, whereas in cid14Δ cells the sRNA profile changes to include major new classes of sRNAs originating from ribosomal RNAs and a tRNA. Thus, Cid14 prevents certain abundant RNAs from becoming substrates for the RNAi machinery, thereby freeing the RNAi machinery to act on its proper targets.
Affiliation(s)
- Marc Bühler
- Department of Cell Biology, 240 Longwood Avenue, Harvard Medical School, Boston, Massachusetts 02115 USA
|
15
|
Kaulén R, Miranda E, Spies N. [National Biomedical Informations System (author's transl)]. Rev Med Chil 1978; 106:392-8. [PMID: 356151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
|