1
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Alvarez Jerez P, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat Methods 2023; 20:1483-1492. [PMID: 37710018 DOI: 10.1038/s41592-023-01993-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2023] [Accepted: 08/04/2023] [Indexed: 09/16/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.
Collapse
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Kimberley J Billingsley
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M Genner
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | | |
Collapse
|
2
|
Larivière D, Abueg L, Brajuka N, Gallardo-Alba C, Grüning B, Ko BJ, Ostrovsky A, Palmada-Flores M, Pickett BD, Rabbani K, Balacco JR, Chaisson M, Cheng H, Collins J, Denisova A, Fedrigo O, Gallo GR, Giani AM, Gooder GM, Jain N, Johnson C, Kim H, Lee C, Marques-Bonet T, O'Toole B, Rhie A, Secomandi S, Sozzoni M, Tilley T, Uliano-Silva M, van den Beek M, Waterhouse RM, Phillippy AM, Jarvis ED, Schatz MC, Nekrutenko A, Formenti G. Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy. bioRxiv 2023:2023.06.28.546576. [PMID: 37425881 PMCID: PMC10327048 DOI: 10.1101/2023.06.28.546576] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/11/2023]
Abstract
Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long-reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources and enhanced reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).
Collapse
Affiliation(s)
- Delphine Larivière
- Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA
| | - Linelle Abueg
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | | | - Cristóbal Gallardo-Alba
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Bjorn Grüning
- Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
| | - Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
| | - Alex Ostrovsky
- Departments of Biology and Computer Science, Johns Hopkins University, USA
| | - Marc Palmada-Flores
- Department of Medicine and Life Sciences (MELIS), Institut de Biologia Evolutiva, Universitat Pompeu Fabra-CSIC, Barcelona 08003, Spain
| | - Brandon D Pickett
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Keon Rabbani
- Department of Quantitative and Computational Biology, University of Southern California
| | | | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Joanna Collins
- Wellcome Sanger Institute, Cambridge CB10 1SA, United Kingdom
| | - Alexandra Denisova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Russia
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | | | | | | | - Nivesh Jain
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | - Cassidy Johnson
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | - Heebal Kim
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- eGnome, Inc, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, NY, 10065, USA
| | - Tomas Marques-Bonet
- Department of Medicine and Life Sciences (MELIS), Institut de Biologia Evolutiva, Universitat Pompeu Fabra-CSIC, Barcelona 08003, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona 08010, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona 08028, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès 08193, Spain
| | - Brian O'Toole
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Simona Secomandi
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Marcella Sozzoni
- University of Florence, Department of Biology, Via Madonna del Piano 6, Sesto Fiorentino (FI)
| | - Tatiana Tilley
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | | | - Marius van den Beek
- Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA
| | - Robert M Waterhouse
- Department of Ecology & Evolution and Swiss Institute of Bioinformatics, University of Lausanne, Lausanne, Switzerland
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| | - Michael C Schatz
- Departments of Biology and Computer Science, Johns Hopkins University, USA
| | - Anton Nekrutenko
- Dept. of Biochemistry and Molecular Biology, Pennsylvania State University, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, USA
| |
Collapse
|
3
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Jerez PA, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. bioRxiv 2023:2023.01.12.523790. [PMID: 36711673 PMCID: PMC9882142 DOI: 10.1101/2023.01.12.523790] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short-reads but to date have not been considered as feasible replacement at scale due to a combination of being too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short-reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with F1-score better than Illumina short-read sequencing. Small indel calling remains to be difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with F1-score comparable to state-of the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.
Collapse
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
| | - Kimberley J. Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M. Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Kishwar Shafin
- Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J. Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, Texas, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | | |
Collapse
|
4
|
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, Jarvis ED. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021; 592:737-746. [PMID: 33911273 PMCID: PMC8081667 DOI: 10.1038/s41586-021-03451-0] [Citation(s) in RCA: 617] [Impact Index Per Article: 205.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Accepted: 03/12/2021] [Indexed: 02/02/2023]
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species1-4. To address this issue, the international Genome 10K (G10K) consortium5,6 has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Collapse
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shane A McCarthy
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | - Joana Damas
- The Genome Center, University of California Davis, Davis, CA, USA
| | - Giulio Formenti
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Marcela Uliano-Silva
- Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
| | | | | | - Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
| | - Mark Chaisson
- University of Southern California, Los Angeles, CA, USA
| | - Gregory L Gedman
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Lindsey J Cantin
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Iliana Bista
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | | | - Bettina Haase
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | | | - Sylke Winkler
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- DRESDEN-concept Genome Center, Dresden, Germany
| | - Sadye Paez
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | | | - Sonja C Vernes
- Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- School of Biology, University of St Andrews, St Andrews, UK
| | - Tanya M Lama
- University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA
| | - Frank Grutzner
- School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Wesley C Warren
- Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | | | - Dave Burt
- UQ Genomics, University of Queensland, Brisbane, Queensland, Australia
| | - Julia M George
- Department of Biological Sciences, Clemson University, Clemson, SC, USA
| | - Matthew T Biegler
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - David Iorns
- The Genetic Rescue Foundation, Wellington, New Zealand
| | - Andrew Digby
- Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
| | - Daryl Eason
- Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
| | - Bruce Robertson
- Department of Zoology, University of Otago, Dunedin, New Zealand
| | | | - Mark Wilkinson
- Department of Life Sciences, Natural History Museum, London, UK
| | - George Turner
- School of Natural Sciences, Bangor University, Gwynedd, UK
| | - Axel Meyer
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Andreas F Kautt
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Paolo Franchini
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - H William Detrich
- Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA
| | - Hannes Svardal
- Department of Biology, University of Antwerp, Antwerp, Belgium
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | - Maximilian Wagner
- Institute of Biology, Karl-Franzens University of Graz, Graz, Austria
| | - Gavin J P Naylor
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
| | - Martin Pippel
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
| | - Milan Malinsky
- Wellcome Sanger Institute, Cambridge, UK
- Zoological Institute, University of Basel, Basel, Switzerland
| | | | | | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | | | | | | | | | | | - Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA
- Digital BioLogic, Ivanić-Grad, Croatia
| | | | - Zemin Ning
- Wellcome Sanger Institute, Cambridge, UK
| | | | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | | | - Richard E Green
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Santa Cruz, CA, USA
| | | | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Jay Ghurye
- Dovetail Genomics, Santa Cruz, CA, USA
- Department of Computer Science, University of Maryland College Park, College Park, MD, USA
| | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ying Sims
- Wellcome Sanger Institute, Cambridge, UK
| | | | | | | | | | | | | | - Dengfeng Guan
- Department of Genetics, University of Cambridge, Cambridge, UK
- School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China
| | - Sarah E London
- Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA
| | - David F Clayton
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
| | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Samantha R Friedrich
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Peter V Lovell
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Ekaterina Osipova
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
| | - Farooq O Al-Ajli
- Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia
- Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia
- Qatar Falcon Genome Project, Doha, Qatar
| | | | - Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- eGnome, Inc., Seoul, Republic of Korea
| | | | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute, Frankfurt, Germany
- Goethe-University, Faculty of Biosciences, Frankfurt, Germany
| | | | - Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Paul Medvedev
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
| | - Jinna Hoffman
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Karen Clark
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Fergal Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Kevin Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Woori Kwak
- eGnome, Inc., Seoul, Republic of Korea
- Hoonygen, Seoul, Korea
| | - Hiram Clawson
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Luis Nassar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Robert H S Kraus
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany
| | - Andrew J Crawford
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - M Thomas P Gilbert
- Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- University Museum, NTNU, Trondheim, Norway
| | - Guojie Zhang
- China National Genebank, BGI-Shenzhen, Shenzhen, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Byrappa Venkatesh
- Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore
| | - Robert W Murphy
- Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada
| | - Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Warren E Johnson
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
- The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA
- Walter Reed Army Institute of Research, Silver Spring, MD, USA
| | - Federica Di Palma
- Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Emma C Teeling
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
| | - Tandy Warnow
- Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | | | - Oliver A Ryder
- San Diego Zoo Global, Escondido, CA, USA
- Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA
| | - David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Stephen J O'Brien
- Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation
- Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA
| | | | - Harris A Lewin
- The Genome Center, University of California Davis, Davis, CA, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA, USA
| | | | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
- Center for Systems Biology, Dresden, Germany.
- Faculty of Computer Science, Technical University Dresden, Dresden, Germany.
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK.
- Wellcome Sanger Institute, Cambridge, UK.
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
| |
Collapse
|
5
|
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on different related topics on Structural Variation, Pan-genomes, and SARS-CoV-2 related research. The overarching focus was to assess the current status of the field and identify the remaining challenges. Furthermore, how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotations and filtering. Together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summary and methods are available publicly at https://github.com/collaborativebioinformatics provides valuable insights for both participants and the research community.
Collapse
Affiliation(s)
| | | | | | | | | | | | - Fawaz Dabbaghie
- Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | | | | | | | | | | | | | - Ahmed Arslan
- Stanford University School of Medicine, California, USA
| | | | | | | | - Daniel L Cameron
- Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
| | - Joyjit Daw
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | - Haowei Du
- Baylor College of Medicine, Houston, USA
| | | | | | | | | | | | | | | | | | | | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, USA
| | | | | | | | | | | | | | | | - Arda Soylev
- Konya Food and Agriculture University, Konya, Turkey
| | | | | | | | | | - Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | | | - Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, USA
| | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
6
|
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 03/04/2021] [Indexed: 11/08/2023] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on different related topics on Structural Variation, Pan-genomes, and SARS-CoV-2 related research. The overarching focus was to assess the current status of the field and identify the remaining challenges. Furthermore, how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotations and filtering. Together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summary and methods are available publicly at https://github.com/collaborativebioinformatics provides valuable insights for both participants and the research community.
Collapse
Affiliation(s)
| | | | | | | | | | | | - Fawaz Dabbaghie
- Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | | | | | | | | | | | | | - Ahmed Arslan
- Stanford University School of Medicine, California, USA
| | | | | | | | - Daniel L Cameron
- Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
| | - Joyjit Daw
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | - Haowei Du
- Baylor College of Medicine, Houston, USA
| | | | | | | | | | | | | | | | | | | | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, USA
| | | | | | | | | | | | | | | | - Arda Soylev
- Konya Food and Agriculture University, Konya, Turkey
| | | | | | | | | | - Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | | | - Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, USA
| | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
7
|
Fan X, Chaisson M, Nakhleh L, Chen K. HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies. Genome Res 2017; 27:793-800. [PMID: 28104618 PMCID: PMC5411774 DOI: 10.1101/gr.214767.116] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2016] [Accepted: 12/19/2016] [Indexed: 12/29/2022]
Abstract
Achieving complete, accurate, and cost-effective assembly of human genomes is of great importance for realizing the promise of precision medicine. The abundance of repeats and genetic variations in human genomes and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing and single-molecule sequencing technologies to accurately assemble and detect structural variants (SVs) in human genomes. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance the assembly of structurally altered regions in human genomes. We used data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878) to test our approach. The result showed that, compared with existing methods, our approach had a low false discovery rate and substantially improved the detection of many types of SVs, particularly novel large insertions, small indels (10–50 bp), and short tandem repeat expansions and contractions. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
Collapse
Affiliation(s)
- Xian Fan
- Department of Computer Science, Rice University, Houston, Texas 77005, USA.,Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
| | - Mark Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas 77005, USA
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
| |
Collapse
|
8
|
Grasso CS, Shinbrot E, Yu M, Liesersen M, Chaisson M, Chan A, Connolly C, Dai J, Du M, Fuchs C, Garraway L, Giannakis M, Harrison T, Hsu L, Huyghe J, Mu J, Ogino S, Pritchard C, Salipante S, Sun W, Zaidi SH, Zhao N, Grady W, Raphael B, Hudson T, Wheeler D, Peters U. Abstract 136: Refining the molecular profile of colorectal tumors to expand prevention and treatment opportunities. Cancer Res 2016. [DOI: 10.1158/1538-7445.am2016-136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Abstract
The completion of The Cancer Genome Atlas (TCGA) project for colorectal cancer (CRC) is ushering in a new phase of identifying treatment strategies tailored to the molecular profile of each person's tumor. Precision medicine approaches to cancer treatment rely on the identification of molecular profiles that can be used to identify effective therapies and can be used in a targeted sequencing setting to make treatment decisions.
The initial TCGA colorectal effort included 276 samples and focused on integrating data from exome sequencing with genome-wide DNA copy number alterations (CNAs), DNA methylation, and mRNA and microRNA expression. Since then a total of 626 samples have been completed with the potential to refine CRC subtypes, identify novel mutated pathways, and further functional understanding. Such a large data set presents opportunities to identify new recurrent drug targets and to stratify patients into groups that are predictive of treatment response. However, large data sets also present substantial challenges, since hand-curation becomes intractable, while computational tools can be overwhelmed by hypermutation and copy number changes.
Here we present a comprehensive molecular analysis of all 626 TCGA colorectal cancer samples, including exome sequencing, CNAs, DNA methylation, and mRNA expression. For each data type, we identified recurrently altered genes. Using MutSigCV on 525 samples yielded 27 and 87 significantly mutated genes in non-hypermutated and hypermutated samples, respectively, a substantial increase over the 15 and 17 somatically recurrently mutated genes identified using MutSig in non-hypermutated and hypermutated samples, respectively, in the previously published TCGA colorectal study. For example, PTEN, a known tumor suppressor, was not reported as significantly recurrently mutated in the initial TCGA non-hypermutated set; however, it was in the larger non-hypermutated set, demonstrating the power of a larger data set for assessing the significance and relative frequency of mutations in the context of known subtypes.
In addition, we integrated the somatic mutation data, copy number data, LOH data, and hyper-methylation data to identify genes, like MLH1, that are recurrently disrupted by different mechanisms. We also considered somatic mutations that are likely gain-of-function mutations based on nonrandom clustering; and we used recurrent indels to identify loss-of-function drivers in samples positive for microsatellite instability (MSI). We further classified each sample using the previously identified subtypes – BRAF+, KRAS+, APC+, CTNNB1+ (beta-catenin+), TGFBR2/SMAD4+, PTEN+ and PIK3CA+, and R-spondin fusion positive, as well as CpG Island Methylator Phenotype (CIMP) and MSI - in order to refine the relevant molecular signatures driving CRC etiology and thereby prevention and treatment paradigms.
Citation Format: Catherine S. Grasso, Eve Shinbrot, Ming Yu, Max Liesersen, Mark Chaisson, Andrew Chan, Charles Connolly, James Dai, Margaret Du, Charles Fuchs, Levi Garraway, Marios Giannakis, Tabitha Harrison, Li Hsu, Jeroen Huyghe, Jasmine Mu, Shuji Ogino, Colin Pritchard, Stephen Salipante, Wei Sun, Syed H. Zaidi, Ni Zhao, William Grady, Ben Raphael, Thomas Hudson, David Wheeler, Ulrike Peters. Refining the molecular profile of colorectal tumors to expand prevention and treatment opportunities. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 136.
Collapse
Affiliation(s)
- Catherine S. Grasso
- 1Dept. of Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Eve Shinbrot
- 2Baylor College of Medicine — Human Genome Sequencing Center, Houston, TX
| | - Ming Yu
- 3Dept. of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Max Liesersen
- 4Dept. of Computer Science, Brown University, Providence, RI
| | - Mark Chaisson
- 5Dept. of Genome Sciences, University of Washington, Seattle, WA
| | | | - Charles Connolly
- 1Dept. of Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - James Dai
- 7Dept. of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Margaret Du
- 1Dept. of Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA
| | | | | | | | - Tabitha Harrison
- 1Dept. of Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Li Hsu
- 7Dept. of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Jeroen Huyghe
- 1Dept. of Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Jasmine Mu
- 8Cancer Program, The Broad Institute, Boston, MA
| | | | - Colin Pritchard
- 9Dept. of Laboratory Medicine, The University of Washington, Seattle, WA
| | - Stephen Salipante
- 9Dept. of Laboratory Medicine, The University of Washington, Seattle, WA
| | - Wei Sun
- 7Dept. of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Syed H. Zaidi
- 10Ontario Institute of Cancer Research, Toronto, Ontario, Canada
| | - Ni Zhao
- 7Dept. of Biostatistics, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - William Grady
- 3Dept. of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Ben Raphael
- 4Dept. of Computer Science, Brown University, Providence, RI
| | - Thomas Hudson
- 10Ontario Institute of Cancer Research, Toronto, Ontario, Canada
| | - David Wheeler
- 11Dept. of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Ulrike Peters
- 1Dept. of Cancer Prevention, Fred Hutchinson Cancer Research Center, Seattle, WA
| |
Collapse
|
9
|
Zook JM, Catoe D, McDaniel J, Vang L, Spies N, Sidow A, Weng Z, Liu Y, Mason CE, Alexander N, Henaff E, McIntyre AB, Chandramohan D, Chen F, Jaeger E, Moshrefi A, Pham K, Stedman W, Liang T, Saghbini M, Dzakula Z, Hastie A, Cao H, Deikus G, Schadt E, Sebra R, Bashir A, Truty RM, Chang CC, Gulbahce N, Zhao K, Ghosh S, Hyland F, Fu Y, Chaisson M, Xiao C, Trow J, Sherry ST, Zaranek AW, Ball M, Bobe J, Estep P, Church GM, Marks P, Kyriazopoulou-Panagiotopoulou S, Zheng GX, Schnall-Levin M, Ordonez HS, Mudivarti PA, Giorda K, Sheng Y, Rypdal KB, Salit M. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 2016; 3:160025. [PMID: 27271295 PMCID: PMC4896128 DOI: 10.1038/sdata.2016.25] [Citation(s) in RCA: 385] [Impact Index Per Article: 48.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2015] [Accepted: 03/15/2016] [Indexed: 02/01/2023] Open
Abstract
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Collapse
Affiliation(s)
- Justin M. Zook
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - David Catoe
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - Jennifer McDaniel
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - Lindsay Vang
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
| | - Noah Spies
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Stanford University, Stanford, California 94305, USA
| | - Arend Sidow
- Stanford University, Stanford, California 94305, USA
| | - Ziming Weng
- Stanford University, Stanford, California 94305, USA
| | - Yuling Liu
- Stanford University, Stanford, California 94305, USA
| | - Christopher E. Mason
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
| | - Noah Alexander
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
| | - Elizabeth Henaff
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
| | - Alexa B.R. McIntyre
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
| | - Dhruva Chandramohan
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
| | - Feng Chen
- Illumina Mission Bay, San Francisco, California 94158, USA
| | - Erich Jaeger
- Illumina Mission Bay, San Francisco, California 94158, USA
| | - Ali Moshrefi
- Illumina Mission Bay, San Francisco, California 94158, USA
| | - Khoa Pham
- BioNano Genomics, San Diego, California 92121, USA
| | | | | | | | | | - Alex Hastie
- BioNano Genomics, San Diego, California 92121, USA
| | - Han Cao
- BioNano Genomics, San Diego, California 92121, USA
| | - Gintaras Deikus
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Eric Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | - Ali Bashir
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
| | | | | | - Natali Gulbahce
- Complete Genomics Inc., Mountain View, California 94043, USA
| | - Keyan Zhao
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
| | - Srinka Ghosh
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
| | - Fiona Hyland
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
| | - Yutao Fu
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
| | - Mark Chaisson
- Genome Sciences, University of Washington, Seattle, Washington 98105, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
| | - Jonathan Trow
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
| | - Stephen T. Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
| | | | | | - Jason Bobe
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- PersonalGenomes.org, Boston, Massachusetts 02115, USA
| | - Preston Estep
- PersonalGenomes.org, Boston, Massachusetts 02115, USA
- Harvard Medical School, Boston, Massachusetts 02115, USA
| | - George M. Church
- PersonalGenomes.org, Boston, Massachusetts 02115, USA
- Harvard Medical School, Boston, Massachusetts 02115, USA
| | | | | | | | | | | | | | | | - Ying Sheng
- Department of Medical Genetics, Oslo University Hospital, Kirkeveien 166, Bygg 25, Oslo 0450, Norway
| | | | - Marc Salit
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Stanford University, Stanford, California 94305, USA
| |
Collapse
|
10
|
Abstract
MOTIVATION Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS To align our large (>80 billon reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
Collapse
|
11
|
Medvedev P, Pham S, Chaisson M, Tesler G, Pevzner P. Paired de bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. J Comput Biol 2011; 18:1625-34. [PMID: 21999285 DOI: 10.1089/cmb.2011.0151] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
Collapse
Affiliation(s)
- Paul Medvedev
- Department of Computer Science and Engineering, University of California, San Diego, California, USA.
| | | | | | | | | |
Collapse
|
12
|
Medvedev P, Pham S, Chaisson M, Tesler G, Pevzner P. Paired de Bruijn Graphs: A Novel Approach for Incorporating Mate Pair Information into Genome Assemblers. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-20036-6_22] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
|
13
|
Abstract
MOTIVATION Current DNA sequencing technology produces reads of about 500-750 bp, with typical coverage under 10x. New sequencing technologies are emerging that produce shorter reads (length 80-200 bp) but allow one to generate significantly higher coverage (30x and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with current read technology but were not designed for assembly of short reads. RESULTS We analyze the limitations of assembling reads generated by these new technologies and present a routine for base-calling in reads prior to their assembly. We demonstrate that while it is feasible to assemble such short reads, the resulting contigs will require significant (if not prohibitive) finishing efforts. AVAILABILITY Available from the web at http://www.cse.ucsd.edu/groups/bioinformatics/software.html
Collapse
Affiliation(s)
- Mark Chaisson
- Bioinformatics Program, University of California San Diego, La Jolla, CA 92093, USA.
| | | | | |
Collapse
|
14
|
Pierce RH, Campbell JS, Stephenson AB, Franklin CC, Chaisson M, Poot M, Kavanagh TJ, Rabinovitch PS, Fausto N. Disruption of redox homeostasis in tumor necrosis factor-induced apoptosis in a murine hepatocyte cell line. Am J Pathol 2000; 157:221-36. [PMID: 10880392 PMCID: PMC1850198 DOI: 10.1016/s0002-9440(10)64533-6] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Tumor necrosis factor (TNF) is a mediator of the acute phase response in the liver and can initiate proliferation and cause cell death in hepatocytes. We investigated the mechanisms by which TNF causes apoptosis in hepatocytes focusing on the role of oxidative stress, antioxidant defenses, and mitochondrial damage. The studies were conducted in cultured AML12 cells, a line of differentiated murine hepatocytes. As is the case for hepatocytes in vivo, AML12 cells were not sensitive to cell death by TNF alone, but died by apoptosis when exposed to TNF and a small dose of actinomycin D (Act D). Morphological signs of apoptosis were not detected until 6 hours after the treatment and by 18 hours approximately 50% of the cells had died. Exposure of the cells to TNF+Act D did not block NFkappaB nuclear translocation, DNA binding, or its overall transactivation capacity. Induction of apoptosis was characterized by oxidative stress indicated by the loss of NAD(P)H and glutathione followed by mitochondrial damage that included loss of mitochondrial membrane potential, inner membrane structural damage, and mitochondrial condensation. These changes coincided with cytochrome C release and the activation of caspases-8, -9, and -3. TNF-induced apoptosis was dependent on glutathione levels. In cells with decreased levels of glutathione, TNF by itself in the absence of transcriptional blocking acted as an apoptotic agent. Conversely, the antioxidant alpha-lipoic acid, that protected against the loss of glutathione in cells exposed to TNF+Act D completely prevented mitochondrial damage, caspase activation, cytochrome C release, and apoptosis. The results demonstrate that apoptosis induced by TNF+Act D in AML12 cells involves oxidative injury and mitochondrial damage. As injury was regulated to a larger extent by the glutathione content of the cells, we suggest that the combination of TNF+Act D causes apoptosis because Act D blocks the transcription of genes required for antioxidant defenses.
Collapse
Affiliation(s)
- R H Pierce
- Department of Pathology, the University of Washington, Seattle 98195, USA
| | | | | | | | | | | | | | | | | |
Collapse
|
15
|
Kirillova I, Chaisson M, Fausto N. Tumor necrosis factor induces DNA replication in hepatic cells through nuclear factor kappaB activation. Cell Growth Differ 1999; 10:819-28. [PMID: 10616907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 02/15/2023]
Abstract
Tumor necrosis factor (TNF) signaling through TNF receptor 1 (TNFR1) with downstream participation of nuclear factor kappaB (NFkappaB), interleukin 6 (IL-6), and signal transducers and activators of transcription 3 (STAT3) is required for initiation of liver regeneration. It is not known whether the proliferative effect of TNF on hepatocytes is direct or requires the participation of Kupffer cells, the liver resident macrophages. Moreover, it has not been determined whether NFkappaB activation is an essential step in TNF-induced proliferation. To answer these questions, we conducted studies in LE6 cells, a rat liver epithelial cell line with hepatocyte progenitor capacity. We report that TNF induces DNA replication in growth-arrested LE6 cells and that its effect involves the activation of NFkappaB and STAT3 and an increase in c-myc and IL-6 mRNAs. All of these effects, which mimic the events that initiate liver regeneration in vivo, are blocked if NFKB activation is inhibited by expression of a dominant-inhibitor IkappaBalpha mutant (deltaN-IkappaBalpha). Although NFkappaB blockage by deltaN-IkappaBalpha causes caspase activation and massive death of cells stimulated by TNF, inhibition of NFkappaB and STAT3 binding by the serine protease inhibitor N-tosyl-L-phenylalanine chloromethyl ketone results in G0-G1 cell cycle arrest without death. We conclude that NFkappaB is an essential component of the TNF proliferative pathway and that TNF-induced changes in IL-6 mRNA, STAT3, and c-myc mRNA are dependent on NFkappaB activation. Blockage of NFkappaB inhibits TNF-induced proliferation but does not necessarily cause cell death.
Collapse
Affiliation(s)
- I Kirillova
- Department of Pathology, University of Washington School of Medicine, Seattle 98195-7470, USA
| | | | | |
Collapse
|
16
|
|
17
|
Chaisson M. Psych nurse helps others see how Jews view death. Am Nurse 1983; 15:13. [PMID: 6549729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 04/05/2023]
|
18
|
|