1
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W, Olson ND, Guarracino A, Li Q, Miller AL, Goffena J, Anderson Z, Storz SHR, Ward SA, Sinha M, Gonzaga-Jauregui C, Clarke WE, Basile AO, Corvelo A, Reeves C, Helland A, Musunuri RL, Revsine M, Patterson KE, Paschal CR, Zakarian C, Goodwin S, Jensen TD, Robb E, McCombie WR, Sedlazeck FJ, Zook JM, Montgomery SB, Garrison E, Kolmogorov M, Schatz MC, McLaughlin RN, Dashnow H, Zody MC, Loose M, Jain M, Eichler EE, Miller DE. Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation. medRxiv 2024:2024.03.05.24303792. [PMID: 38496498 PMCID: PMC10942501 DOI: 10.1101/2024.03.05.24303792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
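The read N50 quoted above (54 kbp) is a length-weighted summary: it is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The contig N50 described in the Ebert et al. abstract below ("minimum contig length needed to cover 50% of the genome") uses the same definition applied to contigs. The following is a minimal Python sketch of the computation; the function name `n50` is hypothetical, and this is not the consortium's own pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (reads or contigs).

    N50 is the largest length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

For lengths 2 through 10 (54 bases in total), the running sum first reaches half the total (27) at the sequence of length 8, so the N50 is 8. Because N50 weights by bases rather than by read count, a few very long reads can raise it well above the median read length.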
Affiliation(s)
- Jonas A. Gustafson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Sophia B. Gibson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Nikhita Damaraju
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
- Miranda PG Zalusky
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- David Twesigomwe
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Lei Yang
- Pacific Northwest Research Institute, Seattle, WA, USA
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Human Technopole, Milan, Italy
- Qiuhui Li
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Angela L. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Joy Goffena
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Zachery Anderson
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Sophie HR Storz
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Sydney A. Ward
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Maisha Sinha
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Claudia Gonzaga-Jauregui
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México
- Wayne E. Clarke
- New York Genome Center, New York, NY, USA
- Outlier Informatics Inc., Saskatoon, SK, Canada
- Mahler Revsine
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Cate R. Paschal
- Department of Laboratories, Seattle Children’s Hospital, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Christina Zakarian
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Sara Goodwin
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Esther Robb
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Mikhail Kolmogorov
- Cancer Data Science Laboratory, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Richard N. McLaughlin
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Pacific Northwest Research Institute, Seattle, WA, USA
- Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Matt Loose
- Deep Seq, School of Life Sciences, University of Nottingham, Nottingham, England
- Miten Jain
- Department of Bioengineering, Department of Physics, Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Danny E. Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA
2
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, Zody MC. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022; 185:3426-3440.e19. [PMID: 36055201 PMCID: PMC9439720 DOI: 10.1016/j.cell.2022.08.004] [Citation(s) in RCA: 199] [Impact Index Per Article: 99.5] [Received: 11/12/2021] [Revised: 06/21/2022] [Accepted: 08/03/2022] [Indexed: 01/05/2023]
Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Affiliation(s)
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Haley J. Abel
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Wayne E. Clarke
- New York Genome Center, New York, NY 10013, USA
- Outlier Informatics Inc., Saskatoon, SK S7H 1L4, Canada
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ira M. Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT 06510, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Michael E. Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Michael C. Zody (corresponding author)
- New York Genome Center, New York, NY 10013, USA
3
Wagner J, Olson ND, Harris L, McDaniel J, Cheng H, Fungtammasan A, Hwang YC, Gupta R, Wenger AM, Rowell WJ, Khan ZM, Farek J, Zhu Y, Pisupati A, Mahmoud M, Xiao C, Yoo B, Sahraeian SME, Miller DE, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Narzisi G, Evani US, Clarke WE, Lee J, Mason CE, Lincoln SE, Miga KH, Ebbert MTW, Shumate A, Li H, Chin CS, Zook JM, Sedlazeck FJ. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat Biotechnol 2022; 40:672-680. [PMID: 35132260 PMCID: PMC9117392 DOI: 10.1038/s41587-021-01158-1] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 06/07/2021] [Accepted: 11/10/2021] [Indexed: 11/09/2022]
Abstract
The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome references GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.
Affiliation(s)
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Lindsay Harris
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Haoyu Cheng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Ziad M Khan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Jesse Farek
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Aishwarya Pisupati
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Byunggil Yoo
- Genomic Medicine Center, Children's Mercy Kansas City, Kansas City, MO, USA
- Danny E Miller
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- José M Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Luis A Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Joyce Lee
- Bionano Genomics, San Diego, CA, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mark T W Ebbert
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Department of Internal Medicine, Division of Biomedical Informatics, University of Kentucky, Lexington, KY, USA
- Department of Neuroscience, University of Kentucky, Lexington, KY, USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Heng Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
4
Ebler J, Ebert P, Clarke WE, Rausch T, Audano PA, Houwaart T, Mao Y, Korbel JO, Eichler EE, Zody MC, Dilthey AT, Marschall T. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat Genet 2022; 54:518-525. [PMID: 35410384 PMCID: PMC9005351 DOI: 10.1038/s41588-022-01043-w] [Citation(s) in RCA: 62] [Impact Index Per Article: 31.0] [Received: 02/15/2021] [Accepted: 03/03/2022] [Indexed: 12/30/2022]
Abstract
Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster than alignment-based workflows.
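Per the abstract, PanGenie works from k-mer counts of short reads rather than from read alignments. The sketch below illustrates only what such counting means: every length-k substring of every read is tallied, optionally merging each k-mer with its reverse complement (a "canonical" k-mer) because a read may derive from either DNA strand. It is a hypothetical Python illustration, not PanGenie's code; the function names are invented for this example, production counters use far more memory-efficient structures, and the genotyping model that compares these counts against the pangenome haplotypes is not shown.

```python
from collections import Counter

# Complement table for uppercase DNA (assumes reads contain only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k: int, canonical: bool = False) -> Counter:
    """Tally every length-k substring across all reads.

    If canonical is True, a k-mer and its reverse complement are counted
    together under whichever of the two sorts first lexicographically.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 k-mers (none if n < k).
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if canonical:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return counts


# Usage: two overlapping reads, k = 3.
print(count_kmers(["ACGTAC", "CGTA"], 3))
```

With `canonical=True`, the read ACGTAC yields only two distinct keys, ACG and GTA, each counted twice, because CGT and TAC are the reverse complements of ACG and GTA.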
Affiliation(s)
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Rausch
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- European Molecular Biology Laboratory, GeneCore, Heidelberg, Germany
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Torsten Houwaart
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Alexander T Dilthey
- Institute of Medical Microbiology and Hospital Hygiene, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany
- Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases, University of Cologne, Cologne, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
5
Foox J, Tighe SW, Nicolet CM, Zook JM, Byrska-Bishop M, Clarke WE, Khayat MM, Mahmoud M, Laaguiby PK, Herbert ZT, Warner D, Grills GS, Jen J, Levy S, Xiang J, Alonso A, Zhao X, Zhang W, Teng F, Zhao Y, Lu H, Schroth GP, Narzisi G, Farmerie W, Sedlazeck FJ, Baldwin DA, Mason CE. Author Correction: Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study. Nat Biotechnol 2021; 39:1466. [PMID: 34635840 DOI: 10.1038/s41587-021-01122-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Affiliation(s)
- Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Scott W Tighe
- University of Vermont Cancer Center, Vermont Integrative Genomics Resource, University of Vermont, Burlington, VT, USA
- Charles M Nicolet
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael M Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Phoebe K Laaguiby
- University of Vermont Cancer Center, Vermont Integrative Genomics Resource, University of Vermont, Burlington, VT, USA
- Zachary T Herbert
- Molecular Biology Core Facilities, Dana-Farber Cancer Institute, Boston, MA, USA
- Derek Warner
- DNA Sequencing Core, University of Utah, Salt Lake City, UT, USA
- George S Grills
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, USA
- Jin Jen
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Shawn Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Jenny Xiang
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Alicia Alonso
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Xia Zhao
- BGI-Shenzhen, Shenzhen, China
- MGI, BGI-Shenzhen, Shenzhen, China
- Yonggang Zhao
- BGI-Shenzhen, Shenzhen, China
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
- Haorong Lu
- BGI-Shenzhen, Shenzhen, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen, China
- William Farmerie
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Don A Baldwin
- Department of Pathology, Fox Chase Cancer Center, Philadelphia, PA, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The Feil Family Brain and Mind Research Institute, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
6
Foox J, Tighe SW, Nicolet CM, Zook JM, Byrska-Bishop M, Clarke WE, Khayat MM, Mahmoud M, Laaguiby PK, Herbert ZT, Warner D, Grills GS, Jen J, Levy S, Xiang J, Alonso A, Zhao X, Zhang W, Teng F, Zhao Y, Lu H, Schroth GP, Narzisi G, Farmerie W, Sedlazeck FJ, Baldwin DA, Mason CE. Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study. Nat Biotechnol 2021; 39:1129-1140. [PMID: 34504351 PMCID: PMC8985210 DOI: 10.1038/s41587-021-01049-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 07/31/2020] [Accepted: 08/05/2021] [Indexed: 02/08/2023]
Abstract
Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.
Affiliation(s)
- Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- Scott W. Tighe
- University of Vermont Cancer Center, Vermont Integrative Genomics Resource, University of Vermont, Burlington, VT, USA
- Charles M. Nicolet
- Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Michael M. Khayat
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Phoebe K. Laaguiby
- University of Vermont Cancer Center, Vermont Integrative Genomics Resource, University of Vermont, Burlington, VT, USA
- Zachary T. Herbert
- Molecular Biology Core Facilities, Dana-Farber Cancer Institute, Boston, MA, USA
- Derek Warner
- DNA Sequencing Core, University of Utah, Salt Lake City, UT, USA
- George S. Grills
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL, USA
- Jin Jen
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
- Shawn Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Jenny Xiang
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Alicia Alonso
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Xia Zhao
- BGI-Shenzhen, Shenzhen, China
- MGI, BGI-Shenzhen, Shenzhen, China
- Yonggang Zhao
- BGI-Shenzhen, Shenzhen, China
- Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
- Haorong Lu
- BGI-Shenzhen, Shenzhen, China
- Guangdong Provincial Key Laboratory of Genome Read and Write, Shenzhen, China
- William Farmerie
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, FL, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Correspondence and requests for materials should be addressed to Fritz J. Sedlazeck, Don A. Baldwin or Christopher E. Mason.
- Don A. Baldwin
- Department of Pathology, Fox Chase Cancer Center, Philadelphia, PA, USA
- Christopher E. Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The Feil Family Brain and Mind Research Institute, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
7
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 270] [Impact Index Per Article: 90.0] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
- Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
8
Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, Taliun SAG, Corvelo A, Gogarten SM, Kang HM, Pitsillides AN, LeFaive J, Lee SB, Tian X, Browning BL, Das S, Emde AK, Clarke WE, Loesch DP, Shetty AC, Blackwell TW, Smith AV, Wong Q, Liu X, Conomos MP, Bobo DM, Aguet F, Albert C, Alonso A, Ardlie KG, Arking DE, Aslibekyan S, Auer PL, Barnard J, Barr RG, Barwick L, Becker LC, Beer RL, Benjamin EJ, Bielak LF, Blangero J, Boehnke M, Bowden DW, Brody JA, Burchard EG, Cade BE, Casella JF, Chalazan B, Chasman DI, Chen YDI, Cho MH, Choi SH, Chung MK, Clish CB, Correa A, Curran JE, Custer B, Darbar D, Daya M, de Andrade M, DeMeo DL, Dutcher SK, Ellinor PT, Emery LS, Eng C, Fatkin D, Fingerlin T, Forer L, Fornage M, Franceschini N, Fuchsberger C, Fullerton SM, Germer S, Gladwin MT, Gottlieb DJ, Guo X, Hall ME, He J, Heard-Costa NL, Heckbert SR, Irvin MR, Johnsen JM, Johnson AD, Kaplan R, Kardia SLR, Kelly T, Kelly S, Kenny EE, Kiel DP, Klemmer R, Konkle BA, Kooperberg C, Köttgen A, Lange LA, Lasky-Su J, Levy D, Lin X, Lin KH, Liu C, Loos RJF, Garman L, Gerszten R, Lubitz SA, Lunetta KL, Mak ACY, Manichaikul A, Manning AK, Mathias RA, McManus DD, McGarvey ST, Meigs JB, Meyers DA, Mikulla JL, Minear MA, Mitchell BD, Mohanty S, Montasser ME, Montgomery C, Morrison AC, Murabito JM, Natale A, Natarajan P, Nelson SC, North KE, O'Connell JR, Palmer ND, Pankratz N, Peloso GM, Peyser PA, Pleiness J, Post WS, Psaty BM, Rao DC, Redline S, Reiner AP, Roden D, Rotter JI, Ruczinski I, Sarnowski C, Schoenherr S, Schwartz DA, Seo JS, Seshadri S, Sheehan VA, Sheu WH, Shoemaker MB, Smith NL, Smith JA, Sotoodehnia N, Stilp AM, Tang W, Taylor KD, Telen M, Thornton TA, Tracy RP, Van Den Berg DJ, Vasan RS, Viaud-Martinez KA, Vrieze S, Weeks DE, Weir BS, Weiss ST, Weng LC, Willer CJ, Zhang Y, Zhao X, Arnett DK, Ashley-Koch AE, Barnes KC, Boerwinkle E, Gabriel S, Gibbs R, Rice KM, Rich SS, Silverman EK, Qasba P, Gan W, Papanicolaou GJ, Nickerson DA, Browning SR, Zody MC, Zöllner S, 
Wilson JG, Cupples LA, Laurie CC, Jaquish CE, Hernandez RD, O'Connor TD, Abecasis GR. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021; 590:290-299. [PMID: 33568819 PMCID: PMC7875770 DOI: 10.1038/s41586-021-03205-y] [Citation(s) in RCA: 801] [Impact Index Per Article: 267.0] [Received: 03/06/2019] [Accepted: 01/07/2021] [Indexed: 02/08/2023]
Abstract
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Affiliation(s)
- Daniel Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Daniel N Harris
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Michael D Kessler
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Jedidiah Carlson
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Zachary A Szpiech
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA, USA
- Raul Torres
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Sarah A Gagliano Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Hyun Min Kang
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Jonathon LeFaive
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Seung-Been Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiaowen Tian
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Brian L Browning
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Sayantan Das
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Douglas P Loesch
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Amol C Shetty
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Thomas W Blackwell
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Albert V Smith
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Quenna Wong
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, USA
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Dean M Bobo
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- François Aguet
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alvaro Alonso
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Dan E Arking
- McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Paul L Auer
- Zilber School of Public Health, University of Wisconsin Milwaukee, Milwaukee, WI, USA
- R Graham Barr
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- Department of Epidemiology, Columbia University Medical Center, New York, NY, USA
- Rebecca L Beer
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Emelia J Benjamin
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Lawrence F Bielak
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- John Blangero
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Michael Boehnke
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Donald W Bowden
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Jennifer A Brody
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Esteban G Burchard
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Brian E Cade
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- James F Casella
- Department of Pediatrics, Johns Hopkins University, Baltimore, MD, USA
- Division of Pediatric Hematology, Johns Hopkins University, Baltimore, MD, USA
- Brandon Chalazan
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Daniel I Chasman
- Division of Preventive Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
- Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Mina K Chung
- Department of Cardiovascular Medicine, Heart & Vascular Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Clary B Clish
- Metabolomics Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Adolfo Correa
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Department of Pediatrics, University of Mississippi Medical Center, Jackson, MS, USA
- Department of Population Health Science, University of Mississippi Medical Center, Jackson, MS, USA
- Joanne E Curran
- Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Brian Custer
- Vitalant Research Institute, San Francisco, CA, USA
- Department of Laboratory Medicine, University of California, San Francisco, San Francisco, CA, USA
- Dawood Darbar
- Department of Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Michelle Daya
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Dawn L DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Susan K Dutcher
- McDonnell Genome Institute, Washington University, St Louis, MO, USA
- Department of Genetics, Washington University, St Louis, MO, USA
- Patrick T Ellinor
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Leslie S Emery
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Celeste Eng
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Diane Fatkin
- Molecular Cardiology Division, Victor Chang Cardiac Research Institute, Darlinghurst, New South Wales, Australia
- Faculty of Medicine, University of New South Wales, Kensington, New South Wales, Australia
- Cardiology Department, St Vincent's Hospital, Darlinghurst, New South Wales, Australia
- Tasha Fingerlin
- National Jewish Health, Center for Genes, Environment and Health, Denver, CO, USA
- Lukas Forer
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Myriam Fornage
- Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX, USA
- Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Christian Fuchsberger
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Institute for Biomedicine, Eurac Research, Bolzano, Italy
- Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
- Mark T Gladwin
- Pittsburgh Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Daniel J Gottlieb
- VA Boston Healthcare System, Boston, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
- Michael E Hall
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Jiang He
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- Tulane University Translational Science Institute, Tulane University, New Orleans, LA, USA
- Nancy L Heard-Costa
- Framingham Heart Study, Framingham, MA, USA
- Department of Neurology, Boston University School of Medicine, Boston, MA, USA
- Susan R Heckbert
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Jill M Johnsen
- Department of Medicine, University of Washington, Seattle, WA, USA
- Bloodworks Northwest Research Institute, Seattle, WA, USA
- Andrew D Johnson
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Framingham, MA, USA
- Robert Kaplan
- Albert Einstein College of Medicine, New York, NY, USA
- Sharon L R Kardia
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Tanika Kelly
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- Shannon Kelly
- Department of Epidemiology, Vitalant Research Institute, San Francisco, CA, USA
- Department of Pediatrics, UCSF Benioff Children's Hospital, Oakland, CA, USA
- Division of Pediatric Hematology, UCSF Benioff Children's Hospital, Oakland, CA, USA
- Eimear E Kenny
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Douglas P Kiel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Robert Klemmer
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Barbara A Konkle
- Department of Medicine, University of Washington, Seattle, WA, USA
- Bloodworks Northwest Research Institute, Seattle, WA, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Anna Köttgen
- Department of Epidemiology, Johns Hopkins University, Baltimore, MD, USA
- Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Leslie A Lange
- Department of Medicine, University of Colorado at Denver, Aurora, CO, USA
- Jessica Lasky-Su
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brigham and Women's Hospital, Boston, MA, USA
- Daniel Levy
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Framingham, MA, USA
- Xihong Lin
- Biostatistics and Statistics, Harvard University, Boston, MA, USA
- Keng-Han Lin
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lori Garman
- Department of Genes and Human Disease, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Angel C Y Mak
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Alisa K Manning
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA, USA
- Metabolism Program, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Rasika A Mathias
- Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- David D McManus
- Cardiovascular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Stephen T McGarvey
- International Health Institute, Brown University, Providence, RI, USA
- Department of Epidemiology, Brown University, Providence, RI, USA
- Department of Anthropology, Brown University, Providence, RI, USA
- James B Meigs
- Division of General Internal Medicine, Massachusetts General Hospital, Harvard Medical School, The Broad Institute of MIT and Harvard, Boston, MA, USA
- Julie L Mikulla
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Mollie A Minear
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Braxton D Mitchell
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, MD, USA
- Sanghamitra Mohanty
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX, USA
- Department of Internal Medicine, Dell Medical School, Austin, TX, USA
- May E Montasser
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Courtney Montgomery
- Department of Genes and Human Disease, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Joanne M Murabito
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Andrea Natale
- Texas Cardiac Arrhythmia Institute, St David's Medical Center, Austin, TX, USA
- Pradeep Natarajan
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Sarah C Nelson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Jeffrey R O'Connell
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Nicholette D Palmer
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Nathan Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Patricia A Peyser
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Jacob Pleiness
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Wendy S Post
- Division of Cardiology, Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
- Bruce M Psaty
- Department of Medicine, University of Washington, Seattle, WA, USA
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- D C Rao
- Division of Biostatistics, Washington University in St Louis, St Louis, MO, USA
- Susan Redline
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Dan Roden
- Vanderbilt University Medical Center, Nashville, TN, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Chloé Sarnowski
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Sebastian Schoenherr
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Jeong-Sun Seo
- Precision Medicine Center, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Macrogen Inc, Seoul, Republic of Korea
- Gong Wu Genomic Medicine Institute, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Sudha Seshadri
- Framingham Heart Study, Framingham, MA, USA
- Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center at San Antonio, San Antonio, TX, USA
- Vivien A Sheehan
- Department of Pediatrics, Emory University School of Medicine, Atlanta, GA, USA
- Aflac Cancer and Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, GA, USA
- Wayne H Sheu
- Taichung Veterans General Hospital Taiwan, Taichung City, Taiwan
- Nicholas L Smith
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Office of Research and Development, Seattle, WA, USA
- Jennifer A Smith
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
- Nona Sotoodehnia
- Cardiovascular Health Research Unit, University of Washington, Seattle, WA, USA
- Adrienne M Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Weihong Tang
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation, Harbor-UCLA Medical Center, Torrance, CA, USA
- Russell P Tracy
- Department of Pathology & Laboratory Medicine, University of Vermont Larner College of Medicine, Burlington, VT, USA
- David J Van Den Berg
- Center for Genetic Epidemiology, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Ramachandran S Vasan
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Scott Vrieze
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Daniel E Weeks
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Scott T Weiss
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Brigham and Women's Hospital, Boston, MA, USA
- Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Internal Medicine-Cardiology, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Yingze Zhang
- Pittsburgh Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, USA
- Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Xutong Zhao
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Donna K Arnett
- Department of Epidemiology, University of Kentucky, Lexington, KY, USA
- Allison E Ashley-Koch
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC, USA
- Kathleen C Barnes
- Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Eric Boerwinkle
- University of Texas Health Science Center at Houston, Houston, TX, USA
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Stacey Gabriel
- The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Richard Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Pankaj Qasba
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Weiniu Gan
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- George J Papanicolaou
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Deborah A Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Northwest Genomics Center, Seattle, WA, USA
- Brotman Baty Institute, Seattle, WA, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
- James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Cashell E Jaquish
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Program in Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Gonçalo R Abecasis
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA
9
Khedikar Y, Clarke WE, Chen L, Higgins EE, Kagale S, Koh CS, Bennett R, Parkin IAP. Narrow genetic base shapes population structure and linkage disequilibrium in an industrial oilseed crop, Brassica carinata A. Braun. Sci Rep 2020; 10:12629. [PMID: 32724070 PMCID: PMC7387349 DOI: 10.1038/s41598-020-69255-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/26/2020] [Accepted: 07/09/2020] [Indexed: 12/16/2022]
Abstract
Ethiopian mustard (Brassica carinata A. Braun) is an emerging sustainable source of vegetable oil, in particular for the biofuel industry. The present study exploited genome assemblies of the Brassica diploids, Brassica nigra and Brassica oleracea, to discover over 10,000 genome-wide SNPs using genotype by sequencing of 620 B. carinata lines. The analyses revealed a SNP frequency of one every 91.7 kb, a heterozygosity level of 0.30, nucleotide diversity levels of 1.31 × 10⁻⁵, and the first five principal components captured only 13% of the molecular variation, indicating low levels of genetic diversity among the B. carinata collection. Genome bias was observed, with greater SNP density found on the B subgenome. The 620 lines clustered into two distinct sub-populations (SP1 and SP2), with the majority of accessions (88%) clustered in SP1 with those from Ethiopia, the presumed centre of origin. SP2 was distinguished by a collection of breeding lines, implicating targeted selection in creating population structure. Two selective sweep regions on B3 and B8 were detected, which harbour genes involved in fatty acid and aliphatic glucosinolate biosynthesis, respectively. The assessment of genetic diversity, population structure, and linkage disequilibrium (LD) in the global B. carinata collection provides critical information to assist future crop improvement.
Affiliation(s)
- Yogendra Khedikar
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Wayne E Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Lifeng Chen
- Agrisoma Biosciences Inc., 110 Gymnasium Place, Saskatoon, SK, Canada
- Erin E Higgins
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Sateesh Kagale
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada
- Chu Shin Koh
- Global Institute of Food Security, Saskatoon, SK, Canada
- Rick Bennett
- Agrisoma Biosciences Inc., 110 Gymnasium Place, Saskatoon, SK, Canada
- Isobel A P Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
10
Chapman LM, Spies N, Pai P, Lim CS, Carroll A, Narzisi G, Watson CM, Proukakis C, Clarke WE, Nariai N, Dawson E, Jones G, Blankenberg D, Brueffer C, Xiao C, Kolora SRR, Alexander N, Wolujewicz P, Ahmed AE, Smith G, Shehreen S, Wenger AM, Salit M, Zook JM. A crowdsourced set of curated structural variants for the human genome. PLoS Comput Biol 2020; 16:e1007933. [PMID: 32559231 PMCID: PMC7329145 DOI: 10.1371/journal.pcbi.1007933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/19/2019] [Revised: 07/01/2020] [Accepted: 05/07/2020] [Indexed: 11/19/2022]
Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
Affiliation(s)
- Lesley M. Chapman
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- Noah Spies
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
- Departments of Genetics and Pathology, Stanford University, Stanford, California, United States of America
- Patrick Pai
- University of Maryland - College Park, College Park, Maryland, United States of America
- Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
- Andrew Carroll
- DNAnexus Inc, Mountain View, California, United States of America
- Giuseppe Narzisi
- New York Genome Center, New York, New York, United States of America
- Christopher M. Watson
- School of Medicine, University of Leeds, Saint James's University Hospital, Leeds, Leeds, United Kingdom
- Yorkshire Regional Genetics Service, The Leeds Teaching Hospitals NHS Trust, Saint James's University Hospital, Leeds, United Kingdom
- Christos Proukakis
- University College London, Institute of Neurology, London, United Kingdom
- Wayne E. Clarke
- New York Genome Center, New York, New York, United States of America
- Naoki Nariai
- Illumina, Inc., San Diego, California, United States of America
- Eric Dawson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, United States of America
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Garan Jones
- University of Exeter Medical School, Epidemiology and Public Health Group, Barrack Road, Exeter, Devon, United Kingdom
- Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Christian Brueffer
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Sree Rohit Raj Kolora
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
- Noah Alexander
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, United States of America
- Paul Wolujewicz
- Weill Cornell, Belfer Research Building, New York, New York, United States of America
- Azza E. Ahmed
- Center for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum and Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan
- Graeme Smith
- Guy's Hospital and St Thomas's NHS Foundation Trust, Great Maze Pond, London, United Kingdom
- Saadlee Shehreen
- Department of Genetic Engineering & Biotechnology, University of Dhaka, Bangladesh
- Aaron M. Wenger
- Pacific Biosciences, Menlo Park, California, United States of America
- Marc Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
- Justin M. Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
11
Llamas B, Narzisi G, Schneider V, Audano PA, Biederstedt E, Blauvelt L, Bradbury P, Chang X, Chin CS, Fungtammasan A, Clarke WE, Cleary A, Ebler J, Eizenga J, Sibbesen JA, Markello CJ, Garrison E, Garg S, Hickey G, Lazo GR, Lin MF, Mahmoud M, Marschall T, Minkin I, Monlong J, Musunuri RL, Sagayaradj S, Novak AM, Rautiainen M, Regier A, Sedlazeck FJ, Siren J, Souilmi Y, Wagner J, Wrightsman T, Yokoyama TT, Zeng Q, Zook JM, Paten B, Busby B. A strategy for building and using a human reference pangenome. F1000Res 2019; 8:1751. [PMID: 34386196 PMCID: PMC8350888 DOI: 10.12688/f1000research.19630.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 07/23/2021] [Indexed: 01/27/2024]
Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.
Affiliation(s)
- Bastien Llamas
- Australian Centre for Ancient DNA, School of Biological Sciences, Environment Institute, The University of Adelaide, Adelaide, South Australia, 5005, Australia
- Valerie Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Peter A. Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Evan Biederstedt
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02215, USA
- Lon Blauvelt
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Peter Bradbury
- Robert W. Holley Center, USDA-ARS, Ithaca, NY, 14853, USA
- Xian Chang
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Alan Cleary
- National Center for Genome Resources, Santa Fe, NM, 87505, USA
- Jana Ebler
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Jordan Eizenga
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Jonas A. Sibbesen
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Charles J. Markello
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Erik Garrison
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Shilpa Garg
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Gerard R. Lazo
- Western Regional Research Center, USDA-ARS, Albany, CA, 94710-1105, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Ilia Minkin
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Jean Monlong
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Sagayamary Sagayaradj
- Genome Center, University of California, Davis, Davis, CA, USA
- BASF, West Sacramento, CA, USA
- Adam M. Novak
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Allison Regier
- McDonnell Genome Institute, Washington University in St Louis, St Louis, MO, 63108, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Jouni Siren
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Yassine Souilmi
- Australian Centre for Ancient DNA, School of Biological Sciences, Environment Institute, The University of Adelaide, Adelaide, South Australia, 5005, Australia
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Travis Wrightsman
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853, USA
- Toshiyuki T. Yokoyama
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, MA, 01581, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Benedict Paten
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Ben Busby
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
12
Llamas B, Narzisi G, Schneider V, Audano PA, Biederstedt E, Blauvelt L, Bradbury P, Chang X, Chin CS, Fungtammasan A, Clarke WE, Cleary A, Ebler J, Eizenga J, Sibbesen JA, Markello CJ, Garrison E, Garg S, Hickey G, Lazo GR, Lin MF, Mahmoud M, Marschall T, Minkin I, Monlong J, Musunuri RL, Sagayaradj S, Novak AM, Rautiainen M, Regier A, Sedlazeck FJ, Siren J, Souilmi Y, Wagner J, Wrightsman T, Yokoyama TT, Zeng Q, Zook JM, Paten B, Busby B. A strategy for building and using a human reference pangenome. F1000Res 2019; 8:1751. [PMID: 34386196 PMCID: PMC8350888 DOI: 10.12688/f1000research.19630.2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Accepted: 07/23/2021] [Indexed: 11/20/2022]
Abstract
In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.
Affiliation(s)
- Bastien Llamas
- Australian Centre for Ancient DNA, School of Biological Sciences, Environment Institute, The University of Adelaide, Adelaide, South Australia, 5005, Australia
- Valerie Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Evan Biederstedt
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02215, USA
- Lon Blauvelt
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Peter Bradbury
- Robert W. Holley Center, USDA-ARS, Ithaca, NY, 14853, USA
- Xian Chang
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Alan Cleary
- National Center for Genome Resources, Santa Fe, NM, 87505, USA
- Jana Ebler
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Jordan Eizenga
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Jonas A Sibbesen
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Charles J Markello
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Erik Garrison
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Shilpa Garg
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
- Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Gerard R Lazo
- Western Regional Research Center, USDA-ARS, Albany, CA, 94710-1105, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Ilia Minkin
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA
- Jean Monlong
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Sagayamary Sagayaradj
- Genome Center, University of California, Davis, Davis, CA, USA
- BASF, West Sacramento, CA, USA
- Adam M Novak
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
- Allison Regier
- McDonnell Genome Institute, Washington University in St Louis, St Louis, MO, 63108, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston TX, TX, 77030, USA
| | - Jouni Siren
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
| | - Yassine Souilmi
- Australian Centre for Ancient DNA, School of Biological Sciences, Environment Institute, The University of Adelaide, Adelaide, South Australia, 5005, Australia
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
| | - Travis Wrightsman
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, 14853, USA
| | - Toshiyuki T Yokoyama
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, MA, 01581, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, 95064, USA
| | - Ben Busby
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
13
Corvelo A, Clarke WE, Robine N, Zody MC. taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time. Genome Res 2018; 28:751-758. [PMID: 29588360 PMCID: PMC5932614 DOI: 10.1101/gr.225276.117] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/23/2017] [Accepted: 03/21/2018] [Indexed: 01/15/2023]
Abstract
High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classification tool. Using a combination of simulated and real metagenomics data sets, we demonstrate that taxMaps is more sensitive and more precise than widely used taxonomic classifiers and is capable of delivering classification accuracy comparable to that of BLASTN, but at up to three orders of magnitude less computational cost.
Affiliation(s)
- André Corvelo
- New York Genome Center, New York, New York 10013, USA
14
Kagale S, Nixon J, Khedikar Y, Pasha A, Provart NJ, Clarke WE, Bollina V, Robinson SJ, Coutu C, Hegedus DD, Sharpe AG, Parkin IAP. The developmental transcriptome atlas of the biofuel crop Camelina sativa. Plant J 2016; 88:879-894. [PMID: 27513981 DOI: 10.1111/tpj.13302] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/21/2015] [Revised: 08/01/2016] [Accepted: 08/04/2016] [Indexed: 05/17/2023]
Abstract
Camelina sativa is currently being embraced as a viable industrial bio-platform crop due to a number of desirable agronomic attributes and the unique fatty acid profile of the seed oil that has applications for food, feed and biofuel. The recent completion of the reference genome sequence of C. sativa identified a young hexaploid genome. To complement this work, we have generated a genome-wide developmental transcriptome map by RNA sequencing of 12 different tissues covering major developmental stages during the life cycle of C. sativa. We have generated a digital atlas of this comprehensive transcriptome resource that enables interactive visualization of expression data through a searchable database of electronic fluorescent pictographs (eFP browser). An analysis of this dataset supported expression of 88% of the annotated genes in C. sativa and provided a global overview of the complex architecture of temporal and spatial gene expression patterns active during development. Conventional differential gene expression analysis combined with weighted gene expression network analysis uncovered similarities as well as differences in gene expression patterns between different tissues and identified tissue-specific genes and network modules. A high-quality census of transcription factors, analysis of alternative splicing and tissue-specific genome dominance provided insight into the transcriptional dynamics and sub-genome interplay among the well-preserved triplicated repertoire of homeologous loci. The comprehensive transcriptome atlas in combination with the reference genome sequence provides a powerful resource for genomics research which can be leveraged to identify functional associations between genes and understand the regulatory networks underlying developmental processes.
Affiliation(s)
- Sateesh Kagale
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada
- John Nixon
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Yogendra Khedikar
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Asher Pasha
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Nicholas J Provart
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada
- Wayne E Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Venkatesh Bollina
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Stephen J Robinson
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Cathy Coutu
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Dwayne D Hegedus
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
- Andrew G Sharpe
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada
- Isobel A P Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada
15
Clarke WE, Higgins EE, Plieske J, Wieseke R, Sidebottom C, Khedikar Y, Batley J, Edwards D, Meng J, Li R, Lawley CT, Pauquet J, Laga B, Cheung W, Iniguez-Luy F, Dyrszka E, Rae S, Stich B, Snowdon RJ, Sharpe AG, Ganal MW, Parkin IAP. A high-density SNP genotyping array for Brassica napus and its ancestral diploid species based on optimised selection of single-locus markers in the allotetraploid genome. Theor Appl Genet 2016; 129:1887-99. [PMID: 27364915 PMCID: PMC5025514 DOI: 10.1007/s00122-016-2746-7] [Citation(s) in RCA: 118] [Impact Index Per Article: 14.8] [Received: 04/26/2016] [Accepted: 06/18/2016] [Indexed: 05/18/2023]
Abstract
The Brassica napus Illumina array provides genome-wide markers linked to the available genome sequence, a significant tool for genetic analyses of the allotetraploid B. napus and its progenitor diploid genomes. A high-density single nucleotide polymorphism (SNP) Illumina Infinium array, containing 52,157 markers, was developed for the allotetraploid Brassica napus. A stringent selection process employing the short probe sequence for each SNP assay was used to limit the majority of the selected markers to those represented a minimum number of times across the highly replicated genome. As a result, approximately 60% of the SNP assays display genome-specificity, resolving as three clearly separated clusters (AA, AB, and BB) when tested with a diverse range of B. napus material. This genome specificity was supported by the analysis of the diploid ancestors of B. napus, whereby 26,504 and 29,720 markers were scorable in B. oleracea and B. rapa, respectively. Forty-four percent of the assayed loci on the array were genetically mapped in a single doubled-haploid B. napus population, allowing alignment of their physical and genetic coordinates. Although strong conservation of the two positions was shown, at least 3% of the loci were genetically mapped to a homoeologous position compared to their presumed physical position in the respective genome, underlining the importance of genetic corroboration of locus identity. In addition, the alignments identified multiple rearrangements between the diploid and tetraploid Brassica genomes. Although mostly attributed to genome assembly errors, some are likely evidence of rearrangements that occurred since the hybridisation of the progenitor genomes in the B. napus nucleus. Based on estimates for linkage disequilibrium decay, the array is a valuable tool for genetic fine mapping and genome-wide association studies in B. napus and its progenitor genomes.
Affiliation(s)
- Wayne E Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, S7N 0X2, Canada
- Erin E Higgins
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, S7N 0X2, Canada
- Joerg Plieske
- TraitGenetics GmbH, Am Schwabeplan 1b, Stadt Seeland OT, 06466, Gatersleben, Germany
- Ralf Wieseke
- TraitGenetics GmbH, Am Schwabeplan 1b, Stadt Seeland OT, 06466, Gatersleben, Germany
- Christine Sidebottom
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, S7N 0W9, Canada
- Yogendra Khedikar
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, S7N 0X2, Canada
- Jacqueline Batley
- School of Plant Biology and The UWA Institute of Agriculture, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
- Dave Edwards
- School of Plant Biology and The UWA Institute of Agriculture, The University of Western Australia, 35 Stirling Highway, Crawley, Perth, 6009, Australia
- Jinling Meng
- National Key Laboratory of Crop Genetic Improvement, Key Laboratory of Rapeseed Genetic Improvement, Ministry of Agriculture P. R. China, Huazhong Agricultural University, Wuhan, 430070, China
- Ruiyuan Li
- National Key Laboratory of Crop Genetic Improvement, Key Laboratory of Rapeseed Genetic Improvement, Ministry of Agriculture P. R. China, Huazhong Agricultural University, Wuhan, 430070, China
- Jérôme Pauquet
- BIOGEMMA, 6 chemin des Panedautes, 31700, Mondonville, France
- SYNGENTA France SAS, 346, route des Pasquiers, 84260, Sarrians, France
- Wing Cheung
- DNA Landmarks Inc, 84 Rue Richelieu, St-Jean-sur-Richelieu, QC, J3B 6X3, Canada
- Federico Iniguez-Luy
- Genomics and Bioinformatics Unit, Agri Aquaculture Nutritional Genomic Center (CGNA), Conicyt-Regional, Gore La Araucania, R10C1001, Temuco, Chile
- Emmanuelle Dyrszka
- Syngenta France SAS, 12 Chemin de l'hobit, B.P. 27, 31790, Saint-Sauveur, France
- Benjamin Stich
- Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Rod J Snowdon
- Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
- Andrew G Sharpe
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, S7N 0W9, Canada
- Martin W Ganal
- TraitGenetics GmbH, Am Schwabeplan 1b, Stadt Seeland OT, 06466, Gatersleben, Germany
- Isobel A P Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, S7N 0X2, Canada
16
Rolfe SA, Strelkov SE, Links MG, Clarke WE, Robinson SJ, Djavaheri M, Malinowski R, Haddadi P, Kagale S, Parkin IAP, Taheri A, Borhan MH. The compact genome of the plant pathogen Plasmodiophora brassicae is adapted to intracellular interactions with host Brassica spp. BMC Genomics 2016; 17:272. [PMID: 27036196 PMCID: PMC4815078 DOI: 10.1186/s12864-016-2597-2] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 11/20/2015] [Accepted: 03/16/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND The protist Plasmodiophora brassicae is a soil-borne pathogen of cruciferous species and the causal agent of clubroot disease of Brassicas including agriculturally important crops such as canola/rapeseed (Brassica napus). P. brassicae has remained an enigmatic plant pathogen and is a rare example of an obligate biotroph that resides entirely inside the host plant cell. The pathogen is the cause of severe yield losses and can render infested fields unsuitable for Brassica crop growth due to the persistence of resting spores in the soil for up to 20 years. RESULTS To provide insight into the biology of the pathogen and its interaction with its primary host B. napus, we produced a draft genome of P. brassicae pathotypes 3 and 6 (Pb3 and Pb6) that differ in their host range. Pb3 is highly virulent on B. napus (but also infects other Brassica species) while Pb6 infects only vegetable Brassica crops. Both the Pb3 and Pb6 genomes are highly compact, each with a total size of 24.2 Mb, and contain less than 2% repetitive DNA. Clustering of genome-wide single nucleotide polymorphisms (SNP) of Pb3, Pb6 and three additional re-sequenced pathotypes (Pb2, Pb5 and Pb8) shows a high degree of correlation of cluster grouping with host range. The Pb3 genome features significant reduction of intergenic space with multiple examples of overlapping untranslated regions (UTRs). Dependency on the host for essential nutrients is evident from the loss of genes for the biosynthesis of thiamine and some amino acids and the presence of a wide range of transport proteins, including some unique to P. brassicae. The annotated genes of Pb3 include those with a potential role in the regulation of the plant growth hormones cytokinin and auxin. The expression profile of Pb3 genes, including putative effectors, during infection and their potential role in manipulation of host defence is discussed. CONCLUSION The P. brassicae genome sequence reveals a compact genome, a dependency of the pathogen on its host for some essential nutrients and a potential role in the regulation of host plant cytokinin and auxin. Genome annotation supported by RNA sequencing reveals significant reduction in intergenic space which, in addition to low repeat content, has likely contributed to the P. brassicae compact genome.
Affiliation(s)
- Stephen A. Rolfe
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, S10 2TN UK
- Stephen E. Strelkov
- Department of Agricultural, Food and Nutritional Science, University of Alberta, 410 Agriculture/Forestry Centre, Edmonton, AB T6G 2P5 Canada
- Matthew G. Links
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Wayne E. Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Present address: New York Genome Center, 101 6th Ave, New York, NY 10013 USA
- Stephen J. Robinson
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Mohammad Djavaheri
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Robert Malinowski
- Department of Integrative Plant Biology, Institute of Plant Genetics of the Polish Academy of Sciences, ul. Strzeszynska 34, 60-479 Poznan, Poland
- Parham Haddadi
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Sateesh Kagale
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, S7N 0W9 Canada
- Isobel A. P. Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Ali Taheri
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
- Present address: Department of Agricultural and Environmental Sciences, College of Agriculture, Human and Natural Sciences, Tennessee State University, 3500 John A Merritt Blvd, Nashville, TN 37209 USA
- M. Hossein Borhan
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2 Canada
17
Abstract
The development of genotyping-by-sequencing (GBS) to rapidly detect nucleotide variation at the whole genome level, in many individuals simultaneously, has provided a transformative genetic profiling technique. GBS can be carried out in species with or without reference genome sequences and yields huge amounts of potentially informative data. One limitation of the approach is the paucity of tools to transform the raw data into a format that can be easily interrogated at the genetic level. In this chapter we describe bioinformatics tools developed to address this shortfall, together with experimental design considerations to fully leverage the power of GBS for genetic analysis.
Affiliation(s)
- Sateesh Kagale
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada, S7N 0W9
- Chushin Koh
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada, S7N 0W9
- Wayne E Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada, S7N 0X2
- Venkatesh Bollina
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada, S7N 0X2
- Isobel A P Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK, Canada, S7N 0X2
- Andrew G Sharpe
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, Canada, S7N 0W9
18
Singh R, Bollina V, Higgins EE, Clarke WE, Eynck C, Sidebottom C, Gugel R, Snowdon R, Parkin IAP. Single-nucleotide polymorphism identification and genotyping in Camelina sativa. Mol Breed 2015; 35:35. [PMID: 25620879 PMCID: PMC4300397 DOI: 10.1007/s11032-015-0224-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 07/30/2014] [Accepted: 11/18/2014] [Indexed: 05/09/2023]
Abstract
Camelina sativa, a largely relict crop, has recently attracted renewed interest due to its potential as an industrial oilseed. Molecular markers are key tools that will allow C. sativa to benefit from modern breeding approaches. Two complementary methodologies, capture of 3' cDNA tags and genomic reduced-representation libraries, both of which exploited second generation sequencing platforms, were used to develop a low density (768) Illumina GoldenGate single nucleotide polymorphism (SNP) array. The array allowed 533 SNP loci to be genetically mapped in a recombinant inbred population of C. sativa. Alignment of the SNP loci to the C. sativa genome identified the underlying sequenced regions that would delimit potential candidate genes in any mapping project. In addition, the SNP array was used to assess genetic variation among a collection of 175 accessions of C. sativa, identifying two sub-populations, yet low overall gene diversity. The SNP loci will provide useful tools for future crop improvement of C. sativa.
Affiliation(s)
- Ravinder Singh
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
- School of Biotechnology, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Jammu, 180 009 JK India
- Venkatesh Bollina
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
- Erin E. Higgins
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
- Wayne E. Clarke
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
- Christina Eynck
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
- Christine Sidebottom
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, S7N 0W9 Canada
- Richard Gugel
- Plant Gene Resources Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
- Rod Snowdon
- Department of Plant Breeding, Justus Liebig University, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany
- Isobel A. P. Parkin
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, S7N 0X2 Canada
19
Kagale S, Robinson SJ, Nixon J, Xiao R, Huebert T, Condie J, Kessler D, Clarke WE, Edger PP, Links MG, Sharpe AG, Parkin IAP. Polyploid evolution of the Brassicaceae during the Cenozoic era. Plant Cell 2014; 26:2777-91. [PMID: 25035408 PMCID: PMC4145113 DOI: 10.1105/tpc.114.126391] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 04/08/2014] [Revised: 06/07/2014] [Accepted: 06/19/2014] [Indexed: 05/18/2023]
Abstract
The Brassicaceae (Cruciferae) family, owing to its remarkable species, genetic, and physiological diversity as well as its significant economic potential, has become a model for polyploidy and evolutionary studies. Utilizing extensive transcriptome pyrosequencing of diverse taxa, we established a resolved phylogeny of a subset of crucifer species. We elucidated the frequency, age, and phylogenetic position of polyploidy and lineage separation events that have marked the evolutionary history of the Brassicaceae. Besides the well-known ancient α (47 million years ago [Mya]) and β (124 Mya) paleopolyploidy events, several species were shown to have undergone a further more recent (∼7 to 12 Mya) round of genome multiplication. We identified eight whole-genome duplications corresponding to at least five independent neo/mesopolyploidy events. Although the Brassicaceae family evolved from other eudicots at the beginning of the Cenozoic era of the Earth (60 Mya), major diversification occurred only during the Neogene period (0 to 23 Mya). Remarkably, the widespread species divergence, major polyploidy, and lineage separation events during Brassicaceae evolution are clustered in time around epoch transitions characterized by prolonged unstable climatic conditions. The synchronized diversification of Brassicaceae species suggests that polyploid events may have conferred higher adaptability and increased tolerance toward the drastically changing global environment, thus facilitating species radiation.
Affiliation(s)
- Sateesh Kagale
- Agriculture and Agri-Food Canada, Saskatoon SK S7N 0X2, Canada
- National Research Council Canada, Saskatoon SK S7N 0W9, Canada
- John Nixon
- Agriculture and Agri-Food Canada, Saskatoon SK S7N 0X2, Canada
- Rong Xiao
- Agriculture and Agri-Food Canada, Saskatoon SK S7N 0X2, Canada
- Terry Huebert
- Agriculture and Agri-Food Canada, Saskatoon SK S7N 0X2, Canada
- Janet Condie
- National Research Council Canada, Saskatoon SK S7N 0W9, Canada
- Dallas Kessler
- Plant Gene Resources of Canada, Saskatoon SK S7N 0X2, Canada
- Wayne E Clarke
- Agriculture and Agri-Food Canada, Saskatoon SK S7N 0X2, Canada
- Patrick P Edger
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720
- Matthew G Links
- Agriculture and Agri-Food Canada, Saskatoon SK S7N 0X2, Canada
- Andrew G Sharpe
- National Research Council Canada, Saskatoon SK S7N 0W9, Canada
20
Parkin IAP, Koh C, Tang H, Robinson SJ, Kagale S, Clarke WE, Town CD, Nixon J, Krishnakumar V, Bidwell SL, Denoeud F, Belcram H, Links MG, Just J, Clarke C, Bender T, Huebert T, Mason AS, Pires JC, Barker G, Moore J, Walley PG, Manoli S, Batley J, Edwards D, Nelson MN, Wang X, Paterson AH, King G, Bancroft I, Chalhoub B, Sharpe AG. Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea. Genome Biol 2014; 15:R77. [PMID: 24916971 PMCID: PMC4097860 DOI: 10.1186/gb-2014-15-6-r77] [Citation(s) in RCA: 281] [Impact Index Per Article: 28.1] [Received: 01/27/2014] [Accepted: 06/10/2014] [Indexed: 01/24/2023]
Abstract
Background Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. Results We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. Conclusions Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes.
21
Clarke WE, Parkin IA, Gajardo HA, Gerhardt DJ, Higgins E, Sidebottom C, Sharpe AG, Snowdon RJ, Federico ML, Iniguez-Luy FL. Genomic DNA enrichment using sequence capture microarrays: a novel approach to discover sequence nucleotide polymorphisms (SNP) in Brassica napus L. PLoS One 2013; 8:e81992. [PMID: 24312619 PMCID: PMC3849492 DOI: 10.1371/journal.pone.0081992] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/20/2013] [Accepted: 10/20/2013] [Indexed: 12/24/2022]
Abstract
Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated with traits of agronomical and nutritional importance via quantitative trait loci (QTL) analysis. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry, and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, 58% of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome, generating a novel data resource for research on and breeding of this crop species.
Affiliation(s)
- Wayne E. Clarke
- Saskatoon Research Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Isobel A. Parkin
- Saskatoon Research Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Humberto A. Gajardo
- Genomics and Bioinformatics Unit, Agriaquaculture Nutritional Genomic Center (CGNA), Temuco, La Araucanía, Chile
- Erin Higgins
- Saskatoon Research Centre, Agriculture and Agri-Food Canada, Saskatoon, Saskatchewan, Canada
- Christine Sidebottom
- Plant Biotechnology Institute, National Research Council Canada, Saskatoon, Saskatchewan, Canada
- Andrew G. Sharpe
- Plant Biotechnology Institute, National Research Council Canada, Saskatoon, Saskatchewan, Canada
- Rod J. Snowdon
- Department of Plant Breeding, Justus Liebig University, Giessen, Germany
- Maria L. Federico
- Genomics and Bioinformatics Unit, Agriaquaculture Nutritional Genomic Center (CGNA), Temuco, La Araucanía, Chile
- Federico L. Iniguez-Luy
- Genomics and Bioinformatics Unit, Agriaquaculture Nutritional Genomic Center (CGNA), Temuco, La Araucanía, Chile
22
Sharpe AG, Ramsay L, Sanderson LA, Fedoruk MJ, Clarke WE, Li R, Kagale S, Vijayan P, Vandenberg A, Bett KE. Ancient orphan crop joins modern era: gene-based SNP discovery and mapping in lentil. BMC Genomics 2013; 14:192. [PMID: 23506258 PMCID: PMC3635939 DOI: 10.1186/1471-2164-14-192] [Citation(s) in RCA: 100] [Impact Index Per Article: 9.1] [Received: 11/03/2012] [Accepted: 02/22/2013] [Indexed: 12/22/2022]
Abstract
Background The genus Lens comprises a range of closely related species within the galegoid clade of the Papilionoideae subfamily. The clade includes other important crops (e.g. chickpea and pea) as well as a sequenced model legume (Medicago truncatula). Lentil is a global food crop increasing in importance in the Indian sub-continent and elsewhere due to its nutritional value and quick cooking time. Despite this importance, there has been a dearth of genetic and genomic resources for the crop, and this has limited the application of marker-assisted selection strategies in breeding. Results We describe here the development of a deep and diverse transcriptome resource for lentil using next generation sequencing technology. The generation of data in multiple cultivated (L. culinaris) and wild (L. ervoides) genotypes together with the utilization of a bioinformatics workflow enabled the identification of a large collection of SNPs and the subsequent development of a genotyping platform that was used to establish the first comprehensive genetic map of the L. culinaris genome. Extensive collinearity with M. truncatula was evident on the basis of sequence homology between mapped markers and the model genome, and large translocations and inversions relative to M. truncatula were identified. The divergence times of L. culinaris from L. ervoides, and of both species from M. truncatula, were also estimated. Conclusions The availability of the genomic and derived molecular marker resources presented here will help change lentil breeding strategies and lead to increased genetic gain in the future.
Affiliation(s)
- Andrew G Sharpe
- National Research Council Canada, 110 Gymnasium Place, Saskatoon, SK, S7N 0W9, Canada.
23
Eynck C, Séguin-Swartz G, Clarke WE, Parkin IAP. Monolignol biosynthesis is associated with resistance to Sclerotinia sclerotiorum in Camelina sativa. Mol Plant Pathol 2012; 13:887-99. [PMID: 22487550 PMCID: PMC6638904 DOI: 10.1111/j.1364-3703.2012.00798.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 05/02/2023]
Abstract
The ascomycete Sclerotinia sclerotiorum is a necrotrophic plant pathogen with an extremely broad host range. It causes stem rot in Camelina sativa, a crucifer with great potential as an alternative oilseed crop. Lignification is a common phenomenon in the expression of resistance against necrotrophs, but the molecular mechanisms underlying this defence response are poorly understood. We present histochemical, gene expression and biochemical data investigating the role of monolignols in the resistance of C. sativa to S. sclerotiorum. Comparative studies with resistant and susceptible lines of C. sativa revealed substantial differences in constitutive transcript levels and gene regulation patterns for members of the gene family encoding cinnamoyl-CoA reductase (CCR), the first enzyme specifically committed to the synthesis of lignin monomers. These differences were associated with anatomical and metabolic factors. While the induction of CsCCR2 expression after inoculation with S. sclerotiorum was associated with the deposition of lignin mainly derived from guaiacyl monomers, high constitutive levels of CsCCR4 paralleled a high syringyl lignin content in healthy stems of resistant plants. The results provide evidence that plant cell wall strengthening plays a role in the resistance of C. sativa to S. sclerotiorum, and that both constitutive and inducible defence mechanisms contribute to reduced symptom development in resistant germplasm. This study provides the first characterization of quantitative resistance in C. sativa to S. sclerotiorum.
Affiliation(s)
- Christina Eynck
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, Saskatoon, SK, Canada, S7N 0X2.
24
Links MG, Holub E, Jiang RHY, Sharpe AG, Hegedus D, Beynon E, Sillito D, Clarke WE, Uzuhashi S, Borhan MH. De novo sequence assembly of Albugo candida reveals a small genome relative to other biotrophic oomycetes. BMC Genomics 2011; 12:503. [PMID: 21995639 PMCID: PMC3206522 DOI: 10.1186/1471-2164-12-503] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 06/15/2011] [Accepted: 10/13/2011] [Indexed: 11/28/2022]
Abstract
Background: Albugo candida is a biotrophic oomycete that parasitizes various species of Brassicaceae, causing a disease (white blister rust) with remarkable convergence in behaviour to unrelated rusts of basidiomycete fungi.
Results: A recent genome analysis of the oomycete Hyaloperonospora arabidopsidis suggests that a reduction in the number of genes encoding secreted pathogenicity proteins and enzymes for assimilation of inorganic nitrogen and sulphur represents a genomic signature for the evolution of obligate biotrophy. Here, we report a draft reference genome of a major crop pathogen, Albugo candida (another obligate biotrophic oomycete), with an estimated genome size of 45.3 Mb. This is very similar to the genome size of the necrotrophic oomycete Pythium ultimum (43 Mb) but less than half that of H. arabidopsidis (99 Mb). Sequencing of A. candida transcripts from infected host tissue and zoosporangia, combined with genome-wide annotation, revealed 15,824 predicted genes. Most of the predicted genes lack significant similarity with sequences from other oomycetes. Most intriguingly, A. candida appears to have a much smaller repertoire of pathogenicity-related proteins than H. arabidopsidis, including genes that encode RXLR effector proteins, CRINKLER-like genes, and elicitins. Necrosis- and ethylene-inducing peptides were not detected in the genome of A. candida. Putative orthologs of tat-C, a component of the twin arginine translocase system, were identified from multiple oomycete genera, along with proteins containing putative tat-secretion signal peptides.
Conclusion: Albugo candida has a comparatively small genome amongst oomycetes, retains motility of sporangial inoculum, and harbours a much smaller repertoire of candidate effectors than was recently reported for H. arabidopsidis. This minimal gene repertoire could indicate a lack of expansion, rather than a reduction, in the number of genes that signify the evolution of biotrophy in oomycetes.
Affiliation(s)
- Matthew G Links
- Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2 Canada
25
Parkin IA, Clarke WE, Sidebottom C, Zhang W, Robinson SJ, Links MG, Karcz S, Higgins EE, Fobert P, Sharpe AG. Towards unambiguous transcript mapping in the allotetraploid Brassica napus. (This article is one of a selection of papers from the conference “Exploiting Genome-wide Association in Oilseed Brassicas: a model for genetic improvement of major OECD crops for sustainable farming”.) Genome 2010; 53:929-38. [DOI: 10.1139/g10-053] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022]
Abstract
The architecture of the Brassica napus genome is marked by its evolutionary origins. The genome of B. napus was formed from the hybridization of two closely related diploid Brassica species, both of which evolved from a hexaploid ancestor. The extensive whole genome duplication events in its near and distant past result in the allotetraploid genome of B. napus maintaining multiple copies of most genes, which predicts a highly complex and redundant transcriptome that can confound any expression analyses. A stringent assembly of 142,399 B. napus expressed sequence tags allowed the development of a well-differentiated set of reference transcripts, which were used as a foundation to assess the efficacy of available tools for identifying and distinguishing transcripts in B. napus, including microarray hybridization and 3′ anchored sequence tag capture. Microarray platforms cannot distinguish transcripts derived from the two progenitors or close homologues, although observed differential expression appeared to be biased towards unique transcripts. The use of 3′ capture enhanced the ability to unambiguously identify homologues within the B. napus transcriptome but was limited by tag length. The ability to comprehensively catalogue gene expression in polyploid species could be transformed by the application of cost-efficient next generation sequencing technologies that will capture millions of long sequence tags.
Affiliation(s)
- Isobel A.P. Parkin, Wayne E. Clarke, Christine Sidebottom, Wentao Zhang, Stephen J. Robinson, Matthew G. Links, Steve Karcz, Erin E. Higgins, Pierre Fobert, Andrew G. Sharpe
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
- Department of Computing Science, 176 Thorvaldson Building, University of Saskatchewan, 110 Science Place, Saskatoon, SK S7N 5C9, Canada
- National Research Council Plant Biotechnology Institute, 110 Gymnasium Place, Saskatoon, SK S7N 0W9, Canada
- Department of Veterinary Microbiology, WCVM, University of Saskatchewan, 52 Campus Drive, Saskatoon, SK S7N 5B4, Canada
26
Robinson SJ, Tang LH, Mooney BAG, McKay SJ, Clarke WE, Links MG, Karcz S, Regan S, Wu YY, Gruber MY, Cui D, Yu M, Parkin IAP. An archived activation tagged population of Arabidopsis thaliana to facilitate forward genetics approaches. BMC Plant Biol 2009; 9:101. [PMID: 19646253 PMCID: PMC3091532 DOI: 10.1186/1471-2229-9-101] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 05/07/2009] [Accepted: 07/31/2009] [Indexed: 05/18/2023]
Abstract
Background: Functional genomics tools provide researchers with the ability to apply high-throughput techniques to determine the function and interaction of a diverse range of genes. Mutagenized plant populations are one such resource that facilitates gene characterisation. They allow complex physiological responses to be correlated with the expression of single genes in planta, either through reverse genetics, where target genes are mutagenized to assay the effect, or through forward genetics, where populations of mutant lines are screened to identify those whose phenotype diverges from wild type for a particular trait. One limitation of these types of populations is the prevalence of gene redundancy within plant genomes, which can mask the effect of individual genes. Activation or enhancer populations, which provide not only knock-out but also dominant activation mutations, can facilitate the study of such genes.
Results: We have developed a population of almost 50,000 activation tagged A. thaliana lines that have been archived as individual lines to the T3 generation. The population is an excellent tool for both reverse and forward genetic screens and has been used successfully to identify a number of novel mutants. Insertion site sequences have been generated and mapped for 15,507 lines to enable further application of the population, while providing a clear distribution of T-DNA insertions across the genome. The population is being screened for a number of biochemical and developmental phenotypes; provisional data identifying novel alleles and genes controlling steps in proanthocyanidin biosynthesis and trichome development are presented.
Conclusion: This publicly available population provides an additional tool for plant researchers to assist with determining gene function for the many as yet uncharacterised genes annotated within the Arabidopsis genome sequence http://aafc-aac.usask.ca/FST.
The presence of enhancer elements on the inserted T-DNA molecule allows both knock-out and dominant activation phenotypes to be identified for traits of interest.
Affiliation(s)
- Stephen J Robinson
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Lily H Tang
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Brent AG Mooney
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Sheldon J McKay
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Wayne E Clarke
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Matthew G Links
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Steven Karcz
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Sharon Regan
- Department of Biology, Biosciences Complex, Queen's University, Kingston, Ontario, K7L 3N6, Canada
- Yun-Yun Wu
- Department of Biology, Biosciences Complex, Queen's University, Kingston, Ontario, K7L 3N6, Canada
- Margaret Y Gruber
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Dejun Cui
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Min Yu
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
- Isobel AP Parkin
- Agriculture and Agri-Food Canada, Saskatoon Research Centre, 107 Science Place, Saskatoon, S7N 0X2, Canada
27
Smith C, Berry M, Clarke WE, Logan A. Differential expression of fibroblast growth factor-2 and fibroblast growth factor receptor 1 in a scarring and nonscarring model of CNS injury in the rat. Eur J Neurosci 2001; 13:443-56. [PMID: 11168551 DOI: 10.1046/j.1460-9568.2001.01400.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
Abstract
Injury to the adult brain results in abortive axon regeneration and the deposition of a dense fibrous glial scar. Therapeutic strategies to promote postinjury axon regeneration are likely to require antiscarring strategies. In neonatal brain wounds, scar material is not laid down and axons grow across the lesion site, either by de novo growth or regeneration. To achieve the therapeutic goal of recapitulating the nonscarring neonatal response in the injured adult, an understanding of how ontogenic differences in scarring reflect developmental diversities in the trophic response to injury is required. Fibroblast growth factor-2 (FGF-2) expression is developmentally regulated and has been implicated as a regulator of the wounding response of the adult rat central nervous system. We have investigated the expression of FGF-2 and fibroblast growth factor receptor 1 (FGFR1) after penetrating lesions to the cerebral cortex of rats at 5 days post partum (dpp; nonscarring) and at 16 dpp and adulthood (scarring). In situ hybridization, immunohistochemistry and Western blotting showed robust and sustained increases in FGF-2 and FGFR1 mRNA and protein in reactive astrocytes around the lesion in scarring rats, a response that was substantially attenuated in the nonscarring neonate. These results demonstrate that changes in astrocyte FGF-2 and FGFR1 expression are coincident with the establishment of a mature pattern of glial scarring after injury in the maturing central nervous system, but it is premature to infer a causal relationship without further experiments.
Affiliation(s)
- C Smith
- Department of Medicine, University of Birmingham, Birmingham B15 2TT, UK
28
Clarke WE, Berry M, Smith C, Kent A, Logan A. Coordination of fibroblast growth factor receptor 1 (FGFR1) and fibroblast growth factor-2 (FGF-2) trafficking to nuclei of reactive astrocytes around cerebral lesions in adult rats. Mol Cell Neurosci 2001; 17:17-30. [PMID: 11161466 DOI: 10.1006/mcne.2000.0920] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.9] [Indexed: 11/22/2022]
Abstract
Traumatic injury to the adult central nervous system initiates a cascade of cellular and trophic events, culminating in the formation of a reactive gliotic scar through which transected axons fail to regenerate. Levels of fibroblast growth factor-2 (FGF-2), a potent gliogenic and neurotrophic factor, together with its full-length receptor, FGF receptor 1 (FGFR1), are coordinately and significantly increased postinjury in both nuclear and cytoplasmic fractions of extracted cerebral cortex biopsies after a penetrant injury. FGFR1 is colocalized with FGF-2 in the nuclei of reactive astrocytes, where FGF-2 is associated with nuclear euchromatin. This study unequivocally demonstrates coordinate up-regulation and trafficking of FGF-2 and full-length FGFR1 to the nucleus of reactive astrocytes in an in vivo model of brain injury, thereby implicating these molecules in nuclear activity. However, the precise contribution of nuclear FGF-2/FGFR1 to the pathophysiological response of astrocytes after injury is undetermined.
MESH Headings
- Active Transport, Cell Nucleus/physiology
- Animals
- Astrocytes/metabolism
- Astrocytes/pathology
- Blotting, Western
- Brain/metabolism
- Brain/pathology
- Cell Nucleus/metabolism
- Cell Nucleus/ultrastructure
- Disease Models, Animal
- Euchromatin/metabolism
- Euchromatin/ultrastructure
- Female
- Fibroblast Growth Factor 2/metabolism
- Gliosis/etiology
- Gliosis/metabolism
- Gliosis/pathology
- Head Injuries, Penetrating/complications
- Head Injuries, Penetrating/metabolism
- Head Injuries, Penetrating/pathology
- Immunohistochemistry
- Protein Isoforms/metabolism
- Rats
- Rats, Wistar
- Receptor Protein-Tyrosine Kinases/metabolism
- Receptor, Fibroblast Growth Factor, Type 1
- Receptors, Fibroblast Growth Factor/metabolism
- Subcellular Fractions/metabolism
- Up-Regulation
Affiliation(s)
- W E Clarke
- Department of Medicine, University of Birmingham, Birmingham, B15 2TT, United Kingdom
29
Gajendragadkar SV, Clarke WE. Comparison of sotalol hydrochloride and methyldopa in essential hypertension. J Int Med Res 1977; 5:233-5. [PMID: 881095 DOI: 10.1177/030006057700500403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
In a non-blind randomized comparison of sotalol hydrochloride and methyldopa in essential hypertension, the two drugs were equivalent in effect in reaching a preset hypotensive aim, in the maximum decreases from the baseline and in the mean reductions per week from the baseline. Milligram for milligram, sotalol was about twice as potent as methyldopa. No volunteered side-effects were noted with methyldopa, compared with 3/19 with sotalol.
30
Clarke WE. Dermatomyositis. Proc R Soc Med 1947; 40:475-476. [PMID: 19993582 PMCID: PMC2183542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]