1
Humphrey J, Venkatesh S, Hasan R, Herb JT, de Paiva Lopes K, Küçükali F, Byrska-Bishop M, Evani US, Narzisi G, Fagegaltier D, Sleegers K, Phatnani H, Knowles DA, Fratta P, Raj T. Integrative transcriptomic analysis of the amyotrophic lateral sclerosis spinal cord implicates glial activation and suggests new risk genes. Nat Neurosci 2023; 26:150-162. [PMID: 36482247 DOI: 10.1038/s41593-022-01205-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 08/26/2021] [Accepted: 10/13/2022] [Indexed: 12/13/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressively fatal neurodegenerative disease affecting motor neurons in the brain and spinal cord. In this study, we investigated gene expression changes in ALS via RNA sequencing in 380 postmortem samples from cervical, thoracic and lumbar spinal cord segments from 154 individuals with ALS and 49 control individuals. We observed an increase in microglia and astrocyte gene expression, accompanied by a decrease in oligodendrocyte gene expression. By creating a gene co-expression network in the ALS samples, we identified several activated microglia modules that negatively correlate with retrospective disease duration. We mapped molecular quantitative trait loci and found several potential ALS risk loci that may act through gene expression or splicing in the spinal cord and assign putative cell types for FNBP1, ACSL5, SH3RF1 and NFASC. Finally, we outline how common genetic variants associated with splicing of C9orf72 act as proxies for the well-known repeat expansion, and we use the same mechanism to suggest ATXN3 as a putative risk gene.
Affiliation(s)
- Jack Humphrey
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Sanan Venkatesh
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rahat Hasan
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jake T Herb
- Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Katia de Paiva Lopes
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Fahri Küçükali
- Complex Genetics of Alzheimer's Disease Group, Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Delphine Fagegaltier
- New York Genome Center, New York, NY, USA
- Center for Genomics of Neurodegenerative Disease, New York Genome Center, New York, NY, USA
- Kristel Sleegers
- Complex Genetics of Alzheimer's Disease Group, Center for Molecular Neurology, VIB, Antwerp, Belgium
- Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Hemali Phatnani
- New York Genome Center, New York, NY, USA
- Center for Genomics of Neurodegenerative Disease, New York Genome Center, New York, NY, USA
- Department of Neurology, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- David A Knowles
- New York Genome Center, New York, NY, USA
- Department of Computer Science, Columbia University, New York, NY, USA
- Pietro Fratta
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
- Towfique Raj
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
2
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, Zody MC. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022; 185:3426-3440.e19. [PMID: 36055201 PMCID: PMC9439720 DOI: 10.1016/j.cell.2022.08.004] [Citation(s) in RCA: 212] [Impact Index Per Article: 106.0] [Received: 11/12/2021] [Revised: 06/21/2022] [Accepted: 08/03/2022] [Indexed: 01/05/2023]
Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Affiliation(s)
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Haley J. Abel
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Wayne E. Clarke
- New York Genome Center, New York, NY 10013, USA
- Outlier Informatics Inc., Saskatoon, SK S7H 1L4, Canada
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ira M. Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT 06510, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Michael E. Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Michael C. Zody
- New York Genome Center, New York, NY 10013, USA
- Corresponding author
3
Wagner J, Olson ND, Harris L, Khan Z, Farek J, Mahmoud M, Stankovic A, Kovacevic V, Yoo B, Miller N, Rosenfeld JA, Ni B, Zarate S, Kirsche M, Aganezov S, Schatz MC, Narzisi G, Byrska-Bishop M, Clarke W, Evani US, Markello C, Shafin K, Zhou X, Sidow A, Bansal V, Ebert P, Marschall T, Lansdorp P, Hanlon V, Mattsson CA, Barrio AM, Fiddes IT, Xiao C, Fungtammasan A, Chin CS, Wenger AM, Rowell WJ, Sedlazeck FJ, Carroll A, Salit M, Zook JM. Benchmarking challenging small variants with linked and long reads. Cell Genom 2022; 2:100128. [PMID: 36452119 PMCID: PMC9706577 DOI: 10.1016/j.xgen.2022.100128] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Indexed: 05/14/2023]
Abstract
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
Affiliation(s)
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Corresponding author
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Lindsay Harris
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Ziad Khan
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jesse Farek
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Ana Stankovic
- Seven Bridges, Omladinskih brigada 90g, 11070 Belgrade, Republic of Serbia
- Vladimir Kovacevic
- Seven Bridges, Omladinskih brigada 90g, 11070 Belgrade, Republic of Serbia
- Byunggil Yoo
- Children’s Mercy Kansas City, Kansas City, MO, USA
- Neil Miller
- Children’s Mercy Kansas City, Kansas City, MO, USA
- Bohan Ni
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Giuseppe Narzisi
- New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
- Wayne Clarke
- New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
- Uday S. Evani
- New York Genome Center, 101 Avenue of the Americas, New York, NY, USA
- Charles Markello
- University of California at Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA, USA
- Kishwar Shafin
- University of California at Santa Cruz Genomics Institute, 1156 High Street, Santa Cruz, CA, USA
- Xin Zhou
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Arend Sidow
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Vikas Bansal
- Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Peter Ebert
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Tobias Marschall
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Peter Lansdorp
- Institute of Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany
- Vincent Hanlon
- Terry Fox Laboratory, BC Cancer Research Institute and Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Carl-Adam Mattsson
- Terry Fox Laboratory, BC Cancer Research Institute and Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Andrew Carroll
- Google Inc., 1600 Amphitheatre Pkwy., Mountain View, CA 94040, USA
- Marc Salit
- Joint Initiative for Metrology in Biology, SLAC National Laboratory, Stanford, CA, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Corresponding author
4
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 270] [Impact Index Per Article: 90.0] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
- Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA.
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany.
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
5
Challis D, Antunes L, Garrison E, Banks E, Evani US, Muzny D, Poplin R, Gibbs RA, Marth G, Yu F. The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes. BMC Genomics 2015; 16:143. [PMID: 25765891 PMCID: PMC4352271 DOI: 10.1186/s12864-015-1333-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/12/2014] [Accepted: 02/09/2015] [Indexed: 12/30/2022]
Abstract
Background: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.
Results: This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.
Conclusions: In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
Affiliation(s)
- Danny Challis
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Present address: Monsanto Company, Ankeny, IA, 50021, USA.
- Lilian Antunes
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Present address: Washington University School of Medicine, Saint Louis, MO, 63110, USA.
- Erik Garrison
- Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA.
- Eric Banks
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA.
- Uday S Evani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Present address: New York Genome Center, New York, NY, 10013, USA.
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Ryan Poplin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA.
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
| | - Gabor Marth
- Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA. .,Present address: Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, 84112, USA.
| | - Fuli Yu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. .,Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. .,Institute of Neurology, Tianjin Medical University General Hospital, Tianjin, 300052, China.
6
Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüş ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GRS, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C, Gerstein M. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 2013; 342:1235587. [PMID: 24092746 DOI: 10.1126/science.1235587] [Citation(s) in RCA: 269] [Impact Index Per Article: 24.5] [Indexed: 12/15/2022]
Abstract
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
Affiliation(s)
- Ekta Khurana
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yao Fu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Vincenza Colonna
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK; Institute of Genetics and Biophysics, National Research Council (CNR), 80131 Naples, Italy
- Xinmeng Jasmine Mu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Hyun Min Kang
- Center for Statistical Genetics, Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tuuli Lappalainen
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland; Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland; Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Andrea Sboner
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
- Lucas Lochovsky
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Jieming Chen
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA
- Arif Harmanci
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Alexej Abyzov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Suganthi Balasubramanian
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Kathryn Beal
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dimple Chakravarty
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Daniel Challis
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Yuan Chen
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Declan Clarke
- Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Uday S Evani
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert Fragoza
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Erik Garrison
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Richard Gibbs
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Zeynep H Gümüş
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Naoki Kitabayashi
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Yong Kong
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Keck Biotechnology Resource Laboratory, Yale University, New Haven, CT 06511, USA
- Kasper Lage
- Pediatric Surgical Research Laboratories, MassGeneral Hospital for Children, Massachusetts General Hospital, Boston, MA 02114, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA; Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark; Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Vaja Liluashvili
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
- Steven M Lipkin
- Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
- Daniel G MacArthur
- Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA
- Gabor Marth
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Donna Muzny
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Tune H Pers
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark; Division of Endocrinology and Center for Basic and Translational Obesity Research, Children's Hospital, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Graham R S Ritchie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jeffrey A Rosenfeld
- Department of Medicine, Rutgers New Jersey Medical School, Newark, NJ 07101, USA; IST/High Performance and Research Computing, Rutgers University Newark, NJ 07101, USA; Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
- Cristina Sisu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Xiaomu Wei
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
- Michael Wilson
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Child Study Center, Yale University, New Haven, CT 06520, USA
- Yali Xue
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Fuli Yu
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland; Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland; Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Mark A Rubin
- Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Chris Tyler-Smith
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA
7
Wittkop T, TerAvest E, Evani US, Fleisch KM, Berman AE, Powell C, Shah NH, Mooney SD. STOP using just GO: a multi-ontology hypothesis generation tool for high throughput experimentation. BMC Bioinformatics 2013; 14:53. [PMID: 23409969 PMCID: PMC3635999 DOI: 10.1186/1471-2105-14-53] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/26/2012] [Accepted: 01/28/2013] [Indexed: 12/21/2022] Open
Abstract
Background: Gene Ontology (GO) enrichment analysis remains one of the most common methods for hypothesis generation from high throughput datasets. However, we believe that researchers strive to test other hypotheses that fall outside of GO. Here, we developed and evaluated a tool for hypothesis generation from gene or protein lists using ontological concepts present in manually curated text that describes those genes and proteins. Results: We have developed the method Statistical Tracking of Ontological Phrases (STOP), which expands the realm of testable hypotheses in gene set enrichment analyses by integrating automated annotations of genes to terms from over 200 biomedical ontologies. While the automated annotations are not as precise as manually curated terms, we find that the additional enriched concepts have value when coupled with traditional enrichment analyses using curated terms. Conclusion: Multiple ontologies have been developed for gene and protein annotation. By using a dataset of both manually curated GO terms and automatically recognized concepts from curated text, we can expand the realm of hypotheses that can be discovered. The web application STOP is available at http://mooneygroup.org/stop/.
8
Evani US, Challis D, Yu J, Jackson AR, Paithankar S, Bainbridge MN, Jakkamsetti A, Pham P, Coarfa C, Milosavljevic A, Yu F. Atlas2 Cloud: a framework for personal genome analysis in the cloud. BMC Genomics 2012; 13 Suppl 6:S19. [PMID: 23134663 PMCID: PMC3481437 DOI: 10.1186/1471-2164-13-s6-s19] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Indexed: 11/17/2022] Open
Abstract
Background: Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation, remain serious bottlenecks. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results: We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions: We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Affiliation(s)
- Uday S Evani
- The Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
9
Peters TW, Rardin MJ, Czerwieniec G, Evani US, Reis-Rodrigues P, Lithgow GJ, Mooney SD, Gibson BW, Hughes RE. Tor1 regulates protein solubility in Saccharomyces cerevisiae. Mol Biol Cell 2012; 23:4679-88. [PMID: 23097491 PMCID: PMC3521677 DOI: 10.1091/mbc.e12-08-0620] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022] Open
Abstract
The transition of proteins targeted for autophagic degradation from the soluble to the insoluble phase is regulated in an ATG1-independent mechanism by TORC1. This process is likely a critical mechanism for maintaining protein homeostasis when challenged with proteomic stress. Accumulation of insoluble protein in cells is associated with aging and aging-related diseases; however, the roles of insoluble protein in these processes are uncertain. The nature and impact of changes to protein solubility during normal aging are less well understood. Using quantitative mass spectrometry, we identify 480 proteins that become insoluble during postmitotic aging in Saccharomyces cerevisiae and show that this ensemble of insoluble proteins is similar to those that accumulate in aging nematodes. SDS-insoluble protein is present exclusively in a nonquiescent subpopulation of postmitotic cells, indicating an asymmetrical distribution of this protein. In addition, we show that nitrogen starvation of young cells is sufficient to cause accumulation of a similar group of insoluble proteins. Although many of the insoluble proteins identified are known to be autophagic substrates, induction of macroautophagy is not required for insoluble protein formation. However, genetic or chemical inhibition of the Tor1 kinase is sufficient to promote accumulation of insoluble protein. We conclude that target of rapamycin complex 1 regulates accumulation of insoluble proteins via mechanisms acting upstream of macroautophagy. Our data indicate that the accumulation of proteins in an SDS-insoluble state in postmitotic cells represents a novel autophagic cargo preparation process that is regulated by the Tor1 kinase.
10
Reis‐Rodrigues P, Czerwieniec G, Peters TW, Evani US, Alavez S, Gaman EA, Vantipalli M, Mooney SD, Gibson BW, Lithgow GJ, Hughes RE. Proteomic analysis of age-dependent changes in protein solubility identifies genes that modulate lifespan. Aging Cell 2012; 11:120-7. [PMID: 22103665 PMCID: PMC3437485 DOI: 10.1111/j.1474-9726.2011.00765.x] [Citation(s) in RCA: 133] [Impact Index Per Article: 11.1] [Indexed: 01/08/2023] Open
Abstract
While it is generally recognized that misfolding of specific proteins can cause late-onset disease, the contribution of protein aggregation to the normal aging process is less well understood. To address this issue, a mass spectrometry-based proteomic analysis was performed to identify proteins that adopt sodium dodecyl sulfate (SDS)-insoluble conformations during aging in Caenorhabditis elegans. SDS-insoluble proteins extracted from young and aged C. elegans were chemically labeled by isobaric tagging for relative and absolute quantification (iTRAQ) and identified by liquid chromatography and mass spectrometry. Two hundred and three proteins were identified as being significantly enriched in an SDS-insoluble fraction in aged nematodes and were largely absent from a similar protein fraction in young nematodes. The SDS-insoluble fraction in aged animals contains a diverse range of proteins including a large number of ribosomal proteins. Gene ontology analysis revealed highly significant enrichments for energy production and translation functions. Expression of genes encoding insoluble proteins observed in aged nematodes was knocked down using RNAi, and effects on lifespan were measured. 41% of genes tested were shown to extend lifespan after RNAi treatment, compared with 18% in a control group of genes. These data indicate that genes encoding proteins that become insoluble with age are enriched for modifiers of lifespan. This demonstrates that proteomic approaches can be used to identify genes that modify lifespan. Finally, these observations indicate that the accumulation of insoluble proteins with diverse functions may be a general feature of aging.
Affiliation(s)
- Pedro Reis‐Rodrigues
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Gregg Czerwieniec
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Theodore W. Peters
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Uday S. Evani
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Silvestre Alavez
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Emily A. Gaman
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Maithili Vantipalli
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
- Sean D. Mooney
- The Interdisciplinary Research Consortium on Geroscience, The Buck Institute for Research on Aging, 8001 Redwood Blvd., Novato, CA 94949, USA
11
Challis D, Yu J, Evani US, Jackson AR, Paithankar S, Coarfa C, Milosavljevic A, Gibbs RA, Yu F. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 2012; 13:8. [PMID: 22239737 PMCID: PMC3292476 DOI: 10.1186/1471-2105-13-8] [Citation(s) in RCA: 209] [Impact Index Per Article: 17.4] [Received: 08/01/2011] [Accepted: 01/12/2012] [Indexed: 11/24/2022] Open
Abstract
Background: Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data. Results: Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%). Conclusion: We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
Affiliation(s)
- Danny Challis
- The Human Genome Sequencing Center, Baylor College of Medicine, Houston, USA
12
Mort M, Evani US, Krishnan VG, Kamati KK, Baenziger PH, Bagchi A, Peters BJ, Sathyesh R, Li B, Sun Y, Xue B, Shah NH, Kann MG, Cooper DN, Radivojac P, Mooney SD. In silico functional profiling of human disease-associated and polymorphic amino acid substitutions. Hum Mutat 2010; 31:335-46. [PMID: 20052762 DOI: 10.1002/humu.21192] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 11/10/2022]
Abstract
An important challenge in translational bioinformatics is to understand how genetic variation gives rise to molecular changes at the protein level that can precipitate both monogenic and complex disease. To this end, we compiled datasets of human disease-associated amino acid substitutions (AAS) in the contexts of inherited monogenic disease, complex disease, functional polymorphisms with no known disease association, and somatic mutations in cancer, and compared them with respect to predicted functional sites in proteins. When the sequence homology-based tool SIFT was used to estimate the proportion of deleterious AAS in each dataset, only complex disease AAS were found to be indistinguishable from neutral polymorphic AAS. Monogenic disease AAS predicted to be nondeleterious by SIFT were characterized by a significant enrichment for inherited AAS within solvent accessible residues, regions of intrinsic protein disorder, and an association with the loss or gain of various posttranslational modifications. Sites of structural and/or functional interest were therefore surmised to constitute useful additional features with which to identify the molecular disruptions caused by deleterious AAS. A range of bioinformatic tools, designed to predict structural and functional sites in protein sequences, were then employed to demonstrate that intrinsic biases exist in terms of the distribution of different types of human AAS with respect to specific structural, functional and pathological features. Our Web tool, designed to potentiate the functional profiling of novel AAS, has been made available at http://profile.mutdb.org/.
Affiliation(s)
- Matthew Mort
- Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom