1
Jun G, English AC, Metcalf GA, Yang J, Chaisson MJP, Pankratz N, Menon VK, Salerno WJ, Krasheninina O, Smith AV, Lane JA, Blackwell T, Kang HM, Salvi S, Meng Q, Shen H, Pasham D, Bhamidipati S, Kottapalli K, Arnett DK, Ashley-Koch A, Auer PL, Beutel KM, Bis JC, Blangero J, Bowden DW, Brody JA, Cade BE, Chen YDI, Cho MH, Curran JE, Fornage M, Freedman BI, Fingerlin T, Gelb BD, Hou L, Hung YJ, Kane JP, Kaplan R, Kim W, Loos RJ, Marcus GM, Mathias RA, McGarvey ST, Montgomery C, Naseri T, Nouraie SM, Preuss MH, Palmer ND, Peyser PA, Raffield LM, Ratan A, Redline S, Reupena S, Rotter JI, Rich SS, Rienstra M, Ruczinski I, Sankaran VG, Schwartz DA, Seidman CE, Seidman JG, Silverman EK, Smith JA, Stilp A, Taylor KD, Telen MJ, Weiss ST, Williams LK, Wu B, Yanek LR, Zhang Y, Lasky-Su J, Gingras MC, Dutcher SK, Eichler EE, Gabriel S, Germer S, Kim R, Viaud-Martinez KA, Nickerson DA, Luo J, Reiner A, Gibbs RA, Boerwinkle E, Abecasis G, Sedlazeck FJ. Structural variation across 138,134 samples in the TOPMed consortium. Res Sq 2023:rs.3.rs-2515453. [PMID: 36778386 PMCID: PMC9915771 DOI: 10.21203/rs.3.rs-2515453/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
Ever-larger structural variant (SV) catalogs that highlight diversity within and between populations help researchers better understand the links between SVs and disease. Identifying SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) of 50 bp or longer across the autosomes and the X chromosome from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference, which yield high variant quality and >90% allele concordance compared to long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts, and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool for understanding the impact of SVs on disease development and progression.
Affiliation(s)
- Goo Jun
  - Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
- Adam C English
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Ginger A Metcalf
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Jianzhi Yang
  - University of Southern California, Los Angeles, CA, USA
- Vipin K Menon
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Albert V Smith
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- John A Lane
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Tom Blackwell
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Hyun Min Kang
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Sejal Salvi
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Qingchang Meng
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Hua Shen
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Divya Pasham
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Sravya Bhamidipati
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Kavya Kottapalli
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Donna K. Arnett
  - Department of Epidemiology, University of Kentucky College of Public Health
- Allison Ashley-Koch
  - Department of Medicine, Duke University Medical Center, Durham, NC
  - Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC
- Paul L. Auer
  - Division of Biostatistics and Cancer Center, Medical College of Wisconsin, Milwaukee WI
- Joshua C. Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
  - Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas, Rio Grande Valley School of Medicine, Brownsville, TX
- Donald W. Bowden
  - Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Jennifer A. Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E. Cade
  - Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA
- Yii-Der Ida Chen
  - Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Michael H. Cho
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Joanne E. Curran
  - Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Myriam Fornage
  - Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX
- Barry I. Freedman
  - Department of Internal Medicine, Section on Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Tasha Fingerlin
  - Center for Genes, Environment and Health, National Jewish Health, 1400 Jackson St., Denver, CO, 80206, USA
- Bruce D. Gelb
  - Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai
- Yi-Jen Hung
  - Institute of Preventive Medicine, National Defense Medical Center, Taiwan
- John P Kane
  - Cardiovascular Research Institute, University of California, San Francisco
- Robert Kaplan
  - Department of epidemiology and population health, Albert Einstein College of Medicine, Bronx NY USA
- Wonji Kim
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Ruth J.F. Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Gregory M Marcus
  - Division of Cardiology, University of California, San Francisco CA
- Rasika A. Mathias
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
- Stephen T. McGarvey
  - Department of Epidemiology, International Health Institute and Department of Anthropology, Brown University
- Courtney Montgomery
  - Genes and Human Disease Research Program, Oklahoma Medical Research Foundation
- Take Naseri
  - Ministry of Health, Government of Samoa, Apia, Samoa
- S. Mehdi Nouraie
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
- Michael H. Preuss
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Patricia A. Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
- Aakrosh Ratan
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- Susan Redline
  - Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA
- Jerome I. Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
  - Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Stephen S. Rich
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
- Michiel Rienstra
  - Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Ingo Ruczinski
  - Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA
- Vijay G. Sankaran
  - Division of Hematology/Oncology, Boston Children’s Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Christine E. Seidman
  - Department of Genetics, Harvard Medical School
  - Cardiovascular Division, Brigham & Women’s Hospital, Harvard University
  - Howard Hughes Medical Institute, Harvard University
- Edwin K. Silverman
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA
- Jennifer A. Smith
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
- Adrienne Stilp
  - Department of Biostatistics, University of Washington, Seattle, WA
- Kent D. Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- Marilyn J. Telen
  - Department of Medicine, Duke University Medical Center, Durham, NC
- Scott T. Weiss
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- L. Keoki Williams
  - Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
- Baojun Wu
  - Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
- Lisa R. Yanek
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Yingze Zhang
  - Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
- Jessica Lasky-Su
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Susan K. Dutcher
  - Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
- Ryan Kim
  - Psomagen, Inc., Rockville, Maryland, USA
- James Luo
  - National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Alex Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA 98109, USA
- Richard A Gibbs
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Eric Boerwinkle
  - Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Goncalo Abecasis
  - Regeneron Genetics Center
  - Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Fritz J Sedlazeck
  - Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
  - Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
2
Jun G, English AC, Metcalf GA, Yang J, Chaisson MJP, Pankratz N, Menon VK, Salerno WJ, Krasheninina O, Smith AV, Lane JA, Blackwell T, Kang HM, Salvi S, Meng Q, Shen H, Pasham D, Bhamidipati S, Kottapalli K, Arnett DK, Ashley-Koch A, Auer PL, Beutel KM, Bis JC, Blangero J, Bowden DW, Brody JA, Cade BE, Chen YDI, Cho MH, Curran JE, Fornage M, Freedman BI, Fingerlin T, Gelb BD, Hou L, Hung YJ, Kane JP, Kaplan R, Kim W, Loos RJ, Marcus GM, Mathias RA, McGarvey ST, Montgomery C, Naseri T, Nouraie SM, Preuss MH, Palmer ND, Peyser PA, Raffield LM, Ratan A, Redline S, Reupena S, Rotter JI, Rich SS, Rienstra M, Ruczinski I, Sankaran VG, Schwartz DA, Seidman CE, Seidman JG, Silverman EK, Smith JA, Stilp A, Taylor KD, Telen MJ, Weiss ST, Williams LK, Wu B, Yanek LR, Zhang Y, Lasky-Su J, Gingras MC, Dutcher SK, Eichler EE, Gabriel S, Germer S, Kim R, Viaud-Martinez KA, Nickerson DA, Luo J, Reiner A, Gibbs RA, Boerwinkle E, Abecasis G, Sedlazeck FJ. Structural variation across 138,134 samples in the TOPMed consortium. bioRxiv 2023:2023.01.25.525428. [PMID: 36747810 PMCID: PMC9900832 DOI: 10.1101/2023.01.25.525428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Ever-larger structural variant (SV) catalogs that highlight diversity within and between populations help researchers better understand the links between SVs and disease. Identifying SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) of 50 bp or longer across the autosomes and the X chromosome from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference, which yield high variant quality and >90% allele concordance compared to long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts, and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool for understanding the impact of SVs on disease development and progression.
3
Horowitz JE, Kosmicki JA, Damask A, Sharma D, Roberts GHL, Justice AE, Banerjee N, Coignet MV, Yadav A, Leader JB, Marcketta A, Park DS, Lanche R, Maxwell E, Knight SC, Bai X, Guturu H, Sun D, Baltzell A, Kury FSP, Backman JD, Girshick AR, O'Dushlaine C, McCurdy SR, Partha R, Mansfield AJ, Turissini DA, Li AH, Zhang M, Mbatchou J, Watanabe K, Gurski L, McCarthy SE, Kang HM, Dobbyn L, Stahl E, Verma A, Sirugo G, Ritchie MD, Jones M, Balasubramanian S, Siminovitch K, Salerno WJ, Shuldiner AR, Rader DJ, Mirshahi T, Locke AE, Marchini J, Overton JD, Carey DJ, Habegger L, Cantor MN, Rand KA, Hong EL, Reid JG, Ball CA, Baras A, Abecasis GR, Ferreira MA. Genome-wide analysis in 756,646 individuals provides first genetic evidence that ACE2 expression influences COVID-19 risk and yields genetic risk scores predictive of severe disease. medRxiv 2021. [PMID: 33619501 PMCID: PMC7899471 DOI: 10.1101/2020.12.14.20248176] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/08/2023]
Abstract
SARS-CoV-2 enters host cells by binding angiotensin-converting enzyme 2 (ACE2). Through a genome-wide association study, we show that a rare variant (MAF = 0.3%, odds ratio = 0.60, P = 4.5×10⁻¹³) that down-regulates ACE2 expression reduces risk of COVID-19 disease, providing human genetics support for the hypothesis that ACE2 levels influence COVID-19 risk. Further, we show that common genetic variants define a risk score that predicts severe disease among COVID-19 cases.
Affiliation(s)
- J E Horowitz
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- J A Kosmicki
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- A Damask
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- D Sharma
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- G H L Roberts
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- N Banerjee
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- M V Coignet
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- A Yadav
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- A Marcketta
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- D S Park
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- R Lanche
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- E Maxwell
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- S C Knight
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- X Bai
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- H Guturu
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- D Sun
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- A Baltzell
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- F S P Kury
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- J D Backman
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- A R Girshick
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- C O'Dushlaine
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- S R McCurdy
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- R Partha
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- A J Mansfield
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- D A Turissini
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- A H Li
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- M Zhang
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- J Mbatchou
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- K Watanabe
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- L Gurski
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- S E McCarthy
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- H M Kang
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- L Dobbyn
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- E Stahl
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- A Verma
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- G Sirugo
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- M D Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- M Jones
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- S Balasubramanian
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- K Siminovitch
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- W J Salerno
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- A R Shuldiner
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- D J Rader
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- A E Locke
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- J Marchini
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- J D Overton
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- L Habegger
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- M N Cantor
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- K A Rand
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- E L Hong
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- J G Reid
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- C A Ball
  - AncestryDNA, 1300 West Traverse Parkway, Lehi, UT 84043, USA
- A Baras
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- G R Abecasis
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
- M A Ferreira
  - Regeneron Genetics Center, 777 Old Saw Mill River Rd., Tarrytown, NY 10591, USA
4
Lin MF, Bai X, Salerno WJ, Reid JG. Sparse Project VCF: efficient encoding of population genotype matrices. Bioinformatics 2021; 36:5537-5538. [PMID: 33300997 PMCID: PMC8016461 DOI: 10.1093/bioinformatics/btaa1004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/27/2020] [Revised: 11/13/2020] [Accepted: 11/20/2020] [Indexed: 11/13/2022]
Abstract
Summary: Variant Call Format (VCF), the prevailing representation for germline genotypes in population sequencing, suffers rapid size growth as larger cohorts are sequenced and more rare variants are discovered. We present Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering >10× size reduction for modern studies with practically minimal information loss. spVCF interoperates with VCF efficiently, including tabix-based random access. We demonstrate its effectiveness with the DiscovEHR and UK Biobank whole-exome sequencing cohorts. Availability and implementation: Apache-licensed reference implementation: github.com/mlin/spVCF. Supplementary information: Supplementary data are available at Bioinformatics online.
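The run-length idea mentioned in the abstract can be illustrated with a toy sketch. This is not the spVCF file format or its reference implementation; it only assumes, for illustration, that a genotype cell identical to the cell directly above it (same sample, previous variant row) can be folded into a run marker. The function names `encode_row` and `decode_row` and the `"N` marker syntax are hypothetical choices made for this example.

```python
# Toy illustration of run-length encoding a sparse genotype row.
# Assumption (not taken from the abstract): cells equal to the cell in the
# previous row are collapsed into a marker '"N', meaning "N cells repeat".

def encode_row(prev, row):
    """Encode `row` relative to `prev`, collapsing runs of unchanged cells."""
    out, run = [], 0
    for above, cell in zip(prev, row):
        if cell == above:
            run += 1          # extend the current run of unchanged cells
            continue
        if run:
            out.append(f'"{run}')  # flush the pending run marker
            run = 0
        out.append(cell)      # emit the changed cell verbatim
    if run:
        out.append(f'"{run}')
    return out


def decode_row(prev, encoded):
    """Invert encode_row by copying unchanged cells back from `prev`."""
    out = []
    for token in encoded:
        if token.startswith('"'):
            n = int(token[1:])
            out.extend(prev[len(out):len(out) + n])
        else:
            out.append(token)
    return out


prev = ["0/0", "0/0", "0/1", "0/0", "0/0"]
row = ["0/0", "0/0", "0/0", "0/0", "1/1"]
encoded = encode_row(prev, row)
print(encoded)
print(decode_row(prev, encoded) == row)
```

In a large cohort most cells of a rare-variant row are reference calls, so long runs of repeats collapse into a handful of tokens, which is the intuition behind the size reduction the abstract reports.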
Affiliation(s)
- Xiaodong Bai
- Department of Regeneron Pharmaceuticals, Inc., Regeneron Genetics Center, Tarrytown, NY 10591, USA
- William J Salerno
- Department of Regeneron Pharmaceuticals, Inc., Regeneron Genetics Center, Tarrytown, NY 10591, USA
- Jeffrey G Reid
- Department of Regeneron Pharmaceuticals, Inc., Regeneron Genetics Center, Tarrytown, NY 10591, USA
5. Ranallo-Benavidez TR, Lemmon Z, Soyk S, Aganezov S, Salerno WJ, McCoy RC, Lippman ZB, Schatz MC, Sedlazeck FJ. Optimized sample selection for cost-efficient long-read population sequencing. Genome Res 2021; 31:910-918. [PMID: 33811084 PMCID: PMC8092009 DOI: 10.1101/gr.264879.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/06/2020] [Accepted: 03/30/2021] [Indexed: 11/24/2022]
Abstract
An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, human variation has historically focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. Addressing this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. To solve this optimization problem, SVCollector implements a fast, greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector on simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project and show the rankings it computes are more representative than alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.
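The greedy heuristic described in this abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not SVCollector's implementation: the function name `greedy_rank`, the dictionary input (sample name mapped to the set of variant IDs it carries), and the alphabetical tie-breaking are all hypothetical choices. At each step it picks the sample that adds the most variants not yet covered by earlier picks.

```python
from typing import Dict, List, Set, Tuple


def greedy_rank(carriers: Dict[str, Set[str]], k: int) -> List[Tuple[str, int]]:
    """Rank up to k samples by greedy marginal gain in distinct variants.

    Returns (sample, cumulative number of distinct variants covered) pairs.
    """
    covered: Set[str] = set()
    remaining = dict(carriers)
    ranking: List[Tuple[str, int]] = []
    while remaining and len(ranking) < k:
        # Iterate names in sorted order so ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        covered |= remaining.pop(best)
        ranking.append((best, len(covered)))
    return ranking


# Toy cohort: four samples carrying overlapping sets of variants.
cohort = {
    "A": {"v1", "v2", "v3"},
    "B": {"v3", "v4"},
    "C": {"v1", "v2"},
    "D": {"v5"},
}
print(greedy_rank(cohort, 3))  # [('A', 3), ('B', 4), ('D', 5)]
```

In this toy cohort, sample C is never chosen because every variant it carries is already covered by A, which is the behavior that keeps the selected subset from over-representing one group. The exact integer linear programming formulation mentioned in the abstract is not shown here.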
Affiliation(s)
- Zachary Lemmon
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Sebastian Soyk
- Center for Integrative Genomics, University of Lausanne, Lausanne 1005, Switzerland
- William J Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Rajiv C McCoy
- Johns Hopkins University, Baltimore, Maryland 21218, USA
- Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Michael C Schatz
- Johns Hopkins University, Baltimore, Maryland 21218, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
6. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, Benner C, Liu D, Locke AE, Balasubramanian S, Yadav A, Banerjee N, Gillies CE, Damask A, Liu S, Bai X, Hawes A, Maxwell E, Gurski L, Watanabe K, Kosmicki JA, Rajagopal V, Mighty J, Jones M, Mitnaul L, Stahl E, Coppola G, Jorgenson E, Habegger L, Salerno WJ, Shuldiner AR, Lotta LA, Overton JD, Cantor MN, Reid JG, Yancopoulos G, Kang HM, Marchini J, Baras A, Abecasis GR, Ferreira MAR. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 2021; 599:628-634. [PMID: 34662886 PMCID: PMC8596853 DOI: 10.1038/s41586-021-04103-z] [Citation(s) in RCA: 276] [Impact Index Per Article: 92.0] [Received: 07/09/2021] [Accepted: 10/06/2021] [Indexed: 12/19/2022]
Abstract
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing¹ to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study². We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Affiliation(s)
- Joshua D. Backman
- Regeneron Genetics Center, Tarrytown, NY, USA
- Alexander H. Li
- Regeneron Genetics Center, Tarrytown, NY, USA
- Anthony Marcketta
- Regeneron Genetics Center, Tarrytown, NY, USA
- Dylan Sun
- Regeneron Genetics Center, Tarrytown, NY, USA
- Joelle Mbatchou
- Regeneron Genetics Center, Tarrytown, NY, USA
- Michael D. Kessler
- Regeneron Genetics Center, Tarrytown, NY, USA
- Christian Benner
- Regeneron Genetics Center, Tarrytown, NY, USA
- Daren Liu
- Regeneron Genetics Center, Tarrytown, NY, USA
- Adam E. Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
- Ashish Yadav
- Regeneron Genetics Center, Tarrytown, NY, USA
- Nilanjana Banerjee
- Regeneron Genetics Center, Tarrytown, NY, USA
- Amy Damask
- Regeneron Genetics Center, Tarrytown, NY, USA
- Simon Liu
- Regeneron Genetics Center, Tarrytown, NY, USA
- Xiaodong Bai
- Regeneron Genetics Center, Tarrytown, NY, USA
- Alicia Hawes
- Regeneron Genetics Center, Tarrytown, NY, USA
- Evan Maxwell
- Regeneron Genetics Center, Tarrytown, NY, USA
- Lauren Gurski
- Regeneron Genetics Center, Tarrytown, NY, USA
- Kyoko Watanabe
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jack A. Kosmicki
- Regeneron Genetics Center, Tarrytown, NY, USA
- Veera Rajagopal
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jason Mighty
- Regeneron Genetics Center, Tarrytown, NY, USA
- Marcus Jones
- Regeneron Genetics Center, Tarrytown, NY, USA
- Lyndon Mitnaul
- Regeneron Genetics Center, Tarrytown, NY, USA
- Eli Stahl
- Regeneron Genetics Center, Tarrytown, NY, USA
- Giovanni Coppola
- Regeneron Genetics Center, Tarrytown, NY, USA
- Eric Jorgenson
- Regeneron Genetics Center, Tarrytown, NY, USA
- Lukas Habegger
- Regeneron Genetics Center, Tarrytown, NY, USA
- William J. Salerno
- Regeneron Genetics Center, Tarrytown, NY, USA
- Alan R. Shuldiner
- Regeneron Genetics Center, Tarrytown, NY, USA
- Luca A. Lotta
- Regeneron Genetics Center, Tarrytown, NY, USA
- John D. Overton
- Regeneron Genetics Center, Tarrytown, NY, USA
- Michael N. Cantor
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jeffrey G. Reid
- Regeneron Genetics Center, Tarrytown, NY, USA
- George Yancopoulos
- Regeneron Genetics Center, Tarrytown, NY, USA
- Hyun M. Kang
- Regeneron Genetics Center, Tarrytown, NY, USA
- Jonathan Marchini
- Regeneron Genetics Center, Tarrytown, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
- Gonçalo R. Abecasis
- Regeneron Genetics Center, Tarrytown, NY, USA
7. Zarate S, Carroll A, Mahmoud M, Krasheninina O, Jun G, Salerno WJ, Schatz MC, Boerwinkle E, Gibbs RA, Sedlazeck FJ. Parliament2: Accurate structural variant calling at scale. Gigascience 2020; 9:giaa145. [PMID: 33347570 PMCID: PMC7751401 DOI: 10.1093/gigascience/giaa145] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 04/03/2020] [Revised: 09/17/2020] [Accepted: 11/18/2020] [Indexed: 12/24/2022] [Open Access]
Abstract
BACKGROUND: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, which remain challenging to detect, especially at the scale of hundreds to thousands of samples. FINDINGS: We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. CONCLUSION: Parliament2 provides both a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
Affiliation(s)
- Samantha Zarate
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
- Andrew Carroll
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Medhat Mahmoud
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Olga Krasheninina
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Goo Jun
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
- William J Salerno
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Michael C Schatz
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
- Eric Boerwinkle
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
- Richard A Gibbs
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
8. Van Hout CV, Tachmazidou I, Backman JD, Hoffman JD, Liu D, Pandey AK, Gonzaga-Jauregui C, Khalid S, Ye B, Banerjee N, Li AH, O'Dushlaine C, Marcketta A, Staples J, Schurmann C, Hawes A, Maxwell E, Barnard L, Lopez A, Penn J, Habegger L, Blumenfeld AL, Bai X, O'Keeffe S, Yadav A, Praveen K, Jones M, Salerno WJ, Chung WK, Surakka I, Willer CJ, Hveem K, Leader JB, Carey DJ, Ledbetter DH, Cardon L, Yancopoulos GD, Economides A, Coppola G, Shuldiner AR, Balasubramanian S, Cantor M, Nelson MR, Whittaker J, Reid JG, Marchini J, Overton JD, Scott RA, Abecasis GR, Yerges-Armstrong L, Baras A. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 2020; 586:749-756. [PMID: 33087929 PMCID: PMC7759458 DOI: 10.1038/s41586-020-2853-0] [Citation(s) in RCA: 259] [Impact Index Per Article: 64.8] [Received: 03/14/2019] [Accepted: 08/25/2020] [Indexed: 12/12/2022]
Abstract
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world¹. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
Affiliation(s)
- Joshua D Hoffman
- GlaxoSmithKline, Collegeville, PA, USA
- Foresite Labs, Cambridge, MA, USA
- Daren Liu
- Regeneron Genetics Center, Tarrytown, NY, USA
- Bin Ye
- Regeneron Genetics Center, Tarrytown, NY, USA
- Claudia Schurmann
- Regeneron Genetics Center, Tarrytown, NY, USA
- Digital Health Center, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John Penn
- Regeneron Genetics Center, Tarrytown, NY, USA
- DNANexus, Mountain View, CA, USA
- Wendy K Chung
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY, USA
- Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Kristian Hveem
- Norwegian University of Science and Technology, Trondheim, Norway
- Matthew R Nelson
- GlaxoSmithKline, Collegeville, PA, USA
- Deerfield, New York, NY, USA
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
9. Bis JC, Jian X, Kunkle BW, Chen Y, Hamilton-Nelson KL, Bush WS, Salerno WJ, Lancour D, Ma Y, Renton AE, Marcora E, Farrell JJ, Zhao Y, Qu L, Ahmad S, Amin N, Amouyel P, Beecham GW, Below JE, Campion D, Cantwell L, Charbonnier C, Chung J, Crane PK, Cruchaga C, Cupples LA, Dartigues JF, Debette S, Deleuze JF, Fulton L, Gabriel SB, Genin E, Gibbs RA, Goate A, Grenier-Boley B, Gupta N, Haines JL, Havulinna AS, Helisalmi S, Hiltunen M, Howrigan DP, Ikram MA, Kaprio J, Konrad J, Kuzma A, Lander ES, Lathrop M, Lehtimäki T, Lin H, Mattila K, Mayeux R, Muzny DM, Nasser W, Neale B, Nho K, Nicolas G, Patel D, Pericak-Vance MA, Perola M, Psaty BM, Quenez O, Rajabli F, Redon R, Reitz C, Remes AM, Salomaa V, Sarnowski C, Schmidt H, Schmidt M, Schmidt R, Soininen H, Thornton TA, Tosto G, Tzourio C, van der Lee SJ, van Duijn CM, Valladares O, Vardarajan B, Wang LS, Wang W, Wijsman E, Wilson RK, Witten D, Worley KC, Zhang X, Bellenguez C, Lambert JC, Kurki MI, Palotie A, Daly M, Boerwinkle E, Lunetta KL, Destefano AL, Dupuis J, Martin ER, Schellenberg GD, Seshadri S, Naj AC, Fornage M, Farrer LA. Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry 2020; 25:1859-1875. [PMID: 30108311 PMCID: PMC6375806 DOI: 10.1038/s41380-018-0112-7] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 12/21/2017] [Revised: 05/01/2018] [Accepted: 05/14/2018] [Indexed: 12/21/2022]
Abstract
The Alzheimer's Disease Sequencing Project (ADSP) undertook whole exome sequencing in 5,740 late-onset Alzheimer disease (AD) cases and 5,096 cognitively normal controls primarily of European ancestry (EA), among whom 218 cases and 177 controls were Caribbean Hispanic (CH). An age-, sex- and APOE based risk score and family history were used to select cases most likely to harbor novel AD risk variants and controls least likely to develop AD by age 85 years. We tested ~1.5 million single nucleotide variants (SNVs) and 50,000 insertion-deletion polymorphisms (indels) for association to AD, using multiple models considering individual variants as well as gene-based tests aggregating rare, predicted functional, and loss of function variants. Sixteen single variants and 19 genes that met criteria for significant or suggestive associations after multiple-testing correction were evaluated for replication in four independent samples; three with whole exome sequencing (2,778 cases, 7,262 controls) and one with genome-wide genotyping imputed to the Haplotype Reference Consortium panel (9,343 cases, 11,527 controls). The top findings in the discovery sample were also followed-up in the ADSP whole-genome sequenced family-based dataset (197 members of 42 EA families and 501 members of 157 CH families). We identified novel and predicted functional genetic variants in genes previously associated with AD. We also detected associations in three novel genes: IGHG3 (p = 9.8 × 10⁻⁷), an immunoglobulin gene whose antibodies interact with β-amyloid, a long non-coding RNA AC099552.4 (p = 1.2 × 10⁻⁷), and a zinc-finger protein ZNF655 (gene-based p = 5.0 × 10⁻⁶). The latter two suggest an important role for transcriptional regulation in AD pathogenesis.
Affiliation(s)
- Joshua C Bis
- Department of Medicine (General Internal Medicine), University of Washington, Seattle, WA, USA
- Xueqiu Jian
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Brian W Kunkle
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Yuning Chen
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Kara L Hamilton-Nelson
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- William S Bush
- Case Western Reserve University, Cleveland Heights, OH, USA
- William J Salerno
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Daniel Lancour
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
- Yiyi Ma
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
- Alan E Renton
- Department of Neuroscience and Ronald M Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Edoardo Marcora
- Department of Neuroscience and Ronald M Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- John J Farrell
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
- Yi Zhao
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Liming Qu
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Shahzad Ahmad
- Erasmus University Medical Center, Rotterdam, Netherlands
- Najaf Amin
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Philippe Amouyel
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Institut Pasteur de Lille, Lille, France
- University Lille, U1167-Excellence Laboratory LabEx DISTALZ, Lille, France
- Gary W Beecham
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Jennifer E Below
- Department of Medical Genetics, Vanderbilt University Medical Center, Nashville, TN, USA
- Dominique Campion
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Department of Research, Centre Hospitalier du Rouvray, Sotteville-lès-Rouen, France
- Laura Cantwell
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Camille Charbonnier
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Jaeyoon Chung
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
- Paul K Crane
- Department of Medicine (General Internal Medicine), University of Washington, Seattle, WA, USA
- Carlos Cruchaga
- Department of Psychiatry, Washington University, St. Louis, MO, USA
- L Adrienne Cupples
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Jean-François Dartigues
- University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team VINTAGE, UMR 1219, F-33000, Bordeaux, France
- Stéphanie Debette
- University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team VINTAGE, UMR 1219, F-33000, Bordeaux, France
- Department of Neurology and Institute for Neurodegenerative Diseases, Bordeaux University Hospital, Memory Clinic, F-33000, Bordeaux, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine, Institut François Jacob, Direction de la Recherche Fondamentale, CEA, Evry, France
- Lucinda Fulton
- McDonnell Genome Institute, Washington University, St. Louis, MO, USA
- Richard A Gibbs
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Alison Goate
- Department of Neuroscience and Ronald M Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Benjamin Grenier-Boley
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Namrata Gupta
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aki S Havulinna
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- National Institute for Health and Welfare, Helsinki, Finland
- Seppo Helisalmi
- Institute of Clinical Medicine - Neurology and Department of Neurology, University of Eastern Finland, Kuopio, Finland
- Mikko Hiltunen
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
- Daniel P Howrigan
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- M Arfan Ikram
- Erasmus University Medical Center, Rotterdam, Netherlands
- Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Jan Konrad
- Department of Psychiatry, Washington University, St. Louis, MO, USA
- Amanda Kuzma
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Lathrop
- McGill University and Génome Québec Innovation Centre, Montréal, Canada
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
- Honghuang Lin
- Department of Medicine (Computational Biomedicine), Boston University School of Medicine, Boston, MA, USA
- Kari Mattila
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
- Donna M Muzny
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Waleed Nasser
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Benjamin Neale
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Kwangsik Nho
- Indiana University School of Medicine, Indianapolis, IN, USA
- Gaël Nicolas
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Devanshi Patel
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
- Margaret A Pericak-Vance
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Markus Perola
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- National Institute for Health and Welfare, Helsinki, Finland
- University of Tartu, Estonian Genome Center, Tartu, Estonia
- Bruce M Psaty
- Department of Medicine (General Internal Medicine), University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Olivier Quenez
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Farid Rajabli
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Richard Redon
- Inserm, CNRS, Univ. Nantes, CHU Nantes, l'institut du thorax, Nantes, France
- Anne M Remes
- Institute of Clinical Medicine - Neurology and Department of Neurology, University of Eastern Finland, Kuopio, Finland
- Unit of Clinical Neuroscience, Neurology, University of Oulu and Medical Research Center, Oulu University Hospital, Oulu, Finland
- Veikko Salomaa
- National Institute for Health and Welfare, Helsinki, Finland
- Chloe Sarnowski
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Helena Schmidt
- Department of Neurology, Clinical Division of Neurogeriatrics, Medical University of Graz, Graz, Austria
- Michael Schmidt
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Reinhold Schmidt
- Department of Neurology, Clinical Division of Neurogeriatrics, Medical University of Graz, Graz, Austria
- Hilkka Soininen
- Institute of Clinical Medicine - Neurology and Department of Neurology, University of Eastern Finland, Kuopio, Finland
- Department of Neurology, Kuopio University Hospital, Kuopio, Finland
- Christophe Tzourio
- University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team VINTAGE, UMR 1219, F-33000, Bordeaux, France
- Otto Valladares
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Li-San Wang
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Weixin Wang
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Ellen Wijsman
- Department of Medicine (Medical Genetics), University of Washington, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Richard K Wilson
- McDonnell Genome Institute, Washington University, St. Louis, MO, USA
- Daniela Witten
- Department of Statistics, University of Washington, Seattle, WA, USA
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Kim C Worley
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Xiaoling Zhang
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA
- Celine Bellenguez
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Jean-Charles Lambert
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Mitja I Kurki
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Aarno Palotie
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Mark Daly
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Eric Boerwinkle
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Kathryn L Lunetta
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Anita L Destefano
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Departments of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Josée Dupuis
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Eden R Martin
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
| | | | - Sudha Seshadri
- National Heart, Lung, and Blood Institute's Framingham Heart Study, Framingham, MA, USA
- Departments of Neurology, Boston University School of Medicine, Boston, MA, USA
- Glenn Biggs Institute for Alzheimer's and Neurodegenerative Diseases, University of Texas Health Sciences Center, San Antonio, TX, USA
| | - Adam C Naj
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Myriam Fornage
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Lindsay A Farrer
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA, USA.
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA, USA.
- Departments of Neurology, Boston University School of Medicine, Boston, MA, USA.
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA.
- Department of Ophthalmology, Boston University School of Medicine, Boston, MA, USA.
| |
|
10
|
Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C, Buyske S, Matise TC, Muzny DM, Zody MC, Lander ES, Dutcher SK, Stitziel NO, Hall IM. Mapping and characterization of structural variation in 17,795 human genomes. Nature 2020; 583:83-89. [PMID: 32460305 PMCID: PMC7547914 DOI: 10.1038/s41586-020-2371-0] [Citation(s) in RCA: 141] [Impact Index Per Article: 35.3] [Received: 12/29/2018] [Accepted: 05/18/2020] [Indexed: 12/18/2022]
Abstract
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline1 to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.
Affiliation(s)
- Haley J Abel
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
| | - David E Larson
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
| | - Colby Chiang
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
| | - Indraniel Das
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
| | - Krishna L Kanchi
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
| | - Ryan M Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
| | - Benjamin M Neale
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - William J Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Steven Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
| | - Tara C Matise
- Department of Genetics, Rutgers University, Piscataway, NJ, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Susan K Dutcher
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
| | - Nathan O Stitziel
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
| | - Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA.
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA.
- Department of Medicine, Washington University School of Medicine, St Louis, MO, USA.
| |
|
11
|
Bis JC, Jian X, Kunkle BW, Chen Y, Hamilton-Nelson KL, Bush WS, Salerno WJ, Lancour D, Ma Y, Renton AE, Marcora E, Farrell JJ, Zhao Y, Qu L, Ahmad S, Amin N, Amouyel P, Beecham GW, Below JE, Campion D, Cantwell L, Charbonnier C, Chung J, Crane PK, Cruchaga C, Cupples LA, Dartigues JF, Debette S, Deleuze JF, Fulton L, Gabriel SB, Genin E, Gibbs RA, Goate A, Grenier-Boley B, Gupta N, Haines JL, Havulinna AS, Helisalmi S, Hiltunen M, Howrigan DP, Ikram MA, Kaprio J, Konrad J, Kuzma A, Lander ES, Lathrop M, Lehtimäki T, Lin H, Mattila K, Mayeux R, Muzny DM, Nasser W, Neale B, Nho K, Nicolas G, Patel D, Pericak-Vance MA, Perola M, Psaty BM, Quenez O, Rajabli F, Redon R, Reitz C, Remes AM, Salomaa V, Sarnowski C, Schmidt H, Schmidt M, Schmidt R, Soininen H, Thornton TA, Tosto G, Tzourio C, van der Lee SJ, van Duijn CM, Valladares O, Vardarajan B, Wang LS, Wang W, Wijsman E, Wilson RK, Witten D, Worley KC, Zhang X, Bellenguez C, Lambert JC, Kurki MI, Palotie A, Daly M, Boerwinkle E, Lunetta KL, Destefano AL, Dupuis J, Martin ER, Schellenberg GD, Seshadri S, Naj AC, Fornage M, Farrer LA. Correction: Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry 2020; 25:1901-1903. [PMID: 31636380 PMCID: PMC7387240 DOI: 10.1038/s41380-019-0529-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/03/2022]
Abstract
A correction to this paper has been published and can be accessed via a link at the top of the paper.
Affiliation(s)
- Joshua C. Bis
- Department of Medicine (General Internal Medicine), University of Washington, Seattle, WA USA
| | - Xueqiu Jian
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX USA
| | - Brian W. Kunkle
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - Yuning Chen
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
| | - Kara L. Hamilton-Nelson
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - William S. Bush
- Case Western Reserve University, Cleveland Heights, OH USA
| | - William J. Salerno
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
| | - Daniel Lancour
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
| | - Yiyi Ma
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
| | - Alan E. Renton
- Department of Neuroscience and Ronald M Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Edoardo Marcora
- Department of Neuroscience and Ronald M Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - John J. Farrell
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
| | - Yi Zhao
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Liming Qu
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Shahzad Ahmad
- Erasmus University Medical Center, Rotterdam, Netherlands
| | - Najaf Amin
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
| | - Philippe Amouyel
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Institut Pasteur de Lille, Lille, France
- University Lille, U1167-Excellence Laboratory LabEx DISTALZ, Lille, France
| | - Gary W. Beecham
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - Jennifer E. Below
- Department of Medical Genetics, Vanderbilt University Medical Center, Nashville, TN USA
| | - Dominique Campion
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
- Department of Research, Centre Hospitalier du Rouvray, Sotteville-lès-Rouen, France
| | - Laura Cantwell
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Camille Charbonnier
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Jaeyoon Chung
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
| | - Paul K. Crane
- Department of Medicine (General Internal Medicine), University of Washington, Seattle, WA USA
| | - Carlos Cruchaga
- Department of Psychiatry, Washington University, St. Louis, MO USA
| | - L. Adrienne Cupples
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
- National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA USA
| | - Jean-François Dartigues
- University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team VINTAGE, UMR 1219, F-33000 Bordeaux, France
| | - Stéphanie Debette
- University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team VINTAGE, UMR 1219, F-33000 Bordeaux, France
- Department of Neurology and Institute for Neurodegenerative Diseases, Bordeaux University Hospital, Memory Clinic, F-33000 Bordeaux, France
| | - Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine, Institut François Jacob, Direction de la Recherche Fondamentale, CEA, Evry, France
| | - Lucinda Fulton
- McDonnell Genome Institute, Washington University, St. Louis, MO USA
| | | | | | - Richard A. Gibbs
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
| | - Alison Goate
- Department of Neuroscience and Ronald M Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY USA
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - Benjamin Grenier-Boley
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
| | - Namrata Gupta
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Jonathan L. Haines
- Case Western Reserve University, Cleveland Heights, OH USA
| | - Aki S. Havulinna
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- National Institute for Health and Welfare, Helsinki, Finland
| | - Seppo Helisalmi
- Institute of Clinical Medicine - Neurology and Department of Neurology, University of Eastern Finland, Kuopio, Finland
| | - Mikko Hiltunen
- Institute of Biomedicine, University of Eastern Finland, Kuopio, Finland
| | - Daniel P. Howrigan
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA USA
| | - M. Arfan Ikram
- Erasmus University Medical Center, Rotterdam, Netherlands
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Jan Konrad
- Department of Psychiatry, Washington University, St. Louis, MO USA
| | - Amanda Kuzma
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Eric S. Lander
- Broad Institute of MIT and Harvard, Cambridge, MA USA
| | - Mark Lathrop
- McGill University and Génome Québec Innovation Centre, Montréal, Canada
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
| | - Honghuang Lin
- Department of Medicine (Computational Biomedicine), Boston University School of Medicine, Boston, MA USA
| | - Kari Mattila
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center-Tampere, Faculty of Medicine and Life Sciences, University of Tampere, Tampere, Finland
| | - Richard Mayeux
- Columbia University, New York, NY USA
| | - Donna M. Muzny
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
| | - Waleed Nasser
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
| | - Benjamin Neale
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA USA
| | - Kwangsik Nho
- Indiana University School of Medicine, Indianapolis, IN USA
| | - Gaël Nicolas
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Devanshi Patel
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
| | - Margaret A. Pericak-Vance
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - Markus Perola
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- National Institute for Health and Welfare, Helsinki, Finland
- University of Tartu, Estonian Genome Center, Tartu, Estonia
| | - Bruce M. Psaty
- Department of Medicine (General Internal Medicine), University of Washington, Seattle, WA USA
- Department of Epidemiology, University of Washington, Seattle, WA USA
- Department of Health Services, University of Washington, Seattle, WA USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA USA
| | - Olivier Quenez
- Department of Genetics and CNR-MAJ, Normandie Université, UNIROUEN, Inserm U1245 and Rouen University Hospital, F 76000, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Farid Rajabli
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - Richard Redon
- Inserm, CNRS, Univ. Nantes, CHU Nantes, l’institut du thorax, Nantes, France
| | - Christiane Reitz
- Columbia University, New York, NY USA
| | - Anne M. Remes
- Institute of Clinical Medicine - Neurology and Department of Neurology, University of Eastern Finland, Kuopio, Finland
- Unit of Clinical Neuroscience, Neurology, University of Oulu and Medical Research Center, Oulu University Hospital, Oulu, Finland
| | - Veikko Salomaa
- National Institute for Health and Welfare, Helsinki, Finland
| | - Chloe Sarnowski
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
| | - Helena Schmidt
- Department of Neurology, Clinical Division of Neurogeriatrics, Medical University of Graz, Graz, Austria
| | - Michael Schmidt
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - Reinhold Schmidt
- Department of Neurology, Clinical Division of Neurogeriatrics, Medical University of Graz, Graz, Austria
| | - Hilkka Soininen
- Institute of Clinical Medicine - Neurology and Department of Neurology, University of Eastern Finland, Kuopio, Finland
- Department of Neurology, Kuopio University Hospital, Kuopio, Finland
| | - Timothy A. Thornton
- Department of Statistics, University of Washington, Seattle, WA USA
| | - Giuseppe Tosto
- Columbia University, New York, NY USA
| | - Christophe Tzourio
- University of Bordeaux, Inserm, Bordeaux Population Health Research Center, team VINTAGE, UMR 1219, F-33000 Bordeaux, France
| | - Sven J. van der Lee
- Erasmus University Medical Center, Rotterdam, Netherlands
| | - Cornelia M. van Duijn
- Erasmus University Medical Center, Rotterdam, Netherlands
| | - Otto Valladares
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Badri Vardarajan
- Columbia University, New York, NY USA
| | - Li-San Wang
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Weixin Wang
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Ellen Wijsman
- Department of Medicine (Medical Genetics), University of Washington, Seattle, WA USA
- Department of Biostatistics, University of Washington, Seattle, WA USA
| | - Richard K. Wilson
- McDonnell Genome Institute, Washington University, St. Louis, MO USA
| | - Daniela Witten
- Department of Statistics, University of Washington, Seattle, WA USA
- Department of Biostatistics, University of Washington, Seattle, WA USA
| | - Kim C. Worley
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
| | - Xiaoling Zhang
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
| | | | - Celine Bellenguez
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
| | - Jean-Charles Lambert
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
| | - Mitja I. Kurki
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA USA
| | - Aarno Palotie
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Program in Medical and Population Genetics and Genetic Analysis Platform, Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA USA
| | - Mark Daly
- Broad Institute of MIT and Harvard, Cambridge, MA USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA USA
| | - Eric Boerwinkle
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA
- School of Public Health, University of Texas Health Science Center at Houston, Houston, TX USA
| | - Kathryn L. Lunetta
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
| | - Anita L. Destefano
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
- Departments of Neurology, Boston University School of Medicine, Boston, MA USA
| | - Josée Dupuis
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
| | - Eden R. Martin
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL USA
| | - Gerard D. Schellenberg
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Sudha Seshadri
- National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA USA
- Departments of Neurology, Boston University School of Medicine, Boston, MA USA
- Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health Sciences Center, San Antonio, TX USA
| | - Adam C. Naj
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA USA
| | - Myriam Fornage
- Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX USA
- School of Public Health, University of Texas Health Science Center at Houston, Houston, TX USA
| | - Lindsay A. Farrer
- Departments of Biostatistics, Boston University School of Public Health, Boston, MA USA
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, MA USA
- Departments of Neurology, Boston University School of Medicine, Boston, MA USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA USA
- Department of Ophthalmology, Boston University School of Medicine, Boston, MA USA
| |
|
12
|
Naj AC, Lin H, Vardarajan BN, White S, Lancour D, Ma Y, Schmidt M, Sun F, Butkiewicz M, Bush WS, Kunkle BW, Malamon J, Amin N, Choi SH, Hamilton-Nelson KL, van der Lee SJ, Gupta N, Koboldt DC, Saad M, Wang B, Nato AQ, Sohi HK, Kuzma A, Wang LS, Cupples LA, van Duijn C, Seshadri S, Schellenberg GD, Boerwinkle E, Bis JC, Dupuis J, Salerno WJ, Wijsman EM, Martin ER, DeStefano AL. Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project. Genomics 2019; 111:808-818. [PMID: 29857119 PMCID: PMC6397097 DOI: 10.1016/j.ygeno.2018.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/29/2017] [Revised: 04/03/2018] [Accepted: 05/06/2018] [Indexed: 12/30/2022]
Abstract
The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
Affiliation(s)
- Adam C Naj
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Honghuang Lin
  - Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Badri N Vardarajan
  - Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Simon White
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Daniel Lancour
  - Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Yiyi Ma
  - Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Michael Schmidt
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Fangui Sun
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Mariusz Butkiewicz
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- William S Bush
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Brian W Kunkle
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- John Malamon
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Najaf Amin
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Seung Hoan Choi
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Kara L Hamilton-Nelson
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Sven J van der Lee
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Namrata Gupta
  - Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA
- Daniel C Koboldt
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Mohamad Saad
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Bowen Wang
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Harkirat K Sohi
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Amanda Kuzma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
- Cornelia van Duijn
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Sudha Seshadri
  - The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
- Gerard D Schellenberg
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Eric Boerwinkle
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Human Genetics Center, University of Texas Health Science Center, Houston, TX, USA
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ellen M Wijsman
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Eden R Martin
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Anita L DeStefano
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
|
13
|
Leung YY, Valladares O, Chou YF, Lin HJ, Kuzma AB, Cantwell L, Qu L, Gangadharan P, Salerno WJ, Schellenberg GD, Wang LS. VCPA: genomic variant calling pipeline and data management tool for Alzheimer's Disease Sequencing Project. Bioinformatics 2019; 35:1768-1770. [PMID: 30351394 PMCID: PMC6513159 DOI: 10.1093/bioinformatics/bty894] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 06/10/2018] [Revised: 09/27/2018] [Accepted: 10/22/2018] [Indexed: 12/30/2022] Open
Abstract
Summary: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. Availability and implementation: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yuk Yee Leung
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Otto Valladares
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Yi-Fan Chou
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Han-Jen Lin
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Amanda B Kuzma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Laura Cantwell
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Liming Qu
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Prabhakaran Gangadharan
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Gerard D Schellenberg
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
- Li-San Wang
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Penn Neurodegeneration Genomics Center, Philadelphia, PA, USA
|
14
|
Leung YY, Valladares O, Chou YF, Lin HJ, Kuzma AB, Cantwell L, Qu L, Gangadharan P, Salerno WJ, Schellenberg GD, Wang LS. VCPA: genomic variant calling pipeline and data management tool for Alzheimer's Disease Sequencing Project. Bioinformatics 2019; 35:1985. [PMID: 31004159 PMCID: PMC6546126 DOI: 10.1093/bioinformatics/btz216] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
|
15
|
Gangadharan P, Leung YY, Valladares O, Chou YF, Kuzma AB, Cantwell LB, Qu L, Lin HJ, Zhao Y, Malamon JS, Naj AC, Salerno WJ, Schellenberg GD, Wang LS. P4‐044: THE GCAD CLOUD‐BASED WORKFLOW FOR PROCESSING WHOLE EXOME AND WHOLE GENOME DATA FROM THE ALZHEIMER'S DISEASE SEQUENCING PROJECT. Alzheimers Dement 2018. [DOI: 10.1016/j.jalz.2018.06.2446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Affiliation(s)
- Otto Valladares
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Amanda B. Kuzma
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Liming Qu
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yi Zhao
  - University of Pennsylvania, Philadelphia, PA, USA
- Adam C. Naj
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Li-San Wang
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
|
16
|
Jian X, Chiang T, Worley KC, Bis JC, Destefano AL, Seshadri S, Boerwinkle E, Fornage M, Salerno WJ. O3‐06‐01: WHOLE EXOME SEQUENCING STUDY IDENTIFIES RARE COPY NUMBER VARIATIONS FOR LATE‐ONSET ALZHEIMER'S DISEASE: THE ALZHEIMER'S DISEASE SEQUENCING PROJECT CASE‐CONTROL ANALYSIS. Alzheimers Dement 2018. [DOI: 10.1016/j.jalz.2018.06.2800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- Xueqiu Jian
  - The University of Texas Health Science Center at Houston, Houston, TX, USA
- Sudha Seshadri
  - Boston University, Boston, MA, USA
  - The University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
- Eric Boerwinkle
  - The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Baylor College of Medicine, Houston, TX, USA
- Myriam Fornage
  - The University of Texas Health Science Center at Houston, Houston, TX, USA
|
17
|
Kuzma AB, Faber K, Salerno WJ, Leung YY, Cantwell LB, Gupta N, Fulton R, Valladares O, Vogel B, Appelbaum E, Choi SH, Hamilton-Nelson KL, Zhao Y, Muzny D, Qu L, Reyes-Dumeyer D, Waligorski J, Farrell J, Naj AC, Bis JC, Destefano AL, Seshadri S, Boerwinkle E, Schellenberg G, Foroud TM, Wang LS. P1‐149: THE ALZHEIMER'S DISEASE SEQUENCING PROJECT (ADSP) DATA UPDATE 2018. Alzheimers Dement 2018. [DOI: 10.1016/j.jalz.2018.06.152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- Amanda B. Kuzma
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Kelley Faber
  - Indiana University School of Medicine, Indianapolis, IN, USA
- Yuk Yee Leung
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Robert Fulton
  - The Genome Institute, Washington University, St. Louis, MO, USA
- Otto Valladares
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Briana Vogel
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yi Zhao
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Liming Qu
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Adam C. Naj
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Sudha Seshadri
  - University of Texas Health Sciences Center, San Antonio, TX, USA
- Eric Boerwinkle
  - University of Texas Health Science Center at Houston, Houston, TX, USA
|
18
|
Hampton OA, English AC, Wang M, Salerno WJ, Liu Y, Muzny DM, Han Y, Wheeler DA, Worley KC, Lupski JR, Gibbs RA. SVachra: a tool to identify genomic structural variation in mate pair sequencing data containing inward and outward facing reads. BMC Genomics 2017; 18:691. [PMID: 28984202 PMCID: PMC5629590 DOI: 10.1186/s12864-017-4021-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/03/2022] Open
Abstract
Background: Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired end sequencing has yielded a wealth of single nucleotide variants and internal sequencing read insertions-deletions, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has supplemented the void in SV detection, but introduced new analytic challenges requiring SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra – Structural Variation Assessment of CHRomosomal Aberrations, a breakpoint calling program that identifies large insertions-deletions, inversions, inter- and intra-chromosomal translocations utilizing both inward and outward facing read types generated by mate pair sequencing.
Results: We demonstrate SVachra’s utility by executing the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional data set of long-read (Pacific BioSciences RSII) was also generated to validate SV calls from SVachra and other comparison SV calling programs. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers. Conclusions: SVachra is a highly specific breakpoint calling program that exhibits a more unbiased SV detection methodology than other callers. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4021-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Oliver A Hampton
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Adam C English
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Mark Wang
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Yue Liu
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Donna M Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Yi Han
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- David A Wheeler
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- Kim C Worley
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
- James R Lupski
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Pediatrics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Texas Children's Hospital, 6621 Fanin Street, Houston, TX, 77030, USA
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA
|
19
|
English AC, Salerno WJ, Hampton OA, Gonzaga-Jauregui C, Ambreth S, Ritter DI, Beck CR, Davis CF, Dahdouli M, Ma S, Carroll A, Veeraraghavan N, Bruestle J, Drees B, Hastie A, Lam ET, White S, Mishra P, Wang M, Han Y, Zhang F, Stankiewicz P, Wheeler DA, Reid JG, Muzny DM, Rogers J, Sabo A, Worley KC, Lupski JR, Boerwinkle E, Gibbs RA. Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics 2015; 16:286. [PMID: 25886820 PMCID: PMC4490614 DOI: 10.1186/s12864-015-1479-3] [Citation(s) in RCA: 105] [Impact Index Per Article: 11.7] [Received: 09/24/2014] [Accepted: 03/23/2015] [Indexed: 01/19/2023] Open
Abstract
Background: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. Results: We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. Conclusions: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1479-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Adam C English
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Oliver A Hampton
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Claudia Gonzaga-Jauregui
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Shruthi Ambreth
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Deborah I Ritter
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Christine R Beck
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Caleb F Davis
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Mahmoud Dahdouli
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Singer Ma
  - DNAnexus, Mountain View, CA, 94040, USA.
- Becky Drees
  - Spiral Genetics Inc, Seattle, WA, 98117, USA.
- Alex Hastie
  - BioNano Genomics Inc, San Diego, CA, 92121, USA.
- Ernest T Lam
  - BioNano Genomics Inc, San Diego, CA, 92121, USA.
- Simon White
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Pamela Mishra
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Min Wang
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Yi Han
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Feng Zhang
  - Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China.
- Pawel Stankiewicz
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- David A Wheeler
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Jeffrey G Reid
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Donna M Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Jeffrey Rogers
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Aniko Sabo
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Kim C Worley
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- James R Lupski
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX, 77030, USA.
- Eric Boerwinkle
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Richard A Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
|
20
|
English AC, Salerno WJ, Reid JG. PBHoney: identifying genomic variants via long-read discordance and interrupted mapping. BMC Bioinformatics 2014; 15:180. [PMID: 24915764 PMCID: PMC4082283 DOI: 10.1186/1471-2105-15-180] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Received: 02/18/2014] [Accepted: 06/04/2014] [Indexed: 11/25/2022] Open
Abstract
Background: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads. Results: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. Conclusions: Implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney is demonstrated as being effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the existence of Rac-Phage in isolate; evidence of E. coli’s circular genome.
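As a rough illustration of what a "soft-clipped tail" is, the sketch below reads the clipped lengths at either end of an alignment from a SAM CIGAR string. It is not PBHoney's implementation; the function name and the 200 bp threshold are hypothetical choices made only for this example.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_tails(cigar):
    """Return (leading, trailing) soft-clip lengths for a CIGAR string."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside soft clips; ignore them.
    core = [item for item in ops if item[1] != "H"]
    if not core:
        return 0, 0
    leading = core[0][0] if core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing


def has_long_tail(cigar, min_tail=200):
    """Flag an alignment whose unaligned tail is at least min_tail bases (hypothetical cutoff)."""
    return max(soft_clipped_tails(cigar)) >= min_tail
```

A read flagged this way has a stretch that did not align with the rest of the read, which is the kind of interrupted mapping the abstract refers to.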
Affiliation(s)
- Adam C English
  - Human Genome Sequencing Center at Baylor College of Medicine, One Baylor Plaza, Houston 77030, Texas, USA.
|
21
|
Salerno WJ, Seaver SM, Armstrong BR, Radhakrishnan I. MONSTER: inferring non-covalent interactions in macromolecular structures from atomic coordinate data. Nucleic Acids Res 2004; 32:W566-8. [PMID: 15215451 PMCID: PMC441572 DOI: 10.1093/nar/gkh434] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
A web application for inferring potentially stabilizing non-bonding interactions in macromolecular structures from input atomic coordinate data is described. The core software, called Monster, comprises a PERL wrapper that takes advantage of scripts developed in-house as well as established software in the public domain to validate atomic coordinate files, identify interacting residues and assign the nature of these interactions. The results are assembled and presented in an intuitive and interactive graphical format. Potential applications of Monster range from mining and validating experimentally determined structures to guiding functional analysis. Non-commercial users can perform Monster analysis free of charge at http://monster.northwestern.edu.
Affiliation(s)
- William J Salerno
  - Department of Biochemistry, Molecular Biology and Cell Biology, Northwestern University, Evanston, IL 60208-3500, USA
|
22
|
Kang RS, Daniels CM, Francis SA, Shih SC, Salerno WJ, Hicke L, Radhakrishnan I. Solution structure of a CUE-ubiquitin complex reveals a conserved mode of ubiquitin binding. Cell 2003; 113:621-30. [PMID: 12787503 DOI: 10.1016/s0092-8674(03)00362-3] [Citation(s) in RCA: 192] [Impact Index Per Article: 9.1] [Indexed: 11/23/2022]
Abstract
Monoubiquitination serves as a regulatory signal in a variety of cellular processes. Monoubiquitin signals are transmitted by binding to a small but rapidly expanding class of ubiquitin binding motifs. Several of these motifs, including the CUE domain, also promote intramolecular monoubiquitination. The solution structure of a CUE domain of the yeast Cue2 protein in complex with ubiquitin reveals intermolecular interactions involving conserved hydrophobic surfaces, including the Leu8-Ile44-Val70 patch on ubiquitin. The contact surface extends beyond this patch and encompasses Lys48, a site of polyubiquitin chain formation. This suggests an occlusion mechanism for inhibiting polyubiquitin chain formation during monoubiquitin signaling. The CUE domain shares a similar overall architecture with the UBA domain, which also contains a conserved hydrophobic patch. Comparative modeling suggests that the UBA domain interacts analogously with ubiquitin. The structure of the CUE-ubiquitin complex may thus serve as a paradigm for ubiquitin recognition and signaling by ubiquitin binding proteins.
Affiliation(s)
- Richard S Kang
  - Department of Biochemistry, Molecular Biology, and Cell Biology, Northwestern University, Evanston, IL 60208, USA
|