1
Guo MH, Francioli LC, Stenton SL, Goodrich JK, Watts NA, Singer-Berk M, Groopman E, Darnowsky PW, Solomonson M, Baxter S, Tiao G, Neale BM, Hirschhorn JN, Rehm HL, Daly MJ, O'Donnell-Luria A, Karczewski KJ, MacArthur DG, Samocha KE. Inferring compound heterozygosity from large-scale exome sequencing data. Nat Genet 2024; 56:152-161. [PMID: 38057443 PMCID: PMC10872287 DOI: 10.1038/s41588-023-01608-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/24/2023] [Accepted: 11/08/2023] [Indexed: 12/08/2023]
Abstract
Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.
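The abstract does not describe the estimator itself. The code below is a minimal sketch of population-based phasing that assumes the standard two-locus expectation-maximization (EM) approach. It estimates the four haplotype frequencies for a variant pair from a 3×3 table of unphased genotype counts. It then converts those frequencies into the probability that a double heterozygote carries the two variants in trans. The function names, the count-table layout and the example counts are hypothetical. This is not the authors' released code, and the published method may differ in its details.

```python
from itertools import product

# Haplotypes as (allele at variant 1, allele at variant 2); 0 = ref, 1 = alt.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_haplotype_freqs(counts, max_iter=1000, tol=1e-12):
    """Estimate two-locus haplotype frequencies from unphased genotype counts.

    counts[g1][g2] is the (hypothetical) number of individuals carrying g1
    alternate alleles at variant 1 and g2 at variant 2, with g1, g2 in {0, 1, 2}.
    """
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform starting point
    for _ in range(max_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for g1, g2 in product(range(3), repeat=2):
            n = counts[g1][g2]
            if n == 0:
                continue
            # All ordered haplotype pairs consistent with this genotype.
            pairs = [(a, b) for a in HAPLOTYPES for b in HAPLOTYPES
                     if a[0] + b[0] == g1 and a[1] + b[1] == g2]
            weights = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            if total == 0:
                continue
            # E-step: split the n individuals across pairs by posterior weight.
            for (a, b), w in zip(pairs, weights):
                expected[a] += n * w / total
                expected[b] += n * w / total
        # M-step: renormalize expected haplotype counts into frequencies.
        n_haps = sum(expected.values())
        new = {h: expected[h] / n_haps for h in HAPLOTYPES}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


def prob_trans(freqs):
    """P(variants in trans | individual is heterozygous at both variants)."""
    cis = freqs[(0, 0)] * freqs[(1, 1)]    # ref-ref / alt-alt configuration
    trans = freqs[(0, 1)] * freqs[(1, 0)]  # ref-alt / alt-ref configuration
    return trans / (cis + trans)


# Hypothetical counts: rows = alt alleles at variant 1, columns = at variant 2.
repulsion = [[100, 100, 25], [100, 50, 0], [25, 0, 0]]
coupling = [[1000, 0, 0], [0, 10, 0], [0, 0, 1]]
print(prob_trans(em_haplotype_freqs(repulsion)))
print(prob_trans(em_haplotype_freqs(coupling)))
```

In the first hypothetical table, the two alternate alleles are never seen together in an unambiguous individual, so the estimate favors trans. In the second table, a double homozygote and an excess of double heterozygotes point to cis.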
Affiliation(s)
- Michael H Guo
- Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laurent C Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Sarah L Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Julia K Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emily Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Philip W Darnowsky
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Samantha Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Joel N Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Departments of Genetics and Pediatrics, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA
- Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA
- Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Kaitlin E Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
2
Wang D, Perera D, He J, Cao C, Kossinna P, Li Q, Zhang W, Guo X, Platt A, Wu J, Zhang Q. cLD: Rare-variant linkage disequilibrium between genomic regions identifies novel genomic interactions. PLoS Genet 2023; 19:e1011074. [PMID: 38109434 PMCID: PMC10758262 DOI: 10.1371/journal.pgen.1011074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 01/01/2024] [Accepted: 11/20/2023] [Indexed: 12/20/2023]
Abstract
Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are reliable only for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures rare-variant LD between genetic regions, reflecting biological interactions between variants in addition to lack of recombination. We derived the theoretical variance of cLD using the delta method to demonstrate that it is more stable than LD for rare variants, and we verified this property with bootstrapped simulations using real data. In application, we find that cLD reveals increased genetic association between genes in 3D chromatin interactions, a phenomenon that a recent study did not detect when calculating standard LD between common variants. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.
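For context, the standard pairwise statistics that cLD is contrasted with can be computed from phased haplotypes as D = p_AB - p_A*p_B and r^2 = D^2 / (p_A(1 - p_A) p_B(1 - p_B)). The sketch below illustrates only that baseline, not cLD itself, whose exact definition is given in the paper. The function name and the 0/1 haplotype encoding are assumptions. With two singleton variants, r^2 swings between about 1 and almost 0 depending on a single observation, which illustrates why single-variant LD is noisy for rare variants.

```python
def pairwise_ld(hap_a, hap_b):
    """Return (D, r^2) for two biallelic variants from phased haplotypes.

    hap_a[i] and hap_b[i] are the alleles (0 = ref, 1 = alt) that haplotype i
    carries at variants A and B.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying the alternate allele at both variants.
    p_ab = sum(1 for x, y in zip(hap_a, hap_b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return d, d * d / denom


# Two singletons among 100 haplotypes: r^2 hinges on one observation.
on_hap0 = [1] + [0] * 99
on_hap1 = [0, 1] + [0] * 98
print(pairwise_ld(on_hap0, on_hap0))  # both alt alleles on haplotype 0
print(pairwise_ld(on_hap0, on_hap1))  # alt alleles on different haplotypes
```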
Affiliation(s)
- Dinghao Wang
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Deshan Perera
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Jingni He
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Chen Cao
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Pathum Kossinna
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Qing Li
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- William Zhang
- The Harker School, San Jose, California, United States of America
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Alexander Platt
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Qingrun Zhang
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
3
Guo MH, Francioli LC, Stenton SL, Goodrich JK, Watts NA, Singer-Berk M, Groopman E, Darnowsky PW, Solomonson M, Baxter S, gnomAD Project Consortium, Tiao G, Neale BM, Hirschhorn JN, Rehm HL, Daly MJ, O’Donnell-Luria A, Karczewski KJ, MacArthur DG, Samocha KE. Inferring compound heterozygosity from large-scale exome sequencing data. bioRxiv: the preprint server for biology 2023:2023.03.19.533370. [PMID: 36993580 PMCID: PMC10055215 DOI: 10.1101/2023.03.19.533370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023]
Abstract
Recessive diseases arise when both the maternal and the paternal copies of a gene are impacted by a damaging genetic variant in the affected individual. When a patient carries two different potentially causal variants in a gene for a given disorder, accurate diagnosis requires determining that these two variants occur on different copies of the chromosome (i.e., are in trans) rather than on the same copy (i.e., are in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. We developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in exome sequencing data from the Genome Aggregation Database (gnomAD v2, n=125,748). When applied to trio data where phase can be determined by transmission, our approach estimates phase with 95.7% accuracy and remains accurate even for very rare variants (allele frequency < 1×10⁻⁴). We also correctly phase 95.9% of variant pairs in a set of 293 patients with Mendelian conditions carrying presumed causal compound heterozygous variants. We provide a public resource of phasing estimates from gnomAD, including phasing estimates for coding variants across the genome and counts per gene of rare variants in trans, that can aid interpretation of rare co-occurring variants in the context of recessive disease.
Affiliation(s)
- Michael H. Guo
- Department of Neurology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laurent C. Francioli
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Sarah L. Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Julia K. Goodrich
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Nicholas A. Watts
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emily Groopman
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Philip W. Darnowsky
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Matthew Solomonson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Samantha Baxter
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Grace Tiao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Joel N. Hirschhorn
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Division of Endocrinology, Boston Children’s Hospital, Boston, MA, USA
- Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, MA, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Mark J. Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
- Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Konrad J. Karczewski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Daniel G. MacArthur
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Centre for Population Genomics, Garvan Institute of Medical Research, UNSW Sydney, Sydney, Australia
- Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Kaitlin E. Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
4
Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools. Methods in Molecular Biology (Clifton, N.J.) 2022; 2467:113-138. [PMID: 35451774 DOI: 10.1007/978-1-0716-2205-6_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
Imputation has become standard practice in modern genetic research to increase genome coverage and to improve the accuracy of genomic selection and genome-wide association studies: a large number of samples can be genotyped at lower density (and lower cost) and then imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of imputation software tools were developed originally for human genetics and, more recently, for animal and plant genetics, taking into account pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. Compared with human populations, the population structures of farmed species and their limited effective sizes make it possible to accurately impute high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, imputation accuracy, measured by the correct imputation rate or by the correlation between true and imputed genotypes, increases with the relatedness of the individual to be imputed to its more densely genotyped ancestors and with its own genotype density. Higher imputation accuracy in turn raises genomic selection accuracy, whatever the genomic evaluation method. Given the marker densities, the most important factors affecting imputation accuracy are clearly the size of the reference population and the relationship between individuals in the reference and target populations.
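The abstract names two accuracy measures: the correct imputation rate and the correlation between true and imputed genotypes. The sketch below computes both, assuming genotypes are coded as alternate-allele counts (0, 1, 2). For the correlation, it assumes imputed values may be fractional dosages. The function names and example values are hypothetical.

```python
import math


def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotypes (coded 0/1/2) that were imputed exactly right."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and equal length")
    matches = sum(1 for t, i in zip(true_gt, imputed_gt) if t == i)
    return matches / len(true_gt)


def genotype_correlation(true_gt, imputed_dosage):
    """Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_gt)
    if n != len(imputed_dosage) or n < 2:
        raise ValueError("need at least two paired values")
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_dosage) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_dosage))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosage)
    if var_t == 0 or var_i == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_i)


true_gt = [0, 1, 2, 1, 0]
imputed = [0, 1, 2, 2, 0]  # one heterozygote imputed as homozygous alt
print(concordance_rate(true_gt, imputed))
print(genotype_correlation(true_gt, imputed))
```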
5
Tan KT, Kim H, Carrot-Zhang J, Zhang Y, Kim WJ, Kugener G, Wala JA, Howard TP, Chi YY, Beroukhim R, Li H, Ha G, Alper SL, Perlman EJ, Mullen EA, Hahn WC, Meyerson M, Hong AL. Haplotype-resolved germline and somatic alterations in renal medullary carcinomas. Genome Med 2021; 13:114. [PMID: 34261517 PMCID: PMC8281718 DOI: 10.1186/s13073-021-00929-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 06/25/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Renal medullary carcinomas (RMCs) are rare kidney cancers that occur in adolescents and young adults of African ancestry. Although RMC is associated with the sickle cell trait and somatic loss of the tumor suppressor, SMARCB1, the ancestral origins of RMC remain unknown. Further, characterization of structural variants (SVs) involving SMARCB1 in RMC remains limited. METHODS We used linked-read genome sequencing to reconstruct germline and somatic haplotypes in 15 unrelated patients with RMC registered on the Children's Oncology Group (COG) AREN03B2 study between 2006 and 2017 or from our prior study. We performed fine-mapping of the HBB locus and assessed the germline for cancer predisposition genes. Subsequently, we assessed the tumor samples for mutations outside of SMARCB1 and integrated RNA sequencing to interrogate the structural variants at the SMARCB1 locus. RESULTS We find that the haplotype of the sickle cell mutation in patients with RMC originated from three geographical regions in Africa. In addition, fine-mapping of the HBB locus identified the sickle cell mutation as the sole candidate variant. We further identify that the SMARCB1 structural variants are characterized by blunt or 1-bp homology events. CONCLUSIONS Our findings suggest that RMC does not arise from a single founder population and that the HbS allele is a strong candidate germline allele which confers risk for RMC. Furthermore, we find that the SVs that disrupt SMARCB1 function are likely repaired by non-homologous end-joining. These findings highlight how haplotype-based analyses using linked-read genome sequencing can be applied to identify potential risk variants in small and rare disease cohorts and provide nucleotide resolution to structural variants.
Affiliation(s)
- Kar-Tong Tan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Hyunji Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Jian Carrot-Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Yuxiang Zhang
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Won Jun Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jeremiah A Wala
- Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Thomas P Howard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yueh-Yun Chi
- Department of Pediatrics, University of Southern California, Los Angeles, CA, USA
- Rameen Beroukhim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Heng Li
- Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gavin Ha
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Seth L Alper
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Elizabeth A Mullen
- Department of Hematology and Oncology, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- William C Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Andrew L Hong
- Department of Pediatrics, Emory University, Atlanta, GA, USA
- Aflac Center for Cancer and Blood Disorders, Children's Healthcare of Atlanta, Atlanta, GA, USA
6
Charon C, Allodji R, Meyer V, Deleuze JF. Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023]
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the ranges of very rare (5E-04 to 1E-03) and rare variants (1E-03 to 5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence while retaining enough SNVs, we propose a two-step filtering procedure that allows less stringent filtering both before and after imputation, increasing the number of very rare and rare variants compared with conservative filtration methods.
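The sketch below makes the post-imputation threshold comparison concrete. It counts the variants that pass a given imputation quality threshold within the frequency ranges quoted in the abstract. It assumes each variant is summarized as an (allele frequency, quality score) pair and treats bin edges as lower-inclusive and upper-exclusive. The function names and data are hypothetical. The sketch does not reproduce the authors' pipeline or any particular imputation tool's score.

```python
# Frequency ranges quoted in the abstract (lower bound inclusive, upper exclusive).
BINS = {
    "very_rare": (5e-4, 1e-3),
    "rare": (1e-3, 5e-3),
}


def frequency_bin(freq):
    """Return the label of the bin containing freq, or 'other'."""
    for label, (low, high) in BINS.items():
        if low <= freq < high:
            return label
    return "other"


def count_passing(variants, min_quality):
    """Count variants per frequency bin whose quality score >= min_quality.

    variants is an iterable of (allele_frequency, imputation_quality) pairs.
    """
    counts = {label: 0 for label in (*BINS, "other")}
    for freq, quality in variants:
        if quality >= min_quality:
            counts[frequency_bin(freq)] += 1
    return counts


# Hypothetical imputed variants: (allele frequency, quality score).
variants = [(0.0007, 0.9), (0.0008, 0.5), (0.002, 0.85), (0.003, 0.2), (0.05, 0.95)]
print(count_passing(variants, 0.3))  # lenient post-imputation threshold
print(count_passing(variants, 0.8))  # stringent post-imputation threshold
```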
Affiliation(s)
- Céline Charon
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Rodrigue Allodji
- Radiation Epidemiology Group CESP, Inserm Unit 1018, Gustave Roussy Université Paris Saclay, 114 rue Edouard Vaillant, Villejuif, 94805, France
- Vincent Meyer
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Jean-François Deleuze
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
7
Delaneau O, Zagury JF, Robinson MR, Marchini JL, Dermitzakis ET. Accurate, scalable and integrative haplotype estimation. Nat Commun 2019; 10:5436. [PMID: 31780650 PMCID: PMC6882857 DOI: 10.1038/s41467-019-13225-y] [Citation(s) in RCA: 327] [Impact Index Per Article: 54.5] [Received: 03/15/2019] [Accepted: 10/10/2019] [Indexed: 01/28/2023]
Abstract
The number of human genomes being genotyped or sequenced increases exponentially and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods to process large genotype and high coverage sequencing datasets. It notably exhibits sub-linear running times with sample size, provides highly accurate haplotypes and allows integrating external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 in an open source format and demonstrate its performance in terms of accuracy and running times on two gold standard datasets: the UK Biobank data and the Genome In A Bottle.
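The abstract reports accuracy on gold-standard datasets without naming the metric. One widely used measure of phasing accuracy is the switch error rate: the fraction of adjacent heterozygous sites at which the inferred haplotypes flip orientation relative to the truth. The sketch below assumes that definition. It is not taken from SHAPEIT4's code or evaluation scripts, and the input encoding is hypothetical.

```python
def switch_error_rate(true_phase, inferred_phase):
    """Switch error rate across consecutive heterozygous sites.

    Each list gives, for every heterozygous site in order, the allele (0 or 1)
    placed on haplotype 1. Haplotype labels are arbitrary, so only changes in
    the relative orientation between adjacent sites count as switches.
    """
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phase vectors must have equal length")
    if len(true_phase) < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where the inferred haplotype 1 agrees with the true haplotype 1.
    orientation = [t == i for t, i in zip(true_phase, inferred_phase)]
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)


truth = [0, 1, 1, 0, 1]
inferred = [0, 1, 0, 1, 0]  # orientation flips once, after the second site
print(switch_error_rate(truth, inferred))
```

A fully flipped haplotype pair has a switch error rate of zero, because swapping the two haplotype labels is not a phasing error.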
Affiliation(s)
- Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Génopode, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), University of Lausanne, Quartier Sorge - Batiment Amphipole, 1015, Lausanne, Switzerland
- Jean-François Zagury
- Chaire de Bioinformatique, Laboratoire GBCM (EA7528), Conservatoire National des Arts et Métiers, HESAM Université, Paris, France
- Matthew R Robinson
- Department of Computational Biology, University of Lausanne, Génopode, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), University of Lausanne, Quartier Sorge - Batiment Amphipole, 1015, Lausanne, Switzerland
- Jonathan L Marchini
- Department of Statistics, University of Oxford, 24-29 St. Giles, Oxford, OX1 3LB, UK
- Emmanouil T Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1 rue Michel-Servet, 1211, Geneva, Switzerland
- Swiss Institute of Bioinformatics (SIB), University of Geneva, 1 rue Michel-Servet, 1211, Geneva, Switzerland
- Institute of Genetics and Genomics in Geneva, University of Geneva Medical School, 1 rue Michel-Servet, 1211, Geneva, Switzerland
8
Danecek P, McCarthy SA. BCFtools/csq: haplotype-aware variant consequences. Bioinformatics 2017; 33:2037-2039. [PMID: 28205675 PMCID: PMC5870570 DOI: 10.1093/bioinformatics/btx100] [Citation(s) in RCA: 252] [Impact Index Per Article: 36.0] [Received: 12/01/2016] [Accepted: 02/14/2017] [Indexed: 02/06/2023]
Abstract
Motivation Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow-up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues. Results BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory. Availability and Implementation The program is freely available for commercial and non-commercial use in the BCFtools package which is available for download from http://samtools.github.io/bcftools. Supplementary information Supplementary data are available at Bioinformatics online.
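The codon-level problem described above can be shown with a toy example. The Python sketch below is not BCFtools code and does not mimic its output format. Using the standard genetic code, it applies two hypothetical SNVs in the same codon either separately or together on one haplotype. Called in isolation, one SNV looks like a stop gain. In cis, the pair yields a single missense change, which is the kind of discrepancy that haplotype-aware consequence calling is meant to resolve. The consequence labels are simplified.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def apply_snvs(codon, snvs):
    """Apply (position, ref, alt) substitutions that lie on one haplotype."""
    bases = list(codon)
    for pos, ref, alt in snvs:
        if bases[pos] != ref:
            raise ValueError(f"reference mismatch at codon position {pos}")
        bases[pos] = alt
    return "".join(bases)


def consequence(codon, snvs):
    """Simplified consequence label for SNVs applied together to one codon."""
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[apply_snvs(codon, snvs)]
    if alt_aa == ref_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return f"missense {ref_aa}>{alt_aa}"


snv1 = (0, "C", "T")  # CAG (Gln) -> TAG when applied alone
snv2 = (2, "G", "C")  # CAG (Gln) -> CAC when applied alone
print(consequence("CAG", [snv1]))        # stop_gained
print(consequence("CAG", [snv2]))        # missense Q>H
print(consequence("CAG", [snv1, snv2]))  # missense Q>Y (both in cis)
```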
Affiliation(s)
- Petr Danecek
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Shane A McCarthy
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
9
Gershon ES, Pearlson G, Keshavan MS, Tamminga C, Clementz B, Buckley PF, Alliey-Rodriguez N, Liu C, Sweeney JA, Keedy S, Meda SA, Tandon N, Shafee R, Bishop JR, Ivleva EI. Genetic analysis of deep phenotyping projects in common disorders. Schizophr Res 2018; 195:51-57. [PMID: 29056493 PMCID: PMC5910299 DOI: 10.1016/j.schres.2017.09.031] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/02/2017] [Revised: 09/19/2017] [Accepted: 09/22/2017] [Indexed: 11/19/2022]
Abstract
Several studies of complex psychotic disorders with large numbers of neurobiological phenotypes are currently under way, in living patients and controls, and on assemblies of brain specimens. Genetic analyses of such data typically present challenges because of the choice of underlying hypotheses on the genetic architecture of the studied disorders and phenotypes, large numbers of phenotypes, the appropriate multiple testing corrections, limited numbers of subjects, imputations required on missing phenotypes and genotypes, and the cross-disciplinary nature of the phenotype measures. Advances in genotype and phenotype imputation, and in genome-wide association (GWAS) methods, are useful in dealing with these challenges. As compared with the more traditional single-trait analyses, deep phenotyping with simultaneous genome-wide analyses serves as a discovery tool for previously unsuspected relationships of phenotypic traits with each other, and with specific molecular involvements.
Affiliation(s)
- Elliot S Gershon
- Department of Psychiatry, Department of Human Genetics, University of Chicago, United States.
- Godfrey Pearlson
- Yale University Departments of Psychiatry & Neuroscience, Hartford, CT, United States; Olin Neuropsychiatry Research Center, Institute of Living, Hartford, Connecticut, USA
- Carol Tamminga
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, United States
- Brett Clementz
- Department of Psychology, University of Georgia, Athens, GA, United States
- Peter F Buckley
- School of Medicine, Virginia Commonwealth University (VCU), Richmond, VA, United States
- Ney Alliey-Rodriguez
- University of Chicago, Department of Psychiatry and Behavioral Neurosciences, Chicago, IL, United States
- Chunyu Liu
- University of Illinois at Chicago, Chicago, IL, United States
- John A Sweeney
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, United States; University of Cincinnati, Department of Psychiatry and Behavioral Neuroscience, Cincinnati, OH, United States
- Sarah Keedy
- University of Chicago, Department of Psychiatry and Behavioral Neurosciences, Chicago, IL, United States
- Shashwath A Meda
- Yale University Departments of Psychiatry & Neuroscience, Hartford, CT, United States
- Neeraj Tandon
- Beth Israel Deaconess Medical Center, Dept of Psychiatry, Harvard Medical School, United States
- Rebecca Shafee
- Broad Institute of MIT and Harvard, Cambridge, MA, United States; Department of Genetics, Harvard Medical School, United States
- Jeffrey R Bishop
- Department of Clinical and Experimental Pharmacology, University of Minnesota, Minneapolis, MN, United States
- Elena I Ivleva
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, United States
10
Herzig AF, Nutile T, Babron MC, Ciullo M, Bellenguez C, Leutenegger AL. Strategies for phasing and imputation in a population isolate. Genet Epidemiol 2018; 42:201-213. [PMID: 29319195 DOI: 10.1002/gepi.22109] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/26/2017] [Revised: 11/16/2017] [Accepted: 11/16/2017] [Indexed: 11/05/2022]
Abstract
In the search for genetic associations with complex traits, population isolates offer the advantage of reduced genetic and environmental heterogeneity. In addition, cost-efficient next-generation association approaches have been proposed in these populations where only a subsample of representative individuals is sequenced and then genotypes are imputed into the rest of the population. Gene mapping in such populations thus requires high-quality genetic imputation and preliminary phasing. To identify an effective study design, we compare by simulation a range of phasing and imputation software and strategies. We simulated 1,115,604 variants on chromosome 10 for 477 members of the large complex pedigree of Campora, a village within the established isolate of Cilento in southern Italy. We assessed the phasing performance of identity-by-descent-based software ALPHAPHASE and SLRP, LD-based software SHAPEIT2, SHAPEIT3, and BEAGLE, and new software EAGLE that combines both methodologies. For imputation we compared IMPUTE2, IMPUTE4, MINIMAC3, BEAGLE, and new software PBWT. Genotyping errors and missing genotypes were simulated to observe their effects on the performance of each software. Highly accurate phased data were achieved by all software, with SHAPEIT2, SHAPEIT3, and EAGLE2 providing the most accurate results. MINIMAC3, IMPUTE4, and IMPUTE2 all performed strongly as imputation software, and our study highlights the considerable gain in imputation accuracy provided by a genome-sequenced reference panel specific to the population isolate.
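Because the study is a simulation, the true genotypes are known and every imputed dosage can be scored directly. The abstract does not name its accuracy metric; one common choice, assumed here purely for illustration, is the squared Pearson correlation (r²) between the true allele counts (0/1/2) and the imputed dosages at a variant. A minimal sketch:

```python
import math
from typing import Sequence


def dosage_r2(true_genotypes: Sequence[float], imputed_dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true allele counts and imputed dosages.

    Returns NaN when either vector has zero variance (e.g. a monomorphic
    variant), because r^2 is undefined there.
    """
    if len(true_genotypes) != len(imputed_dosages) or len(true_genotypes) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    n = len(true_genotypes)
    mean_x = sum(true_genotypes) / n
    mean_y = sum(imputed_dosages) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_genotypes, imputed_dosages))
    var_x = sum((x - mean_x) ** 2 for x in true_genotypes)
    var_y = sum((y - mean_y) ** 2 for y in imputed_dosages)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov * cov / (var_x * var_y)


if __name__ == "__main__":
    truth = [0, 1, 2, 1]             # hypothetical true genotypes at one variant
    imputed = [0.1, 0.9, 1.8, 1.2]   # hypothetical imputed dosages
    print(f"r^2 = {dosage_r2(truth, imputed):.3f}")
```

Note that r² is unchanged by a linear rescaling or constant shift of the dosages, so it measures how well imputation tracks the truth rather than calibration; concordance of hard genotype calls is often reported alongside it.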
Affiliation(s)
- Anthony Francis Herzig
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France; Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Teresa Nutile
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- Marie-Claude Babron
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France; Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Marina Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy; IRCCS Neuromed, Pozzilli, Isernia, Italy
- Céline Bellenguez
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France; Institut Pasteur de Lille, Lille, France; Université de Lille, U1167-Excellence Laboratory LabEx DISTALZ, Lille, France
- Anne-Louise Leutenegger
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France; Inserm, U946, Genetic Variation and Human Diseases, Paris, France
11
Aleman F. The Necessity of Diploid Genome Sequencing to Unravel the Genetic Component of Complex Phenotypes. Front Genet 2017; 8:148. [PMID: 29075286 PMCID: PMC5641544 DOI: 10.3389/fgene.2017.00148] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/18/2017] [Accepted: 09/27/2017] [Indexed: 01/23/2023]
12
Loh PR, Danecek P, Palamara PF, Fuchsberger C, Reshef YA, Finucane HK, Schoenherr S, Forer L, McCarthy S, Abecasis GR, Durbin R, Price AL. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 2016; 48:1443-1448. [PMID: 27694958 PMCID: PMC5096458 DOI: 10.1038/ng.3679] [Citation(s) in RCA: 1144] [Impact Index Per Article: 127.1] [Received: 05/06/2016] [Accepted: 08/29/2016] [Indexed: 12/17/2022]
Abstract
Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing in a genotyped cohort, an approach that can yield high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium; HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ∼20× speedup and ∼10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2× the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
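The abstract describes Eagle2's data structure only as "based on the positional Burrows-Wheeler transform". For orientation, the sketch below shows the basic PBWT prefix-array construction as introduced by Durbin (2014), not Eagle2's own structure: for each site, haplotypes are ordered by their reversed prefixes, so haplotypes sharing a long match ending just before that site become neighbours. Each step is a stable partition by the allele at the current site. It assumes biallelic haplotypes encoded as equal-length strings of '0'/'1'.

```python
from typing import List


def pbwt_prefix_arrays(haplotypes: List[str]) -> List[List[int]]:
    """Return PBWT prefix arrays a_0 .. a_N for N sites.

    a_k lists haplotype indices sorted by their reversed prefixes over
    sites k-1, k-2, ..., 0 (ties keep index order).
    """
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    order = list(range(len(haplotypes)))
    arrays = [order]
    for k in range(n_sites):
        # Stable partition of the current order by the allele at site k.
        zeros = [h for h in order if haplotypes[h][k] == "0"]
        ones = [h for h in order if haplotypes[h][k] == "1"]
        order = zeros + ones
        arrays.append(order)
    return arrays


if __name__ == "__main__":
    haps = ["010", "110", "001", "101"]   # hypothetical haplotypes, 3 sites
    for k, a in enumerate(pbwt_prefix_arrays(haps)):
        print(k, a)
    # 0 [0, 1, 2, 3]
    # 1 [0, 2, 1, 3]
    # 2 [2, 3, 0, 1]
    # 3 [0, 1, 2, 3]
```

This construction takes time proportional to the number of haplotypes times the number of sites; the speed and accuracy reported for Eagle2 come from its own algorithms, which this sketch does not attempt to reproduce.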
Affiliation(s)
- Po-Ru Loh
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Petr Danecek
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Pier Francesco Palamara
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Christian Fuchsberger
- Center for Biomedicine, European Academy of Bozen/Bolzano (EURAC), affiliated with the University of Lübeck, Bolzano, Italy
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Yakir A Reshef
- Department of Computer Science, Harvard University, Cambridge, Massachusetts, USA
- Hilary K Finucane
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Sebastian Schoenherr
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Lukas Forer
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Shane McCarthy
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Goncalo R Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Richard Durbin
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
- Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
13
McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, Luo Y, Sidore C, Kwong A, Timpson N, Koskinen S, Vrieze S, Scott LJ, Zhang H, Mahajan A, Veldink J, Peters U, Pato C, van Duijn CM, Gillies CE, Gandin I, Mezzavilla M, Gilly A, Cocca M, Traglia M, Angius A, Barrett JC, Boomsma D, Branham K, Breen G, Brummett CM, Busonero F, Campbell H, Chan A, Chen S, Chew E, Collins FS, Corbin LJ, Smith GD, Dedoussis G, Dorr M, Farmaki AE, Ferrucci L, Forer L, Fraser RM, Gabriel S, Levy S, Groop L, Harrison T, Hattersley A, Holmen OL, Hveem K, Kretzler M, Lee JC, McGue M, Meitinger T, Melzer D, Min JL, Mohlke KL, Vincent JB, Nauck M, Nickerson D, Palotie A, Pato M, Pirastu N, McInnis M, Richards JB, Sala C, Salomaa V, Schlessinger D, Schoenherr S, Slagboom PE, Small K, Spector T, Stambolian D, Tuke M, Tuomilehto J, Van den Berg LH, Van Rheenen W, Volker U, Wijmenga C, Toniolo D, Zeggini E, Gasparini P, Sampson MG, Wilson JF, Frayling T, de Bakker PIW, Swertz MA, McCarroll S, Kooperberg C, Dekker A, Altshuler D, Willer C, Iacono W, Ripatti S, Soranzo N, Walter K, Swaroop A, Cucca F, Anderson CA, Myers RM, Boehnke M, McCarthy MI, Durbin R. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 2016; 48:1279-83. [PMID: 27548312 PMCID: PMC5388176 DOI: 10.1038/ng.3643] [Citation(s) in RCA: 2002] [Impact Index Per Article: 222.4] [Received: 12/21/2015] [Accepted: 07/18/2016] [Indexed: 12/13/2022]
Abstract
We describe a reference panel of 64,976 human haplotypes at 39,235,157 SNPs constructed using whole-genome sequence data from 20 studies of predominantly European ancestry. Using this resource leads to accurate genotype imputation at minor allele frequencies as low as 0.1% and a large increase in the number of SNPs tested in association studies, and it can help to discover and refine causal loci. We describe remote server resources that allow researchers to carry out imputation and phasing consistently and efficiently.
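A back-of-the-envelope calculation suggests why panel size matters at a minor allele frequency of 0.1%. Treating panel haplotypes as independent draws from a population with that allele frequency (a simplifying assumption that ignores ancestry mismatch and relatedness), the expected number of panel haplotypes carrying the minor allele is n·p, and the probability that none carries it is (1 − p)^n. The 5,000-haplotype panel is a hypothetical comparison point, not a panel from the source.

```python
def expected_carrier_haplotypes(n_haplotypes: int, maf: float) -> float:
    """Expected number of panel haplotypes carrying the minor allele (n * p)."""
    return n_haplotypes * maf


def prob_allele_absent(n_haplotypes: int, maf: float) -> float:
    """Probability that no panel haplotype carries the allele, (1 - p) ** n,
    assuming haplotypes are independent draws (a simplification)."""
    return (1.0 - maf) ** n_haplotypes


if __name__ == "__main__":
    maf = 0.001  # minor allele frequency of 0.1%, as in the abstract
    # 64,976 is the panel size from the abstract; 5,000 is a hypothetical smaller panel.
    for n in (64_976, 5_000):
        print(f"{n:>6} haplotypes: expected carriers = "
              f"{expected_carrier_haplotypes(n, maf):.1f}, "
              f"P(no carrier) = {prob_allele_absent(n, maf):.2g}")
```

Under these assumptions the full panel is expected to contain about 65 copies of such an allele against about 5 for the smaller panel, and the chance of having no copy at all falls from under 1% to negligible. Having carriers in the panel is only a precondition for imputing an allele; the accuracy claim in the abstract rests on the paper's own evaluation.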
Affiliation(s)
- Shane McCarthy
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
- Sayantan Das
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Warren Kretzschmar
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Olivier Delaneau
- Genetics and Development, University of Geneva, Geneva, Switzerland
- Andrew R Wood
- Genetics of Complex Traits, Institute of Biomedical Science, University of Exeter Medical School, Exeter, UK
- Alexander Teumer
- Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- DZHK (German Centre for Cardiovascular Research), Greifswald, Germany
- Hyun Min Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Christian Fuchsberger
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Petr Danecek
- Vertebrate Resequencing Informatics, Wellcome Trust Sanger Institute, Hinxton, UK
- Kevin Sharp
- Department of Statistics, University of Oxford, Oxford, UK
- Yang Luo
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
- Alan Kwong
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Nicholas Timpson
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Grove, UK
- Scott Vrieze
- Institute for Behavioral Genetics, University of Colorado, Boulder, Colorado, USA
- Department of Psychology and Neurosurgery, University of Colorado, Boulder, Colorado, USA
- Laura J Scott
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- He Zhang
- Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Anubha Mahajan
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Jan Veldink
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, Utrecht, the Netherlands
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA
- Carlos Pato
- Department of Psychiatry, SUNY Downstate, Brooklyn, New York, USA
- Cornelia M van Duijn
- Genetic Epidemiology Unit, Department of Epidemiology, Erasmus MC, Rotterdam, the Netherlands
- Christopher E Gillies
- Department of Pediatrics-Nephrology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- Ilaria Gandin
- Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Massimo Mezzavilla
- Genetica Medica, IRCCS Burlo Garofolo, Trieste, Italy
- Department of Experimental Genetics, Sidra, Doha, Qatar
- Arthur Gilly
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
- Massimiliano Cocca
- Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Michela Traglia
- Genetics and Cell Biology, San Raffaele Research Institute, Milan, Italy
- Dorrett Boomsma
- Netherlands Twin Register, Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Kari Branham
- Department of Ophthalmology and Visual Sciences, University of Michigan, Ann Arbor, Michigan, USA
- Gerome Breen
- MRC Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- NIHR Biomedical Research Centre for Mental Health, Institute of Psychiatry, Psychology and Neuroscience, King's College London and the South London Maudsley Hospital, London, UK
- Chad M Brummett
- Department of Anesthesiology, University of Michigan, Ann Arbor, Michigan, USA
- Harry Campbell
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
- Andrew Chan
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Sai Chen
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Department of Computational Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Department of Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Emily Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, Bethesda, Maryland, USA
- Francis S Collins
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, US National Institutes of Health, Bethesda, Maryland, USA
- Laura J Corbin
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Grove, UK
- George Davey Smith
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Grove, UK
- George Dedoussis
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens, Greece
- Marcus Dorr
- Department of Internal Medicine B, University Medicine Greifswald, Greifswald, Germany
- Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
- Aliki-Eleni Farmaki
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, Athens, Greece
- Luigi Ferrucci
- Longitudinal Studies Section, Clinical Research Branch, Gerontology Research Center, National Institute on Aging, Baltimore, Maryland, USA
- Lukas Forer
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- Ross M Fraser
- Department of Anesthesiology, University of Michigan, Ann Arbor, Michigan, USA
- Stacey Gabriel
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Shawn Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
- Leif Groop
- Department of Clinical Sciences, Diabetes and Endocrinology, University of Lund, Malmö, Sweden
- Finnish Institute for Molecular Medicine, University of Helsinki, Helsinki, Finland
- Research Programs Unit, Diabetes and Obesity, University of Helsinki, Helsinki, Finland
- Tabitha Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Andrew Hattersley
- Institute of Biomedical and Clinical Research, University of Exeter Medical School, Exeter, UK
- Oddgeir L Holmen
- Hunt Research Centre, Department of Public Health and General Practice, Norwegian University of Science and Technology, Levanger, Norway
- Kristian Hveem
- Hunt Research Centre, Department of Public Health and General Practice, Norwegian University of Science and Technology, Levanger, Norway
- Matthias Kretzler
- Department of Computational Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Department of Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Department of Internal Medicine, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- James C Lee
- Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Department of Medicine, University of Cambridge School of Clinical Medicine, Addenbrooke's Hospital, Cambridge, UK
- Matt McGue
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA
- Thomas Meitinger
- Institute of Human Genetics, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany
- Institute of Human Genetics, Technische Universität München, Munich, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Munich, Germany
- David Melzer
- Epidemiology and Public Health, Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Josine L Min
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Grove, UK
- Karen L Mohlke
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- John B Vincent
- Molecular Neuropsychiatry and Development Laboratory, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
- Institute of Medical Science, University of Toronto, Toronto, Ontario, Canada
- Matthias Nauck
- DZHK (German Centre for Cardiovascular Research), Greifswald, Germany
- Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
- Deborah Nickerson
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Aarno Palotie
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Institute for Molecular Medicine, FIMM, Helsinki, Finland
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Michele Pato
- Department of Psychiatry, SUNY Downstate, Brooklyn, New York, USA
- Nicola Pirastu
- Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Melvin McInnis
- Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, USA
- J Brent Richards
- Department of Medicine, McGill University, Montreal, Quebec, Canada
- Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Cinzia Sala
- Genetics and Cell Biology, San Raffaele Research Institute, Milan, Italy
- David Schlessinger
- National Institute on Aging, US National Institutes of Health, Baltimore, Maryland, USA
- Sebastian Schoenherr
- Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Medical University of Innsbruck, Innsbruck, Austria
- P Eline Slagboom
- Molecular Epidemiology Section, Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, the Netherlands
- Kerrin Small
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Timothy Spector
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK
- Dwight Stambolian
- Department of Ophthalmology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Marcus Tuke
- Genetics of Complex Traits, Institute of Biomedical Science, University of Exeter Medical School, Exeter, UK
- Jaakko Tuomilehto
- Chronic Disease Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland
- Dasman Diabetes Institute, Dasman, Kuwait
- Center for Vascular Prevention, Danube University Krems, Krems, Austria
- Diabetes Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
- Leonard H Van den Berg
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, Utrecht, the Netherlands
- Wouter Van Rheenen
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, Utrecht, the Netherlands
- Uwe Volker
- Institute of Clinical Chemistry and Laboratory Medicine, University Medicine Greifswald, Greifswald, Germany
- Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Cisca Wijmenga
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- Daniela Toniolo
- Genetics and Cell Biology, San Raffaele Research Institute, Milan, Italy
- Paolo Gasparini
- Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy
- Department of Experimental Genetics, Sidra, Doha, Qatar
- Matthew G Sampson
- Department of Pediatrics-Nephrology, University of Michigan School of Medicine, Ann Arbor, Michigan, USA
- James F Wilson
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- Timothy Frayling
- Genetics of Complex Traits, Institute of Biomedical Science, University of Exeter Medical School, Exeter, UK
- Paul I W de Bakker
- Medical Genetics, University Medical Center Utrecht, Utrecht, the Netherlands
- Department of Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, the Netherlands
- Morris A Swertz
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen, the Netherlands
- University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen, the Netherlands
- Steven McCarroll
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA
- Charles Kooperberg
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Annelot Dekker
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, Utrecht, the Netherlands
- David Altshuler
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Diabetes Research Center (Diabetes Unit), Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
- Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Vertex Pharmaceuticals, Boston, Massachusetts, USA
- Cristen Willer
- Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Department of Computational Medicine, University of Michigan, Ann Arbor, Michigan, USA
- Department of Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- William Iacono
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, USA
- Samuli Ripatti
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Nicole Soranzo
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
- Department of Haematology, University of Cambridge, Cambridge, UK
- NIHR Blood and Transplant Unit (BTRU) in Donor Health and Genomics, University of Cambridge, Cambridge, UK
- Klaudia Walter
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
- Anand Swaroop
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, US National Institutes of Health, Bethesda, Maryland, USA
- Carl A Anderson
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
- Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
- Michael Boehnke
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Mark I McCarthy
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, UK
- Oxford NIHR Biomedical Research Centre, Churchill Hospital, Headington, Oxford, UK
- Richard Durbin
- Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
14
Abstract
The UK Biobank (UKB) has recently released genotypes on 152,328 individuals together with extensive phenotypic and lifestyle information. We present a new phasing method, SHAPEIT3, that can handle such biobank-scale datasets and results in switch error rates as low as ~0.3%. The method exhibits O(N log N) scaling in sample size N, enabling fast and accurate phasing of even larger cohorts.
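The switch error rate quoted above is the usual yardstick for phasing accuracy. The sketch below follows the common definition and assumes the true and inferred haplotypes have already been restricted to sites heterozygous in the individual, so each is fully described by the allele on its first haplotype: a switch is counted whenever the inferred phase flips relative to the truth between consecutive heterozygous sites, and the count is divided by the number of consecutive pairs. The abstract does not spell out SHAPEIT3's exact evaluation protocol, so this is illustrative only.

```python
def switch_error_rate(true_hap: str, inferred_hap: str) -> float:
    """Switch error rate between true and inferred phase at heterozygous sites.

    Both strings give the allele ('0' or '1') on haplotype 1 at each
    heterozygous site, in genomic order; haplotype 2 carries the other allele.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same heterozygous sites")
    if len(true_hap) < 2:
        raise ValueError("need at least two heterozygous sites")
    # True where inferred haplotype 1 matches true haplotype 1 at that site.
    orientation = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch is a change of orientation between consecutive het sites.
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


if __name__ == "__main__":
    print(switch_error_rate("0101", "0110"))  # one switch over 3 pairs: 0.333...
    print(switch_error_rate("0101", "1010"))  # labels swapped everywhere: 0.0
```

Swapping the two haplotype labels everywhere is not a phasing error, which is why the metric counts changes in orientation rather than per-site mismatches.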
15
Elliott LT, Teh YW. A nonparametric HMM for genetic imputation and coalescent inference. Electron J Stat 2016. [DOI: 10.1214/16-ejs1197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]