1
Hai Y, Zhao W, Meng Q, Liu L, Wen Y. Bayesian linear mixed model with multiple random effects for family-based genetic studies. Front Genet 2023; 14:1267704. [PMID: 37928242 PMCID: PMC10620972 DOI: 10.3389/fgene.2023.1267704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/25/2023] [Indexed: 11/07/2023]
Abstract
Motivation: Family-based study design is one of the popular designs used in genetic research, and the whole-genome sequencing data obtained from family-based studies offer many unique features for risk prediction studies. They can not only provide a more comprehensive view of many complex diseases, but also utilize information in the design to further improve the prediction accuracy. While promising, existing analytical methods often ignore the information embedded in the study design and overlook the predictive effects of rare variants, leading to a prediction model with sub-optimal performance. Results: We proposed a Bayesian linear mixed model for the prediction analysis of sequencing data obtained from family-based studies. Our method can not only capture predictive effects from both common and rare variants, but also easily accommodate various disease model assumptions. It uses information embedded in the study design to form surrogates, where the predictive effects from unmeasured/unknown genetic and environmental risk factors can be modelled. Through extensive simulation studies and the analysis of sequencing data obtained from the Michigan State University Twin Registry study, we have demonstrated that the proposed method outperforms commonly adopted techniques. Availability: R package is available at https://github.com/yhai943/FBLMM.
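For orientation, a linear mixed model with multiple random effects is commonly written in the generic form below. This is a textbook-style sketch, not the authors' exact specification: the number of components $m$, the similarity matrices $K_k$ (for example, one built from common variants, one from rare variants, and one from the pedigree structure), and the variance symbols are illustrative assumptions, and the abstract does not state which priors are used.

```latex
\begin{aligned}
y &= X\beta + \sum_{k=1}^{m} g_k + \varepsilon, \\
g_k &\sim \mathcal{N}\!\left(0,\ \sigma_k^2 K_k\right), \quad k = 1, \dots, m, \\
\varepsilon &\sim \mathcal{N}\!\left(0,\ \sigma_e^2 I_n\right).
\end{aligned}
```

Here $y$ is the $n$-vector of phenotypes, $X\beta$ collects the fixed effects, each $g_k$ is a random effect whose covariance is shaped by $K_k$, and $\varepsilon$ is the residual. A Bayesian treatment would additionally place priors on $\beta$ and on the variance components $\sigma_k^2$ and $\sigma_e^2$.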
Affiliation(s)
- Yang Hai
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- Wenxuan Zhao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Qingyu Meng
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Long Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yalu Wen
  - Department of Statistics, University of Auckland, Auckland, New Zealand
2
Choi DJ, Armstrong G, Lozzi B, Vijayaraghavan P, Plon SE, Wong TC, Boerwinkle E, Muzny DM, Chen HC, Gibbs RA, Ostrom QT, Melin B, Deneen B, Bondy ML, The Gliogene Consortium, Genomics England Research Consortium, Bainbridge MN. The genomic landscape of familial glioma. Science Advances 2023; 9:eade2675. [PMID: 37115922 PMCID: PMC10146888 DOI: 10.1126/sciadv.ade2675] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/04/2022] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
Glioma is a rare brain tumor with a poor prognosis. Familial glioma is a subset of glioma with a strong genetic predisposition that accounts for approximately 5% of glioma cases. We performed whole-genome sequencing on an exploratory cohort of 203 individuals from 189 families with a history of familial glioma and an additional validation cohort of 122 individuals from 115 families. We found significant enrichment of rare deleterious variants of seven genes in both cohorts, and the most significantly enriched gene was HERC2 (P = 0.0006). Furthermore, we identified rare noncoding variants in both cohorts that were predicted to affect transcription factor binding sites or cause cryptic splicing. Last, we selected a subset of discovered genes for validation by CRISPR knockdown screening and found that DMBT1, HP1BP3, and ZCH7B3 have profound impacts on proliferation. This study performs comprehensive surveillance of the genomic landscape of familial glioma.
Affiliation(s)
- Dong-Joo Choi
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX, USA
- Georgina Armstrong
  - Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Brittney Lozzi
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX, USA
- Sharon E. Plon
  - Department of Pediatrics/Hematology-Oncology, Baylor College of Medicine, Houston, TX, USA
- Terence C. Wong
  - Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
- Eric Boerwinkle
  - The University of Texas Health Science Center School of Public Health, Houston, TX, USA
- Donna M. Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Hsiao-Chi Chen
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX, USA
- Richard A. Gibbs
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Quinn T. Ostrom
  - Department of Neurosurgery, Duke University School of Medicine, Durham, NC, USA
- Beatrice Melin
  - Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
- Benjamin Deneen
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX, USA
- Melissa L. Bondy
  - Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- The Gliogene Consortium
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX, USA
  - Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
  - Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
  - Department of Pediatrics/Hematology-Oncology, Baylor College of Medicine, Houston, TX, USA
  - The University of Texas Health Science Center School of Public Health, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Neurosurgery, Duke University School of Medicine, Durham, NC, USA
  - Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
- Genomics England Research Consortium
  - Center for Cell and Gene Therapy, Baylor College of Medicine, Houston, TX, USA
  - Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
  - Rady Children’s Institute for Genomic Medicine, San Diego, CA, USA
  - Department of Pediatrics/Hematology-Oncology, Baylor College of Medicine, Houston, TX, USA
  - The University of Texas Health Science Center School of Public Health, Houston, TX, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Neurosurgery, Duke University School of Medicine, Durham, NC, USA
  - Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
3
Dapas M, Dunaif A. Deconstructing a Syndrome: Genomic Insights Into PCOS Causal Mechanisms and Classification. Endocr Rev 2022; 43:927-965. [PMID: 35026001 PMCID: PMC9695127 DOI: 10.1210/endrev/bnac001] [Citation(s) in RCA: 134] [Impact Index Per Article: 44.7] [Received: 03/15/2021] [Indexed: 01/16/2023]
Abstract
Polycystic ovary syndrome (PCOS) is among the most common disorders in women of reproductive age, affecting up to 15% worldwide, depending on the diagnostic criteria. PCOS is characterized by a constellation of interrelated reproductive abnormalities, including disordered gonadotropin secretion, increased androgen production, chronic anovulation, and polycystic ovarian morphology. It is frequently associated with insulin resistance and obesity. These reproductive and metabolic derangements cause major morbidities across the lifespan, including anovulatory infertility and type 2 diabetes (T2D). Despite decades of investigative effort, the etiology of PCOS remains unknown. Familial clustering of PCOS cases has indicated a genetic contribution to PCOS. There are rare Mendelian forms of PCOS associated with extreme phenotypes, but PCOS typically follows a non-Mendelian pattern of inheritance consistent with a complex genetic architecture, analogous to T2D and obesity, that reflects the interaction of susceptibility genes and environmental factors. Genomic studies of PCOS have provided important insights into disease pathways and have indicated that current diagnostic criteria do not capture underlying differences in biology associated with different forms of PCOS. We provide a state-of-the-science review of genetic analyses of PCOS, including an overview of genomic methodologies aimed at a general audience of non-geneticists and clinicians. Applications in PCOS will be discussed, including strengths and limitations of each study. The contributions of environmental factors, including developmental origins, will be reviewed. Insights into the pathogenesis and genetic architecture of PCOS will be summarized. Future directions for PCOS genetic studies will be outlined.
Affiliation(s)
- Matthew Dapas
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Andrea Dunaif
  - Division of Endocrinology, Diabetes and Bone Disease, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
4
Ma S, Dalgleish J, Lee J, Wang C, Liu L, Gill R, Buxbaum JD, Chung WK, Aschard H, Silverman EK, Cho MH, He Z, Ionita-Laza I. Powerful gene-based testing by integrating long-range chromatin interactions and knockoff genotypes. Proc Natl Acad Sci U S A 2021; 118:e2105191118. [PMID: 34799441 PMCID: PMC8617518 DOI: 10.1073/pnas.2105191118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 10/07/2021] [Indexed: 02/03/2023]
Abstract
Gene-based tests are valuable techniques for identifying genetic factors in complex traits. Here, we propose a gene-based testing framework that incorporates data on long-range chromatin interactions, several recent technical advances for region-based tests, and leverages the knockoff framework for synthetic genotype generation for improved gene discovery. Through simulations and applications to genome-wide association studies (GWAS) and whole-genome sequencing data for multiple diseases and traits, we show that the proposed test increases the power over state-of-the-art gene-based tests in the literature, identifies genes that replicate in larger studies, and can provide a more narrow focus on the possible causal genes at a locus by reducing the confounding effect of linkage disequilibrium. Furthermore, our results show that incorporating genetic variation in distal regulatory elements tends to improve power over conventional tests. Results for UK Biobank and BioBank Japan traits are also available in a publicly accessible database that allows researchers to query gene-based results in an easy fashion.
Affiliation(s)
- Shiyang Ma
  - Department of Biostatistics, Columbia University, New York, NY 10032
- James Dalgleish
  - Department of Biostatistics, Columbia University, New York, NY 10032
- Justin Lee
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305
- Chen Wang
  - Department of Biostatistics, Columbia University, New York, NY 10032
- Linxi Liu
  - Department of Statistics, University of Pittsburgh, Pittsburgh, PA 15260
- Richard Gill
  - Department of Human Genetics, Genentech, South San Francisco, CA 94080
  - Department of Epidemiology, Columbia University, New York, NY 10032
- Joseph D Buxbaum
  - Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY 10029
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Wendy K Chung
  - Department of Pediatrics, Columbia University, New York, NY 10032
  - Department of Medicine, Columbia University, New York, NY 10032
- Hugues Aschard
  - Department of Computational Biology, Institut Pasteur, 75015 Paris, France
- Edwin K Silverman
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
- Michael H Cho
  - Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115
- Zihuai He
  - Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305
5
Li CW, Sachidanandam R, Jayaprakash A, Yi Z, Zhang W, Stefan-Lifshitz M, Concepcion E, Tomer Y. Identification of New Rare Variants Associated With Familial Autoimmune Thyroid Diseases by Deep Sequencing of Linked Loci. J Clin Endocrinol Metab 2021; 106:e4680-e4687. [PMID: 34143178 PMCID: PMC8530708 DOI: 10.1210/clinem/dgab440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2021] [Indexed: 11/19/2022]
Abstract
CONTEXT: Genetic risk factors play a major role in the pathoetiology of autoimmune thyroid diseases (AITD). So far, only common risk variants have been identified in AITD susceptibility genes. Recently, rare genetic variants have emerged as important contributors to complex diseases, and we hypothesized that rare variants play a key role in the genetic susceptibility to AITD. OBJECTIVE: We aimed to identify new rare variants that are associated with familial AITD. METHODS: We performed deep sequencing of 3 previously mapped AITD-linked loci (10q, 12q, and 14q) in a dataset of 34 families in which AITD clustered (familial AITD). RESULTS: We identified 13 rare variants, located in the inositol polyphosphate multikinase (IPMK) gene, that were associated with AITD (ie, both Graves' disease [GD] and Hashimoto's thyroiditis [HT]); 2 rare variants, within the dihydrolipoamide S-succinyltransferase (DLST) and zinc-finger FYVE domain-containing protein (ZFYVE1) genes, that were associated with GD only; and 3 rare variants, within the phosphoglycerate mutase 1 pseudogene 5 (PGAM1P5), LOC105369879, and methionine aminopeptidase 2 (METAP2) genes, that were associated with HT only. CONCLUSION: Our study demonstrates that, in addition to common variants, rare variants also contribute to the genetic susceptibility to AITD. We identified new rare variants in 6 AITD susceptibility genes that predispose to familial AITD. Of these, 3 genes, IPMK, ZFYVE1, and METAP2, are mechanistically involved in immune pathways and have been previously shown to be associated with autoimmunity. These genes predispose to thyroid autoimmunity and may serve as potential therapeutic targets in the future.
Affiliation(s)
- Cheuk Wun Li
  - Department of Medicine, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Ravi Sachidanandam
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Anitha Jayaprakash
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Zhengzi Yi
  - Department of Medicine Bioinformatics Core, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Weijia Zhang
  - Department of Medicine Bioinformatics Core, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Erlinda Concepcion
  - Department of Medicine, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Yaron Tomer
  - Department of Medicine, Albert Einstein College of Medicine, Bronx, NY 10461, USA
  - Correspondence: Yaron Tomer, MD, Department of Medicine, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA.
6
Forstner AJ, Fischer SB, Schenk LM, Strohmaier J, Maaser-Hecker A, Reinbold CS, Sivalingam S, Hecker J, Streit F, Degenhardt F, Witt SH, Schumacher J, Thiele H, Nürnberg P, Guzman-Parra J, Orozco Diaz G, Auburger G, Albus M, Borrmann-Hassenbach M, González MJ, Gil Flores S, Cabaleiro Fabeiro FJ, del Río Noriega F, Perez Perez F, Haro González J, Rivas F, Mayoral F, Bauer M, Pfennig A, Reif A, Herms S, Hoffmann P, Pirooznia M, Goes FS, Rietschel M, Nöthen MM, Cichon S. Whole-exome sequencing of 81 individuals from 27 multiply affected bipolar disorder families. Transl Psychiatry 2020; 10:57. [PMID: 32066727 PMCID: PMC7026119 DOI: 10.1038/s41398-020-0732-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/05/2019] [Revised: 12/18/2019] [Accepted: 01/08/2020] [Indexed: 01/01/2023]
Abstract
Bipolar disorder (BD) is a highly heritable neuropsychiatric disease characterized by recurrent episodes of depression and mania. Research suggests that the cumulative impact of common alleles explains 25-38% of phenotypic variance, and that rare variants may contribute to BD susceptibility. To identify rare, high-penetrance susceptibility variants for BD, whole-exome sequencing (WES) was performed in three affected individuals from each of 27 multiply affected families from Spain and Germany. WES identified 378 rare, non-synonymous, and potentially functional variants. These spanned 368 genes, and were carried by all three affected members in at least one family. Eight of the 368 genes harbored rare variants that were implicated in at least two independent families. In an extended segregation analysis involving additional family members, five of these eight genes harbored variants showing full or nearly full cosegregation with BD. These included the brain-expressed genes RGS12 and NCKAP5, which were considered the most promising BD candidates on the basis of independent evidence. Gene enrichment analysis for all 368 genes revealed significant enrichment for four pathways, including genes reported in de novo studies of autism (padj < 0.006) and schizophrenia (padj = 0.015). These results suggest a possible genetic overlap with BD for autism and schizophrenia at the rare-sequence-variant level. The present study implicates novel candidate genes for BD development, and may contribute to an improved understanding of the biological basis of this common and often devastating disease.
Affiliation(s)
- Andreas J. Forstner
  - Centre for Human Genetics, University of Marburg, Marburg, Germany
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
  - Department of Biomedicine, University of Basel, Basel, Switzerland
  - Department of Psychiatry (UPK), University of Basel, Basel, Switzerland
- Sascha B. Fischer
  - Department of Biomedicine, University of Basel, Basel, Switzerland
  - Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
- Lorena M. Schenk
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Jana Strohmaier
  - Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
  - SRH University Heidelberg, Academy for Psychotherapy, Heidelberg, Germany
- Anna Maaser-Hecker
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Céline S. Reinbold
  - Department of Biomedicine, University of Basel, Basel, Switzerland
  - Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
  - Center for Lifespan Changes in Brain and Cognition (LCBC), Department of Psychology, University of Oslo, Oslo, Norway
- Sugirthan Sivalingam
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Julian Hecker
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Fabian Streit
  - Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Franziska Degenhardt
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Stephanie H. Witt
  - Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Johannes Schumacher
  - Centre for Human Genetics, University of Marburg, Marburg, Germany
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Holger Thiele
  - Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Peter Nürnberg
  - Cologne Center for Genomics, University of Cologne, Cologne, Germany
- José Guzman-Parra
  - Department of Mental Health, University Regional Hospital of Málaga, Institute of Biomedicine of Málaga (IBIMA), Málaga, Spain
- Guillermo Orozco Diaz
  - Unidad de Gestión Clínica del Dispositivo de Cuidados Críticos y Urgencias del Distrito Sanitario Málaga - Coin-Gudalhorce, Málaga, Spain
- Georg Auburger
  - Experimental Neurology, Department of Neurology, Goethe University Hospital, Frankfurt am Main, Germany
- Margot Albus
  - Isar Amper Klinikum München Ost, kbo, Haar, Germany
- Maria José González
  - Department of Mental Health, University Regional Hospital of Málaga, Institute of Biomedicine of Málaga (IBIMA), Málaga, Spain
- Susana Gil Flores
  - Department of Mental Health, University Hospital of Reina Sofia, Cordoba, Spain
- Francisco del Río Noriega
  - Department of Mental Health, Hospital of Jerez de la Frontera, Jerez de la Frontera, Spain
- Fabio Rivas
  - Department of Psychiatry, Carlos Haya Regional University Hospital, Malaga, Spain
- Fermin Mayoral
  - Department of Psychiatry, Carlos Haya Regional University Hospital, Malaga, Spain
- Michael Bauer
  - Department of Psychiatry and Psychotherapy, Medical Faculty, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Andrea Pfennig
  - Department of Psychiatry and Psychotherapy, Medical Faculty, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Andreas Reif
  - Department of Psychiatry, Psychosomatic Medicine and Psychotherapy, University Hospital Frankfurt am Main, Frankfurt am Main, Germany
- Stefan Herms
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
  - Department of Biomedicine, University of Basel, Basel, Switzerland
  - Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
- Per Hoffmann
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
  - Department of Biomedicine, University of Basel, Basel, Switzerland
  - Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
  - Institute of Neuroscience and Medicine (INM-1), Research Center Jülich, Jülich, Germany
- Mehdi Pirooznia
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Fernando S. Goes
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Marcella Rietschel
  - Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Markus M. Nöthen
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Sven Cichon
  - Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
  - Department of Biomedicine, University of Basel, Basel, Switzerland
  - Institute of Medical Genetics and Pathology, University Hospital Basel, Basel, Switzerland
  - Institute of Neuroscience and Medicine (INM-1), Research Center Jülich, Jülich, Germany
7
Jones RM, Melton PE, Pinese M, Rea AJ, Ingley E, Ballinger ML, Wood DJ, Thomas DM, Moses EK. Identification of novel sarcoma risk genes using a two-stage genome wide DNA sequencing strategy in cancer cluster families and population case and control cohorts. BMC Medical Genetics 2019; 20:69. [PMID: 31053105 PMCID: PMC6499942 DOI: 10.1186/s12881-019-0808-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/31/2018] [Accepted: 04/16/2019] [Indexed: 12/26/2022]
Abstract
Background: Although familial clustering of cancers is relatively common, only a small proportion of familial cancer risk can be explained by known cancer predisposition genes. Methods: In this study we employed a two-stage approach to identify candidate sarcoma risk genes. First, we conducted whole exome sequencing in three multigenerational cancer families ascertained through a sarcoma proband (n = 19) in order to prioritize candidate genes for validation in an independent case-control cohort of sarcoma patients using family-based association and segregation analysis. The second stage employed a burden analysis of rare variants within prioritized candidate genes identified from stage one in 560 sarcoma cases and 1144 healthy ageing controls, for which whole genome sequence was available. Results: Variants from eight genes were identified in stage one. Following gene-based burden testing and after correction for multiple testing, two of these genes, ABCB5 and C16orf96, were determined to show statistically significant association with cancer. The ABCB5 gene was found to have a higher burden of putative regulatory variants (OR = 4.9, p-value = 0.007, q-value = 0.04) based on allele counts in sarcoma cases compared to controls. C16orf96 was found to have a significantly lower burden (OR = 0.58, p-value = 0.0004, q-value = 0.003) of regulatory variants in controls compared to sarcoma cases. Conclusions: Based on these genetic association data we propose that ABCB5 and C16orf96 are novel candidate risk genes for sarcoma. Although neither of these two genes has been previously associated with sarcoma, ABCB5 has been shown to share clinical drug resistance associations with melanoma and leukaemia, and C16orf96 shares regulatory elements with genes that are involved with TNF-alpha mediated apoptosis in a p53/TP53-dependent manner. Future genetic studies in other family and population cohorts will be required for further validation of these novel findings.
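A gene-level burden comparison based on allele counts, as described above, can be illustrated with a 2x2 table. The sketch below is an assumption about the general technique (an odds ratio with a Wald 95% confidence interval), not the authors' pipeline; the function name `burden_odds_ratio` and every count in the example are hypothetical and are not taken from the study.

```python
from math import exp, log, sqrt


def burden_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, CI lower, CI upper) for a 2x2 allele-count table.

    case_alt / case_ref: qualifying (rare) and reference allele counts in cases.
    ctrl_alt / ctrl_ref: the same counts in controls.
    """
    cells = [case_alt, case_ref, ctrl_alt, ctrl_ref]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so that the ratio and its standard error stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for the Wald interval.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - 1.96 * se)
    upper = exp(log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


# Hypothetical counts for one gene (NOT from the study): 560 cases and
# 1144 controls carry 1120 and 2288 alleles per site, respectively.
or_, lo, hi = burden_odds_ratio(case_alt=20, case_ref=1100, ctrl_alt=10, ctrl_ref=2278)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An odds ratio above 1 indicates a higher burden of qualifying alleles in cases than in controls, and a value below 1 the reverse. A real analysis would also need a significance test and multiple-testing correction, which this sketch omits.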
Affiliation(s)
- Rachel M Jones
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
  - Medical School, Faculty of Health and Medical Sciences, University of Western Australia, Crawley, Australia
- Phillip E Melton
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
  - School of Pharmacy and Biomedical Sciences, Faculty of Health Sciences, Curtin University, Bentley, Western Australia
- Mark Pinese
  - Cancer Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Alexander J Rea
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
- Evan Ingley
  - School of Veterinary and Life Sciences, Murdoch University, Murdoch, Australia
  - Harry Perkins Institute of Medical Research, Murdoch, Western Australia
  - The Centre for Medical Research, The University of Western Australia, Crawley, Australia
- Mandy L Ballinger
  - Cancer Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- David J Wood
  - Medical School, Faculty of Health and Medical Sciences, University of Western Australia, Crawley, Australia
- David M Thomas
  - Cancer Division, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Eric K Moses
  - The Curtin UWA Centre for Genetic Origins of Health and Disease, Faculty of Health Sciences, Curtin University and Faculty of Health and Medical Sciences, M409 The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Western Australia
  - School of Pharmacy and Biomedical Sciences, Faculty of Health Sciences, Curtin University, Bentley, Western Australia
  - School of Biomedical Sciences, Faculty of Health and Medical Sciences, The University of Western Australia, Crawley, Australia
8
Pena GG, Martinez-Perez A, Dutra MS, Gazzinelli A, Corrêa-Oliveira R, Soria JM, Velasquez-Melendez G. Genetic determinants of cardiometabolic risk factors in rural families in Brazil. Am J Hum Biol 2016; 28:619-26. [PMID: 26891714 DOI: 10.1002/ajhb.22842] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/22/2014] [Revised: 10/22/2015] [Accepted: 01/09/2016] [Indexed: 01/14/2023]
Abstract
OBJECTIVES: The purpose of this study was to estimate the heritability of, and the genetic and environmental correlations between, cardiometabolic risk factors in extended pedigrees. METHODS: The Jequitinhonha Community Family Study Cohort (JCFSC) consists of individuals aged ≥18 years living in rural villages. Family pedigrees were constructed for the cohort. The following data were collected: demographic and socioeconomic status, lifestyle variables, anthropometrics, and lipid traits. RESULTS: The JCFSC consists of 931 individuals distributed into 69 pedigrees with 4,907 members in total. The heritabilities were 0.47 for total cholesterol (TC), 0.44 for triglycerides (TG), 0.42 for high-density lipoprotein cholesterol (HDLc), 0.49 for metabolic syndrome, approximately 0.60 for anthropometric traits, and 0.30 for blood pressure/hypertension. Significant genetic correlations (ρg) were found mainly between TG and TC (ρg = 0.58) and between hypertension and TG (ρg = 0.52). Systolic blood pressure (SBP) was correlated with TG (ρg = 0.39) and HDLc (ρg = -0.30). Diastolic blood pressure was correlated with TG (ρg = 0.56) and TC (ρg = 0.30). Genetic correlations were also found for anthropometric traits, including body mass index (BMI) and TG (ρg = 0.34), waist circumference (WC) and TG (ρg = 0.42), and WC and HDLc (ρg = -0.33). Household effects were found for HDLc (c² = 0.19), SBP (c² = 0.14), and hypertension (c² = 0.14). CONCLUSIONS: For several phenotypes, including lipids, hypertension, blood pressure, and anthropometric traits, the genetic contribution is important in determining cardiometabolic risk. This study provides a foundation for future studies, which will mainly focus on rare variants that could describe the genetic mechanisms influencing cardiometabolic risk.
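The quantities reported above are conventionally defined through a variance-component decomposition. The sketch below assumes an additive split of the phenotypic variance into genetic, shared-household, and residual parts; the abstract does not spell out the model, so the symbols are illustrative rather than the authors' notation.

```latex
\begin{aligned}
\sigma_p^2 &= \sigma_g^2 + \sigma_c^2 + \sigma_e^2, \\
h^2 &= \frac{\sigma_g^2}{\sigma_p^2}, \qquad
c^2 = \frac{\sigma_c^2}{\sigma_p^2}, \\
\rho_g &= \frac{\sigma_{g,12}}{\sqrt{\sigma_{g,1}^2 \, \sigma_{g,2}^2}}.
\end{aligned}
```

Here $h^2$ is the heritability, $c^2$ the household effect, and $\rho_g$ the genetic correlation between traits 1 and 2, with $\sigma_{g,12}$ their additive genetic covariance. Under these definitions, a heritability of 0.47 for total cholesterol would mean that 47% of the phenotypic variance is attributed to additive genetic effects.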
Affiliation(s)
- Geórgia G Pena
- School of Medicine, Universidade Federal de Uberlândia 1720, Pará Av., Umuarama, Uberlândia, Minas Gerais, 38400-902, Brazil; Department of Maternal and Child Nursing and Public Health, School of Nursing, Universidade Federal de Minas Gerais. 190, Alfredo Balena Av., Santa Efigênia, Belo Horizonte, Minas Gerais, 30130-100, Brazil
| | - Angel Martinez-Perez
- Unit of Genomic of Complex Diseases, Sant Pau Institute of Biomedical Research (IIB-Sant Pau), 167 Sant Antoni M. Claret, Barcelona, 08025, Spain
| | - Míriam Santos Dutra
- Institute of Biological Sciences, Universidade Federal de Minas Gerais, 6627, Pres. Antônio Carlos Av., Pampulha, Belo Horizonte, Minas Gerais, 31270-901, Brazil
| | - Andrea Gazzinelli
- School of Medicine, Universidade Federal de Uberlândia 1720, Pará Av., Umuarama, Uberlândia, Minas Gerais, 38400-902, Brazil
| | - Rodrigo Corrêa-Oliveira
- Cellular and Molecular Immunology Laboratory, Centro de Pesquisas René Rachou/FIOCRUZ. 1715, Augusto de Lima, Barro Preto, Belo Horizonte, Minas Gerais, 30190-002, Brazil
| | - José M Soria
- Unit of Genomic of Complex Diseases, Sant Pau Institute of Biomedical Research (IIB-Sant Pau), 167 Sant Antoni M. Claret, Barcelona, 08025, Spain
| | - Gustavo Velasquez-Melendez
- Department of Maternal and Child Nursing and Public Health, School of Nursing, Universidade Federal de Minas Gerais. 190, Alfredo Balena Av., Santa Efigênia, Belo Horizonte, Minas Gerais, 30130-100, Brazil.
| |
|
9
|
Genetic data: The new challenge of personalized medicine, insights for rheumatoid arthritis patients. Gene 2016; 583:90-101. [PMID: 26869316 DOI: 10.1016/j.gene.2016.02.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 10/31/2015] [Revised: 01/18/2016] [Accepted: 02/05/2016] [Indexed: 01/15/2023]
Abstract
Rapid advances in genotyping technology, analytical methods, and the establishment of large cohorts for population genetic studies have resulted in a large new body of information about the genetic basis of human rheumatoid arthritis (RA). Improved understanding of the root pathogenesis of the disease holds the promise of improved diagnostic and prognostic tools based upon this information. In this review, we summarize the nature of new genetic findings in human RA, including susceptibility loci and gene-gene and gene-environment interactions, as well as genetic loci associated with sub-groups of patients and those associated with response to therapy. Possible uses of these data are discussed, such as prediction of disease risk as well as personalized therapy and prediction of therapeutic response and risk of adverse events. While these applications are largely not refined to the point of clinical utility in RA, it seems likely that multi-parameter datasets including genetic, clinical, and biomarker data will be employed in the future care of RA patients.
|
10
|
Lin KH, Zöllner S. Robust and Powerful Affected Sibpair Test for Rare Variant Association. Genet Epidemiol 2015; 39:325-33. [PMID: 25966809 DOI: 10.1002/gepi.21903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/05/2014] [Revised: 03/25/2015] [Accepted: 04/01/2015] [Indexed: 11/09/2022]
Abstract
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case-control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family-based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as test for rare variant association with family-based internal control (TRAFIC). This design is generally robust to population stratification, as cases and controls are matched within each sibpair. We evaluate the power analytically using a general model for the effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case-control study for variants with summed risk allele frequency f < 0.05; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene-gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.
Affiliation(s)
- Keng-Han Lin
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America; Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America; Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America; Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, United States of America
| |
|
11
|
Abstract
PURPOSE OF REVIEW Detection of high-impact variants on lipid traits is complicated by complex genetic architecture. Although genome-wide association studies (GWAS) successfully identified many novel genes associated with lipid traits, they were less successful in identifying variants with a large impact on the phenotype. This is not unexpected, as the more common variants detectable by GWAS typically have small effects. The availability of large familial datasets and sequence data has changed the paradigm for successful genomic discovery of the novel genes and pathogenic variants underlying lipid disorders. RECENT FINDINGS Novel loci with large effects have been successfully mapped in families, and next-generation sequencing allowed for the identification of the underlying lipid-associated variants of large effect size. The success of this strategy relies on the simplification of the underlying genetic variation by focusing on large single families segregating extreme lipid phenotypes. SUMMARY Rare, high-impact variants are expected to have large effects and be more relevant for medical and pharmaceutical applications. Family data have many advantages over population-based data because they allow for the efficient detection of high-impact variants with an exponentially smaller sample size and increased power for follow-up studies.
Affiliation(s)
- Elisabeth Rosenthal
- Department of Medicine (Medical Genetics), University of Washington, Seattle, Seattle, Washington, USA
| | - Elizabeth Blue
- Department of Medicine (Medical Genetics), University of Washington, Seattle, Seattle, Washington, USA
| | - Gail P. Jarvik
- Department of Medicine (Medical Genetics), University of Washington, Seattle, Seattle, Washington, USA
- Department of Genome Sciences, University of Washington, Seattle, Seattle, Washington, USA
| |
|
12
|
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022] Open
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
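As a rough illustration of the idea described in this abstract (combining only the per-site P-values that fall below a cutoff, so that likely neutral variants are discarded), the following Python sketch computes a truncated log-product statistic. The function name, the example P-values, and the 0.05 cutoff are hypothetical. The sketch omits the adaptive choice of cutoff and the pedigree-based permutation step that the abstract says ADA requires, so it should not be read as the author's implementation.

```python
import math

def truncated_combination(p_values, cutoff=0.05):
    """Combine per-site P-values at or below `cutoff` (hypothetical helper).

    Returns the sum of -log(p) over the retained sites and the number of
    sites retained. Sites with P-values above the cutoff are discarded.
    """
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in (0, 1]")
    kept = [p for p in p_values if p <= cutoff]
    statistic = sum(-math.log(p) for p in kept)
    return statistic, len(kept)

# Made-up per-site P-values for four variants in one region (illustrative only).
per_site_p = [0.001, 0.50, 0.04, 0.90]
stat, n_kept = truncated_combination(per_site_p, cutoff=0.05)
print(f"kept {n_kept} of {len(per_site_p)} sites, statistic = {stat:.3f}")
```

Larger values indicate stronger combined evidence among the retained sites. Significance would still have to be assessed by permutation, which is not shown here.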
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
| |
|
13
|
Wen H, Kim YC, Snyder C, Xiao F, Fleissner EA, Becirovic D, Luo J, Downs B, Sherman S, Cowan KH, Lynch HT, Wang SM. Family-specific, novel, deleterious germline variants provide a rich resource to identify genetic predispositions for BRCAx familial breast cancer. BMC Cancer 2014; 14:470. [PMID: 24969172 PMCID: PMC4083142 DOI: 10.1186/1471-2407-14-470] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/29/2014] [Accepted: 06/20/2014] [Indexed: 12/02/2022] Open
Abstract
Background Genetic predisposition is the primary risk factor for familial breast cancer. For the majority of familial breast cancer, however, the genetic predispositions remain unknown. All newly identified predispositions occur rarely in the disease population, and the unknown genetic predispositions are estimated to number up to thousands in total. The family unit is the basic structure of genetics. Because it is an autosomal dominant disease, individuals with a history of familial breast cancer must carry the same genetic predisposition across generations. Therefore, focusing on the cases in lineages of familial breast cancer, rather than pooled cases in the disease population, is expected to provide a high probability of identifying the genetic predisposition for each family. Methods In this study, we tested genetic predispositions by analyzing the family-specific variants in familial breast cancer. Using exome sequencing, we analyzed three families and 22 probands with BRCAx (BRCA-negative) familial breast cancer. Results We observed the presence of family-specific, novel, deleterious germline variants in each family. Of the germline variants identified, many were shared between the disease-affected members of the same family but not found in different families, which have their own specific variants. Certain variants are putative deleterious genetic predispositions damaging functionally important genes involved in DNA replication and damage repair, tumor suppression, signal transduction, and phosphorylation. Conclusions Our study demonstrates that the predispositions for many BRCAx familial breast cancer families can lie in each disease family. The application of a family-focused approach has the potential to detect many new predispositions.
Affiliation(s)
- Henry T Lynch
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, 986805 Nebraska Medical Center, Omaha, NE 68198, USA.
| |
|
14
|
Abstract
The cost of next-generation sequencing is now approaching that of the first generation of genome-wide single-nucleotide genotyping panels, but this is still out of reach for large-scale epidemiologic studies with tens of thousands of subjects. Furthermore, the anticipated yield of millions of rare variants poses serious challenges for distinguishing causal from noncausal variants for disease. We explore the merits of using family-based designs for sequencing substudies to identify novel variants and prioritize them for their likelihood of causality. While the sharing of variants within families means that family-based designs may be less efficient for discovery than sequencing of a comparable number of unrelated individuals, the ability to exploit cosegregation of variants with disease within families helps distinguish causal from noncausal ones. We introduce a score test criterion for prioritizing discovered variants in terms of their likelihood of being functional. We compare the relative statistical efficiency of 2-stage versus 1-stage family-based designs by application to the Genetic Analysis Workshop 18 simulated sequence data.
Affiliation(s)
- Zhao Yang
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
| | - Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089-9234, USA
| |
|
15
|
Sampson JN, Wheeler B, Li P, Shi J. Leveraging local identity-by-descent increases the power of case/control GWAS with related individuals. Ann Appl Stat 2014; 8:974-998. [DOI: 10.1214/14-aoas715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
16
|
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022] Open
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels, but it is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved, and the use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
| | - Zhao Yang
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
| | - Fan Yang
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
| |
|
17
|
|
18
|
Abstract
A novel web-based tool PedWiz that pipelines the informatics process for pedigree data is introduced. PedWiz is designed to assist researchers in the analysis of pedigree data. It provides a convenient tool for pedigree informatics: descriptive statistics, relative pairs, genetic similarity coefficients, the variance-covariance matrix for three estimated coefficients of allele identical-by-descent sharing as well as mean allele sharing, a plot of the pedigree structures, and a visualization of the identity coefficients. With a renewed interest in linkage and other family based methods, PedWiz will be a valuable tool for the analysis of family data.
Affiliation(s)
- Yeunjoo E Song
- Department of Epidemiology and Biostatistics, Case Western Reserve University Cleveland, OH, USA
| |
|
19
|
Mihaescu R, Pencina MJ, Alonso A, Lunetta KL, Heckbert SR, Benjamin EJ, Janssens ACJW. Incremental value of rare genetic variants for the prediction of multifactorial diseases. Genome Med 2013; 5:76. [PMID: 23961719 PMCID: PMC3971349 DOI: 10.1186/gm480] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/11/2013] [Revised: 08/03/2013] [Accepted: 08/20/2013] [Indexed: 12/26/2022] Open
Abstract
Background It is often assumed that rare genetic variants will improve available risk prediction scores. We aimed to estimate the added predictive ability of rare variants for risk prediction of common diseases in hypothetical scenarios. Methods In simulated data, we constructed risk models with an area under the ROC curve (AUC) ranging between 0.50 and 0.95, to which we added a single variant representing the cumulative frequency and effect (odds ratio, OR) of multiple rare variants. The frequency of the rare variant ranged between 0.0001 and 0.01 and the OR between 2 and 10. We assessed the resulting AUC, increment in AUC, integrated discrimination improvement (IDI), net reclassification improvement (NRI(>0.01)) and categorical NRI. The analyses were illustrated by a simulation of atrial fibrillation risk prediction based on a published clinical risk model. Results We observed minimal improvement in AUC with the addition of rare variants. All measures increased with the frequency and OR of the variant, but maximum increment in AUC remained below 0.05. Increment in AUC and NRI(>0.01) decreased with higher AUC of the baseline model, whereas IDI remained constant. In the atrial fibrillation example, the maximum increment in AUC was 0.02 for a variant with frequency = 0.01 and OR = 10. IDI and NRI showed at most minimal increase for variants with frequency greater than or equal to 0.005 and OR greater than or equal to 5. Conclusions Since rare variants are present in only a minority of affected individuals, their predictive ability is generally low at the population level. To improve the predictive ability of clinical risk models for complex diseases, genetic variants must be common and have substantial effect on disease risk.
Affiliation(s)
- Raluca Mihaescu
- Department of Epidemiology, Erasmus University Medical Center, Dr. Molewaterplein 50, Rotterdam, 3000 CA, The Netherlands
| | - Michael J Pencina
- Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, MA 02118, USA ; Harvard Clinical Research Institute, 930-W Commonwealth Avenue, Boston, MA 02215-1212, USA
| | - Alvaro Alonso
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, 1300 S. Second Street, Minneapolis, MN 55454-1015, USA
| | - Kathryn L Lunetta
- Department of Biostatistics, Boston University, 801 Massachusetts Avenue, Boston, MA 02118, USA ; The National Heart, Lung, and Blood Institute's Framingham Heart Study, 73 Mt. Wayte Avenue, Framingham, MA 01702-5827, USA
| | - Susan R Heckbert
- Department of Epidemiology, University of Washington, Seattle, 1959 NE Pacific Street, Seattle, WA 98195-7236, USA
| | - Emelia J Benjamin
- The National Heart, Lung, and Blood Institute's Framingham Heart Study, 73 Mt. Wayte Avenue, Framingham, MA 01702-5827, USA ; Cardiology and Preventive Medicine Section, Boston University School of Medicine, Boston, 715 Albany Street, MA 02118, USA ; Department of Epidemiology, Boston University School of Public Health, Boston, 715 Albany Street, MA 02118, USA
| | - A Cecile J W Janssens
- Department of Epidemiology, Erasmus University Medical Center, Dr. Molewaterplein 50, Rotterdam, 3000 CA, The Netherlands ; Emory University, Rollins School of Public Health, 1518 Clifton Road, Atlanta, GA 30322 USA
| |
|
20
|
Matullo G, Di Gaetano C, Guarrera S. Next generation sequencing and rare genetic variants: from human population studies to medical genetics. Environmental and Molecular Mutagenesis 2013; 54:518-532. [PMID: 23922201 DOI: 10.1002/em.21799] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/20/2013] [Revised: 05/31/2013] [Accepted: 06/09/2013] [Indexed: 06/02/2023]
Abstract
The allelic frequency spectrum emerging from several Next Generation Sequencing (NGS) projects is revealing important details about evolutionary and demographic forces that shaped the human genome. Herein, we discuss some of the achievements of the use of low-frequency and rare variants from NGS studies. The majority of variants that affect protein-coding regions are recent and rare. Often, the novel rare variants are enriched for deleterious alleles and are population-specific, making them suitable for the study of disease susceptibility. To investigate this kind of variation and its effects in association studies, very large sample sizes will be necessary to achieve sufficient statistical power. Moreover, as these variants are typically population-specific, the replication of disease associations across populations could be very difficult due to population stratification. Therefore, the design of experiments focusing on the identification of rare variants and their effects should be carefully planned. Although several successes have already been achieved through NGS for genetic epidemiology, pharmacogenetic and clinical purposes, with improvements of the sequencing technology and decreased costs, further advances are expected in the near future.
Affiliation(s)
- Giuseppe Matullo
- Dipartimento di Scienze Mediche, Università di Torino, Torino, Italy.
| |
|
21
|
DeRycke MS, Gunawardena SR, Middha S, Asmann YW, Schaid DJ, McDonnell SK, Riska SM, Eckloff BW, Cunningham JM, Fridley BL, Serie DJ, Bamlet WR, Cicek MS, Jenkins MA, Duggan DJ, Buchanan D, Clendenning M, Haile RW, Woods MO, Gallinger SN, Casey G, Potter JD, Newcomb PA, Le Marchand L, Lindor NM, Thibodeau SN, Goode EL. Identification of novel variants in colorectal cancer families by high-throughput exome sequencing. Cancer Epidemiol Biomarkers Prev 2013; 22:1239-51. [PMID: 23637064 PMCID: PMC3704223 DOI: 10.1158/1055-9965.epi-12-1226] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Colorectal cancer (CRC) in densely affected families without Lynch Syndrome may be due to mutations in undiscovered genetic loci. Familial linkage analyses have yielded disparate results; the use of exome sequencing in coding regions may identify novel segregating variants. METHODS We completed exome sequencing on 40 affected cases from 16 multicase pedigrees to identify novel loci. Variants shared among all sequenced cases within each family were identified and filtered to exclude common variants and single-nucleotide variants (SNV) predicted to be benign. RESULTS We identified 32 nonsense or splice-site SNVs, 375 missense SNVs, 1,394 synonymous or noncoding SNVs, and 50 indels in the 16 families. Of particular interest are two validated and replicated missense variants in CENPE and KIF23, which are both located within previously reported CRC linkage regions, on chromosomes 1 and 15, respectively. CONCLUSIONS Whole-exome sequencing identified DNA variants in multiple genes. Additional sequencing of these genes in additional samples will further elucidate the role of variants in these regions in CRC susceptibility. IMPACT Exome sequencing of familial CRC cases can identify novel rare variants that may influence disease risk.
Affiliation(s)
- Melissa S. DeRycke
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Shanaka R. Gunawardena
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Sumit Middha
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Yan W Asmann
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Daniel J. Schaid
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Shannon K. McDonnell
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Shaun M. Riska
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Bruce W Eckloff
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Julie M. Cunningham
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Brooke L. Fridley
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160, USA
| | - Daniel J. Serie
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - William R. Bamlet
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Mine S. Cicek
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Mark A. Jenkins
- Centre for Molecular, Environmental, Genetic and Analytic Epidemiology, University of Melbourne, Victoria 3010, Australia
| | - David J. Duggan
- Translational Genomics Research Institute, Phoenix, AZ, 85004, USA
| | - Daniel Buchanan
- Cancer and Population Studies Group, Queensland Institute of Medical Research, Queensland, Australia
| | - Mark Clendenning
- Cancer and Population Studies Group, Queensland Institute of Medical Research, Queensland, Australia
| | - Robert W. Haile
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Michael O. Woods
- Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. Johns, NL, Canada
| | - Graham Casey
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - John D. Potter
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | - Polly A. Newcomb
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | - Loic Le Marchand
- Department of Epidemiology, University of Hawaii, Honolulu, HI, USA
| | - Noralane M. Lindor
- Department of Health Sciences Research, Mayo Clinic, Scottsdale, AZ 85259, USA
| | - Stephen N. Thibodeau
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| | - Ellen L. Goode
- Departments of Health Sciences Research, Biomedical Statistics and Informatics, Laboratory Medicine and Pathology, Medical Genetics, Medical Genomics Technology and Advanced Genomics Technology Center, Mayo Clinic College of Medicine, Rochester, MN, 55905, USA
| |
|
22
|
Thomas DC. Some surprising twists on the road to discovering the contribution of rare variants to complex diseases. Hum Hered 2013; 74:113-7. [PMID: 23594489 DOI: 10.1159/000347020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
|
23
|
Perdry H, Müller-Myhsok B, Clerget-Darpoux F. Using Affected Sib-Pairs to Uncover Rare Disease Variants. Hum Hered 2013; 74:129-41. [DOI: 10.1159/000346788] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022] Open
|
24
|
Family-based association tests for sequence data, and comparisons with population-based association tests. Eur J Hum Genet 2013; 21:1158-62. [PMID: 23386037 DOI: 10.1038/ejhg.2012.308] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Received: 07/19/2012] [Revised: 10/22/2012] [Accepted: 11/21/2012] [Indexed: 11/08/2022]
Abstract
Recent advances in high-throughput sequencing technologies make it increasingly more efficient to sequence large cohorts for many complex traits. We discuss here a class of sequence-based association tests for family-based designs that corresponds naturally to previously proposed population-based tests, including the classical Burden and variance-component tests. This framework allows for a direct comparison between the powers of sequence-based association tests with family- vs population-based designs. We show that for dichotomous traits using family-based controls results in similar power levels as the population-based design (although at an increased sequencing cost for the family-based design), while for continuous traits (in random samples, no ascertainment) the population-based design can be substantially more powerful. A possible disadvantage of population-based designs is that they can lead to increased false-positive rates in the presence of population stratification, while the family-based designs are robust to population stratification. We show also an application to a small exome-sequencing family-based study on autism spectrum disorders. The tests are implemented in publicly available software.
|
25
|
Singh A, Zafer S, Pe’er I. MetaSeq: privacy preserving meta-analysis of sequencing-based association studies. Pac Symp Biocomput 2013:356-367. [PMID: 23424140 PMCID: PMC3605551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant, thus requiring the exchange of the identity of the sequenced variant. Such variants are often rare, so exchanging them can potentially reveal the identity of their carriers and jeopardize privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles for meta-analysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparently paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt the identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a third party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.
Affiliation(s)
| | | | - Itsik Pe’er
- Author to whom all correspondence should be addressed
| |
|
26
|
Wijsman EM. The role of large pedigrees in an era of high-throughput sequencing. Hum Genet 2012; 131:1555-63. [PMID: 22714655 PMCID: PMC3638020 DOI: 10.1007/s00439-012-1190-2] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 03/26/2012] [Accepted: 06/07/2012] [Indexed: 12/13/2022]
Abstract
Rare variation is the current frontier in human genetics. The large pedigree design is practical, efficient, and well-suited for investigating rare variation. In large pedigrees, specific rare variants that co-segregate with a trait will occur in sufficient numbers so that effects can be measured, and evidence for association can be evaluated, by making use of methods that fully use the pedigree information. Evidence from linkage analysis can focus investigation, both reducing the multiple testing burden and expanding the variants that can be evaluated and followed up, as recent studies have shown. The large pedigree design requires only a small fraction of the sample size needed to identify rare variants of interest in population-based designs, and many highly suitable, well-understood, and available statistical and computational tools already exist. Samples consisting of large pedigrees with existing rich phenotype and genome scan data should be prime candidates for high-throughput sequencing in the search of the determinants of complex traits.
Affiliation(s)
- Ellen M Wijsman
- Department of Biostatistics, University of Washington, Seattle, WA 98195-7720, USA.
| |
|
27
|
Familial cosegregation of rare genetic variants with disease in complex disorders. Eur J Hum Genet 2012; 21:444-50. [PMID: 23010752 DOI: 10.1038/ejhg.2012.194] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
Abstract
Family-based designs are increasingly being used for identification of rare variants in complex disorders. This paper addresses two questions related to the utility of these designs. First, under what circumstances are rare disease-related variants expected to cosegregate with disease in families? Second, under what circumstances is a disease-variant association expected to be greater in studies restricted to familial cases than in studies of unselected cases? To investigate these questions, we developed a probability model of disease causation involving two loci. To address cosegregation, we examined the probability that an affected first-degree relative of a variant-carrying proband would also carry the variant. We find that this probability increases with increasing odds ratio (OR) for the variant, but declines with increasing sibling recurrence risk ratio (λs). For example, under reasonable assumptions, the 15q13.3 microdeletion in idiopathic generalized epilepsy, with an OR estimate of 68 in large case-control studies, is expected to be present in >95% of affected first-degree relatives of variant-carrying probands. However, for a variant with OR=5, the probability an affected relative has the variant ranges from 82% (when λs=2) to 58% (when λs=50). We also find that restriction of a study to familial cases does not necessarily increase a rare variant's association with disease, especially if λs is high and the variant contributes little to overall disease familial aggregation. These findings provide guidance for the design of family-based studies of rare variants in complex disorders.
|
28
|
Statistical Challenges in Sequence-Based Association Studies with Population- and Family-Based Designs. Statistics in Biosciences 2012. [DOI: 10.1007/s12561-012-9062-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
|
29
|
Abstract
Advances in sequencing technology allow assessing the impact of rare variation on common disorders. For this purpose, methods combine rare variants across a gene and compare an aggregate statistic between cases and controls. However, sequencing many individuals is costly. Hence, it is necessary to identify case samples that are most likely to result in powerful tests under realistic model assumptions. Power can be increased by selecting cases that are highly likely to carry risk variants. As rare variants that contribute to the heritability of a disease co-segregate among affected family members, selecting cases that have affected family members may increase the power of rare variant tests considerably. Here I compare sequencing random cases to cases ascertained to have affected family members. I quantify the power of the different approaches and provide criteria for sample selection under different models of inheritance. Under a model of multiplicative gene-gene interaction, a sample of random cases has to be 2-16-fold larger to achieve the same power as a sample of cases ascertained to have affected family members. However, in traits with high heritability this power gain can be reduced or even reversed under models of additive gene-gene interaction. Hence study designs should depend on the studied disease's heritability and on the available sample size. I also show that selecting cases that share both chromosomes identical by descent with an affected sibling at candidate regions can result in a further power gain.
|
30
|
Properties and power of the Drosophila Synthetic Population Resource for the routine dissection of complex traits. Genetics 2012; 191:935-49. [PMID: 22505626 DOI: 10.1534/genetics.112.138537] [Citation(s) in RCA: 132] [Impact Index Per Article: 10.2] [Indexed: 12/18/2022]
Abstract
The Drosophila Synthetic Population Resource (DSPR) is a newly developed multifounder advanced intercross panel consisting of >1600 recombinant inbred lines (RILs) designed for the genetic dissection of complex traits. Here, we describe the inference of the underlying mosaic founder structure for the full set of RILs from a dense set of semicodominant restriction-site-associated DNA (RAD) markers and use simulations to explore how variation in marker density and sequencing coverage affects inference. For a given sequencing effort, marker density is more important than sequence coverage per marker in terms of the amount of genetic information we can infer. We also assessed the power of the DSPR by assigning genotypes at a hidden QTL to each RIL on the basis of the inferred founder state and simulating phenotypes for different experimental designs, different genetic architectures, different sample sizes, and QTL of varying effect sizes. We found the DSPR has both high power (e.g., 84% power to detect a 5% QTL) and high mapping resolution (e.g., ∼1.5 cM for a 5% QTL).
|
31
|
Ionita-Laza I, Makarov V, Yoon S, Raby B, Buxbaum J, Nicolae DL, Lin X. Finding disease variants in Mendelian disorders by using sequence data: methods and applications. Am J Hum Genet 2011; 89:701-12. [PMID: 22137099 DOI: 10.1016/j.ajhg.2011.11.003] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 07/14/2011] [Revised: 09/19/2011] [Accepted: 11/03/2011] [Indexed: 12/11/2022]
Abstract
Many sequencing studies are now underway to identify the genetic causes for both Mendelian and complex traits. Via exome-sequencing, genes harboring variants implicated in several Mendelian traits have already been identified. The underlying methodology in these studies is a multistep algorithm based on filtering variants identified in a small number of affected individuals and depends on whether they are novel (not yet seen in public resources such as dbSNP), shared among affected individuals, and other external functional information on the variants. Although intuitive, these filter-based methods are nonoptimal and do not provide any measure of statistical uncertainty. We describe here a formal statistical approach that has several distinct advantages: (1) it provides fast computation of approximate p values for individual genes, (2) it adjusts for the background variation in each gene, (3) it allows for incorporation of functional or linkage-based information, and (4) it accommodates designs based on both affected relative pairs and unrelated affected individuals. We show via simulations that the proposed approach can be used in conjunction with the existing filter-based methods to achieve a substantially better ranking of a gene relevant for disease when compared to currently used filter-based approaches; this is especially so in the presence of disease locus heterogeneity. We revisit recent studies on three Mendelian diseases and show that the proposed approach results in the implicated gene being ranked first in all studies, with approximate p values of 10⁻⁶ for the Miller Syndrome gene, 1.0 × 10⁻⁴ for the Freeman-Sheldon Syndrome gene, and 3.5 × 10⁻⁵ for the Kabuki Syndrome gene.
|