1
Dondi A, Borgsmüller N, Ferreira PF, Haas BJ, Jacob F, Heinzelmann-Schwarz V, Beerenwinkel N. De novo detection of somatic variants in high-quality long-read single-cell RNA sequencing data. Genome Res 2025; 35:900-913. [PMID: 40107722 PMCID: PMC12047253 DOI: 10.1101/gr.279281.124] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/06/2024] [Accepted: 02/28/2025] [Indexed: 03/22/2025]
Abstract
In cancer, genetic and transcriptomic variations generate clonal heterogeneity, leading to treatment resistance. Long-read single-cell RNA sequencing (LR scRNA-seq) has the potential to detect genetic and transcriptomic variations simultaneously. Here, we present LongSom, a computational workflow leveraging high-quality LR scRNA-seq data to call de novo somatic single-nucleotide variants (SNVs), including in mitochondria (mtSNVs), copy number alterations (CNAs), and gene fusions, to reconstruct the tumor clonal heterogeneity. Before somatic variant calling, LongSom reannotates marker gene-based cell types using cell mutational profiles. LongSom distinguishes somatic SNVs from noise and germline polymorphisms by applying an extensive set of hard filters and statistical tests. Applying LongSom to human ovarian cancer samples, we detected clinically relevant somatic SNVs that were validated against matched DNA samples. Leveraging somatic SNVs and fusions, LongSom found subclones with different predicted treatment outcomes. In summary, LongSom enables de novo variant detection without the need for normal samples, facilitating the study of cancer evolution, clonal heterogeneity, and treatment resistance.
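The abstract mentions "an extensive set of hard filters and statistical tests" but does not list them. The Python sketch below only illustrates the general idea under loudly stated assumptions: per-site alternative/reference read counts pooled over cancer and non-cancer cells, hypothetical thresholds (`min_depth`, `max_normal_vaf`, `alpha`), and a one-sided Fisher's exact test for alt-allele enrichment in cancer cells. None of these names, cut-offs, or the choice of test come from LongSom.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cancer cells / non-cancer cells; columns: alt reads / ref reads.
    Returns P(X >= a) under the hypergeometric null with fixed margins,
    i.e. the chance of seeing at least `a` alt reads in cancer cells.
    """
    row1 = a + b            # total reads in cancer cells
    col1 = a + c            # total alt reads in both groups
    n = a + b + c + d
    denom = comb(n, row1)
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom

def classify_site(cancer_alt, cancer_ref, normal_alt, normal_ref,
                  min_depth=10, max_normal_vaf=0.05, alpha=0.01):
    """Toy somatic/germline/noise call for one site (all thresholds hypothetical)."""
    # Hard filter 1: require enough coverage in both cell groups.
    if cancer_alt + cancer_ref < min_depth or normal_alt + normal_ref < min_depth:
        return "low_coverage"
    # Hard filter 2: an alt allele also present in non-cancer cells suggests a germline polymorphism.
    if normal_alt / (normal_alt + normal_ref) > max_normal_vaf:
        return "germline"
    # Statistical test: is the alt allele enriched in cancer cells beyond what noise explains?
    p = fisher_one_sided(cancer_alt, cancer_ref, normal_alt, normal_ref)
    return "somatic" if p < alpha else "noise"
```

A site with 30 alt reads out of 100 in cancer cells and none out of 100 in non-cancer cells is called somatic, while a single alt read in cancer cells gives p = 0.5 and is treated as noise.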
Affiliation(s)
- Arthur Dondi
  - Department of Biosystems Science and Engineering, ETH Zurich, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
- Nico Borgsmüller
  - Department of Biosystems Science and Engineering, ETH Zurich, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
- Pedro F Ferreira
  - Department of Biosystems Science and Engineering, ETH Zurich, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
- Brian J Haas
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Francis Jacob
  - Ovarian Cancer Research, Department of Biomedicine, University Hospital Basel and University of Basel, 4031 Basel, Switzerland
- Viola Heinzelmann-Schwarz
  - Ovarian Cancer Research, Department of Biomedicine, University Hospital Basel and University of Basel, 4031 Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4056 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
2
Rossi N, Syed N, Visconti A, Aliyev E, Berry S, Bourbon M, Spector TD, Hysi PG, Fakhro KA, Falchi M. Rare variants at KCNJ2 are associated with LDL-cholesterol levels in a cross-population study. NPJ Genom Med 2024; 9:36. [PMID: 38942744 PMCID: PMC11213907 DOI: 10.1038/s41525-024-00417-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Accepted: 05/03/2024] [Indexed: 06/30/2024]
Abstract
Leveraging whole-genome sequencing data from 1751 individuals from the UK and 2587 Qatari subjects, we suggest an association of rare variants mapping to the sour taste-associated gene KCNJ2 with reduced low-density lipoprotein cholesterol (LDL-C, P = 2.10 × 10⁻¹²) and with 22% lower dietary trans-fat intake. This study identifies a novel candidate rare-variant locus for LDL-C, adding insight into the genetic architecture of a complex trait implicated in cardiovascular disease.
Affiliation(s)
- Niccolò Rossi
  - Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
- Najeeb Syed
  - Department of Human Genetics, Sidra Medical and Research Center, Doha, Qatar
- Alessia Visconti
  - Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
  - Center for Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
- Elbay Aliyev
  - Department of Human Genetics, Sidra Medical and Research Center, Doha, Qatar
- Sarah Berry
  - Department of Nutritional Sciences, King's College London, London, UK
- Mafalda Bourbon
  - Cardiovascular Research Group, Department of Health Promotion and Prevention of non-Communicable Diseases, Instituto Nacional de Saúde Dr. Ricardo Jorge, Lisbon, Portugal
- Tim D Spector
  - Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
- Pirro G Hysi
  - Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
- Khalid A Fakhro
  - Department of Human Genetics, Sidra Medical and Research Center, Doha, Qatar
  - Department of Genetic Medicine, Weill-Cornell Medical College, Doha, Qatar
- Mario Falchi
  - Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
3
Fu L, Wang Y, Li T, Yang S, Hu YQ. A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC. Front Genet 2022; 13:791920. [PMID: 35391794 PMCID: PMC8981031 DOI: 10.3389/fgene.2022.791920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 01/27/2022] [Indexed: 12/02/2022]
Abstract
Genome-wide association studies (GWASs) have successfully discovered numerous variants underlying various diseases. Generally, one-phenotype, one-variant association testing in GWASs is inefficient at identifying variants with weak effects, suggesting that many signals remain undetected. Jointly analyzing multiple phenotypes is now recognized as an important approach to increasing the statistical power for identifying genetic variants with weak effects on complex diseases, shedding new light on potential biological mechanisms. We therefore developed hierarchical clustering based on different methods for calculating correlation coefficients (HCDC) to analyze multiple phenotypes simultaneously in association studies. HCDC involves two steps. First, a clustering approach based on the similarity matrix between two groups of phenotypes is applied to choose a representative phenotype in each cluster. Then, existing methods are used to estimate the genetic associations with the representative phenotypes rather than with the individual phenotypes in every cluster. A variety of simulations demonstrate that, in most scenarios, existing methods embedding HCDC are either more powerful than or comparable to the same methods without HCDC. Additionally, applying existing methods with HCDC to obesity-related phenotypes from the Atherosclerosis Risk in Communities (ARIC) study uncovered several associated variants. Among these, UQCC1-rs1570004 is reported as a significant obesity signal for the first time; its differential expression in subcutaneous fat, visceral fat, and muscle tissue merits further functional study.
Affiliation(s)
- Liwan Fu
  - Center for Non-communicable Disease Management, National Center for Children’s Health, Beijing Children’s Hospital, Capital Medical University, Beijing, China
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yuquan Wang
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Siqian Yang
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
  - Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
4
Yang Y, Sun Q, Huang L, Broome JG, Correa A, Reiner A, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Raffield LM, Yang Y, Li Y. eSCAN: scan regulatory regions for aggregate association testing using whole-genome sequencing data. Brief Bioinform 2022; 23:bbab497. [PMID: 34882196 PMCID: PMC8898002 DOI: 10.1093/bib/bbab497] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Revised: 10/25/2021] [Accepted: 10/30/2021] [Indexed: 02/07/2023]
Abstract
Multiple statistical methods for aggregate association testing have been developed for whole-genome sequencing (WGS) data. Many of them aggregate variants within a given genomic window and ignore existing knowledge when defining test regions, so many identified regions are not clearly linked to genes, which limits biological understanding. Functional information from new technologies (such as Hi-C and its derivatives), which can help link enhancers to their effector genes, can be leveraged to predefine variant sets for aggregate testing in WGS data. Here, we propose the eSCAN (scan the enhancers) method for genome-wide assessment of enhancer regions in sequencing studies, combining the advantages of dynamic window selection in SCANG (SCAN the Genome), a previously developed method, with the advantages of incorporating putative regulatory regions from annotation. By searching within putative enhancers, eSCAN increases statistical power and aids mechanistic interpretation, as demonstrated by extensive simulation studies. We also apply eSCAN to blood cell traits using NHLBI Trans-Omics for Precision Medicine WGS data. Results from real data analysis show that eSCAN captures more significant signals, that these signals are shorter (indicating higher-resolution fine-mapping capability), and that they drive the associations of larger regions detected by other methods.
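The core idea of testing predefined enhancer regions instead of arbitrary windows can be illustrated with a simple burden score test. This is an assumed simplification, not eSCAN itself: eSCAN builds on SCANG's dynamic window selection, whereas the sketch below collapses all variants in each supplied region into one burden per individual and computes a 1-df score-test p-value for a continuous trait without covariates. The region dictionary and the function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import fmean, pvariance

def burden_score_test(genotypes, y, variant_idx):
    """Collapse the variants in `variant_idx` into one burden per individual and
    return a score-test p-value (chi-square, 1 df) for a continuous trait, no covariates.

    genotypes: one list per individual of minor-allele counts per variant.
    """
    burden = [sum(g[j] for j in variant_idx) for g in genotypes]
    ybar, bbar = fmean(y), fmean(burden)
    score = sum((b - bbar) * (yi - ybar) for b, yi in zip(burden, y))
    var_b = sum((b - bbar) ** 2 for b in burden)
    if var_b == 0:              # nobody (or everybody equally) carries these variants
        return 1.0
    stat = score ** 2 / (pvariance(y) * var_b)
    return erfc(sqrt(stat / 2))  # upper tail of a 1-df chi-square

def scan_regions(genotypes, y, positions, regions):
    """Test each predefined region, e.g. a putative enhancer given as (start, end)."""
    results = {}
    for name, (start, end) in regions.items():
        idx = [j for j, pos in enumerate(positions) if start <= pos <= end]
        if idx:                 # skip regions that contain no observed variants
            results[name] = burden_score_test(genotypes, y, idx)
    return results
```

In a real analysis, the regions would come from an enhancer annotation, and the test would include covariates and rare-variant weights.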
Affiliation(s)
- Yingxi Yang
  - Department of Statistics and Data Science, Yale University, New Haven, CT, 06511, USA
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Le Huang
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jai G Broome
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
- Adolfo Correa
  - Department of Medicine and Population Health Science, University of Mississippi Medical Center, Jackson, MS, 39216, USA
- Alexander Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA, 98195, USA
  - Fred Hutchinson Cancer Research Center, University of Washington, Seattle, WA, 98195, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yuchen Yang
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, 510275 Guangzhou, China
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
5
Wang P, Castellani CA, Yao J, Huan T, Bielak LF, Zhao W, Haessler J, Joehanes R, Sun X, Guo X, Longchamps RJ, Manson JE, Grove ML, Bressler J, Taylor KD, Lappalainen T, Kasela S, Van Den Berg DJ, Hou L, Reiner A, Liu Y, Boerwinkle E, Smith JA, Peyser PA, Fornage M, Rich SS, Rotter JI, Kooperberg C, Arking DE, Levy D, Liu C, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. Epigenome-wide association study of mitochondrial genome copy number. Hum Mol Genet 2021; 31:309-319. [PMID: 34415308 PMCID: PMC8742999 DOI: 10.1093/hmg/ddab240] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/16/2021] [Revised: 07/27/2021] [Accepted: 08/11/2021] [Indexed: 01/03/2023]
Abstract
We conducted cohort- and race-specific epigenome-wide association analyses of mitochondrial deoxyribonucleic acid (mtDNA) copy number (mtDNA CN) measured in whole blood from participants of African and European origins in five cohorts (n = 6182, mean age = 57-67 years, 65% women). In the meta-analysis of all the participants, we discovered 21 mtDNA CN-associated DNA methylation sites (CpG) (P < 1 × 10⁻⁷), with a 0.7-3.0 standard deviation increase (3 CpGs) or decrease (18 CpGs) in mtDNA CN corresponding to a 1% increase in DNA methylation. Several significant CpGs have been reported to be associated with at least two risk factors (e.g. chronological age or smoking) for cardiovascular disease (CVD). Five genes [PR/SET domain 16, nuclear receptor subfamily 1 group H member 3 (NR1H3), DNA repair protein, DNA polymerase kappa and decaprenyl-diphosphate synthase subunit 2], which harbor nine significant CpGs, are known to be involved in mitochondrial biosynthesis and functions. For example, NR1H3 encodes a transcription factor that is differentially expressed during an adipose tissue transition. The methylation level of cg09548275 in NR1H3 was negatively associated with mtDNA CN (effect size = -1.71, P = 4 × 10⁻⁸) and was positively associated with the NR1H3 expression level (effect size = 0.43, P = 0.0003), which indicates that the methylation level in NR1H3 may underlie the relationship between mtDNA CN, the NR1H3 transcription factor and energy expenditure. In summary, the study results suggest that mtDNA CN variation in whole blood is associated with DNA methylation levels in genes that are involved in a wide range of mitochondrial activities. These findings will help reveal molecular mechanisms between mtDNA CN and CVD.
Affiliation(s)
- Penglong Wang
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Christina A Castellani
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario N6A 5C1, Canada
- Jie Yao
  - Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Tianxiao Huan
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Lawrence F Bielak
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Zhao
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Jeffrey Haessler
  - Division of Public Health Science, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Roby Joehanes
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Xianbang Sun
  - Department of Biostatistics, Boston University, Boston, MA 02118, USA
- Xiuqing Guo
  - Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Ryan J Longchamps
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- JoAnn E Manson
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Megan L Grove
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Jan Bressler
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Kent D Taylor
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario N6A 5C1, Canada
- Tuuli Lappalainen
  - New York Genome Center, New York, NY 10013, USA
  - Department of Systems Biology, Columbia University, New York, NY 10034, USA
- Silva Kasela
  - New York Genome Center, New York, NY 10013, USA
  - Department of Systems Biology, Columbia University, New York, NY 10034, USA
- David J Van Den Berg
  - Department of Population and Public Health Sciences, Center for Genetic Epidemiology, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Lifang Hou
  - Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Alexander Reiner
  - Division of Public Health Science, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Yongmei Liu
  - Department of Medicine, Divisions of Cardiology and Neurology, Duke University Medical Center, Durham, NC 27704, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jennifer A Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Patricia A Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Myriam Fornage
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Jerome I Rotter
  - Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Charles Kooperberg
  - Division of Public Health Science, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Dan E Arking
  - McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Daniel Levy
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute (NHLBI), Framingham, MA 01702, USA
- Chunyu Liu
  - Department of Biostatistics, Boston University, Boston, MA 02118, USA
  - Framingham Heart Study, National Heart, Lung, and Blood Institute (NHLBI), Framingham, MA 01702, USA
6
Arani AA, Sehhati M, Tabatabaiefar MA. Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization. Sci Rep 2021; 11:23747. [PMID: 34887492 PMCID: PMC8660898 DOI: 10.1038/s41598-021-03230-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Accepted: 11/30/2021] [Indexed: 11/21/2022]
Abstract
Among the many types of genetic variation, missense variants are a major class, a small subset of which may disrupt protein function and ultimately cause human disease. Various machine learning methods have been proposed to differentiate deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. However, a reliable and accurate algorithm for merging such heterogeneous information is still needed, as it could capture the complex network interactions in which genes participate. In this study, we propose a new method based on non-negative matrix tri-factorization clustering. We outline two versions of the method: a two-source algorithm, which aggregates individual deleteriousness prediction methods with a protein-protein interaction (PPI) network, and a three-source algorithm, which additionally incorporates gene-disease associations. Four benchmark datasets were used for internal and external validation of both algorithms. Across all datasets, our method outperformed most state-of-the-art variant prediction tools. Two key features of the method are worth noting. First, although incorporating gene-disease information in the three-source algorithm improves prediction performance over the two-source algorithm, our method is not affected by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when a predictor annotates variants on the basis of the genes on which they are located. Second, our predictor outperforms other ensemble-based methods for variants located in genes for which little is known about pathogenicity.
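To make "non-negative matrix tri-factorization" concrete, the sketch below factorizes a single non-negative matrix X (n × m) as F·S·Gᵀ using the standard multiplicative updates for the squared Frobenius-norm objective. It is a plain, unsupervised, single-source NMTF written in pure Python for readability; it does not reproduce the integrative, supervised two- and three-source algorithms described above, and all function names are illustrative.

```python
import random

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def T(A):
    return [list(r) for r in zip(*A)]

def mult_update(M, num, den, eps=1e-12):
    """Element-wise M * num / den; eps guards against division by zero."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]

def frob_err(X, F, S, G):
    """Squared Frobenius norm of X - F S G^T."""
    R = matmul(matmul(F, S), T(G))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))

def nmtf(X, k1, k2, iters=200, seed=0):
    """Factorize non-negative X (n x m) as F (n x k1) @ S (k1 x k2) @ G^T (k2 x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    F = [[rng.random() for _ in range(k1)] for _ in range(n)]
    S = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    G = [[rng.random() for _ in range(k2)] for _ in range(m)]
    for _ in range(iters):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        GSt = matmul(G, T(S))                              # m x k1
        F = mult_update(F, matmul(X, GSt), matmul(F, matmul(T(GSt), GSt)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        FS = matmul(F, S)                                  # n x k2
        G = mult_update(G, matmul(T(X), FS), matmul(G, matmul(T(FS), FS)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        S = mult_update(S, matmul(matmul(T(F), X), G),
                        matmul(matmul(matmul(T(F), F), S), matmul(T(G), G)))
    return F, S, G
```

Each update multiplies by a ratio of non-negative terms, so the factors stay non-negative and the reconstruction error does not increase from one block update to the next. In an integrative setting, further source matrices (such as a PPI network) would typically be coupled to X through shared factors.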
Affiliation(s)
- Asieh Amousoltani Arani
  - Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
  - Student Research Committee, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammadreza Sehhati
  - Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
  - Deputy of Research and Technology, GTaC Corp, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Amin Tabatabaiefar
  - Deputy of Research and Technology, GTaC Corp, Isfahan University of Medical Sciences, Isfahan, Iran
  - Department of Genetics and Molecular Biology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
7
Fu L, Wang Y, Li T, Hu YQ. A Novel Approach Integrating Hierarchical Clustering and Weighted Combination for Association Study of Multiple Phenotypes and a Genetic Variant. Front Genet 2021; 12:654804. [PMID: 34220938 PMCID: PMC8249926 DOI: 10.3389/fgene.2021.654804] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 02/16/2021] [Accepted: 04/20/2021] [Indexed: 11/26/2022]
Abstract
As a pivotal research tool, genome-wide association studies have successfully identified numerous genetic variants underlying distinct diseases. However, these variants explain only a small proportion of the phenotypic variation for certain diseases, suggesting that more genetic signals remain to be detected. One reason may be that one-phenotype, one-variant association testing is inefficient at detecting variants with weak effects. It is increasingly recognized that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues to their underlying biological mechanisms. We therefore propose a Weighted Combination of multiple phenotypes following Hierarchical Clustering (WCHC) method for analyzing multiple phenotypes simultaneously in association studies. In a series of simulations, WCHC was either the most powerful method or comparable with the most powerful competitor in most scenarios. Additionally, we evaluated WCHC by applying it to obesity-related phenotypes from the Atherosclerosis Risk in Communities study and report several associated variants.
Affiliation(s)
- Liwan Fu
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
  - Center for Non-communicable Disease Management, Beijing Children’s Hospital, Capital Medical University, National Center for Children’s Health, Beijing, China
- Yuquan Wang
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
  - Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
8
Shulman C, Liang E, Kamura M, Udwan K, Yao T, Cattran D, Reich H, Hladunewich M, Pei Y, Savige J, Paterson AD, Suico MA, Kai H, Barua M. Type IV Collagen Variants in CKD: Performance of Computational Predictions for Identifying Pathogenic Variants. Kidney Med 2021; 3:257-266. [PMID: 33851121 PMCID: PMC8039416 DOI: 10.1016/j.xkme.2020.12.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/15/2023]
Abstract
Rationale & Objective: Pathogenic variants in type IV collagen have been reported to account for a significant proportion of chronic kidney disease (CKD). Accordingly, genetic testing is increasingly used to diagnose kidney diseases, but testing may also reveal rare missense variants of uncertain clinical significance. To aid interpretation, computational prediction (in silico) programs may be used to predict whether a variant is clinically important. We evaluate the performance of in silico programs for COL4A3/A4/A5 variants.
Study Design, Setting, & Participants: Rare missense variants in COL4A3/A4/A5 were identified in disease cohorts, including a local focal segmental glomerulosclerosis (FSGS) cohort and publicly available disease databases, in which they are categorized as pathogenic or benign based on clinical criteria.
Tests Compared & Outcomes: All rare missense variants identified in the 4 disease cohorts were subjected to in silico predictions using 12 different programs. The predictions were compared with (1) variant classification (pathogenic or benign) in the cohorts and (2) functional characterization of a smaller, randomly selected set of 17 pathogenic or uncertain-significance variants from the local FSGS cohort.
Results: In silico predictions correctly classified 75% to 97% of pathogenic and 57% to 100% of benign COL4A3/A4/A5 variants in public disease databases. The congruency of in silico predictions was similar for variants categorized as pathogenic and benign, except for benign COL4A5 variants, whose disease effects were overestimated. By contrast, in silico predictions and functional characterization correctly classified all 9 pathogenic COL4A3/A4/A5 variants obtained from the local FSGS cohort. However, these programs also overestimated the effects of variants of uncertain significance compared with functional characterization. Each of the 12 in silico programs yielded similar results.
Limitations: The sensitivity of in silico programs may be overestimated, given that these programs may have been used to categorize the variants labeled as pathogenic in disease repositories.
Conclusions: Our results suggest that in silico predictions are sensitive but not specific for assigning COL4A3/A4/A5 variant pathogenicity, misclassifying benign variants and variants of uncertain significance. Thus, we do not recommend in silico programs, but instead recommend pursuing the more objective levels of evidence suggested by medical genetics guidelines.
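The percentages in the Results are, in effect, sensitivity (share of pathogenic variants predicted pathogenic) and specificity (share of benign variants predicted benign). A minimal sketch of that computation follows; the labels in the usage line are made-up toy values, not data from the study.

```python
def sensitivity_specificity(truth, predicted):
    """truth, predicted: sequences of 'pathogenic' or 'benign' labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == p == "pathogenic" for t, p in pairs)
    fn = sum(t == "pathogenic" and p == "benign" for t, p in pairs)
    tn = sum(t == p == "benign" for t, p in pairs)
    fp = sum(t == "benign" and p == "pathogenic" for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels: every pathogenic variant is flagged, but so are half of the benign ones.
truth = ["pathogenic"] * 4 + ["benign"] * 4
predicted = ["pathogenic"] * 6 + ["benign"] * 2
print(sensitivity_specificity(truth, predicted))  # (1.0, 0.5)
```

This toy predictor shows the "sensitive but not specific" pattern the authors describe.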
Affiliation(s)
- Cole Shulman
- Division of Nephrology, University Health Network, Toronto, Canada.,Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada
| | - Emerald Liang
- Division of Nephrology, University Health Network, Toronto, Canada.,Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada
| | - Misato Kamura
  - Department of Molecular Medicine, Graduate School of Pharmaceutical Science, Kumamoto University, Kumamoto, Japan
- Khalil Udwan
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada
- Tony Yao
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada
- Daniel Cattran
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada; Institute of Medical Sciences, Toronto, Canada; Department of Medicine, Toronto, Canada
- Heather Reich
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada; Institute of Medical Sciences, Toronto, Canada; Department of Medicine, Toronto, Canada
- Michelle Hladunewich
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada; Institute of Medical Sciences, Toronto, Canada; Department of Medicine, Toronto, Canada
- York Pei
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada; Institute of Medical Sciences, Toronto, Canada; Department of Medicine, Toronto, Canada
- Judy Savige
  - University of Melbourne, Melbourne, Australia
- Andrew D Paterson
  - Division of Epidemiology and Biostatistics, Dalla Lana School of Public Health, Toronto, Canada; Genetics and Genome Biology, Research Institute at Hospital for Sick Children, Toronto, Canada
- Mary Ann Suico
  - Department of Molecular Medicine, Graduate School of Pharmaceutical Science, Kumamoto University, Kumamoto, Japan
- Hirofumi Kai
  - Department of Molecular Medicine, Graduate School of Pharmaceutical Science, Kumamoto University, Kumamoto, Japan
- Moumita Barua
  - Division of Nephrology, University Health Network, Toronto, Canada; Toronto General Hospital Research Institute, Toronto General Hospital, Toronto, Canada; Institute of Medical Sciences, Toronto, Canada; Department of Medicine, Toronto, Canada
9
Hahn G, Lutz SM, Hecker J, Prokopenko D, Cho MH, Silverman EK, Weiss ST, Lange C, The NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium. locStra: Fast analysis of regional/global stratification in whole-genome sequencing studies. Genet Epidemiol 2021; 45:82-98. [PMID: 32929743 PMCID: PMC7856019 DOI: 10.1002/gepi.22356] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/13/2020] [Revised: 08/05/2020] [Accepted: 08/24/2020] [Indexed: 01/08/2023]
Abstract
locStra is an R package for the analysis of regional and global population stratification in whole-genome sequencing (WGS) studies, where regional stratification refers to the substructure defined by the loci in a particular region of the genome. Population substructure can be assessed based on the genetic covariance matrix, the genomic relationship matrix, and the unweighted/weighted genetic Jaccard similarity matrix. Using a sliding-window approach, the regional similarity matrices are compared with the global ones, based on user-defined window sizes and metrics, for example, the correlation between regional and global eigenvectors. An algorithm for choosing the window size is provided. Because the implementation fully exploits sparse matrix algebra and is written in C++, the analysis is highly efficient. Even on a single core, for realistic study sizes (several thousand subjects, several million rare variants per subject), the runtime for the genome-wide computation of all regional similarity matrices typically does not exceed one hour, enabling an unprecedented investigation of regional stratification across the entire genome. The package is applied to three WGS studies, illustrating the varying patterns of regional substructure across the genome and its beneficial effects on association testing.
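The sliding-window comparison described in this abstract can be illustrated with a small NumPy sketch. It is not locStra's R interface: it assumes a dense 0/1 carrier matrix (subjects × variants), windows defined by a fixed number of consecutive variants, the unweighted Jaccard similarity only, and the absolute correlation of leading eigenvectors as the comparison metric. locStra itself uses sparse matrix algebra in C++ and supports the other similarity matrices listed above.

```python
import numpy as np

def jaccard_matrix(G):
    """Unweighted genetic Jaccard similarity between subjects.

    G: (n_subjects, n_variants) array; any value > 0 means the subject
    carries the rare allele at that variant.
    """
    X = (np.asarray(G) > 0).astype(float)
    inter = X @ X.T                          # variants carried by both subjects
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter
    # Similarity is set to 0 when neither subject carries any variant
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(union > 0, inter / union, 0.0)

def leading_eigvec(S):
    # eigh returns eigenvalues of a symmetric matrix in ascending order
    _, vecs = np.linalg.eigh(S)
    return vecs[:, -1]

def regional_vs_global(G, window, step):
    """Return (window_start, |corr|) between the leading eigenvector of each
    regional Jaccard matrix and that of the global Jaccard matrix."""
    G = np.asarray(G)
    v_global = leading_eigvec(jaccard_matrix(G))
    results = []
    for start in range(0, G.shape[1] - window + 1, step):
        v_reg = leading_eigvec(jaccard_matrix(G[:, start:start + window]))
        r = np.corrcoef(v_reg, v_global)[0, 1]
        results.append((start, abs(r)))      # eigenvector sign is arbitrary
    return results
```

Windows whose regional eigenvector correlates weakly with the global one are candidates for region-specific substructure; taking the absolute value avoids spurious negative correlations caused by the arbitrary sign of eigenvectors.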
Affiliation(s)
- Georg Hahn
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
- Sharon M. Lutz
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
- Julian Hecker
  - Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Dmitry Prokopenko
  - Massachusetts General Hospital, Harvard University, Boston, Massachusetts, USA
- Michael H. Cho
  - Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Edwin K. Silverman
  - Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Scott T. Weiss
  - Department of Medicine, Brigham and Women's Hospital, Harvard University, Boston, Massachusetts, USA
- Christoph Lange
  - Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
10
Quick C, Wen X, Abecasis G, Boehnke M, Kang HM. Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis. PLoS Genet 2020; 16:e1009060. [PMID: 33320851 PMCID: PMC7737906 DOI: 10.1371/journal.pgen.1009060] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/29/2019] [Accepted: 08/18/2020] [Indexed: 11/19/2022]
Abstract
Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early applications of gene-based tests often focused on rare coding variants; a more recent wave of gene-based methods, e.g., TWAS, uses eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare the power and accuracy of single-annotation, omnibus, and annotation-agnostic gene-based tests in identifying causal genes, in simulation studies and in an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and improves performance in identifying causal genes.
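To make "annotation-weighted" and "omnibus" gene-based tests concrete, here is a generic sketch using summary statistics. It is not the authors' tool: it assumes per-variant z-scores for one gene, an LD (correlation) matrix R from a matching reference panel, one weight vector per annotation, a weighted burden statistic T = w'z / sqrt(w'Rw) per annotation, and the Cauchy (ACAT) combination as the omnibus step. Both techniques are standard, but the choice of them here is an illustration.

```python
import math
import numpy as np

def weighted_burden_z(z, R, w):
    """Annotation-weighted burden statistic from per-variant z-scores.

    If z ~ N(0, R) under the null, T = w'z / sqrt(w'Rw) is standard normal.
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    R = np.asarray(R, dtype=float)
    return float(w @ z / math.sqrt(w @ R @ w))

def two_sided_p(t):
    """Two-sided normal p-value for a z statistic."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def cauchy_combine(pvals):
    """Equal-weight Cauchy (ACAT) combination of possibly correlated p-values."""
    stat = float(np.mean([math.tan((0.5 - p) * math.pi) for p in pvals]))
    return 0.5 - math.atan(stat) / math.pi
```

For example, one could compute burden p-values with coding-only weights, with tissue-specific regulatory weights and with uniform weights, then combine them with `cauchy_combine`; the Cauchy combination is convenient because it stays valid under arbitrary dependence between the annotation-specific tests, at least approximately for small p-values.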
Affiliation(s)
- Corbin Quick
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Xiaoquan Wen
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Gonçalo Abecasis
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
  - Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown, NY, USA
- Michael Boehnke
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Hyun Min Kang
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
11
[An improved association analysis pipeline for tumor susceptibility variant in haplotype amplification area]. NAN FANG YI KE DA XUE XUE BAO = JOURNAL OF SOUTHERN MEDICAL UNIVERSITY 2020; 40:1493-1499. [PMID: 33118521 PMCID: PMC7606235 DOI: 10.12122/j.issn.1673-4254.2020.10.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
Abstract
OBJECTIVE Haplotype amplification of germline variants has been suggested to indicate potential selective advantages and susceptibility to clonal expansion, and it has become an important signature in the search for cancer susceptibility genes. Here we propose an improved association method that fully considers haplotype amplification status. METHODS Haplotype amplification status was estimated from variant allele frequencies. We applied a permutation test on variant allele frequencies to divide the candidate variants into multiple groups. A likelihood clustering method was then used to establish the neighborhood system of a hidden Markov random field framework. A filtering pipeline, including a Wilson's interval filter and a false discovery rate controller, was incorporated into the proposed method to further refine the candidate variants. The final candidate set, together with the haplotype amplification status, was collapsed into weighted virtual sites for association testing. RESULTS In simulation tests on a series of datasets, the type I error rates across different minor allele frequencies consistently remained within 2%, suggesting that the algorithm is robust. In addition, we compared the type I and type II error rates of the proposed method with those of five other published association approaches, with error rates all within 2%, demonstrating the significant advantages and good statistical performance of the proposed method. CONCLUSIONS The proposed method can accurately identify tumor susceptibility variants in haplotype amplification regions with good robustness and stability.
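The filtering step names two standard tools, a Wilson score interval and a false discovery rate controller. The sketch below shows generic versions of both; how the paper parameterizes them is not given in the abstract. The allelic-imbalance criterion (a heterozygous site whose variant-allele-frequency interval excludes 0.5) is a hypothetical example of how such a filter might be used, and Benjamini-Hochberg is assumed as the FDR procedure.

```python
import math

def wilson_interval(alt, depth, z=1.96):
    """Wilson score interval for a variant allele frequency alt/depth."""
    if depth == 0:
        return (0.0, 1.0)                    # no reads: uninformative interval
    p = alt / depth
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def imbalanced(alt, depth, z=1.96):
    """Hypothetical filter: flag a heterozygous site whose VAF interval excludes 0.5."""
    lo, hi = wilson_interval(alt, depth, z)
    return hi < 0.5 or lo > 0.5

def benjamini_hochberg(pvals, q=0.05):
    """Return the indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:         # largest rank satisfying the BH bound
            k = rank
    return sorted(order[:k])
```

The Wilson interval is preferred over the simple normal-approximation interval here because it stays inside [0, 1] and behaves better at low read depth, which is common for individual variant sites.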
12
Posner DC, Lin H, Meigs JB, Kolaczyk ED, Dupuis J. Convex combination sequence kernel association test for rare-variant studies. Genet Epidemiol 2020; 44:352-367. [PMID: 32100372 PMCID: PMC7205561 DOI: 10.1002/gepi.22287] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/11/2019] [Revised: 12/17/2019] [Accepted: 01/27/2020] [Indexed: 02/06/2023]
Abstract
We propose a novel variant set test for rare-variant association studies, which leverages multiple single-nucleotide variant (SNV) annotations. Our approach optimizes a convex combination of different sequence kernel association test (SKAT) statistics, where each statistic is constructed from a different annotation and combination weights are optimized through a multiple kernel learning algorithm. The combination test statistic is evaluated empirically through data splitting. In simulations, we find our method preserves type I error at α = 2.5 × 10⁻⁶ and has greater power than SKAT(-O) when SNV weights are not misspecified and sample sizes are large (N ≥ 5,000). We utilize our method in the Framingham Heart Study (FHS) to identify SNV sets associated with fasting glucose. While we are unable to detect any genome-wide significant associations between fasting glucose and 4-kb windows of rare variants (p < 10⁻⁷) in 6,419 FHS participants, our method identifies suggestive associations between fasting glucose and rare variants near ROCK2 (p = 2.1 × 10⁻⁵) and within CPLX1 (p = 5.3 × 10⁻⁵). These two genes were previously reported to be involved in obesity-mediated insulin resistance and glucose-induced insulin secretion by pancreatic beta-cells, respectively. These findings will need to be replicated in other cohorts and validated by functional genomic studies.
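A convex combination of SKAT statistics is itself a SKAT statistic for the combined kernel: if Q_k = r'K_k r with K_k = G diag(w_k) G', then the sum of eta_k Q_k equals r'(sum of eta_k K_k)r for non-negative eta summing to 1. The NumPy sketch below shows this with fixed weights eta. It assumes an intercept-only null model (residuals are y minus its mean) and uses a simple permutation p-value as a stand-in; the paper's multiple kernel learning optimization of eta and its data-splitting evaluation are not reproduced.

```python
import numpy as np

def skat_q(resid, G, w):
    """SKAT statistic Q = r' G diag(w) G' r = sum_j w_j * (g_j' r)^2."""
    scores = np.asarray(G, dtype=float).T @ np.asarray(resid, dtype=float)
    return float(np.sum(np.asarray(w, dtype=float) * scores ** 2))

def combined_q(resid, G, weight_sets, eta):
    """Convex combination sum_k eta_k * Q_k over annotation-specific weights."""
    eta = np.asarray(eta, dtype=float)
    if np.any(eta < 0) or not np.isclose(eta.sum(), 1.0):
        raise ValueError("eta must be non-negative and sum to 1")
    return float(sum(e * skat_q(resid, G, w) for e, w in zip(eta, weight_sets)))

def permutation_p(y, G, weight_sets, eta, n_perm=999, seed=0):
    """Permutation p-value under an intercept-only null model."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    r = y - y.mean()
    observed = combined_q(r, G, weight_sets, eta)
    hits = sum(
        combined_q(rng.permutation(r), G, weight_sets, eta) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

Each weight vector in `weight_sets` plays the role of one SNV annotation (for example, a predicted-deleteriousness score per variant); with covariates, the residuals would come from a fitted null regression model instead.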
Affiliation(s)
- Daniel C Posner
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
- Honghuang Lin
  - National Heart Lung and Blood Institute's, Boston University's Framingham Heart Study, Framingham, Massachusetts
  - Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, Massachusetts
- James B Meigs
  - Division of General Internal Medicine, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Eric D Kolaczyk
  - Department of Mathematics and Statistics, Boston University, Boston, Massachusetts
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
  - National Heart Lung and Blood Institute's, Boston University's Framingham Heart Study, Framingham, Massachusetts
13
Longchamps RJ, Castellani CA, Yang SY, Newcomb CE, Sumpter JA, Lane J, Grove ML, Guallar E, Pankratz N, Taylor KD, Rotter JI, Boerwinkle E, Arking DE. Evaluation of mitochondrial DNA copy number estimation techniques. PLoS One 2020; 15:e0228166. [PMID: 32004343 PMCID: PMC6994099 DOI: 10.1371/journal.pone.0228166] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 10/10/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022]
Abstract
Mitochondrial DNA copy number (mtDNA-CN), a measure of the number of mitochondrial genomes per cell, is a minimally invasive proxy measure of mitochondrial function and has been associated with several aging-related diseases. Although quantitative real-time PCR (qPCR) is the current gold-standard method for measuring mtDNA-CN, mtDNA-CN can also be measured from genotyping microarray probe intensities and DNA sequencing read counts. To comprehensively examine the performance of these methods, we use known mtDNA-CN correlates (age, sex, white blood cell count, Duffy locus genotype, incident cardiovascular disease) to evaluate mtDNA-CN calculated from qPCR, two microarray platforms, and whole-genome sequencing (WGS) and whole-exome sequencing (WES) data across 1,085 participants from the Atherosclerosis Risk in Communities (ARIC) study and 3,489 participants from the Multi-Ethnic Study of Atherosclerosis (MESA). We observe that mtDNA-CN derived from WGS data is significantly more associated with known correlates than mtDNA-CN from all other methods (p < 0.001). Additionally, mtDNA-CN measured from WGS is, on average, more significantly associated with traits by 5.6 orders of magnitude and has effect size estimates 5.8 times more extreme than the current gold standard of qPCR. We further investigated the role of the DNA extraction method in the reproducibility of mtDNA-CN estimates and found that mtDNA-CN estimated from cell lysate is significantly less variable than that from traditional phenol-chloroform-isoamyl alcohol extraction (p = 5.44 × 10⁻⁴) and silica-based column selection (p = 2.82 × 10⁻⁷). In conclusion, we recommend that the field move toward more accurate methods for measuring mtDNA-CN and re-analyze trait associations as more WGS data become available from larger initiatives such as TOPMed.
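As a worked illustration of measuring mtDNA-CN "from DNA sequencing read counts", a common read-depth estimator compares mean coverage of the mitochondrial genome with mean autosomal coverage and multiplies by 2 because the nuclear genome is diploid. The abstract does not give the exact WGS formula the authors used (their pipeline may apply further corrections), so treat this as an assumed, simplified estimator. The 16,569 bp default is the length of the revised Cambridge Reference Sequence; the autosomal region length must be supplied by the caller.

```python
def mean_coverage(n_reads, read_length, region_length):
    """Average sequencing depth over a region: total aligned bases / region length."""
    return n_reads * read_length / region_length

def mtdna_cn(mt_reads, autosomal_reads, read_length, autosomal_length,
             mt_length=16569):
    """Read-depth estimate of mitochondrial genomes per cell (assumed estimator).

    mtDNA-CN = 2 * (mean mtDNA coverage / mean autosomal coverage),
    the factor 2 reflecting two autosomal copies per diploid cell.
    """
    mt_cov = mean_coverage(mt_reads, read_length, mt_length)
    auto_cov = mean_coverage(autosomal_reads, read_length, autosomal_length)
    return 2.0 * mt_cov / auto_cov
```

For example, 331,380 mitochondrial reads of 150 bp give a mean mtDNA coverage of 3,000×; against a 30× autosomal coverage, the estimate is 2 × 3000 / 30 = 200 copies per cell.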
Affiliation(s)
- Ryan J. Longchamps
  - Department of Genetic Medicine, McKusick-Nathans Institute, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Christina A. Castellani
  - Department of Genetic Medicine, McKusick-Nathans Institute, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Stephanie Y. Yang
  - Department of Genetic Medicine, McKusick-Nathans Institute, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Charles E. Newcomb
  - Department of Genetic Medicine, McKusick-Nathans Institute, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Jason A. Sumpter
  - Department of Genetic Medicine, McKusick-Nathans Institute, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- John Lane
  - Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, United States of America
- Megan L. Grove
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Eliseo Guallar
  - Department of Epidemiology and the Welch Center for Prevention, Epidemiology and Clinical Research, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States of America
- Nathan Pankratz
  - Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, United States of America
- Kent D. Taylor
  - LABioMed and Department of Pediatrics, at Harbor-UCLA Medical Center, Institute for Translational Genomics and Population Sciences, Torrance, CA, United States of America
- Jerome I. Rotter
  - LABioMed and Department of Pediatrics, at Harbor-UCLA Medical Center, Institute for Translational Genomics and Population Sciences, Torrance, CA, United States of America
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, United States of America
- Dan E. Arking
  - Department of Genetic Medicine, McKusick-Nathans Institute, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
14
Höglund J, Rafati N, Rask-Andersen M, Enroth S, Karlsson T, Ek WE, Johansson Å. Improved power and precision with whole genome sequencing data in genome-wide association studies of inflammatory biomarkers. Sci Rep 2019; 9:16844. [PMID: 31727947 PMCID: PMC6856527 DOI: 10.1038/s41598-019-53111-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 06/11/2019] [Accepted: 10/26/2019] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies (GWAS) have identified associations between thousands of common genetic variants and human traits. However, common variants usually explain a limited fraction of the heritability of a trait. A powerful resource for identifying trait-associated variants is whole-genome sequencing (WGS) data in cohorts composed of families or of individuals from a limited geographical area. To evaluate the power of WGS compared with imputation, we performed GWAS on WGS data for 72 inflammatory biomarkers in a kinship-structured cohort. Using WGS data, we identified 18 novel associations that were not detected when analyzing the same biomarkers with genotyped or imputed SNPs. Five of the novel top variants were low-frequency variants with a minor allele frequency (MAF) below 5%. Our results suggest that, even with a GWAS approach, WGS data provide greater power and precision, presumably owing to more accurate genotype determination. A limitation of our study is the lack of a comparable dataset for replicating our results; however, this further highlights the need for more genetic epidemiological studies based on WGS data.
Affiliation(s)
- Julia Höglund
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Nima Rafati
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Mathias Rask-Andersen
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Stefan Enroth
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Torgny Karlsson
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Weronica E Ek
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Åsa Johansson
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
15
Abstract
Common diseases, the so-called widespread diseases ("Volkskrankheiten"), are generally multifactorial in origin; that is, both genetic factors and non-genetic environmental influences contribute to their development. The estimated overall heritability ranges from moderate to comparatively high. The genetic architecture is complex and can encompass the entire allelic spectrum, from common variants with low penetrance to rare variants with higher penetrance, as well as all possible combinations thereof. Whereas common variants have been identified with great success through genome-wide association studies (GWAS) for several years, the identification of rare variants has so far met with only limited success, largely because of the large number of contributing genes. This is currently changing thanks to the application of high-throughput sequencing technologies (next-generation sequencing, NGS) and the resulting increase in the availability of exome-wide and genome-wide sequence data from large cohorts. In this article, we give an overview of the importance of rare variants in common diseases and of the current state of their identification by NGS. In particular, we consider the following questions: For which common diseases is a contribution of rare variants to be expected, how can these variants be identified, and what potential do rare variants offer for understanding biological processes and for translation into clinical practice?
Affiliation(s)
- Kerstin U. Ludwig
  - Emmy Noether Group "Craniofacial Genomics", Institute of Human Genetics, University Hospital Bonn, Venusberg-Campus 1, Building 76, 53127 Bonn, Germany
- Franziska Degenhardt
  - Institute of Human Genetics, University Hospital Bonn, Bonn, Germany
- Markus M. Nöthen
  - Institute of Human Genetics, University Hospital Bonn, Bonn, Germany
16
Naj AC, Lin H, Vardarajan BN, White S, Lancour D, Ma Y, Schmidt M, Sun F, Butkiewicz M, Bush WS, Kunkle BW, Malamon J, Amin N, Choi SH, Hamilton-Nelson KL, van der Lee SJ, Gupta N, Koboldt DC, Saad M, Wang B, Nato AQ, Sohi HK, Kuzma A, Wang LS, Cupples LA, van Duijn C, Seshadri S, Schellenberg GD, Boerwinkle E, Bis JC, Dupuis J, Salerno WJ, Wijsman EM, Martin ER, DeStefano AL. Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project. Genomics 2019; 111:808-818. [PMID: 29857119 PMCID: PMC6397097 DOI: 10.1016/j.ygeno.2018.05.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 11/29/2017] [Revised: 04/03/2018] [Accepted: 05/06/2018] [Indexed: 12/30/2022]
Abstract
The Alzheimer's Disease Sequencing Project (ADSP) performed whole-genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single-nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format (VCF) files from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
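The per-genotype logic of "consensus calling" and of a parent-offspring Mendelian check can be sketched as follows. This is a simplified illustration, not the ADSP protocol: it assumes genotypes coded as alternate-allele counts (0, 1, 2) with None for missing or filtered calls, and, as the percentages in the abstract suggest, that genotypes called by only one pipeline are retained while discordant ones are excluded. For a single parent-offspring pair, the only detectable inconsistency is opposite homozygotes, because a child must inherit one allele from each parent.

```python
from typing import Optional

Genotype = Optional[int]  # alt-allele count 0/1/2; None = not called or filtered out

def consensus_genotype(gatk: Genotype, atlas: Genotype) -> Genotype:
    """Merge one genotype from the two pipelines (simplified consensus rule)."""
    if gatk is None:
        return atlas          # exclusive to Atlas (or missing in both)
    if atlas is None:
        return gatk           # exclusive to GATK
    if gatk == atlas:
        return gatk           # concordant call
    return None               # discordant call: excluded

def mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """Parent-offspring check: opposite homozygotes (0 vs 2) share no allele."""
    if parent is None or child is None:
        return False
    return {parent, child} == {0, 2}
```

Counting how often `mendelian_inconsistent` fires per genotype-quality bin is one way such MI rates could be used to choose additional genotype filters, as the abstract describes for the GATK calls.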
Affiliation(s)
- Adam C Naj
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Honghuang Lin
  - Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Badri N Vardarajan
  - Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Simon White
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Daniel Lancour
  - Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Yiyi Ma
  - Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA
- Michael Schmidt
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Fangui Sun
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Mariusz Butkiewicz
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- William S Bush
  - Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Brian W Kunkle
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- John Malamon
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Najaf Amin
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Seung Hoan Choi
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Kara L Hamilton-Nelson
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Sven J van der Lee
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Namrata Gupta
  - Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA
- Daniel C Koboldt
  - Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Mohamad Saad
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Bowen Wang
  - Department of Statistics, University of Washington, Seattle, WA, USA
- Alejandro Q Nato
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Harkirat K Sohi
  - Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Amanda Kuzma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- L Adrienne Cupples
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
- Cornelia van Duijn
  - Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands
- Sudha Seshadri
  - The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
- Gerard D Schellenberg
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Eric Boerwinkle
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Human Genetics Center, University of Texas Health Science Center, Houston, TX, USA
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Josée Dupuis
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA
- William J Salerno
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ellen M Wijsman
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Eden R Martin
  - John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA
- Anita L DeStefano
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
17
Zhang Y, Zhou Y, van der Mei IAF, Simpson S, Ponsonby AL, Lucas RM, Tettey P, Charlesworth J, Kostner K, Taylor BV. Lipid-related genetic polymorphisms significantly modulate the association between lipids and disability progression in multiple sclerosis. J Neurol Neurosurg Psychiatry 2019; 90:636-641. [PMID: 30782980 DOI: 10.1136/jnnp-2018-319870] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/24/2018] [Revised: 12/14/2018] [Accepted: 12/24/2018] [Indexed: 12/31/2022]
Abstract
OBJECTIVE To investigate whether lipid-related or body mass index (BMI)-related common genetic polymorphisms modulate the associations between serum lipid levels, BMI and disability progression in multiple sclerosis (MS). METHODS The association between disability progression (annualised Expanded Disability Status Scale (EDSS) change over 5 years, ΔEDSS) and lipid-related or BMI-related genetic polymorphisms was evaluated in a longitudinal cohort of people diagnosed with MS (n=184). We constructed a cumulative genetic risk score (CGRS) of associated polymorphisms (p<0.05) and examined the interactions between the CGRS and lipid levels (measured at baseline) in predicting ΔEDSS. All analyses were conducted using linear regression. RESULTS Five lipid polymorphisms (rs2013208, rs9488822, rs17173637, rs10401969 and rs2277862) and one BMI polymorphism (rs2033529) were nominally associated with ΔEDSS. The constructed lipid CGRS showed a significant, dose-dependent association with ΔEDSS (p_trend = 1.4 × 10⁻⁶), such that participants with ≥6 risk alleles progressed 0.38 EDSS points per year faster than those with ≤3. This CGRS model explained 16% of the variance in ΔEDSS. We also found significant interactions between the CGRS and lipid levels in modulating ΔEDSS, including high-density lipoprotein (HDL; p_interaction = 0.005) and the total cholesterol to high-density lipoprotein ratio (TC:HDL; p_interaction = 0.030). The combined model (CGRS plus the lipid parameter) explained 26% of the disability variance for HDL and 27% for TC:HDL. INTERPRETATION In this prospective cohort study, lipid levels and lipid-related polymorphisms, both individually and jointly, were associated with significantly increased disability progression in MS. These results indicate that these polymorphisms and their tagged genes might be potential points of intervention to moderate disability progression.
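The two analysis steps described above, building a cumulative risk-allele score and testing its interaction with a lipid measure in a linear regression, can be sketched with NumPy. This assumes an unweighted CGRS (a plain count of risk alleles, consistent with the "≥6 risk alleles" grouping) and reports only coefficients and R²; the standard errors and p-values that the study reports would need a full regression package.

```python
import numpy as np

def cgrs(risk_allele_counts):
    """Cumulative genetic risk score: total risk alleles per person.

    Rows are people, columns are polymorphisms, entries are 0, 1 or 2
    copies of the risk allele (unweighted sum: an assumption).
    """
    return np.asarray(risk_allele_counts).sum(axis=1)

def fit_interaction(delta_edss, score, lipid):
    """OLS fit of delta_edss ~ 1 + score + lipid + score:lipid.

    Returns (coefficients in that order, R^2).
    """
    y = np.asarray(delta_edss, dtype=float)
    s = np.asarray(score, dtype=float)
    lip = np.asarray(lipid, dtype=float)
    X = np.column_stack([np.ones_like(s), s, lip, s * lip])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)
```

The coefficient on `score * lipid` is the interaction term: a non-zero value means the association between the lipid measure and ΔEDSS changes with the number of risk alleles a person carries.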
Affiliation(s)
- Yan Zhang
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
- Yuan Zhou
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
- Ingrid A F van der Mei
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
- Steve Simpson
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia; Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, Victoria, Australia
- Anne-Louise Ponsonby
  - Murdoch Children's Research Institute, The University of Melbourne, Melbourne, Victoria, Australia
- Robyn M Lucas
  - National Centre for Epidemiology and Population Health, Research School of Population Health, College of Medicine, Biology and Environment, Australian National University, Canberra, Australian Capital Territory, Australia
- Prudence Tettey
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia; School of Public Health, University of Ghana, Accra, Ghana
- Jac Charlesworth
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
- Karam Kostner
  - Mater Hospital, University of Queensland, Brisbane, Queensland, Australia
- Bruce V Taylor
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
18
Li Z, Li X, Liu Y, Shen J, Chen H, Zhou H, Morrison AC, Boerwinkle E, Lin X. Dynamic Scan Procedure for Detecting Rare-Variant Association Regions in Whole-Genome Sequencing Studies. Am J Hum Genet 2019; 104:802-814. [PMID: 30982610 PMCID: PMC6507043 DOI: 10.1016/j.ajhg.2019.03.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 11/25/2018] [Accepted: 03/01/2019] [Indexed: 11/19/2022]
Abstract
Whole-genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses have limited power for rare variants, so variant-set-based analyses are commonly used instead. However, existing variant-set-based approaches need the genetic regions for analysis to be pre-specified. They are therefore not directly applicable to WGS data, whose many intergenic and intronic regions contain a massive number of non-coding variants. The commonly used sliding-window method requires fixed window sizes to be pre-specified. These sizes are often unknown a priori and difficult to specify in practice. They are also limiting, because the sizes of genetic-association regions are likely to vary across the genome and across phenotypes. We propose a computationally efficient, dynamic scan-statistic method (Scan the Genome [SCANG]) for analyzing WGS data. This method flexibly detects the sizes and locations of rare-variant association regions without the need to specify a fixed window size a priori. The proposed method controls the genome-wide type I error rate and accounts for linkage disequilibrium among genetic variants. It allows the detected sizes of rare-variant association regions to vary across the genome. Through extensive simulation studies covering a wide variety of scenarios, we show that SCANG substantially outperforms several alternative methods for detecting rare-variant associations while controlling the genome-wide type I error rate. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
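A toy sketch can show the core idea of scanning windows of every admissible size instead of one fixed sliding-window width. It is not SCANG itself, and it rests on these assumptions:

- Inputs are hypothetical per-variant score statistics and variances.
- Variants are treated as independent, whereas SCANG accounts for linkage disequilibrium.
- Each window is scored with a simple burden-type statistic.
- The step that sets a genome-wide significance threshold is omitted.

```python
from itertools import accumulate


def scan_windows(scores, variances, min_size, max_size):
    """Return (stat, start, end) of the window maximizing a burden-type statistic.

    Windows are half-open [start, end) and span min_size..max_size variants.
    Toy statistic: (sum of scores)^2 / (sum of variances), assuming independence.
    """
    n = len(scores)
    cum_u = [0.0] + list(accumulate(scores))  # prefix sums give O(1) window sums
    cum_v = [0.0] + list(accumulate(variances))
    best = (float("-inf"), -1, -1)
    for size in range(min_size, min(max_size, n) + 1):
        for start in range(n - size + 1):
            end = start + size
            u = cum_u[end] - cum_u[start]
            v = cum_v[end] - cum_v[start]
            stat = u * u / v if v > 0 else 0.0
            if stat > best[0]:
                best = (stat, start, end)
    return best


# Made-up scores with a signal cluster at variants 2-4.
scores = [0.0, 0.0, 3.0, 3.0, 3.0, 0.0, 0.0]
variances = [1.0] * len(scores)
print(scan_windows(scores, variances, min_size=2, max_size=4))  # (27.0, 2, 5)
```

The nested loops visit every window between the minimum and maximum size. This is what lets the detected region size adapt to the data. The prefix sums keep each window evaluation constant-time.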
Affiliation(s)
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Yaowu Liu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Jincheng Shen
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84108, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Center for Precision Health, School of Public Health and School of Biomedical Informatics, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA
19
Brody JA, Morrison AC, Bis JC, O'Connell JR, Brown MR, Huffman JE, Ames DC, Carroll A, Conomos MP, Gabriel S, Gibbs RA, Gogarten SM, Gupta N, Jaquish CE, Johnson AD, Lewis JP, Liu X, Manning AK, Papanicolaou GJ, Pitsillides AN, Rice KM, Salerno W, Sitlani CM, Smith NL, Heckbert SR, Laurie CC, Mitchell BD, Vasan RS, Rich SS, Rotter JI, Wilson JG, Boerwinkle E, Psaty BM, Cupples LA. Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology. Nat Genet 2017; 49:1560-1563. [PMID: 29074945 DOI: 10.1038/ng.3968] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Indexed: 12/17/2022]
Affiliation(s)
- Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA
- Jeffrey R O'Connell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland, Baltimore, Maryland, USA
- Michael R Brown
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Jennifer E Huffman
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, Massachusetts, USA
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Stacey Gabriel
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Namrata Gupta
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, USA
- Cashell E Jaquish
- National Heart, Lung, and Blood Institute, Division of Cardiovascular Sciences, Bethesda, Maryland, USA
- Andrew D Johnson
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, Massachusetts, USA
- Joshua P Lewis
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland, Baltimore, Maryland, USA
- Xiaoming Liu
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Alisa K Manning
- Center for Human Genetics Research, Massachusetts General Hospital, Boston, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
- George J Papanicolaou
- National Heart, Lung, and Blood Institute, Division of Cardiovascular Sciences, Bethesda, Maryland, USA
- Achilleas N Pitsillides
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, Massachusetts, USA
- Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- William Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Colleen M Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA
- Nicholas L Smith
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA; Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA; Seattle Epidemiologic Research and Information Center, Department of Veterans Affairs Office of Research and Development, Seattle, Washington, USA; Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Susan R Heckbert
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA; Department of Epidemiology, University of Washington, Seattle, Washington, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, Washington, USA
- Braxton D Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland, Baltimore, Maryland, USA; Geriatrics Research and Education Clinical Center, Baltimore Veterans Administration Medical Center, Baltimore, Maryland, USA
- Ramachandran S Vasan
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, Massachusetts, USA; Sections of Preventive Medicine and Epidemiology, and of Cardiology, Department of Medicine, Boston University School of Medicine, Boston, Massachusetts, USA; Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Departments of Pediatrics and Medicine, LABioMed at Harbor-UCLA Medical Center, Torrance, California, USA
- James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, Mississippi, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA; Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA; Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, Washington, USA
- L Adrienne Cupples
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, Massachusetts, USA; Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, USA
20
Ma L, Cole J, Da Y, VanRaden P. Symposium review: Genetics, genome-wide association study, and genetic improvement of dairy fertility traits. J Dairy Sci 2019; 102:3735-3743. [DOI: 10.3168/jds.2018-15269] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 06/23/2018] [Accepted: 08/16/2018] [Indexed: 12/13/2022]
21
High-Throughput Sequencing in Respiratory, Critical Care, and Sleep Medicine Research. An Official American Thoracic Society Workshop Report. Ann Am Thorac Soc 2019; 16:1-16. [PMID: 30592451 PMCID: PMC6812157 DOI: 10.1513/annalsats.201810-716ws] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/30/2022]
Abstract
High-throughput, "next-generation" sequencing methods are now being broadly applied across all fields of biomedical research, including respiratory disease, critical care, and sleep medicine. Although there are numerous review articles and best practice guidelines related to sequencing methods and data analysis, there are fewer resources summarizing issues related to study design and interpretation, especially as applied to common, complex, nonmalignant diseases. To address these gaps, a single-day workshop was held at the American Thoracic Society meeting in May 2017, led by the American Thoracic Society Section on Genetics and Genomics. The aim of this workshop was to review the design, analysis, interpretation, and functional follow-up of high-throughput sequencing studies in respiratory, critical care, and sleep medicine research. This workshop brought together experts in multiple fields, including genetic epidemiology, biobanking, bioinformatics, and research ethics, along with physician-scientists with expertise in a range of relevant diseases. The workshop focused on application of DNA and RNA sequencing research in common chronic diseases and did not cover sequencing studies in lung cancer, monogenic diseases (e.g., cystic fibrosis), or microbiome sequencing. Participants reviewed and discussed study design, data analysis and presentation, interpretation, functional follow-up, and reporting of results. This report summarizes the main conclusions of the workshop, specifically addressing the application of these methods in respiratory, critical care, and sleep medicine research. This workshop report may serve as a resource for our research community as well as for journal editors and reviewers of sequencing-based manuscript submissions in our research field.
22
Gao TH, Zhang J, Miguelangel DM, Wang X. Methods to evaluate rare variants gene-age interaction for triglycerides. BMC Proc 2018; 12:49. [PMID: 30263050 PMCID: PMC6156913 DOI: 10.1186/s12919-018-0136-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/13/2023]
Abstract
Triglycerides are an important measure of heart health. More than 90 genes have been found to be associated with lipids, but they explain only 12 to 15% of the variance in lipid levels. Evidence suggests that age may interact with the genetic effect on lipid levels. Existing methods for detecting the main effect of rare variants cannot be readily applied to test the gene-environment interaction effect of rare variants. When a main effect exists, those methods either give unstable results or have inflated Type I error rates. To overcome these difficulties, we developed two statistical methods to test the gene-environment interaction effect of rare variants: testing of optimally weighted combination of single-nucleotide polymorphism (SNP)-environment interaction (TOW-SE), and a variable-weight TOW-SE (VW-TOW-SE). Both group SNPs into biologically meaningful SNP-sets (SNPs in a gene or pathway) to improve power and interpretability. The proposed methods can be applied to either continuous or binary environmental variables, and to either continuous or binary outcomes. Simulation studies show that the Type I error rates of the proposed methods are under control. We compared the two methods with the existing interaction sequence kernel association test (iSKAT). When gene-environment interaction effects exist for both rare and common variants, VW-TOW-SE is the most powerful test and TOW-SE the second most powerful. The three tests were applied to the GAW20 simulated data in the five regions where the main effect of common SNPs was simulated but no gene–age interaction effect was included. As expected, none of the tests gave positive results.
Affiliation(s)
- Tony Huayang Gao
- Texas Academy of Mathematics & Science, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203, USA
- Jianjun Zhang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203, USA
- Xuexia Wang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, TX 76203, USA
23
Lumley T, Brody J, Peloso G, Morrison A, Rice K. FastSKAT: Sequence kernel association tests for very large sets of markers. Genet Epidemiol 2018; 42:516-527. [PMID: 29932245 PMCID: PMC6129408 DOI: 10.1002/gepi.22136] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/25/2018] [Revised: 04/30/2018] [Accepted: 05/10/2018] [Indexed: 11/06/2022]
Abstract
The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n × n matrix, for n subjects) has computational complexity proportional to n³. As SKAT is often used when n > 10⁴, this step becomes a major bottleneck in practice. We therefore propose fastSKAT, a computationally inexpensive but accurate approximation to the tail probabilities. It extracts the k largest eigenvalues of a weighted genotype covariance matrix, or the largest singular values of a weighted genotype matrix. A single term based on the Satterthwaite approximation is then used for the remaining eigenvalues. Although the method is not particularly sensitive to the choice of k, we also describe how to choose its value. We also show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. Besides providing a faster implementation of SKAT, the new method enables entirely new applications of SKAT that were not possible before. We give examples grouping variants by topologically associating domains, and comparing chromosome-wide association by class of histone marker.
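The Satterthwaite step for the remaining eigenvalues can be sketched by moment matching. Under the null, the SKAT statistic is a weighted sum of independent 1-df chi-squares, Q = Σ λ_j χ²₁. The sketch below works as follows:

- It keeps the k leading eigenvalues exactly.
- It replaces the rest by a single scaled chi-square a·χ²_ν with the same mean and variance.
- It assumes the trace of the matrix and the trace of its square are available. These give the sum of all eigenvalues and the sum of their squares.
- It estimates the tail probability by plain Monte Carlo, rather than the exact numerical methods a real implementation would use.

The function names are hypothetical.

```python
import random


def satterthwaite_remainder(trace_m, trace_m2, top_eigs):
    """Moment-match the eigenvalues not in top_eigs to a * chi2(nu).

    trace_m is the sum of all eigenvalues; trace_m2 is the sum of their squares.
    """
    rest_sum = trace_m - sum(top_eigs)
    rest_sq = trace_m2 - sum(lam * lam for lam in top_eigs)
    if rest_sum <= 0 or rest_sq <= 0:
        return 0.0, 0.0  # no remainder left to approximate
    a = rest_sq / rest_sum        # gives mean a * nu = rest_sum
    nu = rest_sum ** 2 / rest_sq  # and variance 2 * a^2 * nu = 2 * rest_sq
    return a, nu


def tail_prob_mc(q_obs, top_eigs, a, nu, n_sim=20000, seed=1):
    """Monte Carlo estimate of P(Q >= q_obs) under the approximate null."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        q = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in top_eigs)
        if nu > 0:
            # chi2(nu) is Gamma(shape nu / 2, scale 2).
            q += a * rng.gammavariate(nu / 2.0, 2.0)
        if q >= q_obs:
            hits += 1
    return hits / n_sim


# Toy spectrum [5, 3, 1, 1, 1, 1]: trace 12, trace of the square 38; keep k = 2.
a, nu = satterthwaite_remainder(12.0, 38.0, [5.0, 3.0])
print(a, nu)  # 1.0 4.0
print(tail_prob_mc(30.0, [5.0, 3.0], a, nu))
```

In this toy spectrum the four leftover eigenvalues are all 1, so the single-term approximation is exact (a = 1, ν = 4). With unequal leftover eigenvalues, it matches only the first two moments.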
Affiliation(s)
- Jennifer Brody
- Cardiovascular Health Research Unit, University of Washington
- Gina Peloso
- Department of Biostatistics, Boston University
- Kenneth Rice
- Department of Biostatistics, University of Washington
24
Natarajan P, Peloso GM, Zekavat SM, Montasser M, Ganna A, Chaffin M, Khera AV, Zhou W, Bloom JM, Engreitz JM, Ernst J, O'Connell JR, Ruotsalainen SE, Alver M, Manichaikul A, Johnson WC, Perry JA, Poterba T, Seed C, Surakka IL, Esko T, Ripatti S, Salomaa V, Correa A, Vasan RS, Kellis M, Neale BM, Lander ES, Abecasis G, Mitchell B, Rich SS, Wilson JG, Cupples LA, Rotter JI, Willer CJ, Kathiresan S. Deep-coverage whole genome sequences and blood lipids among 16,324 individuals. Nat Commun 2018; 9:3391. [PMID: 30140000 PMCID: PMC6107638 DOI: 10.1038/s41467-018-05747-8] [Citation(s) in RCA: 127] [Impact Index Per Article: 18.1] [Received: 05/15/2018] [Accepted: 06/22/2018] [Indexed: 12/20/2022]
Abstract
Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze genotypes with four quantitative traits: plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci, except for a few variants that were previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes, but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5% of the distribution) confers a similar effect size to a monogenic mutation (~30 mg/dl higher LDL-C for each). However, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited. WGS does, however, permit simultaneous assessment of monogenic and polygenic models of severe hypercholesterolemia.
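A polygenic score of this kind is a weighted sum of allele dosages, and "high" here means the top 5% of the score distribution. This minimal sketch makes the following assumptions:

- The per-variant weights and dosages are hypothetical. A real LDL-C score would use roughly 2 million SNP weights, which are not given here.
- The cutoff is the 95th percentile of the sample's own scores, computed with Python's `statistics.quantiles`.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) using per-variant effect weights."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def high_score_flags(scores, top_fraction=0.05):
    """Flag individuals whose score lies in the top `top_fraction` of the sample."""
    # 99 cut points at each percentile; index k - 1 holds the k-th percentile.
    percentiles = statistics.quantiles(scores, n=100, method="inclusive")
    cutoff = percentiles[round((1 - top_fraction) * 100) - 1]
    return [s >= cutoff for s in scores]


# Hypothetical three-variant weights (per-allele effects) and one genotype.
print(polygenic_score([2, 1, 0], [1.5, -0.5, 2.0]))  # 2.5
# With made-up scores 1..100, the top 5% are the scores 96..100.
print(sum(high_score_flags(list(range(1, 101)))))  # 5
```

Taking the cutoff from the sample itself mirrors the "top 5th percentile" definition. A clinical tool would instead fix the cutoff on a reference population.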
Affiliation(s)
- Pradeep Natarajan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
- Seyedeh Maryam Zekavat
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Yale School of Medicine, New Haven, CT, 06510, USA
- Department of Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, USA
- May Montasser
- School of Medicine, University of Maryland, Baltimore, MD, 21201, USA
- Andrea Ganna
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Mark Chaffin
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Amit V Khera
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Wei Zhou
- Department of Computational Medicine and Bioinformatics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Jonathan M Bloom
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Jesse M Engreitz
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Society of Fellows, Harvard University, Cambridge, MA, 02138, USA
- Jason Ernst
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Maris Alver
- Estonian Genome Center, University of Tartu, Tartu, 51010, Estonia
- Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
- W Craig Johnson
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- James A Perry
- School of Medicine, University of Maryland, Baltimore, MD, 21201, USA
- Timothy Poterba
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Cotton Seed
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Ida L Surakka
- Institute for Molecular Medicine Finland, Helsinki, 00290, Finland
- Tonu Esko
- Estonian Genome Center, University of Tartu, Tartu, 51010, Estonia
- Samuli Ripatti
- Institute for Molecular Medicine Finland, Helsinki, 00290, Finland
- Veikko Salomaa
- Institute for Molecular Medicine Finland, Helsinki, 00290, Finland
- Adolfo Correa
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, 39216, USA
- Ramachandran S Vasan
- Sections of Preventive Medicine and Epidemiology and Cardiology, Department of Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, 02118, USA
- Framingham Heart Study, Framingham, MA, 01702, USA
- Manolis Kellis
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Computer Science and Artificial Intelligence Lab (CSAIL), Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Benjamin M Neale
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, 02114, USA
- Eric S Lander
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
- Goncalo Abecasis
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
- Braxton Mitchell
- School of Medicine, University of Maryland, Baltimore, MD, 21201, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
- James G Wilson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, 39216, USA
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, 39216, USA
- L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
- Framingham Heart Study, Framingham, MA, 01702, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, LABioMed and Departments of Pediatrics and Medicine, Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Cristen J Willer
- Departments of Human Genetics, Internal Medicine, and Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Sekar Kathiresan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Broad Institute of Harvard & MIT, Cambridge, MA, 02142, USA
25
Sequence-Based Analysis of Lipid-Related Metabolites in a Multiethnic Study. Genetics 2018; 209:607-616. [PMID: 29610217 DOI: 10.1534/genetics.118.300751] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/23/2018] [Accepted: 03/29/2018] [Indexed: 01/07/2023]
Abstract
Small-molecule lipid-related metabolites are important components of fatty acid and steroid metabolism, two important contributors to human health. This study investigated the extent to which rare and common genetic variants spanning the human genome influence the lipid-related metabolome. Sequence data from 1552 European-Americans (EA) and 1872 African-Americans (AA) were analyzed to examine the impact of common and rare variants on the levels of 102 circulating lipid-related metabolites. These were measured by a combination of chromatography and mass spectrometry. We conducted single-variant tests [minor allele frequency (MAF) > 5%, significance threshold P ≤ 2.45 × 10⁻¹⁰]. We also ran tests aggregating rare variants (MAF ≤ 5%) across multiple genomic motifs, such as coding regions and regulatory domains, and across sliding windows. Multiethnic meta-analyses detected 53 lipid-related metabolite–locus pairs, which were inspected for evidence of a consistent signal between the two ethnic groups. Thirty-eight lipid-related metabolite–genomic region associations were consistent across ethnicities, of which seven were novel. The regions contain genes related to metabolite transport (SLC10A1) and metabolism (SCD, FDX1, UGT2B15, and FADS2). Six of the seven novel findings lie in expression quantitative trait loci affecting the expression levels of 14 surrounding genes in multiple tissues. Imputed expression levels of 10 of the affected genes were associated with four corresponding lipid-related traits in at least one tissue. Our findings offer valuable insight into the regulation of circulating lipid-related metabolites in a multiethnic population.
26
Korvigo I, Afanasyev A, Romashchenko N, Skoblov M. Generalising better: Applying deep learning to integrate deleteriousness prediction scores for whole-exome SNV studies. PLoS One 2018; 13:e0192829. [PMID: 29538399 PMCID: PMC5851551 DOI: 10.1371/journal.pone.0192829] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/22/2016] [Accepted: 01/31/2018] [Indexed: 12/15/2022]
Abstract
Many automatic classifiers have been introduced to aid inference of the phenotypic effects of uncategorised nsSNVs (nonsynonymous single-nucleotide variations) in theoretical and medical applications. Lately, several meta-estimators have been proposed that combine different predictors, such as PolyPhen and SIFT, to integrate more information into a single score. Many advances have been made in feature design and in the machine learning algorithms used. However, the shortage of high-quality reference data and the bias towards intensively studied in vitro models call for improved generalisation ability, both to further increase classification accuracy and to handle records with insufficient data. A meta-estimator essentially combines different scoring systems with highly complicated nonlinear relationships. We therefore investigated how deep learning (supervised and unsupervised), which is particularly efficient at discovering hierarchies of features, can improve classification performance. It is commonly believed that deep learning should be used only for high-dimensional input spaces, and other models (logistic regression, support vector machines, Bayesian classifiers, etc.) for simpler inputs. Nevertheless, we believe that the ability of neural networks to discover intricate structure in highly heterogeneous datasets can aid a meta-estimator. We compare the performance with various popular predictors, many of which are recommended by the American College of Medical Genetics and Genomics (ACMG), as well as with available deep learning-based predictors. Thanks to hardware acceleration, we were able to use a computationally expensive genetic algorithm to stochastically optimise hyper-parameters over many generations. Overfitting was hindered by noise injection and dropout, limiting coadaptation of hidden units.
This work was not conceived as a tool comparison, but rather as an exploration of the possibilities of applying deep learning to ensemble scores. Even so, our results show that relatively simple modern neural networks can significantly improve both prediction accuracy and coverage. We provide open access to our best model via the website: http://score.generesearch.ru/services/badmut/.
Affiliation(s)
- Ilia Korvigo
- Laboratory of Functional Analysis of the Genome, Moscow Institute of Physics and Technology, Moscow, Russia
- Laboratory of Microbiological Monitoring and Bioremediation of Soils, All-Russia Research Institute for Agricultural Microbiology, St. Petersburg, Russia
- ITMO University, St. Petersburg, Russia
- Andrey Afanasyev
- Laboratory of Functional Analysis of the Genome, Moscow Institute of Physics and Technology, Moscow, Russia
- iBinom Inc., Los Angeles, CA, United States of America
- Nikolay Romashchenko
- Laboratory of Microbiological Monitoring and Bioremediation of Soils, All-Russia Research Institute for Agricultural Microbiology, St. Petersburg, Russia
- Mikhail Skoblov
- Laboratory of Functional Analysis of the Genome, Moscow Institute of Physics and Technology, Moscow, Russia
- Research Center for Medical Genetics, Moscow, Russia
27
Blue EE, Bis JC, Dorschner MO, Tsuang D, Barral SM, Beecham G, Below JE, Bush WS, Butkiewicz M, Cruchaga C, DeStefano A, Farrer LA, Goate A, Haines J, Jaworski J, Jun G, Kunkle B, Kuzma A, Lee JJ, Lunetta K, Ma Y, Martin E, Naj A, Nato AQ, Navas P, Nguyen H, Reitz C, Reyes D, Salerno W, Schellenberg GD, Seshadri S, Sohi H, Thornton TA, Valladares O, van Duijn C, Vardarajan BN, Wang LS, Boerwinkle E, Dupuis J, Pericak-Vance MA, Mayeux R, Wijsman EM. Genetic Variation in Genes Underlying Diverse Dementias May Explain a Small Proportion of Cases in the Alzheimer's Disease Sequencing Project. Dement Geriatr Cogn Disord 2018; 45:1-17. [PMID: 29486463 PMCID: PMC5971141 DOI: 10.1159/000485503] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 08/29/2017] [Accepted: 11/20/2017] [Indexed: 12/29/2022]
Abstract
BACKGROUND/AIMS The Alzheimer's Disease Sequencing Project (ADSP) aims to identify novel genes influencing Alzheimer's disease (AD). Variants within genes known to cause dementias other than AD have previously been associated with AD risk. We describe evidence of co-segregation and associations between variants in dementia genes and clinically diagnosed AD within the ADSP. METHODS We summarize the properties of known pathogenic variants within dementia genes, describe the co-segregation of variants annotated as "pathogenic" in ClinVar and new candidates observed in ADSP families, and test for associations between rare variants in dementia genes in the ADSP case-control study. The participants were clinically evaluated for AD, and they represent European, Caribbean Hispanic, and isolate Dutch populations. RESULTS/CONCLUSIONS Pathogenic variants in dementia genes were predominantly rare and conserved coding changes. Pathogenic variants within ARSA, CSF1R, and GRN were observed, and candidate variants in GRN and CHMP2B were nominated in ADSP families. An independent case-control study provided evidence of an association between variants in TREM2, APOE, ARSA, CSF1R, PSEN1, and MAPT and risk of AD. Variants in genes which cause dementing disorders may influence the clinical diagnosis of AD in a small proportion of cases within the ADSP.
Affiliation(s)
- Debby Tsuang
- University of Washington
- Veterans Administration Puget Sound Health Care
- Eric Boerwinkle
- Baylor College of Medicine
- University of Texas Health Sciences Center at Houston

28
Roles of NUCKS1 in Diseases: Susceptibility, Potential Biomarker, and Regulatory Mechanisms. Biomed Res Int 2018; 2018:7969068. [PMID: 29619377 PMCID: PMC5830027 DOI: 10.1155/2018/7969068] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/15/2017] [Accepted: 12/31/2017] [Indexed: 12/16/2022]
Abstract
Nuclear casein kinase and cyclin-dependent kinase substrate 1 (NUCKS1) is a 27 kDa, highly conserved, vertebrate-specific chromosomal protein. The NUCKS1 gene encodes a nuclear protein whose conserved regions contain several consensus phosphorylation sites for casein kinase II (CK2) and cyclin-dependent kinases (Cdk), as well as a basic DNA-binding domain. NUCKS1 is similar to the high mobility group (HMG) protein family, which governs chromatin remodeling and regulates gene transcription. NUCKS1 is also a paralog of RAD51 associated protein 1 (RAD51AP1); it is important for homologous recombination (HR) and genome stability and acts as a transcriptional regulator of insulin signaling components. NUCKS1 plays an important role in the DNA damage response and metabolism, participates in the inflammatory immune response, and is associated with microRNAs. Although functional information on NUCKS1 is still limited, evidence suggests that NUCKS1 could serve as a biomarker for several cancers. This review summarizes recent research on NUCKS1, covering its role in disease susceptibility, its expression levels and regulatory mechanisms, and its possible functions in disease.

29
de Vries PS, Yu B, Feofanova EV, Metcalf GA, Brown MR, Zeighami AL, Liu X, Muzny DM, Gibbs RA, Boerwinkle E, Morrison AC. Whole-genome sequencing study of serum peptide levels: the Atherosclerosis Risk in Communities study. Hum Mol Genet 2017; 26:3442-3450. [PMID: 28854705 DOI: 10.1093/hmg/ddx266] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/21/2017] [Accepted: 07/04/2017] [Indexed: 01/27/2023]
Abstract
Oligopeptides are important markers of protein metabolism, as they are cleaved from larger polypeptides and proteins. Genetic association studies may help elucidate their origin and function. In 1,552 European Americans and 1,872 African Americans of the Atherosclerosis Risk in Communities study, we performed whole-genome and whole-exome sequencing and measured serum levels of 25 peptides. Common variants (minor allele frequency > 5%) were analysed individually. We grouped low-frequency variants (minor allele frequency ≤ 5%) by a genome-wide sliding window using region-based aggregate tests. Furthermore, low-frequency regulatory variants were grouped by gene, as were functional coding variants. All analyses were performed separately in each ancestry group and then meta-analysed. We identified 22 common variant associations with peptide levels (P-value < 4.2 × 10⁻¹⁰), including 16 novel gene-peptide pairs. Notably, variants in kinin-kallikrein genes KNG1, F12, KLKB1, and ACE were associated with several different peptides. Variants in KLKB1 and ACE were associated with a fragment of complement component 3f. Both common variants and low-frequency coding variants in CPN1 were associated with a fibrinogen cleavage peptide. Four sliding windows were significantly associated with peptide levels (P-value < 4.2 × 10⁻¹⁰). Our results highlight the importance of the kinin-kallikrein system in the regulation of serum peptide levels, strengthen the evidence for a broad link between the kinin-kallikrein and complement systems, and suggest a role of CPN1 in the conversion of fibrinogen to fibrin.
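The region-based aggregate tests mentioned above collapse the low-frequency variants in each window into a single score per person. The Python sketch below shows one simple form of this idea, a burden test. It tiles the genome into overlapping windows, sums each individual's minor-allele counts within a window, and regresses a quantitative trait (such as a peptide level) on that sum. The window size, step, and function names are illustrative assumptions. The study's own aggregate tests, covariate adjustment, and ancestry-specific meta-analysis are not reproduced here. The p-value also uses a normal approximation that is only reasonable in large samples.

```python
import math
from statistics import mean


def sliding_windows(positions, window, step):
    """Yield (start, end, variant_indices) for half-open windows [start, end)
    of `window` bp, advancing by `step` bp; windows with no variants are skipped."""
    if not positions:
        return
    start, last = min(positions), max(positions)
    while start <= last:
        end = start + window
        idx = [i for i, pos in enumerate(positions) if start <= pos < end]
        if idx:
            yield start, end, idx
        start += step


def burden_scores(genotypes, idx):
    """Per-individual sum of minor-allele counts (0/1/2) over the variants in idx.
    `genotypes` has one row per individual and one column per variant."""
    return [sum(row[j] for j in idx) for row in genotypes]


def burden_test(burden, phenotype):
    """Ordinary least-squares regression of a quantitative phenotype on the burden.

    Returns (beta, z, p), where p is a two-sided normal approximation.
    """
    n = len(burden)
    mx, my = mean(burden), mean(phenotype)
    sxx = sum((x - mx) ** 2 for x in burden)
    if sxx == 0 or n < 3:
        return 0.0, 0.0, 1.0  # no variation in burden: nothing to test
    sxy = sum((x - mx) * (y - my) for x, y in zip(burden, phenotype))
    beta = sxy / sxx
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(burden, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit gives se == 0; treat any non-zero slope as infinitely significant.
    z = beta / se if se > 0 else (math.copysign(math.inf, beta) if beta else 0.0)
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, z, p
```

In use, each window from `sliding_windows(positions, 1000, 500)` would be scored with `burden_test(burden_scores(genotypes, idx), phenotype)`. A genome-wide threshold such as the 4.2 × 10⁻¹⁰ quoted above would then be applied to the resulting p-values.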
Affiliation(s)
- Paul S de Vries
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Bing Yu
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Elena V Feofanova
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030 TX, USA
- Michael R Brown
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Atefeh L Zeighami
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Xiaoming Liu
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030 TX, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030 TX, USA
- Eric Boerwinkle
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030 TX, USA
- Alanna C Morrison
- Department of Epidemiology, Human Genetics, and Environmental Sciences, Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, 77030 TX, USA

30
Li C, Grove ML, Yu B, Jones BC, Morrison A, Boerwinkle E, Liu X. Genetic variants in microRNA genes and targets associated with cardiovascular disease risk factors in the African-American population. Hum Genet 2018; 137:85-94. [PMID: 29264654 PMCID: PMC5790599 DOI: 10.1007/s00439-017-1858-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/09/2017] [Accepted: 12/06/2017] [Indexed: 02/07/2023]
Abstract
The purpose of this study was to identify microRNA (miRNA)-related polymorphisms and their association with cardiovascular disease (CVD) risk factors in the African-American population. These polymorphisms include single nucleotide variants (SNVs) in mature miRNA-encoding sequences or in miRNA target sites. To achieve this objective, we examined 1900 African-Americans from the Atherosclerosis Risk in Communities study using SNVs identified from whole-genome sequencing data. A total of 971 SNVs in 726 different mature miRNA-encoding sequences and 16,057 SNVs in the three prime untranslated regions (3'UTRs) of 3647 protein-coding genes were identified, and their associations with 17 CVD risk factors were interrogated. Using a single-variant-based approach, we found 5 SNVs in miRNA-encoding sequences to be associated with serum lipoprotein(a) [Lp(a)], high-density lipoprotein (HDL) or triglycerides. We also found 2 SNVs in miRNA target sites to be associated with Lp(a) and HDL, all at a false discovery rate of 5%. Using a gene-based approach, we identified 3 associations: between NSD1 and platelet count, between HSPA4L and cardiac troponin T, and between AHSA2 and magnesium. We validated the association between an African-American-specific variant, NR_039880.1:n.18A>C, in the mature hsa-miR-4727-5p encoding sequence and serum HDL level in an independent sample of 2135 African-Americans. Our study provides candidate miRNAs and their targets for further investigation of their potential contribution to ethnic disparities in CVD risk factors.
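The abstract reports associations at a 5% false discovery rate but does not name the correction method. For illustration only, the sketch below assumes the widely used Benjamini-Hochberg step-up procedure. It sorts the m p-values, finds the largest rank k with p(k) ≤ (k/m)·α, and rejects the k smallest.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in input order, marking the hypotheses that are
    rejected while controlling the false discovery rate at `alpha`.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank > k_max:
            break
        rejected[i] = True
    return rejected
```

Because the procedure is step-up, a p-value that misses its own rank threshold is still rejected if a larger p-value further down the sorted list passes its threshold.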
Affiliation(s)
- Chang Li
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Megan L Grove
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Bing Yu
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Barbara C Jones
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Alanna Morrison
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Eric Boerwinkle
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Xiaoming Liu
- Human Genetics Center and Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, The University of Texas Health Science Center at Houston, Houston, TX, USA

31
Vitali C, Khetarpal SA, Rader DJ. HDL Cholesterol Metabolism and the Risk of CHD: New Insights from Human Genetics. Curr Cardiol Rep 2017; 19:132. [PMID: 29103089 DOI: 10.1007/s11886-017-0940-0] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Indexed: 01/19/2023]
Abstract
PURPOSE OF REVIEW Elevated blood levels of high-density lipoprotein cholesterol (HDL-C) are one of the strongest epidemiological surrogates for protection against coronary heart disease (CHD), but recent human genetic and pharmacological intervention studies have raised controversy about the causality of this relationship. Here, we review recent discoveries from human genome studies using new analytic tools, as well as relevant animal studies, that have both addressed and, in some cases, fueled this controversy. RECENT FINDINGS Methodologic developments in genotyping and sequencing, such as genome-wide association studies (GWAS), exome sequencing, and exome array genotyping, have been applied to the study of HDL-C and CHD risk in large, multi-ethnic populations. Some of these efforts, focused on population-wide variation in common variants, have uncovered new polymorphisms at novel loci associated with HDL-C and, in some cases, CHD risk. Other efforts have discovered, for the first time, loss-of-function variants in genes previously implicated in HDL metabolism through common variant studies or animal models. These studies have allowed the genetic relationships among these pathways, HDL-C, and CHD to be explored in humans for the first time through analysis tools such as Mendelian randomization. We explore these discoveries for selected key HDL-C genes (CETP, LCAT, LIPG, and SCARB1) and for novel loci implicated by GWAS, including GALNT2, KLF14, and TTC39B. By studying critical variants across model systems, recent human genetics findings have identified new nodes regulating HDL metabolism and reshaped our understanding of how known candidate genes relate to HDL and CHD risk. Despite their effect on HDL-C, variants in many of the reviewed genes were found to lack any association with CHD.
These data collectively indicate that HDL-C concentration, which represents a static picture of a very dynamic and heterogeneous metabolic milieu, is unlikely to be itself causally protective against CHD. In this context, human genetics represents an extremely valuable tool to further explore the biological mechanisms regulating HDL metabolism and to investigate what role, if any, HDL plays in the pathogenesis of CHD.
Affiliation(s)
- Cecilia Vitali
- Perelman School of Medicine at the University of Pennsylvania, 11-162 TRC, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Sumeet A Khetarpal
- Perelman School of Medicine at the University of Pennsylvania, 11-162 TRC, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Daniel J Rader
- Perelman School of Medicine at the University of Pennsylvania, 11-162 TRC, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA
- Departments of Genetics and Medicine, Cardiovascular Institute, and Institute for Translational Medicine and Therapeutics, Perelman School of Medicine at the University of Pennsylvania, 11-125 TRC, 3400 Civic Center Blvd, Philadelphia, PA, 19104, USA

32
Dron JS, Wang J, Low-Kam C, Khetarpal SA, Robinson JF, McIntyre AD, Ban MR, Cao H, Rhainds D, Dubé MP, Rader DJ, Lettre G, Tardif JC, Hegele RA. Polygenic determinants in extremes of high-density lipoprotein cholesterol. J Lipid Res 2017; 58:2162-2170. [PMID: 28870971 PMCID: PMC5665671 DOI: 10.1194/jlr.m079822] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 08/09/2017] [Revised: 08/31/2017] [Indexed: 11/24/2022]
Abstract
HDL cholesterol (HDL-C) remains a superior biochemical predictor of CVD risk, but its genetic basis is incompletely defined. In patients with extreme HDL-C concentrations, we concurrently evaluated the contributions of multiple large- and small-effect genetic variants. In a discovery cohort of 255 unrelated lipid clinic patients with extreme HDL-C levels, we used a targeted next-generation sequencing panel to evaluate rare variants in known HDL metabolism genes, simultaneously with common variants bundled into a polygenic trait score. Two additional cohorts were used for validation and included 1,746 individuals from the Montréal Heart Institute Biobank and 1,048 individuals from the University of Pennsylvania. Findings were consistent between cohorts: we found rare heterozygous large-effect variants in 18.7% and 10.9% of low- and high-HDL-C patients, respectively. We also found common variant accumulation, indicated by extreme polygenic trait scores, in an additional 12.8% and 19.3% of overall cases of low- and high-HDL-C extremes, respectively. Thus, the genetic basis of extreme HDL-C concentrations encountered clinically is frequently polygenic, with contributions from both rare large-effect and common small-effect variants. Multiple types of genetic variants should be considered as contributing factors in patients with extreme dyslipidemia.
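A polygenic trait score of the kind described above is typically a weighted sum of effect-allele counts across common SNPs. A patient is flagged when the score falls in an extreme tail of a reference distribution. The sketch below shows only that arithmetic. The per-SNP weights, the nearest-rank percentile, and the 90th-percentile cutoff are assumptions chosen for illustration, not the study's published SNP set or threshold.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) across SNPs."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_threshold(reference_scores, pct):
    """Nearest-rank percentile of a (non-empty) reference score distribution."""
    ranked = sorted(reference_scores)
    k = max(1, math.ceil(pct * len(ranked) / 100))
    return ranked[k - 1]


def is_extreme_high(score, reference_scores, pct=90.0):
    """True if `score` lies strictly above the pct-th percentile of the reference."""
    return score > percentile_threshold(reference_scores, pct)
```

For the low-HDL-C extreme, the same idea applies to the lower tail. One option is to compare against a low percentile; another is to flip the sign of the weights so that HDL-lowering alleles add to the score.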
Affiliation(s)
- Jacqueline S Dron
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Jian Wang
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Cécile Low-Kam
- Montréal Heart Institute et Faculté de Médecine, Université de Montréal, Montréal, Québec, Canada
- Sumeet A Khetarpal
- Departments of Genetics and Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- John F Robinson
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Adam D McIntyre
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Matthew R Ban
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Henian Cao
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- David Rhainds
- Montréal Heart Institute et Faculté de Médecine, Université de Montréal, Montréal, Québec, Canada
- Marie-Pierre Dubé
- Montréal Heart Institute et Faculté de Médecine, Université de Montréal, Montréal, Québec, Canada
- Daniel J Rader
- Departments of Genetics, Medicine, and Pediatrics, the Cardiovascular Institute, and the Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Guillaume Lettre
- Montréal Heart Institute et Faculté de Médecine, Université de Montréal, Montréal, Québec, Canada
- Jean-Claude Tardif
- Montréal Heart Institute et Faculté de Médecine, Université de Montréal, Montréal, Québec, Canada
- Robert A Hegele
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Robarts Research Institute, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
- Department of Medicine, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada

33
Gienapp P, Fior S, Guillaume F, Lasky JR, Sork VL, Csilléry K. Genomic Quantitative Genetics to Study Evolution in the Wild. Trends Ecol Evol 2017; 32:897-908. [PMID: 29050794 DOI: 10.1016/j.tree.2017.09.004] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 06/06/2017] [Revised: 09/05/2017] [Accepted: 09/08/2017] [Indexed: 11/19/2022]
Abstract
Quantitative genetic theory provides a means of estimating the evolutionary potential of natural populations. However, this approach was previously feasible only in systems where the genetic relatedness between individuals could be inferred from pedigrees or experimental crosses. The genomic revolution has made it possible to obtain the realized proportion of the genome shared among individuals in natural populations of virtually any species, which could yield (more) accurate estimates of quantitative genetic parameters. Such a 'genomic' quantitative genetics approach relies on fewer assumptions, offers greater methodological flexibility, and is thus expected to greatly enhance our understanding of evolution in natural populations, for example, in the context of adaptation to environmental change, eco-evolutionary dynamics, and biodiversity conservation.
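The realized proportion of the genome shared among individuals is commonly summarized in a genomic relationship matrix (GRM). This matrix takes the place of the pedigree-based relationship matrix in quantitative genetic models. The sketch below computes one widely used estimator, VanRaden's first method: G = ZZᵀ / (2 Σⱼ pⱼ(1 − pⱼ)). Here Z holds 0/1/2 genotype dosages centered by twice the allele frequency pⱼ. Choosing this particular estimator is an assumption, other estimators exist, and the allele frequencies here are estimated from the sample itself.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele per SNP, from 0/1/2 dosages (rows = individuals)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]


def genomic_relationship_matrix(genotypes):
    """VanRaden (method 1) GRM: G = Z Z^T / (2 * sum_j p_j * (1 - p_j))."""
    p = allele_frequencies(genotypes)
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    # Center each dosage by its expected value 2 * p_j.
    z = [[g - 2 * pj for g, pj in zip(row, p)] for row in genotypes]
    n = len(z)
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
        for i in range(n)
    ]
```

Diagonal entries estimate each individual's genomic self-relationship (1 plus its inbreeding coefficient), and off-diagonal entries estimate pairwise relatedness relative to the sample's base population.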
Affiliation(s)
- Phillip Gienapp
- Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Simone Fior
- Plant Ecological Genetics, ETH Zurich, Switzerland
- Frédéric Guillaume
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Switzerland
- Jesse R Lasky
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Victoria L Sork
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, CA, USA
- Katalin Csilléry
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Switzerland
- Biodiversity and Conservation Biology, WSL Swiss Federal Research Institute, Birmensdorf, Switzerland

34
Zhao B, Lu Q, Cheng Y, Belcher JM, Siew ED, Leaf DE, Body SC, Fox AA, Waikar SS, Collard CD, Thiessen-Philbrook H, Ikizler TA, Ware LB, Edelstein CL, Garg AX, Choi M, Schaub JA, Zhao H, Lifton RP, Parikh CR. A Genome-Wide Association Study to Identify Single-Nucleotide Polymorphisms for Acute Kidney Injury. Am J Respir Crit Care Med 2017; 195:482-490. [PMID: 27576016 DOI: 10.1164/rccm.201603-0518oc] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 11/16/2022]
Abstract
RATIONALE Acute kidney injury is a common and severe complication of critical illness and cardiac surgery. Despite significant attempts at developing treatments, therapeutic advances to attenuate acute kidney injury and expedite recovery have largely failed. OBJECTIVES Identifying genetic loci associated with increased risk of acute kidney injury may reveal novel pathways for therapeutic development. METHODS We conducted an exploratory genome-wide association study to identify single-nucleotide polymorphisms associated with genetic susceptibility to in-hospital acute kidney injury. MEASUREMENTS AND MAIN RESULTS We genotyped 609,508 single-nucleotide polymorphisms and performed genotype imputation in 760 acute kidney injury cases and 669 controls. We then evaluated the polymorphisms that showed the strongest association with acute kidney injury in a replication patient population of 206 cases and 1,406 controls. On meta-analysis of the discovery and replication populations, we observed an association between acute kidney injury and four single-nucleotide polymorphisms at two independent loci. These include rs62341639 (meta-analysis P = 2.48 × 10⁻⁷; odds ratio [OR], 0.64; 95% confidence interval [CI], 0.55-0.76) and rs62341657 (P = 3.26 × 10⁻⁷; OR, 0.65; 95% CI, 0.55-0.76) on chromosome 4 near the APOL1 regulator IRF2. They also include rs9617814 (meta-analysis P = 3.81 × 10⁻⁶; OR, 0.70; 95% CI, 0.60-0.81) and rs10854554 (P = 6.53 × 10⁻⁷; OR, 0.67; 95% CI, 0.57-0.79) on chromosome 22 near the acute kidney injury-related gene TBX1. CONCLUSIONS Our findings reveal two genetic loci that are associated with acute kidney injury. Additional studies should be conducted to functionally evaluate these loci and to identify other common genetic variants contributing to acute kidney injury.
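Each association above is reported as an odds ratio with a 95% confidence interval. The abstract does not state how the intervals were computed. Assuming they are symmetric Wald intervals on the log-odds scale, the standard error, z statistic, and two-sided p-value can be recovered as sketched below. The published odds ratios and limits are rounded to two decimals, so the recomputed p-value is only a rough consistency check. It need not match the reported meta-analysis P.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def wald_from_or_ci(odds_ratio, lower, upper):
    """Recover (se of log OR, z, two-sided p) from an OR and its 95% CI,
    assuming a symmetric Wald interval on the log scale."""
    if not (0 < lower < upper and lower <= odds_ratio <= upper):
        raise ValueError("expected 0 < lower <= OR <= upper with lower < upper")
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p
```

For instance, `wald_from_or_ci(0.64, 0.55, 0.76)` applies this to the rs62341639 estimate. A negative z corresponds to the protective direction (OR < 1).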
Affiliation(s)
- Bixiao Zhao
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Yuwei Cheng
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut
- Justin M Belcher
- Program of Applied Translational Research and Section of Nephrology, Yale University School of Medicine, New Haven, Connecticut
- Clinical Epidemiology Research Center, Veterans Affairs Medical Center, West Haven, Connecticut
- Edward D Siew
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, and Vanderbilt Integrated Program for Acute Kidney Injury Research, Vanderbilt University Medical Center, Nashville, Tennessee
- Simon C Body
- Department of Anesthesiology, Perioperative and Pain Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Amanda A Fox
- Department of Anesthesiology and Pain Management and McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas
- Charles D Collard
- Department of Anesthesiology, Baylor St. Luke's Medical Center and the Texas Heart Institute, Houston, Texas
- Heather Thiessen-Philbrook
- Program of Applied Translational Research and Section of Nephrology, Yale University School of Medicine, New Haven, Connecticut
- Lilibeth Caberto Kidney Clinical Research Unit, London Health Sciences Centre, London, Ontario, Canada
- T Alp Ikizler
- Division of Nephrology and Hypertension, Vanderbilt Center for Kidney Disease, and Vanderbilt Integrated Program for Acute Kidney Injury Research, Vanderbilt University Medical Center, Nashville, Tennessee
- Lorraine B Ware
- Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine
- Amit X Garg
- Lilibeth Caberto Kidney Clinical Research Unit, London Health Sciences Centre, London, Ontario, Canada
- Division of Nephrology, Department of Medicine and Department of Epidemiology and Biostatistics, Western University, London, Ontario, Canada
- Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
- Murim Choi
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut
- Hongyu Zhao
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Richard P Lifton
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut
- Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, Connecticut
- Chirag R Parikh
- Program of Applied Translational Research and Section of Nephrology, Yale University School of Medicine, New Haven, Connecticut
- Clinical Epidemiology Research Center, Veterans Affairs Medical Center, West Haven, Connecticut

35
Rashkin S, Jun G, Chen S, Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO), Abecasis GR. Optimal sequencing strategies for identifying disease-associated singletons. PLoS Genet 2017. [PMID: 28640830 PMCID: PMC5501675 DOI: 10.1371/journal.pgen.1006811] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/12/2023]
Abstract
As genetic association studies increasingly focus on identifying trait-associated rare variants through sequencing, it is important to identify the most cost-effective sequencing strategies for these studies. Deep sequencing will accurately detect and genotype the largest number of rare variants per individual, but may limit sample size. Low-pass sequencing will miss some variants in each individual but has been shown to provide a cost-effective alternative for studies of common variants. Here, we investigate the impact of sequencing depth on studies of rare variants, focusing on singletons: the variants that are sampled in a single individual and are hardest to detect at low sequencing depths. We first estimate the sensitivity to detect singleton variants in both simulated data and in down-sampled deep genome and exome sequence data. We then explore the power of association studies comparing the burden of singleton variants in cases and controls under a variety of conditions. We show that the power to detect singletons increases with coverage, typically plateauing for coverage > ~25x. Next, we show that, when total sequencing capacity is fixed, the power of association studies focused on singletons is typically maximized for coverage of 15-20x, independent of relative risk, disease prevalence, singleton burden, and case-control ratio. Our results suggest sequencing depth of 15-20x as an appropriate compromise between singleton detection power and sample size for studies of rare variants in complex disease. Genetic studies of rare variants can help us understand the biology of human disease. With modern techniques and sufficient effort, it is possible to very accurately resolve any human genome, identifying most of its unique features. When funding is limited, applying these techniques to study human disease often involves a trade-off between examining more samples, at reduced accuracy per sample, or fewer samples, each at greater accuracy.
We evaluate these trade-offs for studies of very rare variants, using both simulation and real data. We propose cost effective strategies for increasing our understanding of human disease.
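The plateau in singleton detection with increasing coverage can be illustrated with a toy model. The sketch assumes read depth at a site is Poisson-distributed with the given mean and that reads are error-free. It also assumes a heterozygous singleton is called when at least a fixed number of reads (3 here) carry the alternate allele, each read doing so with probability 0.5. These are all simplifying assumptions, not the simulation or calling pipeline used in the study. The sketch models detection sensitivity only, not association power under a fixed sequencing budget.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass, computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def binom_sf(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k_min, n + 1))


def singleton_sensitivity(mean_depth, min_alt_reads=3, alt_fraction=0.5):
    """Probability that a heterozygous singleton shows >= min_alt_reads alt reads,
    averaging over Poisson-distributed depth (toy model, error-free reads)."""
    if mean_depth <= 0:
        return 0.0
    # Depths far above the mean carry negligible probability, so truncate there.
    max_depth = int(mean_depth + 10 * math.sqrt(mean_depth) + 10)
    return sum(
        poisson_pmf(d, mean_depth) * binom_sf(min_alt_reads, d, alt_fraction)
        for d in range(min_alt_reads, max_depth + 1)
    )
```

Under these assumptions, sensitivity climbs steeply at low mean depth and then levels off near 1. This mirrors the qualitative plateau described in the abstract, although the exact depths at which it flattens depend on the assumed calling threshold.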
Affiliation(s)
- Sara Rashkin
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California, United States of America
- Goo Jun
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Sai Chen
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Goncalo R. Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America

36
Do C, Shearer A, Suzuki M, Terry MB, Gelernter J, Greally JM, Tycko B. Genetic-epigenetic interactions in cis: a major focus in the post-GWAS era. Genome Biol 2017. [PMID: 28629478 PMCID: PMC5477265 DOI: 10.1186/s13059-017-1250-y] [Citation(s) in RCA: 95] [Impact Index Per Article: 11.9] [Indexed: 12/16/2022]
Abstract
Studies on genetic-epigenetic interactions, including the mapping of methylation quantitative trait loci (mQTLs) and haplotype-dependent allele-specific DNA methylation (hap-ASM), have become a major focus in the post-genome-wide-association-study (GWAS) era. Such maps can nominate regulatory sequence variants that underlie GWAS signals for common diseases, ranging from neuropsychiatric disorders to cancers. Conversely, mQTLs need to be filtered out when searching for non-genetic effects in epigenome-wide association studies (EWAS). Sequence variants in CCCTC-binding factor (CTCF) and transcription factor binding sites have been mechanistically linked to mQTLs and hap-ASM. Identifying these sites can point to disease-associated transcriptional pathways, with implications for targeted treatment and prevention.
Affiliation(s)
- Catherine Do
- Institute for Cancer Genetics and Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA
- Alyssa Shearer
- Institute for Cancer Genetics and Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA
- Masako Suzuki
- Center for Epigenomics, Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Mary Beth Terry
- Department of Epidemiology, Columbia University Mailman School of Public Health, and Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA
- Joel Gelernter
- Departments of Psychiatry, Genetics, and Neurobiology, Yale University School of Medicine, New Haven, CT, 06520, USA
- John M Greally
- Center for Epigenomics, Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Benjamin Tycko
- Institute for Cancer Genetics, Herbert Irving Comprehensive Cancer Center, Taub Institute for Research on Alzheimer's disease and the Aging Brain, New York, NY, 10032, USA; Department of Pathology and Cell Biology, Columbia University, New York, NY, 10032, USA
37
Abstract
Despite thousands of genetic loci identified to date, a large proportion of genetic variation predisposing to complex disease and traits remains unaccounted for. Advances in sequencing technology enable focused explorations on the contribution of low-frequency and rare variants to human traits. Here we review experimental approaches and current knowledge on the contribution of these genetic variants in complex disease and discuss challenges and opportunities for personalised medicine.
Affiliation(s)
- Lorenzo Bomba
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
- Klaudia Walter
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
- Nicole Soranzo
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK; Department of Haematology, University of Cambridge, Hills Rd, Cambridge, CB2 0AH, UK; The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics at the University of Cambridge, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, CB1 8RN, UK
38
Kallel-Bouattour R, Belguith-Maalej S, Zouari-Bradai E, Mnif M, Abid M, Hadj Kacem H. Intronic variants of SLC26A4 gene enhance splicing efficiency in hybrid minigene assay. Gene 2017; 620:10-14. [PMID: 28389359 DOI: 10.1016/j.gene.2017.03.043] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/17/2016] [Revised: 03/19/2017] [Accepted: 03/31/2017] [Indexed: 12/14/2022]
Abstract
Screening of the SLC26A4 genomic sequence in autoimmune thyroid diseases (AITD) revealed several variant types with possible pathogenic effects. Although intronic variants may have more detrimental effects than coding variants, they are poorly explored. Thus, in a first assessment, our bioinformatic analysis of intronic variants predicted a pathogenic effect of the c.1002-9A>C, c.1545-5T>G and c.1544+9C>T variants. Validating the pathogenicity of these variants may provide new clues to AITD physiopathology. The variants were screened in a general population by PCR-RFLP. Their effects on mRNA processing were assessed using a functional splicing assay based on a DNA hybrid minigene in HeLa cells, and the splicing efficiency of the constructs was investigated by real-time PCR. Our results revealed that c.1002-9A>C is a rare allele (minor allele frequency (MAF) = 0.007), whereas c.1545-5T>G and c.1544+9C>T are low-frequency variants. RT-PCR analysis showed that these variants did not affect mRNA processing. However, quantifying the transcripts generated from the minigene constructs revealed enhanced mRNA splicing. Using a DNA hybrid minigene, our study suggests a pathogenic effect of three intronic variants on mRNA splicing efficiency. By quantifying these transcripts, we reveal the limits of standard RT-PCR in analyzing a splicing minigene assay.
Affiliation(s)
- Rihab Kallel-Bouattour
- Laboratoire Procédés de Criblage Moléculaire et Cellulaire, Centre Biotechnologie de Sfax, Tunisia
- Salima Belguith-Maalej
- Laboratoire Procédés de Criblage Moléculaire et Cellulaire, Centre Biotechnologie de Sfax, Tunisia
- Emna Zouari-Bradai
- Laboratoire Procédés de Criblage Moléculaire et Cellulaire, Centre Biotechnologie de Sfax, Tunisia
- Mouna Mnif
- Service d'Endocrinologie, CHU Hédi Chaker, Sfax, Tunisia
- Mohamed Abid
- Service d'Endocrinologie, CHU Hédi Chaker, Sfax, Tunisia
- Hassen Hadj Kacem
- Laboratoire Procédés de Criblage Moléculaire et Cellulaire, Centre Biotechnologie de Sfax, Tunisia; Department of Applied Biology, College of Sciences, University of Sharjah, United Arab Emirates.
39
Loehlein Fier H, Prokopenko D, Hecker J, Cho MH, Silverman EK, Weiss ST, Tanzi RE, Lange C. On the association analysis of genome-sequencing data: A spatial clustering approach for partitioning the entire genome into nonoverlapping windows. Genet Epidemiol 2017; 41:332-340. [PMID: 28318110 DOI: 10.1002/gepi.22040] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/18/2016] [Revised: 12/20/2016] [Accepted: 02/04/2017] [Indexed: 12/16/2022]
Abstract
For the association analysis of whole-genome sequencing (WGS) studies, we propose an efficient and fast spatial-clustering algorithm. Existing analysis approaches for WGS data define the tested regions by either sliding or consecutive windows of fixed size along the variants. Compared with sliding-window approaches, a meaningful grouping of nearby variants into consecutive regions has the advantage that the number of tested regions is likely to be smaller; compared with consecutive fixed-window approaches, our approach is more likely to group nearby variants together. Given existing biological evidence that disease-associated mutations tend to cluster physically in specific regions along the chromosome, identifying meaningful groups of nearby variants could lead to a gain in power for association analysis. Our algorithm defines consecutive genomic regions based on the physical positions of the variants, assuming an inhomogeneous Poisson process, and groups nearby variants together. Because parameters are estimated locally, the algorithm accounts for the varying variant density along the chromosome and provides a locally optimal partitioning of variants into consecutive regions. We discuss the theoretical advantages of our algorithm over existing window-based approaches and show its performance in a simulation study and in an application to Alzheimer's disease WGS data. Our analysis identifies a region in the ITGB3 gene that potentially harbors disease susceptibility loci for Alzheimer's disease. The region-based association signal of ITGB3 replicates in an independent data set and formally achieves genome-wide significance. Software implementation: an implementation of the algorithm in R is available at https://github.com/heidefier/cluster_wgs_data.
Affiliation(s)
- Heide Loehlein Fier
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Working Group of Genomic Mathematics, University of Bonn, Bonn, Germany
- Dmitry Prokopenko
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Julian Hecker
- Working Group of Genomic Mathematics, University of Bonn, Bonn, Germany
- Michael H Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Rudolph E Tanzi
- Genetics and Aging Research Unit, MassGeneral Institute for Neurodegenerative Disease, Massachusetts General Hospital, Harvard Medical School, Charlestown, Massachusetts, United States of America
- Christoph Lange
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America; Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
40
Morrison AC, Huang Z, Yu B, Metcalf G, Liu X, Ballantyne C, Coresh J, Yu F, Muzny D, Feofanova E, Rustagi N, Gibbs R, Boerwinkle E. Practical Approaches for Whole-Genome Sequence Analysis of Heart- and Blood-Related Traits. Am J Hum Genet 2017; 100:205-215. [PMID: 28089252 DOI: 10.1016/j.ajhg.2016.12.009] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/24/2016] [Accepted: 12/14/2016] [Indexed: 01/11/2023]
Abstract
Whole-genome sequencing (WGS) allows for a comprehensive view of the sequence of the human genome. We present and apply integrated methodologic steps for interrogating WGS data to characterize the genetic architecture of 10 heart- and blood-related traits in a sample of 1,860 African Americans. In order to evaluate the contribution of regulatory and non-protein coding regions of the genome, we conducted aggregate tests of rare variation across the entire genomic landscape using a sliding window, complemented by an annotation-based assessment of the genome using predefined regulatory elements and within the first intron of all genes. These tests were performed treating all variants equally as well as with individual variants weighted by a measure of predicted functional consequence. Significant findings were assessed in 1,705 individuals of European ancestry. After these steps, we identified and replicated components of the genomic landscape significantly associated with heart- and blood-related traits. For two traits, lipoprotein(a) levels and neutrophil count, aggregate tests of low-frequency and rare variation were significantly associated across multiple motifs. For a third trait, cardiac troponin T, investigation of regulatory domains identified a locus on chromosome 9. These practical approaches for WGS analysis led to the identification of informative genomic regions and also showed that defined non-coding regions, such as first introns of genes and regulatory domains, are associated with important risk factor phenotypes. This study illustrates the tractable nature of WGS data and outlines an approach for characterizing the genetic architecture of complex traits.
Affiliation(s)
- Alanna C Morrison
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA.
- Zhuoyi Huang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Bing Yu
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Xiaoming Liu
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Christie Ballantyne
- Section of Cardiovascular Research, Baylor College of Medicine, Houston, TX 77030, USA; Houston Methodist Debakey Heart and Vascular Center, Houston, TX 77030, USA
- Josef Coresh
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21287, USA
- Fuli Yu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Elena Feofanova
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
- Navin Rustagi
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Richard Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Eric Boerwinkle
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
41
Yu B, de Vries PS, Metcalf GA, Wang Z, Feofanova EV, Liu X, Muzny DM, Wagenknecht LE, Gibbs RA, Morrison AC, Boerwinkle E. Whole genome sequence analysis of serum amino acid levels. Genome Biol 2016; 17:237. [PMID: 27884205 PMCID: PMC5123402 DOI: 10.1186/s13059-016-1106-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/30/2016] [Accepted: 11/10/2016] [Indexed: 02/08/2023]
Abstract
Background: Blood levels of amino acids are important biomarkers of disease and are influenced by synthesis, protein degradation, and gene–environment interactions. Whole genome sequence analysis of amino acid levels may establish a paradigm for analyzing quantitative risk factors. Results: In a discovery cohort of 1872 African Americans and a replication cohort of 1552 European Americans, we sequenced exons and whole genomes and measured serum levels of 70 amino acids. Rare and low-frequency variants (minor allele frequency ≤5%) were analyzed using three types of aggregating motifs defined by gene exons, regulatory regions, or genome-wide sliding windows. Common variants (minor allele frequency >5%) were analyzed individually. Across all four analysis strategies, 14 gene–amino acid associations were identified and replicated. The 14 loci accounted for an average of 1.8% of the variance in amino acid levels (range 0.4 to 9.7%). Among the identified locus–amino acid pairs, four are novel and six have been reported to underlie known Mendelian conditions. These results suggest that there may be substantial genetic effects on amino acid levels in the general population that may underlie inborn errors of metabolism. We also identify a predicted promoter variant in AGA (the gene that encodes aspartylglucosaminidase) that is significantly associated with asparagine levels, with an effect that is independent of any observed coding variants. Conclusions: These data provide insights into genetic influences on circulating amino acid levels by integrating -omic technologies in a multi-ethnic population. The results also help establish a paradigm for whole genome sequence analysis of quantitative traits.
Affiliation(s)
- Bing Yu
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
- Paul S de Vries
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
- Ginger A Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Zhe Wang
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
- Elena V Feofanova
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xiaoming Liu
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
- Donna Marie Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Lynne E Wagenknecht
- Public Health Sciences, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Alanna C Morrison
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA
- Eric Boerwinkle
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
42
Kim YJ, Lee J, Kim BJ, Park T. PreCimp: Pre-collapsing imputation approach increases imputation accuracy of rare variants in terms of collapsed variables. Genet Epidemiol 2016; 41:41-50. [PMID: 27859580 DOI: 10.1002/gepi.22020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/29/2015] [Revised: 08/17/2016] [Accepted: 09/21/2016] [Indexed: 12/22/2022]
Abstract
Imputation is widely used to obtain information about rare variants. However, imputed rare variants have low accuracy, and inaccurately imputed rare variants may distort the results of region-based association tests. We therefore developed a pre-collapsing imputation method (PreCimp) that improves imputation accuracy by using collapsed variables. Briefly, collapsed variables are generated from rare variants in the reference panel, and a new reference panel is constructed by inserting these pre-collapsed variables into the original reference panel. Subsequent imputation then yields imputed genotypes for the collapsed variables. We demonstrated the performance of PreCimp on 5,349 genotyped samples using a Korean population-specific reference panel of 848 samples with exome sequencing, Affymetrix 5.0, and exome chip data. PreCimp outperformed a traditional post-collapsing method that collapses imputed variants after single rare variant imputation. Compared with the post-collapsing method, PreCimp increased imputation accuracy by a relative 3.4-6.3% when dosage r² was between 0.6 and 0.8, 10.9-16.1% when dosage r² was between 0.4 and 0.6, and 21.4-129.4% when dosage r² was below 0.4.
Affiliation(s)
- Young Jin Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, Korea
- Juyoung Lee
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, Korea
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; Department of Statistics, Seoul National University, Seoul, Korea
43
Zhang P, Li Q, Qi J, Lv Q, Zheng X, Wu X, Gu J. Association between vitamin D receptor gene polymorphism and ankylosing spondylitis in Han Chinese. Int J Rheum Dis 2016; 20:1510-1516. [PMID: 27778467 DOI: 10.1111/1756-185x.12949] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
Affiliation(s)
- Pingping Zhang
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Department of Pediatrics, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Qiuxia Li
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Jun Qi
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Qing Lv
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xuqi Zheng
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Xinyu Wu
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- Jieruo Gu
- Department of Rheumatology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
44
Kim T, Wei P. Incorporating ENCODE information into association analysis of whole genome sequencing data. BMC Proc 2016; 10:257-261. [PMID: 27980646 PMCID: PMC5133533 DOI: 10.1186/s12919-016-0040-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
With the rapidly decreasing cost of next-generation sequencing technology, a large number of whole genome sequences have been generated, enabling researchers to survey rare variants in the protein-coding and regulatory regions of the genome. However, identifying functional variants associated with complex diseases from whole genome sequencing (WGS) data remains a daunting task because of the millions of candidate variants and the only moderate sample sizes. We propose to incorporate Encyclopedia of DNA Elements (ENCODE) information into the association analysis of WGS data to boost statistical power. We use RegulomeDB and PolyPhen2 scores as external weights in existing rare variant association tests. We demonstrate the proposed framework using the WGS data and blood pressure phenotype from the San Antonio Family Studies provided by Genetic Analysis Workshop 19. We identified a genome-wide significant locus in the gene SNUPN on chromosome 15 that harbors a rare nonsynonymous variant, which was not detected by benchmark methods that did not incorporate biological information, including the T5 burden test and the sequence kernel association test.
Affiliation(s)
- Taebeom Kim
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
45
Abstract
The study of rare variants may enhance our understanding of the genetic determinants of the metabolome. Here, we analyze the association between 217 plasma metabolites and exome variants on the Illumina HumanExome BeadChip in 2,076 participants in the Framingham Heart Study, with replication in 1,528 participants of the Atherosclerosis Risk in Communities Study. We identify an association between GMPS and xanthosine using single variant analysis, and associations between HAL and histidine, PAH and phenylalanine, and UPB1 and ureidopropionate using gene-based tests (P < 5 × 10⁻⁸ in meta-analysis), highlighting novel coding variants that may underlie inborn errors of metabolism. Further, we show how examining variants across the spectrum of allele frequency highlights independent association signals at select loci and generates a more integrated view of metabolite heritability. These studies build on prior metabolomics genome-wide association studies to provide a more complete picture of the genetic architecture of the plasma metabolome. Several GWAS have identified many common variants associated with blood metabolites. Here, the authors use an exome array to identify low-frequency, potentially functional variants that impact human metabolism.
46
Abstract
There are thousands of known associations between genetic variants and complex human phenotypes, and the rate of novel discoveries is rapidly increasing. Translating those associations into knowledge of disease mechanisms remains a fundamental challenge because the associated variants are overwhelmingly in noncoding regions of the genome where we have few guiding principles to predict their function. Intersecting the compendium of identified genetic associations with maps of regulatory activity across the human genome has revealed that phenotype-associated variants are highly enriched in candidate regulatory elements. Allele-specific analyses of gene regulation can further prioritize variants that likely have a functional effect on disease mechanisms; and emerging high-throughput assays to quantify the activity of candidate regulatory elements are a promising next step in that direction. Together, these technologies have created the ability to systematically and empirically test hypotheses about the function of noncoding variants and haplotypes at the scale needed for comprehensive and systematic follow-up of genetic association studies. Major coordinated efforts to quantify regulatory mechanisms across genetically diverse populations in increasingly realistic cell models would be highly beneficial to realize that potential.
Affiliation(s)
- William L Lowe
- Division of Endocrinology, Metabolism and Molecular Medicine, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA
- Timothy E Reddy
- Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, North Carolina 27708, USA; Center for Genomic and Computational Biology, Duke University Medical School, Durham, North Carolina 27708, USA
47
Whole-genome sequencing in French Canadians from Quebec. Hum Genet 2016; 135:1213-1221. [PMID: 27376640 DOI: 10.1007/s00439-016-1702-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/12/2016] [Accepted: 06/21/2016] [Indexed: 12/17/2022]
Abstract
Genome-wide association studies (GWAS) have been tremendously successful in identifying common DNA sequence variants associated with complex human diseases and traits. However, because of their design, GWAS are largely inappropriate for characterizing the role of rare and low-frequency DNA variants in human phenotypic variation. Rarer genetic variation is geographically more restricted, supporting the need for local whole-genome sequencing (WGS) efforts to study these variants in specific populations. Here, we present the first large-scale low-pass WGS of the French-Canadian population. Specifically, we sequenced the whole genomes of 1970 French Canadians recruited by the Montreal Heart Institute Biobank at ~5.6× coverage and identified 29 million bi-allelic variants (31% novel), including 19 million variants with a minor allele frequency (MAF) <0.5%. Genotypes from the WGS data are highly concordant with genotypes obtained by exome array on the same individuals (99.8%), even when restricting this analysis to rare variants (MAF <0.5%, 99.9%) or heterozygous sites (98.9%). To further validate our data set, we showed that it can effectively be used to replicate several genetic associations with myocardial infarction risk and blood lipid levels. Furthermore, we assess the utility of our WGS data set for generating a French-Canadian-specific imputation reference panel and for inferring population structure in the Province of Quebec. Our results illustrate the value of low-pass WGS for studying the genetics of human diseases in the founder French-Canadian population.
48
Yazdani A, Yazdani A, Liu X, Boerwinkle E. Identification of Rare Variants in Metabolites of the Carnitine Pathway by Whole Genome Sequencing Analysis. Genet Epidemiol 2016; 40:486-491. [PMID: 27256581 DOI: 10.1002/gepi.21980] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 09/28/2015] [Revised: 01/06/2016] [Accepted: 04/04/2016] [Indexed: 12/28/2022]
Abstract
We use whole genome sequence data and rare variant analysis methods to investigate a subset of the human serum metabolome, including 16 carnitine-related metabolites that are important components of mammalian energy metabolism. Medium-pass sequence data comprising 12,820,347 rare variants and serum metabolomics data were available for 1,456 individuals. By applying a penalization method, we identified two genes, FGF8 and MDGA2, with significant effects on lysine and cis-4-decenoylcarnitine, respectively, using Δ-AIC and likelihood ratio test statistics. Single variant analyses in these regions did not identify a single low-frequency variant (minor allele count > 3) responsible for the underlying signal. The results demonstrate the utility of whole genome sequence data and innovative analyses for identifying candidate regions that influence complex phenotypes.
Affiliation(s)
- Akram Yazdani
- Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Azam Yazdani
- Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Xiaoming Liu
- Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- Eric Boerwinkle
- Human Genetics Center, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America; Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
49
50
Niiranen TJ, Vasan RS. Epidemiology of cardiovascular disease: recent novel outlooks on risk factors and clinical approaches. Expert Rev Cardiovasc Ther 2016; 14:855-869. [PMID: 27057779 DOI: 10.1080/14779072.2016.1176528] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 01/28/2023]
Abstract
Introduction: Cardiovascular disease (CVD) risk assessment with traditional risk factors (age, sex, blood pressure, lipids, smoking and diabetes) has remained relatively unchanged over the past decades despite some inaccuracies associated with this approach. However, the search for novel, robust and cost-effective markers of CVD risk is ongoing. Areas covered: A large share of the major developments in CVD risk prediction during the past five years has come from large-scale biomarker discovery and the so-called 'omics', the rapidly growing fields of genomics, transcriptomics, epigenetics and metabolomics. This review focuses on how these new technologies have helped drive primary CVD risk estimation forward in recent years and speculates on how they could be used more effectively to discover novel risk factors in the future. Expert commentary: The search for new CVD risk factors is currently undergoing a significant revolution, as the simple relationship between single risk factors and disease will have to be replaced by models that strive to integrate the whole field of omics into medicine.
Affiliation(s)
- Teemu J Niiranen
- National Heart, Lung, and Blood Institute's and Boston University's Framingham Heart Study, Framingham, MA, USA
- Ramachandran S Vasan
- National Heart, Lung, and Blood Institute's and Boston University's Framingham Heart Study, Framingham, MA, USA