1
Martins Rodrigues F, Terekhanova NV, Imbach KJ, Clauser KR, Esai Selvan M, Mendizabal I, Geffen Y, Akiyama Y, Maynard M, Yaron TM, Li Y, Cao S, Storrs EP, Gonda OS, Gaite-Reguero A, Govindan A, Kawaler EA, Wyczalkowski MA, Klein RJ, Turhan B, Krug K, Mani DR, Leprevost FDV, Nesvizhskii AI, Carr SA, Fenyö D, Gillette MA, Colaprico A, Iavarone A, Robles AI, Huang KL, Kumar-Sinha C, Aguet F, Lazar AJ, Cantley LC, Marigorta UM, Gümüş ZH, Bailey MH, Getz G, Porta-Pardo E, Ding L. Precision proteogenomics reveals pan-cancer impact of germline variants. Cell 2025; 188:2312-2335.e26. [PMID: 40233739 DOI: 10.1016/j.cell.2025.03.026] [Received: 10/09/2023] [Revised: 04/29/2024] [Accepted: 03/13/2025] [Indexed: 04/17/2025]
Abstract
We investigate the impact of germline variants on cancer patients' proteomes, encompassing 1,064 individuals across 10 cancer types. We introduced an approach, "precision peptidomics," mapping 337,469 coding germline variants onto peptides from patients' mass spectrometry data, revealing their potential impact on post-translational modifications, protein stability, allele-specific expression, and protein structure by leveraging the relevant protein databases. We identified rare pathogenic and common germline variants in cancer genes potentially affecting proteomic features, including variants altering protein abundance and structure and variants in kinases (ERBB2 and MAP2K2) impacting phosphorylation. Precision peptidome analysis predicted destabilizing events in signal-regulatory protein alpha (SIRPA) and glial fibrillary acidic protein (GFAP), relevant to immunomodulation and glioblastoma diagnostics, respectively. Genome-wide association studies identified quantitative trait loci for gene expression and protein levels, spanning millions of SNPs and thousands of proteins. Polygenic risk scores correlated with distal effects from risk variants. Our findings emphasize the contribution of germline genetics to cancer heterogeneity and high-throughput precision peptidomics.
Affiliation(s)
- Fernanda Martins Rodrigues
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Nadezhda V Terekhanova
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Kathleen J Imbach
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Barcelona, Spain; Universitat Autonoma de Barcelona, Barcelona, Spain
- Myvizhi Esai Selvan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Center for Thoracic Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Isabel Mendizabal
- Center for Cooperative Research in Biosciences (CIC bioGUNE), Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain; Translational Prostate Cancer Research Lab, CIC bioGUNE-Basurto, Biocruces Bizkaia Health Research Institute, Derio, Spain
- Yifat Geffen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Yo Akiyama
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tomer M Yaron
- Meyer Cancer Center, Department of Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Yize Li
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Song Cao
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Erik P Storrs
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Olivia S Gonda
- Department of Biology, Brigham Young University, Salt Lake City, UT, USA
- Adrian Gaite-Reguero
- Center for Cooperative Research in Biosciences (CIC bioGUNE), Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain
- Akshay Govindan
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Emily A Kawaler
- Applied Bioinformatics Laboratories, New York University Langone Health, New York City, NY, USA
- Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Robert J Klein
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Berk Turhan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Karsten Krug
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- D R Mani
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alexey I Nesvizhskii
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Steven A Carr
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- David Fenyö
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY, USA
- Antonio Colaprico
- Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, FL, USA; Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA
- Antonio Iavarone
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL, USA; Department of Neurological Surgery, Department of Biochemistry and Molecular Biology, University of Miami, Miller School of Medicine, Miami, FL, USA
- Ana I Robles
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD, USA
- Kuan-Lin Huang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Center for Transformative Disease Modeling, Tisch Cancer Institute, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chandan Kumar-Sinha
- Department of Pathology, University of Michigan, Ann Arbor, MI, USA; Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, USA
- Alexander J Lazar
- Departments of Pathology and Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Urko M Marigorta
- Center for Cooperative Research in Biosciences (CIC bioGUNE), Basque Research and Technology Alliance (BRTA), Bizkaia Technology Park, Derio, Spain; Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- Zeynep H Gümüş
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Center for Thoracic Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Matthew H Bailey
- Department of Biology, Brigham Young University, Salt Lake City, UT, USA.
- Gad Getz
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA.
- Eduard Porta-Pardo
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Barcelona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain.
- Li Ding
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University in St. Louis, Saint Louis, MO, USA.
2
Riccio C, Jansen ML, Thalén F, Koliopanos G, Link V, Ziegler A. Assessment of the functionality and usability of open-source rare variant analysis pipelines. Brief Bioinform 2025; 26:bbaf044. [PMID: 39907318 PMCID: PMC11795309 DOI: 10.1093/bib/bbaf044] [Received: 09/07/2024] [Revised: 01/07/2025] [Accepted: 01/20/2025] [Indexed: 02/06/2025] Open
Abstract
Sequencing of increasingly larger cohorts has revealed many rare variants, presenting an opportunity to further unravel the genetic basis of complex traits. Compared with common variants, rare variants are more complex to analyze. Specialized computational tools for these analyses should be both flexible and user-friendly. However, an overview of the available rare variant analysis pipelines and their functionalities is currently lacking. Here, we provide a systematic review of the currently available rare variant analysis pipelines. We searched MEDLINE and Google Scholar until 27 November 2023, and included open-source rare variant pipelines that accepted genotype data from cohort and case-control studies and group variants into testing units. Eligible pipelines were assessed based on functionality and usability criteria. We identified 17 rare variant pipelines that collectively support various trait types, association tests, testing units, and variant weighting schemes. Currently, no single pipeline can handle all data types in a scalable and flexible manner. We recommend different tools to meet diverse analysis needs. STAARpipeline is suitable for newcomers and common applications owing to its built-in definitions for the testing units. REGENIE is highly scalable, actively maintained, regularly updated, and well documented. Ravages is suitable for analyzing multinomial variables, and OrdinalGWAS is tailored for analyzing ordinal variables. Opportunities remain for developing a user-friendly pipeline that provides high degrees of flexibility and scalability. Such a pipeline would enable researchers to exploit the potential of rare variant analyses to uncover the genetic basis of complex traits.
Affiliation(s)
- Cristian Riccio
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Max L Jansen
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Felix Thalén
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Georgios Koliopanos
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Vivian Link
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Andreas Ziegler
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- University Center of Cardiovascular Science & Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Ave, Scottsville, Pietermaritzburg, 3201, South Africa
3
Martins Rodrigues F, Jasielec J, Perpich M, Kim A, Moma L, Li Y, Storrs E, Wendl MC, Jayasinghe RG, Fiala M, Stefka A, Derman B, Jakubowiak AJ, DiPersio JF, Vij R, Godley LA, Ding L. Germline predisposition in multiple myeloma. iScience 2025; 28:111620. [PMID: 39845416 PMCID: PMC11750583 DOI: 10.1016/j.isci.2024.111620] [Received: 09/08/2024] [Revised: 10/04/2024] [Accepted: 11/14/2024] [Indexed: 01/24/2025] Open
Abstract
We present a study of rare germline predisposition variants in 954 unrelated individuals with multiple myeloma (MM) and 82 MM families. Using a candidate gene approach, we identified such variants across all age groups in 9.1% of sporadic and 18% of familial cases. Implicated genes included genes suggested in other MM risk studies as potential risk genes (DIS3, EP300, KDM1A, and USP45); genes involved in predisposition to other cancers (ATM, BRCA1/2, CHEK2, PMS2, POT1, PRF1, and TP53); and BRIP1, EP300, and FANCM in individuals of African ancestry. Variants were characterized using loss of heterozygosity (LOH), biallelic events, and gene expression analyses, revealing 31 variants in 3.25% of sporadic cases for which pathogenicity was supported by multiple lines of evidence. Our results suggest that the disruption of DNA damage repair pathways may play a role in MM susceptibility. These results will inform improved surveillance in high-risk groups and potential therapeutic strategies.
Affiliation(s)
- Fernanda Martins Rodrigues
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jagoda Jasielec
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Melody Perpich
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Aelin Kim
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Luke Moma
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Yize Li
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Erik Storrs
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Michael C. Wendl
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Reyka G. Jayasinghe
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Mark Fiala
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Andrew Stefka
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Benjamin Derman
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- Andrzej J. Jakubowiak
- Section of Hematology/Oncology, Department of Medicine, The University of Chicago, Chicago, IL 60637, USA
- John F. DiPersio
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Ravi Vij
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Lucy A. Godley
- Division of Hematology/Oncology, Department of Medicine, Northwestern University, Chicago, IL 60611, USA
- Li Ding
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
4
Shen L, Amei A, Liu B, Xu G, Liu Y, Oh EC, Zhou X, Wang Z. Marginal interaction test for detecting interactions between genetic marker sets and environment in genome-wide studies. G3 (Bethesda, Md.) 2025; 15:jkae263. [PMID: 39538414 PMCID: PMC11708225 DOI: 10.1093/g3journal/jkae263] [Received: 08/30/2024] [Accepted: 11/04/2024] [Indexed: 11/16/2024]
Abstract
As human complex diseases are influenced by the interaction between genetics and the environment, identifying gene-environment interactions (G×E) is crucial for understanding disease mechanisms and predicting risk. Developing robust quantitative tools for G×E analysis can enhance the study of complex diseases. However, many existing methods that explore G×E focus on the interplay between an environmental factor and genetic variants, exclusively for common or rare variants. In this study, we developed MAGEIT_RAN and MAGEIT_FIX to identify interactions between an environmental factor and a set of genetic markers, including both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random and fixed effects, respectively. Simulation studies showed that both tests had type I error under control, with MAGEIT_RAN being the most powerful test. Applying MAGEIT to a genome-wide analysis of gene-alcohol interactions on hypertension and seated systolic blood pressure in the Multi-Ethnic Study of Atherosclerosis revealed genes such as EIF2AK2, CCNDBP1, and EPB42 influencing blood pressure through alcohol interaction. Pathway analysis identified 1 apoptosis and survival pathway involving PKR and 2 signal transduction pathways associated with hypertension and alcohol intake, demonstrating MAGEIT_RAN's ability to detect biologically relevant gene-environment interactions.
Affiliation(s)
- Linchuan Shen
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Amei Amei
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Bowen Liu
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Division of Computing, Analysis, and Mathematics, University of Missouri, Kansas City, MO 64108, USA
- Gang Xu
- Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Yunqing Liu
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Edwin C Oh
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, Las Vegas, NV 89154, USA
- Department of Internal Medicine, University of Nevada School of Medicine, Las Vegas, NV 89154, USA
- Xin Zhou
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06510, USA
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT 06510, USA
5
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress made in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have recently been proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
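The collapsing idea behind the simplest burden-type aggregation test can be sketched in a few lines. This is an illustrative example only, not a method taken from the review: it assumes a hypothetical genotype matrix (one row per individual, one column per rare variant in a single region), collapses each row into a carrier flag, and applies a two-sided Fisher's exact test to the resulting 2x2 table. The function names `carrier_counts` and `fisher_exact_two_sided` are made up for this sketch.

```python
from math import comb


def carrier_counts(genotypes, is_case):
    """Collapse rare-variant genotypes into a carrier flag per individual.

    genotypes: list of rows, each a list of alt-allele counts (0, 1, 2)
               for the rare variants in one region (hypothetical input).
    is_case:   list of booleans, True for cases and False for controls.
    Returns a 2x2 table [[case_carrier, case_non], [ctrl_carrier, ctrl_non]].
    """
    table = [[0, 0], [0, 0]]
    for row, case in zip(genotypes, is_case):
        carrier = any(g > 0 for g in row)
        table[0 if case else 1][0 if carrier else 1] += 1
    return table


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy data: three cases who each carry a rare allele, three controls who do not.
genotypes = [[1, 0], [0, 1], [0, 2], [0, 0], [0, 0], [0, 0]]
is_case = [True, True, True, False, False, False]
table = carrier_counts(genotypes, is_case)
p_value = fisher_exact_two_sided(table)
```

Collapsing to a carrier flag assumes that all variants in the region act in the same direction, which is the usual caveat for burden tests and one reason the other classes listed above exist.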
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
6
Leimi L, Koski JR, Kilpivaara O, Vettenranta K, Lokki AI, Meri S. Rare variants in complement system genes associate with endothelial damage after pediatric allogeneic hematopoietic stem cell transplantation. Front Immunol 2023; 14:1249958. [PMID: 37771589 PMCID: PMC10525714 DOI: 10.3389/fimmu.2023.1249958] [Received: 06/29/2023] [Accepted: 08/29/2023] [Indexed: 09/30/2023] Open
Abstract
Introduction: The complement system has a postulated role in endothelial problems after hematopoietic stem cell transplantation (HSCT). In this retrospective, single-center study, we studied genetic complement system variants in patients with documented endotheliopathy. In our previous study among pediatric patients with an allogeneic HSCT (2001-2013) at the Helsinki University Children's Hospital, Finland, we identified a total of 19/122 (15.6%) patients with vascular complications, fulfilling the criteria of capillary leak syndrome (CLS), veno-occlusive disease/sinusoidal obstruction syndrome (VOD/SOS) or thrombotic microangiopathy (TMA).
Methods: We performed whole exome sequencing (WES) on 109 patients with adequate pre-transplantation DNA for the analysis to define possible variations and mutations potentially predisposing to functional abnormalities of the complement system. In our data analysis, we focused on 41 genes coding for complement components.
Results: Fifty patients (45.9%) had one or several nonsynonymous, rare germline variants in complement genes. 21/66 (31.8%) of the variants were in the terminal pathway. Patients with endotheliopathy had variants in different complement genes: in the terminal pathway (C6 and C9), lectin pathway (MASP1) and receptor ITGAM (CD11b, part of CR3). Four patients had the same rare missense variant (rs183125896; Thr279Ala) in the C9 gene. Two of these patients were diagnosed with endotheliopathy and one with capillary leak syndrome-like problems. The C9 variant Thr279Ala has no previously known disease associations and is classified by the ACMG guidelines as a variant of uncertain significance (VUS). We conducted a gene burden test with gnomAD Finnish (fin) as the reference population. Complement gene variants seen in our patient population were investigated and Total Frequency Testing (TFT) was used for execution of burden tests. The gene variants seen in our patients with endotheliopathy were all significantly (FDR < 0.05) enriched compared to gnomAD. Overall, 14/25 genes coding for components of the complement system had an increased burden of missense variants among the patients when compared to the gnomAD Finnish population (N=10 816).
Discussion: Injury to the vascular endothelium is relatively common after HSCT with different phenotypic appearances suggesting yet unidentified underlying mechanisms. Variants in complement components may be related to endotheliopathy and poor prognosis in these patients.
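The abstract reports enrichment at FDR < 0.05 but does not say which correction procedure was used. As a hedged illustration only, the sketch below shows the widely used Benjamini-Hochberg adjustment applied to a set of per-gene p-values. The p-values and the gene labels (`gene_A` to `gene_D`) are hypothetical placeholders, not results from the study, and the function name `benjamini_hochberg` is chosen for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene burden-test p-values (placeholders, not study data).
gene_p = {"gene_A": 0.01, "gene_B": 0.04, "gene_C": 0.03, "gene_D": 0.5}
q_values = dict(zip(gene_p, benjamini_hochberg(list(gene_p.values()))))
significant = [gene for gene, q in q_values.items() if q < 0.05]
```

With these placeholder inputs only `gene_A` passes the 0.05 threshold, which shows how a raw p-value below 0.05 (such as 0.04 or 0.03 here) can fail after adjustment for multiple genes.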
Affiliation(s)
- Lilli Leimi
- Pediatric Research Center, Children’s Hospital, Helsinki University Hospital, University of Helsinki, Helsinki, Finland
- Jessica R. Koski
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Medicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Outi Kilpivaara
- Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Medical and Clinical Genetics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Medicum, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Diagnostic Center, Helsinki University Hospital, Helsinki, Finland
- Kim Vettenranta
- Pediatric Research Center, Children’s Hospital, Helsinki University Hospital, University of Helsinki, Helsinki, Finland
- A. Inkeri Lokki
- Department of Bacteriology and Immunology and Translational Immunology Research Program, University of Helsinki, Helsinki, Finland
- Seppo Meri
- Diagnostic Center, Helsinki University Hospital, Helsinki, Finland
- Department of Bacteriology and Immunology and Translational Immunology Research Program, University of Helsinki, Helsinki, Finland
7
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole-exome and whole-genome sequencing association studies, is able to handle a wide range of association models, and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
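The abstract does not describe how Excalibur combines its 36 tests, so the following is not that method. As a stand-in that shows what "combining several aggregation tests into one p-value" can look like, here is a minimal sketch of the Cauchy combination rule (the idea behind ACAT-style omnibus tests), which remains valid when the individual tests are correlated. The function name and the example p-values are assumptions made for this illustration.

```python
from math import atan, pi, tan


def cauchy_combination(p_values, weights=None):
    """Combine p-values from several tests with the Cauchy combination rule.

    Each p-value must lie strictly between 0 and 1. Weights default to equal
    and are normalised so that they sum to one.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate and take the weighted mean.
    stat = sum(w * tan((0.5 - p) * pi) for w, p in zip(weights, p_values)) / total
    # Convert the combined statistic back to a p-value via the Cauchy tail.
    return 0.5 - atan(stat) / pi


# Hypothetical p-values from two different aggregation tests on one gene.
combined = cauchy_combination([0.01, 0.2])
```

The combined value is dominated by the smallest input p-value, which is why rules of this kind are popular when it is unknown in advance which single test suits the underlying genetic model.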
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
| | - Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
| |
Collapse
|
8
|
Shen L, Amei A, Liu B, Liu Y, Xu G, Oh EC, Wang Z. Detection of interactions between genetic marker sets and environment in a genome-wide study of hypertension. bioRxiv 2023:2023.05.28.542666. [PMID: 37398075 PMCID: PMC10312472 DOI: 10.1101/2023.05.28.542666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
As human complex diseases are influenced by the interplay of genes and environment, detecting gene-environment interactions (G × E) can shed light on the biological mechanisms of diseases and play an important role in disease risk prediction. Development of powerful quantitative tools to incorporate G × E in complex diseases has the potential to facilitate the accurate curation and analysis of large genetic epidemiological studies. However, most existing methods that interrogate G × E focus on the interaction effects of an environmental factor and genetic variants, exclusively for either common or rare variants. In this study, we proposed two tests, MAGEIT_RAN and MAGEIT_FIX, to detect interaction effects of an environmental factor and a set of genetic markers containing both rare and common variants, based on the MinQue for Summary statistics. The genetic main effects in MAGEIT_RAN and MAGEIT_FIX are modeled as random or fixed, respectively. Through simulation studies, we illustrated that both tests had type I error under control and that MAGEIT_RAN was overall the most powerful test. We applied MAGEIT to a genome-wide analysis of gene-alcohol interactions on hypertension in the Multi-Ethnic Study of Atherosclerosis. We detected two genes, CCNDBP1 and EPB42, that interact with alcohol usage to influence blood pressure. Pathway analysis identified sixteen significant pathways related to signal transduction and development that were associated with hypertension, and several of them were reported to have an interactive effect with alcohol intake. Our results demonstrated that MAGEIT can detect biologically relevant genes that interact with environmental factors to influence complex traits.
Affiliation(s)
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
- Bowen Liu
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada, Las Vegas
  - Department of Biostatistics, Yale School of Public Health
- Edwin C. Oh
  - Department of Internal Medicine, University of Nevada School of Medicine, Las Vegas
  - Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health
|
9
|
Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics 2023; 24:2. [PMID: 36597047 PMCID: PMC9811798 DOI: 10.1186/s12859-022-05114-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND: Gene-based association tests provide a useful alternative and complement to the usual single-marker association tests, especially in genome-wide association studies (GWAS). The way variants in a gene are weighted plays an important role in boosting the power of a gene-based association test. Appropriate weights can boost statistical power, especially when detecting genetic variants with weak effects on a trait. One major limitation of existing gene-based association tests lies in using weights that are predetermined biologically or empirically. This limitation often attenuates the power of a test. On the other hand, effect sizes or directions of causal genetic variants in real data are usually unknown, driving a need for a flexible yet robust methodology for gene-based association tests. Furthermore, access to individual-level data is often limited, while thousands of GWAS summary data sets are publicly and freely available.
RESULTS: To resolve these limitations, we propose a combination test named OWC, which is based on summary statistics from GWAS data. Several traditional methods, including the burden test, the weighted sum of squared score test (SSU), the weighted sum statistic (WSS), the SNP-set Kernel Association Test (SKAT), and the score test, are special cases of OWC. To evaluate the performance of OWC, we perform extensive simulation studies. Results of simulation studies demonstrate that OWC outperforms several existing popular methods. We further show that OWC outperforms comparison methods in real-world data analyses using schizophrenia GWAS summary data and fasting glucose GWAS meta-analysis data. The proposed method is implemented in an R package available at https://github.com/Xuexia-Wang/OWC-R-package.
CONCLUSIONS: We propose a novel gene-based association test that incorporates four different weighting schemes (two constant weights and two weights proportional to normal statistic Z) and includes several popular methods as its special cases. Results of the simulation studies and real data analyses illustrate that the proposed test, OWC, outperforms comparable methods in most scenarios. These results demonstrate that OWC is a useful tool that adapts to the underlying biological model for a disease by appropriately weighting genetic variants and combining well-known gene-based tests.
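The abstract's point that effect directions matter can be illustrated with two of the classical statistics it names. The sketch below is not the OWC statistic (the abstract does not give its form); it only computes a weighted burden statistic and a weighted sum-of-squares (SSU-style) statistic from per-variant Z-scores. The function names, the weight vector `w`, and the LD correlation matrix `ld` are illustrative assumptions, and only the burden statistic is given a p-value here, because the SSU-style statistic follows a mixture of chi-square distributions that this sketch does not evaluate.

```python
from math import erfc, sqrt


def burden_statistic(z, w, ld):
    """Weighted burden statistic (w'Z)^2 / (w'Rw).

    z  : per-variant Z-scores from GWAS summary data
    w  : per-variant weights (hypothetical; chosen by the analyst)
    ld : correlation (LD) matrix R among the variants, as nested lists
    Under the null this is approximately chi-square with 1 degree of freedom.
    """
    numerator = sum(wi * zi for wi, zi in zip(w, z)) ** 2
    m = len(w)
    denominator = sum(w[i] * ld[i][j] * w[j] for i in range(m) for j in range(m))
    return numerator / denominator


def ssu_statistic(z, w):
    """Weighted sum of squared Z-scores (an SSU-style quadratic statistic)."""
    return sum(wi * zi * zi for wi, zi in zip(w, z))


def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(x / 2.0))


# Two variants with equal-sized effects in opposite directions, no LD (assumed).
z_opposite = [2, -2]
weights = [1, 1]
identity = [[1, 0], [0, 1]]

burden = burden_statistic(z_opposite, weights, identity)  # signals cancel
ssu = ssu_statistic(z_opposite, weights)                  # signals accumulate
print(burden, chi2_1_sf(burden), ssu)
```

With opposite-signed Z-scores the burden statistic collapses to zero while the quadratic statistic does not, which is the kind of scenario that motivates combining tests rather than committing to one in advance.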
Affiliation(s)
- Jianjun Zhang
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Xiaoyu Liang
  - Department of Epidemiology and Biostatistics, Michigan State University, 909 Wilson Rd Room B601, East Lansing, MI 48824 USA (GRID: grid.17088.36; ISNI: 0000 0001 2150 1785)
- Samantha Gonzales
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Jianguo Liu
  - Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Xiaoyi Raymond Gao
  - Department of Ophthalmology and Visual Science, Department of Biomedical informatics, Division of Human Genetics, Ohio State University, 915 Olentangy River Road, Columbus, OH 43212 USA (GRID: grid.261331.4; ISNI: 0000 0001 2285 7943)
- Xuexia Wang
  - Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, 11200 SW 8th street, Miami, FL 33174 USA (GRID: grid.65456.34; ISNI: 0000 0001 2110 1845)
|
10
|
Whole Exome Sequencing Study Identifies Novel Rare Risk Variants for Habitual Coffee Consumption Involved in Olfactory Receptor and Hyperphagia. Nutrients 2022; 14:nu14204330. [PMID: 36297015 PMCID: PMC9607528 DOI: 10.3390/nu14204330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2022] [Revised: 10/13/2022] [Accepted: 10/13/2022] [Indexed: 11/06/2022]
Abstract
Habitual coffee consumption is an addictive behavior with unknown genetic variations and has raised public health issues about its potential health-related outcomes. We performed exome-wide association studies to identify rare risk variants contributing to habitual coffee consumption utilizing the newly released UK Biobank exome dataset (n = 200,643). A total of 34,761 qualifying variants were imported into SKAT to conduct gene-based burden and robust tests with minor allele frequency <0.01, adjusting for the polygenic risk scores (PRS) of coffee intake to exclude the effect of common coffee-related polygenic risk. The gene-based burden and robust test of the exonic variants found seven exome-wide significant associations, such as OR2G2 (P_SKAT = 1.88 × 10^−9, P_SKAT-Robust = 2.91 × 10^−17), VEZT1 (P_SKAT = 3.72 × 10^−7, P_SKAT-Robust = 1.41 × 10^−7), and IRGC (P_SKAT = 2.92 × 10^−5, P_SKAT-Robust = 1.07 × 10^−7). These candidate genes were verified in the GWAS summary data of coffee intake, such as rs12737801 (p = 0.002) in OR2G2, and rs34439296 (p = 0.008) in IRGC. This study could help to extend genetic insights into the pathogenesis of coffee addiction, and may point to molecular mechanisms underlying health effects of habitual coffee consumption.
|
11
|
Lee JY, Shen PS, Cheng KF. A robust association test with multiple genetic variants and covariates. Stat Appl Genet Mol Biol 2022; 21:sagmb-2021-0029. [DOI: 10.1515/sagmb-2021-0029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2021] [Accepted: 05/20/2022] [Indexed: 11/15/2022]
Abstract
Due to the advancement of genome sequencing techniques, a great stride has been made in exome sequencing such that the association study between disease and genetic variants has become feasible. Some powerful and well-known association tests have been proposed to test the association between a group of genes and the disease of interest. However, some challenges still remain, in particular, many factors can affect the performance of testing power, e.g., the sample size, the number of causal and non-causal variants, and direction of the effect of causal variants. Recently, a powerful test, called T_REM, is derived based on a random effects model. T_REM has the advantages of being less sensitive to the inclusion of non-causal rare variants or low effect common variants or the presence of missing genotypes. However, the testing power of T_REM can be low when a portion of causal variants has effects in opposite directions. To improve the drawback of T_REM, we propose a novel test, called T_ROB, which keeps the advantages of T_REM and is more robust than T_REM in terms of having adequate power in the case of variants with opposite directions of effect. Simulation results show that T_ROB has a stable type I error rate and outperforms T_REM when the proportion of risk variants decreases to a certain level and its advantage over T_REM increases as the proportion decreases. Furthermore, T_ROB outperforms several other competing tests in most scenarios. The proposed methodology is illustrated using the Shanghai Breast Cancer Study.
Affiliation(s)
- Jen-Yu Lee
  - Department of Statistics, Feng Chia University, Taichung, Taiwan, ROC
- Pao-Sheng Shen
  - Department of Statistics, Tunghai University, Taichung, Taiwan, ROC
- Kuang-Fu Cheng
  - Biostatistics Center, Taipei Medical University, Taipei, Taiwan, ROC
  - Department of Business Administration, Asia University, Taichung, Taiwan, ROC
|
12
|
Li MK, Yuan YX, Zhu B, Wang KW, Fung WK, Zhou JY. Gene-Based Methods for Estimating the Degree of the Skewness of X Chromosome Inactivation. Genes (Basel) 2022; 13:genes13050827. [PMID: 35627212 PMCID: PMC9140558 DOI: 10.3390/genes13050827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 05/01/2022] [Accepted: 05/02/2022] [Indexed: 11/16/2022]
Abstract
Skewed X chromosome inactivation (XCI-S) has been reported to be associated with some X-linked diseases, and several methods have been proposed to estimate the degree of XCI-S (denoted as γ) for a single locus. However, no method has been available to estimate γ for genes. Therefore, in this paper, we first propose the point estimate and the penalized point estimate of γ for genes, and then derive their confidence intervals based on Fieller's and penalized Fieller's methods, respectively. Further, we consider the constraint γ∈[0, 2] and propose Bayesian methods to obtain the point estimates and the credible intervals of γ, where a truncated normal prior and a uniform prior are respectively used (denoted as GBN and GBU). The simulation results show that the Bayesian methods avoid extreme point estimates (0 or 2), empty sets, noninformative intervals ([0, 2]), and discontinuous intervals. GBN performs best in both point estimation and interval estimation. Finally, we apply the proposed methods to the Minnesota Center for Twin and Family Research data to illustrate their practical use. In summary, in practical applications, we recommend using GBN to estimate γ of genes.
Affiliation(s)
- Meng-Kai Li
  - Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
- Yu-Xin Yuan
  - Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
- Bin Zhu
  - Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
- Kai-Wen Wang
  - Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
- Wing Kam Fung
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
- Ji-Yuan Zhou
  - Department of Biostatistics, State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, School of Public Health, Southern Medical University, Guangzhou 510515, China
  - Guangdong-Hong Kong-Macao Joint Laboratory for Contaminants Exposure and Health, Guangzhou 510006, China
|
13
|
Zhang S, Cooper-Knock J, Weimer AK, Shi M, Moll T, Marshall JNG, Harvey C, Nezhad HG, Franklin J, Souza CDS, Ning K, Wang C, Li J, Dilliott AA, Farhan S, Elhaik E, Pasniceanu I, Livesey MR, Eitan C, Hornstein E, Kenna KP, Veldink JH, Ferraiuolo L, Shaw PJ, Snyder MP. Genome-wide identification of the genetic basis of amyotrophic lateral sclerosis. Neuron 2022; 110:992-1008.e11. [PMID: 35045337 PMCID: PMC9017397 DOI: 10.1016/j.neuron.2021.12.019] [Citation(s) in RCA: 79] [Impact Index Per Article: 26.3] [Received: 07/07/2021] [Revised: 10/07/2021] [Accepted: 12/13/2021] [Indexed: 02/01/2023]
Abstract
Amyotrophic lateral sclerosis (ALS) is a complex disease that leads to motor neuron death. Despite heritability estimates of 52%, genome-wide association studies (GWASs) have discovered relatively few loci. We developed a machine learning approach called RefMap, which integrates functional genomics with GWAS summary statistics for gene discovery. With transcriptomic and epigenetic profiling of motor neurons derived from induced pluripotent stem cells (iPSCs), RefMap identified 690 ALS-associated genes that represent a 5-fold increase in recovered heritability. Extensive conservation, transcriptome, network, and rare variant analyses demonstrated the functional significance of candidate genes in healthy and diseased motor neurons and brain tissues. Genetic convergence between common and rare variation highlighted KANK1 as a new ALS gene. Reproducing KANK1 patient mutations in human neurons led to neurotoxicity and demonstrated that TDP-43 mislocalization, a hallmark pathology of ALS, is downstream of axonal dysfunction. RefMap can be readily applied to other complex diseases.
Affiliation(s)
- Sai Zhang
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Johnathan Cooper-Knock
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Annika K Weimer
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Minyi Shi
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Tobias Moll
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Jack N G Marshall
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Calum Harvey
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Helia Ghahremani Nezhad
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- John Franklin
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Cleide Dos Santos Souza
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Ke Ning
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Cheng Wang
  - The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, the Bakar Computational Health Sciences Institute, the Parker Institute for Cancer Immunotherapy, and the Department of Neurology, School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Jingjing Li
  - The Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, the Bakar Computational Health Sciences Institute, the Parker Institute for Cancer Immunotherapy, and the Department of Neurology, School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA
- Allison A Dilliott
  - Department of Neurology and Neurosurgery, the Montreal Neurological Institute, McGill University, Montreal, QC H3A 1A1, Canada
- Sali Farhan
  - Department of Neurology and Neurosurgery, the Montreal Neurological Institute, McGill University, Montreal, QC H3A 1A1, Canada
- Eran Elhaik
  - Department of Biology, Lunds Universitet, Lund 223 62, Sweden
- Iris Pasniceanu
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Matthew R Livesey
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Chen Eitan
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Eran Hornstein
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
- Kevin P Kenna
  - Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht 3584 CX, the Netherlands
- Jan H Veldink
  - Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht 3584 CX, the Netherlands
- Laura Ferraiuolo
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Pamela J Shaw
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, S10 2HQ, UK
- Michael P Snyder
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
|
14
|
Cheng S, Lyu J, Shi X, Wang K, Wang Z, Deng M, Sun B, Wang C. Rare variant association tests for ancestry-matched case-control data based on conditional logistic regression. Brief Bioinform 2022; 23:6502553. [PMID: 35021184 DOI: 10.1093/bib/bbab572] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2021] [Revised: 11/29/2021] [Accepted: 12/13/2021] [Indexed: 12/13/2022]
Abstract
With the increasing volume of human sequencing data available, analysis incorporating external controls becomes a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious association due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and might become anti-conservative when case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to control for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type 1 errors under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to efficiently utilize all available cases and controls to accelerate the discovery of disease associated genes.
Affiliation(s)
- Shanshan Cheng
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Jingjing Lyu
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Xian Shi
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Kai Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Zengmiao Wang
  - Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
- Minghua Deng
  - Center for Quantitative Biology, Peking University, Beijing 100871, P. R. China
  - LMAM, School of Mathematical Sciences, Peking University, Beijing 100871, P. R. China
  - Center for Statistical Sciences, Peking University, Beijing 100871, P. R. China
- Baoluo Sun
  - Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Chaolong Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
  - Department of Orthopedic Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
|
15
|
Jiang L, Jiang H, Dai S, Chen Y, Song Y, Tang CSM, Pang SYY, Ho SL, Wang B, Garcia-Barcelo MM, Tam PKH, Cherny SS, Li MJ, Sham PC, Li M. Deviation from baseline mutation burden provides powerful and robust rare-variants association test for complex diseases. Nucleic Acids Res 2021; 50:e34. [PMID: 34931221 PMCID: PMC8989543 DOI: 10.1093/nar/gkab1234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/18/2021] [Revised: 11/19/2021] [Accepted: 12/04/2021] [Indexed: 02/07/2023]
Abstract
Identifying rare variants that contribute to complex diseases is challenging because of the low statistical power in current tests comparing cases with controls. Here, we propose a novel and powerful rare variants association test based on the deviation of the observed mutation burden of a gene in cases from a baseline predicted by a weighted recursive truncated negative-binomial regression (RUNNER) on genomic features available from public data. Simulation studies show that RUNNER is substantially more powerful than state-of-the-art rare variant association tests and has reasonable type 1 error rates even for stratified populations or in small samples. Applied to real case-control data, RUNNER recapitulates known genes of Hirschsprung disease and Alzheimer's disease missed by current methods and detects promising new candidate genes for both disorders. In a case-only study, RUNNER successfully detected a known causal gene of amyotrophic lateral sclerosis. The present study provides a powerful and robust method to identify susceptibility genes with rare risk variants for complex diseases.
Affiliation(s)
- Lin Jiang
  - Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  - Research Center of Medical Sciences, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-sen University), Ministry of Education, Sun Yat-sen University, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Hui Jiang
  - Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-sen University), Ministry of Education, Sun Yat-sen University, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Sheng Dai
  - Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-sen University), Ministry of Education, Sun Yat-sen University, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Ying Chen
  - Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-sen University), Ministry of Education, Sun Yat-sen University, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
- Youqiang Song
  - School of Biomedical Sciences, the University of Hong Kong, Hong Kong, SAR China
  - State Key Laboratory of Brain and Cognitive Sciences, the University of Hong Kong, Hong Kong, SAR China
- Clara Sze-Man Tang
  - Department of Surgery, the University of Hong Kong, Hong Kong, SAR China
  - Dr. Li Dak-Sum Research Centre, The University of Hong Kong - Karolinska Institutet Collaboration in Regenerative Medicine, Hong Kong, SAR China
- Shirley Yin-Yu Pang
  - Division of Neurology, Department of Medicine, the University of Hong Kong, Hong Kong, SAR China
- Shu-Leong Ho
  - Division of Neurology, Department of Medicine, the University of Hong Kong, Hong Kong, SAR China
- Binbin Wang
  - Department of Genetics, National Research Institute for Family Planning, Beijing, China
- Paul Kwong-Hang Tam
  - Department of Surgery, the University of Hong Kong, Hong Kong, SAR China
  - Dr. Li Dak-Sum Research Centre, The University of Hong Kong - Karolinska Institutet Collaboration in Regenerative Medicine, Hong Kong, SAR China
  - Faculty of Medicine, Macau University of Science and Technology, Macau, SAR China
- Mulin Jun Li
  - The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Pak Chung Sham
  - The Centre for PanorOmic Sciences, the University of Hong Kong, Hong Kong, SAR China
  - State Key Laboratory of Brain and Cognitive Sciences, the University of Hong Kong, Hong Kong, SAR China
  - Department of Psychiatry, the University of Hong Kong, Hong Kong, SAR China
- Miaoxin Li
  - Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-sen University), Ministry of Education, Sun Yat-sen University, Guangzhou, China
  - Center for Precision Medicine, Sun Yat-sen University, Guangzhou, China
  - The Centre for PanorOmic Sciences, the University of Hong Kong, Hong Kong, SAR China
  - Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
|
16
|
SNP Development in Penaeus vannamei via Next-Generation Sequencing and DNA Pool Sequencing. Fishes 2021. [DOI: 10.3390/fishes6030036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
Next-generation sequencing and pool sequencing have been widely used in SNP (single-nucleotide polymorphism) detection and population genetics research; however, there are few reports on SNPs related to the growth of Penaeus vannamei. The purpose of this study was to call SNPs from rapid-growing (RG) and slow-growing (SG) individuals’ transcriptomes and use DNA pool sequencing to assess the reliability of SNPs. Two parameters were applied to detect SNPs. One parameter was the p-values generated using Fisher’s exact test, which were used to calculate the significance of allele frequency differences between RG and SG. The other one was the AFI (minor allele frequency imbalance), which was defined to highlight the fold changes in MAF (minor allele frequency) values between RG and SG. There were 216,015 hypothetical SNPs, which were obtained based on the transcriptome data. Finally, 104 high-quality SNPs and 96,819 low-quality SNPs were predicted. Then, 18 high-quality SNPs and 17 low-quality SNPs were selected to assess the reliability of the detection process. Here, 72.22% (13/18) accuracy was achieved for high-quality SNPs, while only 52.94% (9/17) accuracy was achieved for low-quality SNPs. These SNPs enrich the data for population genetics studies of P. vannamei and may play a role in the development of SNP markers for future breeding studies.
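The two screening parameters described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not give the exact AFI formula, so `afi` below assumes it is the ratio of the larger to the smaller allele frequency between the two groups, and all function names and the example counts are hypothetical. The two-sided Fisher's exact test is implemented directly from the hypergeometric distribution so the block needs only the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. RG and SG); columns are counts of the
    two alleles. Sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)


def allele_frequency(allele_count, total_count):
    """Frequency of one allele among all observed allele copies."""
    return allele_count / total_count


def afi(freq_rg, freq_sg):
    """Assumed AFI: fold change between the two group frequencies (>= 1)."""
    low, high = sorted((freq_rg, freq_sg))
    return float("inf") if low == 0 else high / low


# Hypothetical read counts for one SNP: (minor allele, major allele) per group.
rg_minor, rg_major = 40, 60
sg_minor, sg_major = 10, 90

p = fisher_exact_two_sided(rg_minor, rg_major, sg_minor, sg_major)
fold = afi(allele_frequency(rg_minor, rg_minor + rg_major),
           allele_frequency(sg_minor, sg_minor + sg_major))
print(p, fold)
```

A SNP would then be retained only if it passes both a p-value cutoff and a fold-change cutoff; the thresholds separating high-quality from low-quality SNPs are not stated in the abstract, so none are assumed here.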
|
17
|
Yang T, Wei P, Pan W. Integrative analysis of multi-omics data for discovering low-frequency variants associated with low-density lipoprotein cholesterol levels. Bioinformatics 2021; 36:5223-5228. [PMID: 33070182 DOI: 10.1093/bioinformatics/btaa898] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/30/2020] [Revised: 09/26/2020] [Accepted: 10/06/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION The abundance of omics data has facilitated integrative analyses of single and multiple molecular layers with genome-wide association studies focusing on common variants. Building on these successes, we propose a general analysis framework that leverages multi-omics data together with sequencing data to improve the statistical power for discovering new associations and the understanding of disease susceptibility due to low-frequency variants. The proposed test is robust to model misspecification, has high power across a wide range of scenarios, and can offer insights into the underlying genetic architecture and disease mechanisms. RESULTS Using the Framingham Heart Study data, we show that low-frequency variants are predictive of DNA methylation, even after conditioning on the nearby common variants. In addition, DNA methylation and gene expression provide complementary information to functional genomics. In the Avon Longitudinal Study of Parents and Children, with a sample size of 1497, one gene, CLPTM1, is identified as associated with low-density lipoprotein cholesterol levels by the proposed adaptive gene-based test, which integrates information from gene expression, methylation and enhancer-promoter interactions. The association is further replicated in the TwinsUK study with 1706 samples. The signal is driven by both low-frequency and common variants. AVAILABILITY AND IMPLEMENTATION Models are available at https://github.com/ytzhong/DNAm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tianzhong Yang
  - Department of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Wei Pan
  - Department of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
|
18
|
Torres GG, Nygaard M, Caliebe A, Blanché H, Chantalat S, Galan P, Lieb W, Christiansen L, Deleuze JF, Christensen K, Strauch K, Müller-Nurasyid M, Peters A, Nöthen MM, Hoffmann P, Flachsbart F, Schreiber S, Ellinghaus D, Franke A, Dose J, Nebel A. Exome-Wide Association Study Identifies FN3KRP and PGP as New Candidate Longevity Genes. J Gerontol A Biol Sci Med Sci 2021; 76:786-795. [PMID: 33491046 PMCID: PMC8087267 DOI: 10.1093/gerona/glab023] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/18/2020] [Indexed: 12/19/2022] Open
Abstract
Despite enormous research efforts, the genetic component of longevity has remained largely elusive. The investigation of common variants, mainly located in intronic or regulatory regions, has yielded little new information on the heritability of the phenotype. Here, we performed a chip-based exome-wide association study investigating 62 488 common and rare coding variants in 1248 German long-lived individuals, including 599 centenarians, and 6941 younger controls (age < 60 years). In a single-variant analysis, we observed an exome-wide significant association between rs1046896 in the gene fructosamine-3-kinase-related-protein (FN3KRP) and longevity. Notably, we found the longevity allele C of rs1046896 to be associated with increased FN3KRP expression in whole blood; a database look-up confirmed this effect for various other human tissues. A gene-based analysis, in which potential cumulative effects of common and rare variants were considered, yielded the gene phosphoglycolate phosphatase (PGP) as another potential longevity gene, though no single variant in PGP reached the discovery p-value (1 × 10^−4). Furthermore, we validated the previously reported longevity locus cyclin-dependent kinase inhibitor 2B antisense RNA 1 (CDKN2B-AS1). Replication of our results in a French longevity cohort was only successful for rs1063192 in CDKN2B-AS1. In conclusion, we identified 2 new potential candidate longevity genes, FN3KRP and PGP, which may influence the phenotype through their role in metabolic processes, that is, the reverse glycation of proteins (FN3KRP) and the control of glycerol-3-phosphate levels (PGP).
Affiliation(s)
- Guillermo G Torres
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
- Marianne Nygaard
  - The Danish Twin Registry and The Danish Aging Research Center, Department of Public Health, University of Southern Denmark, Odense C
  - Department of Clinical Genetics, Odense University Hospital, Denmark
- Amke Caliebe
  - Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Germany
- Hélène Blanché
  - Fondation Jean Dausset-Centre d'Etude du Polymorphisme Humain (CEPH), Paris, France
- Sophie Chantalat
  - Centre National de Recherche en Génomique Humaine CNRGH-CEA, Evry, France
- Pilar Galan
  - Université Sorbonne Paris Cité-UREN, Unité de Recherche en Epidémiologie Nutritionelle, U557 Inserm, U1125 Inra, Bobigny, France
- Wolfgang Lieb
  - Institute of Epidemiology and Biobank Popgen, Kiel University, University Hospital Schleswig-Holstein, Germany
- Lene Christiansen
  - The Danish Twin Registry and The Danish Aging Research Center, Department of Public Health, University of Southern Denmark, Odense C
  - Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Denmark
- Jean-François Deleuze
  - Fondation Jean Dausset-Centre d'Etude du Polymorphisme Humain (CEPH), Paris, France
  - Centre National de Recherche en Génomique Humaine CNRGH-CEA, Evry, France
- Kaare Christensen
  - The Danish Twin Registry and The Danish Aging Research Center, Department of Public Health, University of Southern Denmark, Odense C
  - Department of Clinical Genetics, Odense University Hospital, Denmark
  - Department of Clinical Biochemistry and Pharmacology, Odense University Hospital, Denmark
- Konstantin Strauch
  - Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
  - Chair of Genetic Epidemiology, IBE, Faculty of Medicine, Ludwig-Maximilians-University (LMU) Munich, Germany
- Martina Müller-Nurasyid
  - Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
  - Chair of Genetic Epidemiology, IBE, Faculty of Medicine, Ludwig-Maximilians-University (LMU) Munich, Germany
  - Department of Internal Medicine I (Cardiology), Hospital of the LMU Munich, Germany
- Annette Peters
  - Institute of Genetic Epidemiology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
  - German Center for Diabetes Research (DZD), Neuherberg, Germany
- Per Hoffmann
  - Institute of Human Genetics, University of Bonn, Germany
- Friederike Flachsbart
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
- Stefan Schreiber
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
- David Ellinghaus
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
- Andre Franke
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
- Janina Dose
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
- Almut Nebel
  - Institute of Clinical Molecular Biology, Kiel University, University Hospital Schleswig-Holstein, Germany
|
19
|
Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for rare-variant association testing. Genet Epidemiol 2021; 45:413-424. [PMID: 33565109 DOI: 10.1002/gepi.22379] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/06/2020] [Revised: 01/08/2021] [Accepted: 01/25/2021] [Indexed: 12/12/2022]
Abstract
Although genome-wide association studies have been widely used to identify associations between complex diseases and genetic variants, standard single-variant analyses often have limited power when applied to rare variants. To overcome this problem, set-based methods have been developed with the aim of boosting power by borrowing strength from multiple rare variants. We propose the adaptive hierarchically structured variable selection (HSVS-A) prior to test for association of rare variants in a set with continuous or dichotomous phenotypes and to estimate the effects of individual rare variants simultaneously. HSVS-A has the flexibility to integrate a pairwise weighting scheme, which adaptively induces desirable correlations among variants of similar significance such that we can borrow information from potentially causal and noncausal rare variants to boost power. Simulation studies show that, for both continuous and dichotomous phenotypes, HSVS-A is powerful when there are multiple causal rare variants, with effects in either the same or opposite directions, in the presence of a large number of noncausal variants. We also apply HSVS-A to the Wellcome Trust Case Control Consortium Crohn's disease data to test the association of Crohn's disease with rare variants in pathways. HSVS-A identifies two pathways harboring novel protective rare variants for Crohn's disease.
Affiliation(s)
- Yi Yang
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
  - Department of Biostatistics, Columbia University, New York, New York, USA
- Saonli Basu
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
- Lin Zhang
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
|
20
|
Laaksonen J, Mishra PP, Seppälä I, Lyytikäinen LP, Raitoharju E, Mononen N, Lepistö M, Almusa H, Ellonen P, Hutri-Kähönen N, Juonala M, Raitakari O, Kähönen M, Salonen JT, Lehtimäki T. Examining the effect of mitochondrial DNA variants on blood pressure in two Finnish cohorts. Sci Rep 2021; 11:611. [PMID: 33436758 PMCID: PMC7804469 DOI: 10.1038/s41598-020-79931-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/11/2020] [Accepted: 12/10/2020] [Indexed: 12/12/2022] Open
Abstract
High blood pressure (BP) is a major risk factor for many noncommunicable diseases. The effect of mitochondrial DNA single-nucleotide polymorphisms (mtSNPs) on BP is less known than that of nuclear SNPs. We investigated the mitochondrial genetic determinants of systolic, diastolic, and mean arterial BP. MtSNPs were determined from peripheral blood by sequencing or with genome-wide association study SNP arrays in two independent Finnish cohorts, the Young Finns Study and the Finnish Cardiovascular Study, respectively. In total, over 4200 individuals were included. The effects of individual common mtSNPs, with an additional focus on sex-specificity, and aggregates of rare mtSNPs grouped by mitochondrial genes were evaluated by meta-analysis of linear regression and a sequence kernel association test, respectively. We accounted for the predicted pathogenicity of the rare variants within protein-encoding and the tRNA regions. In the meta-analysis of 87 common mtSNPs, we did not observe significant associations with any of the BP traits. Sex-specific and rare-variant analyses did not pinpoint any significant associations either. Our results are in agreement with several previous studies suggesting that mtDNA variation does not have a significant role in the regulation of BP. Future studies might need to reconsider the mechanisms thought to link mtDNA with hypertension.
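As a rough illustration of how per-cohort linear regression estimates for one common mtSNP might be pooled, the sketch below uses a fixed-effect, inverse-variance weighted meta-analysis. The abstract does not say which meta-analysis model was used, so this choice is an assumption, and the effect sizes are invented for illustration.

```python
from math import sqrt

def fixed_effect_meta(estimates):
    """Pool (beta, standard_error) pairs using inverse-variance weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    # The pooled standard error is the square root of the inverse total weight.
    return beta, sqrt(1.0 / total)

# Hypothetical per-cohort effects of one mtSNP on systolic BP (beta, SE).
pooled_beta, pooled_se = fixed_effect_meta([(0.2, 0.1), (0.4, 0.1)])
print(round(pooled_beta, 3), round(pooled_se, 4))
```

With equal standard errors the pooled estimate is simply the average of the two cohort estimates, and the pooled standard error shrinks by a factor of the square root of two.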
Affiliation(s)
- Jaakko Laaksonen
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Pashupati P Mishra
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Ilkka Seppälä
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Leo-Pekka Lyytikäinen
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Emma Raitoharju
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Nina Mononen
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
- Maija Lepistö
  - Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland
- Henrikki Almusa
  - Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland
- Pekka Ellonen
  - Institute for Molecular Medicine (FIMM), University of Helsinki, Helsinki, Finland
- Nina Hutri-Kähönen
  - Department of Paediatrics, Tampere University Hospital and Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Markus Juonala
  - Department of Medicine, University of Turku, Turku, Finland
  - Division of Medicine, Turku University Hospital, Turku, Finland
  - Murdoch Children's Research Institute, Parkville, VIC, Australia
- Olli Raitakari
  - Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
  - Research Centre for Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
  - Department of Clinical Physiology and Nuclear Medicine, University of Turku and Turku University Hospital, Turku, Finland
- Mika Kähönen
  - Department of Clinical Physiology, Tampere University Hospital and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Jukka T Salonen
  - Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - MAS-Metabolic Analytical Services Oy, Helsinki, Finland
- Terho Lehtimäki
  - Department of Clinical Chemistry, Fimlab Laboratories and Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, PO Box 100, 33014, Tampere, Finland
|
21
|
A powerful method for pleiotropic analysis under composite null hypothesis identifies novel shared loci between Type 2 Diabetes and Prostate Cancer. PLoS Genet 2020; 16:e1009218. [PMID: 33290408 PMCID: PMC7748289 DOI: 10.1371/journal.pgen.1009218] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 04/18/2020] [Revised: 12/18/2020] [Accepted: 10/22/2020] [Indexed: 12/24/2022] Open
Abstract
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236). 
We propose a new approach, PLACO, that uses aggregate-level genotype-phenotype association statistics (commonly referred to as GWAS summary statistics) to identify genetic variants that influence the risk of two traits or diseases. It allows for correlation in summary statistics between studies, which may arise when controls are shared between disease traits. We demonstrate that PLACO can achieve major power gains over alternative methods that are typically used. We applied PLACO to Type 2 Diabetes and Prostate Cancer summary data from two large case-control studies. Many previous studies have reported an inverse association between these two chronic diseases, suggesting shared risk factors; however, the shared genetic mechanisms underlying this association are poorly understood. PLACO identified a number of novel shared genetic regions that are not detected by individual trait analysis. Many of the loci implicated by PLACO increase risk for one disease while decreasing risk for the other. PLACO can similarly be used on other traits to shed light on shared genetic risk factors.
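The composite null hypothesis and the product test statistic described in the abstract can be written compactly as follows. The notation is an assumption introduced here for illustration: $\beta_1$ and $\beta_2$ denote the effects of a variant on the two traits, and $Z_1$ and $Z_2$ the corresponding GWAS Z-statistics from the two studies.

```latex
% beta_1, beta_2: effects of one variant on trait 1 and trait 2 (assumed notation)
% Z_1, Z_2: the variant's Z-statistics in the two studies
\begin{aligned}
H_0 &: \beta_1 \beta_2 = 0
      \quad (\text{the variant is associated with neither trait or with only one}),\\
H_a &: \beta_1 \neq 0 \ \text{and}\ \beta_2 \neq 0
      \quad (\text{pleiotropy}),\\
T   &= Z_1 Z_2 .
\end{aligned}
```

Rejecting the global null ($\beta_1 = \beta_2 = 0$) only shows that at least one effect is non-zero, which is why the abstract notes that such a rejection does not by itself implicate pleiotropy.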
|
22
|
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020. [DOI: 10.1007/s12041-019-1166-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
23
|
Cooper-Knock J, Zhang S, Kenna KP, Moll T, Franklin JP, Allen S, Nezhad HG, Iacoangeli A, Yacovzada NY, Eitan C, Hornstein E, Elhaik E, Celadova P, Bose D, Farhan S, Fishilevich S, Lancet D, Morrison KE, Shaw CE, Al-Chalabi A, Veldink JH, Kirby J, Snyder MP, Shaw PJ. Rare Variant Burden Analysis within Enhancers Identifies CAV1 as an ALS Risk Gene. Cell Rep 2020; 33:108456. [PMID: 33264630 PMCID: PMC7710676 DOI: 10.1016/j.celrep.2020.108456] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/13/2020] [Revised: 09/15/2020] [Accepted: 11/09/2020] [Indexed: 02/01/2023] Open
Abstract
Amyotrophic lateral sclerosis (ALS) is an incurable neurodegenerative disease. CAV1 and CAV2 organize membrane lipid rafts (MLRs) important for cell signaling and neuronal survival, and overexpression of CAV1 ameliorates ALS phenotypes in vivo. Genome-wide association studies localize a large proportion of ALS risk variants within the non-coding genome, but further characterization has been limited by lack of appropriate tools. By designing and applying a pipeline to identify pathogenic genetic variation within enhancer elements responsible for regulating gene expression, we identify disease-associated variation within CAV1/CAV2 enhancers, which replicate in an independent cohort. Discovered enhancer mutations reduce CAV1/CAV2 expression and disrupt MLRs in patient-derived cells, and CRISPR-Cas9 perturbation proximate to a patient mutation is sufficient to reduce CAV1/CAV2 expression in neurons. Additional enrichment of ALS-associated mutations within CAV1 exons positions CAV1 as an ALS risk gene. We propose CAV1/CAV2 overexpression as a personalized medicine target for ALS.
Affiliation(s)
- Johnathan Cooper-Knock
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- Sai Zhang
  - Stanford Center for Genomics and Personalized Medicine, Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Kevin P Kenna
  - Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
- Tobias Moll
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- John P Franklin
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- Samantha Allen
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- Helia Ghahremani Nezhad
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- Alfredo Iacoangeli
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Nancy Y Yacovzada
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Chen Eitan
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Eran Hornstein
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Eran Elhaik
  - Department of Biology, Lund University, Lund, Sweden
- Petra Celadova
  - Sheffield Institute for Nucleic Acids, University of Sheffield, Sheffield, UK
- Daniel Bose
  - Sheffield Institute for Nucleic Acids, University of Sheffield, Sheffield, UK
- Sali Farhan
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Simon Fishilevich
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Doron Lancet
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Christopher E Shaw
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Ammar Al-Chalabi
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Jan H Veldink
  - Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
- Janine Kirby
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- Michael P Snyder
  - Stanford Center for Genomics and Personalized Medicine, Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Pamela J Shaw
  - Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
|
24
|
Fore R, Boehme J, Li K, Westra J, Tintle N. Multi-Set Testing Strategies Show Good Behavior When Applied to Very Large Sets of Rare Variants. Front Genet 2020; 11:591606. [PMID: 33240333 PMCID: PMC7680887 DOI: 10.3389/fgene.2020.591606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/05/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022] Open
Abstract
Gene-based tests of association (e.g., variance components and burden tests) are now common practice for analyses attempting to elucidate the contribution of rare genetic variants to common disease. As sequencing datasets continue to grow in size, the number of variants within each set (e.g., gene) being tested also continues to grow. Pathway-based methods have been used to allow for the initial aggregation of gene-based statistical evidence and then the subsequent aggregation of evidence across the pathway. This “multi-set” approach (first a gene-based test, followed by a pathway-based test) has not been thoroughly explored for evaluating genotype–phenotype associations in the age of large, sequenced datasets. In particular, we ask whether there are statistical and biological characteristics that make the multi-set approach preferable to simply performing all gene-based tests. In this paper, we provide an intuitive framework for evaluating these questions and use simulated data to affirm this intuition. A real data application is provided demonstrating how our insights manifest themselves in practice. Ultimately, we find that when initial subsets are biologically informative (e.g., tending to aggregate causal genetic variants within one or more subsets, often genes), multi-set strategies can improve statistical power, with particular gains in cases where causal variants are aggregated in subsets with fewer variants overall (a high proportion of causal variants in the subset). However, we find that there is little advantage when the sets are non-informative (a similar proportion of causal variants in each subset). Our application to real data further demonstrates this intuition. In practice, we recommend wider use of pathway-based methods and further exploration of optimal ways of aggregating variants into subsets based on emerging biological evidence of the genetic architecture of complex disease.
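The two-stage "multi-set" idea (gene-level evidence first, then aggregation across the pathway) can be sketched with Fisher's combination of p-values. Fisher's method is used here only as a stand-in, since the abstract does not name the combination rule the authors evaluated, and the gene-level p-values are hypothetical.

```python
from math import exp, log

def chi2_sf_even_df(x, df):
    """Survival function of a chi-square variable with even degrees of freedom."""
    k = df // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return exp(-half) * total

def fisher_combine(p_values):
    """Combine independent p-values: -2 * sum(ln p) is chi-square with 2k df."""
    stat = -2.0 * sum(log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))

# Stage 1 (assumed already done): one p-value per gene from a burden or
# variance-component test. Stage 2: aggregate the genes into a pathway p-value.
gene_p_values = [0.5, 0.5]
pathway_p = fisher_combine(gene_p_values)
print(pathway_p)
```

Because the combination assumes independent gene-level tests, correlated genes (for example through linkage disequilibrium) would need a different null distribution or a permutation-based calibration.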
Affiliation(s)
- Ruby Fore
  - Department of Biostatistics, Brown University, Providence, RI, United States
- Jaden Boehme
  - Department of Mathematics, Oregon State University, Corvallis, OR, United States
- Kevin Li
  - Department of Mathematics, School of Arts and Sciences, Columbia University, New York, NY, United States
- Jason Westra
  - Department of Mathematics and Statistics, Dordt University, Sioux Center, IA, United States
- Nathan Tintle
  - Department of Mathematics and Statistics, Dordt University, Sioux Center, IA, United States
|
25
|
[An improved association analysis pipeline for tumor susceptibility variant in haplotype amplification area]. Nan Fang Yi Ke Da Xue Xue Bao = Journal of Southern Medical University 2020; 40:1493-1499. [PMID: 33118521 PMCID: PMC7606235 DOI: 10.12122/j.issn.1673-4254.2020.10.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2023]
Abstract
OBJECTIVE Haplotype amplification of germline variants is suggested to imply potential selective advantages and susceptibility to clonal expansion, and it has become an important signature in the search for cancer susceptibility genes. Here we propose an improved association method that fully considers the haplotype amplification status. METHODS The haplotype amplification status was estimated from the variant allelic frequencies. We adopted a permutation test on variant allelic frequencies to divide the candidate variants into multiple groups. A likelihood clustering method was then applied to establish the neighborhood system of the hidden Markov random field framework. A filtering pipeline, including a Wilson's interval filter and a false discovery rate controller, was introduced into the proposed method to further refine the candidate variants. The final candidate set, along with the haplotype amplification status, was collapsed into weighted virtual sites for association tests. RESULTS Through simulated tests on a series of datasets, we compared the type Ⅰ error rates at different minor allele frequencies, which stably fell within 2%, suggesting good robustness of the algorithm. In addition, we compared the proposed method with 5 other published association approaches in terms of type Ⅰ and type Ⅱ error rates; the error rates of the proposed method were all within 2%, demonstrating significant advantages and good statistical performance. CONCLUSIONS The proposed method can accurately identify tumor susceptibility variants in haplotype amplification areas with good robustness and stability.
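The two filtering steps named in METHODS can be illustrated as follows. This is a sketch under assumptions, not the authors' pipeline: the Wilson score interval is applied to a variant allelic frequency (alternate reads over depth), and the false discovery rate controller is taken to be the Benjamini-Hochberg procedure, which the abstract does not specify. All inputs are hypothetical.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking p-values rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical variant: 5 alternate reads out of 10 total.
vaf_low, vaf_high = wilson_interval(5, 10)
# Hypothetical per-variant association p-values.
keep = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(vaf_low, vaf_high, keep)
```

A filter could, for example, discard variants whose interval still contains the frequency expected without amplification; the exact decision rule used in the paper is not given in the abstract.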
|
26
|
Oak N, Cherniack AD, Mashl RJ, Hirsch FR, Ding L, Beroukhim R, Gümüş ZH, Plon SE, Huang KL. Ancestry-specific predisposing germline variants in cancer. Genome Med 2020; 12:51. [PMID: 32471518 PMCID: PMC7260738 DOI: 10.1186/s13073-020-00744-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/02/2019] [Accepted: 05/07/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Distinct prevalence of inherited genetic predisposition may partially explain the difference of cancer risks across ancestries. Ancestry-specific analyses of germline genomes are required to inform cancer genetic risk and prognosis of diverse populations. METHODS We conducted analyses using germline and somatic sequencing data generated by The Cancer Genome Atlas. Collapsing pathogenic and likely pathogenic variants to cancer predisposition genes (CPG), we analyzed the association between CPGs and cancer types within ancestral groups. We also identified the predisposition-associated two-hit events and gene expression effects in tumors. RESULTS Genetic ancestry analysis classified the cohort of 9899 cancer cases into individuals of primarily European (N = 8184, 82.7%), African (N = 966, 9.8%), East Asian (N = 649, 6.6%), South Asian (N = 48, 0.5%), Native/Latin American (N = 41, 0.4%), and admixed (N = 11, 0.1%) ancestries. In the African ancestry, we discovered a potentially novel association of BRCA2 in lung squamous cell carcinoma (OR = 41.4 [95% CI, 6.1-275.6]; FDR = 0.002) previously identified in Europeans, along with a known association of BRCA2 in ovarian serous cystadenocarcinoma (OR = 8.5 [95% CI, 1.5-47.4]; FDR = 0.045). In the East Asian ancestry, we discovered one previously known association of BRIP1 in stomach adenocarcinoma (OR = 12.8 [95% CI, 1.8-90.8]; FDR = 0.038). Rare variant burden analysis further identified 7 suggestive associations in African ancestry individuals previously described in European ancestry, including SDHB in pheochromocytoma and paraganglioma, ATM in prostate adenocarcinoma, VHL in kidney renal clear cell carcinoma, FH in kidney renal papillary cell carcinoma, and PTEN in uterine corpus endometrial carcinoma. Most predisposing variants were found exclusively in one ancestry in the TCGA and gnomAD datasets. Loss of heterozygosity was identified for 7 out of the 15 African ancestry carriers of predisposing variants. 
Further, tumors from the SDHB or BRCA2 carriers showed simultaneous allele-specific expression and low gene expression of their respective affected genes, and FH splice-site variant carriers showed mis-splicing of FH. CONCLUSIONS While several CPGs are shared across patients, many pathogenic variants are found to be ancestry-specific and trigger somatic effects. Studies using larger cohorts of diverse ancestries are required to pinpoint ancestry-specific genetic predisposition and inform genetic screening strategies.
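The RESULTS above report associations as odds ratios with 95% confidence intervals. A minimal sketch of how such a figure can be derived from a 2x2 carrier-by-cancer-type table is shown below. It uses the Woolf (log-odds) interval with a Haldane-Anscombe correction for empty cells, which is an assumption; the abstract does not state which estimator the authors used, and the counts are invented.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for the table [[a, b], [c, d]].

    a: carriers with the cancer type, b: carriers without it,
    c: non-carriers with the cancer type, d: non-carriers without it.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(10, 20, 5, 40)
print(round(or_value, 2), round(ci_low, 2), round(ci_high, 2))
```

Wide intervals such as those quoted in the abstract are typical when one cell of the table holds only a handful of carriers.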
Affiliation(s)
- Ninad Oak
  - Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Andrew D Cherniack
  - The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA, 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- R Jay Mashl
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63108, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Fred R Hirsch
  - Department of Oncological Sciences, Center for Thoracic Oncology, Tisch Cancer Institute, New York, NY, USA
- Li Ding
  - Department of Medicine, Washington University in St. Louis, St. Louis, MO, 63108, USA
  - McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, 63108, USA
  - Department of Genetics, Washington University in St. Louis, St. Louis, MO, 63108, USA
  - Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, 63108, USA
- Rameen Beroukhim
  - The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA, 02142, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
  - Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Zeynep H Gümüş
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Sharon E Plon
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
  - Department of Pediatrics, Baylor College of Medicine, Houston, TX, 77030, USA
- Kuan-Lin Huang
  - Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
27
Statistical Method Based on Bayes-Type Empirical Score Test for Assessing Genetic Association with Multilocus Genotype Data. Int J Genomics 2020; 2020:4708152. [PMID: 32455126 PMCID: PMC7229558 DOI: 10.1155/2020/4708152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/11/2019] [Accepted: 04/21/2020] [Indexed: 12/20/2022]
Abstract
Simultaneous testing of multiple genetic variants for association is widely recognized as a valuable complementary approach to single-marker tests. As such, principal component regression (PCR) has been found to have competitive power. We focus on exploring a robust test for an unknown genetic mode of all SNPs, an unknown Hardy-Weinberg equilibrium (HWE) in a population, and a large number of all SNPs. First, we propose a new global test by means of the use of codominant codes for all markers and PCR. The new global test is built on an empirical Bayes-type score statistic for testing marginal associations with each single marker. The new global test gains power by robustly exploiting the Hardy-Weinberg equilibrium in the control population and effectively using linkage disequilibrium among test markers. The new global test reduces to PCR when the genotype for each marker is coded as the number of minor alleles. This connection lends insight into the power of the new global test relative to PCR and some other popular multimarker test methods. Second, we propose a robust test method based on the new global test and the ordinary PCR test built on a prospective score statistic for testing marginal associations with each single marker when the genotype for each marker is coded as the number of minor alleles by taking the minimum p value of these two tests. Finally, through extensive simulation studies and analysis of the association between pancreatic cancer and some genes of interest, we show that the proposed robust test method has desirable power and can often identify association signals that may be missed by existing methods.
28
Deng Y, He T, Fang R, Li S, Cao H, Cui Y. Genome-Wide Gene-Based Multi-Trait Analysis. Front Genet 2020; 11:437. [PMID: 32508874 PMCID: PMC7248273 DOI: 10.3389/fgene.2020.00437] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/24/2020] [Accepted: 04/08/2020] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies focusing on a single phenotype have been broadly conducted to identify genetic variants associated with a complex disease. The commonly applied single variant analysis is limited by failing to consider the complex interactions between variants, which motivated the development of association analyses focusing on genes or gene sets. Moreover, when multiple correlated phenotypes are available, methods based on a multi-trait analysis can improve the association power. However, most currently available multi-trait analyses are single variant-based analyses; thus have limited power when disease variants function as a group in a gene or a gene set. In this work, we propose a genome-wide gene-based multi-trait analysis method by considering genes as testing units. For a given phenotype, we adopt a rapid and powerful kernel-based testing method which can evaluate the joint effect of multiple variants within a gene. The joint effect, either linear or nonlinear, is captured through kernel functions. Given a series of candidate kernel functions, we propose an omnibus test strategy to integrate the test results based on different candidate kernels. A p-value combination method is then applied to integrate dependent p-values to assess the association between a gene and multiple correlated phenotypes. Simulation studies show a reasonable type I error control and an excellent power of the proposed method compared to its counterparts. We further show the utility of the method by applying it to two data sets: the Human Liver Cohort and the Alzheimer Disease Neuroimaging Initiative data set, and novel genes are identified. Our method has broad applications in other fields in which the interest is to evaluate the joint effect (linear or nonlinear) of a set of variants.
Affiliation(s)
- Yamin Deng
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Tao He
  - Department of Mathematics, San Francisco State University, San Francisco, CA, United States
- Ruiling Fang
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Shaoyu Li
  - Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC, United States
- Hongyan Cao
  - Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, MI, United States
29
The exhaustive genomic scan approach, with an application to rare-variant association analysis. Eur J Hum Genet 2020; 28:1283-1291. [PMID: 32415273 PMCID: PMC7608423 DOI: 10.1038/s41431-020-0639-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/02/2019] [Revised: 02/28/2020] [Accepted: 04/07/2020] [Indexed: 12/12/2022]
Abstract
Region-based genome-wide scans are usually performed by use of a priori chosen analysis regions. Such an approach will likely miss the region comprising the strongest signal and, thus, may result in increased type II error rates and decreased power. Here, we propose a genomic exhaustive scan approach that analyzes all possible subsequences and does not rely on a prior definition of the analysis regions. As a prime instance, we present a computationally ultraefficient implementation using the rare-variant collapsing test for phenotypic association, the genomic exhaustive collapsing scan (GECS). Our implementation allows for the identification of regions comprising the strongest signals in large, genome-wide rare-variant association studies while controlling the family-wise error rate via permutation. Application of GECS to two genomic data sets revealed several novel significantly associated regions for age-related macular degeneration and for schizophrenia. Our approach also offers a high potential to improve genome-wide scans for selection, methylation, and other analyses.
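As a rough illustration of the exhaustive-scan idea described in this abstract, the sketch below enumerates every contiguous window of variants and scores each one with a simple collapsing statistic. It is a naive quadratic-time toy, not the GECS implementation: the function name, the input layout, and the choice of statistic (absolute difference in carrier proportions between cases and controls) are assumptions made for illustration, and the permutation step used for family-wise error control is omitted.

```python
def exhaustive_collapsing_scan(genotypes, is_case):
    """Score every contiguous window of variants and return the best one.

    genotypes: one list per individual, holding minor-allele counts for
               variants ordered by genomic position (hypothetical layout).
    is_case:   one boolean per individual.
    Returns ((start, end), statistic) with inclusive window indices.
    """
    n_ind = len(genotypes)
    n_var = len(genotypes[0])
    n_cases = sum(1 for c in is_case if c)
    n_controls = n_ind - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    best_stat, best_window = -1.0, None
    for start in range(n_var):
        # carrier[i] is True once individual i carries any variant in the window
        carrier = [False] * n_ind
        for end in range(start, n_var):
            for i in range(n_ind):
                if genotypes[i][end] > 0:
                    carrier[i] = True
            case_carriers = sum(1 for i in range(n_ind) if carrier[i] and is_case[i])
            control_carriers = sum(1 for i in range(n_ind) if carrier[i] and not is_case[i])
            # Illustrative collapsing statistic, not the test used in the paper
            stat = abs(case_carriers / n_cases - control_carriers / n_controls)
            if stat > best_stat:
                best_stat, best_window = stat, (start, end)
    return best_window, best_stat
```

Extending the window one variant at a time lets the carrier indicator be updated incrementally rather than recomputed for each window; a practical implementation would need far more aggressive pruning than this.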
30
Zhang M, Gelfman S, McCarthy J, Harms MB, Moreno CAM, Goldstein DB, Allen AS. Incorporating external information to improve sparse signal detection in rare-variant gene-set-based analyses. Genet Epidemiol 2020; 44:330-338. [PMID: 32043633 DOI: 10.1002/gepi.22283] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/28/2019] [Revised: 12/17/2019] [Accepted: 01/27/2020] [Indexed: 01/30/2023]
Abstract
Gene-set analyses are used to assess whether there is any evidence of association with disease among a set of biologically related genes. Such an analysis typically treats all genes within the sets similarly, even though there is substantial, external, information concerning the likely importance of each gene within each set. For example, for traits that are under purifying selection, we would expect genes showing extensive genic constraint to be more likely to be trait associated than unconstrained genes. Here we improve gene-set analyses by incorporating such external information into a higher-criticism-based signal detection analysis. We show that when this external information is predictive of whether a gene is associated with disease, our approach can lead to a significant increase in power. Further, our approach is particularly powerful when the signal is sparse, that is when only a small number of genes within the set are associated with the trait. We illustrate our approach with a gene-set analysis of amyotrophic lateral sclerosis (ALS) and implicate a number of gene-sets containing SOD1 and NEK1 as well as showing enrichment of small p values for gene-sets containing known ALS genes. We implement our approach in the R package wHC.
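For readers unfamiliar with higher criticism, the sketch below computes the standard unweighted statistic from a set of gene-level p values. It is not the weighted variant implemented in the wHC package, whose weighting scheme is not described in this abstract; the function name and the handling of p values equal to 0 or 1 are assumptions for illustration only.

```python
import math


def higher_criticism(p_values):
    """Standard (unweighted) higher-criticism statistic.

    For sorted p values p_(1) <= ... <= p_(n), returns the maximum over i of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    P values equal to 0 or 1 are skipped to avoid division by zero
    (an illustrative choice, not taken from the paper).
    """
    ordered = sorted(p_values)
    n = len(ordered)
    best = float("-inf")
    for i, p in enumerate(ordered, start=1):
        if 0.0 < p < 1.0:
            hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, hc)
    return best
```

The statistic grows when a few p values are much smaller than expected under the null, which is why it suits the sparse-signal setting the abstract emphasizes.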
Affiliation(s)
- Mengqi Zhang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
  - Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina
- Sahar Gelfman
  - Institute of Genomic Medicine, Columbia University, New York City, New York
- Janice McCarthy
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Matthew B Harms
  - Institute of Genomic Medicine, Columbia University, New York City, New York
  - Department of Neurology, Columbia University, New York City, New York
  - Center for Motor Neuron Biology and Disease, Columbia University, New York City, New York
- Cristiane A M Moreno
  - Institute of Genomic Medicine, Columbia University, New York City, New York
  - Center for Motor Neuron Biology and Disease, Columbia University, New York City, New York
- David B Goldstein
  - Institute of Genomic Medicine, Columbia University, New York City, New York
- Andrew S Allen
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
  - Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
  - Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina
31
Yang T, Wu C, Wei P, Pan W. Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits. Hum Mol Genet 2020; 29:515-526. [PMID: 31919517 PMCID: PMC7015848 DOI: 10.1093/hmg/ddz314] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/24/2019] [Revised: 12/11/2019] [Accepted: 12/16/2019] [Indexed: 12/13/2022]
Abstract
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and transcriptomic data to showcase their improved statistical power of identifying gene-trait associations while, importantly, offering further biological insights. TWAS have thus far focused on common variants as available from GWAS. Compared with common variants, the findings for or even applications to low-frequency variants are limited and their underlying role in regulating gene expression is less clear. To fill this gap, we extend TWAS to integrating whole genome sequencing data with transcriptomic data for low-frequency variants. Using the data from the Framingham Heart Study, we demonstrate that low-frequency variants play an important and universal role in predicting gene expression, which is not completely due to linkage disequilibrium with the nearby common variants. By including low-frequency variants, in addition to common variants, we increase the predictivity of gene expression for 79% of the examined genes. Incorporating this piece of functional genomic information, we perform association testing for five lipid traits in two UK10K whole genome sequencing cohorts, hypothesizing that cis-expression quantitative trait loci, including low-frequency variants, are more likely to be trait-associated. We discover that two genes, LDLR and TTC22, are genome-wide significantly associated with low-density lipoprotein cholesterol based on 3203 subjects and that the association signals are largely independent of common variants. We further demonstrate that a joint analysis of both common and low-frequency variants identifies association signals that would be missed by testing on either common variants or low-frequency variants alone.
Affiliation(s)
- Tianzhong Yang
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
32
Xue Y, Ding J, Wang J, Zhang S, Pan D. Two-phase SSU and SKAT in genetic association studies. J Genet 2020; 99:9. [PMID: 32089528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The sum of squared score (SSU) and sequence kernel association test (SKAT) are two good alternative tests for genetic association studies in case-control data. Both SSU and SKAT are derived by assuming a dose-response model between the risk of disease and genotypes. However, in practice, the real genetic mode of inheritance is impossible to know. Thus, these two tests might lose power substantially, as shown in simulation results, when the genetic model is misspecified. Here, to make both tests suitable in broad situations, we propose two-phase SSU (tpSSU) and two-phase SKAT (tpSKAT), where the Hardy-Weinberg equilibrium test is adopted to choose the genetic model in the first phase and the SSU and SKAT are constructed corresponding to the selected genetic model in the second phase. We found that both tpSSU and tpSKAT outperformed the original SSU and SKAT in most of our simulation scenarios. By applying tpSSU and tpSKAT to the study of type 2 diabetes data, we successfully identified some genes that have direct effects on obesity. Besides, we also detected the significant chromosomal region 10q21.22 in the GAW16 rheumatoid arthritis dataset, with P < 10^-6. These findings suggest that tpSSU and tpSKAT can be effective in identifying genetic variants for complex diseases in case-control association studies.
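The first phase described in this abstract relies on a Hardy-Weinberg equilibrium test. The sketch below shows the textbook Pearson chi-square version of that test for a single biallelic marker; how the authors turn its result into a choice of genetic model is not stated in the abstract and is not attempted here. The function name and argument order are assumptions for illustration.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts at one biallelic marker.
    Under HWE the statistic is approximately chi-square with 1 degree
    of freedom. Genotype classes with zero expected count are skipped.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

A sample with counts 25/50/25 matches its HWE expectation exactly and gives a statistic of 0, whereas a sample with no heterozygotes at the same allele frequency departs strongly from it.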
Affiliation(s)
- Yuan Xue
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
33
Bi W, Li Y, Smeltzer MP, Gao G, Zhao S, Kang G. STEPS: an efficient prospective likelihood approach to genetic association analyses of secondary traits in extreme phenotype sequencing. Biostatistics 2020; 21:33-49. [PMID: 30007308 PMCID: PMC8559722 DOI: 10.1093/biostatistics/kxy030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/20/2017] [Revised: 05/16/2018] [Accepted: 06/02/2018] [Indexed: 11/13/2022]
Abstract
It has been well acknowledged that methods for secondary trait (ST) association analyses under a case-control design (ST$_{\text{CC}}$) should carefully consider the sampling process to avoid biased risk estimates. A similar situation also exists in the extreme phenotype sequencing (EPS) designs, which is to select subjects with extreme values of continuous primary phenotype for sequencing. EPS designs are commonly used in modern epidemiological and clinical studies such as the well-known National Heart, Lung, and Blood Institute Exome Sequencing Project. Although naïve generalized regression or ST$_{\text{CC}}$ method could be applied, their validity is questionable due to difference in statistical designs. Herein, we propose a general prospective likelihood framework to perform association testing for binary and continuous STs under EPS designs (STEPS), which can also incorporate covariates and interaction terms. We provide a computationally efficient and robust algorithm to obtain the maximum likelihood estimates. We also present two empirical mathematical formulas for power/sample size calculations to facilitate planning of binary/continuous STs association analyses under EPS designs. Extensive simulations and application to a genome-wide association study of benign ethnic neutropenia under an EPS design demonstrate the superiority of STEPS over all its alternatives above.
Affiliation(s)
- Wenjian Bi
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Yun Li
  - Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599, USA
- Matthew P Smeltzer
  - Division of Epidemiology, Biostatistics, and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, USA
- Guimin Gao
  - Department of Public Health Sciences, University of Chicago, Chicago, IL 60637, USA
- Shengli Zhao
  - School of Statistics, Qufu Normal University, Qufu 273165, PR China
- Guolian Kang
  - Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
34
Papachristou C, Biswas S. Comparison of haplotype-based tests for detecting gene-environment interactions with rare variants. Brief Bioinform 2019; 21:851-862. [PMID: 31329820 DOI: 10.1093/bib/bbz031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/16/2018] [Revised: 02/06/2019] [Accepted: 02/28/2019] [Indexed: 11/13/2022]
Abstract
Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene-environment interactions (GXE). However, detecting GXE is a challenging problem especially when the genetic variants under study are rare. Haplotype-based tests have several advantages over the so-called collapsing tests for detecting rare variants as highlighted in recent literature. Thus, it is of practical interest to compare haplotype-based tests for detecting GXE including the recent ones developed specifically for rare haplotypes. We compare the following methods: haplo.glm, hapassoc, HapReg, Bayesian hierarchical generalized linear model (BhGLM) and logistic Bayesian LASSO (LBL). We simulate data under different types of association scenarios and levels of gene-environment dependence. We find that when the type I error rates are controlled to be the same for all methods, LBL is the most powerful method for detecting GXE. We applied the methods to a lung cancer data set, in particular, in region 15q25.1 as it has been suggested in the literature that it interacts with smoking to affect the lung cancer susceptibility and that it is associated with smoking behavior. LBL and BhGLM were able to detect a rare haplotype-smoking interaction in this region. We also analyzed the sequence data from the Dallas Heart Study, a population-based multi-ethnic study. Specifically, we considered haplotype blocks in the gene ANGPTL4 for association with trait serum triglyceride and used ethnicity as a covariate. Only LBL found interactions of haplotypes with race (Hispanic). Thus, in general, LBL seems to be the best method for detecting GXE among the ones we studied here. Nonetheless, it requires the most computation time.
Affiliation(s)
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
35
Wang C, Deng S, Sun L, Li L, Hu YQ. A nonparametric test for association with multiple loci in the retrospective case-control study. Stat Methods Med Res 2019; 29:589-602. [PMID: 30987531 DOI: 10.1177/0962280219842892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
The genome-wide association studies aim at identifying common or rare variants associated with common diseases and explaining more heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. In order to powerfully detect association signals, it is highly desirable to take account of correlations or linkage disequilibrium (LD) information among multiple SNPs in testing for association. In this article, we propose a test SLIDE that depicts the difference of the average multi-locus genotypes between cases and controls and derive its variance-covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. Thus SLIDE can borrow the strength from an external database in the population of interest with a few thousands to hundreds of thousands individuals to improve the power for detecting association. Extensive simulations show that SLIDE has apparent superiority over the existing methods, especially in the situation involving both common and rare variants, both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in the application to the data from the Wellcome Trust Case Control Consortium.
Affiliation(s)
- Chan Wang
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY, USA
- Shufang Deng
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
- Leiming Sun
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
- Liming Li
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
- Yue-Qing Hu
  - State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
  - Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
36
Datta AS, Lin S, Biswas S. A Family-Based Rare Haplotype Association Method for Quantitative Traits. Hum Hered 2019; 83:175-195. [PMID: 30799419 DOI: 10.1159/000493543] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/05/2018] [Accepted: 09/07/2018] [Indexed: 12/15/2022]
Abstract
BACKGROUND The variants identified in genome-wide association studies account for only a small fraction of disease heritability. A key to this "missing heritability" is believed to be rare variants. Specifically, we focus on rare haplotype variant (rHTV). The existing methods for detecting rHTV are mostly population-based, and as such, are susceptible to population stratification and admixture, leading to an inflated false-positive rate. Family-based methods are more robust in this respect. METHODS We propose a method for detecting rHTVs associated with quantitative traits called family-based quantitative Bayesian LASSO (famQBL). FamQBL can analyze any type of pedigree and is based on a mixed model framework. We regularize the haplotype effects using Bayesian LASSO and estimate the posterior distributions using Markov chain Monte Carlo methods. RESULTS We conduct simulation studies, including analyses of Genetic Analysis Workshop 18 simulated data, to study the properties of famQBL and compare with a standard family-based haplotype association test implemented in FBAT (family-based association test) software. We find famQBL to be more powerful than FBAT with well-controlled false-positive rates. We also apply famQBL to the Framingham Heart Study data and detect an rHTV associated with diastolic blood pressure. CONCLUSION FamQBL can help uncover rHTVs associated with quantitative traits.
Affiliation(s)
- Ananda S Datta
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
37
Chen Z, Wang K. Gene-based sequential burden association test. Stat Med 2019; 38:2353-2363. [PMID: 30706509 DOI: 10.1002/sim.8111] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 11/29/2018] [Accepted: 01/10/2019] [Indexed: 11/10/2022]
Abstract
Detecting the association between a set of variants and a phenotype of interest is the first and important step in genetic and genomic studies. Although it attracted a large amount of attention in the scientific community and several related statistical approaches have been proposed in the literature, powerful and robust statistical tests are still highly desired and yet to be developed in this area. In this paper, we propose a powerful and robust association test, which combines information from each individual single-nucleotide polymorphisms based on sequential independent burden tests. We compare the proposed approach with some popular tests through a comprehensive simulation study and real data application. Our results show that, in general, the new test is more powerful; the gain in detecting power can be substantial in many situations, compared to other methods.
Affiliation(s)
- Zhongxue Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, Indiana
- Kai Wang
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa
38
Zhang X, Basile AO, Pendergrass SA, Ritchie MD. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico. BMC Bioinformatics 2019; 20:46. [PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/29/2018] [Accepted: 12/26/2018] [Indexed: 11/11/2022]
Abstract
Background The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. Results We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression. 
Conclusions Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses. Electronic supplementary material The online version of this article (10.1186/s12859-018-2591-6) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Xinyuan Zhang
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Anna O Basile
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
| | - Marylyn D Ritchie
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA.
|
39
|
Lu HM, Li S, Black MH, Lee S, Hoiness R, Wu S, Mu W, Huether R, Chen J, Sridhar S, Tian Y, McFarland R, Dolinsky J, Tippin Davis B, Mexal S, Dunlop C, Elliott A. Association of Breast and Ovarian Cancers With Predisposition Genes Identified by Large-Scale Sequencing. JAMA Oncol 2019; 5:51-57. [PMID: 30128536 PMCID: PMC6439764 DOI: 10.1001/jamaoncol.2018.2956] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 12/21/2017] [Accepted: 05/04/2018] [Indexed: 12/21/2022]
Abstract
Importance Since the discovery of BRCA1 and BRCA2, multiple high- and moderate-penetrance genes have been reported as risk factors for hereditary breast cancer, ovarian cancer, or both; however, it is unclear whether these findings represent the complete genetic landscape of these cancers. Systematic investigation of the genetic contributions to breast and ovarian cancers is needed to confirm these findings and explore potentially new associations. Objective To confirm reported and identify additional predisposition genes for breast or ovarian cancer. Design, Setting, and Participants In this sample of 11 416 patients with clinical features of breast cancer, ovarian cancer, or both who were referred for genetic testing from 1200 hospitals and clinics across the United States and of 3988 controls who were referred for genetic testing for noncancer conditions between 2014 and 2015, whole-exome sequencing was conducted and gene-phenotype associations were examined. Case-control analyses using the Genome Aggregation Database as a set of reference controls were also conducted. Main Outcomes and Measures Breast cancer risk associated with pathogenic variants among 625 cancer predisposition genes; association of identified predisposition breast or ovarian cancer genes with the breast cancer subtypes invasive ductal, invasive lobular, hormone receptor-positive, hormone receptor-negative, and male, and with early-onset disease. Results Of 9639 patients with breast cancer, 3960 (41.1%) were early-onset cases (≤45 years at diagnosis) and 123 (1.3%) were male, with men having an older age at diagnosis than women (mean [SD] age, 61.8 [12.8] vs 48.6 [11.4] years). Of 2051 women with ovarian cancer, 445 (21.7%) received a diagnosis at 45 years or younger. 
Enrichment of pathogenic variants was identified in 4 non-BRCA genes associated with breast cancer risk: ATM (odds ratio [OR], 2.97; 95% CI, 1.67-5.68), CHEK2 (OR, 2.19; 95% CI, 1.40-3.56), PALB2 (OR, 5.53; 95% CI, 2.24-17.65), and MSH6 (OR, 2.59; 95% CI, 1.35-5.44). Increased risk for ovarian cancer was associated with 4 genes: MSH6 (OR, 4.16; 95% CI, 1.95-9.47), RAD51C (OR, not estimable; false-discovery rate-corrected P = .004), TP53 (OR, 18.50; 95% CI, 2.56-808.10), and ATM (OR, 2.85; 95% CI, 1.30-6.32). Neither the MRN complex genes nor CDKN2A was associated with increased breast or ovarian cancer risk. The findings also do not support previously reported breast cancer associations with the ovarian cancer susceptibility genes BRIP1, RAD51C, and RAD51D, or mismatch repair genes MSH2 and PMS2. Conclusions and Relevance The results of this large-scale exome sequencing of patients and controls shed light on both well-established and controversial non-BRCA predisposition gene associations with breast or ovarian cancer reported to date and may implicate additional breast or ovarian cancer susceptibility gene candidates involved in DNA repair and genomic maintenance.
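As a reading aid for the odds ratios and confidence intervals reported above, the following sketch computes an odds ratio from a 2x2 carrier table together with a Wald (log-normal) 95% interval. The Wald interval is a swapped-in textbook approximation and is not necessarily the interval method the study used. The function name and all counts are hypothetical.

```python
import math

def odds_ratio_wald_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Wald confidence interval on the log scale.

    Raises ValueError when any cell is zero, because the ratio or its
    standard error is then undefined under this approximation.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 20 carriers among 1000 cases, 10 among 1000 controls.
example = odds_ratio_wald_ci(20, 980, 10, 990)
```

With small carrier counts the Wald interval can be poorly calibrated, so exact or penalized methods are often preferred for rare pathogenic variants.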
Affiliation(s)
| | - Shuwei Li
- Ambry Genetics, Aliso Viejo, California
| | | | - Shela Lee
- Ambry Genetics, Aliso Viejo, California
- Now with Simcere Pharmaceutical, Jiangsu, China
| | | | - Sitao Wu
- Ambry Genetics, Aliso Viejo, California
| | - Wenbo Mu
- Ambry Genetics, Aliso Viejo, California
| | - Robert Huether
- Ambry Genetics, Aliso Viejo, California
- Tempus, Chicago, Illinois
| | | | - Srijani Sridhar
- Ambry Genetics, Aliso Viejo, California
- Intellia Therapeutics, Cambridge, Massachusetts
| | - Yuan Tian
- Ambry Genetics, Aliso Viejo, California
| | - Rachel McFarland
- Ambry Genetics, Aliso Viejo, California
- Department of Epidemiology, School of Medicine, University of California, Irvine
|
40
|
He T, Li S, Zhong PS, Cui Y. An optimal kernel-based U-statistic method for quantitative gene-set association analysis. Genet Epidemiol 2018; 43:137-149. [DOI: 10.1002/gepi.22170] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/04/2018] [Revised: 08/19/2018] [Accepted: 09/26/2018] [Indexed: 11/09/2022]
Affiliation(s)
- Tao He
- Department of Mathematics; San Francisco State University; San Francisco California
| | - Shaoyu Li
- Department of Mathematics and Statistics; University of North Carolina at Charlotte; Charlotte North Carolina
| | - Ping-Shou Zhong
- Department of Mathematics, Statistics, and Computer Science; University of Illinois at Chicago; Chicago Illinois
| | - Yuehua Cui
- Department of Statistics & Probability; Michigan State University; East Lansing Michigan
- School of Public Health, Zhengzhou University; Zhengzhou China
|
41
|
Coombes BJ, Basu S, McGue M. A linear mixed model framework for gene-based gene-environment interaction tests in twin studies. Genet Epidemiol 2018; 42:648-663. [PMID: 30203856 DOI: 10.1002/gepi.22150] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/13/2017] [Revised: 04/25/2018] [Accepted: 04/30/2018] [Indexed: 02/03/2023]
Abstract
Interaction between genes and environments (G×E) can be well investigated in families due to the shared genes and environment among family members. However, the majority of the current tests of G×E interaction between a set of variants and an environment are only suitable for studies with unrelated subjects. In this paper, we extend several G×E interaction tests to a linear mixed model framework to study interaction between a set of correlated environments and a candidate gene in families. The correlated environments can either be modeled separately or jointly in one model. We demonstrate theoretically that the tests developed by modeling correlated environments separately are valid and present a computationally fast alternative to detect G×E interaction in families. For either strategy, we propose treating the genetic main effects as a random effect to reduce the number of main-effect parameters and thus improve the power to detect interactions. Additionally, we propose a generalization of a test of interaction that adaptively sums the interactions using a sequential algorithm. This generalized set of tests, referred to as the sequential algorithm for the sum of powered score (Seq-SPU) family of tests, can be expressed as a weighted version of the SPU. We find that the adaptive version of our test, Seq-aSPU, can outperform aSPU in cases where the interactions effects are in opposite directions. We applied these methods to the Minnesota Center for Twin and Family Research data set and found one significant gene in interaction with four psychosocial environmental factors affecting the alcohol consumption among the twins.
Affiliation(s)
- Brandon J Coombes
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
| | - Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota
| | - Matt McGue
- Department of Psychology, School of Public Health, University of Minnesota, Minneapolis, Minnesota
|
42
|
Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, Paczkowska M, Reynolds S, Wyczalkowski MA, Oak N, Scott AD, Krassowski M, Cherniack AD, Houlahan KE, Jayasinghe R, Wang LB, Zhou DC, Liu D, Cao S, Kim YW, Koire A, McMichael JF, Hucthagowder V, Kim TB, Hahn A, Wang C, McLellan MD, Al-Mulla F, Johnson KJ, Lichtarge O, Boutros PC, Raphael B, Lazar AJ, Zhang W, Wendl MC, Govindan R, Jain S, Wheeler D, Kulkarni S, Dipersio JF, Reimand J, Meric-Bernstam F, Chen K, Shmulevich I, Plon SE, Chen F, Ding L. Pathogenic Germline Variants in 10,389 Adult Cancers. Cell 2018; 173:355-370.e14. [PMID: 29625052 PMCID: PMC5949147 DOI: 10.1016/j.cell.2018.03.039] [Citation(s) in RCA: 578] [Impact Index Per Article: 82.6] [Received: 12/23/2017] [Revised: 02/24/2018] [Accepted: 03/15/2018] [Indexed: 12/20/2022]
Abstract
We conducted the largest investigation of predisposition variants in cancer to date, discovering 853 pathogenic or likely pathogenic variants in 8% of 10,389 cases from 33 cancer types. Twenty-one genes showed single or cross-cancer associations, including novel associations of SDHA in melanoma and PALB2 in stomach adenocarcinoma. The 659 predisposition variants and 18 additional large deletions in tumor suppressors, including ATM, BRCA1, and NF1, showed low gene expression and frequent (43%) loss of heterozygosity or biallelic two-hit events. We also discovered 33 such variants in oncogenes, including missense variants in MET, RET, and PTPN11 associated with high gene expression. We nominated 47 additional predisposition variants from prioritized VUSs supported by multiple lines of evidence involving case-control frequency, loss of heterozygosity, expression effect, and co-localization with mutations and modified residues. Our integrative approach links rare predisposition variants to functional consequences, informing future guidelines of variant classification and germline genetic testing in cancer.
Affiliation(s)
- Kuan-Lin Huang
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - R Jay Mashl
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Yige Wu
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Deborah I Ritter
- Baylor College of Medicine and Texas Children's Hospital, Houston, TX, USA
| | - Jiayin Wang
- School of Management, Xi'an Jiaotong University, Xi'an, Shaanxi, China
| | - Clara Oh
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Marta Paczkowska
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | | | - Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Ninad Oak
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Adam D Scott
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Michal Krassowski
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | | | - Kathleen E Houlahan
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - Reyka Jayasinghe
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Liang-Bo Wang
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Daniel Cui Zhou
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Di Liu
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Song Cao
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Young Won Kim
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Amanda Koire
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Joshua F McMichael
- McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | | | - Tae-Beom Kim
- Departments of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Abigail Hahn
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Chen Wang
- Department of Health Sciences Research and Department of Obstetrics and Gynecology, Mayo Clinic College of Medicine, Rochester, MN 55905 USA
| | - Michael D McLellan
- McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Fahd Al-Mulla
- Dasman Diabetes Institute and Molecular Pathology Laboratory, Kuwait University, Kuwait
| | - Kimberly J Johnson
- Brown School Master of Public Health Program, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Paul C Boutros
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - Benjamin Raphael
- Lewis-Sigler Institute, Princeton University, Princeton, NJ 08544, USA
| | - Alexander J Lazar
- Departments of Pathology and Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Wei Zhang
- Department of Cancer Biology and Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston Salem, NC 27157 USA
| | - Michael C Wendl
- McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA; Department of Genetics, Washington University in St. Louis, Saint Louis, MO 63108, USA; Department of Mathematics, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Ramaswamy Govindan
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Sanjay Jain
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - David Wheeler
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shashikant Kulkarni
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Baylor Genetics, Houston, TX 77021, USA
| | - John F Dipersio
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St. Louis, Saint Louis, MO 63108, USA
| | - Jüri Reimand
- Computational Biology Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - Funda Meric-Bernstam
- Department of Investigational Cancer Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Ken Chen
- Departments of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | | | - Sharon E Plon
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Feng Chen
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St. Louis, Saint Louis, MO 63108, USA.
| | - Li Ding
- Department of Medicine, Washington University in St. Louis, Saint Louis, MO 63108, USA; McDonnell Genome Institute, Washington University in St. Louis, Saint Louis, MO 63108, USA; Department of Genetics, Washington University in St. Louis, Saint Louis, MO 63108, USA; Siteman Cancer Center, Washington University in St. Louis, Saint Louis, MO 63108, USA.
|
43
|
Chen L, Wang Y, Zhou Y. Association analysis of multiple traits by an approach of combining P values. J Genet 2018; 97:79-85. [PMID: 29666327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of low minor allele frequency of rare variant, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for single trait to test association between multiple traits and rare variants in the given region. For a given region, we use reverse regression model to test each rare variant associated with multiple traits and obtain the P value of single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.
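The idea of combining single-variant P values into one regional statistic can be sketched with the classical Fisher combination. This is a simplified stand-in and not the weighted ADA statistic the abstract describes: the optional truncation threshold, the function names, and the chi-square reference (valid only for independent, untruncated P values) are assumptions made for illustration. An adaptive method would typically assess significance by permutation instead.

```python
import math

def fisher_statistic(p_values, threshold=1.0):
    """Return -2 * sum(ln p) over the P values at or below the threshold."""
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P values must lie in (0, 1]")
    kept = [p for p in p_values if p <= threshold]
    return -2.0 * sum(math.log(p) for p in kept)

def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form for even degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_p_value(p_values):
    """Combined P value for independent tests, with no truncation applied."""
    stat = fisher_statistic(p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))
```

Truncation discards large P values that are likely to come from neutral variants, which is one way a combination statistic can stay robust when many variants in the region carry no signal.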
Affiliation(s)
- Lili Chen
- Department of Mathematics, School of Sciences, Harbin Institute of Technology, Harbin 150001, People's Republic of China.
|
44
|
Russo A, Di Gaetano C, Cugliari G, Matullo G. Advances in the Genetics of Hypertension: The Effect of Rare Variants. Int J Mol Sci 2018; 19:E688. [PMID: 29495593 PMCID: PMC5877549 DOI: 10.3390/ijms19030688] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 01/31/2018] [Revised: 02/19/2018] [Accepted: 02/26/2018] [Indexed: 12/22/2022] Open
Abstract
Worldwide, hypertension still represents a serious health burden with nine million people dying as a consequence of hypertension-related complications. Essential hypertension is a complex trait supported by multifactorial genetic inheritance together with environmental factors. The heritability of blood pressure (BP) is estimated to be 30-50%. A great effort was made to find genetic variants affecting BP levels through Genome-Wide Association Studies (GWAS). This approach relies on the "common disease-common variant" hypothesis and led to the identification of multiple genetic variants which explain, in aggregate, only 2-3% of the genetic variance of hypertension. Part of the missing genetic information could be caused by variants too rare to be detected by GWAS. The use of exome chips and Next-Generation Sequencing facilitated the discovery of causative variants. Here, we report the advances in the detection of novel rare variants, genes, and/or pathways through the most promising approaches, and the recent statistical tests that have emerged to handle rare variants. We also discuss the need to further support rare novel variants with replication studies within larger consortia and with deeper functional studies to better understand how new genes might improve patient care and the stratification of the response to antihypertensive treatments.
Affiliation(s)
- Alessia Russo
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
| | - Cornelia Di Gaetano
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
| | - Giovanni Cugliari
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
| | - Giuseppe Matullo
- Department of Medical Sciences, University of Turin, 10126 Turin, Italy.
- Italian Institute for Genomic Medicine (IIGM, Formerly HuGeF), 10126 Turin, Italy.
|
45
|
Chen L, Wang Y, Zhou Y. Association analysis of multiple traits by an approach of combining P values. J Genet 2018. [DOI: 10.1007/s12041-018-0885-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
|
46
|
Timpson NJ, Greenwood CMT, Soranzo N, Lawson DJ, Richards JB. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet 2018; 19:110-124. [PMID: 29225335 DOI: 10.1038/nrg.2017.101] [Citation(s) in RCA: 257] [Impact Index Per Article: 36.7] [Indexed: 12/15/2022]
Abstract
Genetic architecture describes the characteristics of genetic variation that are responsible for heritable phenotypic variability. It depends on the number of genetic variants affecting a trait, their frequencies in the population, the magnitude of their effects and their interactions with each other and the environment. Defining the genetic architecture of a complex trait or disease is central to the scientific and clinical goals of human genetics, which are to understand disease aetiology and aid in disease screening, diagnosis, prognosis and therapy. Recent technological advances have enabled genome-wide association studies and emerging next-generation sequencing studies to begin to decipher the nature of the heritable contribution to traits and disease. Here, we describe the types of genetic architecture that have been observed, how architecture can be measured and why an improved understanding of genetic architecture is central to future advances in the field.
Affiliation(s)
- Nicholas J Timpson
- MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Clifton, Bristol BS8 2BN, UK
| | - Celia M T Greenwood
- Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, 3755 Cote Ste Catherine, Montréal, Québec H3T 1E2, Canada; Department of Oncology, McGill University, 3755 Cote Ste Catherine, Montréal, Québec H3T 1E2, Canada; Departments of Human Genetics and Epidemiology, Biostatistics and Occupational Health, McGill University, 3755 Cote Ste Catherine, Montréal, Québec H3T 1E2, Canada
| | - Nicole Soranzo
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK; Department of Haematology, University of Cambridge, Long Road, Cambridge CB2 0PT, UK
| | - Daniel J Lawson
- MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Oakfield House, Oakfield Grove, Clifton, Bristol BS8 2BN, UK
| | - J Brent Richards
- Departments of Human Genetics and Epidemiology, Biostatistics and Occupational Health, McGill University, 3755 Cote Ste Catherine, Montréal, Québec H3T 1E2, Canada; Department of Medicine, Lady Davis Institute for Medical Research, Jewish General Hospital, McGill University, 3755 Cote Ste Catherine, Montréal, Québec H3T 1E2, Canada; Department of Twin Research & Genetic Epidemiology, King's College London, St Thomas' Campus, Lambeth Palace Road, London SE1 7EH, UK
|
47
|
Li Y, Xiang Y, Xu C, Shen H, Deng H. Rare variant association analysis in case-parents studies by allowing for missing parental genotypes. BMC Genet 2018; 19:7. [PMID: 29334894 PMCID: PMC5769338 DOI: 10.1186/s12863-018-0597-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/07/2017] [Accepted: 01/04/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The development of next-generation sequencing technologies has facilitated the identification of rare variants. Family-based design is commonly used to effectively control for population admixture and substructure, which is more prominent for rare variants. Case-parents studies, as typical strategies in family-based design, are widely used in rare variant-disease association analysis. Current methods in case-parents studies are based on complete case-parents data; however, parental genotypes may be missing in case-parents trios, and removing these data may lead to a loss in statistical power. The present study focuses on testing for rare variant-disease association in case-parents study by allowing for missing parental genotypes. RESULTS In this report, we extended the collapsing method for rare variant association analysis in case-parents studies to allow for missing parental genotypes, and investigated the performance of two methods by using the difference of genotypes between affected offspring and their corresponding "complements" in case-parent trios and TDT framework. Using simulations, we showed that, compared with the methods just only using complete case-parents data, the proposed strategy allowing for missing parental genotypes, or even adding unrelated affected individuals, can greatly improve the statistical power and meanwhile is not affected by population stratification. CONCLUSIONS We conclude that adding case-parents data with missing parental genotypes to complete case-parents data set can greatly improve the power of our strategy for rare variant-disease association.
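For readers unfamiliar with the TDT framework mentioned in this abstract, the classical single-marker transmission disequilibrium test can be sketched as below. It is the textbook McNemar-type statistic computed from heterozygous parents. It is not the collapsing extension for rare variants or the missing-parent strategy proposed in the paper, and the function name and counts are hypothetical.

```python
import math

def tdt_statistic(transmitted, untransmitted):
    """Classical TDT: (b - c)^2 / (b + c), compared to chi-square with 1 df.

    transmitted: times heterozygous parents passed the candidate allele
                 to an affected child.
    untransmitted: times they passed the other allele instead.
    Returns (statistic, p_value).
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with b + c > 0")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df via the complementary
    # error function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
stat, p_value = tdt_statistic(60, 40)
```

Because the comparison is made within families, the statistic is not inflated by population stratification, which is the property the abstract relies on.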
Affiliation(s)
- Yumei Li
- School of Mathematics and Computational Science, Huaihua University, Huaihua, Hunan 418008 People’s Republic of China
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112 USA
| | - Yang Xiang
- School of Mathematics and Computational Science, Huaihua University, Huaihua, Hunan 418008 People’s Republic of China
| | - Chao Xu
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112 USA
| | - Hui Shen
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112 USA
| | - Hongwen Deng
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, Tulane University, New Orleans, LA 70112 USA
- Center for Bioinformatics and Genomics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70112 USA
|
48
|
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TOPMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Affiliation(s)
- Karoline Kuchenbaecker
- Wellcome Trust Sanger Institute, Cambridge, UK; University College London, London, UK.
| | - Emil Vincent Rosenbaum Appel
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
|
49
|
Lin JR, Zhang Q, Cai Y, Morrow BE, Zhang ZD. Integrated rare variant-based risk gene prioritization in disease case-control sequencing studies. PLoS Genet 2017; 13:e1007142. [PMID: 29281626 PMCID: PMC5760082 DOI: 10.1371/journal.pgen.1007142] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/06/2017] [Revised: 01/09/2018] [Accepted: 12/01/2017] [Indexed: 12/17/2022] Open
Abstract
Rare variants of major effect play an important role in human complex diseases and can be discovered by sequencing-based genome-wide association studies. Here, we introduce an integrated approach that combines the rare variant association test with gene network and phenotype information to identify risk genes implicated by rare variants for human complex diseases. Our data integration method follows a 'discovery-driven' strategy without relying on prior knowledge about the disease and thus maintains the unbiased character of genome-wide association studies. Simulations reveal that our method can outperform a widely-used rare variant association test method by 2 to 3 times. In a case study of a small disease cohort, we uncovered putative risk genes and the corresponding rare variants that may act as genetic modifiers of congenital heart disease in 22q11.2 deletion syndrome patients. These variants were missed by a conventional approach that relied on the rare variant association test alone. Case-control sequencing studies are a promising design to uncover risk genes of human complex diseases implicated by rare variants. The recent development of different types of rare variant association tests has improved the statistical power to identify disease genes that harbor risk rare variants. However, none of the recent sequencing-based genome-wide association studies identified robust disease association of rare variants or genes based on them. Due to limited sample sizes that can be feasibly achieved in real applications, current rare variant association tests can only generate marginal association signals for most risk genes. Here we proposed an integrated method that combined association signals with orthogonal biological evidence to uncover risk genes in sequencing studies. Designed to address the lack-of-power issue, our method was shown to effectively uncover risk genes with marginal association signals in data simulation. 
Indeed, in a real application demonstrated in our case study, our method disclosed important risk genes of congenital heart disease in 22q11.2 deletion syndrome that were missed by the previous study.
Affiliation(s)
- Jhih-Rong Lin
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Quanwei Zhang
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Ying Cai
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Bernice E Morrow
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Zhengdong D Zhang
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
|
50
|
Hsieh AR, Chen DP, Chattopadhyay AS, Li YJ, Chang CC, Fann CSJ. A non-threshold region-specific method for detecting rare variants in complex diseases. PLoS One 2017; 12:e0188566. [PMID: 29190701 PMCID: PMC5708778 DOI: 10.1371/journal.pone.0188566] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/07/2017] [Accepted: 11/09/2017] [Indexed: 11/23/2022]
Abstract
A region-specific method, the non-threshold rare (NTR) variant detection method, was developed; it does not use a threshold for defining rare variants and accounts for directions of effects. NTR also considers linkage disequilibrium within the region and accommodates common and rare variants simultaneously. NTR weights variants according to minor allele frequency and odds ratio to combine the effects of common and rare variants on disease occurrence into a single score and provides a test statistic to assess the significance of the score. In the simulations, the power of NTR increased as the effect size increased, and the type I error of the method was well controlled. Moreover, NTR was compared with several existing methods, including the combined multivariate and collapsing method (CMC), the weighted sum statistic method (WSS), the sequence kernel association test (SKAT), and its modification, SKAT-O. NTR yielded comparable or better power in simulations, especially when the linkage disequilibrium between variants was at least moderate. In an analysis of diabetic nephropathy data, NTR detected more confirmed disease-related genes than the other aforementioned methods. NTR can thus be used as a complementary tool to help in dissecting the etiology of complex diseases.
Affiliation(s)
- Ai-Ru Hsieh
  - Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan
- Dao-Peng Chen
  - Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
- Ying-Ju Li
  - Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
- Chien-Ching Chang
  - Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
- Cathy S. J. Fann
  - Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei, Taiwan
|