1
Riccio C, Jansen ML, Thalén F, Koliopanos G, Link V, Ziegler A. Assessment of the functionality and usability of open-source rare variant analysis pipelines. Brief Bioinform 2025; 26:bbaf044. [PMID: 39907318 PMCID: PMC11795309 DOI: 10.1093/bib/bbaf044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Revised: 01/07/2025] [Accepted: 01/20/2025] [Indexed: 02/06/2025]
Abstract
Sequencing of increasingly larger cohorts has revealed many rare variants, presenting an opportunity to further unravel the genetic basis of complex traits. Compared with common variants, rare variants are more complex to analyze. Specialized computational tools for these analyses should be both flexible and user-friendly. However, an overview of the available rare variant analysis pipelines and their functionalities is currently lacking. Here, we provide a systematic review of the currently available rare variant analysis pipelines. We searched MEDLINE and Google Scholar until 27 November 2023, and included open-source rare variant pipelines that accepted genotype data from cohort and case-control studies and group variants into testing units. Eligible pipelines were assessed based on functionality and usability criteria. We identified 17 rare variant pipelines that collectively support various trait types, association tests, testing units, and variant weighting schemes. Currently, no single pipeline can handle all data types in a scalable and flexible manner. We recommend different tools to meet diverse analysis needs. STAARpipeline is suitable for newcomers and common applications owing to its built-in definitions for the testing units. REGENIE is highly scalable, actively maintained, regularly updated, and well documented. Ravages is suitable for analyzing multinomial variables, and OrdinalGWAS is tailored for analyzing ordinal variables. Opportunities remain for developing a user-friendly pipeline that provides high degrees of flexibility and scalability. Such a pipeline would enable researchers to exploit the potential of rare variant analyses to uncover the genetic basis of complex traits.
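The notions of a testing unit and a variant weighting scheme mentioned in this abstract can be illustrated with a minimal sketch. It assumes a gene-based testing unit and Beta(MAF; 1, 25) density weights, a scheme widely used with SKAT-type tests but not one the abstract itself names. The function names, variant IDs, and toy genotypes are hypothetical.

```python
import math


def beta_maf_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the minor allele frequency (MAF).

    Assumption: a=1, b=25 up-weights rarer variants; other schemes exist.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0)


def weighted_burden(genotypes, mafs, unit):
    """Per-individual weighted allele count over the variants in one testing unit.

    genotypes: variant ID -> list of allele counts (0, 1, 2), one per individual
    mafs:      variant ID -> minor allele frequency
    unit:      list of variant IDs grouped into one testing unit (e.g. a gene)
    """
    n_individuals = len(next(iter(genotypes.values())))
    scores = [0.0] * n_individuals
    for variant in unit:
        weight = beta_maf_weight(mafs[variant])
        for i, allele_count in enumerate(genotypes[variant]):
            scores[i] += weight * allele_count
    return scores


# Hypothetical toy data: three individuals, three variants.
genotypes = {"v1": [0, 1, 0], "v2": [1, 0, 0], "v3": [0, 0, 2]}
mafs = {"v1": 0.001, "v2": 0.01, "v3": 0.2}
unit = ["v1", "v2"]  # v3 is common and left out of this testing unit

scores = weighted_burden(genotypes, mafs, unit)
```

In this sketch the carrier of the rarer variant (v1) receives a larger burden score than the carrier of v2, which is the intended effect of a frequency-based weighting scheme.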
Affiliation(s)
- Cristian Riccio
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Max L Jansen
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Felix Thalén
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Georgios Koliopanos
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Vivian Link
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Andreas Ziegler
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Swiss Institute of Bioinformatics, Herman-Burchard-Str. 12, 7265 Davos Wolfgang, Switzerland
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- University Center of Cardiovascular Science & Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20251 Hamburg, Germany
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Ave, Scottsville, Pietermaritzburg, 3201, South Africa
2
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
- Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA.
- Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
- College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA.
- Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA.
3
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
4
Chattopadhyay A, Shih CY, Hsu YC, Juang JMJ, Chuang EY, Lu TP. CLIN_SKAT: an R package to conduct association analysis using functionally relevant variants. BMC Bioinformatics 2022; 23:441. [PMID: 36274122 PMCID: PMC9590128 DOI: 10.1186/s12859-022-04987-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 10/16/2022] [Indexed: 12/03/2022]
Abstract
Background The availability of next-generation sequencing data allows low-frequency and rare variants to be studied through strategies other than the commonly used genome-wide association studies (GWAS). Rare variants are important keys towards explaining the heritability of complex diseases that remains unexplained by common variants due to their low effect sizes. However, analysis strategies struggle to keep up with the huge amount of data available, creating a bottleneck. This study describes CLIN_SKAT, an R package that provides users with an easily implemented analysis pipeline with the goal of (i) extracting clinically relevant variants (both rare and common), followed by (ii) gene-based association analysis by grouping the selected variants.
Results CLIN_SKAT offers four simple functions that can be used to obtain clinically relevant variants, map them to genes or gene sets, calculate weights from global healthy populations and conduct weighted case–control analysis. CLIN_SKAT introduces improvements by adding certain pre-analysis steps and customizable features to make the SKAT results clinically more meaningful. Moreover, it offers several plot functions that can be availed towards obtaining visualizations for interpretation of the analyses results. CLIN_SKAT is available on Windows/Linux/MacOS and is operative for R version 4.0.4 or later. It can be freely downloaded from https://github.com/ShihChingYu/CLIN_SKAT, installed through devtools::install_github("ShihChingYu/CLIN_SKAT", force=T) and executed by loading the package into R using library(CLIN_SKAT). All outputs (tabular and graphical) can be downloaded in simple, publishable formats.
Conclusions Statistical association analysis is often underpowered due to low sample sizes and high numbers of variants to be tested, limiting detection of causal ones. Therefore, retaining a subset of variants that are biologically meaningful seems to be a more effective strategy for identifying explainable associations while reducing the degrees of freedom. CLIN_SKAT offers users a one-stop R package that identifies disease risk variants with improved power via a series of tailor-made procedures that allows dimension reduction, by retaining functionally relevant variants, and incorporating ethnicity based priors. Furthermore, it also eliminates the requirement for high computational resources and bioinformatics expertise. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04987-2.
5
El-Boraie A, Tanner JA, Zhu AZX, Claw KG, Prasad B, Schuetz EG, Thummel KE, Fukunaga K, Mushiroda T, Kubo M, Benowitz NL, Lerman C, Tyndale RF. Functional characterization of novel rare CYP2A6 variants and potential implications for clinical outcomes. Clin Transl Sci 2021; 15:204-220. [PMID: 34476898 PMCID: PMC8742641 DOI: 10.1111/cts.13135] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/29/2021] [Revised: 07/23/2021] [Accepted: 07/24/2021] [Indexed: 11/28/2022]
Abstract
CYP2A6 activity, phenotyped by the nicotine metabolite ratio (NMR), is a predictor of several smoking behaviors, including cessation and smoking‐related disease risk. The heritability of the NMR is 60–80%, yet weighted genetic risk scores (wGRSs) based on common variants explain only 30–35%. Rare variants (minor allele frequency <1%) are hypothesized to explain some of this missing heritability. We present two targeted sequencing studies where rare protein‐coding variants are functionally characterized in vivo, in silico, and in vitro to examine this hypothesis. In a smoking cessation trial, 1687 individuals were sequenced; characterization measures included the in vivo NMR, in vitro protein expression, and metabolic activity measured from recombinant proteins. In a human liver bank, 312 human liver samples were sequenced; measures included RNA expression, protein expression, and metabolic activity from extracted liver tissue. In total, 38 of 47 rare coding variants identified were novel; characterizations ranged from gain‐of‐function to loss‐of‐function. On a population level, the portion of NMR variation explained by the rare coding variants was small (~1%). However, upon incorporation, the accuracy of the wGRS was improved for individuals with rare protein‐coding variants (i.e., the residuals were reduced), and approximately one‐third of these individuals (12/39) were re‐assigned from normal to slow metabolizer status. Rare coding variants can alter an individual’s CYP2A6 activity; their integration into wGRSs through precise functional characterization is necessary to accurately assess clinical outcomes and achieve precision medicine for all. Investigation into noncoding variants is warranted to further explain the missing heritability in the NMR.
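The idea of folding rare coding variants into a weighted genetic risk score (wGRS) can be sketched as a weighted sum of allele dosages. This is only an illustration under stated assumptions: the variant names, weights, and the slow-metabolizer cutoff are hypothetical and are not taken from the study, and the sign convention (more negative score means lower predicted activity) is assumed for the example.

```python
def weighted_grs(dosages, weights):
    """Sum of per-variant weight times allele dosage (0, 1, or 2).

    Variants missing from `dosages` are treated as dosage 0.
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def classify(score, slow_cutoff):
    """Hypothetical rule: scores below the cutoff are called slow metabolizers."""
    return "slow" if score < slow_cutoff else "normal"


# Hypothetical weights (effect on predicted activity) and cutoff.
common_weights = {"common_1": -0.30, "common_2": -0.15}
rare_weights = {"rare_1": -0.80}
SLOW_CUTOFF = -0.50

# One hypothetical individual carrying a common and a rare reduced-function allele.
person = {"common_1": 1, "rare_1": 1}

score_common = weighted_grs(person, common_weights)
score_full = weighted_grs(person, {**common_weights, **rare_weights})

status_common = classify(score_common, SLOW_CUTOFF)
status_full = classify(score_full, SLOW_CUTOFF)
```

With these made-up numbers the individual is called normal when only common variants are scored and slow once the rare variant is included, mirroring the kind of re-assignment the abstract reports.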
Affiliation(s)
- Ahmed El-Boraie
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, ON, Canada; Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health and Division of Brain and Therapeutics, Toronto, ON, Canada
- Andy Z X Zhu
- Department of Quantitative Translational Sciences, Takeda Pharmaceuticals, Cambridge, Massachusetts, USA
- Katrina G Claw
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Aurora, Colorado, USA
- Bhagwat Prasad
- Department of Pharmaceutical Sciences, Washington State University, Spokane, Washington, USA
- Erin G Schuetz
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Kenneth E Thummel
- Department of Pharmaceutics, University of Washington, Seattle, Washington, USA
- Koya Fukunaga
- Center for Integrative Medical Sciences, RIKEN, Yokohama, Japan
- Michiaki Kubo
- Center for Integrative Medical Sciences, RIKEN, Yokohama, Japan
- Neal L Benowitz
- Clinical Pharmacology Research Program, Division of Cardiology, Department of Medicine and Center for Tobacco Control Research and Education, University of California San Francisco, San Francisco, California, USA
- Caryn Lerman
- Department of Psychiatry, USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, USA
- Rachel F Tyndale
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, ON, Canada; Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health and Division of Brain and Therapeutics, Toronto, ON, Canada; Department of Psychiatry, University of Toronto, Toronto, ON, Canada
6
Shivakumar M, Miller JE, Dasari VR, Zhang Y, Lee MTM, Carey DJ, Gogoi R, Kim D. Genetic Analysis of Functional Rare Germline Variants across Nine Cancer Types from an Electronic Health Record Linked Biobank. Cancer Epidemiol Biomarkers Prev 2021; 30:1681-1688. [PMID: 34244158 DOI: 10.1158/1055-9965.epi-21-0082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2021] [Revised: 02/15/2021] [Accepted: 06/17/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND Rare variants play an essential role in the etiology of cancer. In this study, we aim to characterize rare germline variants that impact the risk of cancer. METHODS We performed a genome-wide rare variant analysis using germline whole exome sequencing (WES) data derived from the Geisinger MyCode initiative to discover cancer predisposition variants. The case-control association analysis was conducted by binning variants in 5,538 patients with cancer and 7,286 matched controls in a discovery set and 1,991 patients with cancer and 2,504 matched controls in a validation set across nine cancer types. Further, The Cancer Genome Atlas (TCGA) germline data were used to replicate the findings. RESULTS We identified 133 significant pathway-cancer pairs (85 replicated) and 90 significant gene-cancer pairs (12 replicated). In addition, we identified 18 genes and 3 pathways that were associated with survival outcome across cancers (Bonferroni P < 0.05). CONCLUSIONS In this study, we identified potential predisposition genes and pathways based on rare variants in nine cancers. IMPACT This work adds to the knowledge base and progress being made in precision medicine.
Affiliation(s)
- Manu Shivakumar
- Biomedical & Translational Informatics Institute, Geisinger, Danville, Pennsylvania
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Jason E Miller
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania
- Yanfei Zhang
- Genomic Medicine Institute, Geisinger, Danville, Pennsylvania
- David J Carey
- Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania
- Radhika Gogoi
- Weis Center for Research, Geisinger Clinic, Danville, Pennsylvania.
7
Adelson RP, Renton AE, Li W, Barzilai N, Atzmon G, Goate AM, Davies P, Freudenberg-Hua Y. Empirical design of a variant quality control pipeline for whole genome sequencing data using replicate discordance. Sci Rep 2019; 9:16156. [PMID: 31695094 PMCID: PMC6834861 DOI: 10.1038/s41598-019-52614-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/15/2019] [Accepted: 10/18/2019] [Indexed: 12/29/2022]
Abstract
The success of next-generation sequencing depends on the accuracy of variant calls. Few objective protocols exist for QC following variant calling from whole genome sequencing (WGS) data. After applying QC filtering based on Genome Analysis Tool Kit (GATK) best practices, we used genotype discordance of eight samples that were sequenced twice each to evaluate the proportion of potentially inaccurate variant calls. We designed a QC pipeline involving hard filters to improve replicate genotype concordance, which indicates improved accuracy of genotype calls. Our pipeline analyzes the efficacy of each filtering step. We initially applied this strategy to well-characterized variants from the ClinVar database, and subsequently to the full WGS dataset. The genome-wide biallelic pipeline removed 82.11% of discordant and 14.89% of concordant genotypes, and improved the concordance rate from 98.53% to 99.69%. The variant-level read depth filter most improved the genome-wide biallelic concordance rate. We also adapted this pipeline for triallelic sites, given the increasing proportion of multiallelic sites as sample sizes increase. For triallelic sites containing only SNVs, the concordance rate improved from 97.68% to 99.80%. Our QC pipeline removes many potentially false positive calls that pass in GATK, and may inform future WGS studies prior to variant effect analysis.
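The genome-wide biallelic figures quoted in this abstract can be checked against each other with a short calculation. The sketch below assumes that the concordance rate is concordant genotypes divided by all replicate-compared genotypes, and that the two removal percentages are fractions of the discordant and concordant genotypes respectively; the function names are hypothetical and this is not the authors' pipeline.

```python
def concordance_rate(calls_a, calls_b):
    """Fraction of replicate genotype pairs that agree (inputs are equal-length lists)."""
    pairs = list(zip(calls_a, calls_b))
    return sum(a == b for a, b in pairs) / len(pairs)


def concordance_after_filter(rate_before, frac_discordant_removed, frac_concordant_removed):
    """Concordance rate after a filter removes given fractions of each genotype class."""
    concordant_left = rate_before * (1.0 - frac_concordant_removed)
    discordant_left = (1.0 - rate_before) * (1.0 - frac_discordant_removed)
    return concordant_left / (concordant_left + discordant_left)


# Figures from the abstract: 98.53% concordance before filtering,
# 82.11% of discordant and 14.89% of concordant genotypes removed.
rate_after = concordance_after_filter(0.9853, 0.8211, 0.1489)
print(f"{rate_after:.2%}")  # prints 99.69%
```

Under these assumptions the computed value matches the 99.69% reported after filtering, which suggests the quoted percentages are internally consistent.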
Affiliation(s)
- Robert P Adelson
- Litwin-Zucker Center for Alzheimer's Disease, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Alan E Renton
- Ronald M. Loeb Center for Alzheimer's Disease and Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Wentian Li
- Robert S. Boas Center for Genomics & Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Nir Barzilai
- Robert S. Boas Center for Genomics & Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Gil Atzmon
- Institute for Aging Research, Albert Einstein College of Medicine, Bronx, New York, 10461, USA
- Faculty of Natural Sciences, University of Haifa, Haifa, 31905, Israel
- Alison M Goate
- Ronald M. Loeb Center for Alzheimer's Disease and Departments of Neuroscience, Genetics and Genomic Sciences, and Neurology, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Peter Davies
- Litwin-Zucker Center for Alzheimer's Disease, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Yun Freudenberg-Hua
- Litwin-Zucker Center for Alzheimer's Disease, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA.
- Division of Geriatric Psychiatry, Zucker Hillside Hospital, Northwell Health, Glen Oaks, New York, 11004, USA.
8
Dershem R, Metpally RPR, Jeffreys K, Krishnamurthy S, Smelser DT, Hershfinkel M, Carey DJ, Robishaw JD, Breitwieser GE. Rare-variant pathogenicity triage and inclusion of synonymous variants improves analysis of disease associations of orphan G protein-coupled receptors. J Biol Chem 2019; 294:18109-18121. [PMID: 31628190 DOI: 10.1074/jbc.ra119.009253] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/07/2019] [Revised: 10/08/2019] [Indexed: 02/02/2023]
Abstract
The pace of deorphanization of G protein-coupled receptors (GPCRs) has slowed, and new approaches are required. Small molecule targeting of orphan GPCRs can potentially be of clinical benefit even if the endogenous receptor ligand has not been identified. Many GPCRs lack common variants that lead to reproducible genome-wide disease associations, and rare-variant approaches have emerged as a viable alternative to identify disease associations for such genes. Therefore, our goal was to prioritize orphan GPCRs by determining their associations with human diseases in a large clinical population. We used sequence kernel association tests to assess the disease associations of 85 orphan or understudied GPCRs in an unselected cohort of 51,289 individuals. Using rare loss-of-function variants, missense variants predicted to be pathogenic or likely pathogenic, and a subset of rare synonymous variants that cause large changes in local codon bias as independent data sets, we found strong, phenome-wide disease associations shared by two or more variant categories for 39% of the GPCRs. To validate the bioinformatics and sequence kernel association test analyses, we functionally characterized rare missense and synonymous variants of GPR39, a family A GPCR, revealing altered expression or Zn2+-mediated signaling for members of both variant classes. These results support the utility of rare variant analyses for identifying disease associations for GPCRs that lack impactful common variants. We highlight the importance of rare synonymous variants in human physiology and argue for their routine inclusion in any comprehensive analysis of genomic variants as potential causes of disease.
Affiliation(s)
- Ridge Dershem
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
- Raghu P R Metpally
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
- Kirk Jeffreys
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
- Sarathbabu Krishnamurthy
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
- Diane T Smelser
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
- Michal Hershfinkel
- Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, 8410501 Israel
| | -
- Regeneron Pharmaceuticals, Inc., Tarrytown, New York 10591
- David J Carey
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822
- Janet D Robishaw
- Schmidt College of Medicine, Florida Atlantic University, Boca Raton, Florida 33431
- Gerda E Breitwieser
- Department of Molecular and Functional Genomics, Geisinger, Weis Center for Research, Danville, Pennsylvania 17822.
9
Shivakumar M, Miller JE, Dasari VR, Gogoi R, Kim D. Exome-Wide Rare Variant Analysis From the DiscovEHR Study Identifies Novel Candidate Predisposition Genes for Endometrial Cancer. Front Oncol 2019; 9:574. [PMID: 31338326 PMCID: PMC6626914 DOI: 10.3389/fonc.2019.00574] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/27/2019] [Accepted: 06/13/2019] [Indexed: 12/19/2022]
Abstract
Endometrial cancer is the fourth most commonly diagnosed cancer in women. Family history is a known risk factor for endometrial cancer. The incidence of endometrial cancer in a first-degree relative elevates the relative risk to range between 1.3 and 2.8. It is unclear to what extent or what other novel germline variants are at play in endometrial cancer. We aim to address this question by utilizing whole exome sequencing as a means to identify novel, rare variant associations between exonic regions and endometrial cancer. The MyCode community health initiative is an excellent resource for this study, with germline whole exome data for 60,000 patients available in the first phase, and a further 30,000 patients independently sequenced in the second phase as part of the DiscovEHR study. We conducted an exome-wide rare variant association analysis using 472 cases and 4,110 controls in 60,000 patients (discovery cohort), and 261 cases and 1,531 controls from 30,000 patients (replication cohort). After binning rare germline variants into genes, case-control association tests were performed using the Optimal Unified Approach for Rare-Variant Association, SKAT-O. Seven genes, including RBM12, NDUFB6, ATP6V1A, RECK, SLC35E1, RFX3 (Bonferroni-corrected P < 0.05) and ATP8A1 (suggestive P < 10⁻⁵), and one long non-coding RNA, DLGAP4-AS1 (Bonferroni-corrected P < 0.05), were associated with endometrial cancer. Notably, RECK and ATP8A1 were replicated in the replication cohort (suggestive threshold P < 0.05). Additionally, a pathway-based rare variant analysis, using pathogenic and likely pathogenic variants, identified two significant pathways, pyrimidine metabolism and protein processing in the endoplasmic reticulum (Bonferroni-corrected P < 0.05). In conclusion, our results using single-source electronic health records (EHR) linked to genomic data highlight candidate genes and pathways associated with endometrial cancer and indicate the involvement of rare variants in endometrial cancer predisposition, which could help in personalized prognosis and further our understanding of its genetic etiology.
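Two steps named in this abstract, binning rare variants into genes and applying a Bonferroni correction to the gene-level P values, can be sketched as follows. The 1% minor allele frequency cutoff, the gene and variant identifiers, and the P values are hypothetical placeholders; the association test itself (SKAT-O in the study) is not implemented here.

```python
from collections import defaultdict


def bin_rare_variants(variants, maf_threshold=0.01):
    """Group variant IDs by gene, keeping only variants below the MAF threshold.

    Assumption: 'rare' means MAF < 1%; the study's own cutoff is not given in the abstract.
    """
    bins = defaultdict(list)
    for variant in variants:
        if variant["maf"] < maf_threshold:
            bins[variant["gene"]].append(variant["id"])
    return dict(bins)


def bonferroni_adjust(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return {name: min(1.0, p * n_tests) for name, p in p_values.items()}


# Hypothetical annotated variants.
variants = [
    {"id": "v1", "gene": "GENE_A", "maf": 0.002},
    {"id": "v2", "gene": "GENE_A", "maf": 0.0005},
    {"id": "v3", "gene": "GENE_A", "maf": 0.12},  # common, excluded
    {"id": "v4", "gene": "GENE_B", "maf": 0.008},
]
gene_bins = bin_rare_variants(variants)

# Hypothetical gene-level P values from some association test.
raw_p = {"GENE_A": 1e-6, "GENE_B": 0.03}
adjusted_p = bonferroni_adjust(raw_p)
significant = sorted(gene for gene, p in adjusted_p.items() if p < 0.05)
```

With two tests, only GENE_A stays below 0.05 after correction in this toy example; in an exome-wide scan the number of tests is the number of genes analyzed, so the threshold is far stricter.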
Affiliation(s)
- Manu Shivakumar
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States
- Jason E Miller
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Radhika Gogoi
- Weis Center for Research, Geisinger Clinic, Danville, PA, United States
- Dokyoon Kim
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, United States; Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States; Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States
10
Zhang X, Basile AO, Pendergrass SA, Ritchie MD. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico. BMC Bioinformatics 2019; 20:46. [PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/29/2018] [Accepted: 12/26/2018] [Indexed: 11/11/2022]
Abstract
Background The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases vs controls for both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. Results We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that using an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression.
Conclusions Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses. Electronic supplementary material The online version of this article (10.1186/s12859-018-2591-6) contains supplementary material, which is available to authorized users.
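How an empirical type I error rate is estimated by simulation can be shown with a deliberately simplified sketch. It swaps in a carrier-collapsing Fisher's exact test as a stand-in for the regression-based burden test and SKAT evaluated in the study, and every parameter (sample sizes, carrier frequency, replicate count, seed) is hypothetical. Under the null, cases and controls carry a rare allele with the same probability, so any rejection is a false positive.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def empirical_type1_error(n_cases, n_controls, carrier_freq, n_reps, alpha, rng):
    """Share of null replicates in which the test rejects at level alpha."""
    rejections = 0
    for _ in range(n_reps):
        case_carriers = sum(rng.random() < carrier_freq for _ in range(n_cases))
        control_carriers = sum(rng.random() < carrier_freq for _ in range(n_controls))
        p = fisher_exact_two_sided(
            case_carriers, n_cases - case_carriers,
            control_carriers, n_controls - control_carriers,
        )
        rejections += p < alpha
    return rejections / n_reps


rng = random.Random(2019)  # fixed seed so the sketch is reproducible
balanced_rate = empirical_type1_error(200, 200, 0.02, 200, 0.05, rng)
unbalanced_rate = empirical_type1_error(50, 1000, 0.02, 200, 0.05, rng)
```

Comparing `balanced_rate` and `unbalanced_rate` against the nominal 0.05 follows the logic of the study's benchmark, although Fisher's exact test is conservative and will not reproduce the behavior reported for regression or SKAT.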
Affiliation(s)
- Xinyuan Zhang
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anna O Basile
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA