1
Mondragon-Estrada E, Newburger JW, DePalma SR, Brueckner M, Cleveland J, Chung WK, Gelb BD, Goldmuntz E, Hagler DJ, Huang H, McQuillen P, Miller TA, Panigrahy A, Porter GA, Roberts AE, Rollins CK, Russell MW, Tristani-Firouzi M, Grant PE, Im K, Morton SU. Noncoding variants and sulcal patterns in congenital heart disease: Machine learning to predict functional impact. iScience 2025; 28:111707. [PMID: 39877905 PMCID: PMC11772982 DOI: 10.1016/j.isci.2024.111707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/24/2024] [Accepted: 12/26/2024] [Indexed: 01/31/2025] Open
Abstract
Neurodevelopmental impairments associated with congenital heart disease (CHD) may arise from perturbations in brain developmental pathways, including the formation of sulcal patterns. While genetic factors contribute to sulcal features, the association of noncoding de novo variants (ncDNVs) with sulcal patterns in people with CHD remains poorly understood. Leveraging deep learning models, we examined the predicted impact of ncDNVs on gene regulatory signals. Predicted impact was compared between participants with CHD and a jointly called cohort without CHD. We then assessed the relationship of the predicted impact of ncDNVs with their sulcal folding patterns. ncDNVs predicted to increase H3K9me2 modification were associated with larger disruptions in right parietal sulcal patterns in the CHD cohort. Genes predicted to be regulated by these ncDNVs were enriched for functions related to neuronal development. This highlights the potential of deep learning models to generate hypotheses about the role of noncoding variants in brain development.
Affiliation(s)
- Enrique Mondragon-Estrada
  - Division of Newborn Medicine, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
  - Fetal Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA
- Jane W. Newburger
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Cardiology, Boston Children’s Hospital, Boston, MA, USA
- Martina Brueckner
  - Departments of Genetics and Pediatrics, Yale University School of Medicine, New Haven, CT, USA
- John Cleveland
  - Departments of Surgery and Pediatrics, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Wendy K. Chung
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Bruce D. Gelb
  - Mindich Child Health and Development Institute and Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Elizabeth Goldmuntz
  - Division of Cardiology, Children’s Hospital of Philadelphia, Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Donald J. Hagler
  - Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA, USA
  - Department of Radiology, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Hao Huang
  - Department of Radiology, Children’s Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA, USA
- Patrick McQuillen
  - Departments of Pediatrics and Neurology, University of California, San Francisco, San Francisco, CA, USA
- Thomas A. Miller
  - Department of Pediatrics, Primary Children’s Hospital, University of Utah, Salt Lake City, UT, USA
  - Division of Pediatric Cardiology, Maine Medical Center, Portland, ME, USA
- Ashok Panigrahy
  - Department of Pediatric Radiology, Children’s Hospital of Pittsburgh, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- George A. Porter
  - Department of Pediatrics, University of Rochester Medical Center, Rochester, NY, USA
- Amy E. Roberts
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Cardiology, Boston Children’s Hospital, Boston, MA, USA
  - Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Caitlin K. Rollins
  - Department of Neurology, Boston Children’s Hospital, Boston, MA, USA
  - Department of Neurology, Harvard Medical School, Boston, MA, USA
- Mark W. Russell
  - Department of Pediatrics, C.S. Mott Children’s Hospital, University of Michigan, Ann Arbor, MI, USA
- Martin Tristani-Firouzi
  - Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- P. Ellen Grant
  - Division of Newborn Medicine, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
  - Fetal Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
  - Department of Radiology, Boston Children’s Hospital, Boston, MA, USA
- Kiho Im
  - Division of Newborn Medicine, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
  - Fetal Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Sarah U. Morton
  - Division of Newborn Medicine, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
  - Fetal Neonatal Neuroimaging and Developmental Science Center, Boston Children’s Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
2
Mousavi Z, Arvanitis M, Duong T, Brody JA, Battle A, Sotoodehnia N, Shojaie A, Arking DE, Bader JS. Prioritization of causal genes from genome-wide association studies by Bayesian data integration across loci. PLoS Comput Biol 2025; 21:e1012725. [PMID: 39774334 PMCID: PMC11741684 DOI: 10.1371/journal.pcbi.1012725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Revised: 01/17/2025] [Accepted: 12/16/2024] [Indexed: 01/11/2025] Open
Abstract
MOTIVATION: Genome-wide association studies (GWAS) have identified genetic variants, usually single-nucleotide polymorphisms (SNPs), associated with human traits, including disease and disease risk. These variants (or causal variants in linkage disequilibrium with them) usually affect the regulation or function of a nearby gene. A GWAS locus can span many genes, however, and prioritizing which gene or genes in a locus are most likely to be causal remains a challenge. Better prioritization and prediction of causal genes could reveal disease mechanisms and suggest interventions. RESULTS: We describe a new Bayesian method, termed SigNet for significance networks, that combines information both within and across loci to identify the most likely causal gene at each locus. The SigNet method builds on existing methods that focus on individual loci with evidence from gene distance and expression quantitative trait loci (eQTL) by sharing information across loci using protein-protein and gene regulatory interaction network data. In an application to cardiac electrophysiology with 226 GWAS loci, only 46 (20%) have within-locus evidence from Mendelian genes, protein-coding changes, or colocalization with eQTL signals. At the remaining 180 loci lacking functional information, SigNet selects 56 genes other than the minimum distance gene, equal to 31% of the information-poor loci and 25% of the GWAS loci overall. Assessment by pathway enrichment demonstrates improved performance by SigNet. Review of individual loci shows literature evidence for genes selected by SigNet, including PMP22 as a novel causal gene candidate.
Affiliation(s)
- Zeinab Mousavi
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
- Marios Arvanitis
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- ThuyVy Duong
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Jennifer A. Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Alexis Battle
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, Maryland, United States of America
- Nona Sotoodehnia
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Ali Shojaie
  - Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Dan E. Arking
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
- Joel S. Bader
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
3
Choi J, Xu Z, Sun R. Variance-components tests for genetic association with multiple interval-censored outcomes. Stat Med 2024; 43:2560-2574. [PMID: 38636557 PMCID: PMC11116038 DOI: 10.1002/sim.10081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 02/18/2024] [Accepted: 04/02/2024] [Indexed: 04/20/2024]
Abstract
Massive genetic compendiums such as the UK Biobank have become an invaluable resource for identifying genetic variants that are associated with complex diseases. Due to the difficulties of massive data collection, a common practice of these compendiums is to collect interval-censored data. One challenge in analyzing such data is the lack of methodology available for genetic association studies with interval-censored data. Genetic effects are difficult to detect because of their rare and weak nature, and often the time-to-event outcomes are transformed to binary phenotypes for access to more powerful signal detection approaches. However, transforming the data to binary outcomes can result in loss of valuable information. To alleviate such challenges, this work develops methodology to associate genetic variant sets with multiple interval-censored outcomes. Testing sets of variants such as genes or pathways is a common approach in genetic association settings to lower the multiple testing burden, aggregate small effects, and improve interpretations of results. Instead of performing inference with only a single outcome, utilizing multiple outcomes can increase statistical power by aggregating information across multiple correlated phenotypes. Simulations show that the proposed strategy can offer significant power gains over a single outcome approach. We apply the proposed test to the investigation that motivated this study, a search for the genes that perturb risks of bone fractures and falls in the UK Biobank.
Affiliation(s)
- Jaihee Choi
  - Department of Statistics, Rice University, Texas, USA
- Zhichao Xu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Texas, USA
- Ryan Sun
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Texas, USA
4
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations. Hum Mol Genet 2024; 33:624-635. [PMID: 38129112 PMCID: PMC10954367 DOI: 10.1093/hmg/ddad205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 10/24/2023] [Accepted: 11/30/2023] [Indexed: 12/23/2023] Open
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), which improves gene expression prediction accuracy by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models in whole blood using SUMMIT-FA with the comprehensive functional database MACIE and eQTL summary-level data from the eQTLGen consortium. We apply these models to GWAS for 24 complex traits and show that SUMMIT-FA identifies significantly more gene-trait associations and improves predictive power for identifying "silver standard" genes compared to several benchmark methods. We further conduct a simulation study to demonstrate the effectiveness of SUMMIT-FA.
Affiliation(s)
- Hunter J Melton
  - Department of Statistics, Florida State University, 214 Rogers Building, 117 N. Woodward Avenue, Tallahassee, FL 32306, United States
- Zichen Zhang
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
- Chong Wu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, 7007 Bertner Avenue, Unit 1689, Houston, TX 77030, United States
5
Zhu H, Choi J, Kui N, Yang T, Wei P, Li D, Sun R. Identification of Pancreatic Cancer Germline Risk Variants With Effects That Are Modified by Smoking. JCO Precis Oncol 2024; 8:e2300355. [PMID: 38564682 PMCID: PMC11000774 DOI: 10.1200/po.23.00355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2023] [Revised: 12/08/2023] [Accepted: 02/08/2024] [Indexed: 04/04/2024] Open
Abstract
PURPOSE: Pancreatic cancer (PC) is a deadly disease most often diagnosed in late stages. Identification of high-risk subjects could both contribute to preventative measures and help diagnose the disease at earlier timepoints. However, known risk factors, assessed independently, are currently insufficient for accurately stratifying patients. We use large-scale data from the UK Biobank (UKB) to identify genetic variant-smoking interaction effects and show their importance in risk assessment. METHODS: We draw data from 15,086,830 genetic variants and 315,512 individuals in the UKB. There are 765 cases of PC. Crucially, robust resampling corrections are used to overcome well-known challenges in hypothesis testing for interactions. Replication analysis is conducted in two independent cohorts totaling 793 cases and 570 controls. Integration of functional annotation data and construction of polygenic risk scores (PRS) demonstrate the additional insight provided by interaction effects. RESULTS: We identify the genome-wide significant variant rs77196339 on chromosome 2 (per minor allele odds ratio in never-smokers, 2.31 [95% CI, 1.69 to 3.15]; per minor allele odds ratio in ever-smokers, 0.53 [95% CI, 0.30 to 0.91]; P = 3.54 × 10⁻⁸) as well as eight other loci with suggestive evidence of interaction effects (P < 5 × 10⁻⁶). The rs77196339 region association is validated (P < .05) in the replication sample. PRS incorporating interaction effects show improved discriminatory ability over PRS of main effects alone. CONCLUSION: This study of genome-wide germline variants identified smoking to modify the effect of rs77196339 on PC risk. Interactions between known risk factors can provide critical information for identifying high-risk subjects, given the relative inadequacy of models considering only main effects, as demonstrated in PRS. Further studies are necessary to advance toward comprehensive risk prediction approaches for PC.
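The two stratum-specific odds ratios quoted in this abstract can be related to a single multiplicative interaction term. The sketch below is only an illustration under a stated assumption: it supposes a standard logistic model with a genotype-by-smoking product term, which the abstract does not spell out, and the variable names are hypothetical. Only the two point estimates (2.31 and 0.53) come from the abstract.

```python
import math

# Per-minor-allele odds ratios for rs77196339, as quoted in the abstract.
or_never_smokers = 2.31
or_ever_smokers = 0.53

# Assumption (not stated in the abstract): a logistic model of the form
#   logit(p) = b0 + b_g*G + b_s*S + b_gs*G*S
# where G is the minor allele count and S is 1 for ever-smokers, 0 otherwise.
# Under that model exp(b_g) is the odds ratio in never-smokers and
# exp(b_g + b_gs) is the odds ratio in ever-smokers, so the multiplicative
# interaction odds ratio exp(b_gs) is simply the ratio of the two.
interaction_or = or_ever_smokers / or_never_smokers

beta_g = math.log(or_never_smokers)   # log odds ratio in never-smokers
beta_gs = math.log(interaction_or)    # log interaction odds ratio

# Sanity check: recombining the two coefficients recovers the ever-smoker OR.
recovered_or_ever = math.exp(beta_g + beta_gs)

print(f"interaction odds ratio: {interaction_or:.3f}")
print(f"recovered ever-smoker odds ratio: {recovered_or_ever:.2f}")
```

An interaction odds ratio well below 1 (here roughly 0.23) corresponds to the direction of the per-allele effect reversing between never-smokers and ever-smokers, which is the pattern the abstract reports.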
Affiliation(s)
- Huili Zhu
  - Section of Hematology and Oncology, Department of Medicine, Baylor College of Medicine, Houston, Texas
- Jaihee Choi
  - Department of Statistics, Rice University, Houston, Texas
- Naishu Kui
  - Department of Biostatistics, University of Texas School of Public Health, Houston, Texas
- Tianzhong Yang
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota
- Peng Wei
  - Department of Biostatistics, Division of Basic Science, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Donghui Li
  - Department of Gastrointestinal Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Ryan Sun
  - Department of Biostatistics, Division of Basic Science, The University of Texas MD Anderson Cancer Center, Houston, Texas
6
Li H, Yu Z, Du F, Song L, Gao Y, Shi F. sscNOVA: a semi-supervised convolutional neural network for predicting functional regulatory variants in autoimmune diseases. Front Immunol 2024; 15:1323072. [PMID: 38380333 PMCID: PMC10876991 DOI: 10.3389/fimmu.2024.1323072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 01/15/2024] [Indexed: 02/22/2024] Open
Abstract
Genome-wide association studies (GWAS) have identified thousands of variants in the human genome associated with autoimmune diseases. However, identifying functional regulatory variants associated with autoimmune diseases remains challenging, largely because of insufficient experimental validation data. We adopt the concept of semi-supervised learning by combining labeled and unlabeled data to develop a deep learning-based algorithm framework, sscNOVA, to predict functional regulatory variants in autoimmune diseases and analyze the functional characteristics of these regulatory variants. Compared to traditional supervised learning methods, our approach leverages more variant data to explore the relationship between functional regulatory variants and autoimmune diseases. Based on the experimentally curated testing dataset and evaluation metrics, we find that sscNOVA outperforms other state-of-the-art methods. Furthermore, we illustrate that sscNOVA can help to improve the prioritization of functional regulatory variants from lead single-nucleotide polymorphisms and the proxy variants in autoimmune GWAS data.
Affiliation(s)
- Haibo Li
  - School of Information Engineering, Ningxia University, Yinchuan, China
- Zhenhua Yu
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Fang Du
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Lijuan Song
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Yang Gao
  - School of Medical Technology, North Minzu University, Yinchuan, China
- Fangyuan Shi
  - School of Information Engineering, Ningxia University, Yinchuan, China
  - Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
7
Feng X, Liu S, Li K, Bu F, Yuan H. NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024; 51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Affiliation(s)
- Xiaoshu Feng
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Sihan Liu
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Ke Li
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Fengxiao Bu
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Huijun Yuan
  - Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
8
Miltenberger-Miltenyi G, Ortega RA, Domingo A, Yadav R, Nishiyama A, Raymond D, Katsnelson V, Urval N, Swan M, Shanker V, Miravite J, Walker RH, Bressman SB, Ozelius LJ, Cabassa JC, Saunders-Pullman R. Genetic risk variants in New Yorkers of Puerto Rican and Dominican Republic heritage with Parkinson's disease. NPJ Parkinsons Dis 2023; 9:160. [PMID: 38062033 PMCID: PMC10703927 DOI: 10.1038/s41531-023-00599-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 11/02/2023] [Indexed: 01/31/2024] Open
Abstract
There is a paucity of genetic characterization in people with Parkinson's disease (PD) of Latino and Afro-Caribbean descent. Screening LRRK2 and GBA variants in 32 New Yorkers of Puerto Rican ethnicity with PD and in 119 non-Hispanic-non-Jewish European PD cases revealed that Puerto Rican participants were more likely to harbor the LRRK2-p.G2019S variant (15.6% vs. 4.2%, respectively). Additionally, whole exome sequencing of twelve Puerto Rican and Dominican PD participants was performed as an exploratory study.
Affiliation(s)
- Gabriel Miltenberger-Miltenyi
  - Laboratório de Genética, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
  - Department of Neurology, Ludwig-Maximilians-Universität München, Munich, Germany
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
- Roberto A Ortega
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Aloysius Domingo
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genomics, Broad Institute, Cambridge, MA, USA
- Rachita Yadav
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genomics, Broad Institute, Cambridge, MA, USA
- Ayumi Nishiyama
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Deborah Raymond
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Viktoriya Katsnelson
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Nikita Urval
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Matthew Swan
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Vicki Shanker
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Joan Miravite
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Ruth H Walker
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, James J. Peters Veterans Affairs Medical Center, Bronx, NY, USA
- Susan B Bressman
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
- Laurie J Ozelius
  - Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- José C Cabassa
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
- Rachel Saunders-Pullman
  - Department of Neurology, Icahn School of Medicine, Mount Sinai, New York, NY, USA
  - Department of Neurology, Mount Sinai Beth Israel, New York, NY, USA
9
Mo C, Ye Z, Pan Y, Zhang Y, Wu Q, Bi C, Liu S, Mitchell B, Kochunov P, Hong LE, Ma T, Chen S. An in-depth association analysis of genetic variants within nicotine-related loci: Meeting in middle of GWAS and genetic fine-mapping. Mol Cell Neurosci 2023; 127:103895. [PMID: 37634742 PMCID: PMC11128188 DOI: 10.1016/j.mcn.2023.103895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 08/29/2023] Open
Abstract
In the last two decades of genome-wide association studies (GWAS), nicotine-dependence-related genetic loci (e.g., nicotinic acetylcholine receptor, nAChR, subunit genes) are among the most replicable genetic findings. Although GWAS results have reported tens of thousands of SNPs within these loci, further analysis (e.g., fine-mapping) is required to identify the causal variants. However, it is computationally challenging for existing fine-mapping methods to reliably identify causal variants from thousands of candidate SNPs based on the posterior inclusion probability. To address this challenge, we propose a new method to select SNPs by jointly modeling the SNP-wise inference results and the underlying structured network patterns of the linkage disequilibrium (LD) matrix. We use an adaptive dense subgraph extraction method to recognize the latent network patterns of the LD matrix and then apply group LASSO to select causal variant candidates. We applied this new method to the UK Biobank data to identify the causal variant candidates for nicotine addiction. Eighty-one nicotine addiction-related SNPs (i.e., -log(p) > 50) of nAChR were selected, which are highly correlated (average r² > 0.8) although they are physically distant (e.g., >200 kilobases away) and from various genes. These findings revealed that distant SNPs from different genes can show higher LD r² than their neighboring SNPs, and jointly contribute to a complex trait like nicotine addiction.
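The LD r² statistic this abstract relies on can be illustrated with a small calculation. The sketch below is not the authors' pipeline: it assumes unphased genotypes coded as minor allele dosages (0, 1, 2) and uses the squared Pearson correlation between two dosage vectors, a common approximation to haplotype-based r². The function name `ld_r2` and the toy genotype vectors are hypothetical.

```python
def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two genotype dosage vectors.

    Assumption: each vector holds minor allele counts (0, 1 or 2) for the
    same individuals in the same order. This approximates LD r^2 from
    unphased data; it is not a haplotype-based estimate.
    """
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)


# Toy data for six individuals (made up for illustration only).
snp_1 = [0, 1, 2, 0, 1, 2]
snp_2 = [0, 1, 2, 0, 1, 2]   # carried together with snp_1 in every individual
snp_3 = [0, 0, 0, 1, 1, 1]   # uncorrelated with snp_1 in this toy sample

r2_linked = ld_r2(snp_1, snp_2)
r2_unlinked = ld_r2(snp_1, snp_3)
print(r2_linked, r2_unlinked)
```

Because the calculation uses only the genotype vectors, nothing in it depends on physical distance, which is why two SNPs more than 200 kilobases apart can still show a higher r² than immediate neighbors.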
Affiliation(s)
- Chen Mo
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Zhenyao Ye
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Yezhi Pan
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Yuan Zhang
  - Department of Statistics, College of Arts and Sciences, Ohio State University, Columbus, Ohio, United States
- Qiong Wu
  - Department of Mathematics, University of Maryland, College Park, Maryland, United States
- Chuan Bi
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Song Liu
  - School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong, China
- Braxton Mitchell
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Peter Kochunov
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- L. Elliot Hong
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
- Tianzhou Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, Maryland, United States
- Shuo Chen
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States
  - Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, Maryland, United States
10
Jiang Z, Zhang H, Ahearn TU, Garcia-Closas M, Chatterjee N, Zhu H, Zhan X, Zhao N. The sequence kernel association test for multicategorical outcomes. Genet Epidemiol 2023; 47:432-449. [PMID: 37078108 DOI: 10.1002/gepi.22527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 03/29/2023] [Accepted: 03/30/2023] [Indexed: 04/21/2023]
Abstract
Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient to handle such multicategorical outcomes. In this paper, we proposed a novel set-based association analysis method, sequence kernel association test (SKAT)-MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we showed that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS), and identified that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER- breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127,127) with SKAT-MC, and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
Affiliation(s)
- Zhiwen Jiang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Xiang Zhan
- Department of Biostatistics, Peking University, Beijing, China
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
|
11
|
McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. An allelic-series rare-variant association test for candidate-gene discovery. Am J Hum Genet 2023; 110:1330-1342. [PMID: 37494930 PMCID: PMC10432147 DOI: 10.1016/j.ajhg.2023.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Revised: 06/30/2023] [Accepted: 07/01/2023] [Indexed: 07/28/2023] Open
Abstract
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
Affiliation(s)
- Francesco Paolo Casale
- Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany; Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
|
12
|
Han Y, Byun J, Zhu C, Sun R, Roh JY, Cordell HJ, Lee HS, Shaw VR, Kang SW, Razjouyan J, Cooley MA, Hassan MM, Siminovitch KA, Folseraas T, Ellinghaus D, Bergquist A, Rushbrook SM, Franke A, Karlsen TH, Lazaridis KN, McGlynn KA, Roberts LR, Amos CI. Multitrait genome-wide analyses identify new susceptibility loci and candidate drugs to primary sclerosing cholangitis. Nat Commun 2023; 14:1069. [PMID: 36828809 PMCID: PMC9958016 DOI: 10.1038/s41467-023-36678-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/25/2022] [Accepted: 02/10/2023] [Indexed: 02/26/2023] Open
Abstract
Primary sclerosing cholangitis (PSC) is a rare autoimmune bile duct disease that is strongly associated with immune-mediated disorders. In this study, we implemented multitrait joint analyses to genome-wide association summary statistics of PSC and numerous clinical and epidemiological traits to estimate the genetic contribution of each trait and genetic correlations between traits and to identify new lead PSC risk-associated loci. We identified seven new loci that have not been previously reported and one new independent lead variant in the previously reported locus. Functional annotation and fine-mapping nominated several potential susceptibility genes such as MANBA and IRF5. Network-based in silico drug efficacy screening provided candidate agents for further study of pharmacological effect in PSC.
Affiliation(s)
- Younghun Han
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Jinyoung Byun
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Catherine Zhu
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
| | - Ryan Sun
- Department of Biostatistics, University of Texas, M.D. Anderson Cancer Center, Houston, TX, USA
| | - Julia Y Roh
- Department of Pharmacy, Ochsner Health, New Orleans, LA, USA
| | - Heather J Cordell
- Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Hyun-Sung Lee
- David J. Sugarbaker Division of Thoracic Surgery, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
| | - Vikram R Shaw
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
| | - Sung Wook Kang
- David J. Sugarbaker Division of Thoracic Surgery, Michael E. DeBakey Department of Surgery, Baylor College of Medicine, Houston, TX, USA
| | - Javad Razjouyan
- VA HSR&D, Center for Innovations in Quality, Effectiveness and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA
- Big Data Scientist Training Enhancement Program (BD-STEP), VA Office of Research and Development, Washington, DC, USA
- Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- VA Quality Scholars Coordinating Center, IQuESt, Michael E. DeBakey VA Medical Center, Houston, TX, USA
| | - Matthew A Cooley
- Mayo Clinic Graduate School of Biomedical Sciences, Mayo Clinic, Rochester, MN, USA
| | - Manal M Hassan
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Katherine A Siminovitch
- Departments of Medicine, Immunology and Medical Sciences, University of Toronto, Toronto, Ontario, Canada
- Mount Sinai Hospital, Lunenfeld-Tanenbaum Research Institute and Toronto General Research Institute, Toronto, Ontario, Canada
| | - Trine Folseraas
- Norwegian PSC Research Center, Oslo University Hospital Rikshospitalet, Oslo, Norway
| | - David Ellinghaus
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Annika Bergquist
- Department of Medicine Huddinge, Unit of Gastroenterology and Rheumatology, Karolinska Institutet, Karolinska University Hospital, Stockholm, Sweden
| | - Simon M Rushbrook
- Department of Gastroenterology, Norfolk and Norwich University Hospital, Norwich, United Kingdom
- Norwich Medical School, University of East Anglia, Norfolk, United Kingdom
| | - Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Tom H Karlsen
- Oslo University Hospital Rikshospitalet and University of Oslo, Oslo, Norway
| | - Konstantinos N Lazaridis
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Mayo Clinic, Rochester, MN, USA
| | - Katherine A McGlynn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Lewis R Roberts
- Oslo University Hospital Rikshospitalet and University of Oslo, Oslo, Norway
| | - Christopher I Amos
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
|
13
|
Melton HJ, Zhang Z, Wu C. SUMMIT-FA: A new resource for improved transcriptome imputation using functional annotations. medRxiv 2023:2023.02.02.23285208. [PMID: 36798253 PMCID: PMC9934719 DOI: 10.1101/2023.02.02.23285208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
Transcriptome-wide association studies (TWAS) integrate gene expression prediction models and genome-wide association studies (GWAS) to identify gene-trait associations. The power of TWAS is determined by the sample size of GWAS and the accuracy of the expression prediction model. Here, we present a new method, the Summary-level Unified Method for Modeling Integrated Transcriptome using Functional Annotations (SUMMIT-FA), that improves the accuracy of gene expression prediction by leveraging functional annotation resources and a large expression quantitative trait loci (eQTL) summary-level dataset. We build gene expression prediction models using SUMMIT-FA with a comprehensive functional database MACIE and the eQTL summary-level data from the eQTLGen consortium. By applying the resulting models to GWASs for 24 complex traits and exploring it through a simulation study, we show that SUMMIT-FA improves the accuracy of gene expression prediction models in whole blood, identifies significantly more gene-trait associations, and improves predictive power for identifying "silver standard" genes compared to several benchmark methods.
Affiliation(s)
- Hunter J. Melton
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Zichen Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
|
14
|
Agarwal A, Zhao F, Jiang Y, Chen L. TIVAN-indel: a computational framework for annotating and predicting non-coding regulatory small insertions and deletions. Bioinformatics 2023; 39:btad060. [PMID: 36707993 PMCID: PMC9900211 DOI: 10.1093/bioinformatics/btad060] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/17/2022] [Revised: 01/20/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION: Small insertions and deletions (sindels) in the human genome have important implications for human disease. One important mechanism by which non-coding sindels (nc-sindels) affect human diseases and phenotypes is through the regulation of gene expression. Nevertheless, current sequencing experiments may lack the statistical power and resolution to pinpoint functional sindels because of low minor allele frequency or small effect size. As an alternative strategy, a supervised machine learning method can identify otherwise masked functional sindels by predicting their regulatory potential directly. However, computational methods for annotating and predicting regulatory sindels, especially in non-coding regions, are underdeveloped. RESULTS: By leveraging labeled nc-sindels identified by cis-expression quantitative trait loci analyses across 44 tissues in Genotype-Tissue Expression (GTEx), and a compilation of both generic functional annotations and large-scale epigenomic profiles, we develop TIssue-specific Variant Annotation for Non-coding indel (TIVAN-indel), which is a supervised computational framework for predicting non-coding regulatory sindels. We demonstrate that TIVAN-indel achieves the best prediction performance in both within-tissue prediction and cross-tissue prediction. As an independent evaluation, we train TIVAN-indel from the 'Whole Blood' tissue in GTEx and test the model using 15 immune cell types from an independent study named Database of Immune Cell Expression. Lastly, we perform an enrichment analysis for both true and predicted sindels in key regulatory regions such as chromatin interactions, open chromatin regions and histone modification sites, and find biologically meaningful enrichment patterns. AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/TIVAN-indel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aman Agarwal
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
| | - Fengdi Zhao
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
| | - Yuchao Jiang
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27516, USA
| | - Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
|
15
|
Zhou H, Arapoglou T, Li X, Li Z, Zheng X, Moore J, Asok A, Kumar S, Blue E, Buyske S, Cox N, Felsenfeld A, Gerstein M, Kenny E, Li B, Matise T, Philippakis A, Rehm HL, Sofia HJ, Snyder G, NHGRI Genome Sequencing Program Variant Functional Annotation Working Group, Weng Z, Neale B, Sunyaev S, Lin X. FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Res 2023; 51:D1300-D1311. [PMID: 36350676 PMCID: PMC9825437 DOI: 10.1093/nar/gkac966] [Citation(s) in RCA: 74] [Impact Index Per Article: 37.0] [Received: 08/15/2022] [Revised: 09/25/2022] [Accepted: 10/14/2022] [Indexed: 11/11/2022] Open
Abstract
Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.
Affiliation(s)
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Theodore Arapoglou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Xiuwen Zheng
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Jill Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | | | - Sushant Kumar
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Elizabeth E Blue
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Steven Buyske
- Department of Statistics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Nancy Cox
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Eimear Kenny
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Tara Matise
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Anthony Philippakis
- Data Science Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi J Sofia
- National Human Genome Research Institute, Bethesda, MD, USA
| | - Grace Snyder
- National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Benjamin Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Shamil R Sunyaev
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
|
16
|
Kuksa PP, Greenfest-Allen E, Cifello J, Ionita M, Wang H, Nicaretta H, Cheng PL, Lee WP, Wang LS, Leung YY. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Hum Mol Genet 2022; 31:R62-R72. [PMID: 35943817 PMCID: PMC9585666 DOI: 10.1093/hmg/ddac191] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/01/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022] Open
Abstract
Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the library of and offers unique opportunities for investigating both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional challenges to predicting functional impacts in terms of data analysis and interpretation. This review focuses on the recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions which will enhance the data and tools necessary for the effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.
Affiliation(s)
- Pavel P Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Emily Greenfest-Allen
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jeffrey Cifello
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Matei Ionita
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Hui Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Heather Nicaretta
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Po-Liang Cheng
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
17
|
Byun J, Han Y, Li Y, Xia J, Long E, Choi J, Xiao X, Zhu M, Zhou W, Sun R, Bossé Y, Song Z, Schwartz A, Lusk C, Rafnar T, Stefansson K, Zhang T, Zhao W, Pettit RW, Liu Y, Li X, Zhou H, Walsh KM, Gorlov I, Gorlova O, Zhu D, Rosenberg SM, Pinney S, Bailey-Wilson JE, Mandal D, de Andrade M, Gaba C, Willey JC, You M, Anderson M, Wiencke JK, Albanes D, Lam S, Tardon A, Chen C, Goodman G, Bojeson S, Brenner H, Landi MT, Chanock SJ, Johansson M, Muley T, Risch A, Wichmann HE, Bickeböller H, Christiani DC, Rennert G, Arnold S, Field JK, Shete S, Le Marchand L, Melander O, Brunnstrom H, Liu G, Andrew AS, Kiemeney LA, Shen H, Zienolddiny S, Grankvist K, Johansson M, Caporaso N, Cox A, Hong YC, Yuan JM, Lazarus P, Schabath MB, Aldrich MC, Patel A, Lan Q, Rothman N, Taylor F, Kachuri L, Witte JS, Sakoda LC, Spitz M, Brennan P, Lin X, McKay J, Hung RJ, Amos CI. Cross-ancestry genome-wide meta-analysis of 61,047 cases and 947,237 controls identifies new susceptibility loci contributing to lung cancer. Nat Genet 2022; 54:1167-1177. [PMID: 35915169 PMCID: PMC9373844 DOI: 10.1038/s41588-022-01115-x] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 10/08/2020] [Accepted: 05/27/2022] [Indexed: 02/03/2023]
Abstract
To identify new susceptibility loci to lung cancer among diverse populations, we performed cross-ancestry genome-wide association studies in European, East Asian and African populations and discovered five loci that have not been previously reported. We replicated 26 signals and identified 10 new lead associations from previously reported loci. Rare-variant associations tended to be specific to populations, but even common-variant associations influencing smoking behavior, such as those with CHRNA5 and CYP2A6, showed population specificity. Fine-mapping and expression quantitative trait locus colocalization nominated several candidate variants and susceptibility genes such as IRF4 and FUBP1. DNA damage assays of prioritized genes in lung fibroblasts indicated that a subset of these genes, including the pleiotropic gene IRF4, potentially exert effects by promoting endogenous DNA damage.
Affiliation(s)
- Jinyoung Byun
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Younghun Han
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Yafang Li
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Jun Xia
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Erping Long
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jiyeon Choi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Xiangjun Xiao
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
| | - Meng Zhu
- Department of Epidemiology, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing, P. R. China
| | - Wen Zhou
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
| | - Ryan Sun
- Department of Biostatistics, University of Texas, M.D. Anderson Cancer Center, Houston, TX, USA
| | - Yohan Bossé
- Institut universitaire de cardiologie et de pneumologie de Québec - Université Laval, Department of Molecular Medicine, Laval University, Quebec City, Quebec, Canada
| | - Zhuoyi Song
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Ann Schwartz
- Department of Oncology, Wayne State University School of Medicine, Detroit, MI, USA
- Karmanos Cancer Institute, Detroit, MI, USA
| | - Christine Lusk
- Department of Oncology, Wayne State University School of Medicine, Detroit, MI, USA
- Karmanos Cancer Institute, Detroit, MI, USA
| | | | | | - Tongwu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Wei Zhao
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rowland W Pettit
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
| | - Yanhong Liu
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Xihao Li
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Hufeng Zhou
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - Kyle M Walsh
- Duke Cancer Institute, Duke University Medical Center, Durham, NC, USA
| | - Ivan Gorlov
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Olga Gorlova
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
| | - Dakai Zhu
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Susan M Rosenberg
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Susan Pinney
- University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Diptasri Mandal
- Louisiana State University Health Sciences Center, New Orleans, LA, USA
| | - Colette Gaba
- The University of Toledo College of Medicine and Life Sciences, University of Toledo, Toledo, OH, USA
| | - James C Willey
- The University of Toledo College of Medicine and Life Sciences, University of Toledo, Toledo, OH, USA
| | - Ming You
- Center for Cancer Prevention, Houston Methodist Research Institute, Houston, TX, USA
| | - John K Wiencke
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
| | - Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephan Lam
- Department of Integrative Oncology, BC Cancer, Vancouver, British Columbia, Canada
| | - Adonina Tardon
- Public Health Department, University of Oviedo, ISPA and CIBERESP, Asturias, Spain
| | - Chu Chen
- Program in Epidemiology, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Stig Bojesen
- Department of Clinical Biochemistry, Herlev Gentofte Hospital, Copenhagen University Hospital, Copenhagen, Denmark
- Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Maria Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Stephen J Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mattias Johansson
- Section of Genetics, International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Thomas Muley
- Division of Cancer Epigenomics, DKFZ - German Cancer Research Center, Heidelberg, Germany
- Translational Lung Research Center Heidelberg (TLRC-H), German Center for Lung Research (DZL), Heidelberg, Germany
| | - Angela Risch
- Division of Cancer Epigenomics, DKFZ - German Cancer Research Center, Heidelberg, Germany
- Translational Lung Research Center Heidelberg (TLRC-H), German Center for Lung Research (DZL), Heidelberg, Germany
- Department of Biosciences and Medical Biology, Allergy-Cancer-BioNano Research Centre, University of Salzburg, Salzburg, Austria
- Cancer Cluster Salzburg, Salzburg, Austria
| | - Heike Bickeböller
- Department of Genetic Epidemiology, University Medical Center, Georg-August-University Göttingen, Göttingen, Germany
| | - David C Christiani
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Gad Rennert
- Clalit National Cancer Control Center at Carmel Medical Center and Technion Faculty of Medicine, Haifa, Israel
| | - Susanne Arnold
- University of Kentucky, Markey Cancer Center, Lexington, KY, USA
| | - John K Field
- Roy Castle Lung Cancer Research Programme, Department of Molecular and Clinical Cancer Medicine, University of Liverpool, Liverpool, UK
| | - Sanjay Shete
- Department of Biostatistics, University of Texas, M.D. Anderson Cancer Center, Houston, TX, USA
- Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
| | - Geoffrey Liu
- University Health Network - The Princess Margaret Cancer Centre, Toronto, Ontario, Canada
| | - Angeline S Andrew
- Departments of Epidemiology and Community and Family Medicine, Dartmouth College, Hanover, NH, USA
| | - Hongbing Shen
- Department of Epidemiology and Biostatistics, Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, School of Public Health, Nanjing Medical University, Nanjing, P. R. China
| | - Kjell Grankvist
- Department of Medical Biosciences, Umeå University, Umeå, Sweden
| | - Mikael Johansson
- Department of Radiation Sciences, Oncology, Umeå University, Umeå, Sweden
| | - Neil Caporaso
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Angela Cox
- Department of Oncology and Metabolism, University of Sheffield, Sheffield, UK
| | - Yun-Chul Hong
- Department of Preventive Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Jian-Min Yuan
- UPMC Hillman Cancer Center and Department of Epidemiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Philip Lazarus
- Department of Pharmaceutical Sciences, College of Pharmacy, Washington State University, Spokane, WA, USA
| | - Matthew B Schabath
- Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
| | - Melinda C Aldrich
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Alpa Patel
- American Cancer Society, Atlanta, GA, USA
| | - Qing Lan
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Fiona Taylor
- Department of Oncology and Metabolism, University of Sheffield, Sheffield, UK
| | - Linda Kachuri
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
| | - Lori C Sakoda
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Margaret Spitz
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
| | - Paul Brennan
- Section of Genetics, International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Xihong Lin
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - James McKay
- Section of Genetics, International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Rayjean J Hung
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Christopher I Amos
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, USA
- Section of Epidemiology and Population Sciences, Department of Medicine, Baylor College of Medicine, Houston, TX, USA
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA