1
Xu X, Wang S, Zhou H, Tan Q, Lang Z, Zhu Y, Yuan H, Wu Z, Zhu L, Hu K, Li W, Zhou D, Wu M, Wu X. Transcriptome-wide association study of alternative polyadenylation identifies susceptibility genes in non-small cell lung cancer. Oncogene 2025:10.1038/s41388-025-03338-8. [PMID: 40205015 DOI: 10.1038/s41388-025-03338-8] [Received: 08/28/2024] [Revised: 02/09/2025] [Accepted: 02/28/2025] [Indexed: 04/11/2025]
Abstract
Alternative polyadenylation (APA) plays a crucial role in cancer development and prognosis. However, the molecular characteristics of APA related to non-small cell lung cancer (NSCLC) susceptibility remain understudied, especially in East Asian populations. In this study, we constructed an atlas of APA-regulated 3' untranslated regions (3'UTRs) and profiled their genetic regulation in 747 lung tissue samples (including tumors and paired normal tissues) from 417 Chinese NSCLC patients. We verified a significant global shortening of 3'UTRs in tumor samples compared to normal samples and underscored the value of APA regulation as a prognostic marker. 3'UTR APA quantitative trait loci (3'aQTLs) were identified by regressing the percentage of distal poly(A) site usage index (PDUI) value on genetic variants. We found that a significant proportion of 3'aQTLs are independent of the genetic regulation of expression and are specific to the Chinese population. We also conducted a 3'UTR APA transcriptome-wide association study (3'aTWAS) by integrating the APA regulation atlas with a genome-wide association study (GWAS) of NSCLC involving 7,035 cases and 185,413 cancer-free controls. We identified NSCLC-associated genes, highlighting TUBB, TEAD3, and PPP1R10. Combining consistent results from colocalization, differential APA, and survival analyses, we provide novel evidence for the role of TUBB APA regulation in NSCLC and identify potential upstream regulators. Overall, our study profiled APA regulation and highlighted the substantial role of APA in NSCLC carcinogenesis and prognosis in East Asian populations.
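The 3'aQTL step described above, regressing PDUI on genotype, can be illustrated with a minimal sketch. It is not the authors' pipeline and rests on stated assumptions: PDUI is approximated as distal-site read coverage divided by distal plus proximal coverage (tools such as DaPars estimate it from a coverage model instead), genotypes are additive dosages coded 0/1/2, covariates are omitted, and the function names and coverage values are hypothetical.

```python
import math
from dataclasses import dataclass


def pdui(distal_coverage: float, proximal_coverage: float) -> float:
    """Simplified PDUI (assumption): share of 3'UTR coverage assigned to the distal poly(A) site."""
    total = distal_coverage + proximal_coverage
    if total <= 0:
        raise ValueError("total coverage must be positive")
    return distal_coverage / total


@dataclass
class AqtlResult:
    beta: float  # change in PDUI per additional copy of the alternative allele
    se: float    # standard error of beta
    t: float     # Wald t-statistic, beta / se (n - 2 degrees of freedom)


def aqtl_regression(dosages: list, pduis: list) -> AqtlResult:
    """Ordinary least squares of PDUI on additive genotype dosage, no covariates."""
    n = len(dosages)
    if n != len(pduis) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_p = sum(pduis) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, pduis))
    beta = sxy / sxx
    alpha = mean_p - beta * mean_g
    rss = sum((p - alpha - beta * g) ** 2 for g, p in zip(dosages, pduis))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return AqtlResult(beta=beta, se=se, t=t)


# Hypothetical (distal, proximal) coverages for six samples and their dosages.
coverage = [(80, 20), (78, 22), (60, 40), (62, 38), (40, 60), (42, 58)]
dosages = [0, 0, 1, 1, 2, 2]
result = aqtl_regression(dosages, [pdui(d, p) for d, p in coverage])
print(round(result.beta, 3))  # prints -0.19
```

A negative beta means each additional alternative allele lowers distal-site usage, that is, it is associated with a shorter 3'UTR. A genome-wide scan would repeat this fit for every variant-gene pair and then correct for multiple testing.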
Affiliation(s)
- Xiaohang Xu
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Zhejiang Key Laboratory of Intelligent Preventive Medicine, Hangzhou, China
- Sicong Wang
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Zhejiang Key Laboratory of Intelligent Preventive Medicine, Hangzhou, China
- Hanyi Zhou
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Qilong Tan
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Zhejiang Key Laboratory of Intelligent Preventive Medicine, Hangzhou, China
- Zeyong Lang
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Yun Zhu
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Huadi Yuan
- Department of Thoracic Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zixiang Wu
- Department of Thoracic Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Ling Zhu
- Department of Thoracic Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Kejia Hu
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- National Institute for Data Science in Health and Medicine, Zhejiang University, Hangzhou, Zhejiang, China
- Wenyuan Li
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Zhejiang Key Laboratory of Intelligent Preventive Medicine, Hangzhou, China
- Dan Zhou
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China
- Zhejiang Key Laboratory of Intelligent Preventive Medicine, Hangzhou, China
- Ming Wu
- Department of Thoracic Surgery, Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xifeng Wu
- Center of Clinical Big Data and Analytics of the Second Affiliated Hospital and School of Public Health, Zhejiang University School of Medicine, Hangzhou, China.
- National Institute for Data Science in Health and Medicine, Zhejiang University, Hangzhou, Zhejiang, China.
- School of Medicine and Health Science, George Washington University, Washington, DC, USA.
- Zhejiang Cancer Hospital, Hangzhou, China.

2
He R, Ren J, Malakhov MM, Pan W. Enhancing nonlinear transcriptome- and proteome-wide association studies via trait imputation with applications to Alzheimer's disease. PLoS Genet 2025; 21:e1011659. [PMID: 40209152 PMCID: PMC12040266 DOI: 10.1371/journal.pgen.1011659] [Received: 08/21/2024] [Revised: 04/29/2025] [Accepted: 03/18/2025] [Indexed: 04/12/2025]
Abstract
Genome-wide association studies (GWAS) performed on large cohort and biobank datasets have identified many genetic loci associated with Alzheimer's disease (AD). However, the younger demographic of biobank participants relative to the typical age of late-onset AD has resulted in an insufficient number of AD cases, limiting the statistical power of GWAS and any downstream analyses. To mitigate this limitation, several trait imputation methods have been proposed to impute the expected future AD status of individuals who may not have yet developed the disease. This paper explores the use of imputed AD status in nonlinear transcriptome/proteome-wide association studies (TWAS/PWAS) to identify genes and proteins whose genetically regulated expression is associated with AD risk. In particular, we considered the TWAS/PWAS method DeLIVR, which utilizes deep learning to model the nonlinear effects of expression on disease. We trained transcriptome and proteome imputation models for DeLIVR on data from the Genotype-Tissue Expression (GTEx) Project and the UK Biobank (UKB), respectively, with imputed AD status in UKB participants as the outcome. Next, we performed hypothesis testing for the DeLIVR models using clinically diagnosed AD cases from the Alzheimer's Disease Sequencing Project (ADSP). Our results demonstrate that nonlinear TWAS/PWAS trained with imputed AD outcomes successfully identifies known and putative AD risk genes and proteins. Notably, we found that training with imputed outcomes can increase statistical power without inflating false positives, enabling the discovery of molecular exposures with potentially nonlinear effects on neurodegeneration.
Affiliation(s)
- Ruoyu He
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Jingchen Ren
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Mykhaylo M. Malakhov
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America

3
Zhang D, Gao B, Feng Q, Manichaikul A, Peloso GM, Tracy RP, Durda P, Taylor KD, Liu Y, Johnson WC, Gabriel S, Gupta N, Smith JD, Aguet F, Ardlie KG, Blackwell TW, Gerszten RE, Rich SS, Rotter JI, Scott LJ, Zhou X, Lee S. Proteome-wide association studies for blood lipids and comparison with transcriptome-wide association studies. HGG ADVANCES 2025; 6:100383. [PMID: 39543875 PMCID: PMC11650301 DOI: 10.1016/j.xhgg.2024.100383] [Received: 08/21/2023] [Revised: 11/08/2024] [Accepted: 11/08/2024] [Indexed: 11/17/2024]
Abstract
Blood lipid traits are treatable and heritable risk factors for heart disease, a leading cause of mortality worldwide. Although genome-wide association studies (GWASs) have discovered hundreds of variants associated with lipids in humans, most of the causal mechanisms of lipids remain unknown. To better understand the biological processes underlying lipid metabolism, we investigated the associations of plasma protein levels with total cholesterol (TC), triglycerides (TG), high-density lipoprotein (HDL) cholesterol, and low-density lipoprotein (LDL) cholesterol in blood. We trained protein prediction models based on samples in the Multi-Ethnic Study of Atherosclerosis (MESA) and applied them to conduct proteome-wide association studies (PWASs) for lipids using the Global Lipids Genetics Consortium (GLGC) data. Of the 749 proteins tested, 42 were significantly associated with at least one lipid trait. Furthermore, we performed transcriptome-wide association studies (TWASs) for lipids using 9,714 gene expression prediction models trained on samples from peripheral blood mononuclear cells (PBMCs) in MESA and 49 tissues in the Genotype-Tissue Expression (GTEx) project. We found that although PWASs and TWASs can show different directions of associations in an individual gene, 40 out of 49 tissues showed a positive correlation between PWAS and TWAS signed p values across all the genes, which suggests high-level consistency between proteome-lipid associations and transcriptome-lipid associations.
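The abstract does not define "signed p value". A minimal sketch of the per-tissue concordance check, assuming it means the sign of each gene's association z-score multiplied by -log10(p), and that the comparison is a Pearson correlation over genes tested in both PWAS and TWAS; the gene names and statistics below are hypothetical.

```python
import math


def signed_log10_p(z: float, p: float) -> float:
    """Assumed definition: direction of the association times -log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    if z == 0:
        return 0.0
    return math.copysign(-math.log10(p), z)


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pwas_twas_concordance(pwas: dict, twas: dict) -> float:
    """Correlate signed p values over genes present in both analyses.

    pwas, twas: {gene: (z_score, p_value)} for one tissue.
    """
    shared = sorted(set(pwas) & set(twas))
    if len(shared) < 3:
        raise ValueError("need at least 3 shared genes")
    xs = [signed_log10_p(*pwas[g]) for g in shared]
    ys = [signed_log10_p(*twas[g]) for g in shared]
    return pearson(xs, ys)


pwas = {"GENE_A": (3.0, 1e-3), "GENE_B": (-2.0, 1e-2), "GENE_C": (1.0, 0.3), "GENE_D": (-4.0, 1e-5)}
twas = {"GENE_A": (2.5, 1e-2), "GENE_B": (-1.5, 0.1), "GENE_C": (0.5, 0.6),
        "GENE_D": (-3.0, 1e-3), "GENE_E": (1.0, 0.5)}
print(pwas_twas_concordance(pwas, twas) > 0)  # True
```

A positive correlation indicates that, across genes, protein-lipid and expression-lipid associations tend to agree in direction and strength even if individual genes disagree.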
Affiliation(s)
- Daiwei Zhang
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; Departments of Biostatistics and Genetics, University of North Carolina, Chapel Hill, NC, USA
- Boran Gao
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Qidi Feng
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Russell P Tracy
- Departments of Pathology and Laboratory Medicine, and Biochemistry, Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Peter Durda
- Department of Pathology and Laboratory Medicine, Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Yongmei Liu
- Department of Medicine, Divisions of Cardiology and Neurology, Duke University Medical Center, Durham, NC, USA
- W Craig Johnson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Stacey Gabriel
- Genomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Namrata Gupta
- Genomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Joshua D Smith
- Department of Genome Sciences, Human Genetics, and Translational Genomics, University of Washington, Seattle, WA, USA
- Francois Aguet
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Kristin G Ardlie
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
- Thomas W Blackwell
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO, USA
- Robert E Gerszten
- Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Laura J Scott
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
- Xiang Zhou
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Seunggeun Lee
- Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea; Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.

4
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. PLoS Genet 2025; 21:e1011519. [PMID: 39775068 PMCID: PMC11741642 DOI: 10.1371/journal.pgen.1011519] [Received: 05/06/2024] [Revised: 01/17/2025] [Accepted: 11/27/2024] [Indexed: 01/11/2025]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy, was introduced. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goals of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in the UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data set has a smaller sample size.
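mr.mash-rss itself (a multi-phenotype model fitted to GWAS summary statistics and reference LD) is not reproduced here. Whatever method estimates the effects, the final prediction step is a weighted sum of allele dosages for each phenotype. A minimal sketch of that scoring step, assuming a hypothetical variant-by-phenotype weight matrix that has already been estimated; in a multi-phenotype method, effect sharing shows up in how the columns of that matrix are estimated, not in the scoring itself.

```python
def polygenic_scores(dosages: list, weights: list) -> list:
    """Score one individual on several phenotypes.

    dosages: allele counts (0, 1 or 2), one per variant.
    weights: weights[j][k] is the estimated effect of variant j on phenotype k.
    Returns [sum_j dosages[j] * weights[j][k] for each phenotype k].
    """
    if len(dosages) != len(weights) or not weights:
        raise ValueError("need one weight row per variant")
    n_pheno = len(weights[0])
    scores = [0.0] * n_pheno
    for x, row in zip(dosages, weights):
        if len(row) != n_pheno:
            raise ValueError("every weight row needs one entry per phenotype")
        for k, b in enumerate(row):
            scores[k] += x * b
    return scores


# Hypothetical weights for 3 variants x 2 phenotypes.
w = [[0.10, 0.00], [0.20, -0.10], [0.05, 0.30]]
print([round(s, 3) for s in polygenic_scores([0, 1, 2], w)])  # [0.3, 0.5]
```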
Affiliation(s)
- Deborah Kunkel
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, United States of America
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
- Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America

5
Ramprasad P, Ren J, Pan W. Enhancing Gene Expression Predictions Using Deep Learning and Functional Annotations. Genet Epidemiol 2025; 49:e22595. [PMID: 39344923 DOI: 10.1002/gepi.22595] [Received: 01/18/2024] [Revised: 07/17/2024] [Accepted: 09/03/2024] [Indexed: 10/01/2024]
Abstract
Transcriptome-wide association studies (TWAS) aim to uncover genotype-phenotype relationships through a two-stage procedure: predicting gene expression from genotypes using an expression quantitative trait locus (eQTL) data set, then testing the predicted expression for trait associations. Accurate gene expression prediction in stage 1 is crucial, as it directly impacts the power to identify associations in stage 2. Currently, the first stage of such studies is primarily conducted using linear models like elastic net regression, which fail to capture the nonlinear relationships inherent in biological systems. Deep learning methods have the potential to model such nonlinear effects, but have yet to demonstrably outperform linear methods at this task. To address this gap, we propose a new deep learning architecture to predict gene expression from genotypic variation across individuals. Our method utilizes a learnable input scaling layer in conjunction with a convolutional encoder to capture nonlinear effects and higher-order interactions without compromising on interpretability. We further augment this approach to allow for parameter sharing across multiple networks, enabling us to utilize prior information for individual variants in the form of functional annotations. Evaluations on real-world genomic data show that our method consistently outperforms elastic net regression across a large set of heritable genes. Furthermore, our model statistically significantly improved predictive performance by leveraging functional annotations, whereas elastic net regression failed to show equivalent gains when using the same information, suggesting that our method can capture nonlinear functional information beyond the capability of linear models.
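The two-stage procedure described above can be sketched with individual-level data. This is a minimal illustration under assumptions: stage-1 weights are taken as given (the abstract names elastic net as the usual linear choice; the paper's deep learning model is not reproduced), stage 2 is a simple linear regression without covariates, and the genotypes, weights, and trait values are hypothetical.

```python
import math


def predict_expression(genotypes: list, weights: list) -> list:
    """Stage 1 output: genetically regulated expression, one value per individual.

    genotypes: rows of allele dosages (individual x variant).
    weights: per-variant weights from a stage-1 model trained on eQTL data.
    """
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def association_test(predicted: list, trait: list) -> tuple:
    """Stage 2: regress the trait on predicted expression; returns (beta, t)."""
    n = len(predicted)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(predicted) / n, sum(trait) / n
    sxx = sum((x - mx) ** 2 for x in predicted)
    if sxx == 0:
        raise ValueError("predicted expression has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, trait))
    beta = sxy / sxx
    # Residuals of the fitted line y = my + beta * (x - mx).
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(predicted, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, math.copysign(math.inf, beta)
    return beta, beta / se


genotypes = [[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]]
grex = predict_expression(genotypes, [0.5, -0.25])  # hypothetical stage-1 weights
beta, t = association_test(grex, [0.1, 1.0, 1.4, -1.1, 0.6, 1.0])
print(round(beta, 2), round(t, 1))  # 2.0 20.0
```

Because stage 2 only sees the predicted expression, any error in the stage-1 weights propagates directly into the association test, which is why the abstract stresses stage-1 accuracy.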
Grants
- This research was supported by NIH grants U01 AG073079, R01 AG065636, R01 AG069895, and RF1 AG067924, and by the Minnesota Supercomputing Institute at the University of Minnesota. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS; the GTEx data were obtained from dbGaP Project #26511.
Affiliation(s)
- Pratik Ramprasad
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA
- Jingchen Ren
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
- Division of Biostatistics and Health Data Science, University of Minnesota, Minneapolis, Minnesota, USA

6
Moore A, Venkatesh R, Levin MG, Damrauer SM, Reza N, Cappola TP, Ritchie MD. Connecting intermediate phenotypes to disease using multi-omics in heart failure. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2025; 30:504-521. [PMID: 39670392 PMCID: PMC11822568 DOI: 10.1142/9789819807024_0036] [Indexed: 05/16/2025]
Abstract
Heart failure (HF) is one of the most common, complex, and heterogeneous diseases in the world, with an estimated 1-3% of the global population living with the condition. Progression of HF can be tracked via MRI measures of structural and functional changes to the heart, particularly the left ventricle (LV), including LV ejection fraction, mass, end-diastolic volume, and end-systolic volume. Moreover, while genome-wide association studies (GWAS) have been a useful tool for identifying candidate variants involved in HF risk, they lack crucial tissue-specific and mechanistic information, which can be gained by incorporating additional data modalities. This study addresses this gap by incorporating transcriptome-wide and proteome-wide association studies (TWAS and PWAS) to gain insights into genetically regulated changes in gene expression and protein abundance, both in precursors to HF measured using MRI-derived cardiac measures and in full-stage all-cause HF. We identified several gene and protein overlaps between LV ejection fraction and end-systolic volume measures. Many of the overlaps identified in MRI-derived measurements through TWAS and PWAS appear to be shared with all-cause HF. Using gene-set enrichment and protein-protein interaction network approaches, we implicate many putative HF-relevant pathways associated with these genes and proteins. The results of this study (1) highlight the benefit of using multi-omics to better understand genetics and (2) provide novel insights into how changes in heart structure and function may relate to HF.
Affiliation(s)
- Anni Moore
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Rasika Venkatesh
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Michael G Levin
- Division of Cardiovascular Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Civic Center Blvd Philadelphia, PA, 19104, USA
- Scott M Damrauer
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, 3400 Spruce St Philadelphia, PA 19104, USA
- Nosheen Reza
- Division of Cardiovascular Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Civic Center Blvd Philadelphia, PA, 19104, USA
- Thomas P Cappola
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Marylyn D Ritchie
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA

7
Shao M, Chen K, Zhang S, Tian M, Shen Y, Cao C, Gu N. Multiome-wide Association Studies: Novel Approaches for Understanding Diseases. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae077. [PMID: 39471467 PMCID: PMC11630051 DOI: 10.1093/gpbjnl/qzae077] [Received: 08/16/2023] [Revised: 10/06/2024] [Accepted: 10/23/2024] [Indexed: 11/01/2024]
Abstract
The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods has opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene-disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the diseases most frequently studied by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.
Affiliation(s)
- Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Kaiyang Chen
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Shuting Zhang
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Min Tian
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Yan Shen
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Ning Gu
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Nanjing Key Laboratory for Cardiovascular Information and Health Engineering Medicine, Institute of Clinical Medicine, Nanjing Drum Tower Hospital, Medical School, Nanjing University, Nanjing 210093, China

8
Guo X, Zheng Z, Cheong KH, Zou Q, Tiwari P, Ding Y. Sequence homology score-based deep fuzzy network for identifying therapeutic peptides. Neural Netw 2024; 178:106458. [PMID: 38901093 DOI: 10.1016/j.neunet.2024.106458] [Received: 12/18/2023] [Revised: 05/29/2024] [Accepted: 06/09/2024] [Indexed: 06/22/2024]
Abstract
The detection of therapeutic peptides is a topic of immense interest in the biomedical field. Conventional detection techniques based on biochemical experiments are tedious and time-consuming. Computational biology has become a useful tool for improving the efficiency of therapeutic peptide detection. Most computational methods do not consider the deviation caused by noise. To improve the generalization performance of therapeutic peptide prediction methods, this work presents a sequence homology score-based deep fuzzy echo-state network with maximizing mixture correntropy (SHS-DFESN-MMC) model. We compare our method with existing methods on eight types of therapeutic peptide datasets. The model parameters are determined by 10-fold cross-validation on the training sets and verified on independent test sets. Across the eight datasets, SHS-DFESN-MMC achieves the highest average area under the receiver operating characteristic curve (AUC) on both the training sets (0.926) and the independent test sets (0.923).
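The evaluation protocol in this abstract, 10-fold cross-validation scored by AUC, can be sketched independently of the SHS-DFESN-MMC model. The shuffling scheme, seed, and function names are assumptions for illustration; the AUC is computed as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve.

```python
import random


def cross_validation_splits(n: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds, so fold
    sizes differ by at most one and every sample is tested exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def auc(scores: list, labels: list) -> float:
    """ROC AUC as P(score of random positive > score of random negative).

    Ties count one half. O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

In a full run, a model would be fitted on each `train` split, scored on the matching `test` split, and the fold AUCs averaged; the independent test set is kept out of all folds.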
Affiliation(s)
- Xiaoyi Guo
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Quzhou People's Hospital, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, PR China; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, S637371, Singapore.
- Ziyu Zheng
- Department of Mathematical Sciences, University of Nottingham Ningbo, Ningbo, 315100, PR China.
- Kang Hao Cheong
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, S637371, Singapore; College of Computing and Data Science, Nanyang Technological University, S639798, Singapore.
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, PR China.
- Prayag Tiwari
- School of Information Technology, Halmstad University, Sweden.
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, PR China.

9
Yang R, Wang R, Zhao D, Lian K, Shang B, Dong L, Yang X, Dang X, Sun D, Cheng Y. Integrative analysis of transcriptome-wide association study and mRNA expression profile identified risk genes for bipolar disorder. Neurosci Lett 2024; 839:137935. [PMID: 39151574 DOI: 10.1016/j.neulet.2024.137935] [Received: 03/08/2024] [Revised: 08/05/2024] [Accepted: 08/09/2024] [Indexed: 08/19/2024]
Abstract
OBJECTIVE Bipolar disorder (BD) is a debilitating neuropsychiatric disorder that has been linked to genetic variation by "vast but mixed" findings from Genome-Wide Association Studies (GWAS). Transcriptome-Wide Association Study (TWAS) is more effective in explaining genetic factors that influence complex diseases and can help identify risk genes more reliably. This study therefore aimed to identify potential BD risk genes in pedigrees with TWAS. METHODS We conducted a TWAS analysis with expression quantitative trait loci (eQTL) analysis on extended BD pedigrees, and the BD genome-wide association study (GWAS) summary data acquired from the Psychiatric Genomics Consortium (PGC). Furthermore, the BD-associated genes identified by TWAS were validated using mRNA expression profiles from the Gene Expression Omnibus (GEO) Datasets (GSE23848 and GSE46416). Functional enrichment and annotation analyses were implemented in RStudio (version 4.2.0). RESULTS TWAS identified 362 genes with P < 0.05, and 18 genes remained significant after Bonferroni correction, such as SEMA3G (PTWAS = 1.07 × 10⁻¹¹), ALOX5AP (PTWAS = 3.12 × 10⁻⁸), and PLEC (PTWAS = 1.27 × 10⁻⁷). A further 6 overlapping genes were detected in the integrative analysis, such as UQCRB (PTWAS = 0.0020, PmRNA = 0.0000), TMPRSS9 (PTWAS = 0.0405, PmRNA = 0.0032), and SNX10 (PTWAS = 0.0104, PmRNA = 0.0015). Using genes identified by TWAS, Gene Ontology (GO) enrichment analysis identified 40 significant GO terms, such as mitochondrial ATP synthesis coupled electron transport, mitochondrial respiratory, aerobic electron transport chain, oxidative phosphorylation, mitochondrial membrane proteins, and ubiquinone activity. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis identified 15 significant pathways for BD, such as oxidative phosphorylation, endocannabinoid signaling, neurodegeneration, and reactive oxygen species.
CONCLUSIONS We found a set of BD-associated genes and pathways, validating the important role of neurodevelopmental abnormalities, inflammatory responses, and mitochondrial dysfunction in the pathology of BD, offering novel information for comprehending the genetic basis of BD.
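Bonferroni correction, used in the results above, declares a test significant when p < α/m, where m is the number of tests performed. The abstract does not state how many genes were tested, so this minimal sketch uses hypothetical p-values for five genes rather than the study's data; the function name is illustrative.

```python
def bonferroni_screen(pvalues: dict, alpha: float = 0.05):
    """Split genes into nominal (p < alpha) and Bonferroni (p < alpha / m) hits.

    pvalues: {gene: p_value}; m is the number of genes tested.
    Returns (nominal_hits, bonferroni_hits, threshold).
    """
    m = len(pvalues)
    if m == 0:
        raise ValueError("no tests supplied")
    threshold = alpha / m
    nominal = sorted(g for g, p in pvalues.items() if p < alpha)
    bonferroni = sorted(g for g, p in pvalues.items() if p < threshold)
    return nominal, bonferroni, threshold


# Hypothetical p values for five genes: threshold = 0.05 / 5 = 0.01.
p = {"G1": 1e-8, "G2": 0.03, "G3": 0.2, "G4": 0.004, "G5": 0.6}
print(bonferroni_screen(p)[:2])  # (['G1', 'G2', 'G4'], ['G1', 'G4'])
```

The gap between the nominal and corrected lists mirrors the abstract's drop from 362 nominal genes to 18 Bonferroni-significant genes, although the actual threshold depends on the study's (unstated) number of tests.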
Affiliation(s)
- Runxu Yang
- Psychiatry Department, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Rui Wang
- Department of Prevention and Health Care, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Dongyan Zhao
- First Affiliated Hospital of Dali University, Dali, Yunnan, China
- Kun Lian
- Psychiatry Department, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Binli Shang
- Psychiatry Department, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Lei Dong
- Psychiatry Department, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Xuejuan Yang
- Lincang Psychiatric Hospital, Lincang, Yunnan, China
- Xinglun Dang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Duo Sun
- Psychiatry Department, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
- Yuqi Cheng
- Psychiatry Department, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, China
10
Chen M, Zou Q, Qi R, Ding Y. PseU-KeMRF: A Novel Method for Identifying RNA Pseudouridine Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1423-1435. [PMID: 38625768 DOI: 10.1109/tcbb.2024.3389094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2024]
Abstract
Pseudouridine is an abundant RNA modification found in many different organisms and is crucial for a variety of biological functions. Accurately identifying pseudouridine sites within RNA sequences is vital for subsequent study of the biological mechanisms of pseudouridine. However, traditional experimental methods face certain challenges, so fast and convenient computational methods are needed to identify pseudouridine sites accurately from RNA sequence information. To address this, we introduce a novel pseudouridine site prediction model, PseU-KeMRF, which identifies pseudouridine sites in three species: H. sapiens, S. cerevisiae, and M. musculus. Through comprehensive analysis, we selected four RNA encoding schemes: binary features, position-specific trinucleotide propensity based on single strand (PSTNPss), nucleotide chemical property (NCP), and pseudo k-tuple composition (PseKNC). The support vector machine-recursive feature elimination (SVM-RFE) method was then used for feature selection, and the feature subset was optimized. Finally, the best feature subsets were input into the kernel based on multinomial random forests (KeMRF) classifier for cross-validation and independent testing. As a new classification method, KeMRF not only improves on the traditional random forest by basing the node-splitting process of decision tree construction on a multinomial distribution, but also incorporates an easy-to-interpret kernel method for prediction, which improves classification performance. Our results indicate that PseU-KeMRF outperforms other existing models, showing that it is a highly competitive model for identifying pseudouridine sites in RNA sequences.
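Two of the four encodings named above are simple enough to show directly. The sketch below assumes the common definitions used in RNA-modification predictors: binary (one-hot) encoding maps each nucleotide to a 4-bit vector, and NCP maps it to three chemical properties (ring structure, functional group, hydrogen bonding), giving A = (1, 1, 1), C = (0, 1, 0), G = (1, 0, 0), U = (0, 0, 1). The column order and window length PseU-KeMRF actually uses are not stated here, so treat them as assumptions; PSTNPss and PseKNC, which need training-set statistics, are omitted.

```python
from typing import Dict, List

# One-hot (binary) code: one position per nucleotide, order A, C, G, U
BINARY: Dict[str, List[int]] = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
                                "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}

# Nucleotide chemical property (NCP): (ring structure, functional group,
# hydrogen bonding) encoded as purine / amino / weak-H-bond flags
NCP: Dict[str, List[int]] = {"A": [1, 1, 1], "C": [0, 1, 0],
                             "G": [1, 0, 0], "U": [0, 0, 1]}


def encode(seq: str, table: Dict[str, List[int]]) -> List[int]:
    """Concatenate the per-nucleotide codes of an RNA window."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    features: List[int] = []
    for base in seq:
        if base not in table:
            raise ValueError(f"unsupported nucleotide: {base!r}")
        features.extend(table[base])
    return features


def encode_window(seq: str) -> List[int]:
    """Binary features followed by NCP features for one candidate site."""
    return encode(seq, BINARY) + encode(seq, NCP)


if __name__ == "__main__":
    window = "GAUCU"  # hypothetical window centered on a candidate U
    vec = encode_window(window)
    print(len(vec), vec)
```

The concatenated vector is the kind of input a feature-selection step such as SVM-RFE would then prune before classification.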
11
Shao M, Tian M, Chen K, Jiang H, Zhang S, Li Z, Shen Y, Chen F, Shen B, Cao C, Gu N. Leveraging Random Effects in Cistrome-Wide Association Studies for Decoding the Genetic Determinants of Prostate Cancer. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2400815. [PMID: 39099406 PMCID: PMC11423091 DOI: 10.1002/advs.202400815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 07/09/2024] [Indexed: 08/06/2024]
Abstract
Cistrome-wide association studies (CWAS) are pivotal for identifying genetic determinants of diseases by correlating genetically regulated cistrome states with phenotypes. Traditional CWAS typically builds a model from cistrome and genotype data and then associates predicted cistrome states with phenotypes. Here, the random effect cistrome-wide association study (RECWAS) is introduced, which reevaluates the necessity of cistrome state prediction in CWAS. RECWAS uses either a linear model or marginal effects for initial feature selection, followed by kernel-based feature aggregation for association testing. CWAS and RECWAS are evaluated thoroughly through simulations and analysis of prostate cancer data. The results suggest that RECWAS offers improved power over traditional CWAS and identifies additional genomic regions associated with prostate cancer: CWAS identified 102 significant regions, while RECWAS found 50 additional significant regions, many of which were validated. Validation drew on a range of biological evidence, including risk signals from the GWAS catalog, susceptibility genes from the DisGeNET database, and enhancer-domain scores. RECWAS consistently outperformed traditional CWAS in identifying genomic regions associated with prostate cancer. These findings demonstrate the benefits of incorporating kernel methods into CWAS and provide new insights for genetic discovery in complex diseases.
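RECWAS replaces a single predicted cistrome value with kernel-based aggregation of the selected features. As a hedged illustration of kernel aggregation in general (not RECWAS's exact statistic, which the abstract does not spell out), the sketch computes a SKAT-style variance-component score statistic Q = rᵀ G W Gᵀ r, where `residuals` are phenotype residuals under the null model, `features` is a samples-by-features matrix of the selected features, and `weights` is a diagonal weight vector. A p-value would additionally require the mixture-of-chi-squares null distribution (for example via Davies' method), which is omitted here.

```python
from typing import Sequence


def kernel_score_statistic(residuals: Sequence[float],
                           features: Sequence[Sequence[float]],
                           weights: Sequence[float]) -> float:
    """Q = r' G W G' r with a weighted linear kernel K = G W G'.

    residuals: y minus its fitted value under the null (covariates-only) model
    features:  rows = samples, columns = selected features (e.g. SNPs near
               cistrome peaks; an illustrative choice)
    weights:   one non-negative weight per feature
    """
    n_samples = len(residuals)
    n_features = len(weights)
    if len(features) != n_samples:
        raise ValueError("one feature row per sample is required")
    q = 0.0
    for j in range(n_features):
        # Score contribution of feature j: sum_i g_ij * r_i
        score_j = sum(features[i][j] * residuals[i] for i in range(n_samples))
        q += weights[j] * score_j ** 2
    return q


if __name__ == "__main__":
    # Toy data: four samples, two selected features (values are invented)
    y = [1.0, 0.0, 1.0, 0.0]
    mean_y = sum(y) / len(y)
    r = [v - mean_y for v in y]  # residuals of an intercept-only null model
    G = [[2, 0], [0, 1], [1, 0], [0, 2]]
    print(kernel_score_statistic(r, G, [1.0, 1.0]))
```

Because each feature's score is squared before summing, effects in opposite directions do not cancel, which is one reason kernel tests can gain power over a single weighted-sum predictor.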
Affiliation(s)
- Mengting Shao
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Min Tian
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Kaiyang Chen
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou 310058, P. R. China
- Shuting Zhang
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Zhenghui Li
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Yan Shen
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Feng Chen
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Baixin Shen
- Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing 210011, P. R. China
- Chen Cao
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Department of Urology, The Second Affiliated Hospital of Nanjing Medical University, Nanjing 210011, P. R. China
- Ning Gu
- Key Laboratory for Bio‐Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, P. R. China
- Nanjing Key Laboratory for Cardiovascular Information and Health Engineering Medicine, Institute of Clinical Medicine, Nanjing Drum Tower Hospital, Medical School, Nanjing University, Nanjing 210093, P. R. China
12
Moore A, Venkatesh R, Levin MG, Damrauer SM, Reza N, Cappola TP, Ritchie MD. Connecting intermediate phenotypes to disease using multi-omics in heart failure. medRxiv: The Preprint Server for Health Sciences 2024:2024.08.06.24311572. [PMID: 39148828 PMCID: PMC11326335 DOI: 10.1101/2024.08.06.24311572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Heart failure (HF) is one of the most common, complex, and heterogeneous diseases in the world, affecting an estimated 1-3% of the global population. Progression of HF can be tracked via MRI measures of structural and functional changes to the heart, namely to the left ventricle (LV), including LV ejection fraction, mass, end-diastolic volume, and end-systolic volume. While genome-wide association studies (GWAS) have been a useful tool for identifying candidate variants involved in HF risk, they lack crucial tissue-specific and mechanistic information that can be gained by incorporating additional data modalities. This study addresses this gap by incorporating transcriptome-wide and proteome-wide association studies (TWAS and PWAS) to gain insight into genetically regulated changes in gene expression and protein abundance, both in precursors to HF, measured as MRI-derived cardiac traits, and in full-stage all-cause HF. We identified several gene and protein overlaps between the LV ejection fraction and end-systolic volume measures. Many of the overlaps identified for MRI-derived measures through TWAS and PWAS appear to be shared with all-cause HF. Using gene-set enrichment and protein-protein interaction network approaches, we implicate many putative HF-relevant pathways associated with these genes and proteins. The results of this study (1) highlight the benefit of using multi-omics to better understand genetics and (2) provide novel insights into how changes in heart structure and function may relate to HF.
Affiliation(s)
- Anni Moore
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Rasika Venkatesh
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Michael G. Levin
- Division of Cardiovascular Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Civic Center Blvd Philadelphia, PA, 19104, USA
- Scott M. Damrauer
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, 3400 Spruce St Philadelphia, PA 19104
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Corporal Michael Crescenz VA Medical Center, 3900 Woodland Ave Philadelphia, PA
- Nosheen Reza
- Division of Cardiovascular Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Civic Center Blvd Philadelphia, PA, 19104, USA
- Thomas P. Cappola
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Division of Cardiovascular Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Civic Center Blvd Philadelphia, PA, 19104, USA
- Marylyn D. Ritchie
- Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, 3700 Hamilton Walk Philadelphia, PA, 19104, USA
13
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. bioRxiv: The Preprint Server for Biology 2024:2024.05.06.592745. [PMID: 38766136 PMCID: PMC11100663 DOI: 10.1101/2024.05.06.592745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, Morgante et al. introduced mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages the sharing of effects across phenotypes to improve prediction accuracy. However, mr.mash requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from genome-wide association studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. Using summary data achieves two goals: it extends the mr.mash model to data sets whose individual-level data are not publicly available, and it makes the method scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the sample size is smaller.
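Moving from individual-level data to summary statistics rests on a standard identity: with standardized genotypes X and phenotype y, XᵀX/n is the LD matrix R and Xᵀy/n is the vector of marginal effects β̂, so the joint least-squares effects satisfy Rβ = β̂. The sketch below recovers joint effects from marginal ones using plain least squares, not mr.mash-rss's multivariate Bayesian prior. The optional `ridge` term is an illustrative assumption for stabilizing reference-panel LD matrices, which can be near-singular or mismatched to the GWAS sample.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; add a ridge term")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def joint_from_marginal(ld: Sequence[Sequence[float]],
                        marginal: Sequence[float],
                        ridge: float = 0.0) -> List[float]:
    """Joint SNP effects from marginal effects: (R + ridge * I)^-1 beta_hat."""
    n = len(marginal)
    regularized = [[ld[i][j] + (ridge if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
    return solve(regularized, marginal)


if __name__ == "__main__":
    R = [[1.0, 0.5], [0.5, 1.0]]  # LD between two SNPs (made up)
    true_joint = [1.0, 0.0]       # only SNP 1 is causal
    # Marginal GWAS effects are the joint effects smeared by LD: R @ beta
    marginal = [sum(R[i][j] * true_joint[j] for j in range(2)) for i in range(2)]
    print(marginal, joint_from_marginal(R, marginal))
```

In the toy example the non-causal SNP shows a marginal effect of 0.5 purely through LD, and conditioning on R removes it.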
Affiliation(s)
- Deborah Kunkel
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC, United States of America
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, SC, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, United States of America
14
Aguiar TFM, Rivas MP, de Andrade Silva EM, Pires SF, Dangoni GD, Macedo TC, Defelicibus A, Barros BDDF, Novak E, Cristofani LM, Odone V, Cypriano M, de Toledo SRC, da Cunha IW, da Costa CML, Carraro DM, Tojal I, de Oliveira Mendes TA, Krepischi ACV. First Transcriptome Analysis of Hepatoblastoma in Brazil: Unraveling the Pivotal Role of Noncoding RNAs and Metabolic Pathways. Biochem Genet 2024:10.1007/s10528-024-10764-y. [PMID: 38649558 DOI: 10.1007/s10528-024-10764-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Hepatoblastoma is the most common liver cancer in the pediatric population. Because it has a low mutational burden, chromosomal and epigenetic alterations are key drivers of its tumorigenesis. Transcriptome analysis is a powerful tool for unraveling the molecular intricacies of hepatoblastoma, shedding light on how genetic and epigenetic changes affect gene expression. In this study of Brazilian patients, an in-depth whole transcriptome analysis was performed comparing 14 primary hepatoblastomas with control liver tissues. The analysis revealed 1,492 differentially expressed genes (1,031 upregulated and 461 downregulated), including 920 protein-coding genes (62%). Upregulated biological processes were linked to cell differentiation, signaling, morphogenesis, and development, involving known hepatoblastoma-associated genes (DLK1, MEG3, HDAC2, TET1, HMGA2, DKK1, DKK4) alongside novel findings (GYNG4, CDH3, and TNFRSF19). Downregulated processes centered predominantly on oxidation and metabolism, affecting amines, nicotinamides, and lipids, and featured novel findings such as the repression of SYT7, TTC36, THRSP, CCND1, GCK, and CAMK2B. Two genes that showed concordant DNA methylation alterations in their promoter regions and dysregulation in the transcriptome were further validated by RT-qPCR: the upregulated TNFRSF19, a key gene in embryonic development, and the repressed THRSP, connected to lipid metabolism. Furthermore, protein-protein interaction analysis identified genes holding central positions in the network, such as HDAC2, CCND1, GCK, and CAMK2B, among others, which emerged as prime candidates for functional validation in future studies. Notably, non-coding RNAs (ncRNAs) were significantly dysregulated, predominantly through upregulated transcripts, with ncRNAs making up 42% of the top 50 most highly expressed genes.
An integrative miRNA-mRNA analysis revealed crucial biological processes associated with metabolism, oxidation reactions of lipids and carbohydrates, and methylation-dependent chromatin silencing. In particular, four upregulated miRNAs (miR-186, miR-214, miR-377, and miR-494) played a pivotal role in the network, potentially targeting multiple protein-coding transcripts, including CCND1 and CAMK2B. In summary, our transcriptome analysis highlighted disrupted embryonic development and metabolic pathways, particularly those involving lipids, emphasizing the emerging role of ncRNAs as epigenetic regulators in hepatoblastomas. These findings provide insights into the complexity of the hepatoblastoma transcriptome and identify potential targets for future therapeutic interventions.
Affiliation(s)
- Talita Ferreira Marques Aguiar
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, Human Genome and Stem-Cell Research Center, University of São Paulo, São Paulo, Brazil
- Columbia University Irving Medical Center, New York, NY, USA
- Maria Prates Rivas
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, Human Genome and Stem-Cell Research Center, University of São Paulo, São Paulo, Brazil
- Edson Mario de Andrade Silva
- Department of Biochemistry and Molecular Biology, Federal University of Viçosa, Minas Gerais, Brazil
- Horticultural Sciences Department, University of Florida, Gainesville, USA
- Sara Ferreira Pires
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, Human Genome and Stem-Cell Research Center, University of São Paulo, São Paulo, Brazil
- Gustavo Dib Dangoni
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, Human Genome and Stem-Cell Research Center, University of São Paulo, São Paulo, Brazil
- Taiany Curdulino Macedo
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, Human Genome and Stem-Cell Research Center, University of São Paulo, São Paulo, Brazil
- Estela Novak
- Pediatric Cancer Institute (ITACI) at the Pediatric Department, São Paulo University Medical School, São Paulo, Brazil
- Lilian Maria Cristofani
- Pediatric Cancer Institute (ITACI) at the Pediatric Department, São Paulo University Medical School, São Paulo, Brazil
- Vicente Odone
- Pediatric Cancer Institute (ITACI) at the Pediatric Department, São Paulo University Medical School, São Paulo, Brazil
- Monica Cypriano
- Department of Pediatrics, Adolescent and Child With Cancer Support Group (GRAACC), Federal University of São Paulo, São Paulo, Brazil
- Silvia Regina Caminada de Toledo
- Department of Pediatrics, Adolescent and Child With Cancer Support Group (GRAACC), Federal University of São Paulo, São Paulo, Brazil
- Dirce Maria Carraro
- International Center for Research, A. C. Camargo Cancer Center, São Paulo, Brazil
- Israel Tojal
- International Center for Research, A. C. Camargo Cancer Center, São Paulo, Brazil
- Ana Cristina Victorino Krepischi
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, Human Genome and Stem-Cell Research Center, University of São Paulo, São Paulo, Brazil
15
Wigdor EM, Samocha KE, Eberhardt RY, Chundru VK, Firth HV, Wright CF, Hurles ME, Martin HC. Investigating the role of common cis-regulatory variants in modifying penetrance of putatively damaging, inherited variants in severe neurodevelopmental disorders. Sci Rep 2024; 14:8708. [PMID: 38622173 PMCID: PMC11018828 DOI: 10.1038/s41598-024-58894-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 04/04/2024] [Indexed: 04/17/2024]
Abstract
Recent work has revealed an important role for rare, incompletely penetrant inherited coding variants in neurodevelopmental disorders (NDDs). We have also previously shown that common variants contribute to risk for rare NDDs. Here, we investigate whether common variants exert their effects by modifying gene expression, using multi-cis-expression quantitative trait loci (cis-eQTL) prediction models. We first performed a transcriptome-wide association study for NDDs using 6987 probands from the Deciphering Developmental Disorders (DDD) study and 9720 controls, and found one gene, RAB2A, that passed multiple testing correction (p = 6.7 × 10⁻⁷). We then investigated, in a set of 1700 trios, whether cis-eQTLs modify the penetrance of putatively damaging, rare coding variants inherited by NDD probands from their unaffected parents. We found no evidence that unaffected parents transmitting putatively damaging coding variants had higher genetically predicted expression of the variant-harboring gene than their child. In probands carrying putatively damaging variants in constrained genes, the genetically predicted expression of these genes in blood was lower than in controls (p = 2.7 × 10⁻³). However, results for proband-control comparisons were inconsistent across different sets of genes, variant filters, and tissues. Overall, we find limited evidence that common cis-eQTLs modify the penetrance of rare coding variants in a large cohort of NDD probands.
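The trio comparison above rests on genetically predicted expression, a weighted sum of cis-eQTL allele dosages. A minimal sketch, assuming additive PrediXcan-style prediction weights (the specific multi-cis-eQTL models are not described here); the SNP identifiers `rs_a` and `rs_b` and all dosages are hypothetical placeholders. Under the modifier hypothesis, an unaffected transmitting parent would be expected to show higher predicted expression than the affected child.

```python
from typing import Dict


def predicted_expression(weights: Dict[str, float],
                         dosages: Dict[str, float]) -> float:
    """Additive genetically predicted expression: sum_k w_k * dosage_k.

    SNPs missing from an individual's dosages contribute nothing here; a real
    pipeline would impute them or drop the gene instead.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())


def parent_minus_child(weights: Dict[str, float],
                       parent: Dict[str, float],
                       child: Dict[str, float]) -> float:
    """Positive values mean the unaffected carrier parent is predicted to
    express the variant-harboring gene more highly than the proband."""
    return predicted_expression(weights, parent) - predicted_expression(weights, child)


if __name__ == "__main__":
    w = {"rs_a": 0.30, "rs_b": -0.10}  # hypothetical eQTL weights
    parent = {"rs_a": 2, "rs_b": 0}    # allele dosages (0, 1 or 2)
    child = {"rs_a": 1, "rs_b": 1}
    print(parent_minus_child(w, parent, child))
```

Across many trios, one would then ask whether these parent-minus-child differences are shifted above zero, for example with a sign test or paired t-test.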
Affiliation(s)
- Emilie M Wigdor
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Kaitlin E Samocha
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, USA
- Ruth Y Eberhardt
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- V Kartik Chundru
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter, UK
- Helen V Firth
- Department of Medical Genetics, Addenbrooke's Hospital, Cambridge University Hospitals, Cambridge, UK
- Caroline F Wright
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter, UK
- Matthew E Hurles
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Hilary C Martin
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
16
Guo X, Chatterjee N, Dutta D. Subset-based method for cross-tissue transcriptome-wide association studies improves power and interpretability. HGG Advances 2024; 5:100283. [PMID: 38491773 PMCID: PMC10999697 DOI: 10.1016/j.xhgg.2024.100283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 03/09/2024] [Accepted: 03/09/2024] [Indexed: 03/18/2024]
Abstract
Integrating results from genome-wide association studies (GWASs) and studies of molecular phenotypes such as gene expression can improve our understanding of the biological functions of trait-associated variants and help prioritize candidate genes for downstream analysis. Using reference expression quantitative trait locus (eQTL) studies, several methods have been proposed to identify gene-trait associations, primarily based on gene expression imputation. To increase statistical power by leveraging the substantial sharing of eQTLs across tissues, meta-analysis methods that aggregate such gene-based test results across multiple tissues or contexts have also been developed. However, most existing meta-analysis methods have limited power when a gene is only weakly associated in a few tissues, and they cannot identify the subset of tissues in which the gene is "activated." To address this, we developed a cross-tissue subset-based transcriptome-wide association study (CSTWAS) meta-analysis method that improves power in such scenarios and can extract the set of potentially associated tissues. To improve applicability, CSTWAS uses only GWAS summary statistics and pre-computed correlation matrices to identify the subset of tissues with maximal evidence of gene-trait association. Through numerical simulations, we found that CSTWAS maintains a well-calibrated type-I error rate, improves power especially when only a small number of tissues are associated for a gene-trait pair, and identifies the associated tissue set accurately. By analyzing GWAS summary statistics for three complex traits and diseases, we demonstrate that CSTWAS can identify biologically meaningful signals while offering insight into disease etiology by extracting a set of potentially associated tissues.
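To make the subset idea concrete: given per-tissue TWAS z-scores for one gene and their cross-tissue correlation matrix Σ (the kind of precomputed correlation matrix the abstract mentions), the sum of z-scores over a tissue subset S divided by sqrt(1ᵀΣ_S 1) is again standard normal under the null. The sketch scores every non-empty subset exhaustively and returns the one with the largest absolute statistic. This brute-force version is illustrative only: CSTWAS's actual search strategy and its correction for taking a maximum over subsets are not reproduced, so the raw maximum here is not a valid p-value.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def subset_statistic(z: Sequence[float], corr: Sequence[Sequence[float]],
                     subset: Tuple[int, ...]) -> float:
    """Standardized sum of z-scores over the tissues in `subset`."""
    total = sum(z[t] for t in subset)
    # Variance of the sum under the null: sum of the subset's correlation block
    variance = sum(corr[s][t] for s in subset for t in subset)
    return total / math.sqrt(variance)


def best_tissue_subset(z: Sequence[float],
                       corr: Sequence[Sequence[float]]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively score every non-empty tissue subset (2^T - 1 of them)."""
    best: Tuple[Tuple[int, ...], float] = ((), 0.0)
    for size in range(1, len(z) + 1):
        for subset in combinations(range(len(z)), size):
            stat = subset_statistic(z, corr, subset)
            if abs(stat) > abs(best[1]):
                best = (subset, stat)
    return best


if __name__ == "__main__":
    # Gene "activated" in tissues 0 and 2 only; tissues uncorrelated (toy data)
    z_scores = [3.0, 0.2, 2.8, -0.1]
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    print(best_tissue_subset(z_scores, identity))
```

In the toy case the two-tissue subset beats both the single best tissue and the all-tissue sum, which is the scenario the abstract says subset-based meta-analysis is designed for. Exhaustive search grows as 2^T, so it only suits a handful of tissues.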
Affiliation(s)
- Xinyu Guo
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90007, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Diptavo Dutta
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology & Genetics, National Cancer Institute, Rockville, MD 20850, USA
17
Li Q, Bian J, Qian Y, Kossinna P, Gau C, Gordon PMK, Zhou X, Guo X, Yan J, Wu J, Long Q. An expression-directed linear mixed model discovering low-effect genetic variants. Genetics 2024; 226:iyae018. [PMID: 38314848 PMCID: PMC11630775 DOI: 10.1093/genetics/iyae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 11/29/2023] [Accepted: 01/05/2024] [Indexed: 02/07/2024]
Abstract
Detecting genetic variants with low effect sizes using a moderate sample size is difficult, which hinders downstream efforts to understand pathology and estimate heritability. In this work, using informative weights learned while training models of genetically predicted gene expression, we developed an alternative approach to estimating the polygenic term in a linear mixed model. Our linear mixed model estimates the genetic background by incorporating each variant's relevance to gene expression. Our protocol, the expression-directed linear mixed model, enables the discovery of subtle signals from low-effect variants using a moderate sample size. By applying the expression-directed linear mixed model to cohorts of around 5,000 individuals with either binary (WTCCC) or quantitative (NFBC1966) traits, we demonstrated its power gain at the low-effect end of the genetic etiology spectrum. In aggregate, the additional low-effect variants detected by the expression-directed linear mixed model substantially improved the estimation of missing heritability. The expression-directed linear mixed model moves precision medicine forward by accurately detecting the contribution of low-effect genetic variants to human diseases.
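One way to read "estimating the polygenic term with weights learned from expression models" is as a weighted genetic relationship matrix, K = Z W Zᵀ / Σw on standardized genotypes Z, so that variants with larger expression-model weights contribute more to the relatedness the random effect captures. The sketch below computes such a matrix; this weighting scheme is an illustrative assumption, not the published model's exact formula, and with equal weights it reduces to the ordinary standardized GRM.

```python
from typing import List, Sequence


def standardize_columns(genotypes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center and scale each variant (column) to mean 0, variance 1."""
    n = len(genotypes)
    m = len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [genotypes[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
        for i in range(n):
            # Monomorphic variants carry no information; leave them at 0
            out[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    return out


def weighted_grm(genotypes: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[List[float]]:
    """K_ik = sum_j w_j z_ij z_kj / sum_j w_j on standardized genotypes z."""
    z = standardize_columns(genotypes)
    n, total_w = len(z), sum(weights)
    return [[sum(w * z[i][j] * z[k][j] for j, w in enumerate(weights)) / total_w
             for k in range(n)] for i in range(n)]


if __name__ == "__main__":
    G = [[0, 2, 1], [1, 1, 0], [2, 0, 1]]  # 3 individuals x 3 variants (toy)
    w = [0.7, 0.2, 0.1]                     # hypothetical expression-model weights
    for row in weighted_grm(G, w):
        print([round(v, 3) for v in row])
```

The matrix is symmetric and its diagonal averages 1 by construction, so it can be dropped into a standard linear mixed model wherever an unweighted GRM would go.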
Affiliation(s)
- Qing Li
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Jiayi Bian
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Yanzhao Qian
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Pathum Kossinna
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Cooper Gau
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Paul M K Gordon
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
- Xiang Zhou
- School of Public Health, University of Michigan, Ann Arbor 48109, USA
- Xingyi Guo
- Department of Medicine & Biomedical Informatics, Vanderbilt University Medical Center, Nashville 37203, USA
- Jun Yan
- Physiology and Pharmacology, University of Calgary, Calgary T2N 1N4, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary T2N 1N4, Canada
- Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Quan Long
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary T2N 1N4, Canada
- Department of Medical Genetics, University of Calgary, Calgary T2N 1N4, Canada
18
He J, Li Q, Zhang Q. rvTWAS: identifying gene-trait association using sequences by utilizing transcriptome-directed feature selection. Genetics 2024; 226:iyad204. [PMID: 38001381 DOI: 10.1093/genetics/iyad204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 11/14/2023] [Accepted: 11/16/2023] [Indexed: 11/26/2023]
Abstract
Toward identifying the genetic basis of complex traits, transcriptome-wide association studies (TWAS) have successfully integrated transcriptome data. However, TWAS is applicable only to common variants, excluding rare variants in exome or whole-genome sequences. This is partly due to an inherent limitation of TWAS protocols, which rely on predicting gene expression. Our previous research revealed a key insight into TWAS: its 2 steps, building and applying expression prediction models, are essentially genetic feature selection and aggregation, which do not have to involve prediction. Under this view, the inability of rare variants to predict expression traits is no longer an obstacle. Herein, we developed "rare variant TWAS," or rvTWAS, which first uses a Bayesian model to conduct expression-directed feature selection and then uses a kernel machine to carry out feature aggregation, yielding a model that leverages expression for association mapping that includes rare variants. We demonstrated the performance of rvTWAS through thorough simulations and real data analysis of 3 psychiatric disorders: schizophrenia, bipolar disorder, and autism spectrum disorder. We confirmed that rvTWAS outperforms existing TWAS protocols and revealed additional genes underlying psychiatric disorders. In particular, we proposed a hypothetical mechanism in which zinc finger genes impact all 3 disorders through transcriptional regulation. rvTWAS opens the door to sequence-based association mapping that integrates gene expression.
Affiliation(s)
- Jingni He
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Qing Li
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Qingrun Zhang
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary T2N 1N4, Canada
- Department of Mathematics and Statistics, University of Calgary, Calgary T2N 1N4, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary T2N 1N4, Canada
- Arnie Charbonneau Cancer Institute, University of Calgary, Calgary T2N 1N4, Canada
19
Chen Z, Lin W, Cai Q, Kweon SS, Shu XO, Tanikawa C, Jia WH, Wang Y, Su X, Yuan Y, Wen W, Kim J, Shin A, Jee SH, Matsuo K, Kim DH, Wang N, Ping J, Shin MH, Ren Z, Oh JH, Oze I, Ahn YO, Jung KJ, Gao YT, Pan ZZ, Kamatani Y, Han W, Long J, Matsuda K, Zheng W, Guo X. A large-scale microRNA transcriptome-wide association study identifies two susceptibility microRNAs, miR-1307-5p and miR-192-3p, for colorectal cancer risk. Hum Mol Genet 2024; 33:333-341. [PMID: 37903058 PMCID: PMC10840382 DOI: 10.1093/hmg/ddad185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 07/20/2023] [Accepted: 10/24/2023] [Indexed: 11/01/2023]
Abstract
Transcriptome-wide association studies (TWAS) have identified many putative susceptibility genes for colorectal cancer (CRC) risk. However, susceptibility miRNAs, critical dysregulators of gene expression, remain unexplored. We genotyped DNA samples from 313 East Asian patients with CRC and performed small RNA sequencing on their normal colon tissues, distant from tumors, to build genetic models for predicting miRNA expression. We applied these models and data from genome-wide association studies (GWAS), including 23 942 cases and 217 267 controls of East Asian ancestry, to investigate associations of predicted miRNA expression with CRC risk. Perturbation experiments separately promoting and inhibiting miRNA expression, followed by further in vitro assays, were conducted in both SW480 and HCT116 cells. At a Bonferroni-corrected threshold of P < 4.5 × 10⁻⁴, we identified two putative susceptibility miRNAs, miR-1307-5p and miR-192-3p, located in regions more than 500 kb away from any GWAS-identified risk variants in CRC. We observed that high predicted expression of miR-1307-5p was associated with increased CRC risk, whereas low predicted expression of miR-192-3p was associated with increased CRC risk. Our experimental results further provide strong evidence for their roles in susceptibility by showing that miR-1307-5p and miR-192-3p respectively promote and inhibit CRC cell proliferation, migration, and invasion, consistently in both SW480 and HCT116 cells. Our study provides additional insights into the biological mechanisms underlying CRC development.
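Applying expression-prediction models to GWAS summary statistics is commonly done with an S-PrediXcan-style z-score, z_g = Σ_l w_l (σ_l / σ_g) z_l, where w_l are the prediction weights, z_l the GWAS z-scores, σ_l² = Γ_ll, and σ_g² = wᵀΓw for a reference-panel SNP covariance Γ. Whether this study used exactly this formula is an assumption; the sketch below only shows the arithmetic, plus a two-sided P-value and a Bonferroni threshold α/m. All function names are hypothetical.

```python
import math


def twas_z(weights, gwas_z, snp_cov):
    """
    S-PrediXcan-style association z-score for one gene (or miRNA).
    weights: per-SNP prediction weights w_l
    gwas_z: per-SNP GWAS z-scores (beta / se), same SNPs and allele coding
    snp_cov: SNP genotype covariance matrix Gamma from a reference panel
    """
    m = len(weights)
    # Variance of predicted expression: sigma_g^2 = w^T Gamma w
    var_g = sum(weights[l] * weights[k] * snp_cov[l][k]
                for l in range(m) for k in range(m))
    if var_g <= 0.0:
        raise ValueError("predicted expression has zero variance")
    sigma_g = math.sqrt(var_g)
    return sum(weights[l] * math.sqrt(snp_cov[l][l]) / sigma_g * gwas_z[l]
               for l in range(m))


def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


if __name__ == "__main__":
    z = twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
    print(z, two_sided_p(z), bonferroni_threshold(0.05, 100))
```

For a single SNP the statistic reduces to the GWAS z-score (up to the sign of the weight); with several SNPs, the covariance term keeps correlated SNPs from being double-counted.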
Affiliation(s)
- Zhishan Chen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Weiqiang Lin
- International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, No. N1, Shangcheng Avenue, Yiwu, 322000 China
- Qiuyin Cai
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Sun-Seog Kweon
- Department of Preventive Medicine, Chonnam National University Medical School, 160, Baekseo-ro, Dong-gu, Gwangju 61469, South Korea
- Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Chizu Tanikawa
- Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, University of Tokyo, 4 Chome-6-1 Shirokanedai, Minato City, Tokyo 108-8639, Japan
- Wei-Hua Jia
- State Key Laboratory of Oncology in South China, Cancer Center, Sun Yat-sen University, No. 651 Dongfeng Road East, Guangzhou 510060, China
- Ying Wang
- International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, No. N1, Shangcheng Avenue, Yiwu, 322000 China
- Xinwan Su
- The Kidney Disease Center, the First Affiliated Hospital, Zhejiang University School of Medicine, 79 Qingchun Rd, Hangzhou, 310003 China
- Yuan Yuan
- The Kidney Disease Center, the First Affiliated Hospital, Zhejiang University School of Medicine, 79 Qingchun Rd, Hangzhou, 310003 China
- Wanqing Wen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Jeongseon Kim
- Department of Cancer Biomedical Science, Graduate School of Cancer Science and Policy, National Cancer Center, 323 Ilsan-ro, Ilsandong-gu, Goyang-si, 10408, Gyeonggi-do, South Korea
- Aesun Shin
- Department of Preventive Medicine, Seoul National University College of Medicine, Seoul National University Cancer Research Institute, 03 Daehak-ro, Jongno-gu, 03080, Seoul, Korea
- Sun Ha Jee
- Department of Epidemiology and Health Promotion, Graduate School of Public Health, Yonsei University, 50-1, Yonsei-Ro, Seodaemun-gu, Seoul 03722, South Korea
- Keitaro Matsuo
- Division of Molecular and Clinical Epidemiology, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku Nagoya 464-8681, Japan
- Department of Epidemiology, Nagoya University Graduate School of Medicine, 65 Tsurumai-cho, Showa-ku, Nagoya, 466-8550, Japan
- Dong-Hyun Kim
- Department of Social and Preventive Medicine, Hallym University College of Medicine, Okcheon-dong, Chuncheon, 200-702 South Korea
- Nan Wang
- Department of General Surgery, Tangdu Hospital, the Air Force Medical University, 569 Xinsi Road, Xi'an, Shaanxi, 710038 China
- Jie Ping
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Min-Ho Shin
- Department of Preventive Medicine, Chonnam National University Medical School, 160, Baekseo-ro, Dong-gu, Gwangju 61469, South Korea
- Zefang Ren
- School of Public Health, Sun Yat-sen University, No. 74 Zhongshan Road 2, Yuexiu, Guangzhou, Guangdong 510080 China
- Jae Hwan Oh
- Center for Colorectal Cancer, National Cancer Center Hospital, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang-si, Gyeonggi-do,10408, South Korea
- Isao Oze
- Division of Cancer Epidemiology and Prevention, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku Nagoya 464-8681, Japan
- Yoon-Ok Ahn
- Department of Preventive Medicine, Seoul National University College of Medicine, Seoul National University Cancer Research Institute, 03 Daehak-ro, Jongno-gu, 03080, Seoul, Korea
- Keum Ji Jung
- Department of Epidemiology and Health Promotion, Graduate School of Public Health, Yonsei University, 50-1, Yonsei-Ro, Seodaemun-gu, Seoul 03722, South Korea
- Yu-Tang Gao
- State Key Laboratory of Oncogene and Related Genes & Department of Epidemiology, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, Shanghai, China
- Zhi-Zhong Pan
- State Key Laboratory of Oncology in South China, Cancer Center, Sun Yat-sen University, No. 651 Dongfeng Road East, Guangzhou 510060, China
- Yoichiro Kamatani
- Laboratory for Statistical Analysis, RIKEN Center for Integrative Medical Sciences, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0045, Japan
- Kyoto-McGill International Collaborative School in Genomic Medicine, Kyoto University Graduate School of Medicine, Yoshida-Konoe-cho, Sakyo-ku, Kyoto 606-8501, Japan
- Weidong Han
- Department of Medical Oncology, Sir Run Run Shaw Hospital, Zhejiang University College of Medicine, Xiasha Road, Hangzhou, 310018 China
- Jirong Long
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Koichi Matsuda
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken 277-8562, Japan
- Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Ave, Nashville, TN 37203, United States
20
Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024]
Abstract
MOTIVATION: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS lack statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights into disease genes. RESULTS: We present an overview of the design choices and pitfalls that prove crucial in applying network propagation methods to GWAS summary statistics. We highlight general trends from the literature and present benchmark experiments that expand on these insights, using three diseases and five molecular networks as case studies. We verify that, if the GWAS summary statistics are of sufficient quality, gene-level scores based on GWAS P-values offer advantages over selecting a set of 'seed' disease genes that are not weighted by their associated P-values. Beyond that, the size and density of the networks prove to be important factors. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
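A minimal sketch of one common propagation scheme, random walk with restart, p ← (1 − α) W p + α p₀ with a column-normalized adjacency W, seeded with P-value-weighted gene scores (−log₁₀ P, normalized to sum to 1) rather than an unweighted seed set. The restart probability α, the score normalization, and the toy network are illustrative assumptions, not the review's benchmark settings.

```python
import math


def column_normalize(adj):
    """Turn an adjacency matrix into a column-stochastic transition matrix."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def p_value_gene_scores(pvalues):
    """Hypothetical seed vector: -log10(P) per gene, normalized to sum to 1."""
    scores = [-math.log10(p) for p in pvalues]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("need at least one P-value below 1")
    return [s / total for s in scores]


def random_walk_with_restart(adj, p0, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol."""
    w = column_normalize(adj)
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        new_p = [(1.0 - restart) * sum(w[i][j] * p[j] for j in range(n))
                 + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


if __name__ == "__main__":
    path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    seeds = p_value_gene_scores([0.01, 0.1, 1.0])
    print(random_walk_with_restart(path_graph, seeds))
```

On the three-node path graph seeded entirely on node 0 (p₀ = [1, 0, 0]), the scores converge to 7/12, 1/3, and 1/12: signal spreads to neighbors, while the restart term keeps it anchored to the GWAS evidence.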
Affiliation(s)
- Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany
21
Cao C, Shao M, Zuo C, Kwok D, Liu L, Ge Y, Zhang Z, Cui F, Chen M, Fan R, Ding Y, Jiang H, Wang G, Zou Q. RAVAR: a curated repository for rare variant-trait associations. Nucleic Acids Res 2024; 52:D990-D997. [PMID: 37831073 PMCID: PMC10767942 DOI: 10.1093/nar/gkad876] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/15/2023] [Revised: 09/20/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023]
Abstract
Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank-scale datasets and accurate gene-level rare variant-trait association testing methods has dramatically increased the number of rare variant associations detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address this issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits, manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information is provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.
Affiliation(s)
- Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Chunman Zuo
- Institute of Artificial Intelligence, Donghua University, Shanghai, China
- Devin Kwok
- School of Computer Science, McGill University, Montreal, Canada
- Lin Liu
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Yuli Ge
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Mingshuai Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Rui Fan
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Hangjin Jiang
- Center for Data Science, Zhejiang University, Hangzhou, China
- Guishen Wang
- College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
22
He J, Antonyan L, Zhu H, Ardila K, Li Q, Enoma D, Zhang W, Liu A, Chekouo T, Cao B, MacDonald ME, Arnold PD, Long Q. A statistical method for image-mediated association studies discovers genes and pathways associated with four brain disorders. Am J Hum Genet 2024; 111:48-69. [PMID: 38118447 PMCID: PMC10806749 DOI: 10.1016/j.ajhg.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 11/04/2023] [Accepted: 11/16/2023] [Indexed: 12/22/2023]
Abstract
Brain imaging and genomics are critical tools enabling characterization of the genetic basis of brain disorders. However, imaging large cohorts is expensive and may be unavailable for legacy datasets used for genome-wide association studies (GWASs). Using an integrated feature selection/aggregation model, we developed an image-mediated association study (IMAS), which utilizes borrowed imaging/genomics data to conduct association mapping in legacy GWAS cohorts. By leveraging the UK Biobank image-derived phenotypes (IDPs), the IMAS discovered genetic bases underlying four neuropsychiatric disorders and verified them by analyzing annotations, pathways, and expression quantitative trait loci (eQTLs). A cerebellar-mediated mechanism was identified to be common to the four disorders. Simulations show that, if the goal is identifying genetic risk, our IMAS is more powerful than a hypothetical protocol in which the imaging results were available in the GWAS dataset. This implies the feasibility of reanalyzing legacy GWAS datasets without conducting additional imaging, yielding cost savings for integrated analysis of genetics and imaging.
Affiliation(s)
- Jingni He
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Lilit Antonyan
- Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Harold Zhu
- Department of Biological Sciences, Faculty of Science, University of Calgary, Calgary, AB, Canada
- Karen Ardila
- Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada
- Qing Li
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- David Enoma
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Andy Liu
- Sir Winston Churchill High School, Calgary, AB, Canada; College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Thierry Chekouo
- Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, AB, Canada; Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Bo Cao
- Department of Psychiatry, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- M Ethan MacDonald
- The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Electrical and Software Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Paul D Arnold
- Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Psychiatry, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Quan Long
- Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, AB, Canada
23
Chhotaray S, Vohra V, Uttam V, Santhosh A, Saxena P, Gahlyan RK, Gowane G. TWAS revealed significant causal loci for milk production and its composition in Murrah buffaloes. Sci Rep 2023; 13:22401. [PMID: 38104199 PMCID: PMC10725422 DOI: 10.1038/s41598-023-49767-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/20/2023] [Accepted: 12/12/2023] [Indexed: 12/19/2023]
Abstract
Milk yield is the most complex trait in dairy animals, and mapping all causal variants, even those with the smallest effect sizes, has been difficult with the genome-wide association study (GWAS) sample sizes available in regions with small livestock holdings, such as the Indian subcontinent. However, transcriptome-wide association studies (TWAS) could serve as an alternative for fine mapping of expression quantitative trait loci (eQTLs). This is the first attempt to identify genes related to milk production and its composition using TWAS in Murrah buffaloes (Bubalus bubalis). TWAS was conducted on a test set (N = 136) of Murrah buffaloes genotyped through ddRAD sequencing. Their gene expression levels were predicted using reference animals (N = 8) having both genotype and mammary epithelial cell (MEC) transcriptome information. Gene expression prediction was performed using Elastic-Net and Dirichlet Process Regression (DPR) models, both with fivefold cross-validation and without cross-validation. The DPR model without cross-validation predicted 80.92% of the total genes in the test group of Murrah buffaloes, the highest among the methods compared. TWAS in test individuals, based on predicted gene expression, identified a significant association of one unique gene for Fat% and two for SNF% at the Bonferroni-corrected threshold. The false discovery rate (FDR)-corrected P-values of the top ten SNPs identified through GWAS were comparatively higher than those from TWAS. Gene ontology analysis of the TWAS-identified genes revealed that milk production and composition genes were mainly involved in the Relaxin, AMPK, and JAK-STAT signaling pathways, along with CCRI and several key metabolic processes. The present study indicates that TWAS offers a lower false discovery rate and more significant hits than GWAS for milk production and its composition traits. Hence, it is concluded that TWAS can be effectively used to identify genes and cis-SNPs in a population, which can be used to develop a low-density genomic chip for predicting milk production in Murrah buffaloes.
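The abstract contrasts FDR-corrected P-values with a Bonferroni-corrected threshold. The sketch below shows both adjustments side by side; using the Benjamini-Hochberg procedure for the FDR step is an assumption, since the abstract does not name the procedure, and the function names are hypothetical.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni-adjusted P-values: min(1, p * m)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_adjust(pvalues):
    """
    Benjamini-Hochberg adjusted P-values (q-values), returned in input order.
    Walks ranks from largest to smallest, keeping a running minimum of p * m / rank
    so the adjusted values stay monotone in the raw P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni_adjust(raw))
    print("BH/FDR:   ", bh_adjust(raw))
```

For the toy P-values [0.01, 0.04, 0.03, 0.005], Bonferroni gives approximately [0.04, 0.16, 0.12, 0.02] while Benjamini-Hochberg gives approximately [0.02, 0.04, 0.04, 0.02], illustrating why FDR control is the less conservative of the two.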
Affiliation(s)
- Supriya Chhotaray
- Division of Animal Genetics and Breeding, ICAR-Central Institute for Research on Buffaloes, Hisar, Haryana, 125001, India
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
- Vikas Vohra
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
- Vishakha Uttam
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
- Ameya Santhosh
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
- Punjika Saxena
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
- Rajesh Kumar Gahlyan
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
- Gopal Gowane
- Animal Genetics and Breeding Division, ICAR-National Dairy Research Institute, Karnal, Haryana, 132001, India
24
Bhattacharya A, Vo DD, Jops C, Kim M, Wen C, Hervoso JL, Pasaniuc B, Gandal MJ. Isoform-level transcriptome-wide association uncovers genetic risk mechanisms for neuropsychiatric disorders in the human brain. Nat Genet 2023; 55:2117-2128. [PMID: 38036788 PMCID: PMC10703692 DOI: 10.1038/s41588-023-01560-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 09/09/2022] [Accepted: 10/05/2023] [Indexed: 12/02/2023]
Abstract
Methods integrating genetics with transcriptomic reference panels prioritize risk genes and mechanisms at only a fraction of trait-associated genetic loci, due in part to an overreliance on total gene expression as a molecular outcome measure. This challenge is particularly relevant for the brain, in which extensive splicing generates multiple distinct transcript-isoforms per gene. Due to complex correlation structures, isoform-level modeling from cis-window variants requires methodological innovation. Here we introduce isoTWAS, a multivariate, stepwise framework integrating genetics, isoform-level expression and phenotypic associations. Compared to gene-level methods, isoTWAS improves both isoform and gene expression prediction, yielding more testable genes, and increased power for discovery of trait associations within genome-wide association study loci across 15 neuropsychiatric traits. We illustrate multiple isoTWAS associations undetectable at the gene-level, prioritizing isoforms of AKT3, CUL3 and HSPD1 in schizophrenia and PCLO with multiple disorders. Results highlight the importance of incorporating isoform-level resolution within integrative approaches to increase discovery of trait associations, especially for brain-relevant traits.
Affiliation(s)
- Arjun Bhattacharya
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Institute for Data Science in Oncology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Daniel D Vo
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Connor Jops
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Minsoo Kim
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Cindy Wen
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Jonatan L Hervoso
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Michael J Gandal
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Lifespan Brain Institute at Penn Med and the Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
25
Shi M, Tanikawa C, Munter HM, Akiyama M, Koyama S, Tomizuka K, Matsuda K, Lathrop GM, Terao C, Koido M, Kamatani Y. Genotype imputation accuracy and the quality metrics of the minor ancestry in multi-ancestry reference panels. Brief Bioinform 2023; 25:bbad509. [PMID: 38221906 PMCID: PMC10788679 DOI: 10.1093/bib/bbad509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
Large-scale imputation reference panels are currently available and have contributed to efficient genome-wide association studies through genotype imputation. However, whether large multi-ancestry or small population-specific reference panels are the optimal choice for under-represented populations continues to be debated. We imputed genotypes of East Asian (180k Japanese) subjects using the Trans-Omics for Precision Medicine reference panel and found that the standard imputation quality metric (Rsq) overestimated dosage r² (the squared correlation between imputed dosage and true genotype), particularly in marginal-quality bins. Variance component analysis of Rsq revealed that increased imputed-genotype certainty (dosages closer to 0, 1 or 2) caused upward bias, indicating some systematic bias in the imputation. Through systematic simulations using different template switching rates (θ value) in the hidden Markov model, we revealed that a lower θ value increased the imputed-genotype certainty and Rsq; however, dosage r² was insensitive to the θ value, thereby causing a deviation. In simulated reference panels with different sizes and ancestral diversities, the θ value estimates from Minimac decreased with the size of a single ancestry and increased with the ancestral diversity. Thus, Rsq could deviate from dosage r² for a subpopulation in the multi-ancestry panel, and the deviation reflects different imputed-dosage distributions. Finally, despite the impact of the θ value, distant ancestries in the reference panel contributed only a few additional variants passing a predefined Rsq threshold. We conclude that the θ value substantially impacts the imputed dosage and the imputation quality metric value.
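To make the Rsq-versus-dosage-r² distinction concrete, the sketch below computes both on toy data. Rsq is approximated here as the observed variance of diploid dosages divided by the Hardy-Weinberg expectation 2p̂(1 − p̂), with p̂ estimated from the dosages; this is a simplification of the Minimac metric (which is defined on haplotype dosages), and the function names and data are hypothetical. Dosage r² needs the true genotypes, so it is only available in evaluation settings like the one described.

```python
def _mean(values):
    return sum(values) / len(values)


def rsq_estimate(dosages):
    """Simplified Rsq: observed dosage variance over the HWE-expected 2p(1-p)."""
    mu = _mean(dosages)
    p = mu / 2.0  # alternate-allele frequency estimated from the dosages
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic site: metric undefined, reported as 0 here
    observed_var = sum((d - mu) ** 2 for d in dosages) / len(dosages)
    return observed_var / (2.0 * p * (1.0 - p))


def dosage_r2(dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosage and true genotype."""
    md = _mean(dosages)
    mg = _mean(true_genotypes)
    cov = sum((d - md) * (g - mg) for d, g in zip(dosages, true_genotypes))
    var_d = sum((d - md) ** 2 for d in dosages)
    var_g = sum((g - mg) ** 2 for g in true_genotypes)
    if var_d == 0.0 or var_g == 0.0:
        return 0.0
    return cov * cov / (var_d * var_g)


if __name__ == "__main__":
    true_g = [0, 1, 1, 2]
    confident_but_wrong = [0.0, 2.0, 0.0, 2.0]
    print(rsq_estimate(confident_but_wrong), dosage_r2(confident_but_wrong, true_g))
```

With the confident but partly wrong dosages [0, 2, 0, 2] against true genotypes [0, 1, 1, 2], this simplified Rsq is 2.0 while dosage r² is 0.5: a toy case in which dosages pushed toward 0, 1 or 2 inflate the variance-based metric above the actual accuracy.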
Affiliation(s)
- Mingyang Shi
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Chizu Tanikawa
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Hans Markus Munter
- Victor Phillip Dahdaleh Institute of Genomic Medicine, McGill University, Montreal, Québec, Canada
| | - Masato Akiyama
- Department of Ocular Pathology and Imaging Science, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
| | - Satoshi Koyama
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Kohei Tomizuka
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Koichi Matsuda
- Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
| | - Gregory Mark Lathrop
- Victor Phillip Dahdaleh Institute of Genomic Medicine, McGill University, Montreal, Québec, Canada
| | - Chikashi Terao
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Masaru Koido
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Yoichiro Kamatani
- Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
26
Majumdar A, Pasaniuc B. A Bayesian method for estimating gene-level polygenicity under the framework of transcriptome-wide association study. Stat Med 2023; 42:4867-4885. [PMID: 37643728 DOI: 10.1002/sim.9892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 06/03/2023] [Accepted: 08/10/2023] [Indexed: 08/31/2023]
Abstract
Polygenicity refers to the phenomenon that multiple genetic variants have a nonzero effect on a complex trait; it is quantified as the proportion of genetic variants with a nonzero effect on the trait. Evaluating polygenicity can provide valuable insights into the genetic architecture of a trait. Several recent works have attempted to estimate polygenicity at the single nucleotide polymorphism (SNP) level. However, evaluating polygenicity at the gene level can be biologically more meaningful. We propose the notion of gene-level polygenicity, defined as the proportion of genes having a nonzero effect on the trait under the framework of a transcriptome-wide association study. We introduce a Bayesian approach, genepoly, to estimate this quantity for a trait. The method is based on a spike-and-slab prior and simultaneously estimates the subset of non-null genes. Our simulation study shows that genepoly efficiently estimates gene-level polygenicity. The method produces a downward bias when the trait heritability attributable to a non-null gene is small, but this bias diminishes rapidly as the genome-wide association study (GWAS) sample size increases. In identifying the subset of non-null genes, genepoly offers high specificity and generally good sensitivity; the sensitivity increases with the sample sizes of the reference-panel expression data and the GWAS data. We applied the method to seven phenotypes in the UK Biobank, integrating expression data, and found height to be the most polygenic and asthma the least polygenic.
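Gene-level polygenicity as defined here is the fraction of non-null genes. The sketch below draws gene effects from a spike-and-slab prior (a point mass at zero mixed with a normal "slab") and recovers that fraction. The helper names, the normal slab and the mean-of-posterior-inclusion-probabilities estimator are illustrative assumptions, not genepoly's implementation.

```python
import random


def sample_spike_and_slab(n_genes, pi, slab_sd, rng):
    """Draw gene effects from a spike-and-slab prior.

    With probability pi a gene is non-null and its effect comes from the
    normal slab N(0, slab_sd^2); otherwise it sits on the spike at exactly 0.
    """
    return [rng.gauss(0.0, slab_sd) if rng.random() < pi else 0.0
            for _ in range(n_genes)]


def gene_level_polygenicity(effects):
    """Proportion of genes whose effect on the trait is nonzero."""
    return sum(1 for b in effects if b != 0.0) / len(effects)


def polygenicity_from_pips(pips):
    """Hypothetical plug-in estimate: mean posterior inclusion probability."""
    return sum(pips) / len(pips)


if __name__ == "__main__":
    rng = random.Random(2023)
    effects = sample_spike_and_slab(10_000, pi=0.05, slab_sd=0.1, rng=rng)
    print(f"realised gene-level polygenicity: {gene_level_polygenicity(effects):.3f}")
```

In a fitted spike-and-slab model the true effects are unknown, which is why a Bayesian method works with posterior inclusion probabilities rather than counting nonzero effects directly.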
Affiliation(s)
- Arunabha Majumdar
  - Department of Mathematics, Indian Institute of Technology Hyderabad, Kandi, Telangana, India
- Bogdan Pasaniuc
  - Department of Pathology and Laboratory Medicine, University of California, Los Angeles, Los Angeles, California
27
Kim S, Qin Y, Park HJ, Yue M, Xu Z, Forno E, Chen W, Celedón JC. Methyl-TWAS: A powerful method for in silico transcriptome-wide association studies (TWAS) using long-range DNA methylation. bioRxiv 2023:2023.11.10.566586. [PMID: 38014125 PMCID: PMC10680683 DOI: 10.1101/2023.11.10.566586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
In silico transcriptome-wide association studies (TWAS) are commonly used to test whether expression of specific genes is linked to a complex trait. However, genotype-based in silico TWAS methods such as PrediXcan exhibit low prediction accuracy for most genes because genotypic data lack tissue and disease specificity and are not affected by the environment. Because methylation is tissue-specific and, like gene expression, can be modified by environment or disease status, methylation should predict gene expression more accurately than SNPs. We therefore propose Methyl-TWAS, the first approach that uses long-range methylation markers to impute gene expression for in silico TWAS through penalized regression. Methyl-TWAS 1) predicts epigenetically regulated/associated expression (eGReX), which incorporates tissue-specific expression and both genetically regulated (GReX) and environmentally regulated expression, to identify differentially expressed genes (DEGs) that could not be identified by genotype-based methods; and 2) incorporates both cis- and trans-CpGs, including those in various regulatory regions, to identify DEGs that would be missed using cis-methylation only. Methyl-TWAS outperforms PrediXcan and two other methods in imputing gene expression in the nasal epithelium, particularly for immunity-related genes and DEGs in atopic asthma. Methyl-TWAS identified 3,681 (85.2%) of the 4,316 DEGs identified in a previous TWAS of atopic asthma using measured expression, whereas PrediXcan could not identify any gene. Methyl-TWAS also outperforms PrediXcan for expression imputation and in silico TWAS in white blood cells. Methyl-TWAS is a valuable tool for in silico TWAS, leveraging the growing body of publicly available genome-wide DNA methylation data for a variety of human tissues.
Affiliation(s)
- Soyeon Kim
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Yidi Qin
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Hyun Jung Park
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Molin Yue
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Zhongli Xu
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
  - School of Medicine, Tsinghua University, Beijing, China
- Erick Forno
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Wei Chen
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Juan C. Celedón
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
28
Hu H, Zhao H, Zhong T, Dong X, Wang L, Han P, Li Z. Adaptive deep propagation graph neural network for predicting miRNA-disease associations. Brief Funct Genomics 2023; 22:453-462. [PMID: 37078739 DOI: 10.1093/bfgp/elad010] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/25/2022] [Revised: 02/13/2023] [Accepted: 03/09/2023] [Indexed: 04/21/2023]
Abstract
BACKGROUND Numerous experiments have shown that abnormal miRNA expression is closely related to the occurrence, diagnosis and treatment of diseases. Identifying associations between miRNAs and diseases is therefore important for clinical applications in complex human diseases. However, traditional biological experiments and existing computational methods have many limitations, which has motivated the development of more efficient and accurate deep learning methods for predicting miRNA-disease associations. RESULTS In this paper, we propose a novel model based on an adaptive deep propagation graph neural network to predict miRNA-disease associations (ADPMDA). We first construct a miRNA-disease heterogeneous graph from known miRNA-disease pairs, integrated miRNA similarity information, miRNA sequence information and disease similarity information. We then project the features of miRNAs and diseases into a low-dimensional space. Next, an attention mechanism is used to aggregate the local features of central nodes. In particular, an adaptive deep propagation graph neural network is employed to learn node embeddings, adaptively balancing the local and global information of nodes. Finally, a multi-layer perceptron is used to score miRNA-disease pairs. CONCLUSION Experiments on the Human microRNA Disease Database (HMDD) v3.0 dataset show that ADPMDA achieves a mean AUC of 94.75% under 5-fold cross-validation. We further conducted case studies on esophageal neoplasms, lung neoplasms and lymphoma to confirm the effectiveness of the proposed model; 49, 49 and 47 of the top 50 predicted miRNAs associated with these diseases, respectively, were confirmed. These results demonstrate the effectiveness and superiority of our model in predicting miRNA-disease associations.
Affiliation(s)
- Hua Hu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277122, China
- Huan Zhao
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221008, China
- Tangbo Zhong
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221008, China
- Xishang Dong
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277122, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277122, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning 541006, China
- Pengyong Han
  - Central Lab, Changzhi Medical College, Changzhi 046012, China
- Zhengwei Li
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277122, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Science, Nanning 541006, China
  - KUNPAND Communications (Kunshan) Co., Ltd., Suzhou 215300, China
29
Guo X, Ping J, Yang Y, Su X, Shu XO, Wen W, Chen Z, Zhang Y, Tao R, Jia G, He J, Cai Q, Zhang Q, Giles GG, Pearlman R, Rennert G, Vodicka P, Phipps A, Gruber SB, Casey G, Peters U, Long J, Lin W, Zheng W. Large-scale alternative polyadenylation (APA)-wide association studies to identify putative susceptibility genes in human common cancers. medRxiv 2023:2023.11.05.23298125. [PMID: 37986797 PMCID: PMC10659493 DOI: 10.1101/2023.11.05.23298125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Alternative polyadenylation (APA) modulates mRNA processing in the 3' untranslated regions (3'UTRs), which affects mRNA stability and translation efficiency. Here, we build genetic models to predict APA levels in multiple tissues using sequencing data from 1,337 samples in the Genotype-Tissue Expression (GTEx) project, and apply these models to assess associations between genetically predicted APA levels and cancer risk, using data from large genome-wide association studies of six common cancers (breast, ovary, prostate, colorectum, lung, and pancreas) in European-ancestry populations. At a Bonferroni-corrected P < 0.05, we identify 58 risk genes, including seven in newly identified loci. Using luciferase reporter assays, we demonstrate that risk alleles of the 3'UTR variants rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38) and rs145220637 (LDAH) significantly increase the post-transcriptional activities of their target genes compared with reference alleles. Further gene knockdown experiments confirm their oncogenic roles. Our study provides additional insight into the genetic susceptibility of these common cancers.
30
Li J, Ma S, Pei H, Jiang J, Zou Q, Lv Z. Review of T cell proliferation regulatory factors in treatment and prognostic prediction for solid tumors. Heliyon 2023; 9:e21329. [PMID: 37954355 PMCID: PMC10637962 DOI: 10.1016/j.heliyon.2023.e21329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/15/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023]
Abstract
T cell proliferation regulators (Tcprs), positive regulators that promote T cell function, have contributed greatly to the development of therapies that improve T cell function. Chimeric antigen receptor (CAR) T cell therapy, a type of adoptive cell transfer therapy that targets tumor cells and enhances immune-mediated killing, has led to significant progress in the treatment of hematologic tumors. However, the application of CAR-T therapy to solid tumors remains limited. In this review, we therefore focus on the development of Tcprs for solid tumor therapy and prognostic prediction. We summarize potential strategies for targeting different Tcprs to enhance T cell proliferation and activation and to inhibit cancer progression, thereby improving the antitumor activity and persistence of CAR-T cells. In summary, we propose means of enhancing CAR-T cells by expressing different Tcprs, which may lead to the development of a new generation of cell therapies.
Affiliation(s)
- Jiayu Li
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
  - College of Life Science, Sichuan University, Chengdu 610065, China
- Shuhan Ma
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Hongdi Pei
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Jici Jiang
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zhibin Lv
  - Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
31
Ding Y, Zhou H, Zou Q, Yuan L. Identification of drug-side effect association via correntropy-loss based matrix factorization with neural tangent kernel. Methods 2023; 219:73-81. [PMID: 37783242 DOI: 10.1016/j.ymeth.2023.09.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 09/18/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023]
Abstract
Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. Monitoring drug side effects provides important support for post-marketing drug safety supervision and an important basis for revising drug instructions; its purpose is to detect and control drug safety risks in a timely manner. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine learning method, correntropy-loss based matrix factorization with neural tangent kernel (CLMF-NTK), to predict drug side effects. Our method and other computational methods were tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
Affiliation(s)
- Yijie Ding
  - Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Hongmei Zhou
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Lei Yuan
  - Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100# Minjiang Main Road, Quzhou 324000, China
32
Lundberg M, Sng LMF, Szul P, Dunne R, Bayat A, Burnham SC, Bauer DC, Twine NA. Novel Alzheimer's disease genes and epistasis identified using machine learning GWAS platform. Sci Rep 2023; 13:17662. [PMID: 37848535 PMCID: PMC10582044 DOI: 10.1038/s41598-023-44378-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/04/2023] [Accepted: 10/07/2023] [Indexed: 10/19/2023]
Abstract
Alzheimer's disease (AD) is a complex genetic disease, and variants identified through genome-wide association studies (GWAS) explain only part of its heritability. Epistasis has been proposed as a major contributor to this 'missing heritability'; however, many current methods can model only additive effects. We use VariantSpark, a machine learning approach to GWAS, and BitEpi, a tool for epistasis detection, to identify AD-associated variants and interactions across two independent cohorts, ADNI and UK Biobank. By incorporating significant epistatic interactions, we captured 10.41% more phenotypic variance than logistic regression (LR). We validate the well-established AD locus APOE and identify two novel genome-wide significant AD-associated loci in both cohorts, SH3BP4 and SASH1, which are also in significant epistatic interactions with APOE. We show that the SH3BP4 SNP has a modulating effect on the known pathogenic APOE SNP, demonstrating a possible protective mechanism against AD. SASH1 is involved in a triplet interaction with the pathogenic APOE SNP and ACOT11, in which the SASH1 SNP lowered the pathogenic interaction effect between ACOT11 and APOE. Finally, we demonstrate that VariantSpark detects disease associations with 80% fewer controls than LR, unlocking discoveries in well-annotated but smaller cohorts.
Affiliation(s)
- Mischa Lundberg
  - Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW, Australia
  - UQ Frazer Institute, The University of Queensland, Woolloongabba, QLD, Australia
  - Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, Australia
- Letitia M F Sng
  - Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW, Australia
- Piotr Szul
  - Health Data Semantics and Interoperability, Commonwealth Scientific and Industrial Research Organisation AU, Brisbane, QLD, Australia
- Rob Dunne
  - Data61, Commonwealth Scientific and Industrial Research Organisation, Brisbane, QLD, Australia
- Arash Bayat
  - The Kinghorn Cancer Center (KCCG), Garvan Institute of Medical Research, Sydney, NSW, Australia
- Denis C Bauer
  - Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW, Australia
  - Department of Biomedical Sciences, Faculty of Medicine and Health Science, Macquarie University, Macquarie Park, NSW, Australia
  - Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, Australia
- Natalie A Twine
  - Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Sydney, NSW, Australia
  - Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW, Australia
33
Shao M, Zhang Z, Sun H, He J, Wang J, Zhang Q, Cao C. Editorial: Statistical methods for genome-wide association studies (GWAS) and transcriptome-wide association studies (TWAS) and their applications. Front Genet 2023; 14:1287673. [PMID: 37766879 PMCID: PMC10520498 DOI: 10.3389/fgene.2023.1287673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023]
Affiliation(s)
- Mengting Shao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou, China
- Huiyan Sun
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Jingni He
  - Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
- Juexin Wang
  - Department of Biohealth Informatics, Indiana University Purdue University Indianapolis, Indianapolis, IN, United States
- Qingrun Zhang
  - Department of Mathematics and Statistics, University of Calgary, Calgary, AB, Canada
- Chen Cao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
34
Lu M, Feng R, Zhang C, Xiao Y, Yin C. Identifying Novel Drug Targets for Epilepsy Through a Brain Transcriptome-Wide Association Study and Protein-Wide Association Study with Chemical-Gene-Interaction Analysis. Mol Neurobiol 2023; 60:5055-5066. [PMID: 37246165 PMCID: PMC10415436 DOI: 10.1007/s12035-023-03382-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2022] [Accepted: 05/04/2023] [Indexed: 05/30/2023]
Abstract
Epilepsy is a severe neurological condition that affects 50-65 million people worldwide and can lead to brain damage, yet its etiology remains poorly understood. Meta-analyses of genome-wide association studies involving 15,212 epilepsy cases and 29,677 controls from the ILAE Consortium cohort were used to conduct transcriptome-wide association studies (TWAS) and protein-wide association studies (PWAS). A protein-protein interaction (PPI) network was generated using the STRING database, and significant epilepsy-susceptibility genes were verified using chip data. Chemical-related gene set enrichment analysis (CGSEA) was performed to identify novel drug targets for epilepsy. The TWAS analysis identified 21,170 genes, of which 58 were significant (TWAS FDR < 0.05) in ten brain regions, and 16 differentially expressed genes were verified based on mRNA expression profiles. The PWAS identified 2,249 genes, of which 2 were significant (PWAS FDR < 0.05). Through chemical-gene set enrichment analysis, 287 environmental chemicals associated with epilepsy were identified. We identified five significant genes (WIPF1, IQSEC1, JAM2, ICAM3, and ZNF143) that had causal relationships with epilepsy. CGSEA identified 159 chemicals that were significantly correlated with epilepsy (CGSEA P < 0.05), such as pentobarbital, ketone bodies, and polychlorinated biphenyl. In summary, we performed TWAS and PWAS (for genetic factors) and CGSEA (for environmental factors) and identified several epilepsy-associated genes and chemicals. These results will contribute to our understanding of genetic and environmental factors in epilepsy and may help predict novel drug targets.
Affiliation(s)
- Mengnan Lu
  - Department of Pediatrics, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
- Ruoyang Feng
  - Department of Joint Surgery, HongHui Hospital, Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
- Chenglin Zhang
  - Department of Pediatrics, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
- Yanfeng Xiao
  - Department of Pediatrics, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
- Chunyan Yin
  - Department of Pediatrics, The Second Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710054, Shaanxi, China
35
Zhang D, Gao B, Feng Q, Manichaikul A, Peloso GM, Tracy RP, Durda P, Taylor KD, Liu Y, Johnson WC, Gabriel S, Gupta N, Smith JD, Aguet F, Ardlie KG, Blackwell TW, Gerszten RE, Rich SS, Rotter JI, Scott LJ, Zhou X, Lee S. Proteome-Wide Association Studies for Blood Lipids and Comparison with Transcriptome-Wide Association Studies. bioRxiv 2023:2023.08.17.553749. [PMID: 37662416 PMCID: PMC10473643 DOI: 10.1101/2023.08.17.553749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/05/2023]
Abstract
Blood lipid traits are treatable and heritable risk factors for heart disease, a leading cause of mortality worldwide. Although genome-wide association studies (GWAS) have discovered hundreds of variants associated with lipids in humans, most of the causal mechanisms underlying lipid levels remain unknown. To better understand the biological processes underlying lipid metabolism, we investigated the associations of plasma protein levels with total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) in blood. We trained protein prediction models on samples from the Multi-Ethnic Study of Atherosclerosis (MESA) and applied them to conduct proteome-wide association studies (PWAS) for lipids using Global Lipids Genetics Consortium (GLGC) data. Of the 749 proteins tested, 42 were significantly associated with at least one lipid trait. Furthermore, we performed transcriptome-wide association studies (TWAS) for lipids using 9,714 gene expression prediction models trained on samples of peripheral blood mononuclear cells (PBMCs) in MESA and 49 tissues in the Genotype-Tissue Expression (GTEx) project. We found that although PWAS and TWAS can show different directions of association for an individual gene, 40 of the 49 tissues showed a positive correlation between PWAS and TWAS signed p-values across all genes, which suggests a high level of consistency between proteome-lipid and transcriptome-lipid associations.
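The tissue-level comparison above correlates signed p-values of genes tested in both PWAS and TWAS. A minimal sketch follows, assuming a signed p-value is the sign of the association z score times -log10(p); the abstract does not state the exact transformation, and the function names and gene labels are hypothetical.

```python
import math


def signed_log10_p(z, p):
    """Signed p-value (assumed form): direction of association times -log10(p)."""
    return math.copysign(-math.log10(p), z)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pwas_twas_concordance(pwas, twas):
    """Correlate signed p-values over genes present in both analyses.

    pwas, twas: dicts mapping gene name -> (z score, p value).
    """
    shared = sorted(set(pwas) & set(twas))
    x = [signed_log10_p(*pwas[g]) for g in shared]
    y = [signed_log10_p(*twas[g]) for g in shared]
    return pearson(x, y)


if __name__ == "__main__":
    pwas = {"GENE_A": (3.1, 0.002), "GENE_B": (-2.5, 0.012),
            "GENE_C": (0.4, 0.69), "GENE_D": (4.0, 6e-5)}
    twas = {"GENE_A": (2.2, 0.028), "GENE_B": (-1.9, 0.057),
            "GENE_C": (-0.2, 0.84), "GENE_D": (3.3, 0.001), "GENE_E": (1.0, 0.3)}
    print(f"signed p-value correlation: {pwas_twas_concordance(pwas, twas):.3f}")
```

Genes tested in only one analysis (GENE_E here) are dropped, and a single gene with opposite signs (GENE_C) can coexist with a strongly positive overall correlation, which is the pattern the abstract reports.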
Affiliation(s)
- Daiwei Zhang
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI USA
- Boran Gao
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI USA
- Qidi Feng
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA USA
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI USA
- Ani Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- Gina M Peloso
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA USA
- Russell P Tracy
  - Departments of Pathology & Laboratory Medicine, and Biochemistry, Larner College of Medicine, University of Vermont, Burlington, VT USA
- Peter Durda
  - Departments of Pathology & Laboratory Medicine, Larner College of Medicine, The University of Vermont, Burlington, VT USA
- Kent D Taylor
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Yongmei Liu
  - Department of Medicine, Divisions of Cardiology and Neurology, Duke University Medical Center, Durham, NC USA
- W Craig Johnson
  - Department of Biostatistics, University of Washington, Seattle, WA USA
- Stacey Gabriel
  - Genomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA USA
- Namrata Gupta
  - Genomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA USA
- Joshua D Smith
  - Department of Genome Sciences, Human Genetics and Translational Genomics, The University of Washington, Seattle, WA, USA
- Francois Aguet
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA USA
- Kristin G Ardlie
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA USA
- Thomas W Blackwell
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO USA
- Robert E Gerszten
  - Genomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- Jerome I Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Laura J Scott
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI USA
- Xiang Zhou
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI USA
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI USA
36
Qian Y, Shang T, Guo F, Wang C, Cui Z, Ding Y, Wu H. Identification of DNA-binding protein based multiple kernel model. Math Biosci Eng 2023; 20:13149-13170. [PMID: 37501482 DOI: 10.3934/mbe.2023586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/29/2023]
Abstract
DNA-binding proteins (DBPs) play a critical role in the development of drugs for treating genetic diseases and in DNA biology research, so predicting DBPs accurately and efficiently is essential. In this paper, we propose a Laplacian Local Kernel Alignment-based Restricted Kernel Machine (LapLKA-RKM) to predict DBPs. Specifically, we first extract features from protein sequences using six methods. Second, the radial basis function (RBF) kernel is used to construct predefined kernel matrices. These matrices are then combined linearly with weights calculated by LapLKA. Finally, the fused kernel is input to the RKM for training and prediction. Independent tests and leave-one-out cross-validation were used to validate the performance of our method on one small dataset and two large datasets. We also built an online platform hosting our model, which is freely accessible via http://8.130.69.121:8082/.
Affiliation(s)
- Yuqing Qian
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Tingting Shang
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
- Chunliang Wang
- The Second Affiliated Hospital of Soochow University, Suzhou, China
- Zhiming Cui
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Hongjie Wu
- College of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
37
Dai Q, Zhou G, Zhao H, Võsa U, Franke L, Battle A, Teumer A, Lehtimäki T, Raitakari OT, Esko T, Epstein MP, Yang J. OTTERS: a powerful TWAS framework leveraging summary-level reference data. Nat Commun 2023; 14:1271. [PMID: 36882394 PMCID: PMC9992663 DOI: 10.1038/s41467-023-36862-w] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 07/07/2022] [Accepted: 02/20/2023] [Indexed: 03/09/2023]
Abstract
Most existing TWAS tools require individual-level eQTL reference data and thus cannot be applied to summary-level reference eQTL datasets. TWAS methods that can harness summary-level reference data would enable TWAS in broader settings and increase power through larger reference sample sizes. We therefore develop a TWAS framework called OTTERS (Omnibus Transcriptome Test using Expression Reference Summary data) that adapts multiple polygenic risk score (PRS) methods to estimate eQTL weights from summary-level eQTL reference data and conducts an omnibus TWAS. Through both simulations and application studies, we show that OTTERS is a practical and powerful TWAS tool.
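A summary-level TWAS test can be sketched in a few lines. The example below uses the common weighted z-score statistic z = w'z_GWAS / sqrt(w'Rw), where w are eQTL weights, z_GWAS are GWAS z-scores for the same SNPs and R is their LD matrix, and then combines the p-values from several weight sets with the Cauchy combination test (ACAT). Treating ACAT as the omnibus step is an assumption here, as are the toy weights, z-scores and LD values and the hypothetical names `method_A` and `method_B`; this is not the OTTERS implementation.

```python
import math
from statistics import NormalDist

def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w'Rw)."""
    num = sum(w * z for w, z in zip(weights, gwas_z))
    k = len(weights)
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(k) for j in range(k))
    return num / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * NormalDist().cdf(-abs(z))

def acat(pvalues, acat_weights=None):
    """Cauchy combination (ACAT) of possibly correlated p-values; weights must sum to 1."""
    k = len(pvalues)
    acat_weights = acat_weights or [1.0 / k] * k
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(acat_weights, pvalues))
    return 0.5 - math.atan(t) / math.pi

gwas_z = [2.1, 1.8, -0.4]
ld = [[1.0, 0.5, 0.1],
      [0.5, 1.0, 0.2],
      [0.1, 0.2, 1.0]]
# Hypothetical eQTL weights estimated by two different PRS-style methods
weight_sets = {"method_A": [0.30, 0.20, 0.00], "method_B": [0.25, 0.25, 0.05]}
pvals = {m: two_sided_p(twas_z(w, gwas_z, ld)) for m, w in weight_sets.items()}
omnibus_p = acat(list(pvals.values()))
print(pvals, omnibus_p)
```

ACAT is attractive for this step because it remains approximately valid when the combined tests are correlated, as tests built from the same GWAS summary statistics will be.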
Affiliation(s)
- Qile Dai
- Department of Biostatistics and Bioinformatics, Emory University School of Public Health, Atlanta, GA, 30322, USA
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Urmo Võsa
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 50090, Tartu, Estonia
- Lude Franke
- Department of Genetics, University of Groningen, University Medical Center Groningen, 9700 RB, Groningen, The Netherlands
- Oncode Institute, 3521 AL, Utrecht, The Netherlands
- Alexis Battle
- Departments of Computer Science and Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Alexander Teumer
- Institute for Community Medicine, University Medicine Greifswald, 17489, Greifswald, Germany
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories and Finnish Centre for Cardiovascular Disease Tampere, Faculty of Medicine and Health Technology, Tampere University, Tampere, 33520, Finland
- Olli T Raitakari
- Centre for Population Health Research, University of Turku and Turku University Hospital, 20520, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, 20520, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, 20521, Turku, Finland
- Tõnu Esko
- Estonian Genome Centre, Institute of Genomics, University of Tartu, 50090, Tartu, Estonia
- Michael P Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
38
CoNet: Efficient Network Regression for Survival Analysis in Transcriptome-Wide Association Studies—With Applications to Studies of Breast Cancer. Genes (Basel) 2023; 14:genes14030586. [PMID: 36980857 PMCID: PMC10048118 DOI: 10.3390/genes14030586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
Transcriptome-wide association studies (TWASs) aim to detect associations between genetically predicted gene expression and complex diseases or traits by integrating genome-wide association studies (GWASs) and expression quantitative trait loci (eQTL) mapping studies. Most current TWAS methods analyze one gene at a time, ignoring the correlations among genes, and few existing TWAS methods focus on survival outcomes. Here, we propose a novel method, a COx proportional hazards model for NEtwork regression in TWAS (CoNet), for identifying the association between a given network and survival time. Within a two-stage TWAS framework, CoNet treats the general relationships among predicted gene expression levels as the edges of the network and quantifies them using pointwise mutual information (PMI). Extensive simulation studies show that CoNet controls type I error when testing both node effects and edge effects, while achieving higher power than currently available methods. CoNet also shows superior performance in a real-data application to breast cancer survival data from the UK Biobank. By accounting for network structure, CoNet can simultaneously identify the nodes and edges potentially associated with survival outcomes in TWAS.
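PMI is defined for events as PMI(a, b) = log[p(a, b) / (p(a) p(b))]. The abstract does not say how CoNet turns continuous predicted expression into the probabilities PMI needs, so the sketch below assumes, purely for illustration, that each gene's predicted expression is dichotomized at its median and uses the PMI of the "both high" event as the edge weight. The helper names, the `eps` smoothing constant and the toy expression values are hypothetical.

```python
import math
from statistics import median

def high_indicator(values):
    """1 if a sample's predicted expression exceeds the gene's median, else 0."""
    m = median(values)
    return [1 if v > m else 0 for v in values]

def pmi_both_high(expr_a, expr_b, eps=1e-12):
    """PMI of the event (gene A high, gene B high) across samples.

    eps avoids log(0) when the joint event never occurs (hypothetical smoothing).
    """
    a, b = high_indicator(expr_a), high_indicator(expr_b)
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x & y for x, y in zip(a, b)) / n
    return math.log((p_ab + eps) / (p_a * p_b + eps))

# Toy predicted expression for two genes across six samples
expr_a = [0.1, 0.4, 0.9, 1.2, 0.3, 1.5]
expr_b = [0.2, 0.5, 1.1, 1.3, 0.1, 1.6]
edge_ab = pmi_both_high(expr_a, expr_b)  # positive: the genes tend to be high together
print(edge_ab)
```

A positive value means the two genes are jointly high more often than independence would predict; in a CoNet-style analysis such edge terms would enter the Cox model alongside node terms, a stage not shown here.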
39
Zhang Z, Xu J, Wu Y, Liu N, Wang Y, Liang Y. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief Bioinform 2023; 24:6889447. [PMID: 36511221 DOI: 10.1093/bib/bbac531] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/24/2022] [Revised: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022]
Abstract
Accumulating studies have shown that many long non-coding RNAs (lncRNAs) are crucial in a number of diseases. Predicting potential lncRNA-disease associations (LDAs) can facilitate disease prevention, diagnosis and treatment, so practical computational methods for LDA prediction are needed. In this study, we propose a novel predictor, capsule network (CapsNet)-LDA, for LDA prediction. CapsNet-LDA first uses a stacked autoencoder to acquire informative low-dimensional representations of lncRNA-disease pairs under multiple views, then uses an attention mechanism to adaptively assign importance weights to these representations, and finally processes them with a CapsNet-based architecture to predict LDAs. Conventional convolutional neural networks (CNNs) are limited by their use of scalar neurons and pooling operations. CapsNets instead use vector neurons, which are more robust to complex combinations of features, and use dynamic routing to iteratively update the coupling between capsule layers. In comparison experiments on four benchmark datasets, four perturbed datasets and an independent test set, CapsNet-LDA outperforms five other state-of-the-art models, demonstrating excellent performance, robustness against perturbation and good generalization ability. Ablation studies verify the effectiveness of several modules of CapsNet-LDA and show that multi-view data improve performance. Case studies further indicate that CapsNet-LDA can accurately predict novel LDAs for specific diseases.
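The vector-neuron idea can be made concrete with the squash nonlinearity and routing-by-agreement from the original capsule network formulation (Sabour et al., 2017). The NumPy sketch below is generic: the number of capsules, their dimensions, the random prediction vectors and the three routing iterations are assumptions, and it does not reproduce CapsNet-LDA's actual architecture.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vector length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s**2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing-by-agreement.

    u_hat: predictions from lower capsules, shape (n_in, n_out, dim_out).
    Returns output capsule vectors of shape (n_out, dim_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                      # routing logits
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)            # coupling coefficients: softmax over outputs
        s = np.einsum("io,iod->od", c, u_hat)        # weighted sum of predictions per output
        v = squash(s)                                # output capsule vectors
        b = b + np.einsum("iod,od->io", u_hat, v)    # raise logits where predictions agree with v
    return v

rng = np.random.default_rng(1)
# 8 input capsules, 2 output capsules (e.g. "associated" / "not associated"), 4-D outputs
u_hat = rng.normal(size=(8, 2, 4))
v = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=1)  # capsule length is read as class evidence
print(lengths)
```

Because squash maps every output vector to a length below 1, the lengths can be compared directly across output capsules, which is how a capsule layer makes its final prediction.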
Affiliation(s)
- Zequn Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Junlin Xu
- College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
- Yanan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Niannian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Yinglong Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
40
He R, Xue H, Pan W. Statistical power of transcriptome-wide association studies. Genet Epidemiol 2022; 46:572-588. [PMID: 35766062 PMCID: PMC9669108 DOI: 10.1002/gepi.22491] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2022] [Revised: 05/27/2022] [Accepted: 05/31/2022] [Indexed: 01/02/2023]
Abstract
Transcriptome-Wide Association Studies (TWASs) have become increasingly popular for identifying genes (or other endophenotypes or exposures) associated with complex traits. In a TWAS, one first builds a predictive model for gene expression using an expression quantitative trait loci (eQTL) dataset in stage 1, then tests the association between the predicted gene expression and a trait using a large, independent genome-wide association study (GWAS) dataset in stage 2. However, since the sample size of the eQTL dataset is usually small and the coefficient of multiple determination ($R^2$) of the model is also small for many genes, a question of interest is to what extent these factors affect the statistical power of TWAS. In addition, in contrast to a standard (univariate) TWAS (UV-TWAS) that considers only a single gene at a time, multivariate TWAS (MV-TWAS) methods have recently emerged that simultaneously account for the effects of multiple genes, or for a gene's nonlinear effects. In the absence of power analyses for these MV-TWAS methods, it is of interest to investigate whether one gains or loses power by using the newly proposed MV-TWAS instead of UV-TWAS. In this paper, we first outline a general method for sample size and power calculations for two-sample TWAS, then use real data to empirically address these questions: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and Genotype-Tissue Expression (GTEx) eQTL data for stage 1, and the International Genomics of Alzheimer's Project Alzheimer's disease (AD) GWAS summary data and UK Biobank (UKB) individual-level data for stage 2. Our most important conclusions are as follows. First, a sample size of a few thousand (~8000) would suffice in stage 1, where the power of TWAS would be determined more by the cis-heritability of gene expression.
Second, as in the general case of simple versus multiple regression, the power of MV-TWAS may be higher or lower than that of UV-TWAS, depending on the specific relationships among the GWAS trait and multiple genes (or linear and nonlinear terms of the same gene's expression levels), such as their correlations and effect sizes. Interestingly, several top genes with large power gains in MV-TWAS over UV-TWAS are known to be associated with AD, and were more significantly associated in our data. We reached similar conclusions in an application to GTEx whole-blood gene expression data and UKB GWAS data for high-density lipoprotein cholesterol. The proposed method and conclusions are expected to be useful in planning and designing future TWAS and related studies (e.g., Proteome- or Metabolome-Wide Association Studies) when determining the sample sizes for the two stages.
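How stage-2 power depends on prediction accuracy can be illustrated with a standard back-of-the-envelope calculation; this is not the authors' method. Assume a standardized trait whose variance explained by true expression is q², and a stage-1 model whose predictions have squared correlation R² with true expression. The predicted expression then explains roughly q²·R² of the trait variance, and a 1-df test in n GWAS samples has noncentrality NCP ≈ n·q²R² / (1 − q²R²). The values of n, q², R² and the Bonferroni threshold below are hypothetical.

```python
from statistics import NormalDist

def twas_power(n, q2, r2, alpha):
    """Approximate power of a univariate, two-sided, 1-df TWAS test.

    n: stage-2 GWAS sample size
    q2: trait variance explained by true expression (assumed)
    r2: squared correlation between predicted and true expression
    alpha: significance threshold (e.g. Bonferroni-corrected)
    """
    rho2 = q2 * r2                     # trait variance explained by predicted expression
    ncp = n * rho2 / (1.0 - rho2)      # noncentrality of the chi-square (1 df) statistic
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = ncp ** 0.5
    # Probability that |Z + shift| exceeds the two-sided critical value
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

alpha = 0.05 / 20000                   # hypothetical Bonferroni threshold over 20,000 genes
for r2 in (0.01, 0.05, 0.20):
    print(r2, round(twas_power(n=50_000, q2=0.001, r2=r2, alpha=alpha), 3))
```

Because q² and R² enter only through their product, a poorly predicted gene (small R²) needs a proportionally larger stage-2 sample to reach the same power, which is one way to see why stage-1 quality and cis-heritability matter.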
Affiliation(s)
- Ruoyu He
- School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Haoran Xue
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
41
He J, Wen W, Beeghly A, Chen Z, Cao C, Shu XO, Zheng W, Long Q, Guo X. Integrating transcription factor occupancy with transcriptome-wide association analysis identifies susceptibility genes in human cancers. Nat Commun 2022; 13:7118. [PMID: 36402776 PMCID: PMC9675749 DOI: 10.1038/s41467-022-34888-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/10/2022] [Accepted: 11/10/2022] [Indexed: 11/21/2022]
Abstract
Transcriptome-wide association studies (TWAS) have successfully discovered many putative disease susceptibility genes. However, TWAS may suffer from inaccurate gene expression predictions due to the inclusion of non-regulatory variants. By integrating prior knowledge of susceptible transcription factor occupied elements, we develop sTF-TWAS and demonstrate that it outperforms existing TWAS approaches in both simulation and real data analyses. Under the sTF-TWAS framework, we build genetic models to predict alternative splicing and gene expression in normal breast, prostate and lung tissues from the Genotype-Tissue Expression project and apply these models to data from large genome-wide association studies (GWAS) conducted in European-ancestry populations. At Bonferroni-corrected P < 0.05, we identify 354 putative susceptibility genes for these cancers, including 189 not previously reported within known GWAS loci and 45 located in loci not previously identified by GWAS. These findings provide additional insight into the genetic susceptibility of human cancers. We also show that sTF-TWAS generalizes to non-cancer diseases.
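The core idea of restricting a prediction model to variants inside prior-annotated regulatory elements can be sketched as a position filter followed by a penalized regression. For brevity the sketch below uses closed-form ridge regression on simulated genotypes; the paper's actual model-fitting procedure and its definition of susceptible TF-occupied elements are not reproduced, and the region intervals, penalty `lam` and simulated data are all hypothetical.

```python
import numpy as np

def in_annotated_regions(positions, regions):
    """Boolean mask: True where a variant position falls in any [start, end) interval."""
    positions = np.asarray(positions)
    mask = np.zeros(len(positions), dtype=bool)
    for start, end in regions:
        mask |= (positions >= start) & (positions < end)
    return mask

def ridge_weights(G, y, lam):
    """Closed-form ridge solution (G'G + lam*I)^-1 G'y on centered data."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    p = Gc.shape[1]
    return np.linalg.solve(Gc.T @ Gc + lam * np.eye(p), Gc.T @ yc)

rng = np.random.default_rng(2)
n_samples = 200
positions = np.array([100, 250, 400, 900, 1500, 2200])
# Simulated allele dosages (0/1/2) and expression driven by two of the variants
G = rng.binomial(2, 0.3, size=(n_samples, len(positions))).astype(float)
expr = 0.8 * G[:, 1] + 0.5 * G[:, 4] + rng.normal(size=n_samples)
regions = [(200, 500), (1400, 1600)]   # hypothetical TF-occupied elements
keep = in_annotated_regions(positions, regions)
w = ridge_weights(G[:, keep], expr, lam=1.0)
print(positions[keep], np.round(w, 2))
```

Variants outside the annotated intervals never receive a weight, which is how such a filter keeps non-regulatory variants out of the expression prediction.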
Affiliation(s)
- Jingni He
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
- Department of Oncology, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Wanqing Wen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Alicia Beeghly
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Zhishan Chen
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Chen Cao
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
- Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Quan Long
- Department of Biochemistry & Molecular Biology, University of Calgary, Calgary, Canada
- Department of Medical Genetics, University of Calgary, Calgary, Canada
- Department of Mathematics & Statistics, University of Calgary, Calgary, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Canada
- Hotchkiss Brain Institute, University of Calgary, Calgary, Canada
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, TN USA
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN USA
42
Lu M, Zhang Y, Yang F, Mai J, Gao Q, Xu X, Kang H, Hou L, Shang Y, Qain Q, Liu J, Jiang M, Zhang H, Bu C, Wang J, Zhang Z, Zhang Z, Zeng J, Li J, Xiao J. TWAS Atlas: a curated knowledgebase of transcriptome-wide association studies. Nucleic Acids Res 2022; 51:D1179-D1187. [PMID: 36243959 PMCID: PMC9825460 DOI: 10.1093/nar/gkac821] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/13/2022] [Revised: 09/08/2022] [Accepted: 09/14/2022] [Indexed: 01/30/2023]
Abstract
Transcriptome-wide association studies (TWASs), a practical and prevalent approach for detecting associations between genetically regulated genes and traits, are leading to a better understanding of the complex mechanisms by which genetic variants regulate various diseases and traits. Despite the ever-increasing TWAS output, databases curating massive public TWAS information and knowledge are still lacking. To fill this gap, we present TWAS Atlas (https://ngdc.cncb.ac.cn/twas/), an integrated knowledgebase of TWAS findings manually curated from extensive literature. In its current implementation, TWAS Atlas collects 401,266 high-quality human gene-trait associations from 200 publications, covering 22,247 genes and 257 traits across 135 tissue types. In particular, an interactive knowledge graph of the collected gene-trait associations is constructed together with single nucleotide polymorphism (SNP)-gene associations to build comprehensive regulatory networks at multi-omics levels. In addition, TWAS Atlas provides a user-friendly web interface that enables users to browse, search and download all association information, relevant research metadata and annotation information of interest. Taken together, TWAS Atlas is of great value for promoting the utility and availability of TWAS results in explaining the complex genetic basis of traits and for providing new insights into human health and disease research.
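Linking SNP-gene and gene-trait associations into one graph lets a user follow a variant through the genes it is associated with to the traits those genes are associated with. The toy sketch below shows that two-hop query with adjacency dictionaries. The class name and the identifiers (`rs_A`, `GENE1`, `TRAIT_X` and so on) are hypothetical placeholders rather than TWAS Atlas records, and this is not the knowledgebase's actual data model or API.

```python
from collections import defaultdict

class AssociationGraph:
    """Toy multi-omics graph with SNP -> gene and gene -> trait edges."""

    def __init__(self):
        self.snp_to_genes = defaultdict(set)
        self.gene_to_traits = defaultdict(set)

    def add_snp_gene(self, snp, gene):
        self.snp_to_genes[snp].add(gene)

    def add_gene_trait(self, gene, trait):
        self.gene_to_traits[gene].add(trait)

    def traits_for_snp(self, snp):
        """Sorted (trait, bridging gene) pairs reachable from a SNP in two hops."""
        return sorted(
            (trait, gene)
            for gene in self.snp_to_genes.get(snp, ())
            for trait in self.gene_to_traits.get(gene, ())
        )

g = AssociationGraph()
g.add_snp_gene("rs_A", "GENE1")
g.add_snp_gene("rs_A", "GENE2")
g.add_gene_trait("GENE1", "TRAIT_X")
g.add_gene_trait("GENE2", "TRAIT_Y")
g.add_gene_trait("GENE2", "TRAIT_X")
print(g.traits_for_snp("rs_A"))
```

Returning the bridging gene with each trait keeps the path explainable: a user sees not only that a variant reaches a trait but through which gene.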
Affiliation(s)
- Qianwen Gao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaowei Xu
- Institute of Medical Information, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
- Hongyu Kang
- Institute of Medical Information, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
- Li Hou
- Institute of Medical Information, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing 100020, China
- Yunfei Shang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Qiheng Qain
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jie Liu
- North China University of Science and Technology Affiliated Hospital, Tangshan 063000, China
- Meiye Jiang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Congfan Bu
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Jinyue Wang
- Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Zhewen Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zaichao Zhang
- Department of Biology, The University of Western Ontario, London, Ontario N6A 5B7, Canada
- Jingyao Zeng
- Correspondence may also be addressed to Jingyao Zeng.
- Jiao Li
- Correspondence may also be addressed to Jiao Li.
- Jingfa Xiao
- To whom correspondence should be addressed. Tel: +86 10 8409 7443; Fax: +86 10 8409 7720.
43
Bhattacharya A, Hirbo JB, Zhou D, Zhou W, Zheng J, Kanai M, Pasaniuc B, Gamazon ER, Cox NJ. Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: Lessons from the Global Biobank Meta-analysis Initiative. Cell Genom 2022; 2:100180. [PMID: 36341024 PMCID: PMC9631681 DOI: 10.1016/j.xgen.2022.100180] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/28/2021] [Revised: 08/09/2022] [Accepted: 09/01/2022] [Indexed: 12/13/2022]
Abstract
The Global Biobank Meta-analysis Initiative (GBMI), through its diversity, provides a valuable opportunity to study population-wide and ancestry-specific genetic associations. However, with multiple ascertainment strategies and multi-ancestry study populations across biobanks, GBMI presents unique challenges in implementing statistical genetics methods. Transcriptome-wide association studies (TWASs) boost detection power for, and provide biological context to, genetic associations by integrating genetic variant-to-trait associations from genome-wide association studies (GWASs) with predictive models of gene expression. TWASs present unique challenges beyond those of GWASs, especially in a multi-biobank, meta-analytic setting. Here, we present the GBMI TWAS pipeline, outlining practical considerations for ancestry and tissue specificity, meta-analytic strategies, and open challenges at every step of the framework. We advise conducting ancestry-stratified TWASs using ancestry-specific expression models and meta-analyzing the results using inverse-variance weighting, which showed the least test statistic inflation. Our work provides a foundation for adding transcriptomic context to biobank-linked GWASs, allowing for ancestry-aware discovery to accelerate genomic medicine.
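Inverse-variance-weighted (IVW) fixed-effect meta-analysis combines ancestry-stratified estimates by weighting each one by 1/SE². The sketch below applies it to per-ancestry TWAS effect estimates for a single gene. The estimates and standard errors are hypothetical, and the GBMI pipeline's exact inputs (for example, whether it meta-analyzes effect sizes or z-scores) may differ.

```python
import math
from statistics import NormalDist

def ivw_meta(betas, ses):
    """Fixed-effect IVW meta-analysis; returns (beta, se, z, two-sided p)."""
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p

# Hypothetical ancestry-stratified TWAS estimates for one gene
betas = [0.12, 0.08, 0.15]   # e.g. three ancestry groups
ses = [0.03, 0.05, 0.08]
beta, se, z, p = ivw_meta(betas, ses)
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Strata with smaller standard errors, typically the largest ancestry groups, dominate the pooled estimate, and the pooled standard error is never larger than the smallest per-stratum one.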
Affiliation(s)
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jibril B. Hirbo
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Dan Zhou
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Wei Zhou
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Jie Zheng
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- the Global Biobank Meta-analysis Initiative
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute of Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Eric R. Gamazon
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Nancy J. Cox
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
44
Mattheisen M, Grove J, Als TD, Martin J, Voloudakis G, Meier S, Demontis D, Bendl J, Walters R, Carey CE, Rosengren A, Strom NI, Hauberg ME, Zeng B, Hoffman G, Zhang W, Bybjerg-Grauholm J, Bækvad-Hansen M, Agerbo E, Cormand B, Nordentoft M, Werge T, Mors O, Hougaard DM, Buxbaum JD, Faraone SV, Franke B, Dalsgaard S, Mortensen PB, Robinson EB, Roussos P, Neale BM, Daly MJ, Børglum AD. Identification of shared and differentiating genetic architecture for autism spectrum disorder, attention-deficit hyperactivity disorder and case subgroups. Nat Genet 2022; 54:1470-1478. [PMID: 36163277 PMCID: PMC10848300 DOI: 10.1038/s41588-022-01171-3] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 06/01/2021] [Accepted: 06/20/2022] [Indexed: 02/02/2023]
Abstract
Attention-deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) are highly heritable neurodevelopmental conditions, with considerable overlap in their genetic etiology. We dissected their shared and distinct genetic etiology by cross-disorder analyses of large datasets. We identified seven loci shared by the disorders and five loci differentiating them. All five differentiating loci showed opposite allelic directions in the two disorders and significant associations with other traits, including educational attainment, neuroticism and regional brain volume. Integration with brain transcriptome data enabled us to identify and prioritize several significantly associated genes. The shared genomic fraction contributing to both disorders was strongly correlated with other psychiatric phenotypes, whereas the differentiating portion was correlated most strongly with cognitive traits. Additional analyses revealed that individuals diagnosed with both ASD and ADHD were double-loaded with genetic predispositions for both disorders and showed distinctive patterns of genetic association with other traits compared with the ASD-only and ADHD-only subgroups. These results provide insights into the biological foundation of the development of one or both conditions and of the factors driving psychopathology discriminatively toward either ADHD or ASD.
Affiliation(s)
- Manuel Mattheisen
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark.
- Department of Community Health and Epidemiology & Department of Psychiatry, Dalhousie University, Halifax, NS, Canada.
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany.
- Jakob Grove
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Thomas D Als
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, Aarhus, Denmark
- Joanna Martin
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Georgios Voloudakis
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Sandra Meier
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark
- Department of Community Health and Epidemiology & Department of Psychiatry, Dalhousie University, Halifax, NS, Canada
- Ditte Demontis
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, Aarhus, Denmark
- Jaroslav Bendl
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Raymond Walters
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Caitlin E Carey
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anders Rosengren
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Institute of Biological Psychiatry, Mental Health Services Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
- Nora I Strom
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
- Centre for Psychiatry Research, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Mads Engel Hauberg
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Biao Zeng
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gabriel Hoffman
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Wen Zhang
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jonas Bybjerg-Grauholm
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Marie Bækvad-Hansen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Esben Agerbo
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Centre for Integrated Register-based Research, Aarhus University, Aarhus, Denmark
- Bru Cormand
- Department of Genetics, Microbiology and Statistics, Faculty of Biology, University of Barcelona, Barcelona, Catalonia, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
- Institut de Biomedicina de la Universitat de Barcelona (IBUB), Barcelona, Catalonia, Spain
- Institut de Recerca Sant Joan de Déu (IR-SJD), Esplugues de Llobregat, Catalonia, Spain
- Merete Nordentoft
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Department of Clinical Medicine, Faculty of Health Science, University of Copenhagen, Copenhagen, Denmark
- Copenhagen Research Centre for Mental Health (CORE), Mental Health Centre Copenhagen, Copenhagen, Denmark
- University Hospital, Hellerup, Denmark
- Thomas Werge
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Institute of Biological Psychiatry, Mental Health Services Copenhagen, Copenhagen University Hospital, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health Science, University of Copenhagen, Copenhagen, Denmark
- GLOBE Institute, Center for GeoGenetics, University of Copenhagen, Copenhagen, Denmark
- Ole Mors
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Psychosis Research Unit, Aarhus University Hospital-Psychiatry, Aarhus, Denmark
- David M Hougaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Joseph D Buxbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Stephen V Faraone
- Department of Psychiatry, State University of New York Upstate Medical University, Syracuse, NY, USA
- Department of Neuroscience and Physiology, State University of New York Upstate Medical University, Syracuse, NY, USA
- Barbara Franke
- Department of Psychiatry, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
- Søren Dalsgaard
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Preben B Mortensen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, Aarhus, Denmark
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Centre for Integrated Register-based Research, Aarhus University, Aarhus, Denmark
- Elise B Robinson
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Panos Roussos
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute for Data Science and Genomic Technology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, JJ Peters VA Medical Center, Bronx, NY, USA
- Benjamin M Neale
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark J Daly
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Anders D Børglum
- Department of Biomedicine - Human Genetics and the iSEQ Center, Aarhus University, Aarhus, Denmark.
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.
- Center for Genomics and Personalized Medicine, Aarhus, Denmark.
45
Bhattacharya A, Freedman AN, Avula V, Harris R, Liu W, Pan C, Lusis AJ, Joseph RM, Smeester L, Hartwell HJ, Kuban KCK, Marsit CJ, Li Y, O'Shea TM, Fry RC, Santos HP. Placental genomics mediates genetic associations with complex health traits and disease. Nat Commun 2022; 13:706. [PMID: 35121757 PMCID: PMC8817049 DOI: 10.1038/s41467-022-28365-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/29/2021] [Accepted: 12/15/2021] [Indexed: 01/09/2023]
Abstract
As the master regulator in utero, the placenta is core to the Developmental Origins of Health and Disease (DOHaD) hypothesis but is historically understudied. To identify placental gene-trait associations (GTAs) across the life course, we perform distal mediator-enriched transcriptome-wide association studies (TWAS) for 40 traits, integrating placental multi-omics from the Extremely Low Gestational Age Newborn Study. At [Formula: see text], we detect 248 GTAs, mostly for neonatal and metabolic traits, across 176 genes, enriched for cell growth and immunological pathways. In aggregate, genetic effects mediated by placental expression significantly explain 4 early-life traits but no later-in-life traits. 89 GTAs show significant mediation through distal genetic variants, identifying hypotheses for distal regulation of GTAs. Investigation of one hypothesis in human placenta-derived choriocarcinoma cells reveals that knockdown of mediator gene EPS15 upregulates predicted targets SPATA13 and FAM214A, both associated with waist-hip ratio in TWAS, and multiple genes involved in metabolic pathways. These results suggest profound health impacts of placental genomic regulation in developmental programming across the life course.
Affiliation(s)
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA.
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA.
- Anastasia N Freedman
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Vennela Avula
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Rebeca Harris
- Biobehavioral Laboratory, School of Nursing, University of North Carolina, Chapel Hill, NC, 27514, USA
- Weifang Liu
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Calvin Pan
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Aldons J Lusis
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Department of Microbiology, Immunology and Molecular Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
- Robert M Joseph
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA, 02118, USA
- Lisa Smeester
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina, Chapel Hill, NC, 27514, USA
- Hadley J Hartwell
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Karl C K Kuban
- Department of Pediatrics, Division of Pediatric Neurology, Boston University Medical Center, Boston, MA, 02118, USA
- Carmen J Marsit
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, USA
- Yun Li
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27514, USA
- Department of Computer Science, University of North Carolina, Chapel Hill, NC, 27514, USA
- T Michael O'Shea
- Department of Pediatrics, School of Medicine, University of North Carolina, Chapel Hill, NC, 27514, USA
- Rebecca C Fry
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA.
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA.
- Curriculum in Toxicology and Environmental Medicine, University of North Carolina, Chapel Hill, NC, 27514, USA.
- Hudson P Santos
- Biobehavioral Laboratory, School of Nursing, University of North Carolina, Chapel Hill, NC, 27514, USA.
- Institute for Environmental Health Solutions, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, 27514, USA.
46
Parrish RL, Gibson GC, Epstein MP, Yang J. TIGAR-V2: Efficient TWAS tool with nonparametric Bayesian eQTL weights of 49 tissue types from GTEx V8. HGG ADVANCES 2022; 3:100068. [PMID: 35047855 PMCID: PMC8756507 DOI: 10.1016/j.xhgg.2021.100068] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/20/2021] [Accepted: 11/01/2021] [Indexed: 01/12/2023]
Abstract
Standard transcriptome-wide association study (TWAS) methods first train gene expression prediction models using reference transcriptomic data and then test the association between the predicted genetically regulated gene expression and phenotype of interest. Most existing TWAS tools require cumbersome preparation of genotype input files and extra coding to enable parallel computation. To improve the efficiency of TWAS tools, we developed Transcriptome-Integrated Genetic Association Resource V2 (TIGAR-V2), which directly reads Variant Call Format (VCF) files, enables parallel computation, and reduces up to 90% of computation cost (mainly due to loading genotype data) compared to the original version. TIGAR-V2 can train gene expression imputation models using either nonparametric Bayesian Dirichlet process regression (DPR) or Elastic-Net (as used by PrediXcan), perform TWASs using either individual-level or summary-level genome-wide association study (GWAS) data, and implement both burden and variance-component statistics for gene-based association tests. We trained gene expression prediction models by DPR for 49 tissues using Genotype-Tissue Expression (GTEx) V8 by TIGAR-V2 and illustrated the usefulness of these Bayesian cis-expression quantitative trait locus (eQTL) weights through TWASs of breast and ovarian cancer utilizing public GWAS summary statistics. We identified 88 and 37 risk genes, respectively, for breast and ovarian cancer, most of which are either known or near previously identified GWAS (∼95%) or TWAS (∼40%) risk genes and three novel independent TWAS risk genes with known functions in carcinogenesis. These findings suggest that TWASs can provide biological insight into the transcriptional regulation of complex diseases. The TIGAR-V2 tool, trained Bayesian cis-eQTL weights, and linkage disequilibrium (LD) information from GTEx V8 are publicly available, providing a useful resource for mapping risk genes of complex diseases.
Affiliation(s)
- Randy L. Parrish
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Greg C. Gibson
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Michael P. Epstein
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
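The TIGAR-V2 abstract mentions burden statistics computed from summary-level GWAS data. As a rough illustration only, the sketch below computes the widely used summary-statistic TWAS burden z-score, z_TWAS = w'z / sqrt(w' Σ w). Here w are cis-eQTL weights, z are GWAS z-scores for the same SNPs (aligned to the same effect allele), and Σ is an LD correlation matrix from a reference panel. This is the generic form popularized by FUSION. It is not TIGAR-V2's own code, input format, or command-line interface. The function name `twas_burden_z` and the toy numbers are hypothetical.

```python
import math

import numpy as np


def twas_burden_z(weights, gwas_z, ld):
    """Summary-level burden TWAS statistic (hypothetical helper, generic formula).

    weights: cis-eQTL weights w for m SNPs (e.g. trained by DPR or Elastic-Net).
    gwas_z:  GWAS z-scores for the same m SNPs, aligned to the same effect allele.
    ld:      m x m SNP correlation (LD) matrix from a reference panel.
    Returns (z_twas, two-sided normal p-value).
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    sigma = np.asarray(ld, dtype=float)
    # Variance of the weighted sum of z-scores under the null, given LD.
    var = float(w @ sigma @ w)
    if var <= 0:
        raise ValueError("weights have zero variance under this LD matrix")
    z_twas = float(w @ z) / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z_twas) / math.sqrt(2))
    return z_twas, p_two_sided
```

Take weights (0.5, 0.3), GWAS z-scores (3.0, 2.0), and an LD correlation of 0.2. These give w'z = 2.1 and w'Σw = 0.40, so z_TWAS is about 3.32. Ignoring LD would overstate the denominator's precision whenever correlated SNPs carry same-signed weights.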
47
Cao C, Kossinna P, Kwok D, Li Q, He J, Su L, Guo X, Zhang Q, Long Q. Disentangling genetic feature selection and aggregation in transcriptome-wide association studies. Genetics 2021; 220:6444993. [PMID: 34849857 DOI: 10.1093/genetics/iyab216] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/01/2021] [Accepted: 11/04/2021] [Indexed: 12/14/2022]
Abstract
The success of transcriptome-wide association studies (TWAS) has led to substantial research towards improving the predictive accuracy of its core component of Genetically Regulated eXpression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps, feature selection and feature aggregation, which can be conducted independently. In this work, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
Affiliation(s)
- Chen Cao
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Pathum Kossinna
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Devin Kwok
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Qing Li
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Jingni He
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Liya Su
- Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Qingrun Zhang
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Quan Long
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB T2N 4N1, Canada
- Department of Mathematics & Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
- Department of Medical Genetics, University of Calgary, Calgary, AB T2N 4N1, Canada
- Hotchkiss Brain Institute, O'Brien Institute for Public Health, University of Calgary, Calgary, AB T2N 4N1, Canada
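Cao et al. frame TWAS as two separable steps, feature selection and feature aggregation. Below is a minimal sketch of one such disentangled protocol. It assumes individual-level genotypes coded 0/1/2, both for a reference expression panel and for a separate GWAS cohort. In the first step, a marginal (simple marker) test selects SNPs against expression. In the second, a linear-kernel score statistic aggregates the selected SNPs against the phenotype, with a permutation p-value. The function names, the Fisher-z approximation, the alpha threshold, and the permutation scheme are all illustrative choices. This is not the authors' implementation.

```python
import math

import numpy as np


def marginal_select(g_ref, expr, alpha=0.05):
    """Feature selection (hypothetical helper): keep SNPs whose marginal
    correlation with expression is significant, using the Fisher z
    approximation atanh(r) * sqrt(n - 3) ~ N(0, 1) under the null."""
    n = len(expr)
    keep = []
    for j in range(g_ref.shape[1]):
        r = np.corrcoef(g_ref[:, j], expr)[0, 1]
        r = min(max(r, -0.999999), 0.999999)  # keep atanh finite
        z = math.atanh(r) * math.sqrt(n - 3)
        if math.erfc(abs(z) / math.sqrt(2)) < alpha:
            keep.append(j)
    return keep


def kernel_aggregate_p(g_gwas, y, keep, n_perm=200, seed=0):
    """Feature aggregation (hypothetical helper): linear-kernel score
    statistic Q = r' K r with K = G_s G_s' over the selected SNPs G_s,
    where r is the centred phenotype; p-value by permuting the phenotype."""
    if not keep:
        return 1.0
    gs = g_gwas[:, keep] - g_gwas[:, keep].mean(axis=0)

    def q_stat(v):
        r = v - v.mean()
        u = gs.T @ r  # r' G_s G_s' r equals the squared norm of G_s' r
        return float(u @ u)

    q_obs = q_stat(y)
    rng = np.random.default_rng(seed)
    hits = sum(q_stat(rng.permutation(y)) >= q_obs for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

Selection only decides which SNPs enter the kernel. Either step can therefore be swapped, for example for a different marker test or a different kernel, without touching the other. That separability is what the abstract argues GReX-based protocols give up.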
48
Cao C, Wang J, Kwok D, Cui F, Zhang Z, Zhao D, Li MJ, Zou Q. webTWAS: a resource for disease candidate susceptibility genes identified by transcriptome-wide association study. Nucleic Acids Res 2021; 50:D1123-D1130. [PMID: 34669946 PMCID: PMC8728162 DOI: 10.1093/nar/gkab957] [Citation(s) in RCA: 145] [Impact Index Per Article: 36.3] [Received: 08/15/2021] [Revised: 09/24/2021] [Accepted: 10/05/2021] [Indexed: 12/20/2022]
Abstract
The development of transcriptome-wide association studies (TWAS) has enabled researchers to better identify and interpret causal genes in many diseases. However, there are currently no resources providing a comprehensive listing of gene-disease associations discovered by TWAS from published GWAS summary statistics. TWAS analyses are also difficult to conduct due to the complexity of TWAS software pipelines. To address these issues, we introduce a new resource called webTWAS, which integrates a database of the most comprehensive disease GWAS datasets currently available with credible sets of potential causal genes identified by multiple TWAS software packages. Specifically, a total of 235,064 gene-disease associations for a wide range of human diseases are prioritized from 1298 high-quality downloadable European GWAS summary statistics. Associations are calculated with seven different statistical models based on three popular and representative TWAS software packages. Users can explore associations at the gene or disease level, and easily search for related studies or diseases using the MeSH disease tree. Since the effects of diseases are highly tissue-specific, webTWAS applies tissue-specific enrichment analysis to identify significant tissues. A user-friendly web server is also available to run custom TWAS analyses on user-provided GWAS summary statistics data. webTWAS is freely available at http://www.webtwas.net.
Affiliation(s)
- Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
- Jianhua Wang
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Devin Kwok
- School of Computer Science, McGill University, Montreal, Canada
- Feifei Cui
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Zilong Zhang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Da Zhao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Mulin Jun Li
- Department of Pharmacology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
49
Bhattacharya A, Li Y, Love MI. MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies. PLoS Genet 2021; 17:e1009398. [PMID: 33684137 PMCID: PMC7971899 DOI: 10.1371/journal.pgen.1009398] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/27/2020] [Revised: 03/18/2021] [Accepted: 02/04/2021] [Indexed: 02/06/2023]
Abstract
Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes outside a local window around them) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1-2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.
Affiliation(s)
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, University of California-Los Angeles, Los Angeles, California, United States of America
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Michael I. Love
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
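The first MOSTWAS extension adds imputed mediators to the expression model. The simplified sketch below illustrates that idea on simulated data only. It assumes a single mediator (for example, a transcription factor) whose own local SNPs are known. Where the abstract refers to regularized models, it uses plain least squares, and all names are hypothetical. It compares held-out R^2 for expression predicted from the gene's local SNPs alone against local SNPs plus the imputed mediator.

```python
import numpy as np


def fit_ols(x, y):
    """Least-squares fit with intercept; returns a prediction function."""
    x1 = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return lambda xn: np.column_stack([np.ones(len(xn)), xn]) @ coef


def r2(y, yhat):
    """Coefficient of determination of predictions yhat for observed y."""
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)


def mediator_augmented_r2(g_local, g_med, mediator, expr, n_train):
    """Held-out R^2 of expression prediction from local SNPs alone versus
    local SNPs plus a mediator imputed from the mediator's own local SNPs
    (hypothetical helper; the first n_train rows are used for training)."""
    tr, te = slice(0, n_train), slice(n_train, None)
    # Step 1: train a model for the mediator on its own local SNPs.
    predict_med = fit_ols(g_med[tr], mediator[tr])
    med_hat = predict_med(g_med)  # imputed mediator for every sample
    # Step 2a: baseline expression model, local SNPs only.
    base = fit_ols(g_local[tr], expr[tr])
    # Step 2b: local SNPs plus the imputed mediator as one extra feature.
    x_aug = np.column_stack([g_local, med_hat])
    aug = fit_ols(x_aug[tr], expr[tr])
    return r2(expr[te], base(g_local[te])), r2(expr[te], aug(x_aug[te]))
```

The simulation can let a distal mediator drive much of the expression variance. In that setting the augmented model should recover far more held-out variance than the baseline. On real data, the gains reported in the abstract are much smaller (a 1-2% additive increase).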