1
Yuan Y, Zheng X, Zhang W, Ren Z, Liang B. A cross-tissue transcriptome-wide association study identifies novel susceptibility genes for atrial fibrillation. J Arrhythm 2025; 41:e70097. [PMID: 40416952 PMCID: PMC12099065 DOI: 10.1002/joa3.70097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2025] [Revised: 04/09/2025] [Accepted: 05/13/2025] [Indexed: 05/27/2025]
Abstract
Background Atrial fibrillation (AF), the most common cardiac arrhythmia, has been linked to numerous loci identified by genome-wide association studies (GWAS). However, the causal genes and underlying mechanisms remain unclear.
Methods We conducted a cross-tissue transcriptome-wide association study (TWAS) using the unified test for molecular signatures (UTMOST), integrating genetic data from the FinnGen R11 cohort (287 805 individuals) with gene expression profiles from the Genotype-Tissue Expression (GTEx) project. To enhance reliability, we applied functional summary-based imputation (FUSION), fine-mapping of causal gene sets (FOCUS), and multi-marker analysis of GenoMic annotation (MAGMA) for gene prioritization, followed by Mendelian randomization (MR) and colocalization analyses. GeneMANIA was used to explore gene functions.
Results By integrating four TWAS approaches, this study identified five novel susceptibility genes significantly associated with AF risk. MR analysis further revealed that the expression levels of FKBP7, CEP68, and CAMK2D were positively associated with AF risk, while SPATS2L exhibited a significant protective effect. Colocalization analysis demonstrated that CEP68 and SPATS2L share causal variants with AF. Through comprehensive evaluation of multidimensional functional annotations and existing biological evidence, this study highlighted SPATS2L and CEP68 as potential functional candidate genes in AF pathogenesis.
Conclusions This cross-tissue TWAS identified five novel AF susceptibility genes (CAMK2D, SPATS2L, CEP68, FKBP7, and SHROOM3). Elevated expression of FKBP7, CEP68, and CAMK2D increases AF risk, while SPATS2L showed a protective effect, with colocalization analysis implicating CEP68 and SPATS2L as prioritized candidates. The integration of multi-omics approaches effectively unravels AF's genetic mechanisms.
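The abstract does not state which Mendelian randomization estimator was applied. As an illustration only, the sketch below assumes the common fixed-effect inverse-variance weighted (IVW) estimator, which combines per-instrument Wald ratios (outcome effect divided by exposure effect) from summary statistics. `ivw_estimate` and its inputs are hypothetical names, and the example numbers are toy values, not data from the study. A positive estimate would correspond to higher genetically predicted expression being associated with higher AF risk, the direction reported here for FKBP7, CEP68, and CAMK2D.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW Mendelian randomization estimate (hypothetical helper).

    beta_exposure: per-instrument effect on gene expression (e.g. eQTL betas)
    beta_outcome:  per-instrument effect on the outcome (e.g. AF log-odds)
    se_outcome:    standard errors of beta_outcome
    Returns (causal_estimate, standard_error) using first-order weights.
    """
    if not beta_exposure or not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Weighted regression of beta_outcome on beta_exposure through the origin,
    # with weights 1 / se_outcome**2.
    numerator = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


if __name__ == "__main__":
    # Toy summary statistics for three instruments (not real data).
    est, se = ivw_estimate([0.30, 0.25, 0.40], [0.06, 0.04, 0.09], [0.02, 0.02, 0.03])
    print(f"IVW estimate = {est:.3f} (SE {se:.3f})")
```

With a single instrument the IVW estimate reduces to the Wald ratio `beta_outcome / beta_exposure`.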
Affiliation(s)
- Yalin Yuan
- Shanxi Medical University, Taiyuan, Shanxi, China
- Xin Zheng
- Shanxi Medical University, Taiyuan, Shanxi, China
- Zhaoyu Ren
- Shanxi Medical University, Taiyuan, Shanxi, China
- Bin Liang
- Department of Cardiovascular Medicine, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
2
Guan D, Bai Z, Zhu X, Zhong C, Hou Y, Zhu D, Li H, Lan F, Diao S, Yao Y, Zhao B, Li X, Pan Z, Gao Y, Wang Y, Zou D, Wang R, Xu T, Sun C, Yin H, Teng J, Xu Z, Lin Q, Shi S, Shao D, Degalez F, Lagarrigue S, Wang Y, Wang M, Peng M, Rocha D, Charles M, Smith J, Watson K, Buitenhuis AJ, Sahana G, Lund MS, Warren W, Frantz L, Larson G, Lamont SJ, Si W, Zhao X, Li B, Zhang H, Luo C, Shu D, Qu H, Luo W, Li Z, Nie Q, Zhang X, Xiang R, Liu S, Zhang Z, Zhang Z, Liu GE, Cheng H, Yang N, Hu X, Zhou H, Fang L. Genetic regulation of gene expression across multiple tissues in chickens. Nat Genet 2025; 57:1298-1308. [PMID: 40200121 DOI: 10.1038/s41588-025-02155-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/05/2025] [Indexed: 04/10/2025]
Abstract
The chicken is a valuable model for understanding fundamental biology and vertebrate evolution and is a major global source of nutrient-dense and lean protein. Despite being the first non-mammalian amniote to have its genome sequenced, a systematic characterization of functional variation on the chicken genome remains lacking. Here, we integrated bulk RNA sequencing (RNA-seq) data from 7,015 samples, single-cell RNA-seq data from 127,598 cells and 2,869 whole-genome sequences to present a pilot atlas of regulatory variants across 28 chicken tissues. This atlas reveals millions of regulatory effects on primary expression (protein-coding genes, long non-coding RNA and exons) and post-transcriptional modifications (alternative splicing and 3'-untranslated region alternative polyadenylation). We highlighted distinct molecular mechanisms underlying these regulatory variants, their context-dependent behavior and their utility in interpreting genome-wide associations for 39 chicken complex traits. Finally, our comparative analyses of gene regulation between chickens and mammals demonstrate how this resource can facilitate cross-species gene mapping of complex traits.
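The abstract does not describe the model used to map regulatory variants. As a minimal sketch of the basic idea behind an expression QTL (eQTL) test, the code below regresses one gene's normalized expression on a variant's effect-allele dosage (0, 1, or 2) by ordinary least squares and reports the slope and its t statistic. Real pipelines typically also adjust for covariates such as genotype principal components and hidden expression factors, which this toy version omits. `eqtl_test` and the example values are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def eqtl_test(dosages: Sequence[float], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Single-variant, single-gene eQTL test by simple linear regression.

    dosages:    effect-allele counts (0, 1, 2) per sample
    expression: normalized expression of one gene per sample
    Returns (slope, standard_error, t_statistic).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    g_bar, e_bar = mean(dosages), mean(expression)
    s_gg = sum((g - g_bar) ** 2 for g in dosages)
    if s_gg == 0:
        raise ValueError("variant is monomorphic in these samples")
    s_ge = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    slope = s_ge / s_gg  # expression change per additional effect allele
    intercept = e_bar - slope * g_bar
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(dosages, expression))
    standard_error = math.sqrt(rss / (n - 2) / s_gg)
    # A perfect fit has zero residual error; report an infinite t statistic.
    t_stat = slope / standard_error if standard_error > 0 else math.copysign(math.inf, slope)
    return slope, standard_error, t_stat


if __name__ == "__main__":
    # Toy data for six samples (not from the chicken atlas).
    slope, se, t = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 1.9, 2.1, 3.0, 3.2])
    print(f"slope={slope:.2f}, se={se:.3f}, t={t:.1f}")
```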
Affiliation(s)
- Dailu Guan
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Zhonghao Bai
- Center for Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
- Xiaoning Zhu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Conghao Zhong
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center of Molecular Design Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Yali Hou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Di Zhu
- Center for Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Houcheng Li
- Center for Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
- Fangren Lan
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center of Molecular Design Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Shuqi Diao
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Yuelin Yao
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- School of Informatics, The University of Edinburgh, Edinburgh, UK
- Bingru Zhao
- Jiangsu Livestock Embryo Engineering Laboratory, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Xiaochang Li
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center of Molecular Design Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Zhangyuan Pan
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Yahui Gao
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
- Yuzhe Wang
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Dong Zou
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
- Ruizhen Wang
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Tianyi Xu
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
- Congjiao Sun
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center of Molecular Design Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Hongwei Yin
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Jinyan Teng
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhiting Xu
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qing Lin
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Shourong Shi
- Poultry Institute, Chinese Academy of Agricultural Sciences, Yangzhou, China
- Dan Shao
- Poultry Institute, Chinese Academy of Agricultural Sciences, Yangzhou, China
- Ying Wang
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Mingshan Wang
- State Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Minsheng Peng
- State Key Laboratory of Genetic Evolution & Animal Models and Yunnan Key Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Dominique Rocha
- INRAE, GABI, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Mathieu Charles
- INRAE, GABI, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
- Jacqueline Smith
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
- Kellie Watson
- The Roslin Institute, Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian, UK
- Goutam Sahana
- Center for Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
- Wesley Warren
- Department of Animal Sciences, Data Science and Informatics Institute, University of Missouri, Columbia, MO, USA
- Laurent Frantz
- Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, Munich, Germany
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
- Greger Larson
- The Palaeogenomics & Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, UK
- Susan J Lamont
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Wei Si
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Department of Animal Science, McGill University, Montreal, Quebec, Canada
- Xin Zhao
- Department of Animal Science, McGill University, Montreal, Quebec, Canada
- Bingjie Li
- Scotland's Rural College (SRUC), Roslin Institute Building, Midlothian, UK
- Haihan Zhang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Chenglong Luo
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Dingming Shu
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Hao Qu
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Wei Luo
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Key Laboratory of Animal Breeding and Nutrition, Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Zhenhui Li
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qinghua Nie
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China
- Xiquan Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science, South China Agricultural University, Guangzhou, China
- Ruidong Xiang
- Agriculture Victoria, Agribio, Centre for AgriBiosciences, Bundoora, Victoria, Australia
- Cambridge-Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Agriculture, Food and Ecosystem Sciences, The University of Melbourne, Parkville, Victoria, Australia
- Shuli Liu
- School of Life Sciences, Westlake University, Hangzhou, China
- Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhang Zhang
- Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- George E Liu
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Hans Cheng
- Avian Disease and Oncology Laboratory, USDA, ARS, USNPRC, East Lansing, MI, USA
- Ning Yang
- State Key Laboratory of Animal Biotech Breeding and Frontier Science Center of Molecular Design Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Xiaoxiang Hu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
- Huaijun Zhou
- Department of Animal Science, University of California-Davis, Davis, CA, USA
- Lingzhao Fang
- Center for Quantitative Genetics and Genomics (QGG), Aarhus University, Aarhus, Denmark
3
Triozzi JL, Yu Z, Giri A, Chen HC, Wilson OD, Ferolito B, Ikizler TA, Akwo EA, Robinson-Cohen C, Gaziano JM, Cho K, Phillips LS, Tao R, Pereira AC, Hung AM. GLP1R Gene Expression and Kidney Disease Progression. JAMA Netw Open 2024; 7:e2440286. [PMID: 39453656 PMCID: PMC11581634 DOI: 10.1001/jamanetworkopen.2024.40286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 08/21/2024] [Indexed: 10/26/2024]
Abstract
Importance Glucagon-like peptide 1 receptor agonists (GLP-1RAs) may have nephroprotective properties beyond those related to weight loss and glycemic control.
Objective To investigate the association of genetically proxied GLP-1RAs with kidney disease progression.
Design, Setting, and Participants This genetic association study assembled a national retrospective cohort of veterans aged 18 years or older from the US Department of Veterans Affairs Million Veteran Program between January 10, 2011, and December 31, 2021. Data were analyzed from November 2023 to February 2024.
Exposures A genetic risk score for systemic GLP1R gene expression, calculated for each participant from genetic variants associated with GLP1R mRNA levels across all tissue samples in the Genotype-Tissue Expression project.
Main Outcomes and Measures The primary composite outcome was incident end-stage kidney disease or a 40% decline in estimated glomerular filtration rate. Cox proportional hazards regression was used to assess the association between genetically proxied GLP-1RAs and kidney disease progression.
Results Among 353 153 individuals (92.5% men), the median age was 66 years (IQR, 58.0-72.0 years) and the median follow-up was 5.1 years (IQR, 3.1-7.2 years). Overall, 25.7% had diabetes and 45.0% had obesity. A total of 4.6% experienced kidney disease progression. Higher genetically proxied GLP1R gene expression was associated with a lower risk of kidney disease progression in the unadjusted model (hazard ratio [HR], 0.96; 95% CI, 0.92-0.99; P = .02) and in the fully adjusted model accounting for baseline patient characteristics, body mass index, and the presence or absence of diabetes (HR, 0.96; 95% CI, 0.92-1.00; P = .04). The results were similar in sensitivity analyses stratified by diabetes or obesity status.
Conclusions and Relevance In this genetic association study, higher GLP1R gene expression was associated with a small reduction in risk of kidney disease progression. These findings support pleiotropic nephroprotective mechanisms of GLP-1RAs independent of their effects on body weight and glycemic control.
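The abstract describes the exposure as a genetic risk score built from variants associated with GLP1R mRNA levels, without giving the exact formula. A common construction, assumed here for illustration, is a weighted sum of each participant's effect-allele dosages using the variants' eQTL effect sizes as weights; if the score is standardized before entering a Cox model, a hazard ratio then refers to a one-standard-deviation increase. The function names, weights, and genotypes below are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted score: sum over variants of weight * effect-allele dosage."""
    if len(dosages) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores: Sequence[float]) -> List[float]:
    """Center scores to mean 0 and scale to (population) standard deviation 1."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    # Hypothetical eQTL weights for three variants and genotypes for four people.
    weights = [0.12, -0.05, 0.30]
    genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 0]]
    raw = [genetic_risk_score(g, weights) for g in genotypes]
    print([round(z, 2) for z in standardize(raw)])
```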
Affiliation(s)
- Jefferson L. Triozzi
- Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Zhihong Yu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Ayush Giri
- Division of Quantitative Sciences, Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, Tennessee
- Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Hua-Chang Chen
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Otis D. Wilson
- Nashville VA Medical Center, VA Tennessee Valley Healthcare System, Nashville
- Brian Ferolito
- Million Veteran Program Coordinating Center, VA Boston Healthcare System, Boston, Massachusetts
- T. Alp Ikizler
- Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Elvis A. Akwo
- Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Cassianne Robinson-Cohen
- Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- John Michael Gaziano
- Million Veteran Program Coordinating Center, VA Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham and Women’s Hospital and Harvard School of Medicine, Boston, Massachusetts
- Kelly Cho
- Million Veteran Program Coordinating Center, VA Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham and Women’s Hospital and Harvard School of Medicine, Boston, Massachusetts
- Lawrence S. Phillips
- VA Atlanta Health Care System, Decatur, Georgia
- Division of Endocrinology and Metabolism, Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
- Ran Tao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Alexandre C. Pereira
- Million Veteran Program Coordinating Center, VA Boston Healthcare System, Boston, Massachusetts
- Department of Medicine, Brigham and Women’s Hospital and Harvard School of Medicine, Boston, Massachusetts
- Adriana M. Hung
- Division of Nephrology and Hypertension, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
- Nashville VA Medical Center, VA Tennessee Valley Healthcare System, Nashville
4
Ashayeri H, Sobhi N, Pławiak P, Pedrammehr S, Alizadehsani R, Jafarizadeh A. Transfer Learning in Cancer Genetics, Mutation Detection, Gene Expression Analysis, and Syndrome Recognition. Cancers (Basel) 2024; 16:2138. [PMID: 38893257 PMCID: PMC11171544 DOI: 10.3390/cancers16112138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 05/30/2024] [Accepted: 06/01/2024] [Indexed: 06/21/2024]
Abstract
Artificial intelligence (AI), encompassing machine learning (ML) and deep learning (DL), has revolutionized medical research, facilitating advances in drug discovery and cancer diagnosis. ML identifies patterns in data, while DL employs neural networks for intricate processing. Predictive modeling challenges, such as data labeling, are addressed by transfer learning (TL), which leverages pre-existing models for faster training. TL shows potential in genetic research, improving tasks like gene expression analysis, mutation detection, genetic syndrome recognition, and genotype-phenotype association. This review explores the role of TL in overcoming challenges in mutation detection, genetic syndrome detection, gene expression analysis, and phenotype-genotype association. TL has shown effectiveness in various aspects of genetic research. TL enhances the accuracy and efficiency of mutation detection, aiding in the identification of genetic abnormalities. TL can improve the diagnostic accuracy of syndrome-related genetic patterns. Moreover, TL plays a crucial role in gene expression analysis, helping to accurately predict gene expression levels and their interactions. Additionally, TL enhances phenotype-genotype association studies by leveraging pre-trained models. In conclusion, TL enhances AI efficiency by improving mutation prediction, gene expression analysis, and genetic syndrome detection. Future studies should focus on increasing domain similarities, expanding databases, and incorporating clinical data for better predictions.
Affiliation(s)
- Hamidreza Ashayeri
- Student Research Committee, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
- Navid Sobhi
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
- Paweł Pławiak
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155 Krakow, Poland
- Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100 Gliwice, Poland
- Siamak Pedrammehr
- Faculty of Design, Tabriz Islamic Art University, Tabriz 5164736931, Iran
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Burwood, VIC 3216, Australia
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Burwood, VIC 3216, Australia
- Ali Jafarizadeh
- Nikookari Eye Center, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
- Immunology Research Center, Tabriz University of Medical Sciences, Tabriz 5165665811, Iran
5
Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024; 22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND The term eGene refers to a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). Identifying eQTLs and eGenes is both theoretically and empirically important in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently transfer useful knowledge from auxiliary samples to target studies.
METHODS We propose TLegene, a multilocus eGene identification method that integrates shared genetic similarity information from auxiliary studies under the statistical framework of transfer learning. We applied TLegene to eGene identification in ten TCGA cancers that have an explicitly relevant tissue in the GTEx project, learning the genetic effects of variants in TCGA from GTEx. We also applied TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies.
RESULTS We observed substantial correlation of cis-variant genetic effects between TCGA and GTEx for a large number of genes. Consistent with our simulations, TLegene was more powerful than existing methods and identified 169 distinct candidate eGenes, many more than an approach that did not transfer knowledge between target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical support for the associations of the discovered eGenes and also showed evidence of allelic heterogeneity of gene expression. In addition, TLegene identified more eGenes in Geuvadis, and these eGenes were mainly enriched in EBV-transformed lymphocytes.
CONCLUSION Overall, TLegene is a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.
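The abstract motivates TLegene by the correlation of cis-variant genetic effects between the target (TCGA) and auxiliary (GTEx) studies. TLegene's own model is not described here; the sketch below only illustrates that correlation in its simplest form, as a Pearson correlation between per-variant effect estimates for the same cis-variants of one gene, assuming alleles are already aligned across the two studies. The function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def effect_correlation(target_betas: Sequence[float], auxiliary_betas: Sequence[float]) -> float:
    """Pearson correlation of per-variant effects between two studies.

    Both sequences must list the same cis-variants in the same order, with
    effects expressed relative to the same effect allele.
    """
    n = len(target_betas)
    if n != len(auxiliary_betas) or n < 2:
        raise ValueError("need at least two variants measured in both studies")
    mean_t = sum(target_betas) / n
    mean_a = sum(auxiliary_betas) / n
    cov = sum((t - mean_t) * (a - mean_a) for t, a in zip(target_betas, auxiliary_betas))
    var_t = sum((t - mean_t) ** 2 for t in target_betas)
    var_a = sum((a - mean_a) ** 2 for a in auxiliary_betas)
    if var_t == 0 or var_a == 0:
        raise ValueError("effects are constant in at least one study")
    return cov / math.sqrt(var_t * var_a)


if __name__ == "__main__":
    # Toy cis-variant effects for one gene (not TCGA or GTEx values).
    target = [0.10, -0.05, 0.22, 0.01, -0.12]
    auxiliary = [0.08, -0.02, 0.25, 0.03, -0.10]
    print(round(effect_correlation(target, auxiliary), 3))
```

A high correlation for a gene is the situation in which borrowing information from the auxiliary study is most likely to help.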
Affiliation(s)
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
6
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. Genomics Proteomics Bioinformatics 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases through effects on gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well to massive datasets and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Yu Wang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Dong Huang
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
- Yu Liang
- Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
- Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
- Xiao-Wei Chen
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
7
Liu D, Zhu J, Zhou D, Nikas EG, Mitanis NT, Sun Y, Wu C, Mancuso N, Cox NJ, Wang L, Freedland SJ, Haiman CA, Gamazon ER, Nikas JB, Wu L. A transcriptome-wide association study identifies novel candidate susceptibility genes for prostate cancer risk. Int J Cancer 2022; 150:80-90. [PMID: 34520569 PMCID: PMC8595764 DOI: 10.1002/ijc.33808] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 05/07/2021] [Revised: 08/20/2021] [Accepted: 08/30/2021] [Indexed: 01/03/2023]
Abstract
A large proportion of the heritability of prostate cancer risk remains unexplained. A transcriptome-wide association study (TWAS), combined with validation analyses comparing expression and methylation levels in tumor vs normal tissue, can help identify candidate genes that potentially play a role in prostate cancer development. Using data from the Genotype-Tissue Expression Project, we built genetic models to predict normal prostate tissue gene expression using three statistical frameworks: PrediXcan, a modified version of the unified test for molecular signatures, and Joint-Tissue Imputation. We applied these prediction models to the genetic data of 79 194 prostate cancer cases and 61 112 controls to investigate the associations of genetically determined gene expression with prostate cancer risk. Focusing on associated genes, we analyzed The Cancer Genome Atlas (TCGA) data to compare their expression in prostate tumor vs normal prostate tissue, compare methylation of CpG sites located at these loci in prostate tumor vs normal tissue, and assess the correlations between the expression of differentially expressed genes and the methylation of the corresponding CpG sites. We identified 573 genes showing an association with prostate cancer risk at a false discovery rate (FDR) ≤ 0.05, including 451 novel genes and 122 previously reported genes. Of the 573 genes, 152 showed differential expression in prostate tumor vs normal tissue samples. At the loci of 57 genes, 151 CpG sites showed differential methylation in prostate tumor vs normal tissue samples. Of these, 20 CpG sites were correlated with the expression of 11 corresponding genes. In this TWAS, we identified novel candidate susceptibility genes for prostate cancer risk, providing new insights into prostate cancer genetics and biology.
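The abstract reports genes associated at a false discovery rate (FDR) ≤ 0.05 without naming the correction procedure. The Benjamini-Hochberg step-up procedure is the most common way to control FDR and is assumed here for illustration: p-values are ranked, each is scaled by m/rank (m = number of tests), and a running minimum taken from the largest p-value downward keeps the adjusted values monotone. Genes whose adjusted value is ≤ 0.05 would be declared significant. The function name and p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    # and capping adjusted values at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy TWAS p-values for five genes (not the study's results).
    p = [0.001, 0.020, 0.035, 0.040, 0.300]
    q = benjamini_hochberg(p)
    significant = [i for i, value in enumerate(q) if value <= 0.05]
    print(q, significant)
```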
Affiliation(s)
- Duo Liu
  - Department of Pharmacy, Harbin Medical University Cancer Hospital, Harbin, China
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA
- Jingjing Zhu
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA
- Dan Zhou
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Emily G Nikas
  - School of Mathematics, University of Minnesota, Minneapolis, MN, USA
- Nikos T Mitanis
  - Department of Mathematics, University of the Aegean, Samos, Greece
- Yanfa Sun
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA
  - College of Life Science, Longyan University, Longyan, Fujian, P.R. China
  - Fujian Provincial Key Laboratory for the Prevention and Control of Animal Infectious Diseases and Biotechnology, Longyan, Fujian, 364012, P.R. China
  - Key Laboratory of Preventive Veterinary Medicine and Biotechnology (Longyan University), Fujian Province University, Longyan, Fujian, 364012, P.R. China
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA
- Nancy J Cox
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Liang Wang
  - Department of Tumor Biology, H. Lee Moffitt Cancer Center, Tampa, FL, USA
- Stephen J Freedland
  - Center for Integrated Research in Cancer and Lifestyle, Cedars-Sinai Medical Center, Los Angeles, CA
  - Section of Urology, Durham VA Medical Center, Durham, NC, USA
- Christopher A Haiman
  - Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA
- Eric R Gamazon
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Clare Hall, University of Cambridge, Cambridge, UK
  - MRC Epidemiology Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Jason B Nikas
  - Research & Development, Genomix Inc., Minneapolis, MN, USA
- Lang Wu
  - Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaii Cancer Center, University of Hawaii at Manoa, Honolulu, HI, USA

8
Xie Y, Li M, Dong W, Jiang W, Zhao H. M-DATA: A statistical approach to jointly analyzing de novo mutations for multiple traits. PLoS Genet 2021; 17:e1009849. [PMID: 34735430 PMCID: PMC8568192 DOI: 10.1371/journal.pgen.1009849] [Received: 06/16/2021] [Accepted: 09/29/2021]
Abstract
Recent studies of de novo mutations (DNMs) have demonstrated that multiple early-onset diseases share risk genes. Therefore, we may leverage information from one trait to improve the statistical power to identify genes for another trait. However, few methods can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization (EM) algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study jointly analyzing data from congenital heart disease (CHD) and autism. From joint analysis, our method identified 23 genes for CHD, including 12 novel genes, substantially more than single-trait analysis, leading to novel insights into CHD etiology.
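The EM idea behind estimating a per-gene association probability can be illustrated with a much simpler model than M-DATA. The sketch below is a single-trait, two-component Poisson mixture on per-gene DNM counts: genes are either background (rate `lam0`) or risk genes (elevated rate `lam1`), and EM alternates between posterior gene probabilities (E-step) and parameter updates (M-step). It ignores gene-specific mutation rates, the second trait, and functional annotations, all of which M-DATA models; the function names and toy counts are hypothetical.

```python
import math

def poisson_logpmf(y, lam):
    """log P(Y = y) for a Poisson distribution with rate lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def e_step(counts, pi, lam0, lam1):
    """Posterior probability that each gene belongs to the elevated-rate component."""
    post = []
    for y in counts:
        a = math.log(pi) + poisson_logpmf(y, lam1)
        b = math.log(1.0 - pi) + poisson_logpmf(y, lam0)
        m = max(a, b)  # log-sum-exp trick for numerical stability
        post.append(math.exp(a - m) / (math.exp(a - m) + math.exp(b - m)))
    return post

def poisson_mixture_em(counts, pi=0.1, lam0=0.5, lam1=5.0, n_iter=200):
    """Fit the two-component mixture; returns (pi, lam0, lam1, posteriors)."""
    eps = 1e-8  # keeps weights and rates away from 0 so log() stays defined
    n = len(counts)
    for _ in range(n_iter):
        post = e_step(counts, pi, lam0, lam1)
        w1 = max(sum(post), eps)
        w0 = max(n - sum(post), eps)
        # M-step: closed-form updates for the mixing weight and both rates
        pi = min(max(w1 / n, eps), 1.0 - eps)
        lam1 = max(sum(p * y for p, y in zip(post, counts)) / w1, eps)
        lam0 = max(sum((1.0 - p) * y for p, y in zip(post, counts)) / w0, eps)
    return pi, lam0, lam1, e_step(counts, pi, lam0, lam1)

# Toy counts: 20 background genes and 5 genes with many DNMs.
counts = [0, 1, 0, 2, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 2, 0, 1, 0, 6, 7, 8, 6, 7]
pi, lam0, lam1, post = poisson_mixture_em(counts)
print(round(pi, 3), round(lam0, 3), round(lam1, 3))
```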
Affiliation(s)
- Yuhan Xie
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Mo Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Weilai Dong
  - Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
- Wei Jiang
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
  - Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America

9
Li B, Ritchie MD. From GWAS to Gene: Transcriptome-Wide Association Studies and Other Methods to Functionally Understand GWAS Discoveries. Front Genet 2021; 12:713230. [PMID: 34659337 PMCID: PMC8515949 DOI: 10.3389/fgene.2021.713230] [Received: 05/22/2021] [Accepted: 07/27/2021]
Abstract
Since their inception, genome-wide association studies (GWAS) have identified more than a hundred thousand single nucleotide polymorphism (SNP) loci associated with various complex human diseases or traits. The majority of GWAS discoveries are located in non-coding regions of the human genome and have unknown functions. The gap between non-coding GWAS discoveries and the downstream genes they affect hinders the investigation of complex disease mechanisms and the use of human genetics to improve clinical care. Meanwhile, advances in high-throughput sequencing technologies have revealed important regulatory roles that non-coding regions play in the transcriptional activity of genes. In this review, we focus on data-integrative bioinformatics methods that combine GWAS with functional genomics knowledge to identify genetically regulated genes. We categorize and describe two types of data-integrative methods. First, we describe fine-mapping methods. Fine-mapping is an exploratory approach that prioritizes the likely causal variants underlying GWAS signals. Fine-mapping methods connect GWAS signals to potentially causal genes through statistical methods and/or functional annotations. Second, we discuss gene-prioritization methods, including colocalization, Mendelian randomization, and the transcriptome-wide association study (TWAS). These are hypothesis-generating approaches that evaluate whether genetic variants influence complex traits by regulating genes through specific genetic regulatory mechanisms. TWAS is a gene-based association approach that investigates associations between genetically regulated gene expression and complex diseases or traits. TWAS has gained popularity over the years because it reduces the multiple-testing burden compared with variant-based analytic approaches. Multiple types of TWAS methods, with varied methodological designs and biological hypotheses, have been developed over the past 5 years.
We dive into discussions of how TWAS methods differ in many aspects and the challenges that different TWAS methods face. Overall, TWAS is a powerful tool for identifying complex trait-associated genes. With the advent of single-cell sequencing, chromosome conformation capture, gene editing technologies, and multiplexing reporter assays, we are expecting a more comprehensive understanding of genomic regulation and genetically regulated genes underlying complex human diseases and traits in the future.
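When only GWAS summary statistics are available, a widely used form of the TWAS gene-level statistic (the one popularized by FUSION) combines SNP z-scores with the expression-prediction weights and accounts for linkage disequilibrium (LD): z_TWAS = wᵀz / sqrt(wᵀΣw), where w are the weights, z the GWAS z-scores, and Σ the LD matrix from a reference panel. The sketch below implements only that formula; the function name and toy inputs are illustrative assumptions, and it is not tied to any specific method in the review.

```python
import numpy as np

def twas_z(weights, snp_z, ld):
    """Summary-statistic TWAS z-score for one gene.

    weights : expression-prediction weights for the gene's cis-SNPs
    snp_z   : GWAS z-scores for the same SNPs (alleles aligned to the weights)
    ld      : SNP-by-SNP LD (correlation) matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(snp_z, dtype=float)
    sigma = np.asarray(ld, dtype=float)
    return float(w @ z / np.sqrt(w @ sigma @ w))

print(twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.5], [0.5, 1.0]]))
```

The denominator is what makes LD matter: the same weights and z-scores give 4/sqrt(2) ≈ 2.83 if the two SNPs are independent, but only 4/sqrt(3) ≈ 2.31 when they are correlated at r = 0.5, because correlated SNPs carry partly redundant evidence.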
Affiliation(s)
- Binglan Li
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, United States
- Marylyn D Ritchie
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA, United States
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, United States

10
Cuomo ASE, Alvari G, Azodi CB, McCarthy DJ, Bonder MJ. Optimizing expression quantitative trait locus mapping workflows for single-cell studies. Genome Biol 2021; 22:188. [PMID: 34167583 PMCID: PMC8223300 DOI: 10.1186/s13059-021-02407-x] [Received: 01/20/2021] [Accepted: 06/09/2021]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states and promises to improve our understanding of genetic regulation across tissues in both health and disease. RESULTS While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimize sc-eQTL mapping. Here, we evaluate the role of different normalization and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best practice guidelines. We use both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches. CONCLUSION We provide recommendations for future single-cell eQTL studies that can yield up to twice as many eQTL discoveries as default approaches ported from bulk studies.
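One step the study evaluates is how per-cell expression is aggregated per donor before applying bulk-style eQTL regression. The sketch below shows one simple option of that kind, mean aggregation of already-normalized values followed by a regression on genotype dosage; it is an illustration under stated assumptions, not the authors' recommended workflow. The helper name `pseudobulk_mean`, the toy values, and the donor genotypes are hypothetical.

```python
import numpy as np

def pseudobulk_mean(expr, donors):
    """Mean-aggregate one gene's normalized per-cell expression within each donor.
    Returns (sorted donor IDs, per-donor mean expression)."""
    expr = np.asarray(expr, dtype=float)
    donors = np.asarray(donors)
    ids = np.unique(donors)  # np.unique returns the sorted unique values
    return ids, np.array([expr[donors == d].mean() for d in ids])

# Toy data: 6 cells from 3 donors (values assumed already log-normalized).
cell_expr = [1.0, 3.0, 4.0, 6.0, 7.0, 9.0]
cell_donor = ["A", "A", "B", "B", "C", "C"]
genotype = {"A": 0, "B": 1, "C": 2}  # hypothetical SNP dosage per donor

ids, agg = pseudobulk_mean(cell_expr, cell_donor)
g = np.array([genotype[d] for d in ids])          # align genotypes to donor order
slope, intercept = np.polyfit(g, agg, 1)          # bulk-style eQTL effect estimate
print(dict(zip(ids, agg)), slope)
```

Aggregating first means each donor contributes one observation, so cells from the same donor are not treated as independent samples in the genotype test.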
Affiliation(s)
- Anna S E Cuomo
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
- Giordano Alvari
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christina B Azodi
  - St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
  - University of Melbourne, Parkville, Victoria, Australia
- Davis J McCarthy
  - St. Vincent's Institute of Medical Research, Fitzroy, Victoria, Australia
  - University of Melbourne, Parkville, Victoria, Australia
- Marc Jan Bonder
  - Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany

11
Zhuang Y, Wade K, Saba LM, Kechris K. Development of a tissue augmented Bayesian model for expression quantitative trait loci analysis. Math Biosci Eng 2019; 17:122-143. [PMID: 31731343 PMCID: PMC7384761 DOI: 10.3934/mbe.2020007]
Abstract
Expression quantitative trait loci (eQTL) analyses detect genetic variants (SNPs) associated with RNA expression levels of genes. Conventional eQTL analysis tests each gene-SNP pair individually using simple linear regression and analyzes each tissue separately, ignoring the extensive information known about RNA expression in other tissue(s). Although Bayesian models have recently been developed to improve eQTL prediction across multiple tissues, they are often based on uninformative priors or treat all tissues equally. In this study, we develop a novel tissue augmented Bayesian model for eQTL analysis (TA-eQTL), which takes prior eQTL information from one tissue into account to better predict eQTL in another tissue. Using allele-specific expression (ASE) as the gold standard, we demonstrate that our modified Bayesian model performs comparably to several existing methods in terms of sensitivity and specificity. Furthermore, the tissue augmented Bayesian model improves the power and accuracy of local-eQTL prediction, especially when the sample size is small. In summary, TA-eQTL performs comparably to existing methods but offers additional flexibility: it can evaluate data from different platforms, can focus prediction on one tissue using only summary statistics from the secondary tissue(s), and provides a closed-form solution for estimation.
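For reference, the conventional baseline that TA-eQTL is contrasted with (one simple linear regression per gene-SNP pair, run separately in each tissue) can be sketched as follows. This is not TA-eQTL itself; the function name `eqtl_ols`, the tissue labels, and the toy values are hypothetical. It returns the effect estimate, its standard error, and the t statistic; converting t to a p-value would use a t distribution with n - 2 degrees of freedom, omitted here to keep the sketch NumPy-only.

```python
import numpy as np

def eqtl_ols(genotype, expression):
    """Simple linear regression of expression on genotype dosage for one
    gene-SNP pair in one tissue. Returns (beta, standard error, t statistic)."""
    x = np.asarray(genotype, dtype=float)
    y = np.asarray(expression, dtype=float)
    n = len(x)
    xc = x - x.mean()
    sxx = xc @ xc
    beta = xc @ (y - y.mean()) / sxx
    alpha = y.mean() - beta * x.mean()
    resid = y - alpha - beta * x
    sigma2 = resid @ resid / (n - 2)   # residual variance with n - 2 df
    se = np.sqrt(sigma2 / sxx)
    return beta, se, beta / se

# Toy example: the same gene-SNP pair analyzed separately in two tissues.
tissues = {
    "tissue_1": ([0, 1, 2, 0, 1, 2], [1.0, 2.1, 2.9, 1.1, 1.9, 3.0]),
    "tissue_2": ([0, 1, 2, 1, 0, 2], [2.0, 2.2, 1.9, 2.1, 2.1, 2.0]),
}
for name, (g, e) in tissues.items():
    print(name, eqtl_ols(g, e))
```

Because each tissue is fitted in isolation, nothing learned in `tissue_1` informs the estimate in `tissue_2`; borrowing that information through an informative prior is the gap the abstract says TA-eQTL addresses.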
Affiliation(s)
- Yonghua Zhuang
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver Anschutz Medical Campus, Mail Stop B119, 13001 E. 17th Place, Aurora, 80045, USA
- Kristen Wade
  - Human Medical Genetics and Genomics Program, School of Medicine, University of Colorado Denver Anschutz Medical Campus, 80045, Aurora, USA
- Laura M. Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Denver Anschutz Medical Campus, 80045, Aurora, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Denver Anschutz Medical Campus, Mail Stop B119, 13001 E. 17th Place, Aurora, 80045, USA
  - Correspondence tel.: +13037244363, +13037249697

12
Gai L, Eskin E. Finding associated variants in genome-wide association studies on multiple traits. Bioinformatics 2018; 34:i467-i474. [PMID: 29949991 PMCID: PMC6022769 DOI: 10.1093/bioinformatics/bty249]
Abstract
Motivation Many variants identified by genome-wide association studies (GWAS) have been found to affect multiple traits, either directly or through shared pathways. There is currently a wealth of GWAS data collected for numerous phenotypes, and analyzing multiple traits at once can increase power to detect shared variant effects. However, traditional meta-analysis methods are not suitable for combining studies on different traits. When applied to dissimilar studies, these meta-analysis methods can be underpowered compared with univariate analysis. The degree to which traits share variant effects is often not known, and the vast majority of GWAS meta-analyses consider only one trait at a time. Results Here, we present a flexible method for finding associated variants from GWAS summary statistics for multiple traits. Our method estimates the degree of shared effects between traits from the data. Using simulations, we show that our method properly controls the false positive rate and increases power when an effect is present in a subset of traits. We then apply our method to the North Finland Birth Cohort and UK Biobank datasets using a variety of metabolic traits and discover novel loci. Availability and implementation Our source code is available at https://github.com/lgai/CONFIT. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lisa Gai
  - Department of Computer Science, University of California, Los Angeles, CA, USA
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA

13
Hu Y, Li M, Lu Q, Weng H, Wang J, Zekavat SM, Yu Z, Li B, Gu J, Muchnik S, Shi Y, Kunkle BW, Mukherjee S, Natarajan P, Naj A, Kuzma A, Zhao Y, Crane PK, Lu H, Zhao H. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet 2019; 51:568-576. [PMID: 30804563 PMCID: PMC6788740 DOI: 10.1038/s41588-019-0345-7] [Received: 03/20/2018] [Accepted: 01/09/2019]
Abstract
Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to impute gene expression levels from genotypes by using samples with matched genotypes and gene expression data in a given tissue. However, it is challenging to develop robust and accurate imputation models with a limited sample size for any single tissue. Here, we first introduce a multi-task learning method to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average of 39% improvement in imputation accuracy and generated effective imputation models for an average of 120% more genes. We describe a summary-statistic-based testing framework that combines multiple single-tissue associations into a powerful metric to quantify the overall gene-trait association. We applied our method, called UTMOST (unified test for molecular signatures), to multiple genome-wide-association results and demonstrate its advantages over single-tissue strategies.
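UTMOST combines single-tissue gene-trait associations into one cross-tissue test; the UTMOST paper describes a generalized Berk-Jones (GBJ) test for that step, which accounts for correlation among the tissue-level statistics. GBJ is too involved for a short example, so the sketch below swaps in a different and simpler technique with the same goal: the Cauchy combination test (ACAT), which also combines possibly correlated p-values. It is not UTMOST's method, and the function name and toy p-values are illustrative assumptions.

```python
import math

def cauchy_combination(pvalues, weights=None):
    """Combine possibly correlated p-values (each in (0, 1]) with the Cauchy
    combination test. Equal weights are used if none are given; weights sum to 1."""
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k
    stat = 0.0
    for p, w in zip(pvalues, weights):
        if p < 1e-16:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids precision loss
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    if stat > 1e15:
        # upper tail of the standard Cauchy: P(C > t) ~ 1 / (t * pi) for large t
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi

# Hypothetical per-tissue p-values for one gene: strong signal in one tissue only.
print(cauchy_combination([1e-6, 0.5, 0.9]))
```

The combined p-value here is roughly three times the smallest input (about 3e-6), which reflects the typical behavior of this kind of combination when a single tissue drives the signal: the evidence is kept, with a penalty for having looked at several tissues.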
Affiliation(s)
- Yiming Hu
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Mo Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Haoyi Weng
  - Division of Biostatistics, The Jockey Club School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
- Jiawei Wang
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Seyedeh M Zekavat
  - Yale School of Medicine, New Haven, CT, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhaolong Yu
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Boyang Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Jianlei Gu
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
- Sydney Muchnik
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Yu Shi
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Brian W Kunkle
  - John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL, USA
- Pradeep Natarajan
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Adam Naj
  - Center for Clinical Epidemiology and Biostatistics, and the Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Amanda Kuzma
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yi Zhao
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Paul K Crane
  - Department of Medicine, University of Washington, Seattle, WA, USA
- Hui Lu
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiaotong University, Shanghai, China
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA

14
Xu X, Eales JM, Akbarov A, Guo H, Becker L, Talavera D, Ashraf F, Nawaz J, Pramanik S, Bowes J, Jiang X, Dormer J, Denniff M, Antczak A, Szulinska M, Wise I, Prestes PR, Glyda M, Bogdanski P, Zukowska-Szczechowska E, Berzuini C, Woolf AS, Samani NJ, Charchar FJ, Tomaszewski M. Molecular insights into genome-wide association studies of chronic kidney disease-defining traits. Nat Commun 2018; 9:4800. [PMID: 30467309 PMCID: PMC6250666 DOI: 10.1038/s41467-018-07260-4] [Received: 06/13/2018] [Accepted: 10/17/2018]
Abstract
Genome-wide association studies (GWAS) have identified >100 loci of chronic kidney disease-defining traits (CKD-dt). Molecular mechanisms underlying these associations remain elusive. Using 280 kidney transcriptomes and 9958 gene expression profiles from 44 non-renal tissues, we uncover gene expression partners (eGenes) for 88.9% of CKD-dt GWAS loci. Through epigenomic chromatin segmentation analysis and variant effect prediction, we annotate the functional consequences of 74% of these loci. Our colocalisation analysis and Mendelian randomisation in >130,000 subjects demonstrate causal effects of three eGenes (NAT8B, CASP9 and MUC1) on estimated glomerular filtration rate. We identify a common alternative splice variant in MUC1 (a gene responsible for a rare Mendelian form of kidney disease) and observe increased renal expression of a specific MUC1 mRNA isoform as a plausible molecular mechanism of the GWAS association signal. These data highlight the variants and genes underpinning the associations uncovered in GWAS of CKD-dt.
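The abstract does not say which Mendelian randomisation estimator was used, so the following is only a generic illustration of how gene expression (the exposure) can be related causally to eGFR (the outcome) from per-SNP summary statistics. It assumes the common fixed-effect inverse-variance weighted (IVW) estimator with first-order weights; with a single instrument it reduces to the Wald ratio. The function name and toy effect sizes are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomisation estimate from per-SNP summary
    statistics, using first-order weights 1 / se_outcome**2.
    Returns (causal estimate, standard error)."""
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

# One instrument: the estimate equals the Wald ratio beta_outcome / beta_exposure.
print(ivw_estimate([0.5], [0.1], [0.02]))
# Two instruments with consistent per-SNP ratios.
print(ivw_estimate([0.5, 0.4], [0.1, 0.08], [0.02, 0.03]))
```

Each SNP's ratio estimate is weighted by how precisely it is measured, so instruments with strong effects on expression and small outcome standard errors dominate the pooled estimate.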
Affiliation(s)
- Xiaoguang Xu
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- James M Eales
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- Artur Akbarov
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- Hui Guo
  - Division of Population Health, Health Services Research and Primary Care, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PL, UK
- Lorenz Becker
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- David Talavera
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- Fehzan Ashraf
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- Jabran Nawaz
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- Sanjeev Pramanik
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- John Bowes
  - Division of Musculoskeletal and Dermatological Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- Xiao Jiang
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
- John Dormer
  - University Hospitals of Leicester NHS Trust, Leicester, LE1 5WW, UK
- Matthew Denniff
  - Department of Cardiovascular Sciences, University of Leicester, Leicester, LE3 9QP, UK
- Andrzej Antczak
  - Department of Urology and Uro-oncology, Karol Marcinkowski University of Medical Sciences, Poznan, 61-285, Poland
- Monika Szulinska
  - Department of Internal Medicine, Metabolic Disorders and Hypertension, Karol Marcinkowski University of Medical Sciences, Poznan, 60-569, Poland
- Ingrid Wise
  - School of Health and Life Sciences, Federation University Australia, Ballarat, 3350, VIC, Australia
- Priscilla R Prestes
  - School of Health and Life Sciences, Federation University Australia, Ballarat, 3350, VIC, Australia
- Maciej Glyda
  - Department of Transplantology and General Surgery, District Public Hospital, University of Zielona Góra, Poznan, 65-417, Poland
- Pawel Bogdanski
  - Department of Obesity and Metabolic Disorders Treatment and Clinical Dietetics, Karol Marcinkowski University of Medical Sciences, Poznan, 60-569, Poland
- Carlo Berzuini
  - Division of Population Health, Health Services Research and Primary Care, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PL, UK
- Adrian S Woolf
  - Department of Paediatric Nephrology, Royal Manchester Children's Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL, UK
- Nilesh J Samani
  - Department of Cardiovascular Sciences, University of Leicester, Leicester, LE3 9QP, UK
  - NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester, LE3 9QP, UK
- Fadi J Charchar
  - Department of Cardiovascular Sciences, University of Leicester, Leicester, LE3 9QP, UK
  - School of Health and Life Sciences, Federation University Australia, Ballarat, 3350, VIC, Australia
  - Department of Physiology, University of Melbourne, Melbourne, 3010, VIC, Australia
- Maciej Tomaszewski
  - Division of Cardiovascular Sciences, Faculty of Medicine, Biology and Health, University of Manchester, Manchester, M13 9PT, UK
  - Division of Medicine, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9PL, UK

15
Duong D, Ahmad WU, Eskin E, Chang KW, Li JJ. Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions. J Comput Biol 2018; 26:38-52. [PMID: 30383443 DOI: 10.1089/cmb.2018.0093]
Abstract
The gene ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. Under this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this article, we introduce two new solutions for this problem by focusing instead on the definitions of the GO terms. We apply neural network-based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word order within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition entails another. We validate our methods in two ways. In the first experiment, we test the model's ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model's ability to distinguish orthologs from randomly matched genes in human, mouse, and fly. In both experiments, a hybrid of the NLP and GO tree-based methods achieves the best classification accuracy.
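The first approach above (comparing two definitions as unordered word sets, with word similarity taken from an embedding model) can be illustrated with a small sketch. The abstract does not specify how word-level similarities are aggregated into a definition-level score, so the symmetric best-match average used here is an assumption, as are the function names and the tiny hand-made embeddings standing in for a pre-trained model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def definition_similarity(words_a, words_b, emb):
    """Symmetric best-match average: each word is matched to its most similar
    word in the other definition, and the two directions are averaged."""
    def directed(xs, ys):
        return float(np.mean([max(cosine(emb[x], emb[y]) for y in ys) for x in xs]))
    return 0.5 * (directed(words_a, words_b) + directed(words_b, words_a))

# Hypothetical 3-dimensional embeddings; a real model would be pre-trained.
emb = {
    "ion": [1.0, 0.0, 0.0],
    "transport": [0.0, 1.0, 0.0],
    "transporter": [0.0, 0.9, 0.1],
    "membrane": [0.0, 0.0, 1.0],
}
print(definition_similarity(["ion", "transport"], ["ion", "transporter"], emb))
print(definition_similarity(["ion", "transport"], ["membrane"], emb))
```

Because word order is discarded, this kind of score cannot tell apart definitions that use the same words in different roles, which is the limitation the sentence-encoder approach is meant to address.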
Affiliation(s)
- Dat Duong
  - Department of Computer Science, University of California, Los Angeles, California
- Wasi Uddin Ahmad
  - Department of Computer Science, University of California, Los Angeles, California
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, California
  - Department of Human Genetics, University of California, Los Angeles, California
- Kai-Wei Chang
  - Department of Computer Science, University of California, Los Angeles, California
- Jingyi Jessica Li
  - Department of Human Genetics, University of California, Los Angeles, California
  - Department of Statistics, University of California, Los Angeles, California

16
Endo C, Johnson TA, Morino R, Nakazono K, Kamitsuji S, Akita M, Kawajiri M, Yamasaki T, Kami A, Hoshi Y, Tada A, Ishikawa K, Hine M, Kobayashi M, Kurume N, Tsunemi Y, Kamatani N, Kawashima M. Genome-wide association study in Japanese females identifies fifteen novel skin-related trait associations. Sci Rep 2018; 8:8974. [PMID: 29895819 PMCID: PMC5997657 DOI: 10.1038/s41598-018-27145-2] [Received: 09/19/2017] [Accepted: 05/25/2018]
Abstract
Skin trait variation affects quality of life, especially for females from the viewpoint of beauty. To investigate genetic variation related to these traits, we conducted a GWAS of various skin phenotypes in 11,311 Japanese women and identified associations for age spots, freckles, double eyelids, straight/curly hair, eyebrow thickness, hairiness, and sweating. In silico annotation with Roadmap Epigenomics epigenetic state maps and colocalization analysis of GWAS and GTEx Project eQTL signals provided information about tissue specificity, candidate causal variants, and functional target genes. Novel signals for skin-spot traits neighboured AKAP1/MSI2 (rs17833789; P = 2.2 × 10^-9), BNC2 (rs10810635; P = 2.1 × 10^-22), HSPA12A (rs12259842; P = 7.1 × 10^-11), PPARGC1B (rs251468; P = 1.3 × 10^-21), and RAB11FIP2 (rs10444039; P = 5.6 × 10^-21). HSPA12A SNPs were the only protein-coding gene eQTLs identified across skin-spot loci. Double-edged eyelid analysis identified a signal around EMX2 (rs12570134; P = 8.2 × 10^-15) that was also associated with expression of EMX2 and the antisense-RNA gene EMX2OS in brain putamen (basal ganglia) tissue. A known hair morphology signal in EDAR was associated with both eyebrow thickness (rs3827760; P = 1.7 × 10^-9) and straight/curly hair (rs260643; P = 1.6 × 10^-103). The top SNPs of the excessive hairiness signals were also eQTLs for TBX15 (rs984225; P = 1.6 × 10^-8), BCL2 (rs7226979; P = 7.3 × 10^-11), and GCC2 and LIMS1 (rs6542772; P = 2.2 × 10^-9). For excessive sweating, top variants in two signals in chr2:28.82-29.05 Mb (rs56089836; P = 1.7 × 10^-11) were eQTLs for either PPP1CB or PLB1, while a top SNP at the chr16:48.26-48.45 Mb locus was a known ABCC11 missense variant (rs6500380; P = 6.8 × 10^-10). In total, we identified twelve loci containing sixteen association signals, of which fifteen were novel. These findings will help dermatologic researchers better understand the genetic underpinnings of skin-related phenotypic variation in human populations.
Affiliation(s)
- Chihiro Endo
  - Department of Dermatology, School of Medicine, Tokyo Women's Medical University, Shinjuku, Tokyo, 162-8666, Japan
- Ryoko Morino
  - EverGene Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Tatsuya Yamasaki
  - Life Science Group, Healthcare Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Azusa Kami
  - EverGene Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Yuria Hoshi
  - Life Science Group, Healthcare Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Asami Tada
  - EverGene Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Maaya Hine
  - LunaLuna Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Miki Kobayashi
  - LunaLuna Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Nami Kurume
  - LunaLuna Division, Department of Healthcare Business, MTI Ltd., Shinjuku-ku, Tokyo, 163-1435, Japan
- Yuichiro Tsunemi
  - Department of Dermatology, School of Medicine, Tokyo Women's Medical University, Shinjuku, Tokyo, 162-8666, Japan
- Makoto Kawashima
  - Department of Dermatology, School of Medicine, Tokyo Women's Medical University, Shinjuku, Tokyo, 162-8666, Japan

17
Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nat Genet 2018; 50:493-497. [PMID: 29610479 PMCID: PMC5905669 DOI: 10.1038/s41588-018-0089-9] [Received: 08/17/2017] [Accepted: 02/23/2018]