1
Li R, Benz L, Duan R, Denny JC, Hakonarson H, Mosley JD, Smoller JW, Wei WQ, Lumley T, Ritchie MD, Moore JH, Chen Y. A One-Shot Lossless Algorithm for Cross-Cohort Learning in Mixed-Outcomes Analysis. medRxiv 2024:2024.01.09.24301073. [PMID: 38260403 PMCID: PMC10802662 DOI: 10.1101/2024.01.09.24301073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
In cross-cohort studies, integrating diverse datasets, such as electronic health records (EHRs), is both essential and challenging due to cohort-specific variations, distributed data storage, and data privacy concerns. Traditional methods often require data pooling or complex data harmonization, which can reduce efficiency and limit the scope of cross-cohort learning. We introduce mixWAS, a one-shot, lossless algorithm that efficiently integrates distributed EHR datasets via summary statistics. Unlike existing approaches, mixWAS preserves cohort-specific covariate associations and supports simultaneous mixed-outcome analyses. Simulations demonstrate that mixWAS outperforms conventional methods in accuracy and efficiency across various scenarios. Applied to EHR data from seven cohorts in the US, mixWAS identified 4,534 significant cross-cohort genetic associations among traits such as blood lipids, BMI, and circulatory diseases. Validation with an independent UK EHR dataset confirmed 97.7% of these associations, underscoring the algorithm's robustness. By enabling lossless cross-cohort integration, mixWAS improves the precision of multi-outcome analyses and expands the potential for actionable insights in healthcare research.
Affiliation(s)
- Ruowang Li
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center
- Luke Benz
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Rui Duan
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Joshua C Denny
  - National Human Genome Research Institute, National Institutes of Health
- Hakon Hakonarson
  - Division of Human Genetics, Children's Hospital of Philadelphia
  - Center for Applied Genomics, Children's Hospital of Philadelphia
  - Department of Pediatrics, University of Pennsylvania, Perelman School of Medicine
- Jonathan D Mosley
  - Department of Medicine, Vanderbilt University Medical Center
  - Department of Biomedical Informatics, Vanderbilt University Medical Center
- Jordan W Smoller
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center
- Marylyn D Ritchie
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center
- Yong Chen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania
2
Guo J, Guo Q, Zhong T, Xu C, Xia Z, Fang H, Chen Q, Zhou Y, Xie J, Jin D, Yang Y, Wu X, Zhu H, Hour A, Jin X, Zhou Y, Li Q. Phenome-wide association study in 25,639 pregnant Chinese women reveals loci associated with maternal comorbidities and child health. Cell Genom 2024; 4:100632. [PMID: 39389020 PMCID: PMC11602594 DOI: 10.1016/j.xgen.2024.100632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 12/02/2023] [Accepted: 07/19/2024] [Indexed: 10/12/2024]
Abstract
Phenome-wide association studies (PheWAS) have been less focused on maternal diseases and maternal-newborn comorbidities, especially in the Chinese population. To enhance our understanding of the genetic basis of these related diseases, we conducted a PheWAS on 25,639 pregnant women and 14,151 newborns in the Chinese Han population using ultra-low-coverage whole-genome sequence (ulcWGS). We identified 2,883 maternal trait-associated SNPs associated with 26 phenotypes, among which 99.5% were near established genome-wide association study (GWAS) loci. Further refinement delineated these SNPs to 442 unique trait-associated loci (TALs) predicated on linkage disequilibrium R² > 0.8, revealing that 75.6% demonstrated pleiotropy and 50.9% were located in genes implicated in analogous phenotypes. Notably, we discovered 21 maternal SNPs associated with 35 neonatal phenotypes, including two SNPs associated with identical complications in both mothers and children. These findings underscore the importance of integrating ulcWGS data to enrich the discoveries derived from traditional PheWAS approaches.
Affiliation(s)
- Jintao Guo
  - United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China; Weifang People's Hospital, Shandong Second Medical University, Shandong 261041, China
- Qiwei Guo
  - United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- Taoling Zhong
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Chaoqun Xu
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Zhongmin Xia
  - United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- Hongkun Fang
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Weifang People's Hospital, Shandong Second Medical University, Shandong 261041, China
- Qinwei Chen
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China
- Ying Zhou
  - National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China
- Jieqiong Xie
  - United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- Dandan Jin
  - United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China
- You Yang
  - BGI-Shenzhen, Shenzhen 518103, China
- Xin Wu
  - BGI-Shenzhen, Shenzhen 518103, China
- Ailing Hour
  - Department of Life Science, Fu-Jen Catholic University, Xinzhuang Dist., New Taipei City 242, Taiwan
- Xin Jin
  - BGI-Shenzhen, Shenzhen 518103, China
- Yulin Zhou
  - United Diagnostic and Research Center for Clinical Genetics, Women and Children's Hospital, School of Medicine and School of Public Health, Xiamen University, Xiamen 361102, China.
- Qiyuan Li
  - Department of Pediatrics, School of Medicine, Xiamen University, Xiamen 361102, China; National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen 361102, China; Department of Hematology, School of Medicine, Xiamen University, Xiamen 361102, China.
3
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] [Open Access]
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
4
Tkachenko AA, Changalidis AI, Maksiutenko EM, Nasykhova YA, Barbitoff YA, Glotov AS. Replication of Known and Identification of Novel Associations in Biobank-Scale Datasets: A Survey Using UK Biobank and FinnGen. Genes (Basel) 2024; 15:931. [PMID: 39062709 PMCID: PMC11275374 DOI: 10.3390/genes15070931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 07/03/2024] [Accepted: 07/07/2024] [Indexed: 07/28/2024] [Open Access]
Abstract
Over the last two decades, numerous genome-wide association studies (GWAS) have been performed to unveil the genetic architecture of human complex traits. Despite multiple efforts aimed at the trans-biobank integration of GWAS results, no systematic analysis of the variant-level properties affecting the replication of known associations (or identifying novel ones) in genome-wide meta-analysis has yet been performed using biobank-scale data. To address this issue, we performed a systematic comparison of GWAS summary statistics for 679 complex traits in the UK Biobank (UKB) and FinnGen (FG) cohorts. We identified 37,148 index variants with genome-wide associations with at least one trait in either cohort or in the meta-analysis, only 3528 (9.5%) of which were shared between UKB and FG. Nearly twice as many variants (6577) were replicated in another dataset at the significance level adjusted for the number of variants selected for replication. However, as many as 9230 loci failed to be replicated. Moreover, as many as 5813 loci were observed as significant associations only in meta-analysis results, highlighting the importance of trans-biobank meta-analysis efforts. We showed that variants that failed to replicate in UKB or FG tend to correspond to rare, less pleiotropic variants with lower effect sizes and lower LD score values. Genome-wide associations specific to meta-analysis were also enriched in low-effect variants; however, such variants tended to be more common and have more consistent frequencies between populations. Taken together, our results show a relatively high rate of non-replication of genome-wide associations in the studied cohorts and highlight both widely appreciated and less acknowledged properties of the associations affecting their identification and replication.
Affiliation(s)
- Yury A. Barbitoff
  - Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, 199034 St. Petersburg, Russia; (A.A.T.); (A.I.C.); (E.M.M.); (Y.A.N.); (A.S.G.)
5
Dalal T, Patel CJ. PYPE: A pipeline for phenome-wide association and Mendelian randomization in investigator-driven biobank scale analysis. Patterns (N Y) 2024; 5:100982. [PMID: 39005490 PMCID: PMC11240175 DOI: 10.1016/j.patter.2024.100982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/30/2023] [Accepted: 04/08/2024] [Indexed: 07/16/2024]
Abstract
Phenome-wide association studies (PheWASs) serve as a way of documenting the relationship between genotypes and multiple phenotypes, helping to uncover unexplored genotype-phenotype associations (known as pleiotropy). Secondly, Mendelian randomization (MR) can be harnessed to make causal statements about a pair of phenotypes by comparing their genetic architecture. Thus, approaches that automate both PheWASs and MR can enhance biobank-scale analyses, circumventing the need for multiple tools by providing a comprehensive, end-to-end tool to drive scientific discovery. To this end, we present PYPE, a Python pipeline for running, visualizing, and interpreting PheWASs. PYPE utilizes input genotype or phenotype files to automatically estimate associations between the chosen independent variables and phenotypes. PYPE can also produce a variety of visualizations and can be used to identify nearby genes and functional consequences of significant associations. Finally, PYPE can identify possible causal relationships between phenotypes using MR under a variety of causal effect modeling scenarios.
Affiliation(s)
- Taykhoom Dalal
  - Harvard Medical School Department of Biomedical Informatics, Boston, MA 02115, USA
- Chirag J Patel
  - Harvard Medical School Department of Biomedical Informatics, Boston, MA 02115, USA
6
Roy DG, Singh L, Chaturvedi HK, Chinnaswamy S. Gender-dependent multiple cross-phenotype association of interferon lambda genetic variants with peripheral blood profiles in healthy individuals. Mol Genet Genomic Med 2024; 12:e2292. [PMID: 37795763 PMCID: PMC10767428 DOI: 10.1002/mgg3.2292] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/02/2023] [Revised: 09/08/2023] [Accepted: 09/19/2023] [Indexed: 10/06/2023] [Open Access]
Abstract
BACKGROUND: Type III interferons (IFN), also called lambda IFNs (IFN-λs), are antiviral and immunomodulatory cytokines that are evolutionarily important in humans. Given their central roles in innate immunity, they could be influencing other aspects of human biology. This study aimed to examine the association of genetic variants that control the expression and/or activity of IFN-λ3 and IFN-λ4 with multiple phenotypes in blood profiles of healthy individuals. METHODS: In a cohort of about 550 self-declared healthy individuals, after applying several exclusion criteria to determine their health status, we measured 30 blood parameters, including cellular, biochemical, and metabolic profiles. We genotyped them at rs12979860 and rs28416813 using competitive allele-specific PCR assays and tested their association with the blood profiles under dominant and recessive models for the minor allele. IFN-λ4 variants rs368234815 and rs117648444 were also genotyped or inferred. RESULTS: We saw no association in the combined cohort under either of the models for any of the phenotypes. When we stratified the cohort based on gender, we saw a significant association only in males with monocyte (p = 1 × 10⁻³) and SGOT (p = 7 × 10⁻³) levels under the dominant model and with uric acid levels (p = 0.01) under the recessive model. When we tested the IFN-λ4 activity modifying variant within groupings based on absence or presence of one or two copies of IFN-λ4 and on different activity levels of IFN-λ4, we found significant (p < 0.05) association with several phenotypes like monocyte, triglyceride, VLDL, ALP, and uric acid levels, only in males. None of the above significant associations showed any confounding when we tested them against up to ten different demographic and lifestyle variables. CONCLUSIONS: These results show that lambda interferons can have pleiotropic effects. However, gender seems to be an effect modifier, with males being more sensitive than females to the effect.
Affiliation(s)
- Debarati Guha Roy
  - Infectious Disease Genetics, National Institute of Biomedical Genomics, Kalyani, India
  - Regional Centre for Biotechnology, Faridabad, India
- Lucky Singh
  - ICMR-National Institute of Medical Statistics, New Delhi, India
- Sreedhar Chinnaswamy
  - Infectious Disease Genetics, National Institute of Biomedical Genomics, Kalyani, India
  - Regional Centre for Biotechnology, Faridabad, India
7
Zhang G, Zhu C, Chen X, Yan J, Xue D, Wei Z, Chuai G, Liu Q. Systematic Exploration of Optimized Base Editing gRNA Design and Pleiotropic Effects with BExplorer. Genomics Proteomics Bioinformatics 2023; 21:1237-1245. [PMID: 35792260 PMCID: PMC11082405 DOI: 10.1016/j.gpb.2022.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Revised: 05/24/2022] [Accepted: 06/27/2022] [Indexed: 06/15/2023]
Abstract
Base editing technology is being increasingly applied in genome engineering, but the current strategy for designing guide RNAs (gRNAs) relies substantially on empirical experience rather than a dependable and efficient in silico design. Furthermore, the pleiotropic effect of base editing on disease treatment remains unexplored, which prevents its further clinical usage. Here, we presented BExplorer, an integrated and comprehensive computational pipeline to optimize the design of gRNAs for 26 existing types of base editors in silico. Using BExplorer, we described its results for two types of mainstream base editors, BE3 and ABE7.10, and evaluated the pleiotropic effects of the corresponding base editing loci. BExplorer revealed 524 and 900 editable pathogenic single nucleotide polymorphism (SNP) loci in the human genome together with the selected optimized gRNAs for BE3 and ABE7.10, respectively. In addition, the impact of 707 edited pathogenic SNP loci following base editing on 131 diseases was systematically explored by revealing their pleiotropic effects, indicating that base editing should be carefully utilized given the potential pleiotropic effects. Collectively, the systematic exploration of optimized base editing gRNA design and the corresponding pleiotropic effects with BExplorer provides a computational basis for applying base editing in disease treatment.
Affiliation(s)
- Gongchen Zhang
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chenyu Zhu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xiaohan Chen
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Jifang Yan
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Dongyu Xue
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Zixuan Wei
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Guohui Chuai
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Qi Liu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China.
8
Fan M, Jin C, Li D, Deng Y, Yao L, Chen Y, Ma YL, Wang T. Multi-level advances in databases related to systems pharmacology in traditional Chinese medicine: a 60-year review. Front Pharmacol 2023; 14:1289901. [PMID: 38035021 PMCID: PMC10682728 DOI: 10.3389/fphar.2023.1289901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 11/03/2023] [Indexed: 12/02/2023] [Open Access]
Abstract
The therapeutic effects of traditional Chinese medicine (TCM) involve intricate interactions among multiple components and targets. Currently, computational approaches play a pivotal role in simulating various pharmacological processes of TCM. The application of network analysis in TCM research has provided an effective means to explain the pharmacological mechanisms underlying the actions of herbs or formulas through the lens of biological network analysis. Along with the advances of network analysis, computational science has coalesced around the core chain of TCM research: formula-herb-component-target-phenotype-ZHENG, facilitating the accumulation and organization of the extensive TCM-related data and the establishment of relevant databases. Nonetheless, recent years have witnessed a tendency toward homogeneity in the development and application of these databases. Advancements in computational technologies, including deep learning and foundation models, have propelled the exploration and modeling of intricate systems into a new phase, potentially heralding a new era. This review delves into the progress made in databases related to six key entities: formula, herb, component, target, phenotype, and ZHENG. The commonalities and disparities among the various database types are discussed systematically. In addition, the review raises the issue of research bottlenecks in TCM computational pharmacology and envisions the forthcoming directions of computational research within the realm of TCM.
Affiliation(s)
- Mengyue Fan
  - Innovation Research Institute of Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
- Ching Jin
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, IL, United States
- Daping Li
  - Innovation Research Institute of Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
- Yingshan Deng
  - College of Acupuncture and Massage, Shandong University of Traditional Chinese Medicine, Jinan, China
- Lin Yao
  - Innovation Research Institute of Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
- Yongjun Chen
  - Innovation Research Institute of Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
- Yu-Ling Ma
  - Oxford Chinese Medicine Research Centre, University of Oxford, Oxford, United Kingdom
- Taiyi Wang
  - Innovation Research Institute of Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
  - Oxford Chinese Medicine Research Centre, University of Oxford, Oxford, United Kingdom
9
Du L, Zhang J, Zhao Y, Shang M, Guo L, Han J. inMTSCCA: An Integrated Multi-task Sparse Canonical Correlation Analysis for Multi-omic Brain Imaging Genetics. Genomics Proteomics Bioinformatics 2023; 21:396-413. [PMID: 37442417 PMCID: PMC10634656 DOI: 10.1016/j.gpb.2023.03.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/13/2021] [Revised: 01/29/2023] [Accepted: 03/14/2023] [Indexed: 07/15/2023]
Abstract
Identifying genetic risk factors for Alzheimer's disease (AD) is an important research topic. To date, different endophenotypes, such as imaging-derived endophenotypes and proteomic expression-derived endophenotypes, have shown great value in uncovering risk genes compared to case-control studies. Biologically, a co-varying pattern of different omics-derived endophenotypes could result from the shared genetic basis. However, existing methods mainly focus on the effect of endophenotypes alone; the effect of cross-endophenotype (CEP) associations remains largely unexploited. In this study, we used both endophenotypes and their CEP associations of multi-omic data to identify genetic risk factors, and proposed two integrated multi-task sparse canonical correlation analysis (inMTSCCA) methods, i.e., pairwise endophenotype correlation-guided MTSCCA (pcMTSCCA) and high-order endophenotype correlation-guided MTSCCA (hocMTSCCA). pcMTSCCA employed pairwise correlations between magnetic resonance imaging (MRI)-derived, plasma-derived, and cerebrospinal fluid (CSF)-derived endophenotypes as an additional penalty. hocMTSCCA used high-order correlations among these multi-omic data for regularization. To identify genetic risk factors at individual and group levels, as well as altered endophenotypic markers, we introduced sparsity-inducing penalties for both models. We compared pcMTSCCA and hocMTSCCA with three related methods on both simulation and real (consisting of neuroimaging data, proteomic analytes, and genetic data) datasets. The results showed that our methods obtained better or comparable canonical correlation coefficients (CCCs) and better feature subsets than benchmarks. Most importantly, the identified genetic loci and heterogeneous endophenotypic markers showed high relevance. Therefore, jointly using multi-omic endophenotypes and their CEP associations is promising to reveal genetic risk factors. The source code and manual of inMTSCCA are available at https://ngdc.cncb.ac.cn/biocode/tools/BT007330.
Affiliation(s)
- Lei Du
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
- Jin Zhang
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Ying Zhao
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Muheng Shang
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lei Guo
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Junwei Han
  - Department of Intelligent Science and Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
10
Matta J, Dobrino D, Yeboah D, Howard S, EL-Manzalawy Y, Obafemi-Ajayi T. Connecting phenotype to genotype: PheWAS-inspired analysis of autism spectrum disorder. Front Hum Neurosci 2022; 16:960991. [PMID: 36310845 PMCID: PMC9605200 DOI: 10.3389/fnhum.2022.960991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2022] [Accepted: 09/14/2022] [Indexed: 04/13/2024] [Open Access]
Abstract
Autism Spectrum Disorder (ASD) is extremely heterogeneous clinically and genetically. There is a pressing need for a better understanding of the heterogeneity of ASD based on scientifically rigorous approaches centered on systematic evaluation of the clinical and research utility of both phenotype and genotype markers. This paper presents a holistic PheWAS-inspired method to identify meaningful associations between ASD phenotypes and genotypes. We generate two types of phenotype-phenotype (p-p) graphs: a direct graph that utilizes only phenotype data, and an indirect graph that incorporates genotype as well as phenotype data. We introduce a novel methodology for fusing the direct and indirect p-p networks in which the genotype data is incorporated into the phenotype data in varying degrees. The hypothesis is that the heterogeneity of ASD can be distinguished by clustering the p-p graph. The obtained graphs are clustered using network-oriented clustering techniques, and results are evaluated. The most promising clusterings are subsequently analyzed for biological and domain-based relevance. The clusters obtained delineated different aspects of ASD, including differentiating ASD-specific symptoms, cognitive, adaptive, language and communication functions, and behavioral problems. Some of the important genes associated with the clusters have previously known associations with ASD. We found that clusters based on integrated genetic and phenotype data were more effective at identifying relevant genes than clusters constructed from phenotype information alone. These genes included five with suggestive evidence of ASD association and one known to be a strong candidate.
Affiliation(s)
- John Matta
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Daniel Dobrino
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Dacosta Yeboah
  - Department of Computer Science, Missouri State University, Springfield, MO, United States
- Swade Howard
  - Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States
- Yasser EL-Manzalawy
  - Department of Translational Data Science and Informatics, Geisinger, Danville, PA, United States
- Tayo Obafemi-Ajayi
  - Engineering Program, Missouri State University, Springfield, MO, United States
11
Reinert S. Quantitative genetics of pleiotropy and its potential for plant sciences. J Plant Physiol 2022; 276:153784. [PMID: 35944292 DOI: 10.1016/j.jplph.2022.153784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/30/2022] [Revised: 07/14/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Affiliation(s)
- Stephan Reinert
- Friedrich-Alexander-University Erlangen-Nürnberg, Department of Biology, Division of Biochemistry, Biocomputing Lab, Staudtstraße 5, 91058, Erlangen, Germany.
|
12
|
Pan J, Kwon JJ, Talamas JA, Borah AA, Vazquez F, Boehm JS, Tsherniak A, Zitnik M, McFarland JM, Hahn WC. Sparse dictionary learning recovers pleiotropy from human cell fitness screens. Cell Syst 2022; 13:286-303.e10. [PMID: 35085500 PMCID: PMC9035054 DOI: 10.1016/j.cels.2021.12.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/25/2021] [Revised: 10/30/2021] [Accepted: 12/21/2021] [Indexed: 12/28/2022]
Abstract
In high-throughput functional genomic screens, each gene product is commonly assumed to exhibit a singular biological function within a defined protein complex or pathway. In practice, a single gene perturbation may induce multiple cascading functional outcomes, a genetic principle known as pleiotropy. Here, we model pleiotropy in fitness screen collections by representing each gene perturbation as the sum of multiple perturbations of biological functions, each harboring independent fitness effects inferred empirically from the data. Our approach (Webster) recovered pleiotropic functions for DNA damage proteins from genotoxic fitness screens, untangled distinct signaling pathways upstream of shared effector proteins from cancer cell fitness screens, and predicted the stoichiometry of an unknown protein complex subunit from fitness data alone. Modeling compound sensitivity profiles in terms of genetic functions recovered compound mechanisms of action. Our approach establishes a sparse approximation mechanism for unraveling complex genetic architectures underlying high-dimensional gene perturbation readouts.
Affiliation(s)
- Joshua Pan
- Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Jason J Kwon
- Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Jessica A Talamas
- Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Ashir A Borah
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jesse S Boehm
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aviad Tsherniak
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02215, USA; Harvard University, Data Science Initiative, Cambridge, MA 02138, USA
- William C Hahn
- Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA; Brigham and Women's Hospital and Harvard Medical School, Department of Medicine, Boston, MA 02215, USA.
|
13
|
New insights into pathogenesis of IgA nephropathy. Int Urol Nephrol 2022; 54:1873-1880. [DOI: 10.1007/s11255-021-03094-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2020] [Accepted: 12/08/2021] [Indexed: 10/19/2022]
|
14
|
Chen J, Sun L, Yu K, Batmanghelich K. Extracting Disease-Relevant Features with Adversarial Regularization. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE 2021; 2021:3464-3471. [PMID: 35198261 PMCID: PMC8863436 DOI: 10.1109/bibm52615.2021.9669878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Extracting hidden phenotypes is essential in medical data analysis because it facilitates disease subtyping, diagnosis, and understanding of disease etiology. Since the hidden phenotype is usually a low-dimensional representation that comprehensively describes the disease, we require a dimensionality-reduction method that captures as much disease-relevant information as possible. However, most unsupervised or self-supervised methods cannot achieve the goal because they learn a holistic representation containing both disease-relevant and disease-irrelevant information. Supervised methods can capture information that is predictive to the target clinical variable only, but the learned representation is usually not generalizable for the various aspects of the disease. Hence, we develop a dimensionality-reduction approach to extract Disease Relevant Features (DRFs) based on information theory. We propose to use clinical variables that weakly define the disease as so-called anchors. We derive a formulation that makes the DRF predictive of the anchors while forcing the remaining representation to be irrelevant to the anchors via adversarial regularization. We apply our method to a large-scale study of Chronic Obstructive Pulmonary Disease (COPD). Our experiment shows: (1) Learned DRFs are as predictive as the original representation in predicting the anchors, although it is in a significantly lower dimension. (2) Compared to supervised representation, the learned DRFs are more predictive to other relevant disease metrics that are not used during the training. (3) The learned DRFs are related to non-imaging biological measurements such as gene expressions, suggesting the DRFs include information related to the underlying biology of the disease.
Affiliation(s)
- Junxiang Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Li Sun
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Ke Yu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Kayhan Batmanghelich
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania
|
15
|
Matta J, Dobrino D, Howard S, Yeboah D, Kopel J, El-Manzalawy Y, Obafemi-Ajayi T. A PheWAS Model of Autism Spectrum Disorder. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:2110-2114. [PMID: 34891705 DOI: 10.1109/embc46164.2021.9629533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]
Abstract
Children with Autism Spectrum Disorder (ASD) exhibit a wide diversity in type, number, and severity of social deficits as well as communicative and cognitive difficulties. It is a challenge to categorize the phenotypes of a particular ASD patient with their unique genetic variants. There is a need for a better understanding of the connections between genotype information and the phenotypes to sort out the heterogeneity of ASD. In this study, single nucleotide polymorphism (SNP) and phenotype data obtained from a simplex ASD sample are combined using a PheWAS-inspired approach to construct a phenotype-phenotype network. The network is clustered, yielding groups of etiologically related phenotypes. These clusters are analyzed to identify relevant genes associated with each set of phenotypes. The results identified multiple discriminant SNPs associated with varied phenotype clusters such as ASD aberrant behavior (self-injury, compulsiveness and hyperactivity), as well as IQ and language skills. Overall, these SNPs were linked to 22 significant genes. An extensive literature search revealed that eight of these are known to have strong evidence of association with ASD. The others have been linked to related disorders such as mental conditions, cognition, and social functioning. Clinical relevance: This study further informs on connections between certain groups of ASD phenotypes and their unique genetic variants. Such insight regarding the heterogeneity of ASD would support clinicians to advance more tailored interventions and improve outcomes for ASD patients.
|
16
|
Zhao L, Batta I, Matloff W, O'Driscoll C, Hobel S, Toga AW. Neuroimaging PheWAS (Phenome-Wide Association Study): A Free Cloud-Computing Platform for Big-Data, Brain-Wide Imaging Association Studies. Neuroinformatics 2021; 19:285-303. [PMID: 32822005 DOI: 10.1007/s12021-020-09486-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
Large-scale, case-control genome-wide association studies (GWASs) have revealed genetic variations associated with diverse neurological and psychiatric disorders. Recent advances in neuroimaging and genomic databases of large healthy and diseased cohorts have empowered studies to characterize effects of the discovered genetic factors on brain structure and function, implicating neural pathways and genetic mechanisms in the underlying biology. However, the unprecedented scale and complexity of the imaging and genomic data requires new advanced biomedical data science tools to manage, process and analyze the data. In this work, we introduce Neuroimaging PheWAS (phenome-wide association study): a web-based system for searching over a wide variety of brain-wide imaging phenotypes to discover true system-level gene-brain relationships using a unified genotype-to-phenotype strategy. This design features a user-friendly graphical user interface (GUI) for anonymous data uploading, study definition and management, and interactive result visualizations as well as a cloud-based computational infrastructure and multiple state-of-art methods for statistical association analysis and multiple comparison correction. We demonstrated the potential of Neuroimaging PheWAS with a case study analyzing the influences of the apolipoprotein E (APOE) gene on various brain morphological properties across the brain in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Benchmark tests were performed to evaluate the system's performance using data from UK Biobank. The Neuroimaging PheWAS system is freely available. It simplifies the execution of PheWAS on neuroimaging data and provides an opportunity for imaging genetics studies to elucidate routes at play for specific genetic variants on diseases in the context of detailed imaging phenotypic data.
Affiliation(s)
- Lu Zhao
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Ishaan Batta
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- William Matloff
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Caroline O'Driscoll
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Samuel Hobel
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Arthur W Toga
- Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA.
|
17
|
Tarsani E, Kranis A, Maniatis G, Hager-Theodorides AL, Kominakis A. Detection of loci exhibiting pleiotropic effects on body weight and egg number in female broilers. Sci Rep 2021; 11:7441. [PMID: 33811218 PMCID: PMC8018976 DOI: 10.1038/s41598-021-86817-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/04/2020] [Accepted: 03/16/2021] [Indexed: 12/14/2022] Open
Abstract
The objective of the present study was to discover the genetic variants, functional candidate genes, biological processes and molecular functions underlying the negative genetic correlation observed between body weight (BW) and egg number (EN) traits in female broilers. To this end, first a bivariate genome-wide association and second stepwise conditional-joint analyses were performed using 2586 female broilers and 240 k autosomal SNPs. The aforementioned analyses resulted in a total number of 49 independent cross-phenotype (CP) significant SNPs with 35 independent markers showing antagonistic action i.e., positive effects on one trait and negative effects on the other trait. A number of 33 independent CP SNPs were located within 26 and 14 protein coding and long non-coding RNA genes, respectively. Furthermore, 26 independent markers were situated within 44 reported QTLs, most of them related to growth traits. Investigation of the functional role of protein coding genes via pathway and gene ontology analyses highlighted four candidates (CPEB3, ACVR1, MAST2 and CACNA1H) as most plausible pleiotropic genes for the traits under study. Three candidates (CPEB3, MAST2 and CACNA1H) were associated with antagonistic pleiotropy, while ACVR1 with synergistic pleiotropic action. Current results provide a novel insight into the biological mechanism of the genetic trade-off between growth and reproduction, in broilers.
Affiliation(s)
- Eirini Tarsani
- Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece.
- Andreas Kranis
- Aviagen, Newbridge, EH28 8SZ, Midlothian, UK
- The Roslin Institute, University of Edinburgh, Midlothian, EH25 9RG, UK
- Ariadne L Hager-Theodorides
- Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
- Antonios Kominakis
- Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
|
18
|
Hendelman A, Zebell S, Rodriguez-Leal D, Dukler N, Robitaille G, Wu X, Kostyun J, Tal L, Wang P, Bartlett ME, Eshed Y, Efroni I, Lippman ZB. Conserved pleiotropy of an ancient plant homeobox gene uncovered by cis-regulatory dissection. Cell 2021; 184:1724-1739.e16. [PMID: 33667348 DOI: 10.1016/j.cell.2021.02.001] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 11/17/2020] [Revised: 01/03/2021] [Accepted: 02/01/2021] [Indexed: 01/09/2023]
Abstract
Divergence of gene function is a hallmark of evolution, but assessing functional divergence over deep time is not trivial. The few alleles available for cross-species studies often fail to expose the entire functional spectrum of genes, potentially obscuring deeply conserved pleiotropic roles. Here, we explore the functional divergence of WUSCHEL HOMEOBOX9 (WOX9), suggested to have species-specific roles in embryo and inflorescence development. Using a cis-regulatory editing drive system, we generate a comprehensive allelic series in tomato, which revealed hidden pleiotropic roles for WOX9. Analysis of accessible chromatin and conserved cis-regulatory sequences identifies the regions responsible for this pleiotropic activity, the functions of which are conserved in groundcherry, a tomato relative. Mimicking these alleles in Arabidopsis, distantly related to tomato and groundcherry, reveals new inflorescence phenotypes, exposing a deeply conserved pleiotropy. We suggest that targeted cis-regulatory mutations can uncover conserved gene functions and reduce undesirable effects in crop improvement.
Affiliation(s)
- Anat Hendelman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Sophia Zebell
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Noah Dukler
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Gina Robitaille
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Xuelin Wu
- The Salk Institute for Biological Research, San Diego, CA, USA
- Jamie Kostyun
- Biology Department, University of Massachusetts Amherst, Amherst, MA, USA
- Lior Tal
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
- Peipei Wang
- Institute of Plant Sciences and Genetics in Agriculture, The Robert H. Smith Faculty of Agriculture, The Hebrew University, Rehovot, Israel
- Yuval Eshed
- Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
- Idan Efroni
- Institute of Plant Sciences and Genetics in Agriculture, The Robert H. Smith Faculty of Agriculture, The Hebrew University, Rehovot, Israel.
- Zachary B Lippman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
|
19
|
Fernandes SB, Zhang KS, Jamann TM, Lipka AE. How Well Can Multivariate and Univariate GWAS Distinguish Between True and Spurious Pleiotropy? Front Genet 2021; 11:602526. [PMID: 33584799 PMCID: PMC7873880 DOI: 10.3389/fgene.2020.602526] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/03/2020] [Accepted: 12/11/2020] [Indexed: 11/13/2022] Open
Abstract
Quantification of the simultaneous contributions of loci to multiple traits, a phenomenon called pleiotropy, is facilitated by the increased availability of high-throughput genotypic and phenotypic data. To understand the prevalence and nature of pleiotropy, the ability of multivariate and univariate genome-wide association study (GWAS) models to distinguish between pleiotropic and non-pleiotropic loci in linkage disequilibrium (LD) first needs to be evaluated. Therefore, we used publicly available maize and soybean genotypic data to simulate multiple pairs of traits that were either (i) controlled by quantitative trait nucleotides (QTNs) on separate chromosomes, (ii) controlled by QTNs in various degrees of LD with each other, or (iii) controlled by a single pleiotropic QTN. We showed that multivariate GWAS could not distinguish between QTNs in LD and a single pleiotropic QTN. In contrast, a unique QTN detection rate pattern was observed for univariate GWAS whenever the simulated QTNs were in high LD or pleiotropic. Collectively, these results suggest that multivariate and univariate GWAS should both be used to infer whether or not causal mutations underlying peak GWAS associations are pleiotropic. Therefore, we recommend that future studies use a combination of multivariate and univariate GWAS models, as both models could be useful for identifying and narrowing down candidate loci with potential pleiotropic effects for downstream biological experiments.
Affiliation(s)
- Samuel B. Fernandes
- Department of Crop Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Alexander E. Lipka
- Department of Crop Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States
|
20
|
Golriz Khatami S, Domingo-Fernández D, Mubeen S, Hoyt CT, Robinson C, Karki R, Iyappan A, Kodamullil AT, Hofmann-Apitius M. A Systems Biology Approach for Hypothesizing the Effect of Genetic Variants on Neuroimaging Features in Alzheimer's Disease. J Alzheimers Dis 2021; 80:831-840. [PMID: 33554913 PMCID: PMC8075382 DOI: 10.3233/jad-201397] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 01/07/2021] [Indexed: 01/14/2023]
Abstract
BACKGROUND: Neuroimaging markers provide quantitative insight into brain structure and function in neurodegenerative diseases, such as Alzheimer's disease, where we lack mechanistic insights to explain pathophysiology. These mechanisms are often mediated by genes and genetic variations and are often studied through the lens of genome-wide association studies. Linking these two disparate layers (i.e., imaging and genetic variation) through causal relationships between biological entities involved in the disease's etiology would pave the way to large-scale mechanistic reasoning and interpretation. OBJECTIVE: We explore how genetic variants may lead to functional alterations of intermediate molecular traits, which can further impact neuroimaging hallmarks over a series of biological processes across multiple scales. METHODS: We present an approach in which knowledge pertaining to single nucleotide polymorphisms and imaging readouts is extracted from the literature, encoded in Biological Expression Language, and used in a novel workflow to assist in the functional interpretation of SNPs in a clinical context. RESULTS: We demonstrate our approach in a case scenario which proposes KANSL1 as a candidate gene that accounts for the clinically reported correlation between the incidence of the genetic variants and hippocampal atrophy. We find that the workflow prioritizes multiple mechanisms reported in the literature through which KANSL1 may have an impact on hippocampal atrophy such as through the dysregulation of cell proliferation, synaptic plasticity, and metabolic processes. CONCLUSION: We have presented an approach that enables pinpointing relevant genetic variants as well as investigating their functional role in biological processes spanning across several, diverse biological scales.
Affiliation(s)
- Sepehr Golriz Khatami
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Daniel Domingo-Fernández
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Sarah Mubeen
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Charles Tapley Hoyt
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Christine Robinson
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Reagon Karki
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Anandhi Iyappan
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Alpha Tom Kodamullil
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Martin Hofmann-Apitius
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (Fraunhofer SCAI), Sankt Augustin, Germany
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
|
21
|
The Gene scb-1 Underlies Variation in Caenorhabditis elegans Chemotherapeutic Responses. G3-GENES GENOMES GENETICS 2020; 10:2353-2364. [PMID: 32385045 PMCID: PMC7341127 DOI: 10.1534/g3.120.401310] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/14/2022]
Abstract
Pleiotropy, the concept that a single gene controls multiple distinct traits, is prevalent in most organisms and has broad implications for medicine and agriculture. The identification of the molecular mechanisms underlying pleiotropy has the power to reveal previously unknown biological connections between seemingly unrelated traits. Additionally, the discovery of pleiotropic genes increases our understanding of both genetic and phenotypic complexity by characterizing novel gene functions. Quantitative trait locus (QTL) mapping has been used to identify several pleiotropic regions in many organisms. However, gene knockout studies are needed to eliminate the possibility of tightly linked, non-pleiotropic loci. Here, we use a panel of 296 recombinant inbred advanced intercross lines of Caenorhabditis elegans and a high-throughput fitness assay to identify a single large-effect QTL on the center of chromosome V associated with variation in responses to eight chemotherapeutics. We validate this QTL with near-isogenic lines and pair genome-wide gene expression data with drug response traits to perform mediation analysis, leading to the identification of a pleiotropic candidate gene, scb-1, for some of the eight chemotherapeutics. Using deletion strains created by genome editing, we show that scb-1, which was previously implicated in response to bleomycin, also underlies responses to other double-strand DNA break-inducing chemotherapeutics. This finding provides new evidence for the role of scb-1 in the nematode drug response and highlights the power of mediation analysis to identify causal genes.
|
22
|
Nghe P, de Vos MGJ, Kingma E, Kogenaru M, Poelwijk FJ, Laan L, Tans SJ. Predicting Evolution Using Regulatory Architecture. Annu Rev Biophys 2020; 49:181-197. [PMID: 32040932 DOI: 10.1146/annurev-biophys-070317-032939] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
The limits of evolution have long fascinated biologists. However, the causes of evolutionary constraint have remained elusive due to a poor mechanistic understanding of studied phenotypes. Recently, a range of innovative approaches have leveraged mechanistic information on regulatory networks and cellular biology. These methods combine systems biology models with population and single-cell quantification and with new genetic tools, and they have been applied to a range of complex cellular functions and engineered networks. In this article, we review these developments, which are revealing the mechanistic causes of epistasis at different levels of biological organization (in molecular recognition, within a single regulatory network, and between different networks), providing first indications of predictable features of evolutionary constraint.
Affiliation(s)
- Philippe Nghe
- Laboratoire de Biochimie, UMR CBI 8231, ESPCI Paris, PSL Research University, 75005 Paris, France
- Marjon G J de Vos
- University of Groningen, GELIFES, 9747 AG Groningen, The Netherlands
- Enzo Kingma
- Bionanoscience Department, Delft University of Technology, 2629HZ Delft, The Netherlands
- Manjunatha Kogenaru
- Department of Life Sciences, Imperial College London, London SW7 2AZ, United Kingdom
- Frank J Poelwijk
- cBio Center, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Liedewij Laan
- Bionanoscience Department, Delft University of Technology, 2629HZ Delft, The Netherlands
- Sander J Tans
- Bionanoscience Department, Delft University of Technology, 2629HZ Delft, The Netherlands; AMOLF, 1098 XG Amsterdam, The Netherlands
|
23
|
Shikov AE, Skitchenko RK, Predeus AV, Barbitoff YA. Phenome-wide functional dissection of pleiotropic effects highlights key molecular pathways for human complex traits. Sci Rep 2020; 10:1037. [PMID: 31974475 PMCID: PMC6978431 DOI: 10.1038/s41598-020-58040-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 08/20/2019] [Accepted: 01/08/2020] [Indexed: 02/07/2023] Open
Abstract
Over recent decades, genome-wide association studies (GWAS) have dramatically changed the understanding of human genetics. A recent genetic data release by UK Biobank (UKB) has allowed many researchers worldwide to take a comprehensive look into the genetic architecture of thousands of human phenotypes. In this study, we used GWAS summary statistics derived from the UKB cohort to investigate functional mechanisms of pleiotropic effects across the human phenome. We find that highly pleiotropic variants often correspond to broadly expressed genes with ubiquitous functions, such as matrisome components and cell growth regulators, and tend to colocalize with tissue-shared eQTLs. At the same time, signaling pathway components are more prevalent among highly pleiotropic genes compared to regulatory proteins such as transcription factors. Our results suggest that protein-level pleiotropy mediated by ubiquitously expressed genes is the most prevalent mechanism of pleiotropic genetic effects across the human phenome.
Affiliation(s)
- Anton E Shikov
- Bioinformatics Institute, Saint Petersburg, Russia
- City Hospital No. 40, Saint Petersburg, Russia
- All-Russian Research Institute for Agricultural Microbiology (ARRIAM), Saint Petersburg, Russia
- Rostislav K Skitchenko
- Bioinformatics Institute, Saint Petersburg, Russia
- ITMO University, Saint Petersburg, Russia
- Yury A Barbitoff
- Bioinformatics Institute, Saint Petersburg, Russia.
- Department of Genetics and Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia.
|
24
|
Pendergrass SA, Buyske S, Jeff JM, Frase A, Dudek S, Bradford Y, Ambite JL, Avery CL, Buzkova P, Deelman E, Fesinmeyer MD, Haiman C, Heiss G, Hindorff LA, Hsu CN, Jackson RD, Lin Y, Le Marchand L, Matise TC, Monroe KR, Moreland L, North KE, Park SL, Reiner A, Wallace R, Wilkens LR, Kooperberg C, Ritchie MD, Crawford DC. A phenome-wide association study (PheWAS) in the Population Architecture using Genomics and Epidemiology (PAGE) study reveals potential pleiotropy in African Americans. PLoS One 2019; 14:e0226771. [PMID: 31891604 PMCID: PMC6938343 DOI: 10.1371/journal.pone.0226771] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/02/2019] [Accepted: 12/03/2019] [Indexed: 12/11/2022] Open
Abstract
We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.
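The selection rule quoted in this abstract (p<0.01 in two or more study sites for two or more phenotype classes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the tuple layout, the toy data, and the reading that each phenotype class must replicate in at least two sites are all assumptions made for illustration.

```python
from collections import defaultdict

P_THRESHOLD = 0.01  # threshold quoted in the abstract


def putative_pleiotropic_snps(results, min_sites=2, min_classes=2):
    """Flag SNPs under one reading of the rule (an assumption, not the paper's code).

    results: iterable of (snp, site, phenotype_class, p_value) tuples.
    A SNP is flagged when at least `min_classes` phenotype classes each show
    p < P_THRESHOLD in at least `min_sites` distinct study sites.
    """
    # sites_by_class[snp][phenotype_class] -> set of sites passing the threshold
    sites_by_class = defaultdict(lambda: defaultdict(set))
    for snp, site, pheno_class, p_value in results:
        if p_value < P_THRESHOLD:
            sites_by_class[snp][pheno_class].add(site)

    flagged = set()
    for snp, classes in sites_by_class.items():
        replicated = [c for c, sites in classes.items() if len(sites) >= min_sites]
        if len(replicated) >= min_classes:
            flagged.add(snp)
    return flagged


# Hypothetical toy data; SNP and site names are placeholders.
toy_results = [
    ("rsA", "site1", "glucose", 0.004),
    ("rsA", "site2", "glucose", 0.008),
    ("rsA", "site1", "blood_pressure", 0.002),
    ("rsA", "site3", "blood_pressure", 0.009),
    ("rsB", "site1", "glucose", 0.001),
    ("rsB", "site2", "glucose", 0.003),
    ("rsB", "site1", "lipids", 0.2),
]
flagged = putative_pleiotropic_snps(toy_results)
```

In the toy data only `rsA` meets the criterion, because `rsB` replicates in a single phenotype class.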
Affiliation(s)
| | - Steven Buyske
- Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
| | - Janina M. Jeff
- Illumina, Inc., San Diego, California, United States of America
| | - Alex Frase
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Scott Dudek
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Yuki Bradford
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Jose-Luis Ambite
- Information Sciences Institute; University of Southern California, Marina del Rey, California, United States of America
| | - Christy L. Avery
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Petra Buzkova
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
| | - Ewa Deelman
- Information Sciences Institute; University of Southern California, Marina del Rey, California, United States of America
| | | | - Christopher Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
| | - Gerardo Heiss
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Lucia A. Hindorff
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Chun-Nan Hsu
- Center for Research in Biological Systems, Department of Neurosciences, University of California, San Diego, La Jolla, California, United States of America
| | | | - Yi Lin
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
| | - Tara C. Matise
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
| | - Kristine R. Monroe
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
| | - Larry Moreland
- University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Kari E. North
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Sungshim L. Park
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America
| | - Alex Reiner
- Department of Epidemiology, University of Washington, Seattle, Washington, United States of America
| | - Robert Wallace
- Departments of Epidemiology and Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
| | - Lynne R. Wilkens
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Marylyn D. Ritchie
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Dana C. Crawford
- Cleveland Institute for Computational Biology, Cleveland, Ohio, United States of America
- Departments of Population and Quantitative Health Sciences and Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, United States of America
| |
|
25
|
Precision medicine review: rare driver mutations and their biophysical classification. Biophys Rev 2019; 11:5-19. [PMID: 30610579 PMCID: PMC6381362 DOI: 10.1007/s12551-018-0496-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 12/04/2018] [Accepted: 12/18/2018] [Indexed: 02/07/2023] Open
Abstract
How can biophysical principles help precision medicine identify rare driver mutations? A major tenet of pragmatic approaches to precision oncology and pharmacology is that driver mutations are very frequent. However, frequency is a statistical attribute, not a mechanistic one. Rare mutations can also act through the same mechanism, and as we discuss below, “latent driver” mutations may also follow the same route, with “helper” mutations. Here, we review how biophysics provides mechanistic guidelines that extend precision medicine. We outline principles and strategies, especially focusing on mutations that drive cancer. Biophysics has contributed profoundly to deciphering biological processes. However, driven by data science, precision medicine has skirted some of its major tenets. Data science embodies genomics, tissue- and cell-specific expression levels, making it capable of defining genome- and systems-wide molecular disease signatures. It classifies cancer driver genes/mutations and affected pathways, and its associated protein structural data guide drug discovery. Biophysics complements data science. It considers structures and their heterogeneous ensembles, explains how mutational variants can signal through distinct pathways, and how allo-network drugs can be harnessed. Biophysics clarifies how one mutation, frequent or rare, can affect multiple phenotypic traits by populating conformations that favor interactions with other network modules. It also suggests how to identify such mutations and their signaling consequences. Biophysics offers principles and strategies that can help precision medicine push the boundaries to transform our insight into biological processes and the practice of personalized medicine. By contrast, “phenotypic drug discovery,” which capitalizes on physiological cellular conditions and first-in-class drug discovery, may not capture the proper molecular variant. This is because variants of the same protein can express more than one phenotype, and a phenotype can be encoded by several variants.
|
26
|
Verma A, Bang L, Miller JE, Zhang Y, Lee MTM, Zhang Y, Byrska-Bishop M, Carey DJ, Ritchie MD, Pendergrass SA, Kim D. Human-Disease Phenotype Map Derived from PheWAS across 38,682 Individuals. Am J Hum Genet 2019; 104:55-64. [PMID: 30598166 PMCID: PMC6323551 DOI: 10.1016/j.ajhg.2018.11.006] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 07/12/2018] [Accepted: 11/12/2018] [Indexed: 12/17/2022] Open
Abstract
Phenome-wide association studies (PheWASs) have been a useful tool for testing associations between genetic variations and multiple complex traits or diagnoses. Linking PheWAS-based associations between phenotypes and a variant or a genomic region into a network provides a new way to investigate cross-phenotype associations, and it might broaden the understanding of genetic architecture that exists between diagnoses, genes, and pleiotropy. We created a network of associations from one of the largest PheWASs on electronic health record (EHR)-derived phenotypes across 38,682 unrelated samples from Geisinger's biobank; the samples were genotyped through the DiscovEHR project. We computed associations between 632,574 common variants and 541 diagnosis codes. Using these associations, we constructed a "disease-disease" network (DDN) wherein pairs of diseases were connected on the basis of shared associations with a given genetic variant. The DDN provides a landscape of intra-connections within the same disease classes, as well as inter-connections across disease classes. We identified clusters of diseases with known biological connections, such as autoimmune disorders (type 1 diabetes, rheumatoid arthritis, and multiple sclerosis) and cardiovascular disorders. Previously unreported relationships between multiple diseases were also identified on the basis of genetic associations. The network approach applied in this study can be used to uncover interactions between diseases as a result of their shared, potentially pleiotropic SNPs. Additionally, this approach might advance clinical research and even clinical practice by accelerating our understanding of disease mechanisms on the basis of similar underlying genetic associations.
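The "disease-disease" network idea described in this abstract, where two diseases are linked when they share an association with the same variant, can be sketched as follows. This is a minimal illustration, not the published implementation: the function name, the input layout, and the toy variant IDs are assumptions, and significance filtering is taken to have happened upstream.

```python
from collections import defaultdict
from itertools import combinations


def build_disease_network(associations):
    """Build edges between diseases that share at least one associated variant.

    associations: iterable of (variant, disease) pairs already judged significant
    (the significance threshold is outside the scope of this sketch).
    Returns a dict mapping (disease_a, disease_b) -> set of shared variants,
    with each pair stored in alphabetical order.
    """
    diseases_by_variant = defaultdict(set)
    for variant, disease in associations:
        diseases_by_variant[variant].add(disease)

    edges = defaultdict(set)
    for variant, diseases in diseases_by_variant.items():
        # Every pair of diseases tied to the same variant gets an edge.
        for a, b in combinations(sorted(diseases), 2):
            edges[(a, b)].add(variant)
    return dict(edges)


# Hypothetical toy associations; variant IDs are placeholders.
toy_associations = [
    ("rs1", "type 1 diabetes"),
    ("rs1", "rheumatoid arthritis"),
    ("rs2", "type 1 diabetes"),
    ("rs2", "rheumatoid arthritis"),
    ("rs2", "multiple sclerosis"),
    ("rs3", "hypertension"),
]
network = build_disease_network(toy_associations)
```

Storing the shared variants on each edge, rather than a bare count, keeps the evidence for every link inspectable; a disease with no shared variants (here, hypertension) simply has no edges.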
Affiliation(s)
- Anurag Verma
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Lisa Bang
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
| | - Jason E Miller
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yanfei Zhang
- Genomic Medicine Institute, Geisinger, Danville, PA 17821, USA
| | | | - Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA
| | - Marta Byrska-Bishop
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
| | - David J Carey
- Weis Center for Research, Geisinger, Danville, PA 17821, USA
| | - Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
| | - Dokyoon Kim
- The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA; Biomedical and Translational Informatics Institute, Geisinger, Danville, PA 17821, USA
| |
|
27
|
Genomic and Phenomic Research in the 21st Century. Trends Genet 2018; 35:29-41. [PMID: 30342790 DOI: 10.1016/j.tig.2018.09.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/23/2018] [Revised: 09/24/2018] [Accepted: 09/25/2018] [Indexed: 02/06/2023]
Abstract
The field of human genomics has changed dramatically over time. Initial genomic studies were predominantly restricted to rare disorders in small families. Over the past decade, researchers changed course from family-based studies and instead focused on common diseases and traits in populations of unrelated individuals. With further advancements in biobanking, computer science, electronic health record (EHR) data, and more affordable high-throughput genomics, we are experiencing a new paradigm in human genomic research. Rapidly changing technologies and resources now make it possible to study thousands of diseases simultaneously at the genomic level. This review will focus on these advancements as scientists begin to incorporate phenome-wide strategies in human genomic research to understand the etiology of human diseases and develop new drugs to treat them.
|
28
|
Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. Ontology-based validation and identification of regulatory phenotypes. Bioinformatics 2018; 34:i857-i865. [PMID: 30423068 PMCID: PMC6129279 DOI: 10.1093/bioinformatics/bty605] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023] Open
Abstract
Motivation: Function annotations of gene products, and phenotype annotations of genotypes, provide valuable information about molecular mechanisms that can be utilized by computational methods to identify functional and phenotypic relatedness, improve our understanding of disease and pathobiology, and lead to discovery of drug targets. Identifying functions and phenotypes commonly requires experiments which are time-consuming and expensive to carry out; creating the annotations additionally requires a curator to make an assertion based on reported evidence. Support for validating the mutual consistency of function and phenotype annotations, as well as a computational method to predict phenotypes from function annotations, would greatly improve the utility of function annotations. Results: We developed a novel ontology-based method to validate the mutual consistency of function and phenotype annotations. We apply our method to mouse and human annotations, and identify several inconsistencies that can be resolved to improve overall annotation quality. We also apply our method to the rule-based prediction of regulatory phenotypes from functions and demonstrate that we can predict these phenotypes with an Fmax of up to 0.647. Availability and implementation: https://github.com/bio-ontology-research-group/phenogocon.
Affiliation(s)
- Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Centre, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, UK
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, UK
- NIHR Experimental Cancer Medicine Centre, Birmingham, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, UK
- NIHR Biomedical Research Centre, Birmingham, UK
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Centre, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| |
|
29
|
Multiphenotype association study of patients randomized to initiate antiretroviral regimens in AIDS Clinical Trials Group protocol A5202. Pharmacogenet Genomics 2017; 27:101-111. [PMID: 28099408 PMCID: PMC5285297 DOI: 10.1097/fpc.0000000000000263] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/29/2022]
Abstract
Background: High-throughput approaches are increasingly being used to identify genetic associations across multiple phenotypes simultaneously. Here, we describe a pilot analysis that considered multiple on-treatment laboratory phenotypes from antiretroviral therapy-naive patients who were randomized to initiate antiretroviral regimens in a prospective clinical trial, AIDS Clinical Trials Group protocol A5202. Participants and methods: From among 5 9545 294 polymorphisms imputed genome-wide, we analyzed 2544, including 2124 annotated in the PharmGKB, and 420 previously associated with traits in the GWAS Catalog. We derived 774 phenotypes on the basis of context from six variables: plasma atazanavir (ATV) pharmacokinetics, plasma efavirenz (EFV) pharmacokinetics, change in the CD4+ T-cell count, HIV-1 RNA suppression, fasting low-density lipoprotein-cholesterol, and fasting triglycerides. Permutation testing assessed the likelihood of associations being by chance alone. Pleiotropy was assessed for polymorphisms with the lowest P-values. Results: This analysis included 1181 patients. At P < 1.5×10⁻⁴, most associations were not by chance alone. Polymorphisms with the lowest P-values for EFV pharmacokinetics (CYP2B6 rs3745274), low-density lipoprotein-cholesterol (APOE rs7412), and triglyceride (APOA5 rs651821) phenotypes had been associated with those traits in previous studies. The association between triglycerides and rs651821 was present with ATV-containing regimens, but not with EFV-containing regimens. Polymorphisms with the lowest P-values for ATV pharmacokinetics, CD4 T-cell count, and HIV-1 RNA phenotypes had not previously been reported to be associated with those traits. Conclusion: Using data from a prospective HIV clinical trial, we identified expected genetic associations, potentially novel associations, and at least one context-dependent association. This study supports high-throughput strategies that simultaneously explore multiple phenotypes from clinical trials’ datasets for genetic associations.
|
30
|
Oetjens MT, Bush WS, Denny JC, Birdwell K, Kodaman N, Verma A, Dilks HH, Pendergrass SA, Ritchie MD, Crawford DC. Evidence for extensive pleiotropy among pharmacogenes. Pharmacogenomics 2016; 17:853-66. [PMID: 27249515 DOI: 10.2217/pgs-2015-0007] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022] Open
Abstract
Aim: We sought to identify potential pleiotropy involving pharmacogenes. Methods: We tested 184 functional variants in 34 pharmacogenes for associations using a custom grouping of International Classification of Diseases, Ninth Revision billing codes extracted from deidentified electronic health records of 6892 patients. Results: We replicated several associations including ABCG2 (rs2231142) and gout (p = 1.73 × 10⁻⁷; odds ratio [OR]: 1.73; 95% CI: 1.40-2.12); and SLCO1B1 (rs4149056) and jaundice (p = 2.50 × 10⁻⁴; OR: 1.67; 95% CI: 1.27-2.20). Conclusion: In this systematic screen for phenotypic associations with functional variants, several novel genotype-phenotype combinations also achieved phenome-wide significance, including SLC15A2 rs1143672 and renal osteodystrophy (p = 2.67 × 10⁻⁶; OR: 0.61; 95% CI: 0.49-0.75).
Affiliation(s)
- Matthew T Oetjens
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA
| | - William S Bush
- Department of Epidemiology & Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH 44106, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA
| | - Kelly Birdwell
- Department of Medicine, Vanderbilt University, Nashville, TN 37232, USA
| | - Nuri Kodaman
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232, USA
| | - Anurag Verma
- Center for Systems Genomics, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Holli H Dilks
- Sarah Cannon Research Institute, Nashville, TN 37203, USA
| | - Sarah A Pendergrass
- Center for Systems Genomics, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Marylyn D Ritchie
- Center for Systems Genomics, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Dana C Crawford
- Department of Epidemiology & Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH 44106, USA
| |
|
31
|
Zhang YP, Zhang YY, Duan DD. From Genome-Wide Association Study to Phenome-Wide Association Study: New Paradigms in Obesity Research. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2016; 140:185-231. [PMID: 27288830 DOI: 10.1016/bs.pmbts.2016.02.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Obesity is a condition in which excess body fat has accumulated to an extent that increases the risk of many chronic diseases. The current clinical classification of obesity is based on measurement of body mass index (BMI), waist-hip ratio, and body fat percentage. However, these measurements do not account for the wide individual variations in fat distribution, degree of fatness or health risks, or for genetic variants identified in genome-wide association studies (GWAS). In this review, we address this important issue by introducing the phenome, phenomics, and the phenome-wide association study (PheWAS). We discuss the paradigm shift from GWAS to PheWAS in obesity research. In the era of precision medicine, phenomics and PheWAS provide the approaches required to better define and classify obesity according to the association of the obese phenome with an individual's unique molecular makeup, lifestyle, and environmental impact.
Affiliation(s)
- Y-P Zhang
- Pediatric Heart Center, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
| | - Y-Y Zhang
- Department of Cardiology, Changzhou Second People's Hospital, Changzhou, Jiangsu, China
| | - D D Duan
- Laboratory of Cardiovascular Phenomics, Center for Cardiovascular Research, Department of Pharmacology, and Center for Molecular Medicine, University of Nevada School of Medicine, Reno, NV, United States
| |
|
32
|
Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet 2016; 17:129-45. [PMID: 26875678 DOI: 10.1038/nrg.2015.36] [Citation(s) in RCA: 182] [Impact Index Per Article: 20.2] [Indexed: 12/24/2022]
Abstract
Advances in genotyping technology have, over the past decade, enabled the focused search for common genetic variation associated with human diseases and traits. With the recently increased availability of detailed phenotypic data from electronic health records and epidemiological studies, the impact of one or more genetic variants on the phenome is starting to be characterized both in clinical and population-based settings using phenome-wide association studies (PheWAS). These studies reveal a number of challenges that will need to be overcome to unlock the full potential of PheWAS for the characterization of the complex human genome-phenome relationship.
|