1. Hajiaghabozorgi M, Fischbach M, Albrecht M, Wang W, Myers CL. BridGE: a pathway-based analysis tool for detecting genetic interactions from GWAS. Nat Protoc 2024; 19:1400-1435. [PMID: 38514837 PMCID: PMC11311251 DOI: 10.1038/s41596-024-00954-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 11/22/2023] [Indexed: 03/23/2024]
Abstract
Genetic interactions have the potential to modulate phenotypes, including human disease. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions; however, traditional methods for identifying them, which tend to focus on testing individual variant pairs, lack statistical power. In this protocol, we describe a novel computational approach, called Bridging Gene sets with Epistasis (BridGE), for discovering genetic interactions between biological pathways from GWAS data. We present a Python-based implementation of BridGE along with instructions for its application to a typical human GWAS cohort. The major stages include initial data processing and quality control, construction of a variant-level genetic interaction network, measurement of pathway-level genetic interactions, evaluation of statistical significance using sample permutations and generation of results in a standardized output format. The BridGE software pipeline includes options for running the analysis on multiple cores and multiple nodes for users who have access to computing clusters or a cloud computing environment. In a cluster computing environment with 10 nodes and 100 GB of memory per node, the method can be run in less than 24 h for typical human GWAS cohorts. Using BridGE requires knowledge of running Python programs and basic shell script programming experience.
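The last two stages named in the abstract (pathway-level measurement and significance by sample permutation) can be illustrated with a minimal sketch. This is not the BridGE implementation: the scoring function, the binary genotype encoding, and all names (`pair_score`, `pathway_score`, `permutation_p_value`) are assumptions chosen only to show the general shape of a label-permutation test.

```python
import random
from itertools import product


def pair_score(genotypes, labels, i, j):
    """Toy variant-pair score (an assumption, not BridGE's statistic):
    rate of carrying both variants in cases minus the rate in controls."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    case_rate = sum(g[i] and g[j] for g in cases) / len(cases)
    control_rate = sum(g[i] and g[j] for g in controls) / len(controls)
    return case_rate - control_rate


def pathway_score(genotypes, labels, pathway_a, pathway_b):
    """Average the variant-pair scores over all pairs bridging two pathways."""
    pairs = list(product(pathway_a, pathway_b))
    return sum(pair_score(genotypes, labels, i, j) for i, j in pairs) / len(pairs)


def permutation_p_value(genotypes, labels, pathway_a, pathway_b, n_perm=200, seed=0):
    """Empirical p-value: shuffle case/control labels and count how often the
    permuted pathway score is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = pathway_score(genotypes, labels, pathway_a, pathway_b)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pathway_score(genotypes, shuffled, pathway_a, pathway_b) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Here each sample is a list of 0/1 carrier flags indexed by variant, and a pathway is a list of variant indices. Shuffling the labels rather than the genotypes preserves the correlation structure among variants, which is the usual reason for permuting samples in this kind of test.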
Affiliation(s)
- Mehrad Hajiaghabozorgi
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Mathew Fischbach
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Graduate Program in Bioinformatics and Computational Biology (BICB), University of Minnesota, Minneapolis, MN, USA
- Michael Albrecht
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Wen Wang
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Chad L Myers
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
  - Graduate Program in Bioinformatics and Computational Biology (BICB), University of Minnesota, Minneapolis, MN, USA
2. Vasilopoulou C, Morris AP, Giannakopoulos G, Duguez S, Duddy W. What Can Machine Learning Approaches in Genomics Tell Us about the Molecular Basis of Amyotrophic Lateral Sclerosis? J Pers Med 2020; 10:E247. [PMID: 33256133 PMCID: PMC7712791 DOI: 10.3390/jpm10040247] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/30/2020] [Revised: 11/21/2020] [Accepted: 11/23/2020] [Indexed: 02/07/2023] Open
Abstract
Amyotrophic Lateral Sclerosis (ALS) is the most common late-onset motor neuron disorder, but the molecular mechanisms and pathways underlying this disease remain elusive. This review (1) systematically identifies machine learning studies aimed at understanding the genetic architecture of ALS, (2) outlines the main challenges faced and compares the different approaches that have been used to confront them, and (3) compares the experimental designs and results produced by those approaches and describes their reproducibility in terms of biological results and the performance of the machine learning models. The majority of the collected studies incorporated prior knowledge of ALS into their feature selection approaches, and trained their machine learning models using genomic data combined with other types of mined knowledge including functional associations, protein-protein interactions, disease/tissue-specific information, epigenetic data, and known ALS phenotype-genotype associations. The importance of incorporating gene-gene interactions and cis-regulatory elements into the experimental design of future ALS machine learning studies is highlighted. Lastly, it is suggested that future advances in the genomic and machine learning fields will bring about a better understanding of ALS genetic architecture, and enable improved personalized approaches to this and other devastating and complex diseases.
Affiliation(s)
- Christina Vasilopoulou
  - Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK
- Andrew P. Morris
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, University of Manchester, Manchester M13 9PT, UK
- George Giannakopoulos
  - Institute of Informatics and Telecommunications, NCSR Demokritos, 153 10 Aghia Paraskevi, Greece
  - Science For You (SciFY) PNPC, TEPA Lefkippos-NCSR Demokritos, 27, Neapoleos, 153 41 Ag. Paraskevi, Greece
- Stephanie Duguez
  - Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK
- William Duddy
  - Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT47 6SB, UK
3. Discovering genetic interactions bridging pathways in genome-wide association studies. Nat Commun 2019; 10:4274. [PMID: 31537791 PMCID: PMC6753138 DOI: 10.1038/s41467-019-12131-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 03/05/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022] Open
Abstract
Genetic interactions have been reported to underlie phenotypes in a variety of systems, but the extent to which they contribute to complex disease in humans remains unclear. In principle, genome-wide association studies (GWAS) provide a platform for detecting genetic interactions, but existing methods for identifying them from GWAS data tend to focus on testing individual locus pairs, which undermines statistical power. Importantly, a global genetic network mapped for a model eukaryotic organism revealed that genetic interactions often connect genes between compensatory functional modules in a highly coherent manner. Taking advantage of this expected structure, we developed a computational approach called BridGE that identifies pathways connected by genetic interactions from GWAS data. Applying BridGE broadly, we discover significant interactions in Parkinson's disease, schizophrenia, hypertension, prostate cancer, breast cancer, and type 2 diabetes. Our novel approach provides a general framework for mapping complex genetic networks underlying human disease from genome-wide genotype data.
4. Cheng S, Andrew AS, Andrews PC, Moore JH. Complex systems analysis of bladder cancer susceptibility reveals a role for decarboxylase activity in two genome-wide association studies. BioData Min 2016; 9:40. [PMID: 27999618 PMCID: PMC5154053 DOI: 10.1186/s13040-016-0119-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/11/2016] [Accepted: 12/02/2016] [Indexed: 11/24/2022] Open
Abstract
Background: Bladder cancer is a common disease with a complex etiology that is likely due to many different genetic and environmental factors. The goal of this study was to embrace this complexity using a bioinformatics analysis pipeline designed to use machine learning to measure synergistic interactions between single nucleotide polymorphisms (SNPs) in two genome-wide association studies (GWAS) and then to assess their enrichment within functional groups defined by Gene Ontology. The significance of the results was evaluated using permutation testing, and those results that replicated between the two GWAS data sets were reported. Results: In the first step of our bioinformatics pipeline, we estimated the pairwise synergistic effects of SNPs on bladder cancer risk in both GWAS data sets using the Multifactor Dimensionality Reduction (MDR) machine learning method, which is designed specifically for this purpose. Statistical significance was assessed using a 1000-fold permutation test. Each single SNP was assigned a p-value based on its strongest pairwise association. Each SNP was then mapped to one or more genes using a window of 500 kb upstream and downstream from each gene boundary. This window was chosen to capture as many regulatory variants as possible. Using Exploratory Visual Analysis (EVA), we then carried out a gene set enrichment analysis at the gene level to identify those genes with an overabundance of significant SNPs relative to the size of their mapped regions. Each gene was assigned to a biological functional group defined by Gene Ontology (GO). We next used EVA to evaluate the overabundance of significant genes in biological functional groups. Our study yielded one GO category, carboxy-lyase activity (GO:0016831), that was significant in analyses from both GWAS data sets. Interestingly, only the gamma-glutamyl carboxylase (GGCX) gene from this GO group was significant in both the detection and replication data, highlighting the complexity of the pathway-level effects on risk. The GGCX gene is expressed in the bladder, but has not been previously associated with bladder cancer in univariate GWAS. However, there is some experimental evidence that carboxy-lyase activity might play a role in cancer and that genes in this pathway should be explored as drug targets. This study provides a genetic basis for that observation. Conclusions: Our machine learning analysis of genetic associations in two GWAS for bladder cancer identified numerous associations with pairs of SNPs. Gene set enrichment analysis found aggregation of risk-associated SNPs in genes and significant genes in GO functional groups. This study supports a role for decarboxylase protein complexes in bladder cancer susceptibility. Previous research has implicated decarboxylases in bladder cancer etiology; however, the genes that we found to be significant in the detection and replication data are not known to have direct influence on bladder cancer, suggesting some novel hypotheses. This study highlights the need for a complex systems approach to the genetic and genomic analysis of common diseases such as cancer.
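Two steps described in this abstract (assigning each SNP the p-value of its strongest pairwise association, and mapping SNPs to genes within 500 kb of a gene boundary) are simple enough to sketch. The following is an illustrative sketch only, not the authors' pipeline: the data layout, the function names, and the SNP and gene identifiers used with it are hypothetical.

```python
WINDOW_BP = 500_000  # 500 kb upstream and downstream, as stated in the abstract


def snp_p_values(pair_p_values):
    """Assign each SNP the smallest p-value among the pairs it belongs to.

    pair_p_values: dict mapping (snp_a, snp_b) -> pairwise p-value
    (hypothetical input layout).
    """
    best = {}
    for (snp_a, snp_b), p in pair_p_values.items():
        for snp in (snp_a, snp_b):
            best[snp] = min(p, best.get(snp, 1.0))
    return best


def map_snps_to_genes(snp_positions, genes, window=WINDOW_BP):
    """Map each SNP to every gene whose extended boundary contains it.

    snp_positions: dict snp -> (chromosome, position)
    genes: dict gene -> (chromosome, start, end)
    A SNP can map to zero, one, or several genes.
    """
    mapping = {}
    for snp, (chrom, pos) in snp_positions.items():
        mapping[snp] = sorted(
            gene
            for gene, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start - window <= pos <= end + window
        )
    return mapping
```

Because the window is wide, neighbouring genes overlap heavily in the SNPs assigned to them, which is one reason the abstract normalizes enrichment by the size of each gene's mapped region.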
Affiliation(s)
- Samantha Cheng
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA
- Angeline S Andrew
  - Department of Epidemiology, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755 USA
- Peter C Andrews
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA
- Jason H Moore
  - Department of Biostatistics and Epidemiology, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6116 USA
5. Gershoni-Emek N, Chein M, Gluska S, Perlson E. Amyotrophic lateral sclerosis as a spatiotemporal mislocalization disease: location, location, location. Int Rev Cell Mol Biol 2015; 315:23-71. [PMID: 25708461 DOI: 10.1016/bs.ircmb.2014.11.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/07/2023]
Abstract
Spatiotemporal localization of signals is a fundamental feature impacting cell survival and proper function. The cell needs to respond accurately in both space and time to both intra- and intercellular environmental cues. The regulation of this comprehensive process involves the cytoskeleton and the trafficking machinery, as well as local protein synthesis and ligand-receptor mechanisms. Alterations in such mechanisms can lead to cell dysfunction and disease. Motor neurons, which can extend over tens of centimeters, are a classic example of the importance of such events. Changes in spatiotemporal localization mechanisms are thought to play a role in the motor neuron degeneration that occurs in amyotrophic lateral sclerosis (ALS). In this review we discuss these mechanisms and argue that possible misregulated factors can lead to motor neuron degeneration in ALS.
Affiliation(s)
- Noga Gershoni-Emek
  - Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Michael Chein
  - Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Shani Gluska
  - Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Eran Perlson
  - Department of Physiology and Pharmacology, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel; The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
6. Zieselman AL, Fisher JM, Hu T, Andrews PC, Greene CS, Shen L, Saykin AJ, Moore JH. Computational genetics analysis of grey matter density in Alzheimer's disease. BioData Min 2014; 7:17. [PMID: 25165488 PMCID: PMC4145360 DOI: 10.1186/1756-0381-7-17] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 01/28/2014] [Accepted: 08/18/2014] [Indexed: 12/24/2022] Open
Abstract
Background: Alzheimer’s disease is the most common form of progressive dementia and there is currently no known cure. The cause of onset is not fully understood but genetic factors are expected to play a significant role. We present here a bioinformatics approach to the genetic analysis of grey matter density as an endophenotype for late onset Alzheimer’s disease. Our approach combines machine learning analysis of gene-gene interactions with large-scale functional genomics data for assessing biological relationships. Results: We found a statistically significant synergistic interaction between two SNPs located in the intergenic region of an olfactory gene cluster. This model did not replicate in an independent dataset. However, genes in this region have high-confidence biological relationships and are consistent with previous findings implicating sensory processes in Alzheimer’s disease. Conclusions: Previous genetic studies of Alzheimer’s disease have revealed only a small portion of the overall variability due to DNA sequence differences. Some of this missing heritability is likely due to complex gene-gene and gene-environment interactions. We have introduced here a novel bioinformatics analysis pipeline that embraces the complexity of the genetic architecture of Alzheimer’s disease while at the same time harnessing the power of functional genomics. These findings represent novel hypotheses about the genetic basis of this complex disease and provide open-access methods that others can use in their own studies.
Affiliation(s)
- Amanda L Zieselman
  - Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755, USA
- Jonathan M Fisher
  - Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755, USA
- Ting Hu
  - Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755, USA
- Peter C Andrews
  - Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755, USA
- Casey S Greene
  - Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755, USA
- Li Shen
  - Department of Radiology and Imaging Sciences, Center for Neuroimaging and Indiana Alzheimer's Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Andrew J Saykin
  - Department of Radiology and Imaging Sciences, Center for Neuroimaging and Indiana Alzheimer's Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Jason H Moore
  - Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755, USA
7. Shang H, Liu G, Jiang Y, Fu J, Zhang B, Song R, Wang W. Pathway Analysis of Two Amyotrophic Lateral Sclerosis GWAS Highlights Shared Genetic Signals with Alzheimer’s Disease and Parkinson’s Disease. Mol Neurobiol 2014; 51:361-9. [DOI: 10.1007/s12035-014-8673-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 02/06/2014] [Accepted: 03/10/2014] [Indexed: 10/25/2022]
8. Pan Q, Hu T, Malley JD, Andrew AS, Karagas MR, Moore JH. A system-level pathway-phenotype association analysis using synthetic feature random forest. Genet Epidemiol 2014; 38:209-19. [PMID: 24535726 DOI: 10.1002/gepi.21794] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/30/2013] [Revised: 11/21/2013] [Accepted: 01/02/2014] [Indexed: 11/07/2022]
Abstract
As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. Because of its system-level interpretability, pathway analysis has become a popular tool for gaining insights into the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways need to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously, and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations.
Affiliation(s)
- Qinxin Pan
  - Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire, United States of America