1
|
Shang J, Xu A, Bi M, Zhang Y, Li F, Liu JX. A review: simulation tools for genome-wide interaction studies. Brief Funct Genomics 2024; 23:745-753. [PMID: 39173096 DOI: 10.1093/bfgp/elae034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Revised: 07/25/2024] [Accepted: 08/10/2024] [Indexed: 08/24/2024] Open
Abstract
Genome-wide association study (GWAS) is essential for investigating the genetic basis of complex diseases; nevertheless, it usually ignores the interaction of multiple single nucleotide polymorphisms (SNPs). Genome-wide interaction studies provide crucial means for exploring complex genetic interactions that GWAS may miss. Although many interaction methods have been proposed, challenges still persist, including the lack of epistasis models and the inconsistency of benchmark datasets. SNP data simulation is a pivotal intermediary between interaction methods and real applications. Therefore, it is important to obtain epistasis models and benchmark datasets by simulation tools, which is helpful for further improving interaction methods. At present, many simulation tools have been widely employed in the field of population genetics. According to their basic principles, these existing tools can be divided into four categories: coalescent simulation, forward-time simulation, resampling simulation, and other simulation frameworks. In this paper, their basic principles and representative simulation tools are compared and analyzed in detail. Additionally, this paper provides a discussion and summary of the advantages and disadvantages of these frameworks and tools, offering technical insights for the design of new methods, and serving as valuable reference tools for researchers to comprehensively understand GWAS and genome-wide interaction studies.
Collapse
Affiliation(s)
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Anqi Xu
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Mingyuan Bi
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China
| | - Feng Li
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Jin-Xing Liu
- School of Health and Life Sciences, University of Health and Rehabilitation Sciences, Qingdao 266114, China
| |
Collapse
|
2
|
MDSN: A Module Detection Method for Identifying High-Order Epistatic Interactions. Genes (Basel) 2022; 13:genes13122403. [PMID: 36553670 PMCID: PMC9778340 DOI: 10.3390/genes13122403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
Epistatic interactions are referred to as SNPs (single nucleotide polymorphisms) that affect disease development and trait expression nonlinearly, and hence identifying epistatic interactions plays a great role in explaining the pathogenesis and genetic heterogeneity of complex diseases. Many methods have been proposed for epistasis detection; nevertheless, they mainly focus on low-order epistatic interactions, two-order or three-order for instance, and often ignore high-order interactions due to computational burden. In this paper, a module detection method called MDSN is proposed for identifying high-order epistatic interactions. First, an SNP network is constructed by a construction strategy of interaction complementary, which consists of low-order SNP interactions that can be obtained from fast computations. Then, a node evaluation measure that integrates multi-topological features is proposed to improve the node expansion algorithm, where the importance of a node is comprehensively evaluated by the topological characteristics of the neighborhood. Finally, modules are detected in the constructed SNP network, which have high-order epistatic interactions associated with the disease. The MDSN was compared with four state-of-the-art methods on simulation datasets and a real Age-related Macular Degeneration dataset. The results demonstrate that MDSN has higher performance on detecting high-order interactions.
Collapse
|
3
|
Shang J, Cai X, Zhang T, Sun Y, Zhang Y, Liu J, Guan B. EpiReSIM: A Resampling Method of Epistatic Model without Marginal Effects Using Under-Determined System of Equations. Genes (Basel) 2022; 13:genes13122286. [PMID: 36553553 PMCID: PMC9777644 DOI: 10.3390/genes13122286] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2022] [Revised: 11/30/2022] [Accepted: 12/01/2022] [Indexed: 12/12/2022] Open
Abstract
Simulation experiments are essential to evaluate epistasis detection methods, which is the main way to prove their effectiveness and move toward practical applications. However, due to the lack of effective simulators, especially for simulating models without marginal effects (eNME models), epistasis detection methods can hardly verify their effectiveness through simulation experiments. In this study, we propose a resampling simulation method (EpiReSIM) for generating the eNME model. First, EpiReSIM provides two strategies for solving eNME models. One is to calculate eNME models using prevalence constraints, and another is by joint constraints of prevalence and heritability. We transform the computation of the model into the problem of solving the under-determined system of equations. Introducing the complete orthogonal decomposition method and Newton's method, EpiReSIM calculates the solution of the underdetermined system of equations to obtain the eNME model, especially the solution of the high-order model, which is the highlight of EpiReSIM. Second, based on the computed eNME model, EpiReSIM generates simulation data by a resampling method. Experimental results show that EpiReSIM has advantages in preserving the biological properties of minor allele frequencies and calculating high-order models, and it is a convenient and effective alternative method for current simulation software.
Collapse
Affiliation(s)
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Xinrui Cai
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Tongdui Zhang
- Science and Technology Innovation Service Institution of Rizhao, Rizhao 276827, China
| | - Yan Sun
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Yuanyuan Zhang
- School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
| | - Jinxing Liu
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Boxin Guan
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Correspondence:
| |
Collapse
|
4
|
Tuo S, Li C, Liu F, Zhu Y, Chen T, Feng Z, Liu H, Li A. A Novel Multitasking Ant Colony Optimization Method for Detecting Multiorder SNP Interactions. Interdiscip Sci 2022; 14:814-832. [PMID: 35788965 DOI: 10.1007/s12539-022-00530-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Revised: 05/29/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
MOTIVATION Linear or nonlinear interactions of multiple single-nucleotide polymorphisms (SNPs) play an important role in understanding the genetic basis of complex human diseases. However, combinatorial analytics in high-dimensional space makes it extremely challenging to detect multiorder SNP interactions. Most classic approaches can only perform one task (for detecting k-order SNP interactions) in each run. Since prior knowledge of a complex disease is usually not available, it is difficult to determine the value of k for detecting k-order SNP interactions. METHODS A novel multitasking ant colony optimization algorithm (named MTACO-DMSI) is proposed to detect multiorder SNP interactions, and it is divided into two stages: searching and testing. In the searching stage, multiple multiorder SNP interaction detection tasks (from 2nd-order to kth-order) are executed in parallel, and two subpopulations that separately adopt the Bayesian network-based K2-score and Jensen-Shannon divergence (JS-score) as evaluation criteria are generated for each task to improve the global search capability and the discrimination ability for various disease models. In the testing stage, the G test statistical test is adopted to further verify the authenticity of candidate solutions to reduce the error rate. RESULT Three multiorder simulated disease models with different interaction effects and three real age-related macular degeneration (AMD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) datasets were used to investigate the performance of the proposed MTACO-DMSI. The experimental results show that the MTACO-DMSI has a faster search speed and higher discriminatory power for diverse simulation disease models than traditional single-task algorithms. The results on real AMD data and RA and T1D datasets indicate that MTACO-DMSI has the ability to detect multiorder SNP interactions at a genome-wide scale. Availability and implementation: https://github.com/shouhengtuo/MTACO-DMSI/.
Collapse
Affiliation(s)
- Shouheng Tuo
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China.
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China.
| | - Chao Li
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Fan Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - YanLing Zhu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - TianRui Chen
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - ZengYu Feng
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Haiyan Liu
- School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China
- Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an, 710121, Shaanxi, China
- Xi'an Key Laboratory of Big Data and Intelligent Computing, Xi'an, 710121, Shaanxi, China
| | - Aimin Li
- School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, 710048, Shaanxi, China
| |
Collapse
|
5
|
Tuo S, Li C, Liu F, Li A, He L, Geem ZW, Shang J, Liu H, Zhu Y, Feng Z, Chen T. MTHSA-DHEI: multitasking harmony search algorithm for detecting high-order SNP epistatic interactions. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00813-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
AbstractGenome-wide association studies have succeeded in identifying genetic variants associated with complex diseases, but the findings have not been well interpreted biologically. Although it is widely accepted that epistatic interactions of high-order single nucleotide polymorphisms (SNPs) [(1) Single nucleotide polymorphisms (SNP) are mainly deoxyribonucleic acid (DNA) sequence polymorphisms caused by variants at a single nucleotide at the genome level. They are the most common type of heritable variation in humans.] are important causes of complex diseases, the combinatorial explosion of millions of SNPs and multiple tests impose a large computational burden. Moreover, it is extremely challenging to correctly distinguish high-order SNP epistatic interactions from other high-order SNP combinations due to small sample sizes. In this study, a multitasking harmony search algorithm (MTHSA-DHEI) is proposed for detecting high-order epistatic interactions [(2) In classical genetics, if genes X1 and X2 are mutated and each mutation by itself produces a unique disease status (phenotype) but the mutations together cause the same disease status as the gene X1 mutation, gene X1 is epistatic and gene X2 is hypostatic, and gene X1 has an epistatic effect (main effect) on disease status. In this work, a high-order epistatic interaction occurs when two or more SNP loci have a joint influence on disease status.], with the goal of simultaneously detecting multiple types of high-order (k1-order, k2-order, …, kn-order) SNP epistatic interactions. Unified coding is adopted for multiple tasks, and four complementary association evaluation functions are employed to improve the capability of discriminating the high-order SNP epistatic interactions. We compare the proposed MTHSA-DHEI method with four excellent methods for detecting high-order SNP interactions for 8 high-orderepistatic interaction models with no marginal effect (EINMEs) and 12 epistatic interaction models with marginal effects (EIMEs) (*) and implement the MTHSA-DHEI algorithm with a real dataset: age-related macular degeneration (AMD). The experimental results indicate that MTHSA-DHEI has power and an F1-score exceeding 90% for all EIMEs and five EINMEs and reduces the computational time by more than 90%. It can efficiently perform multiple high-order detection tasks for high-order epistatic interactions and improve the discrimination ability for diverse epistasis models.
Collapse
|
6
|
González-Seoane B, Ponte-Fernández C, González-Domínguez J, Martín MJ. PyToxo: a Python tool for calculating penetrance tables of high-order epistasis models. BMC Bioinformatics 2022; 23:117. [PMID: 35366804 PMCID: PMC8977015 DOI: 10.1186/s12859-022-04645-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Accepted: 03/22/2022] [Indexed: 11/16/2022] Open
Abstract
Background Epistasis is the interaction between different genes when expressing a certain phenotype. If epistasis involves more than two loci it is called high-order epistasis. High-order epistasis is an area under active research because it could be the cause of many complex traits. The most common way to specify an epistasis interaction is through a penetrance table. Results This paper presents PyToxo, a Python tool for generating penetrance tables from any-order epistasis models. Unlike other tools available in the bibliography, PyToxo is able to work with high-order models and realistic penetrance and heritability values, achieving high-precision results in a short time. In addition, PyToxo is distributed as open-source software and includes several interfaces to ease its use. Conclusions PyToxo provides the scientific community with a useful tool to evaluate algorithms and methods that can detect high-order epistasis to continue advancing in the discovery of the causes behind complex diseases. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04645-7.
Collapse
|
7
|
Ponte-Fernandez C, Gonzalez-Dominguez J, Carvajal-Rodriguez A, Martin MJ. Evaluation of Existing Methods for High-Order Epistasis Detection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:912-926. [PMID: 33055017 DOI: 10.1109/tcbb.2020.3030312] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Finding epistatic interactions among loci when expressing a phenotype is a widely employed strategy to understand the genetic architecture of complex traits in GWAS. The abundance of methods dedicated to the same purpose, however, makes it increasingly difficult for scientists to decide which method is more suitable for their studies. This work compares the different epistasis detection methods published during the last decade in terms of runtime, detection power and type I error rate, with a special emphasis on high-order interactions. Results show that in terms of detection power, the only methods that perform well across all experiments are the exhaustive methods, although their computational cost may be prohibitive in large-scale studies. Regarding non-exhaustive methods, not one could consistently find epistasis interactions when marginal effects are absent. If marginal effects are present, there are methods that perform well for high-order interactions, such as BADTrees, FDHE-IW, SingleMI or SNPHarvester. As for false-positive control, only SNPHarvester, FDHE-IW and DCHE show good results. The study concludes that there is no single epistasis detection method to recommend in all scenarios. Authors should prioritize exhaustive methods when sufficient computational resources are available considering the data set size, and resort to non-exhaustive methods when the analysis time is prohibitive.
Collapse
|