1
Guo X. JS-MA: A Jensen-Shannon Divergence Based Method for Mapping Genome-Wide Associations on Multiple Diseases. Front Genet 2020; 11:507038. [PMID: 33193597 PMCID: PMC7662082 DOI: 10.3389/fgene.2020.507038] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/24/2019] [Accepted: 09/21/2020] [Indexed: 12/14/2022] Open
Abstract
Taking advantage of high-throughput Single Nucleotide Polymorphism (SNP) genotyping technology, Genome-Wide Association Studies (GWASs) have been successfully used to define the relative roles of genes and the environment in disease risk, helping to enable preventative and precision medicine. However, current multi-locus methods fall short in both computational cost and discrimination power when detecting statistically significant interactions with different genetic effects on multiple diseases. Statistical tests for multi-locus interactions (≥2 SNPs) raise huge analytical challenges because the computational cost increases exponentially with the number of SNPs in an interaction module. In this paper, we develop a simple, fast, and powerful method, named JS-MA, based on Jensen-Shannon divergence and agglomerative hierarchical clustering, to detect genome-wide multi-locus interactions associated with multiple diseases. In systematic simulations, JS-MA is more powerful and efficient than state-of-the-art association mapping tools. JS-MA was applied to real GWAS datasets for two common diseases, Rheumatoid Arthritis and Type 1 Diabetes. The results showed that JS-MA not only confirmed recently reported, biologically meaningful associations but also identified novel multi-locus interactions. Therefore, we believe that JS-MA is suitable and efficient for full-scale analysis of multi-disease-related interactions in large GWASs.
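For readers unfamiliar with the measure named in this abstract, the following is a minimal sketch of the Jensen-Shannon divergence between two discrete distributions. It is not the JS-MA implementation: the abstract does not say how the divergence is applied to genotype data, so the function names and the genotype frequencies below are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the value lies between 0 (identical
    distributions) and 1 (distributions with disjoint support).
    """
    # The mixture m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical genotype frequencies (AA, Aa, aa) of one SNP in two groups;
# these numbers are made up for illustration only.
cases = [0.50, 0.40, 0.10]
controls = [0.30, 0.50, 0.20]
print(js_divergence(cases, controls))
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric and always finite, which is one common reason for preferring it when comparing empirical frequency tables.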
Affiliation(s)
- Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Denton, TX, United States
2
Ni P, Wang J, Zhong P, Li Y, Wu FX, Pan Y. Constructing Disease Similarity Networks Based on Disease Module Theory. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:906-915. [PMID: 29993782 DOI: 10.1109/tcbb.2018.2817624] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/08/2023]
Abstract
Quantifying the associations between diseases now plays an important role in modern biology and medicine. Discovering associations between diseases could help us gain deeper insights into the pathogenic mechanisms of complex diseases and thus could lead to improvements in disease diagnosis, drug repositioning, and drug development. Owing to the growing body of high-throughput biological data, a number of methods for computing similarity between diseases have been developed during the past decade. However, these methods rarely consider the interconnections of the genes related to each disease in a protein-protein interaction network (PPIN). Recently, the disease module theory has been proposed, which states that disease-related genes or proteins tend to interact with each other in the same neighborhood of a PPIN. In this study, we propose a new method called ModuleSim to measure associations between diseases by using disease-gene association data and PPIN data based on disease module theory. The experimental results show that, by considering the interactions between disease modules and their modularity, the disease similarity calculated by ModuleSim has a significant correlation with the disease classification of Disease Ontology (DO). Furthermore, ModuleSim outperforms four other popular methods, all of which use disease-gene association data and PPIN data to measure disease-disease associations. In addition, the disease similarity network constructed by ModuleSim suggests that ModuleSim is capable of finding potential associations between diseases.
3
Liu L, Yu B, Han M, Yuan S, Wang N. Mild cognitive impairment understanding: an empirical study by data-driven approach. BMC Bioinformatics 2019; 20:481. [PMID: 31874606 PMCID: PMC6929464 DOI: 10.1186/s12859-019-3057-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/21/2019] [Accepted: 08/26/2019] [Indexed: 11/14/2022] Open
Abstract
Background Cognitive decline has emerged as a significant threat to both public health and personal welfare, and mild cognitive decline/impairment (MCI) can further develop into Dementia/Alzheimer’s disease. While treatment of Dementia/Alzheimer’s disease can be expensive and sometimes ineffective, preventing MCI by identifying modifiable risk factors is a complementary and effective strategy. Results In this study, based on data collected by the Centers for Disease Control and Prevention (CDC) through a nationwide telephone survey, we apply a data-driven approach to re-examine previously identified risk factors and discover new ones. We found that depression, physical health, cigarette usage, education level, and sleep time play an important role in cognitive decline, which is consistent with previous findings. In addition, we point out for the first time that other factors, such as arthritis, pulmonary disease, stroke, asthma, and marital status, also contribute to MCI risk; these have been less explored previously. We also use machine learning and deep learning algorithms to weigh the importance of the various factors contributing to MCI and to predict cognitive decline. Conclusion Using the data-driven approach, we can determine which risk factors are significantly correlated with diseases. These correlations could also be extended to medical diagnoses other than MCI.
Affiliation(s)
- Liyuan Liu
- Data-driven Intelligence Research Laboratory, Kennesaw State University, 1100 South Marietta Pkwy, Marietta, GA, USA
- Bingchen Yu
- Data-driven Intelligence Research Laboratory, Kennesaw State University, 1100 South Marietta Pkwy, Marietta, GA, USA
- Georgia State University, 33 Gilmer Street SE, Atlanta, 30302, GA, USA
- Meng Han
- Data-driven Intelligence Research Laboratory, Kennesaw State University, 1100 South Marietta Pkwy, Marietta, GA, USA
- Shanshan Yuan
- Hubei University, 11 Xueyuan Road, Wuhan, 430062, Hubei, China
- Na Wang
- Key Laboratory of Systems Biomedicine (Ministry of Education) and Collaborative Innovation Center of Systems Biomedicine, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, 200240, China
4
Abstract
BACKGROUND A collection of disease-associated data contributes to studying the associations between diseases. Discovering closely related diseases plays a crucial role in revealing their common pathogenic mechanisms. This might further suggest treatments that can be transferred from one disease to another. During the past decades, a number of approaches for calculating disease similarity have been developed. However, most of them are designed to take advantage of only one or a few data sources, which results in low accuracy. METHODS In this paper, we propose a novel method, called MultiSourcDSim, to calculate disease similarity by integrating multiple data sources, namely, gene-disease associations, GO biological process-disease associations, and symptom-disease associations. Firstly, we establish three disease similarity networks, one for each of the three disease-related data sources. Secondly, the representation of each node is obtained by integrating the three small disease similarity networks. Finally, the learned representations are applied to calculate the similarity between diseases. RESULTS Our approach shows the best performance compared with three other popular methods. In addition, the similarity network built by MultiSourcDSim suggests that our method can also uncover latent relationships between diseases. CONCLUSIONS MultiSourcDSim is an efficient approach to predict similarity between diseases.
Affiliation(s)
- Lei Deng
- School of Computer Science and Engineering, Central South University, Changsha, 410075 China
- Danyi Ye
- School of Computer Science and Engineering, Central South University, Changsha, 410075 China
- Junmin Zhao
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000 China
- Jingpu Zhang
- School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan, 467000 China
5
Xu EL, Qian X, Yu Q, Zhang H, Cui S. Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application. BMC Genomics 2018; 19:170. [PMID: 29589561 PMCID: PMC5872388 DOI: 10.1186/s12864-018-4552-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND Genotype-phenotype association has been one of the long-standing problems in bioinformatics. Identifying both the marginal and epistatic effects among genetic markers, such as Single Nucleotide Polymorphisms (SNPs), has been extensively integrated into Genome-Wide Association Studies (GWAS) to help derive "causal" genetic risk factors and their interactions, which play critical roles in life and disease systems. Identifying "synergistic" interactions with respect to the outcome of interest can help achieve accurate phenotypic prediction and an understanding of the underlying mechanism of system behavior. Many statistical measures for estimating synergistic interactions have been proposed in the literature for this purpose. However, beyond empirical performance, there is still no theoretical analysis of the power and limitations of these synergistic interaction measures. RESULTS In this paper, it is shown that the existing information-theoretic multivariate synergy depends on a small subset of the interaction parameters in the model, sometimes on only one interaction parameter. In addition, an adjusted version of multivariate synergy is proposed as a new measure to estimate the interactive effects, with experiments conducted on both simulated data sets and a real-world GWAS data set to show its effectiveness. CONCLUSIONS We provide rigorous theoretical analysis and empirical evidence on why the information-theoretic multivariate synergy helps with identifying genetic risk factors via synergistic interactions. We further establish a rigorous sample complexity analysis for detecting interactive effects, confirmed by both simulated and real-world data sets.
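As a rough illustration of what an information-theoretic synergy measures, the sketch below uses one commonly used two-variable form, I(X1, X2; Y) − I(X1; Y) − I(X2; Y), estimated from sample counts. This is an assumption made for illustration: the abstract does not give the exact definition analysed in the paper, nor its adjusted version, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Empirical Shannon entropy (bits) of a sequence of hashable values."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def bivariate_synergy(x1, x2, y):
    """I(X1, X2; Y) - I(X1; Y) - I(X2; Y).

    Positive when the pair tells more about y than the two
    variables do separately; negative when they are redundant.
    """
    joint = list(zip(x1, x2))
    return (mutual_information(joint, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy example: y is the XOR of two binary markers, so neither marker is
# informative alone but the pair determines y completely.
x1, x2 = zip(*product([0, 1], repeat=2))
y = [a ^ b for a, b in zip(x1, x2)]
print(bivariate_synergy(x1, x2, y))
```

The XOR case is the textbook example of a purely synergistic interaction, which is why marginal single-marker tests miss it.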
Affiliation(s)
- Easton Li Xu
- Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, 48109 MI USA
- School of Science and Engineering, Chinese University of Hong Kong, Shenzhen, Guangdong, 518172 China
- Xiaoning Qian
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, 77843 TX USA
- Qilian Yu
- Department of Electrical and Computer Engineering, University of California, Davis, 95616 CA USA
- Han Zhang
- Department of Electrical and Computer Engineering, University of California, Davis, 95616 CA USA
- Shuguang Cui
- Department of Electrical and Computer Engineering, University of California, Davis, 95616 CA USA
6
Wen J, Quitadamo A, Hall B, Shi X. Epistasis analysis of microRNAs on pathological stages in colon cancer based on an Empirical Bayesian Elastic Net method. BMC Genomics 2017. [PMID: 29513198 PMCID: PMC5657052 DOI: 10.1186/s12864-017-4130-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023] Open
Abstract
Background Colon cancer is a leading cause of cancer death worldwide. It has become clear that microRNAs (miRNAs) play a role in the progression of colon cancer, and understanding the effect of miRNAs on tumorigenesis could lead to better prognosis and improved treatment. However, most studies have focused on differentially expressed miRNAs between tumor and non-tumor samples or between stages in tumor tissue. Limited work has been conducted to study the interactions, or epistasis, between miRNAs and how this epistasis affects tumor progression. In this study, we investigate the main and pair-wise epistatic effects of miRNAs on the pathological stages of colon cancer using datasets from The Cancer Genome Atlas. Results We develop a workflow composed of multiple steps for feature selection based on the Empirical Bayesian Elastic Net (EBEN) method. First, we identify the main effects using a model with only main effects on the phenotype. Second, a corrected phenotype is calculated by removing the significant main effects from the original phenotype. Third, we select features with epistatic effects on the corrected phenotype. Finally, we run the full model with main and epistatic effects on the previously selected main and epistatic features. Using the multi-step workflow, we identify a set of miRNAs with main and epistatic effects on the pathological stages of colon cancer. Many of the miRNAs with main effects on colon cancer have been previously reported to be associated with colon cancer, and the majority of the epistatic miRNAs share common target genes that could explain their epistatic effects on the pathological stages of colon cancer. We also find that many of the target genes of the detected miRNAs are associated with colon cancer. Gene Ontology enrichment analysis of the experimentally validated targets of main and epistatic miRNAs shows that these target genes are enriched for biological processes associated with cancer progression.
Conclusion Our results provide a set of candidate miRNAs associated with colon cancer progression that could have potential translational and therapeutic utility. Our analysis workflow offers a new opportunity to efficiently explore epistatic interactions among genetic and epigenetic factors that could be associated with human diseases. Furthermore, our workflow is flexible and can be applied to analyze the main and epistatic effects of various genetic and epigenetic factors on a wide range of phenotypes.
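The first three steps of the workflow described in this abstract (main effects, corrected phenotype, epistatic selection) can be sketched on toy data as follows. This is only an illustration of the control flow: a plain Pearson-correlation threshold and single-feature least-squares residuals are swapped in for the EBEN model, the final full-model refit is omitted, and all names, the threshold, and the data are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def residual(y, x):
    """Residual of y after a least-squares fit on one non-constant feature x."""
    mx, my = mean(x), mean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    slope = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    return [w - my - slope * (v - mx) for v, w in zip(x, y)]


def select_effects(features, y, threshold=0.5):
    # Step 1: features with a marginal (main) effect on the phenotype.
    main = [n for n, x in features.items() if abs(pearson(x, y)) > threshold]
    # Step 2: corrected phenotype, removing each selected main effect in turn
    # (a simplification of a joint fit, exact only for uncorrelated features).
    corrected = list(y)
    for n in main:
        corrected = residual(corrected, features[n])
    # Step 3: pairwise products that explain the corrected phenotype.
    pairs = []
    for n1, n2 in combinations(features, 2):
        inter = [u * v for u, v in zip(features[n1], features[n2])]
        if abs(pearson(inter, corrected)) > threshold:
            pairs.append((n1, n2))
    return main, pairs


# Toy data: a balanced design in which "a" has a main effect and
# "b" and "c" act only through their interaction.
rows = list(product([-1, 1], repeat=3))
features = {name: [r[i] for r in rows] for i, name in enumerate("abc")}
y = [2 * a + b * c for a, b, c in rows]
main, pairs = select_effects(features, y)
print(main, pairs)
```

Removing the main effect before the pairwise scan matters in this toy case because the strong effect of "a" would otherwise dilute the correlation between the b×c product and the phenotype.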
Affiliation(s)
- Jia Wen
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Andrew Quitadamo
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Benika Hall
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Xinghua Shi
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
7
Ni P, Li M, Zhong P, Duan G, Wang J, Li Y, Wu F. Relating Diseases Based on Disease Module Theory. Lecture Notes in Computer Science 2017:24-33. [DOI: 10.1007/978-3-319-59575-7_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]