1
Ye L, Zhang L, Tang B, Liang J, Tan R, Jiang H, Peng W, Lin N, Li K, Xue C, Li M. Ge-SAND: an explainable deep learning-driven framework for disease risk prediction by uncovering complex genetic interactions in parallel. BMC Genomics 2025; 26:432. [PMID: 40312319 PMCID: PMC12044951 DOI: 10.1186/s12864-025-11588-9] [Received: 12/24/2024] [Accepted: 04/09/2025] [Indexed: 05/03/2025]
Abstract
BACKGROUND Accurate genetic risk prediction and understanding the mechanisms underlying complex diseases are essential for effective intervention and precision medicine. However, current methods often struggle to capture the intricate and subtle genetic interactions contributing to disease risk. This challenge may be further exacerbated by the curse of dimensionality when considering large-scale pairwise genetic combinations with limited samples. Overcoming these limitations could transform biomedicine by providing deeper insights into disease mechanisms, moving beyond black-box models and single-locus analyses, and enabling a more comprehensive understanding of cross-disease patterns. RESULTS We developed Ge-SAND (Genomic Embedding Self-Attention Neurodynamic Decoder), an explainable deep learning-driven framework designed to uncover complex genetic interactions at scales exceeding 10^6 in parallel for accurate disease risk prediction. Ge-SAND leverages genotype and genomic positional information to identify both intra- and interchromosomal interactions associated with disease phenotypes, providing comprehensive insights into pathogenic mechanisms crucial for disease risk prediction. Applied to simulated datasets and UK Biobank cohorts for Crohn's disease, schizophrenia, and Alzheimer's disease, Ge-SAND achieved up to a 20% improvement in AUC-ROC compared to mainstream methods. Beyond its predictive accuracy, through self-attention-based interaction networks, Ge-SAND provided insights into large-scale genotype relationships and revealed genetic mechanisms underlying these complex diseases. For instance, Ge-SAND identified potential genetic interaction pairs, including novel relationships such as ISOC1 and HOMER2, potentially implicating the brain-gut axis in Crohn's and Alzheimer's diseases. CONCLUSION Ge-SAND is a novel deep-learning approach designed to address the challenges of capturing large-scale genetic interactions. By integrating disease risk prediction with interpretable insights into genetic mechanisms, Ge-SAND offers a valuable tool for advancing genomic research and precision medicine.
Affiliation(s)
- Lihang Ye
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Liubin Zhang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Bin Tang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Junhao Liang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Ruijie Tan
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Hui Jiang
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Department of Medical Genetics and Prenatal Diagnosis, The Third Affiliated Hospital of Zhengzhou University, Zhengzhou, 450052, China
- Wenjie Peng
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Nan Lin
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Kun Li
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Chao Xue
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
- Miaoxin Li
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, 510080, China
  - Key Laboratory of Tropical Disease Control (Sun Yat-Sen University), Ministry of Education, Guangzhou, 510080, China
2
Wang Y, Yang Y, Jia X, Zhao C, Yang C, Fan J, Wang N, Shi X. Identification of the shared genetic architecture underlying seven autoimmune diseases with GWAS summary statistics. Front Immunol 2024; 14:1303675. [PMID: 38259487 PMCID: PMC10800382 DOI: 10.3389/fimmu.2023.1303675] [Received: 09/28/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024]
Abstract
Background Common clinical symptoms and immunopathological mechanisms have been observed across multiple autoimmune diseases (ADs), but their shared genetic etiology remains unclear. Methods GWAS summary statistics for seven ADs were downloaded from Open Targets Genetics and Dryad. Linkage disequilibrium score regression (LDSC) was applied to estimate overall genetic correlations, the bivariate causal mixture model (MiXeR) was used to quantify polygenic overlap, and stratified LDSC was used to partition heritability and reveal tissue- and cell-type-specific enrichments. Finally, we applied a novel adaptive association test, MTaSPUsSet, to identify pleiotropic genes. Results The heritability of the seven ADs ranged from 0.1228 to 0.5972, and strong genetic correlations, between 0.185 and 0.721, were observed among certain phenotypes. There was substantial polygenic overlap, with approximately 0.03K to 0.21K shared SNPs. SNP heritability was specifically enriched in immune/hematopoietic-related tissues and cells. Furthermore, we identified 32 pleiotropic genes associated with the seven ADs, of which 23 were considered novel. These genes were involved in several cell regulation pathways and immunologic signatures. Conclusion We comprehensively explored the shared genetic architecture across seven ADs. The findings advance the exploration of the common molecular mechanisms and biological processes involved and facilitate understanding of disease etiology.
Affiliation(s)
- Xuezhong Shi
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, China