1
Martí-Gómez C, Zhou J, Chen WC, Kinney JB, McCandlish DM. Inference and visualization of complex genotype-phenotype maps with gpmap-tools. bioRxiv 2025:2025.03.09.642267. [PMID: 40161830 PMCID: PMC11952336 DOI: 10.1101/2025.03.09.642267] [Indexed: 04/02/2025]
Abstract
Multiplex assays of variant effect (MAVEs) allow the functional characterization of an unprecedented number of sequence variants in both gene regulatory regions and protein coding sequences. This has enabled the study of nearly complete combinatorial libraries of mutational variants and revealed the widespread influence of higher-order genetic interactions that arise when multiple mutations are combined. However, the lack of appropriate tools for exploratory analysis of this high-dimensional data limits our overall understanding of the main qualitative properties of complex genotype-phenotype maps. To fill this gap, we have developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a python library that integrates Gaussian process models for inference, phenotypic imputation, and error estimation from incomplete and noisy MAVE data and collections of natural sequences, together with methods for summarizing patterns of higher-order epistasis and non-linear dimensionality reduction techniques that allow visualization of genotype-phenotype maps containing up to millions of genotypes. Here, we used gpmap-tools to study the genotype-phenotype map of the Shine-Dalgarno sequence, a motif that modulates binding of the 16S rRNA to the 5' untranslated region (UTR) of mRNAs through base pair complementarity during translation initiation in prokaryotes. We inferred full combinatorial landscapes containing 262,144 different sequences from the sequences of 5,311 5'UTRs in the E. coli genome and from experimental MAVE data. Visualizations of the inferred landscapes were largely consistent with each other, and unveiled a simple molecular mechanism underlying the highly epistatic genotype-phenotype map of the Shine-Dalgarno sequence.
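
The figure of 262,144 sequences quoted above equals 4^9, which is the size of a full combinatorial library over a four-letter alphabet at nine positions. The short Python sketch below checks that arithmetic and counts the single-mutation neighbours of one sequence. The nine-position length is inferred from the arithmetic rather than stated in the abstract, and the example sequence and function names are hypothetical; none of this is the gpmap-tools API.

```python
ALPHABET = "ACGU"  # RNA alphabet (assumption for illustration)


def n_genotypes(length, alphabet=ALPHABET):
    """Size of the full combinatorial sequence space."""
    return len(alphabet) ** length


def single_mutant_neighbors(seq, alphabet=ALPHABET):
    """All sequences that differ from `seq` at exactly one position."""
    neighbors = []
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                neighbors.append(seq[:i] + alt + seq[i + 1:])
    return neighbors


if __name__ == "__main__":
    print(n_genotypes(9))                             # 262144
    print(len(single_mutant_neighbors("AGGAGGUAA")))  # 27 = 9 positions x 3 alternatives
```

Each genotype in such a space has only 27 neighbours out of 262,144 sequences, which gives a sense of why dimensionality reduction is needed to visualize the map.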
Affiliation(s)
- Carlos Martí-Gómez
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- Juannan Zhou
  - Department of Biology, University of Florida, Gainesville, FL, 32611
- Wei-Chia Chen
  - Department of Physics, National Chung Cheng University, Chiayi 62102, Taiwan, Republic of China
- Justin B Kinney
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724
2
Hoffmann M, Hennighausen L. Spotlight on amino acid changing mutations in the JAK-STAT pathway: from disease-specific mutation to general mutation databases. Sci Rep 2025; 15:6202. [PMID: 39979591 PMCID: PMC11842829 DOI: 10.1038/s41598-025-90788-5] [Received: 01/06/2025] [Accepted: 02/17/2025] [Indexed: 02/22/2025] Open
Abstract
The JAK-STAT pathway is central to cytokine signaling and controls normal physiology and disease. Aberrant activation via mutations that change amino acids in proteins of the pathway can result in diseases. While disease-centric databases like COSMIC catalog mutations in cancer, their prevalence in healthy populations remains underexplored. We systematically studied such mutations in the JAK-STAT genes by comparing COSMIC and the population-focused All of Us database. Our analysis revealed frequent mutations in all JAK and STAT domains, particularly among white females. We further identified three categories: Mutations uniquely found in All of Us that were associated with cancer in the literature but could not be found in COSMIC, underscoring COSMIC's limitations. Mutations unique to COSMIC underline their potential as drivers of cancer due to their absence in the general population. Mutations present in both databases, e.g., JAK2Val617Phe/V617F - widely recognized as a cancer driver in hematopoietic cells, but without disease associations in All of Us, raising the possibility that combinatorial SNPs might be responsible for disease development. These findings illustrate the complementarity of both databases for understanding mutation impacts and underscore the need for multi-mutation analyses to uncover genetic factors underlying complex diseases and advance personalized medicine.
Affiliation(s)
- Markus Hoffmann
  - Laboratory of Genetics and Physiology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 20892, USA
- Lothar Hennighausen
  - Laboratory of Genetics and Physiology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, 20892, USA
3
Li H, Zeng J, Snyder MP, Zhang S. Modeling gene interactions in polygenic prediction via geometric deep learning. Genome Res 2025; 35:178-187. [PMID: 39562137 PMCID: PMC11789630 DOI: 10.1101/gr.279694.124] [Received: 06/14/2024] [Accepted: 11/14/2024] [Indexed: 11/21/2024]
Abstract
Polygenic risk score (PRS) is a widely used approach for predicting individuals' genetic risk of complex diseases, playing a pivotal role in advancing precision medicine. Traditional PRS methods, predominantly following a linear structure, often fall short in capturing the intricate relationships between genotype and phenotype. In this study, we present PRS-Net, an interpretable geometric deep learning-based framework that effectively models the nonlinearity of biological systems for enhanced disease prediction and biological discovery. PRS-Net begins by deconvoluting the genome-wide PRS at the single-gene resolution and then explicitly encapsulates gene-gene interactions leveraging a graph neural network (GNN) for genetic risk prediction, enabling a systematic characterization of molecular interplay underpinning diseases. An attentive readout module is introduced to facilitate model interpretation. Extensive tests across multiple complex traits and diseases demonstrate the superior prediction performance of PRS-Net compared with a wide range of conventional PRS methods. The interpretability of PRS-Net further enhances the identification of disease-relevant genes and gene programs. PRS-Net provides a potent tool for concurrent genetic risk prediction and biological discovery for complex diseases.
Affiliation(s)
- Han Li
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Jianyang Zeng
  - School of Engineering, Research Center for Industries of the Future, Westlake University, Hangzhou, 310030, Zhejiang, China
- Michael P Snyder
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, California 94304, USA
- Sai Zhang
  - Department of Epidemiology, University of Florida, Gainesville, Florida 32603, USA
  - Departments of Biostatistics & Biomedical Engineering, UF Genetics Institute, University of Florida, Gainesville, Florida 32603, USA
4
Cocoș R, Popescu BO. Scrutinizing neurodegenerative diseases: decoding the complex genetic architectures through a multi-omics lens. Hum Genomics 2024; 18:141. [PMID: 39736681 DOI: 10.1186/s40246-024-00704-7] [Received: 10/05/2024] [Accepted: 12/10/2024] [Indexed: 01/01/2025] Open
Abstract
Neurodegenerative diseases present complex genetic architectures, reflecting a continuum from monogenic to oligogenic and polygenic models. Recent advances in multi-omics data, coupled with systems genetics, have significantly refined our understanding of how these data impact neurodegenerative disease mechanisms. To contextualize these genetic discoveries, we provide a comprehensive critical overview of genetic architecture concepts, from Mendelian inheritance to the latest insights from oligogenic and omnigenic models. We explore the roles of common and rare genetic variants, gene-gene and gene-environment interactions, and epigenetic influences in shaping disease phenotypes. Additionally, we emphasize the importance of multi-omics layers including genomic, transcriptomic, proteomic, epigenetic, and metabolomic data in elucidating the molecular mechanisms underlying neurodegeneration. Special attention is given to missing heritability and the contribution of rare variants, particularly in the context of pleiotropy and network pleiotropy. We examine the application of single-cell omics technologies, transcriptome-wide association studies, and epigenome-wide association studies as key approaches for dissecting disease mechanisms at tissue- and cell-type levels. Our review introduces the OmicPeak Disease Trajectory Model, a conceptual framework for understanding the genetic architecture of neurodegenerative disease progression, which integrates multi-omics data across biological layers and time points. This review highlights the critical importance of adopting a systems genetics approach to unravel the complex genetic architecture of neurodegenerative diseases. 
Finally, this emerging holistic understanding of multi-omics data and the exploration of the intricate genetic landscape aim to provide a foundation for establishing more refined genetic architectures of these diseases, enhancing diagnostic precision, predicting disease progression, elucidating pathogenic mechanisms, and refining therapeutic strategies for neurodegenerative conditions.
Affiliation(s)
- Relu Cocoș
  - Department of Medical Genetics, 'Carol Davila' University of Medicine and Pharmacy, Bucharest, Romania
  - Genomics Research and Development Institute, Bucharest, Romania
- Bogdan Ovidiu Popescu
  - Department of Clinical Neurosciences, 'Carol Davila' University of Medicine and Pharmacy, Bucharest, Romania
5
Sha Z, Freda PJ, Bhandary P, Ghosh A, Matsumoto N, Moore JH, Hu T. Distinct network patterns emerge from Cartesian and XOR epistasis models: a comparative network science analysis. BioData Min 2024; 17:61. [PMID: 39732697 DOI: 10.1186/s13040-024-00413-w] [Received: 05/09/2024] [Accepted: 12/09/2024] [Indexed: 12/30/2024] Open
Abstract
BACKGROUND Epistasis, the phenomenon where the effect of one gene (or variant) is masked or modified by one or more other genes, significantly contributes to the phenotypic variance of complex traits. Traditionally, epistasis has been modeled using the Cartesian epistatic model, a multiplicative approach based on standard statistical regression. However, a recent study investigating epistasis in obesity-related traits has identified potential limitations of the Cartesian epistatic model, revealing that it likely only detects a fraction of the genetic interactions occurring in natural systems. In contrast, the exclusive-or (XOR) epistatic model has shown promise in detecting a broader range of epistatic interactions and revealing more biologically relevant functions associated with interacting variants. To investigate whether the XOR epistatic model also forms distinct network structures compared to the Cartesian model, we applied network science to examine genetic interactions underlying body mass index (BMI) in rats (Rattus norvegicus). RESULTS Our comparative analysis of XOR and Cartesian epistatic models in rats reveals distinct topological characteristics. The XOR model exhibits enhanced sensitivity to epistatic interactions between the network communities found in the Cartesian epistatic network, facilitating the identification of novel trait-related biological functions via community-based enrichment analysis. Additionally, the XOR network features triangle network motifs, indicative of higher-order epistatic interactions. This research also evaluates the impact of linkage disequilibrium (LD)-based edge pruning on network-based epistasis analysis, finding that LD-based edge pruning may lead to increased network fragmentation, which may hinder the effectiveness of network analysis for the investigation of epistasis. 
We confirmed through network permutation analysis that most XOR and Cartesian epistatic networks derived from the data display distinct structural properties compared to randomly shuffled networks. CONCLUSIONS Collectively, these findings highlight the XOR model's ability to uncover meaningful biological associations and higher-order epistasis derived from lower-order network topologies. The introduction of community-based enrichment analysis and motif-based epistatic discovery emphasizes network science as a critical approach for advancing epistasis research and understanding complex genetic architectures.
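
The contrast between the two interaction models can be made concrete with a minimal Python sketch. It assumes each locus is coded as a binary indicator (0 or 1), which is a simplification chosen for illustration; the encoding the cited study uses for three-level SNP genotypes may differ, and the function names are hypothetical.

```python
def cartesian_term(a, b):
    """Multiplicative (Cartesian) interaction term for two binary-coded loci."""
    return a * b


def xor_term(a, b):
    """Exclusive-or interaction term for two binary-coded loci."""
    return a ^ b


if __name__ == "__main__":
    print("a b cartesian xor")
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, cartesian_term(a, b), xor_term(a, b))
```

Under this coding the Cartesian term is non-zero only when both loci carry the variant, whereas the XOR term is non-zero when exactly one does, so the two terms respond to different genotype combinations.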
Affiliation(s)
- Zhendong Sha
  - School of Computing, Queen's University, 557 Goodwin Hall, 21-25 Union St, Kingston, K7L 2N8, Ontario, Canada
- Philip J Freda
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Priyanka Bhandary
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Attri Ghosh
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Nicholas Matsumoto
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Jason H Moore
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, 90069, CA, USA
- Ting Hu
  - School of Computing, Queen's University, 557 Goodwin Hall, 21-25 Union St, Kingston, K7L 2N8, Ontario, Canada
6
Stoyanova K, Stoyanov D, Khorev V, Kurkin S. Identifying neural network structures explained by personality traits: combining unsupervised and supervised machine learning techniques in translational validity assessment. Eur Phys J Spec Top 2024. [DOI: 10.1140/epjs/s11734-024-01411-z] [Received: 10/17/2024] [Accepted: 11/14/2024] [Indexed: 01/12/2025]
Abstract
Previous studies have examined the neurobiological underpinnings of personality traits under various paradigms, such as psychobiological theory, Eysenck's model, and the five-factor model. However, results on the co-clustering of functional connectivity, as measured by functional MRI, with personality profiles remain limited. In the present study, we analyzed resting-state connectivity networks and character type, assessed with the Lowen bioenergetic test, in 66 healthy subjects. We identified direct correspondences between network metrics such as eigenvector centrality (EC), clustering coefficient (CC), and node strength (NS) and specific personality characteristics. Specifically, N Acc L and OFCmed were associated with oral and masochistic traits in terms of EC and CC, while Insula R was associated with oral traits in terms of NS and EC. Notably, we observed significant correlations between individual items and node measures in specific regions, suggesting a more targeted relationship. However, the more relevant finding is the correlation between the metrics (NS, CC, and EC) and overall traits. A hierarchical clustering algorithm (agglomerative clustering, an unsupervised machine learning technique) and principal component analysis were applied, identifying three prominent principal components that cumulatively explain 76% of the psychometric data. Furthermore, we clustered the network metrics (by unsupervised clustering) to explore whether neural connectivity patterns could be grouped based on combined average network metrics and psychometric data (global and local efficiencies, node strength, eigenvector centrality, and node strength). We identified three principal components, where the cumulative amount of explained data reaches 99%. The correspondence between network measures (CC and NS) and predictors (responses to Lowen's items) is 62% predicted with a precision of 90%.
7
Shang J, Xu A, Bi M, Zhang Y, Li F, Liu JX. A review: simulation tools for genome-wide interaction studies. Brief Funct Genomics 2024; 23:745-753. [PMID: 39173096 DOI: 10.1093/bfgp/elae034] [Received: 05/22/2024] [Revised: 07/25/2024] [Accepted: 08/10/2024] [Indexed: 08/24/2024] Open
Abstract
Genome-wide association study (GWAS) is essential for investigating the genetic basis of complex diseases; nevertheless, it usually ignores the interaction of multiple single nucleotide polymorphisms (SNPs). Genome-wide interaction studies provide crucial means for exploring complex genetic interactions that GWAS may miss. Although many interaction methods have been proposed, challenges still persist, including the lack of epistasis models and the inconsistency of benchmark datasets. SNP data simulation is a pivotal intermediary between interaction methods and real applications. Therefore, it is important to obtain epistasis models and benchmark datasets by simulation tools, which is helpful for further improving interaction methods. At present, many simulation tools have been widely employed in the field of population genetics. According to their basic principles, these existing tools can be divided into four categories: coalescent simulation, forward-time simulation, resampling simulation, and other simulation frameworks. In this paper, their basic principles and representative simulation tools are compared and analyzed in detail. Additionally, this paper provides a discussion and summary of the advantages and disadvantages of these frameworks and tools, offering technical insights for the design of new methods, and serving as valuable reference tools for researchers to comprehensively understand GWAS and genome-wide interaction studies.
Affiliation(s)
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Anqi Xu
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Mingyuan Bi
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266033, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
  - School of Health and Life Sciences, University of Health and Rehabilitation Sciences, Qingdao 266114, China
8
Shao M, Chen K, Zhang S, Tian M, Shen Y, Cao C, Gu N. Multiome-wide Association Studies: Novel Approaches for Understanding Diseases. Genomics Proteomics Bioinformatics 2024; 22:qzae077. [PMID: 39471467 PMCID: PMC11630051 DOI: 10.1093/gpbjnl/qzae077] [Received: 08/16/2023] [Revised: 10/06/2024] [Accepted: 10/23/2024] [Indexed: 11/01/2024]
Abstract
The rapid development of multiome (transcriptome, proteome, cistrome, imaging, and regulome)-wide association study methods has opened new avenues for biologists to understand the susceptibility genes underlying complex diseases. Thorough comparisons of these methods are essential for selecting the most appropriate tool for a given research objective. This review provides a detailed categorization and summary of the statistical models, use cases, and advantages of recent multiome-wide association studies. In addition, to illustrate gene-disease association studies based on transcriptome-wide association study (TWAS), we collected 478 disease entries across 22 categories from 235 manually reviewed publications. Our analysis reveals that mental disorders are the most frequently studied diseases by TWAS, indicating its potential to deepen our understanding of the genetic architecture of complex diseases. In summary, this review underscores the importance of multiome-wide association studies in elucidating complex diseases and highlights the significance of selecting the appropriate method for each study.
Affiliation(s)
- Mengting Shao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Kaiyang Chen
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Shuting Zhang
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Min Tian
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Yan Shen
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chen Cao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Ning Gu
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
  - Nanjing Key Laboratory for Cardiovascular Information and Health Engineering Medicine, Institute of Clinical Medicine, Nanjing Drum Tower Hospital, Medical School, Nanjing University, Nanjing 210093, China
9
Gerussi A, Cappadona C, Bernasconi DP, Cristoferi L, Valsecchi MG, Carbone M, Invernizzi P, Asselta R. Improving predictive accuracy in primary biliary cholangitis: A new genetic risk score. Liver Int 2024; 44:1952-1960. [PMID: 38619000 DOI: 10.1111/liv.15916] [Received: 10/26/2023] [Revised: 03/05/2024] [Accepted: 03/11/2024] [Indexed: 04/16/2024]
Abstract
BACKGROUND AND AIMS Genetic variants influence primary biliary cholangitis (PBC) risk. We established and tested an accurate polygenic risk score (PRS) using these variants. METHODS Data from two Italian cohorts (OldIT 444 cases, 901 controls; NewIT 255 cases, 579 controls) were analysed. The latest international genome-wide meta-analysis provided effect size estimates. The PRS, together with human leukocyte antigen (HLA) status and sex, was included in an integrated risk model. RESULTS Starting from 46 non-HLA genes, 22 variants were selected. PBC patients in the OldIT cohort showed a higher risk score than controls: -.014 (interquartile range, IQR, -.023, .005) versus -.022 (IQR -.030, -.013) (p < 2.2 × 10⁻¹⁶). For genetic-based prediction, the area under the curve (AUC) was .72; adding sex increased the AUC to .82. Validation in the NewIT cohort confirmed the model's accuracy (.71 without sex, .81 with sex). Individuals in the top group, representing the highest 25%, had a PBC risk approximately 14 times higher than that of the reference group (lowest 25%; p < 10⁻⁶). CONCLUSION The combination of sex and a novel PRS accurately discriminated between PBC cases and controls. The model identified a subset of individuals at increased risk of PBC who might benefit from tailored monitoring.
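
For readers unfamiliar with the quantities reported above, the Python sketch below shows the usual form of a polygenic risk score (a sum of allele dosages weighted by effect sizes) and a rank-based estimate of the AUC. It is a generic illustration under stated assumptions, not the authors' pipeline: the weights, dosages, scores, and function names are all invented for the example, and the integrated model with HLA status and sex is not reproduced.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 per variant)."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as one half; this is the Mann-Whitney estimate of the AUC.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    weights = [0.2, -0.1, 0.3]  # hypothetical per-variant effect sizes
    print(polygenic_score([2, 0, 1], weights))
    # Hypothetical scores for two cases and two controls
    print(auc([0.9, 0.8], [0.1, 0.85]))
```

An AUC of .5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls, which is the scale on which the reported values of .72 and .82 should be read.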
Affiliation(s)
- Alessio Gerussi
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, European Reference Network on Hepatological Diseases (ERN RARE-LIVER), IRCCS Fondazione San Gerardo dei Tintori, Monza, Italy
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Claudio Cappadona
  - Department of Biomedical Sciences, Humanitas University, Milan, Italy
  - IRCCS Humanitas Research Hospital, Milan, Italy
- Davide Paolo Bernasconi
  - Bicocca Bioinformatics Biostatistics and Bioimaging Centre-B4, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Laura Cristoferi
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, European Reference Network on Hepatological Diseases (ERN RARE-LIVER), IRCCS Fondazione San Gerardo dei Tintori, Monza, Italy
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - Bicocca Bioinformatics Biostatistics and Bioimaging Centre-B4, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Maria Grazia Valsecchi
  - Bicocca Bioinformatics Biostatistics and Bioimaging Centre-B4, Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
  - Biostatistics and Clinical Epidemiology, Fondazione IRCCS San Gerardo dei Tintori, Monza, Italy
- Marco Carbone
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, European Reference Network on Hepatological Diseases (ERN RARE-LIVER), IRCCS Fondazione San Gerardo dei Tintori, Monza, Italy
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Pietro Invernizzi
  - Division of Gastroenterology, Center for Autoimmune Liver Diseases, European Reference Network on Hepatological Diseases (ERN RARE-LIVER), IRCCS Fondazione San Gerardo dei Tintori, Monza, Italy
  - Department of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy
- Rosanna Asselta
  - Department of Biomedical Sciences, Humanitas University, Milan, Italy
  - IRCCS Humanitas Research Hospital, Milan, Italy
10
Ham S, Kim SS, Park S, Kwon HC, Ha SG, Bae Y, Lee G, Lee SV. Combinatorial transcriptomic and genetic dissection of insulin/IGF-1 signaling-regulated longevity in Caenorhabditis elegans. Aging Cell 2024; 23:e14151. [PMID: 38529797 PMCID: PMC11258480 DOI: 10.1111/acel.14151] [Received: 11/04/2023] [Revised: 02/22/2024] [Accepted: 03/10/2024] [Indexed: 03/27/2024] Open
Abstract
Classical genetic analysis is invaluable for understanding the genetic interactions underlying specific phenotypes, but requires laborious and subjective experiments to characterize polygenic and quantitative traits. Contrarily, transcriptomic analysis enables the simultaneous and objective identification of multiple genes whose expression changes are associated with specific phenotypes. Here, we conducted transcriptomic analysis of genes crucial for longevity using datasets with daf-2/insulin/IGF-1 receptor mutant Caenorhabditis elegans. Our analysis unraveled multiple epistatic relationships at the transcriptomic level, in addition to verifying genetically established interactions. Our combinatorial analysis also revealed transcriptomic changes associated with longevity conferred by daf-2 mutations. In particular, we demonstrated that the extent of lifespan changes caused by various mutant alleles of the longevity transcription factor daf-16/FOXO matched their effects on transcriptomic changes in daf-2 mutants. We identified specific aging-regulating signaling pathways and subsets of structural and functional RNA elements altered by different genes in daf-2 mutants. Lastly, we elucidated the functional cooperation between several longevity regulators, based on the combination of transcriptomic and molecular genetic analysis. These data suggest that different biological processes coordinately exert their effects on longevity in biological networks. Together our work demonstrates the utility of transcriptomic dissection analysis for identifying important genetic interactions for physiological processes, including aging and longevity.
Affiliation(s)
- Seokjin Ham
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Sieun S. Kim
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Sangsoon Park
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Hyunwoo C. Kwon
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Seokjun G. Ha
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Yunkyu Bae
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Gee‐Yoon Lee
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
- Seung‐Jae V. Lee
  - Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
11
Sha Z, Freda PJ, Bhandary P, Ghosh A, Matsumoto N, Moore JH, Hu T. Distinct Network Patterns Emerge from Cartesian and XOR Epistasis Models: A Comparative Network Science Analysis. Res Sq 2024:rs.3.rs-4392123. [PMID: 38826481 PMCID: PMC11142370 DOI: 10.21203/rs.3.rs-4392123/v1] [Indexed: 06/04/2024]
Abstract
Background: Epistasis, the phenomenon where the effect of one gene (or variant) is masked or modified by one or more other genes, can significantly contribute to the observed phenotypic variance of complex traits. To date, it has been generally assumed that genetic interactions can be detected using a Cartesian, or multiplicative, interaction model commonly utilized in standard regression approaches. However, a recent study investigating epistasis in obesity-related traits in rats and mice has identified potential limitations of the Cartesian model, revealing that it only detects some of the genetic interactions occurring in these systems. By applying an alternative approach, the exclusive-or (XOR) model, the researchers detected a greater number of epistatic interactions and identified more biologically relevant ontological terms associated with the interacting loci. This suggests that the XOR model may provide a more comprehensive understanding of epistasis in these species and phenotypes. To further explore these findings and determine if different interaction models also make up distinct epistatic networks, we leverage network science to provide a more comprehensive view into the genetic interactions underlying BMI in this system. Results: Our comparative analysis of networks derived from Cartesian and XOR interaction models in rats (Rattus norvegicus) uncovers distinct topological characteristics for each model-derived network. Notably, we discover that networks based on the XOR model exhibit an enhanced sensitivity to epistatic interactions. This sensitivity enables the identification of network communities, revealing novel trait-related biological functions through enrichment analysis. Furthermore, we identify triangle network motifs in the XOR epistatic network, suggestive of higher-order epistasis, based on the topology of lower-order epistasis.
Conclusions: These findings highlight the XOR model's ability to uncover meaningful biological associations as well as higher-order epistasis from lower-order epistatic networks. Additionally, our results demonstrate that network approaches not only enhance epistasis detection capabilities but also provide more nuanced understandings of genetic architectures underlying complex traits. The identification of community structures and motifs within these distinct networks, especially in XOR, points to the potential for network science to aid in the discovery of novel genetic pathways and regulatory networks. Such insights are important for advancing our understanding of phenotype-genotype relationships.
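The triangle motifs mentioned in this abstract can be illustrated with a minimal sketch. It assumes the epistatic network is given as a plain list of significant pairwise interactions; the locus names and the `find_triangles` helper are hypothetical and are not taken from the cited study.

```python
def find_triangles(edges):
    """Return the sorted list of 3-locus sets in which every pair interacts."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops, which cannot form a triangle
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    triangles = set()
    for a, b in edges:
        if a == b:
            continue
        # Any locus adjacent to both endpoints of an edge closes a triangle.
        for c in neighbors[a] & neighbors[b]:
            triangles.add(tuple(sorted((a, b, c))))
    return sorted(triangles)

# Hypothetical pairwise epistatic interactions (made-up locus names).
edges = [("snpA", "snpB"), ("snpB", "snpC"), ("snpA", "snpC"), ("snpC", "snpD")]
print(find_triangles(edges))
```

A triangle only shows that three loci interact pairwise; as the abstract puts it, this is suggestive of higher-order epistasis rather than a direct test of it.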
Affiliation(s)
- Zhendong Sha
- School of Computing, Queen’s University, 557 Goodwin Hall, 21-25 Union St, Kingston, Ontario, K7L 2N8, Canada
- Philip J. Freda
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Priyanka Bhandary
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Attri Ghosh
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Nicholas Matsumoto
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Jason H. Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, U.S.A
- Ting Hu
- School of Computing, Queen’s University, 557 Goodwin Hall, 21-25 Union St, Kingston, Ontario, K7L 2N8, Canada
12
Ma J, Li J, Chen Y, Yang Z, He Y. Poor statistical power in population-based association study of gene interaction. BMC Med Genomics 2024; 17:111. [PMID: 38678264 PMCID: PMC11055307 DOI: 10.1186/s12920-024-01884-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 04/19/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND: Statistical epistasis, or "gene-gene interaction" in genetic association studies, refers to nonadditive effects between polymorphic sites on two different genes affecting the same phenotype. In genetic association analyses of complex traits, however, researchers have so far found little evidence of statistical epistasis. METHODS: We developed a statistical model in which statistical epistasis is represented as extra linkage disequilibrium between the polymorphic sites of different risk genes. The power of the statistical test for identifying gene-gene interaction was calculated and then compared across different hypothesis scenarios. RESULTS: Our results show that statistical power increases with increasing interaction coefficient, relative risk, and linkage disequilibrium with genetic markers. However, the power to discover interactions is much lower than that of a regular single-site association test. When rigorous criteria were employed in statistical tests, identifying gene-gene interaction became a very difficult task. When the significance criterion was set to p-value ≤ 5.0 × 10⁻⁸, the same as in many genome-wide association studies, there was little chance of identifying gene-gene interaction under any circumstances. CONCLUSIONS: The lack of epistasis findings tends to be an inevitable result of the statistical principles underlying the methods used in genetic association studies and is therefore an inherent characteristic of the research itself.
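The effect of a genome-wide significance threshold on power can be sketched with a generic two-sided z-test. This is not the linkage-disequilibrium model developed in the cited study; the `two_sided_power` helper and the noncentrality value of 3.0 are illustrative assumptions.

```python
from statistics import NormalDist

def two_sided_power(ncp, alpha):
    """Power of a two-sided z-test whose statistic has mean `ncp` and unit variance."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)  # critical value for the chosen threshold
    # Probability that the statistic falls beyond either critical value.
    return std.cdf(ncp - z_crit) + std.cdf(-ncp - z_crit)

# Same assumed effect (noncentrality 3.0), two significance thresholds.
for alpha in (0.05, 5e-8):
    print(f"alpha={alpha:g}  power={two_sided_power(3.0, alpha):.4f}")
```

For a fixed effect size, moving from a nominal threshold to 5.0 × 10⁻⁸ lowers the power sharply, which is the qualitative point the abstract makes about interaction tests.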
Affiliation(s)
- Jiarui Ma
- Shanghai Key Laboratory of Medical Epigenetics, International Co-Laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Jian Li
- Shanghai Key Laboratory of Medical Epigenetics, International Co-Laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Yuqi Chen
- Shanghai Key Laboratory of Medical Epigenetics, International Co-Laboratory of Medical Epigenetics and Metabolism (Ministry of Science and Technology), Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Zhen Yang
- Center for Medical Research and Innovation of Pudong Hospital, Intelligent Medicine Institute, Fudan University, Shanghai, 200032, China
- Yungang He
- Shanghai Fifth People's Hospital, Intelligent Medicine Institute, Fudan University, Shanghai, 200032, PR China.
13
Alfayyadh MM, Maksemous N, Sutherland HG, Lea RA, Griffiths LR. Unravelling the Genetic Landscape of Hemiplegic Migraine: Exploring Innovative Strategies and Emerging Approaches. Genes (Basel) 2024; 15:443. [PMID: 38674378 PMCID: PMC11049430 DOI: 10.3390/genes15040443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
Migraine is a severe, debilitating neurovascular disorder. Hemiplegic migraine (HM) is a rare and debilitating neurological condition with a strong genetic basis. Sequencing technologies have improved the diagnosis and our understanding of the molecular pathophysiology of HM. Linkage analysis and sequencing studies in HM families have identified pathogenic variants in ion channels and related genes, including CACNA1A, ATP1A2, and SCN1A, that cause HM. However, approximately 75% of HM patients are negative for these mutations, indicating there are other genes involved in disease causation. In this review, we explored our current understanding of the genetics of HM. The evidence presented herein summarises the current knowledge of the genetics of HM, which can be expanded further to explain the remaining heritability of this debilitating condition. Innovative bioinformatics and computational strategies to cover the entire genetic spectrum of HM are also discussed in this review.
Affiliation(s)
- Lyn R. Griffiths
- Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD 4059, Australia; (M.M.A.); (N.M.); (H.G.S.); (R.A.L.)
14
Lin WY. Searching for gene-gene interactions through variance quantitative trait loci of 29 continuous Taiwan Biobank phenotypes. Front Genet 2024; 15:1357238. [PMID: 38516378 PMCID: PMC10956579 DOI: 10.3389/fgene.2024.1357238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Accepted: 02/27/2024] [Indexed: 03/23/2024]
Abstract
Introduction: After the era of genome-wide association studies (GWAS), thousands of genetic variants have been identified to exhibit main effects on human phenotypes. The next critical issue would be to explore the interplay between genes, the so-called "gene-gene interactions" (GxG) or epistasis. An exhaustive search for all single-nucleotide polymorphism (SNP) pairs is not recommended because this will induce a harsh penalty of multiple testing. Limiting the search for epistasis to SNPs reported by previous GWAS may miss essential interactions between SNPs without significant marginal effects. Moreover, most methods are computationally intensive and can be challenging to implement genome-wide. Methods: Here, I searched for GxG through variance quantitative trait loci (vQTLs) of 29 continuous Taiwan Biobank (TWB) phenotypes. A discovery cohort of 86,536 and a replication cohort of 25,460 TWB individuals were analyzed, respectively. Results: A total of 18 nearly independent vQTLs with linkage disequilibrium measure r² < 0.01 were identified and replicated from nine phenotypes. Fifteen significant GxG were found with p-values <1.1E-5 (in the discovery cohort) and false discovery rates <2% (in the replication cohort). Among these 15 GxG, 11 were detected for blood traits including red blood cells, hemoglobin, and hematocrit; 2 for total bilirubin; 1 for fasting glucose; and 1 for total cholesterol (TCHO). All GxG were observed for gene pairs on the same chromosome, except for the APOA5 (chromosome 11)-TOMM40 (chromosome 19) interaction for TCHO. Discussion: This study provided a computationally feasible way to search for GxG genome-wide and applied this approach to 29 phenotypes.
Affiliation(s)
- Wan-Yu Lin
- Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan
- Master of Public Health Degree Program, College of Public Health, National Taiwan University, Taipei, Taiwan
15
Batista S, Madar VS, Freda PJ, Bhandary P, Ghosh A, Matsumoto N, Chitre AS, Palmer AA, Moore JH. Interaction models matter: an efficient, flexible computational framework for model-specific investigation of epistasis. BioData Min 2024; 17:7. [PMID: 38419006 PMCID: PMC10900690 DOI: 10.1186/s13040-024-00358-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
PURPOSE: Epistasis, the interaction between two or more genes, is integral to the study of genetics and is present throughout nature. Yet, it is seldom fully explored as most approaches primarily focus on single-locus effects, partly because analyzing all pairwise and higher-order interactions requires significant computational resources. Furthermore, existing methods for epistasis detection only consider a Cartesian (multiplicative) model for interaction terms. This is likely limiting as epistatic interactions can evolve to produce varied relationships between genetic loci, some complex and not linearly separable. METHODS: We present new algorithms for computing the interaction coefficients of standard regression models for epistasis; these algorithms permit many varied models for the interaction terms between loci and use memory efficiently. The algorithms are given for two-way and three-way epistasis and may be generalized to higher-order epistasis. Statistical tests for the interaction coefficients are also provided. We also present an efficient matrix-based algorithm for permutation testing for two-way epistasis. We offer a proof and experimental evidence that methods that look for epistasis only at loci that have main effects may not be justified. Given the computational efficiency of the algorithm, we applied the method to a rat data set and a mouse data set, with at least 10,000 loci and 1,000 samples each, using the standard Cartesian model and the XOR model to explore body mass index. RESULTS: This study reveals that although many of the loci found to exhibit significant statistical epistasis overlap between models in rats, the pairs are mostly distinct. Further, the XOR model found greater evidence for statistical epistasis in many more pairs of loci in both data sets, with almost all significant epistasis in mice identified using XOR. In the rat data set, loci involved in epistasis under the XOR model are enriched for biologically relevant pathways.
CONCLUSION: Our results in both species show that many biologically relevant epistatic relationships would have been undetected if only one interaction model was applied, providing evidence that varied interaction models should be implemented to explore epistatic interactions that occur in living systems.
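The difference between the two interaction models named in this abstract can be sketched as two ways of encoding the interaction term for a pair of loci. The abstract does not give the exact XOR encoding, so the parity-based form below is an assumption, as are the function names; genotypes are taken to be allele counts 0, 1, or 2.

```python
def cartesian_term(g1, g2):
    """Cartesian (multiplicative) interaction term: product of the two allele counts."""
    return g1 * g2

def xor_term(g1, g2):
    """Assumed parity-based XOR term: 1 when exactly one locus has an odd allele count."""
    return (g1 % 2) ^ (g2 % 2)

# Tabulate both encodings over all 3 x 3 genotype combinations.
genotypes = (0, 1, 2)
for g1 in genotypes:
    row = [(cartesian_term(g1, g2), xor_term(g1, g2)) for g2 in genotypes]
    print(g1, row)
```

Under this assumed encoding the two terms assign different patterns to the nine genotype combinations, so a regression using one interaction column can flag locus pairs that the other misses.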
Affiliation(s)
- Sandra Batista
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA.
- Philip J Freda
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Priyanka Bhandary
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Attri Ghosh
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Nicholas Matsumoto
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA
- Apurva S Chitre
- Department of Psychiatry, University of California, San Diego, 9500 Gilman Dr., Mailcode: 0667, La Jolla, CA, 92093-0667, USA
- Abraham A Palmer
- Department of Psychiatry, University of California, San Diego, 9500 Gilman Dr., Mailcode: 0667, La Jolla, CA, 92093-0667, USA
- Institute for Genomic Medicine, University of California, San Diego, 9500 Gilman Dr., Mailcode: 0667, La Jolla, CA, 92093-0667, USA
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N San Vicente Blvd., Pacific Design Center, Suite G540, West Hollywood, CA, 90069, USA.
16
Schwab B, Yin J. Computational multigene interactions in virus growth and infection spread. Virus Evol 2023; 10:vead082. [PMID: 38361828 PMCID: PMC10868543 DOI: 10.1093/ve/vead082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/29/2023] [Accepted: 12/19/2023] [Indexed: 02/17/2024]
Abstract
Viruses persist in nature owing to their extreme genetic heterogeneity and large population sizes, which enable them to evade host immune defenses, escape antiviral drugs, and adapt to new hosts. The persistence of viruses is challenging to study because mutations affect multiple virus genes, interactions among genes in their impacts on virus growth are seldom known, and measures of viral fitness are yet to be standardized. To address these challenges, we employed a data-driven computational model of cell infection by a virus. The infection model accounted for the kinetics of viral gene expression, functional gene-gene interactions, genome replication, and allocation of host cellular resources to produce progeny of vesicular stomatitis virus, a prototype RNA virus. We used this model to computationally probe how interactions among genes carrying up to eleven deleterious mutations affect different measures of virus fitness: single-cycle growth yields and multicycle rates of infection spread. Individual mutations were implemented by perturbing biophysical parameters associated with individual gene functions of the wild-type model. Our analysis revealed synergistic epistasis among deleterious mutations in their effects on virus yield; so adverse effects of single deleterious mutations were amplified by interaction. For the same mutations, multicycle infection spread indicated weak or negligible epistasis, where single mutations act alone in their effects on infection spread. These results were robust to simulation in high- and low-host resource environments. Our work highlights how different types and magnitudes of epistasis can arise for genetically identical virus variants, depending on the fitness measure. More broadly, gene-gene interactions can differently affect how viruses grow and spread.
Affiliation(s)
- Bradley Schwab
- Wisconsin Institute for Discovery, Chemical and Biological Engineering, University of Wisconsin-Madison, 330 N. Orchard Street, Madison, WI 53715, USA
- John Yin
- Wisconsin Institute for Discovery, Chemical and Biological Engineering, University of Wisconsin-Madison, 330 N. Orchard Street, Madison, WI 53715, USA
17
Merabet N, Ramoz N, Boulmaiz A, Bourefis A, Benabdelkrim M, Djeffal O, Moyse E, Tolle V, Berredjem H. SNPs-Panel Polymorphism Variations in GHRL and GHSR Genes Are Not Associated with Prostate Cancer. Biomedicines 2023; 11:3276. [PMID: 38137497 PMCID: PMC10741232 DOI: 10.3390/biomedicines11123276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/05/2023] [Accepted: 12/09/2023] [Indexed: 12/24/2023]
Abstract
Prostate cancer (PCa) is a major public health problem worldwide. Recent studies have suggested that ghrelin and its receptor could be involved in the susceptibility to several cancers such as PCa, leading to their use as important predictors of the clinical progression and prognosis of cancer. However, different studies have reported conflicting results for single nucleotide polymorphisms (SNPs) in the ghrelin (GHRL) and ghrelin receptor (GHSR) genes. Thus, the present case-control study was undertaken to investigate the association of GHRL and GHSR polymorphisms with the susceptibility to sporadic PCa. A cohort of 120 PCa patients and 95 healthy subjects was enrolled in this study. Genotyping of six SNPs was performed using TaqMan: three tag SNPs in GHRL (rs696217, rs4684677, rs3491141) and three tag SNPs in GHSR (rs2922126, rs572169, rs2948694). The allele and genotype distributions, as well as haplotype frequencies and linkage disequilibrium (LD), were established. Multifactor dimensionality reduction (MDR) analysis was used to study gene-gene interactions between the six SNPs. Our results showed no significant association of the target polymorphisms with PCa (p > 0.05). Nevertheless, SNPs are often just markers that help identify or delimit specific genomic regions that may harbour functional variants rather than the variants causing the disease. Furthermore, we found that one GHSR rs2922126 genotype, namely TT, was significantly more frequent in PCa patients than in controls (p = 0.040). These data suggest that this genotype could be a PCa susceptibility genotype. MDR analyses revealed that the rs2922126 and rs572169 combination was the best model, with 81.08% accuracy (p = 0.0001) for predicting susceptibility to PCa. The results also showed a precision of 98.1% (p < 0.0001) and a PR-AUC of 1.00.
Our findings provide new insights into the influence of GHRL and GHSR polymorphisms and significant evidence for gene-gene interactions in PCa susceptibility, and they may guide clinical decision-making to prevent overtreatment and enhance patients' quality of life.
Affiliation(s)
- Nesrine Merabet
- Laboratory of Applied Biochemistry and Microbiology, Department of Biochemistry, Faculty of Sciences, Badji Mokhtar University, Annaba 23000, Algeria; (A.B.); (A.B.); (M.B.)
- Unit 85 PRC (Physiology of Reproduction and Behavior), Centre INRAe of Tours, University of Tours, 37380 Nouzilly, France;
- Nicolas Ramoz
- University Paris Cité, INSERM U1266, Institute of Psychiatry and Neuroscience of Paris (IPNP), 75014 Paris, France; (N.R.); (V.T.)
- Amel Boulmaiz
- Laboratory of Applied Biochemistry and Microbiology, Department of Biochemistry, Faculty of Sciences, Badji Mokhtar University, Annaba 23000, Algeria; (A.B.); (A.B.); (M.B.)
- Asma Bourefis
- Laboratory of Applied Biochemistry and Microbiology, Department of Biochemistry, Faculty of Sciences, Badji Mokhtar University, Annaba 23000, Algeria; (A.B.); (A.B.); (M.B.)
- Maroua Benabdelkrim
- Laboratory of Applied Biochemistry and Microbiology, Department of Biochemistry, Faculty of Sciences, Badji Mokhtar University, Annaba 23000, Algeria; (A.B.); (A.B.); (M.B.)
- Omar Djeffal
- Private Medical Uro-Chirurgical Cabinet, Cité SafSaf, BatR02 n°S01, Annaba 23000, Algeria;
- Emmanuel Moyse
- Unit 85 PRC (Physiology of Reproduction and Behavior), Centre INRAe of Tours, University of Tours, 37380 Nouzilly, France;
- Virginie Tolle
- University Paris Cité, INSERM U1266, Institute of Psychiatry and Neuroscience of Paris (IPNP), 75014 Paris, France; (N.R.); (V.T.)
- Hajira Berredjem
- Laboratory of Applied Biochemistry and Microbiology, Department of Biochemistry, Faculty of Sciences, Badji Mokhtar University, Annaba 23000, Algeria; (A.B.); (A.B.); (M.B.)
18
Lai S, Yan D, Xu J, Yu X, Guo J, Fang X, Tang M, Zhang R, Zhang H, Jia W, Luo M, Hu C. Genetic variants in epoxyeicosatrienoic acid processing and degradation pathways are associated with gestational diabetes mellitus. Nutr J 2023; 22:31. [PMID: 37370090 DOI: 10.1186/s12937-023-00862-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/10/2023] [Accepted: 06/24/2023] [Indexed: 06/29/2023]
Abstract
AIM: To explore the genetic effects of CYP2C8, CYP2C9, CYP2J2, and EPHX2, the key genes involved in epoxyeicosatrienoic acid processing and degradation pathways, on gestational diabetes mellitus (GDM) and metabolic traits in Chinese pregnant women. METHODS: A total of 2548 unrelated pregnant women were included, of which 938 had GDM and 1610 were considered as controls. Common variants were genotyped using the Infinium Asian Screening Array. Association studies of single nucleotide polymorphisms (SNPs) with GDM and related traits were performed using logistic regression and multivariable linear regression analyses. A genetic risk score (GRS) model based on 12 independent target SNPs associated with GDM was constructed. Logistic regression was used to estimate odds ratios and 95% confidence intervals, adjusting for potential confounders including age, pre-pregnancy body mass index, history of polycystic ovarian syndrome, history of GDM, and family history of diabetes, with GRS entered both as a continuous variable and as categorized groups. The relationship between GRS and quantitative traits was also evaluated. RESULTS: The 12 SNPs in CYP2C8, CYP2C9, CYP2J2, and EPHX2 were significantly associated with GDM after adjusting for covariates (all P < 0.05). The GRS generated from these SNPs significantly correlated with GDM. Furthermore, a significant interaction between CYP2J2 and CYP2C8 in GDM (P_interaction = 0.014, OR_interaction = 0.61, 95% CI 0.41-0.90) was observed. CONCLUSION: We found significant associations between GDM susceptibility and 12 SNPs of the four genes involved in epoxyeicosatrienoic acid processing and degradation pathways in a Chinese population. Subjects with a higher GRS showed higher GDM susceptibility with higher fasting plasma glucose and area under the curve of glucose and poorer β-cell function.
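A genetic risk score of the kind described in this abstract is commonly computed as a sum of risk-allele counts, optionally weighted per SNP. The abstract does not say which form the study used, so the sketch below supports both; the `genetic_risk_score` helper, the example genotypes, and the weights are hypothetical.

```python
def genetic_risk_score(genotypes, weights=None):
    """Sum of risk-allele counts (0, 1, or 2 per SNP), optionally weighted per SNP."""
    if weights is None:
        weights = [1.0] * len(genotypes)  # unweighted score: plain allele count
    if len(weights) != len(genotypes):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * g for w, g in zip(weights, genotypes))

# Hypothetical individual genotyped at three SNPs.
alleles = [2, 0, 2]
print(genetic_risk_score(alleles))                      # unweighted
print(genetic_risk_score(alleles, [0.5, 0.25, 0.125]))  # made-up per-SNP weights
```

In a weighted score the weights would typically be effect sizes such as log odds ratios from the association analysis; the values above are placeholders only.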
Affiliation(s)
- Siyu Lai
- The Third School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Department of Endocrinology and Metabolism, Southern Medical University Affiliated Fengxian Hospital, Shanghai, China
- Dandan Yan
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jie Xu
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiangtian Yu
- Clinical Research Center, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jingyi Guo
- Clinical Research Center, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiangnan Fang
- The Third School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Department of Endocrinology and Metabolism, Southern Medical University Affiliated Fengxian Hospital, Shanghai, China
- Department of Endocrinology, First Affiliated Hospital of Gannan Medical University, Ganzhou, China
- Mengyang Tang
- The Third School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Department of Endocrinology and Metabolism, Southern Medical University Affiliated Fengxian Hospital, Shanghai, China
- Rong Zhang
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Hong Zhang
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Weiping Jia
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Mingjuan Luo
- Department of Endocrinology and Metabolism, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China.
- Cheng Hu
- The Third School of Clinical Medicine, Southern Medical University, Guangzhou, China.
- Department of Endocrinology and Metabolism, Southern Medical University Affiliated Fengxian Hospital, Shanghai, China.
- Shanghai Diabetes Institute, Shanghai Key Laboratory of Diabetes Mellitus, Shanghai Clinical Center for Diabetes, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
19
Montesinos-López OA, Saint Pierre C, Gezan SA, Bentley AR, Mosqueda-González BA, Montesinos-López A, van Eeuwijk F, Beyene Y, Gowda M, Gardner K, Gerard GS, Crespo-Herrera L, Crossa J. Optimizing Sparse Testing for Genomic Prediction of Plant Breeding Crops. Genes (Basel) 2023; 14:genes14040927. [PMID: 37107685 PMCID: PMC10137724 DOI: 10.3390/genes14040927] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/01/2023] [Revised: 04/07/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
While sparse testing methods have been proposed by researchers to improve the efficiency of genomic selection (GS) in breeding programs, there are several factors that can hinder this. In this research, we evaluated four methods (M1-M4) for sparse testing allocation of lines to environments under multi-environmental trials for genomic prediction of unobserved lines. The sparse testing methods described in this study are applied in a two-stage analysis to build the genomic training and testing sets in a strategy that allows each location or environment to evaluate only a subset of all genotypes rather than all of them. To ensure a valid implementation, the sparse testing methods presented here require BLUEs (or BLUPs) of the lines to be computed at the first stage using an appropriate experimental design and statistical analyses in each location (or environment). The evaluation of the four methods for allocating cultivars to environments in the second stage was done with four data sets (two large and two small) under a multi-trait and uni-trait framework. We found that the multi-trait model produced better genomic prediction (GP) accuracy than the uni-trait model and that methods M3 and M4 were slightly better than methods M1 and M2 for the allocation of lines to environments. Some of the most important findings, however, were that even under a scenario where we used a training-testing relation of 15-85%, the prediction accuracy of the four methods barely decreased. This indicates that genomic sparse testing methods for data sets under these scenarios can save considerable operational and financial resources with only a small loss in precision, as shown in our cost-benefit analysis.
Affiliation(s)
- Carolina Saint Pierre
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Alison R Bentley
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Brandon A Mosqueda-González
- Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico
- Abelardo Montesinos-López
- Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
- Fred van Eeuwijk
- Department of Plant Science Mathematical and Statistical Methods-Biometrics, P.O. Box 16, 6700AA Wageningen, The Netherlands
- Yoseph Beyene
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Manje Gowda
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Keith Gardner
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Guillermo S Gerard
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Leonardo Crespo-Herrera
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- José Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Km 45, Carretera Mexico-Veracruz, El Batan, Texcoco 56237, Mexico
- Colegio de Postgraduados, Montecillos 56230, Mexico
20
Xiong W, Chen Y, Ma S. Unified model-free interaction screening via CV-entropy filter. Comput Stat Data Anal 2023; 180:107684. [PMID: 36910335 PMCID: PMC9997997 DOI: 10.1016/j.csda.2022.107684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022]
Abstract
For many practical high-dimensional problems, interactions have been increasingly found to play important roles beyond main effects. A representative example is gene-gene interaction. Joint analysis, which analyzes all interactions and main effects in a single model, can be seriously challenged by high dimensionality. For high-dimensional data analysis in general, marginal screening has been established as effective for reducing computational cost, increasing stability, and improving estimation/selection performance. Most of the existing marginal screening methods are designed for the analysis of main effects only. The existing screening methods for interaction analysis are often limited by making stringent model assumptions, lacking robustness, and/or requiring predictors to be continuous (and hence lacking flexibility). A unified marginal screening approach tailored to interaction analysis is developed, which can be applied to regression, classification, and survival analysis. Predictors are allowed to be continuous and discrete. The proposed approach is built on Coefficient of Variation (CV) filters based on information entropy. Statistical properties are rigorously established. It is shown that the CV filters are almost insensitive to the distribution tails of predictors, correlation structure among predictors, and sparsity level of signals. An efficient two-stage algorithm is developed to make the proposed approach scalable to ultrahigh-dimensional data. Simulations and the analysis of TCGA LUAD data further establish the practical superiority of the proposed approach.
Affiliation(s)
- Wei Xiong: School of Statistics, University of International Business and Economics, Beijing 100872, PR China
- Yaxian Chen: Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong
- Shuangge Ma: Department of Biostatistics, Yale School of Public Health, USA
21
Learning high-order interactions for polygenic risk prediction. PLoS One 2023; 18:e0281618. [PMID: 36763605 PMCID: PMC9916647 DOI: 10.1371/journal.pone.0281618] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/29/2022] [Accepted: 01/27/2023] [Indexed: 02/11/2023] Open
Abstract
Within the framework of precision medicine, the stratification of individual genetic susceptibility based on inherited DNA variation has paramount relevance. However, one of the most relevant pitfalls of traditional Polygenic Risk Score (PRS) approaches is their inability to model complex high-order non-linear SNP-SNP interactions and their effect on the phenotype (e.g. epistasis). Indeed, they face a computational challenge, as the number of possible interactions grows exponentially with the number of SNPs considered, which also affects the statistical reliability of the model parameters. In this work, we address this issue by proposing a novel PRS approach, called High-order Interactions-aware Polygenic Risk Score (hiPRS), that incorporates high-order interactions in modeling polygenic risk. hiPRS combines an interaction search routine based on frequent itemset mining with a novel interaction selection algorithm based on Mutual Information to construct a simple and interpretable weighted model of user-specified dimensionality that can predict a given binary phenotype. Compared to traditional PRS methods, hiPRS relies on neither GWAS summary statistics nor any external information. Moreover, hiPRS differs from Machine Learning-based approaches that can include complex interactions in that it provides a readable and interpretable model and is able to control overfitting, even on small samples. In the present work we demonstrate through a comprehensive simulation study the superior performance of hiPRS with respect to state-of-the-art methods, both in terms of scoring performance and interpretability of the resulting model. We also test hiPRS against small sample size, class imbalance and the presence of noise, showcasing its robustness to extreme experimental settings. Finally, we apply hiPRS to a case study on real data from the DACHS cohort, defining an interaction-aware scoring model to predict mortality of stage II-III Colon-Rectal Cancer patients treated with oxaliplatin.
22
Sha Z, Chen Y, Hu T. NSPA: characterizing the disease association of multiple genetic interactions at single-subject resolution. Bioinformatics Advances 2023; 3:vbad010. [PMID: 36818729 PMCID: PMC9927570 DOI: 10.1093/bioadv/vbad010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Revised: 01/02/2023] [Accepted: 02/02/2023] [Indexed: 02/10/2023]
Abstract
Motivation: The interaction between genetic variables is one of the major barriers to characterizing the genetic architecture of complex traits. To account for epistasis, network science approaches are increasingly being used to elucidate the genetic architecture of complex diseases. These approaches associate the disease susceptibility of genetic variables with their topological importance in the network. However, such a network only represents genetic interactions and does not describe how these interactions contribute to disease association at the subject scale. We propose the Network-based Subject Portrait Approach (NSPA) and an accompanying feature transformation method to determine the collective risk impact of multiple genetic interactions for each subject. Results: The feature transformation method converts the genetic variants of subjects into new values that capture how genetic variables interact with others to contribute to a subject's disease association. We apply this approach to synthetic and genetic datasets and learn that (1) the disease association can be captured using multiple disjoint sets of genetic interactions and (2) the feature transformation method based on NSPA improves predictive performance compared with using the original genetic variables. Our findings confirm the role of genetic interaction in complex disease and provide a novel approach for gene-disease association studies to identify genetic architecture in the context of epistasis. Availability and implementation: The code of NSPA is available at https://github.com/MIB-Lab/Network-based-Subject-Portrait-Approach. Contact: ting.hu@queensu.ca. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Zhendong Sha: School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 2N8
- Yuanzhu Chen: School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 2N8
- Ting Hu: to whom correspondence should be addressed
23
Jeon D, Kang Y, Lee S, Choi S, Sung Y, Lee TH, Kim C. Digitalizing breeding in plants: A new trend of next-generation breeding based on genomic prediction. Frontiers in Plant Science 2023; 14:1092584. [PMID: 36743488 PMCID: PMC9892199 DOI: 10.3389/fpls.2023.1092584] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/08/2022] [Accepted: 01/05/2023] [Indexed: 06/18/2023]
Abstract
As the world's population grows and food needs diversify, the demand for cereals and horticultural crops with beneficial traits increases. To meet this variety of demands, suitable cultivars and innovative breeding methods need to be developed. Breeding methods have changed over time following advances in genetics. With the advent of new sequencing technology in the early 21st century, predictive breeding, such as genomic selection (GS), emerged when large-scale genomic information became available. GS shows good predictive ability for the selection of individuals with traits of interest, even for quantitative traits, by using various types of whole-genome-scanning markers, breaking away from the limitations of marker-assisted selection (MAS). In the current review, we briefly describe the history of breeding techniques, each breeding method, the various statistical models applied to GS, and methods to increase GS efficiency. We then propose and define the term digital breeding. Digital breeding develops predictive breeding methods such as GS to a higher level, aiming to minimize human intervention by automating breeding design, the propagation of breeding populations, and selection, while taking into account various environments, climates, and topography during the breeding process. We also classify the phases of digital breeding based on the technologies and methods applied to each phase. This review provides an understanding of, and a direction for, the future evolution of plant breeding.
Affiliation(s)
- Donghyun Jeon: Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Yuna Kang: Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Solji Lee: Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Sehyun Choi: Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
- Yeonjun Sung: Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea
- Tae-Ho Lee: Genomics Division, National Institute of Agricultural Sciences, Jeonju, Republic of Korea
- Changsoo Kim: Plant Computational Genomics Laboratory, Department of Science in Smart Agriculture Systems, Chungnam National University, Daejeon, Republic of Korea; Plant Computational Genomics Laboratory, Department of Crop Science, Chungnam National University, Daejeon, Republic of Korea
24
MDSN: A Module Detection Method for Identifying High-Order Epistatic Interactions. Genes (Basel) 2022; 13:2403. [PMID: 36553670 PMCID: PMC9778340 DOI: 10.3390/genes13122403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 12/14/2022] [Accepted: 12/15/2022] [Indexed: 12/23/2022] Open
Abstract
Epistatic interactions refer to SNPs (single nucleotide polymorphisms) that jointly affect disease development and trait expression nonlinearly; identifying them therefore plays an important role in explaining the pathogenesis and genetic heterogeneity of complex diseases. Many methods have been proposed for epistasis detection; nevertheless, they mainly focus on low-order epistatic interactions, second-order or third-order for instance, and often ignore high-order interactions because of the computational burden. In this paper, a module detection method called MDSN is proposed for identifying high-order epistatic interactions. First, an SNP network is constructed using an interaction-complementary construction strategy; the network consists of low-order SNP interactions that can be obtained from fast computations. Then, a node evaluation measure that integrates multiple topological features is proposed to improve the node expansion algorithm, in which the importance of a node is comprehensively evaluated from the topological characteristics of its neighborhood. Finally, modules containing high-order epistatic interactions associated with the disease are detected in the constructed SNP network. MDSN was compared with four state-of-the-art methods on simulation datasets and a real Age-related Macular Degeneration dataset. The results demonstrate that MDSN performs better at detecting high-order interactions.
25
Das S, Taylor K, Kozubek J, Sardell J, Gardner S. Genetic risk factors for ME/CFS identified using combinatorial analysis. J Transl Med 2022; 20:598. [PMID: 36517845 PMCID: PMC9749644 DOI: 10.1186/s12967-022-03815-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/15/2022] [Accepted: 12/07/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a debilitating chronic disease that lacks known pathogenesis, distinctive diagnostic criteria, and effective treatment options. Understanding the genetic (and other) risk factors associated with the disease would begin to help to alleviate some of these issues for patients. METHODS: We applied both GWAS and the PrecisionLife combinatorial analytics platform to analyze ME/CFS cohorts from UK Biobank, including the Pain Questionnaire cohort, in a case-control design with 1000 cycles of fully random permutation. Results from this study were supported by a series of replication and cohort comparison experiments, including the use of disjoint Verbal Interview CFS, post-viral fatigue syndrome and fibromyalgia cohorts also derived from UK Biobank; results were compared for overlap and reproducibility. RESULTS: Combinatorial analysis revealed 199 SNPs mapping to 14 genes that were significantly associated with 91% of the cases in the ME/CFS population. These SNPs were found to stratify by shared cases into 15 clusters (communities) made up of 84 high-order combinations of between 3 and 5 SNPs. p-values for these communities range from 2.3 × 10^-10 to 1.6 × 10^-72. Many of the genes identified are linked to the key cellular mechanisms hypothesized to underpin ME/CFS, including vulnerabilities to stress and/or infection, mitochondrial dysfunction, sleep disturbance and autoimmune development. We identified 3 of the critical SNPs replicated in the post-viral fatigue syndrome cohort and 2 SNPs replicated in the fibromyalgia cohort. We also noted similarities with genes associated with multiple sclerosis and long COVID, which share some symptoms and potentially a viral infection trigger with ME/CFS. CONCLUSIONS: This study provides the first detailed genetic insights into the pathophysiological mechanisms underpinning ME/CFS and offers new approaches for better diagnosis and treatment of patients.
Affiliation(s)
- Sayoni Das: PrecisionLife Ltd, Long Hanborough, Oxford, UK
26
Montesinos-López OA, Carter AH, Bernal-Sandoval DA, Cano-Paez B, Montesinos-López A, Crossa J. A Comparison between Three Tuning Strategies for Gaussian Kernels in the Context of Univariate Genomic Prediction. Genes (Basel) 2022; 13:2282. [PMID: 36553547 PMCID: PMC9778581 DOI: 10.3390/genes13122282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 11/15/2022] [Accepted: 11/29/2022] [Indexed: 12/07/2022] Open
Abstract
Genomic prediction is revolutionizing plant breeding since candidate genotypes can be selected without the need to measure their trait in the field. When a reference population contains both phenotypic and genotypic information, it is used to train a statistical machine learning method that subsequently predicts the breeding or phenotypic values of candidate genotypes that were only genotyped. Nevertheless, the successful implementation of the genomic selection (GS) methodology depends on many factors. One key factor is the type of statistical machine learning method used, since some are unable to capture nonlinear patterns in the data. While kernel methods are powerful statistical machine learning algorithms that capture complex nonlinear patterns in the data, their successful implementation strongly depends on careful tuning of the hyperparameters involved. As such, in this paper we compare three methods of tuning (manual tuning, grid search, and Bayesian optimization) for the Gaussian kernel under a Bayesian best linear unbiased predictor model. We used six real datasets of wheat (Triticum aestivum L.) to compare the three tuning strategies. We found that careful tuning is very important for obtaining the major benefits of Gaussian kernels. The best prediction performance was observed when tuning was performed with grid search and Bayesian optimization. However, we did not observe relevant differences between the grid search and Bayesian optimization approaches. The observed gains in prediction performance were between 2.1% and 27.8% across the six datasets under study.
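As a rough illustration of the grid-search strategy named in this abstract, the sketch below tunes the bandwidth of a Gaussian kernel by leave-one-out error. It is only a minimal sketch under stated assumptions: it swaps in a simple Nadaraya-Watson kernel smoother rather than the Bayesian best linear unbiased predictor used in the paper, and the toy data, the candidate grid, and all function names (`gaussian_kernel`, `loo_mse`, `grid_search`) are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def loo_mse(X, y, bandwidth):
    """Leave-one-out mean squared error of a Nadaraya-Watson smoother.

    Assumption: this smoother stands in for the paper's model only to
    show how a bandwidth is scored; it is not the authors' predictor.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        weights = [gaussian_kernel(X[i], X[j], bandwidth) for j in others]
        targets = [y[j] for j in others]
        norm = sum(weights)
        if norm == 0.0:
            # All weights underflowed: fall back to the plain mean.
            pred = sum(targets) / len(targets)
        else:
            pred = sum(w * t for w, t in zip(weights, targets)) / norm
        total += (pred - y[i]) ** 2
    return total / n


def grid_search(X, y, grid):
    """Score every candidate bandwidth and return the best one."""
    scores = {h: loo_mse(X, y, h) for h in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: one predictor and a smooth nonlinear response.
X = [[i / 10.0] for i in range(30)]
y = [math.sin(2.0 * x[0]) for x in X]
grid = [0.05, 0.2, 1.0, 5.0]
best_h, scores = grid_search(X, y, grid)
print(best_h, scores[best_h])
```

A very wide bandwidth makes the smoother predict close to the global mean, so its leave-one-out error on this nonlinear toy response is much larger than that of the selected bandwidth, which is the kind of gap careful tuning is meant to close.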
Affiliation(s)
- Arron H. Carter: Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
- Bernabe Cano-Paez: Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), Mexico City 04510, Mexico
- Abelardo Montesinos-López: Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, Guadalajara 44430, Mexico
- José Crossa: International Maize and Wheat Improvement Center (CIMMYT), El Batan, Texcoco 56237, Mexico; Hidrociencias, Colegio de Postgraduados, Campus Montecillos, Carretera México-Texcoco Km. 36.5, Montecillo 56230, Mexico
- Correspondence: A.M.-L.; J.C.
27
Abd El Hamid MM, Omar YM, Shaheen M, Mabrouk MS. Discovering epistasis interactions in Alzheimer's disease using deep learning model. Gene Reports 2022. [DOI: 10.1016/j.genrep.2022.101673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
28
Budhlakoti N, Mishra DC, Majumdar SG, Kumar A, Srivastava S, Rai SN, Rai A. Integrated model for genomic prediction under additive and non-additive genetic architecture. Frontiers in Plant Science 2022; 13:1027558. [PMID: 36531414 PMCID: PMC9749549 DOI: 10.3389/fpls.2022.1027558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 10/11/2022] [Indexed: 06/17/2023]
Abstract
Using data from genome-wide molecular markers, genomic selection procedures have proved useful for estimating breeding values and for phenotypic prediction. The link between an individual's genotype and phenotype has been modelled using a number of parametric methods to estimate individual breeding value. It has been observed that parametric methods perform satisfactorily only when the system under study has additive genetic architecture. To capture non-additive (dominance and epistasis) effects, nonparametric approaches have also been developed; however, they typically fall short of capturing additive effects. The idea behind this study is to select the most appropriate model from each of the parametric and nonparametric categories and build an integrated model that incorporates the best features of both. The results of the current study showed that GBLUP performed well under additive architecture, while SVM's performance under non-additive architecture was encouraging. In light of these findings, a robust model for genomic prediction has been developed that can handle both additive and epistatic effects simultaneously by minimizing their error variance. The integrated model has been assessed using standard evaluation measures such as predictive ability and error variance.
Affiliation(s)
- Neeraj Budhlakoti: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Dwijesh Chandra Mishra: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Sayanti Guha Majumdar: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- Anuj Kumar: Department of Microbiology and Immunology, Dalhousie University, Halifax, NS, Canada
- Sudhir Srivastava: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- S. N. Rai: Bioinformatics and Biostatistics Department, University of Louisville, Louisville, KY, United States
- Anil Rai: Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
29
Wang X, Wen Y. A penalized linear mixed model with generalized method of moments estimators for complex phenotype prediction. Bioinformatics 2022; 38:5222-5228. [PMID: 36205617 DOI: 10.1093/bioinformatics/btac659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/27/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Linear mixed models (LMMs) have long been the method of choice for risk prediction analysis on high-dimensional data. However, it remains computationally challenging to simultaneously model a large number of variants that can be noise or have predictive effects of complex forms. RESULTS: In this work, we have developed a penalized LMM with generalized method of moments (pLMMGMM) estimators for prediction analysis. pLMMGMM is built within the LMM framework, where random effects are used to model the joint predictive effects from all variants within a region. Unlike existing methods that focus on linear relationships and use empirical criteria for variable screening, pLMMGMM can efficiently detect regions that harbor genetic variants with both linear and non-linear predictive effects. In addition, unlike existing LMMs that can only handle a very limited number of random effects, pLMMGMM is much less computationally demanding. It can jointly consider a large number of regions and accurately detect those that are predictive. Through theoretical investigations, we have shown that our method has selection consistency and asymptotic normality. Through extensive simulations and the analysis of PET-imaging outcomes, we have demonstrated that pLMMGMM outperformed existing models and that it can accurately detect regions that harbor risk factors with various forms of predictive effects. AVAILABILITY AND IMPLEMENTATION: The R package is available at https://github.com/XiaQiong/GMMLasso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaqiong Wang: Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Yalu Wen: Department of Statistics, University of Auckland, Auckland 1010, New Zealand
30
Whole genome sequencing reveals epistasis effects within RET for Hirschsprung disease. Sci Rep 2022; 12:20423. [PMID: 36443333 PMCID: PMC9705416 DOI: 10.1038/s41598-022-24077-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Accepted: 11/09/2022] [Indexed: 11/29/2022] Open
Abstract
Common variants in RET and NRG1 have been associated with Hirschsprung disease (HSCR), a congenital disorder characterised by incomplete innervation of the distal gut, in East Asian (EA) populations. However, the allelic effects identified so far do not fully explain its heritability, suggesting the presence of epistasis, where the effect of one genetic variant differs depending on other (modifier) variants. Few instances of epistasis have been documented in complex diseases due to modelling complexity and data challenges. We proposed four epistasis models to comprehensively capture epistasis for HSCR between and within the RET and NRG1 loci using whole genome sequencing (WGS) data in EA samples. Sixty-five variants within the Topologically Associating Domain (TAD) of RET demonstrated significant epistasis with the lead enhancer variant (RET+3; rs2435357). These epistatic variants formed two linkage disequilibrium (LD) clusters, represented by rs2506026 and rs2506028, that differed in minor allele frequency and in the best-supported epistatic model. Intriguingly, rs2506028 is in high LD with one previously highlighted cis-regulatory variant (rs2506030), suggesting that the detected epistasis might be mediated through synergistic effects on transcription regulation of RET. Our findings demonstrate the advantages of WGS data for detecting epistasis and support the presence of interactive effects of regulatory variants in RET for HSCR.
31
The Enigmatic Etiology of Oculo-Auriculo-Vertebral Spectrum (OAVS): An Exploratory Gene Variant Interaction Approach in Candidate Genes. Life (Basel) 2022; 12:1723. [PMID: 36362878 PMCID: PMC9693117 DOI: 10.3390/life12111723] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/26/2022] [Revised: 10/12/2022] [Accepted: 10/24/2022] [Indexed: 11/17/2022] Open
Abstract
The clinical diagnosis of oculo-auriculo-vertebral spectrum (OAVS) is established when microtia is present in association with hemifacial hypoplasia (HH) and/or ocular, vertebral, and/or renal malformations. Genetic and non-genetic factors have been associated with microtia/OAVS. Although the etiology remains unknown in most patients, some cases may have an autosomal dominant, autosomal recessive, or multifactorial inheritance. Among the possible genetic factors, gene-gene interactions may play important roles in the etiology of complex diseases, but the literature lacks related reports in OAVS patients. Therefore, we performed a gene-variant interaction analysis within five microtia/OAVS candidate genes (HOXA2, TCOF1, SALL1, EYA1 and TBX1) in 49 unrelated OAVS Mexican patients (25 familial and 24 sporadic cases). A statistically significant intergenic interaction (p-value < 0.001) was identified between variants p.(Pro1099Arg) TCOF1 (rs1136103) and p.(Leu858=) SALL1 (rs1965024). This intergenic interaction may suggest that the products of these genes could participate in pathways related to craniofacial alterations, such as the retinoic acid (RA) pathway. The absence of clearly pathogenic variants in any of the analyzed genes does not support a monogenic etiology for microtia/OAVS involving these genes in our patients. Our findings could suggest that in addition to high-throughput genomic approaches, future gene-gene interaction analyses could contribute to improving our understanding of the etiology of microtia/OAVS.
32
Kassani PH, Lu F, Guen YL, Belloy ME, He Z. Deep neural networks with controlled variable selection for the identification of putative causal genetic variants. Nat Mach Intell 2022; 4:761-771. [PMID: 37859729 PMCID: PMC10586424 DOI: 10.1038/s42256-022-00525-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2021] [Accepted: 07/26/2022] [Indexed: 11/09/2022]
Abstract
Deep neural networks (DNNs) have been successfully utilized in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. Here we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified pronounced randomness in feature selection in DNNs due to their stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merits of the proposed method include: flexible modelling of the nonlinear effect of genetic variants to improve statistical power; multiple knockoffs in the input layer to rigorously control the false discovery rate; hierarchical layers to substantially reduce the number of weight parameters and activations and improve computational efficiency; and stabilized feature selection to reduce the randomness in identified signals. We evaluate the proposed method in extensive simulation studies and apply it to the analysis of Alzheimer's disease genetics. We show that the proposed method, when compared with conventional linear and nonlinear methods, can lead to substantially more discoveries.
Affiliation(s)
- Peyman H. Kassani: Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Fred Lu: Department of Statistics, Stanford University, Stanford, CA, USA
- Yann Le Guen: Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Michael E. Belloy: Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Zihuai He: Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA; Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
33
Chu X, Jiang M, Liu ZJ. Biomarker interaction selection and disease detection based on multivariate gain ratio. BMC Bioinformatics 2022; 23:176. [PMID: 35550010 PMCID: PMC9103137 DOI: 10.1186/s12859-022-04699-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 04/14/2022] [Indexed: 11/30/2022] Open
Abstract
Background: Disease detection is an important aspect of biotherapy. With the development of biotechnology and computer technology, many methods have been developed to detect disease based on a single biomarker. However, in some cases a biomarker does not influence disease on its own; it is the interaction between biomarkers that determines disease status. The existing influence measure I-score is used to evaluate the importance of an interaction in determining disease status, but I-score is biased with respect to the number of variables in an interaction. To solve this problem, we propose a new influence measure, Multivariate Gain Ratio (MGR), based on the single-variable Gain Ratio (GR), which is applied to multivariate combinations, called interactions. Results: We propose a preprocessing verification algorithm based on partial predictor variables to select an appropriate preprocessing method. We also provide an algorithm for selecting key biomarker interactions and using them to construct a disease detection model. MGR is more credible than I-score when interactions contain a small number of variables. On the Breast Cancer Wisconsin (Diagnostic) Dataset, our method achieves an average accuracy of 93.13%, compared with 91.73% for I-score. Compared with the classification accuracy of 89.80% based on all predictor variables, MGR identifies the true main biomarkers and achieves dimension reduction. On the Leukemia Dataset, MGR achieves an accuracy of 97.32%, compared with 89.11% for I-score. These results can be explained by the nature of MGR and I-score mentioned above, because every key interaction in the Leukemia Dataset contains a small number of variables. Conclusions: MGR is effective for selecting important biomarkers and biomarker interactions, even in a high-dimensional feature space in which an interaction can contain more than two biomarkers. The predictive ability of interactions selected by MGR is better than that of I-score when interactions contain a small number of variables. MGR is generally applicable to various types of biomarker datasets, including cell nuclei, gene, SNP and protein datasets.
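To make the idea of scoring an interaction with a gain ratio concrete, the sketch below computes the textbook gain ratio (information gain divided by split information) for the joint variable formed by a set of discrete predictors. This is an illustrative assumption, not the paper's exact MGR definition, which the abstract does not spell out; the function names (`entropy`, `gain_ratio`) and the XOR-style toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(columns, y):
    """Gain ratio of the joint variable built from `columns`.

    Assumption: each combination of predictor values is treated as one
    level of a single composite variable, then the standard gain ratio
    is applied. The paper's MGR may differ in detail.
    """
    joint = list(zip(*columns))
    n = len(y)
    groups = {}
    for key, label in zip(joint, y):
        groups.setdefault(key, []).append(label)
    # Conditional entropy of the label given the composite variable.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(y) - conditional
    split_info = entropy(joint)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical XOR-style data: neither predictor is informative alone,
# but the pair determines the label completely.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]
single = gain_ratio([x1], y)
pair = gain_ratio([x1, x2], y)
print(single, pair)
```

On this toy example the single-variable score is zero while the pair scores higher, which is the situation the abstract describes: disease status determined by the interaction between biomarkers rather than by any one of them.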
Affiliation(s)
- Xiao Chu
- Academy of Mathematics and Systems Science Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China.
| | - Mao Jiang
- Academy of Mathematics and Systems Science Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Zhuo-Jun Liu
- Academy of Mathematics and Systems Science Chinese Academy of Sciences, Beijing, China
|
34
|
Chen P, Michel AH, Zhang J. Transposon insertional mutagenesis of diverse yeast strains suggests coordinated gene essentiality polymorphisms. Nat Commun 2022; 13:1490. [PMID: 35314699 PMCID: PMC8938418 DOI: 10.1038/s41467-022-29228-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/01/2021] [Accepted: 03/01/2022] [Indexed: 12/18/2022] Open
Abstract
Due to epistasis, the same mutation can have drastically different phenotypic consequences in different individuals. This phenomenon is pertinent to precision medicine as well as antimicrobial drug development, but its general characteristics are largely unknown. We approach this question by genome-wide assessment of gene essentiality polymorphism in 16 Saccharomyces cerevisiae strains using transposon insertional mutagenesis. Essentiality polymorphism is observed for 9.8% of genes, most of which have had repeated essentiality switches in evolution. Genes exhibiting essentiality polymorphism lean toward having intermediate numbers of genetic and protein interactions. Gene essentiality changes tend to occur concordantly among components of the same protein complex or metabolic pathway and among a group of over 100 mitochondrial proteins, revealing molecular machines or functional modules as units of gene essentiality variation. Most essential genes tolerate transposon insertions consistently among strains in one or more coding segments, delineating nonessential regions within essential genes.
Affiliation(s)
- Piaopiao Chen
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Agnès H Michel
- Department of Biochemistry, University of Oxford, Oxford, OX1 3QU, UK
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA.
|
35
|
Zeng L, Moser S, Mirza-Schreiber N, Lamina C, Coassin S, Nelson CP, Annilo T, Franzén O, Kleber ME, Mack S, Andlauer TFM, Jiang B, Stiller B, Li L, Willenborg C, Munz M, Kessler T, Kastrati A, Laugwitz KL, Erdmann J, Moebus S, Nöthen MM, Peters A, Strauch K, Müller-Nurasyid M, Gieger C, Meitinger T, Steinhagen-Thiessen E, März W, Metspalu A, Björkegren JLM, Samani NJ, Kronenberg F, Müller-Myhsok B, Schunkert H. Cis-epistasis at the LPA locus and risk of cardiovascular diseases. Cardiovasc Res 2022; 118:1088-1102. [PMID: 33878186 PMCID: PMC8930071 DOI: 10.1093/cvr/cvab136] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/23/2020] [Accepted: 04/16/2021] [Indexed: 12/28/2022] Open
Abstract
AIMS: Coronary artery disease (CAD) has a strong genetic predisposition. However, despite substantial discoveries made by genome-wide association studies (GWAS), a large proportion of heritability awaits identification. Non-additive genetic effects might be responsible for part of the unaccounted genetic variance. Here, we attempted a proof-of-concept study to identify non-additive genetic effects, namely epistatic interactions, associated with CAD. METHODS AND RESULTS: We tested for epistatic interactions in 10 CAD case-control studies and UK Biobank with focus on 8068 SNPs at 56 loci with known associations with CAD risk. We identified a SNP pair located in cis at the LPA locus, rs1800769 and rs9458001, to be jointly associated with risk for CAD [odds ratio (OR) = 1.37, P = 1.07 × 10⁻¹¹], peripheral arterial disease (OR = 1.22, P = 2.32 × 10⁻⁴), aortic stenosis (OR = 1.47, P = 6.95 × 10⁻⁷), hepatic lipoprotein(a) (Lp(a)) transcript levels (beta = 0.39, P = 1.41 × 10⁻⁸), and Lp(a) serum levels (beta = 0.58, P = 8.7 × 10⁻³²), while individual SNPs displayed no association. Further exploration of the LPA locus revealed a strong dependency of these associations on a rare variant, rs140570886, that was previously associated with Lp(a) levels. We confirmed increased CAD risk for heterozygous (relative OR = 1.46, P = 9.97 × 10⁻³²) and individuals homozygous for the minor allele (relative OR = 1.77, P = 0.09) of rs140570886. Using forward model selection, we also show that epistatic interactions between rs140570886, rs9458001, and rs1800769 modulate the effects of the rs140570886 risk allele. CONCLUSIONS: These results demonstrate the feasibility of a large-scale knowledge-based epistasis scan and provide rare evidence of an epistatic interaction in a complex human disease. We were directed to a variant (rs140570886) influencing risk through additive genetic as well as epistatic effects. In summary, this study provides deeper insights into the genetic architecture of a locus important for cardiovascular diseases.
Affiliation(s)
- Lingyao Zeng
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
| | - Sylvain Moser
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich 80804, Germany
| | - Nazanin Mirza-Schreiber
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Claudia Lamina
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
| | - Stefan Coassin
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
| | - Christopher P Nelson
- Department of Cardiovascular Sciences, University of Leicester, BHF Cardiovascular Research Centre, Glenfield Hospital, Groby Rd, Leicester LE3 9QP, UK
- NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, UK
| | - Tarmo Annilo
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
| | - Oscar Franzén
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre, Karolinska Institutet, Huddinge, 14186 Stockholm, Sweden
| | - Marcus E Kleber
- Medizinische Klinik V (Nephrologie, Hypertensiologie, Rheumatologie, Endokrinologie, Diabetologie), Medizinische Fakultät Mannheim der Universität Heidelberg, 69120 Heidelberg, Germany
| | - Salome Mack
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
| | - Till F M Andlauer
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, 81675 Munich, Germany
| | - Beibei Jiang
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
| | - Barbara Stiller
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
| | - Ling Li
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
| | - Christina Willenborg
- Institute for Cardiogenetics and University Heart Center Luebeck, University of Lübeck, 23562 Lübeck, Germany
| | - Matthias Munz
- Institute for Cardiogenetics and University Heart Center Luebeck, University of Lübeck, 23562 Lübeck, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
- Charité – University Medicine Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute for Dental and Craniofacial Sciences, Department of Periodontology and Synoptic Dentistry, 14197 Berlin, Germany
| | - Thorsten Kessler
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Munich Heart Alliance, 80636 Munich, Germany
| | - Adnan Kastrati
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Munich Heart Alliance, 80636 Munich, Germany
| | - Karl-Ludwig Laugwitz
- Medizinische Klinik, Klinikum rechts der Isar, Technische Universität München, 81675 Munich, Germany
| | - Jeanette Erdmann
- Institute for Cardiogenetics and University Heart Center Luebeck, University of Lübeck, 23562 Lübeck, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
| | - Susanne Moebus
- Institute for Medical Informatics, Biometry and Epidemiology, University Hospital Essen, 45147 Essen, Germany
- Centre for Urbane Epidemiology, University Hospital Essen, 45147 Essen, Germany
| | - Markus M Nöthen
- Institute of Human Genetics, University of Bonn School of Medicine & University Hospital Bonn, 53012 Bonn, Germany
| | - Annette Peters
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- IBE, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
| | - Konstantin Strauch
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- IBE, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, 55101 Mainz, Germany
| | - Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- IBE, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, 55101 Mainz, Germany
- Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, 81377 Munich, Germany
| | - Christian Gieger
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Institute of Epidemiology II, Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | - Thomas Meitinger
- Institute of Human Genetics, Helmholtz Zentrum München, 85764 Neuherberg, Germany
| | | | - Winfried März
- Medizinische Klinik V (Nephrologie, Hypertensiologie, Rheumatologie, Endokrinologie, Diabetologie), Medizinische Fakultät Mannheim der Universität Heidelberg, 69120 Heidelberg, Germany
- Synlab Akademie, Synlab Holding Deutschland GmbH, Mannheim und Augsburg, 86156 Augsburg, Germany
| | - Andres Metspalu
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
| | - Johan L M Björkegren
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre, Karolinska Institutet, Huddinge, 14186 Stockholm, Sweden
| | - Nilesh J Samani
- Department of Cardiovascular Sciences, University of Leicester, BHF Cardiovascular Research Centre, Glenfield Hospital, Groby Rd, Leicester LE3 9QP, UK
- NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, UK
| | - Florian Kronenberg
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
| | - Bertram Müller-Myhsok
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Munich Cluster of Systems Biology, SyNergy, 81377 Munich, Germany
- Department of Health Data Science, University of Liverpool, Liverpool L69 3BX, UK
| | - Heribert Schunkert
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
|
36
|
Budhlakoti N, Kushwaha AK, Rai A, Chaturvedi KK, Kumar A, Pradhan AK, Kumar U, Kumar RR, Juliana P, Mishra DC, Kumar S. Genomic Selection: A Tool for Accelerating the Efficiency of Molecular Breeding for Development of Climate-Resilient Crops. Front Genet 2022; 13:832153. [PMID: 35222548 PMCID: PMC8864149 DOI: 10.3389/fgene.2022.832153] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 12/09/2021] [Accepted: 01/10/2022] [Indexed: 12/17/2022] Open
Abstract
Since the inception of the theory and conceptual framework of genomic selection (GS), extensive research has been done on evaluating its efficiency for utilization in crop improvement. Although marker-assisted selection has proven its potential for improving qualitative traits controlled by one to a few genes with large effects, its role in improving quantitative traits controlled by several genes with small effects is limited. In this regard, GS, which utilizes genomic-estimated breeding values of individuals obtained from genome-wide markers to choose candidates for the next breeding cycle, is a powerful approach to improve quantitative traits. In the last two decades, GS has been widely adopted in animal breeding programs globally because of its potential to improve selection accuracy, minimize phenotyping, reduce cycle time, and increase genetic gains. In addition, given the promising initial evaluation outcomes of GS for the improvement of yield, biotic and abiotic stress tolerance, and quality in cereal crops like wheat, maize, and rice, prospects of integrating it in breeding crops are also being explored. Improved statistical models that leverage the genomic information to increase the prediction accuracies are critical for the effectiveness of GS-enabled breeding programs. Study of genetic architecture under drought and heat stress helps in developing production markers that can significantly accelerate the development of stress-resilient crop varieties through GS. This review focuses on the transition from traditional selection methods to GS, the underlying statistical methods and tools used for this purpose, the current status of GS studies in crop plants, and perspectives for its successful implementation in the development of climate-resilient crops.
Affiliation(s)
- Neeraj Budhlakoti
- ICAR- Indian Agricultural Statistics Research Institute, New Delhi, India
| | | | - Anil Rai
- ICAR- Indian Agricultural Statistics Research Institute, New Delhi, India
| | - K K Chaturvedi
- ICAR- Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Anuj Kumar
- ICAR- Indian Agricultural Statistics Research Institute, New Delhi, India
| | | | - Uttam Kumar
- Borlaug Institute for South Asia (BISA), Ludhiana, India
| | | | | | - D C Mishra
- ICAR- Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Sundeep Kumar
- ICAR- National Bureau of Plant Genetic Resources, New Delhi, India
|
37
|
Verma RK, Kalyakulina A, Mishra A, Ivanchenko M, Jalan S. Role of mitochondrial genetic interactions in determining adaptation to high altitude human population. Sci Rep 2022; 12:2046. [PMID: 35132109 PMCID: PMC8821606 DOI: 10.1038/s41598-022-05719-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/26/2021] [Accepted: 12/17/2021] [Indexed: 12/13/2022] Open
Abstract
Physiological and haplogroup studies performed to understand high-altitude adaptation in humans are limited to individual genes and polymorphic sites. Due to stochastic evolutionary forces, the frequency of a polymorphism is affected by changes in the frequency of a nearby polymorphism on the same DNA sample, making them connected in terms of evolution. Here, we first provide a method to model these mitochondrial polymorphisms as "co-mutation networks" for three high-altitude populations: Tibetan, Ethiopian and Andean. Then, by transforming these co-mutation networks into weighted and undirected gene-gene interaction (GGI) networks, we were able to identify functionally enriched genetic interactions of CYB and CO3 genes in Tibetan and Andean populations, and of NADH dehydrogenase genes in the Ethiopian population, as playing a significant role in high-altitude adaptation. These co-mutation based genetic networks provide insights into the role of different sets of genes in high-altitude adaptation in human sub-populations.
Affiliation(s)
- Rahul K Verma
- Department of Biosciences and Biomedical Engineering, Indian Institute of Technology Indore, Khandwa Road, Simrol, Indore, 453552, India
| | - Alena Kalyakulina
- Department of Applied Mathematics and Centre of Bioinformatics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
| | - Ankit Mishra
- Complex Systems Lab, Department of Physics, Indian Institute of Technology Indore, Khandwa Road, Simrol, Indore, 453552, India
| | - Mikhail Ivanchenko
- Department of Applied Mathematics and Centre of Bioinformatics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Laboratory of Systems Medicine of Healthy Aging and Department of Applied Mathematics, Lobachevsky University, Nizhny Novgorod, Russia
| | - Sarika Jalan
- Department of Biosciences and Biomedical Engineering, Indian Institute of Technology Indore, Khandwa Road, Simrol, Indore, 453552, India
- Complex Systems Lab, Department of Physics, Indian Institute of Technology Indore, Khandwa Road, Simrol, Indore, 453552, India
|
38
|
Pathak AK, Sukhavasi K, Marnetto D, Chaubey G, Pandey AK. Human population genomics approach in food metabolism. Future Foods 2022. [DOI: 10.1016/b978-0-323-91001-9.00033-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
|
39
|
On the Fourier transform of a quantitative trait: Implications for compressive sensing. J Theor Biol 2021; 540:110985. [PMID: 34953868 DOI: 10.1016/j.jtbi.2021.110985] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/28/2020] [Revised: 12/01/2021] [Accepted: 12/09/2021] [Indexed: 11/23/2022]
Abstract
This paper explores the genotype-phenotype relationship. It outlines conditions under which the dependence of a quantitative trait on the genome might be predictable, based on measurement of a limited subset of genotypes. It uses the theory of real-valued Boolean functions in a systematic way to translate trait data into the Fourier domain. Important trait features, such as the roughness of the trait landscape or the modularity of a trait, have a simple Fourier interpretation. Roughness at a gene location corresponds to high sensitivity to mutation, while a modular organization of gene activity reduces such sensitivity. Traits where rugged loci are rare will naturally compress gene data in the Fourier domain, leading to a sparse representation of trait data, concentrated in identifiable, low-level coefficients. This Fourier representation of a trait organizes epistasis in a form which is isometric to the trait data. As Fourier matrices are known to be maximally incoherent with the standard basis, this permits employing compressive sensing techniques to work from data sets that are relatively small (sometimes even of polynomial size) compared to the exponentially large sets of possible genomes. This theory provides a theoretical underpinning for systematic use of Boolean function machinery to dissect the dependency of a trait on the genome and environment.
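The Fourier representation described in this abstract can be illustrated with a minimal sketch, assuming a hypothetical two-locus trait with biallelic (0/1) loci. The function name `walsh_hadamard` and the trait values are illustrative assumptions, not taken from the paper. Each coefficient is the average of the trait weighted by the parity of the alleles at a chosen subset of loci, so the coefficient for both loci together captures the pairwise epistatic term.

```python
from itertools import product


def walsh_hadamard(trait):
    """Return Fourier (Walsh-Hadamard) coefficients of a trait table.

    `trait` maps each genotype, a tuple of 0/1 alleles of length L,
    to a real trait value. The coefficient for a subset S of loci is
    the average over genotypes x of f(x) * (-1)^(sum of x_i, i in S).
    """
    n_loci = len(next(iter(trait)))
    genotypes = list(product((0, 1), repeat=n_loci))
    coeffs = {}
    for subset in genotypes:  # a 0/1 mask selecting the loci in S
        total = 0.0
        for g in genotypes:
            parity = sum(s * x for s, x in zip(subset, g)) % 2
            total += trait[g] * (-1 if parity else 1)
        coeffs[subset] = total / len(genotypes)
    return coeffs


# Hypothetical two-locus trait: additive effects plus a pairwise interaction.
trait = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 8.0}
coeffs = walsh_hadamard(trait)

# Isometry check (Parseval): squared coefficients sum to the mean squared trait.
energy_fourier = sum(c * c for c in coeffs.values())
energy_trait = sum(v * v for v in trait.values()) / len(trait)
```

For this toy trait, `coeffs[(0, 0)]` is the trait mean and `coeffs[(1, 1)]` is the interaction term. The two energies agree, a small instance of the isometry the abstract mentions. This direct sum costs O(4^L) operations, so it is only practical for a handful of loci.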
|
40
|
Xiong W, Pan H. Interaction screening for high-dimensional heterogeneous data via robust hybrid metrics. Stat Med 2021; 40:6651-6673. [PMID: 34542189 DOI: 10.1002/sim.9204] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/30/2021] [Revised: 07/22/2021] [Accepted: 09/02/2021] [Indexed: 11/07/2022]
Abstract
A novel model-free interaction screening approach called the hybrid metrics is introduced for high-dimensional heterogeneous data analysis. The metrics established based on the variation of conditional joint distribution function are measurements of interaction that include both size and direction. They are robust and can work with many types of response variables, including continuous, discrete, and categorical variables. We can apply the hybrid metrics to effective interaction selection for classification, response index models, and Poisson regression, among others. When dealing with classification, the hybrid metrics are capable of capturing both nonlinear category-general and category-specific interaction effects, providing us with a comprehensive overview and precise discovery of category information. When faced with a continuous response, the hybrid metrics perform fairly well even if the signal strength is weak, behaving as if the true interactions were known. To facilitate implementation, a fast two-stage procedure which naturally and efficiently enforces both strong and weak heredity is advocated. We further demonstrate their superior performances over popular competitors by exhaustive simulations and a SRBCT real data example. Supplementary materials for this article are available online.
Affiliation(s)
- Wei Xiong
- School of Statistics, University of International Business and Economics, Beijing, China
| | - Han Pan
- School of Statistics, University of International Business and Economics, Beijing, China
|
41
|
Musolf AM, Holzinger ER, Malley JD, Bailey-Wilson JE. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics. Hum Genet 2021; 141:1515-1528. [PMID: 34862561 PMCID: PMC9360120 DOI: 10.1007/s00439-021-02402-z] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 05/13/2021] [Accepted: 11/08/2021] [Indexed: 01/26/2023]
Abstract
Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for the deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms: k nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests are described. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
Affiliation(s)
- Anthony M Musolf
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
| | - Emily R Holzinger
- Target Sciences, Informatics and Predictive Sciences, Bristol Myers Squibb, Cambridge, MA, USA
| | - James D Malley
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
| | - Joan E Bailey-Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA.
|
42
|
Abd El Hamid MM, Shaheen M, Mabrouk MS, Omar YMK. Machine learning for detecting epistasis interactions and its relevance to personalized medicine in Alzheimer’s disease: systematic review. Biomedical Engineering: Applications, Basis and Communications 2021; 33. [DOI: 10.4015/s1016237221500472] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
Abstract
Alzheimer’s disease (AD) is a progressive disease that attacks the brain’s neurons and causes problems in memory, thinking, and reasoning skills. Personalized medicine (PM) needs a better and more accurate understanding of the relationship between human genetic data and complex diseases like AD. The goal of PM is to tailor treatment to each patient’s individual characteristics. PM requires the prediction of a person’s disease from genetic data, and its success depends on the accurate detection of genetic biomarkers. Single nucleotide polymorphisms (SNPs) are considered the most prevalent type of variation in the human genome. Epistasis has biological relevance to complex diseases and has an important impact on PM. Detecting the most significant epistatic interactions associated with complex diseases is a major challenge. This paper reviews several machine learning techniques and algorithms for detecting the most significant epistatic interactions in Alzheimer’s disease. We discuss many machine learning techniques that can be used for detecting SNP combinations, such as Random Forests, Support Vector Machines, Multifactor Dimensionality Reduction, Neural Networks, and Deep Learning. This review highlights the pros and cons of these techniques and explains how they can be applied in an efficient framework for knowledge discovery and data mining in AD.
Affiliation(s)
- Marwa M. Abd El Hamid
- The Higher Institute of Computer Science & Information Technology, El-Shorouk Academy, El Shorouk City, Cairo, Egypt
- College of Computing and Information Technology AASTMT, Egypt
| | - Mohamed Shaheen
- College of Computing and Information Technology AASTMT, Egypt
| | - Mai S. Mabrouk
- Biomedical Engineering Department Misr University for Science and Technology 6th of October City, Egypt
|
43
|
MIDESP: Mutual Information-Based Detection of Epistatic SNP Pairs for Qualitative and Quantitative Phenotypes. Biology 2021; 10:biology10090921. [PMID: 34571798 PMCID: PMC8469369 DOI: 10.3390/biology10090921] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 11/17/2022]
Abstract
Simple Summary: The interactions between SNPs, which are known as epistasis, can strongly influence the phenotype. Their detection is still a challenge, which is made even more difficult through the existence of background associations that can hide correct epistatic interactions. To address the limitations of existing methods, we present in this study our novel method MIDESP for the detection of epistatic SNP pairs. It is the first mutual information-based method that can be applied to both qualitative and quantitative phenotypes and which explicitly accounts for background associations in the dataset.
Abstract: The interactions between SNPs result in a complex interplay with the phenotype, known as epistasis. The knowledge of epistasis is a crucial part of understanding genetic causes of complex traits. However, due to the enormous number of SNP pairs and their complex relationship to the phenotype, identification still remains a challenging problem. Many approaches for the detection of epistasis have been developed using mutual information (MI) as an association measure. However, these methods have mainly been restricted to case–control phenotypes and are therefore of limited applicability for quantitative traits. To overcome this limitation of MI-based methods, here, we present an MI-based novel algorithm, MIDESP, to detect epistasis between SNPs for qualitative as well as quantitative phenotypes. Moreover, by incorporating a dataset-dependent correction technique, we deal with the effect of background associations in a genotypic dataset to separate correct epistatic interaction signals from those of false positive interactions resulting from the effect of single SNP×phenotype associations. To demonstrate the effectiveness of MIDESP, we apply it on two real datasets with qualitative and quantitative phenotypes, respectively. Our results suggest that by eliminating the background associations, MIDESP can identify important genes, which play essential roles for bovine tuberculosis or the egg weight of chickens.
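To make the idea of mutual information as an association measure concrete, here is a minimal sketch for a qualitative (case-control) phenotype. It uses a plain plug-in estimator on hypothetical toy data; it does not reproduce MIDESP's estimator, its handling of quantitative phenotypes, or its background-association correction, none of which are specified in the abstract. The names `mutual_information`, `snp_a`, `snp_b`, and `phenotype` are illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical toy data: the phenotype is the XOR of two SNPs, so neither
# SNP is informative alone but the pair determines the phenotype.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
phenotype = [a ^ b for a, b in zip(snp_a, snp_b)]

mi_a = mutual_information(snp_a, phenotype)
mi_b = mutual_information(snp_b, phenotype)
mi_pair = mutual_information(list(zip(snp_a, snp_b)), phenotype)
```

In this purely epistatic toy case the single-SNP values are zero while the pair carries one full bit. Real data would also show single-SNP signal, which is the kind of background association the abstract says must be separated from true pairwise signal.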
|
44
|
Liu D, Ban HJ, El Sergani AM, Lee MK, Hecht JT, Wehby GL, Moreno LM, Feingold E, Marazita ML, Cha S, Szabo-Rogers HL, Weinberg SM, Shaffer JR. PRICKLE1 × FOCAD Interaction Revealed by Genome-Wide vQTL Analysis of Human Facial Traits. Front Genet 2021; 12:674642. [PMID: 34434215 PMCID: PMC8381734 DOI: 10.3389/fgene.2021.674642] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/01/2021] [Accepted: 06/03/2021] [Indexed: 12/14/2022] Open
Abstract
The human face is a highly complex and variable structure resulting from the intricate coordination of numerous genetic and non-genetic factors. Hundreds of genomic loci impacting quantitative facial features have been identified. While these associations have been shown to influence morphology by altering the mean size and shape of facial measures, their effect on trait variance remains unclear. We conducted a genome-wide association analysis for the variance of 20 quantitative facial measurements in 2,447 European individuals and identified several suggestive variance quantitative trait loci (vQTLs). These vQTLs guided us to conduct an efficient search for gene-by-gene (G × G) interactions, which uncovered an interaction between PRICKLE1 and FOCAD affecting cranial base width. We replicated this G × G interaction signal at the locus level in an additional 5,128 Korean individuals. We used the hypomorphic Prickle1 Beetlejuice (Prickle1^Bj) mouse line to directly test the function of Prickle1 on the cranial base and observed wider cranial bases in Prickle1^Bj/Bj. Importantly, we observed that the Prickle1 and Focadhesin proteins co-localize in murine cranial base chondrocytes, and this co-localization is abnormal in the Prickle1^Bj/Bj mutants. Taken together, our findings uncovered a novel G × G interaction effect in humans with strong support from both epidemiological and molecular studies. These results highlight the potential of studying measures of phenotypic variability in gene mapping studies of facial morphology.
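A variance QTL is a locus where trait variance, rather than the trait mean, differs between genotype groups. The abstract does not state which test the authors used, so the following is only a sketch under that caveat: it computes a Brown-Forsythe (median-centered Levene) statistic, one common choice for comparing spread across groups. The function name and the measurements are hypothetical.

```python
from statistics import mean, median


def brown_forsythe_stat(groups):
    """One-way ANOVA F statistic on absolute deviations from group medians.

    `groups` is a list of lists of trait values, one list per genotype
    class. Larger values indicate more heterogeneous spread across classes.
    """
    deviations = [[abs(v - median(g)) for v in g] for g in groups]
    n_total = sum(len(d) for d in deviations)
    n_groups = len(deviations)
    grand_mean = mean(v for d in deviations for v in d)
    between = sum(
        len(d) * (mean(d) - grand_mean) ** 2 for d in deviations
    ) / (n_groups - 1)
    within = sum(
        (v - mean(d)) ** 2 for d in deviations for v in d
    ) / (n_total - n_groups)
    return between / within


# Hypothetical trait values for two genotype classes.
same_spread = brown_forsythe_stat([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
different_spread = brown_forsythe_stat([[1.0, 2.0, 3.0], [0.0, 10.0, 20.0]])
```

The first call compares groups whose means differ but whose spread is identical, so the statistic is zero; the second compares groups with different spread and gives a positive value. In practice the statistic would be referred to an F distribution to obtain a p-value, which this sketch omits.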
Affiliation(s)
- Dongjing Liu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Hyo-Jeong Ban
  - Future Medicine Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- Ahmed M. El Sergani
  - Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Myoung Keun Lee
  - Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Jacqueline T. Hecht
  - Department of Pediatrics, McGovern Medical Center, The University of Texas Health Science Center at Houston, Houston, TX, United States
- George L. Wehby
  - Department of Health Management and Policy, The University of Iowa, Iowa City, IA, United States
- Lina M. Moreno
  - Department of Orthodontics, The University of Iowa, Iowa City, IA, United States
- Eleanor Feingold
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
- Mary L. Marazita
  - Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Psychiatry, Clinical and Translational Science Institute, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Seongwon Cha
  - Future Medicine Division, Korea Institute of Oriental Medicine, Daejeon, South Korea
- Heather L. Szabo-Rogers
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Developmental Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Regenerative Medicine at the McGowan Institute, University of Pittsburgh, Pittsburgh, PA, United States
  - Center for Craniofacial Regeneration, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
- Seth M. Weinberg
  - Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
- John R. Shaffer
  - Center for Craniofacial and Dental Genetics, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Oral and Craniofacial Sciences, School of Dental Medicine, University of Pittsburgh, Pittsburgh, PA, United States
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States
|
45
|
Genetics of synucleins in neurodegenerative diseases. Acta Neuropathol 2021; 141:471-490. [PMID: 32740728 DOI: 10.1007/s00401-020-02202-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/15/2020] [Revised: 07/23/2020] [Accepted: 07/24/2020] [Indexed: 12/14/2022]
Abstract
The SNCA locus currently has an indisputable role in Parkinson's disease and other synucleinopathies. The role of genetic variability in the other members of the synuclein family (SNCB and SNCG) in disease is far less clear. In this review, we critically assess the pathogenicity, main characteristics, and roles of genetic variants in these genes reported to be causative of synucleinopathies. We also summarize the different association signals identified in the SNCA locus that have been associated with risk for disease. We take a bird's eye view of the variability currently reported in the general population for the three genes and use these data to infer the potential relationship between each of the genes and human disease.
|
46
|
Montesinos-López A, Montesinos-López OA, Montesinos-López JC, Flores-Cortes CA, de la Rosa R, Crossa J. A guide for kernel generalized regression methods for genomic-enabled prediction. Heredity (Edinb) 2021; 126:577-596. [PMID: 33649571 PMCID: PMC8115678 DOI: 10.1038/s41437-021-00412-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/24/2020] [Revised: 01/23/2021] [Accepted: 01/24/2021] [Indexed: 01/30/2023]
Abstract
The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are quite efficient for capturing complex non-linear patterns that conventional linear regression models cannot. Furthermore, these methods are also powerful for leveraging environmental covariates, such as genotype × environment (G×E) prediction, among others. In this study we provide the building process of seven kernel methods: linear, polynomial, sigmoid, Gaussian, exponential, arc-cosine 1 and arc-cosine L. Additionally, we highlight illustrative examples for implementing exact kernel methods for genomic prediction under a single-environment, a multi-environment and multi-trait framework, as well as for the implementation of sparse kernel methods under a multi-environment framework. These examples are followed by a discussion on the strengths and limitations of kernel methods and, subsequently, by conclusions about the main contributions of this paper.
Affiliation(s)
- Abelardo Montesinos-López
  - Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México
- Roberto de la Rosa
  - Colegio de Postgraduados (CP), Campus Tabasco, Producción Agroalimentaria en el Trópico, H. Cárdenas, Tabasco, México
- José Crossa
  - Colegio de Postgraduados, Campus Montecillos, CP 56230, Montecillos, Edo. de México, México
  - Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, México
|
47
|
Contreras MG, Keys K, Magaña J, Goddard PC, Risse-Adams O, Zeiger AM, Mak AC, Samedy-Bates LA, Neophytou AM, Lee E, Thakur N, Elhawary JR, Hu D, Huntsman S, Eng C, Hu T, Burchard EG, White MJ. Native American Ancestry and Air Pollution Interact to Impact Bronchodilator Response in Puerto Rican Children with Asthma. Ethn Dis 2021; 31:77-88. [PMID: 33519158 PMCID: PMC7843041 DOI: 10.18865/ed.31.1.77] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Objective: Asthma is the most common chronic disease in children. Short-acting bronchodilator medications are the most commonly prescribed asthma treatment worldwide, regardless of disease severity. Puerto Rican children display the highest asthma morbidity and mortality of any US population. Alarmingly, Puerto Rican children with asthma display poor bronchodilator drug response (BDR). Reduced BDR may explain, in part, the increased asthma morbidity and mortality observed in Puerto Rican children with asthma. Gene-environment interactions may explain a portion of the heritability of BDR. We aimed to identify gene-environment interactions associated with BDR in Puerto Rican children with asthma. Setting: Genetic, environmental, and psycho-social data from the Genes-environments and Admixture in Latino Americans (GALA II) case-control study. Participants: Our discovery dataset consisted of 658 Puerto Rican children with asthma; our replication dataset consisted of 514 Mexican American children with asthma. Main Outcome Measures: We assessed the association of pairwise interaction models with BDR using ViSEN (Visualization of Statistical Epistasis Networks). Results: We identified a non-linear interaction between Native American genetic ancestry and air pollution significantly associated with BDR in Puerto Rican children with asthma. This interaction was robust to adjustment for age and sex but was not significantly associated with BDR in our replication population. Conclusions: Decreased Native American ancestry coupled with increased air pollution exposure was associated with increased BDR in Puerto Rican children with asthma. Our study acknowledges BDR's phenotypic complexity, and emphasizes the importance of integrating social, environmental, and biological data to further our understanding of complex disease.
Affiliation(s)
- María G. Contreras
  - Department of Medicine, University of California, San Francisco, CA
  - SF BUILD, San Francisco State University, San Francisco, CA
  - MARC, San Francisco State University, San Francisco, CA
- Kevin Keys
  - Department of Medicine, University of California, San Francisco, CA
- Joaquin Magaña
  - Department of Medicine, University of California, San Francisco, CA
- Pagé C. Goddard
  - Department of Medicine, University of California, San Francisco, CA
- Oona Risse-Adams
  - Department of Medicine, University of California, San Francisco, CA
  - Lowell Science Research Program, Lowell High School, San Francisco, CA
- Andrew M. Zeiger
  - Department of Medicine, University of California, San Francisco, CA
  - Department of Biology, University of Washington, Seattle, WA
- Angel C.Y. Mak
  - Department of Medicine, University of California, San Francisco, CA
- Lesly-Anne Samedy-Bates
  - Department of Medicine, University of California, San Francisco, CA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA
- Andreas M. Neophytou
  - Environmental Health Sciences Division, Berkeley School of Public Health, Berkeley, CA
  - Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO
- Eunice Lee
  - National Institute of Environmental Health Sciences, Cary, NC
- Neeta Thakur
  - Department of Medicine, University of California, San Francisco, CA
- Donglei Hu
  - Department of Medicine, University of California, San Francisco, CA
- Scott Huntsman
  - Department of Medicine, University of California, San Francisco, CA
- Celeste Eng
  - Department of Medicine, University of California, San Francisco, CA
- Ting Hu
  - School of Computing, Queen’s University, Kingston, ON, Canada
- Esteban G. Burchard
  - Department of Medicine, University of California, San Francisco, CA
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA
|
48
|
Zhou F, Ren J, Lu X, Ma S, Wu C. Gene-Environment Interaction: A Variable Selection Perspective. Methods Mol Biol 2021; 2212:191-223. [PMID: 33733358 DOI: 10.1007/978-1-0716-0947-7_13] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/31/2023]
Abstract
Gene-environment interactions have important implications for elucidating the genetic basis of complex diseases beyond the joint function of multiple genetic factors and their interactions (or epistasis). In the past, studies of G × E interactions have mainly been conducted within the framework of genetic association studies. The high dimensionality of G × E interactions, due to the complicated form of environmental effects and the presence of a large number of genetic factors including gene expressions and SNPs, has motivated the recent development of penalized variable selection methods for dissecting G × E interactions, which has been ignored in the majority of published reviews on genetic interaction studies. In this article, we first survey existing studies on both gene-environment and gene-gene interactions. Then, after a brief introduction to the variable selection methods, we review penalization and relevant variable selection methods in marginal and joint paradigms, respectively, under a variety of conceptual models. Discussions on strengths and limitations, as well as computational aspects of the variable selection methods tailored for G × E studies, have also been provided.
Affiliation(s)
- Fei Zhou
  - Department of Statistics, Kansas State University, Manhattan, KS, USA
- Jie Ren
  - Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, USA
- Xi Lu
  - Department of Statistics, Kansas State University, Manhattan, KS, USA
- Shuangge Ma
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA
- Cen Wu
  - Department of Statistics, Kansas State University, Manhattan, KS, USA
|
49
|
A Novel Mapping Strategy Utilizing Mouse Chromosome Substitution Strains Identifies Multiple Epistatic Interactions That Regulate Complex Traits. G3: Genes, Genomes, Genetics 2020; 10:4553-4563. [PMID: 33023974 PMCID: PMC7718749 DOI: 10.1534/g3.120.401824] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
The genetic contribution of additive vs. non-additive (epistatic) effects in the regulation of complex traits is unclear. While genome-wide association studies typically ignore gene-gene interactions, in part because of the lack of statistical power for detecting them, mouse chromosome substitution strains (CSSs) represent an alternate approach for detecting epistasis given their limited allelic variation. Therefore, we utilized CSSs to identify and map both additive and epistatic loci that regulate a range of hematologic- and metabolism-related traits, as well as hepatic gene expression. Quantitative trait loci (QTL) were identified using a CSS-based backcross strategy involving the segregation of variants on the A/J-derived substituted chromosomes 4 and 6 on an otherwise C57BL/6J genetic background. In the liver transcriptomes of offspring from this cross, we identified and mapped additive QTL regulating the hepatic expression of 768 genes, and epistatic QTL pairs for 519 genes. Similarly, we identified additive QTL for fat pad weight, platelets, and the percentage of granulocytes in blood, as well as epistatic QTL pairs controlling the percentage of lymphocytes in blood and red cell distribution width. The variance attributed to the epistatic QTL pairs was approximately equal to that of the additive QTL; however, the SNPs in the epistatic QTL pairs that accounted for the largest variances were undetected in our single locus association analyses. These findings highlight the need to account for epistasis in association studies, and more broadly demonstrate the importance of identifying genetic interactions to understand the complete genetic architecture of complex traits.
|
50
|
Kalyakulina A, Iannuzzi V, Sazzini M, Garagnani P, Jalan S, Franceschi C, Ivanchenko M, Giuliani C. Investigating Mitonuclear Genetic Interactions Through Machine Learning: A Case Study on Cold Adaptation Genes in Human Populations From Different European Climate Regions. Front Physiol 2020; 11:575968. [PMID: 33262703 PMCID: PMC7686538 DOI: 10.3389/fphys.2020.575968] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/24/2020] [Accepted: 10/14/2020] [Indexed: 01/18/2023]
Abstract
Cold climates represent one of the major environmental challenges that anatomically modern humans faced during their dispersal out of Africa. The related adaptive traits have been achieved by modulation of thermogenesis and thermoregulation processes where nuclear (nuc) and mitochondrial (mt) genes play a major role. In human populations, mitonuclear genetic interactions are the result of both the peculiar genetic history of each human group and the different environments they have long occupied. This study aims to investigate mitonuclear genetic interactions by considering all the mitochondrial genes and 28 nuclear genes involved in brown adipose tissue metabolism, which have been previously hypothesized to be crucial for cold adaptation. For this purpose, we focused on three human populations (i.e., Finnish, British, and Central Italian people) of European ancestry from different biogeographical and climatic areas, and we used a machine learning approach to identify relevant nucDNA–mtDNA interactions that characterized each population. The obtained results are twofold: (i) at the methodological level, we demonstrated that a machine learning approach is able to detect patterns of genetic structure among human groups from different latitudes both at single genes and by considering combinations of mtDNA and nucDNA loci; (ii) at the biological level, the analysis identified population-specific nuclear genes and variants that likely play a relevant biological role in association with a mitochondrial gene (such as the “obesity gene” FTO in Finnish people). Further studies are needed to fully elucidate the evolutionary dynamics (e.g., migration, admixture, and/or local adaptation) that shaped these nucDNA–mtDNA interactions and their functional role.
Affiliation(s)
- Alena Kalyakulina
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Vincenzo Iannuzzi
  - Alma Mater Research Institute on Global Challenges and Climate Change (Alma Climate), University of Bologna, Bologna, Italy
  - Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Marco Sazzini
  - Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Paolo Garagnani
  - Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, Italy
- Sarika Jalan
  - Complex Systems Laboratory, Discipline of Physics, Indian Institute of Technology Indore, Indore, India
  - Center for Theoretical Physics of Complex Systems, Institute for Basic Science (IBS), Daejeon, South Korea
- Claudio Franceschi
  - Laboratory of Systems Medicine of Healthy Aging, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Mikhail Ivanchenko
  - Department of Applied Mathematics, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
  - Laboratory of Systems Medicine of Healthy Aging, Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia
- Cristina Giuliani
  - Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
  - School of Anthropology and Museum Ethnography, University of Oxford, Oxford, United Kingdom
|