1
|
Mowlaei ME, Shi X. FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms. Genes (Basel) 2023; 14:genes14051059. [PMID: 37239419 DOI: 10.3390/genes14051059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023] Open
Abstract
(1) Background: Phenotype prediction is a pivotal task in genetics in order to identify how genetic factors contribute to phenotypic differences. This field has seen extensive research, with numerous methods proposed for predicting phenotypes. Nevertheless, the intricate relationship between genotypes and complex phenotypes, including common diseases, has resulted in an ongoing challenge to accurately decipher the genetic contribution. (2) Results: In this study, we propose a novel feature selection framework for phenotype prediction utilizing a genetic algorithm (FSF-GA) that effectively reduces the feature space to identify genotypes contributing to phenotype prediction. We provide a comprehensive vignette of our method and conduct extensive experiments using a widely used yeast dataset. (3) Conclusions: Our experimental results show that our proposed FSF-GA method delivers comparable phenotype prediction performance as compared to baseline methods, while providing features selected for predicting phenotypes. These selected feature sets can be used to interpret the underlying genetic architecture that contributes to phenotypic variation.
Collapse
Affiliation(s)
- Mohammad Erfan Mowlaei
- Department of Computer and Information Sciences, Temple University, 925 N. 12th Street, Philadelphia, PA 19122, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, 925 N. 12th Street, Philadelphia, PA 19122, USA
| |
Collapse
|
2
|
Deep polygenic neural network for predicting and identifying yield-associated genes in Indonesian rice accessions. Sci Rep 2022; 12:13823. [PMID: 35970979 PMCID: PMC9378700 DOI: 10.1038/s41598-022-16075-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2021] [Accepted: 07/04/2022] [Indexed: 11/12/2022] Open
Abstract
As the fourth most populous country in the world, Indonesia must increase the annual rice production rate to achieve national food security by 2050. One possible solution comes from the nanoscopic level: a genetic variant called Single Nucleotide Polymorphism (SNP), which can express significant yield-associated genes. The prior benchmark of this study utilized a statistical genetics model where no SNP position information and attention mechanism were involved. Hence, we developed a novel deep polygenic neural network, named the NucleoNet model, to address these obstacles. The NucleoNets were constructed with the combination of prominent components that include positional SNP encoding, the context vector, wide models, Elastic Net, and Shannon’s entropy loss. This polygenic modeling obtained up to 2.779 of Mean Squared Error (MSE) with 47.156% of Symmetric Mean Absolute Percentage Error (SMAPE), while revealing 15 new important SNPs. Furthermore, the NucleoNets reduced the MSE score up to 32.28% compared to the Ordinary Least Squares (OLS) model. Through the ablation study, we learned that the combination of Xavier distribution for weights initialization and Normal distribution for biases initialization sparked more various important SNPs throughout 12 chromosomes. Our findings confirmed that the NucleoNet model was successfully outperformed the OLS model and identified important SNPs to Indonesian rice yields.
Collapse
|
3
|
Zhang M, Lu N, Jiang L, Liu B, Fei Y, Ma W, Shi C, Wang J. Multiple dynamic models reveal the genetic architecture for growth in height of Catalpa bungei in the field. TREE PHYSIOLOGY 2022; 42:1239-1255. [PMID: 34940852 DOI: 10.1093/treephys/tpab171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/28/2021] [Accepted: 12/19/2021] [Indexed: 06/14/2023]
Abstract
Growth in height (GH) is a critical determinant for tree survival and development in forests and can be depicted using logistic growth curves. Our understanding of the genetic mechanism underlying dynamic GH, however, is limited, particularly under field conditions. We applied two mapping models (Funmap and FVTmap) to find quantitative trait loci responsible for dynamic GH and two epistatic models (2HiGWAS and 1HiGWAS) to detect epistasis in Catalpa bungei grown in the field. We identified 13 co-located quantitative trait loci influencing the growth curve by Funmap and three heterochronic parameters (the timing of the inflection point, maximum acceleration and maximum deceleration) by FVTmap. The combined use of FVTmap and Funmap reduced the number of candidate genes by >70%. We detected 76 significant epistatic interactions, amongst which a key gene, COMT14, co-located by three models (but not 1HiGWAS) interacted with three other genes, implying that a novel network of protein interaction centered on COMT14 may control the dynamic GH of C. bungei. These findings provide new insights into the genetic mechanisms underlying the dynamic growth in tree height in natural environments and emphasize the necessity of incorporating multiple dynamic models for screening more reliable candidate genes.
Collapse
Affiliation(s)
- Miaomiao Zhang
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| | - Nan Lu
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| | - Libo Jiang
- School of Life Sciences and Medicine, Shandong University of Technology, Zibo 255049, China
| | - Bingyang Liu
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| | - Yue Fei
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| | - Wenjun Ma
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| | - Chaozhong Shi
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| | - Junhui Wang
- State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
| |
Collapse
|