1
Chien CH, Huang LY, Lo SF, Chen LJ, Liao CC, Chen JJ, Chu YW. Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants. Front Genet 2021; 12:798107. [PMID: 34976025 PMCID: PMC8718795 DOI: 10.3389/fgene.2021.798107] [Received: 10/19/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022]
Abstract
Inserting T-DNA into the genome to change the expression of flanking genes is a common approach in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. Consequently, to improve the efficiency of screening activated genes, we established a model to predict gene expression in T-DNA mutants through machine learning methods. We gathered experimental datasets consisting of gene expression data in T-DNA mutants and captured the PROMOTER and MIDDLE sequences for encoding. In first-layer models, support vector machine (SVM) models were constructed with nine features consisting of information about biological function and local and global sequences. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second-layer models integrated 16 first-layer models with minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classification methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% based on fivefold cross-validation, and 85.6% based on independent testing. We discovered that the information within the local sequence contributed more to classification than the information within the global sequence. TIMgo had a good predictive ability for target genes within 20 kb from the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer.
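The abstract reports accuracy under fivefold cross-validation. As a rough illustration of that evaluation scheme only, and not of the authors' TIMgo pipeline, the sketch below splits a toy dataset into five folds and averages held-out accuracy. The threshold classifier, the function names, and the data are all hypothetical stand-ins; the SVM and LADTree models are not reproduced here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average over folds."""
    folds = k_fold_indices(len(labels), k, seed)
    fold_accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return sum(fold_accuracies) / len(fold_accuracies)


# Hypothetical stand-in classifier: threshold halfway between the class means.
def fit_threshold(xs, ys):
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy, well-separated data invented for illustration (not from the paper).
toy_features = list(range(10)) + list(range(20, 30))
toy_labels = [0] * 10 + [1] * 10

accuracy = cross_val_accuracy(toy_features, toy_labels, fit_threshold, predict_threshold)
```

Cross-validated accuracy is computed only on samples the model did not see during training in each fold, which is why the abstract reports it separately from the independent-test accuracy.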
Affiliation(s)
- Ching-Hsuan Chien
  - Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung, Taiwan
- Lan-Ying Huang
  - Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung, Taiwan
- Shuen-Fang Lo
  - Biotechnology Center, National Chung Hsing University, Taichung, Taiwan
- Liang-Jwu Chen
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, Taiwan
  - Advanced Plant Biotechnology Center, National Chung Hsing University, Taichung, Taiwan
- Chi-Chou Liao
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, Taiwan
- Jia-Jyun Chen
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung, Taiwan
- Yen-Wei Chu
  - Ph.D. Program in Medical Biotechnology, National Chung Hsing University, Taichung, Taiwan
  - Biotechnology Center, National Chung Hsing University, Taichung, Taiwan
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, Taiwan
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung, Taiwan
  - Agricultural Biotechnology Center, National Chung Hsing University, Taichung, Taiwan
  - Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan
  - Rong Hsing Research Center for Translational Medicine, National Chung Hsing University, Taichung, Taiwan
2
Hsieh KT, Chen YT, Hu TJ, Lin SM, Hsieh CH, Liu SH, Shiue SY, Lo SF, Wang IW, Tseng CS, Chen LJ. Comparisons within the Rice GA 2-Oxidase Gene Family Revealed Three Dominant Paralogs and a Functional Attenuated Gene that Led to the Identification of Four Amino Acid Variants Associated with GA Deactivation Capability. Rice (New York, N.Y.) 2021; 14:70. [PMID: 34322729 PMCID: PMC8319247 DOI: 10.1186/s12284-021-00499-4] [Received: 03/01/2021] [Accepted: 06/03/2021] [Indexed: 05/16/2023]
Abstract
BACKGROUND: GA 2-oxidases (GA2oxs) are involved in regulating GA homeostasis in plants by inactivating bioactive GAs through 2β-hydroxylation. Rice GA2oxs are encoded by a family of 10 genes; some of them have been characterized, but no comprehensive comparisons for all these genes have been conducted. RESULTS: Rice plants with nine functional GA2oxs were demonstrated in the present study, and these genes not only were differentially expressed but also revealed various capabilities for GA deactivation based on their height-reducing effects in transgenic plants. Compared to that of wild-type plants, the relative plant height (RPH) of transgenic plants was scored to estimate their reducing effects, and 8.3% to 59.5% RPH was observed. Phylogenetic analysis of class I GA2ox genes revealed two functionally distinct clades in the Poaceae. The OsGA2ox3, 4, and 8 genes belonging to clade A showed the most severe effect (8.3% to 8.7% RPH) on plant height reduction, whereas the OsGA2ox7 gene belonging to clade B showed the least severe effect (59.5% RPH). The clade A OsGA2ox3 gene contained two conserved C186/C194 amino acids that were crucial for enzymatic activity. In the present study, these amino acids were replaced with OsGA2ox7-conserved arginine (C186R) and proline (C194P), respectively, or simultaneously (C186R/C194P) to demonstrate their importance in planta. Another two amino acids, Q220 and Y274, conserved in OsGA2ox3 were substituted with glutamic acid (E) and phenylalanine (F), respectively, or simultaneously to show their significance in planta. In addition, through sequence divergence, RNA expression profile and GA deactivation capability analyses, we proposed that OsGA2ox1, OsGA2ox3 and OsGA2ox6 function as the predominant paralogs in each of their respective classes. CONCLUSIONS: This study demonstrates rice has nine functional GA2oxs and the class I GA2ox genes are divided into two functionally distinct clades. Among them, the OsGA2ox7 of clade B is a functional attenuated gene and the OsGA2ox1, OsGA2ox3 and OsGA2ox6 are the three predominant paralogs in the family.
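The relative plant height (RPH) score used above can be sketched as a one-line calculation. This assumes RPH is the transgenic plant's height expressed as a percentage of wild-type height, which the abstract implies but does not spell out. The function name and the heights are hypothetical and are not measurements from the study.

```python
def relative_plant_height(transgenic_cm, wild_type_cm):
    """Return transgenic height as a percentage of wild-type height (assumed RPH definition)."""
    if wild_type_cm <= 0:
        raise ValueError("wild-type height must be positive")
    return 100.0 * transgenic_cm / wild_type_cm


# Hypothetical heights chosen only to illustrate the arithmetic.
example_rph = relative_plant_height(transgenic_cm=25.0, wild_type_cm=100.0)
```

Under this reading, a lower RPH means a stronger height-reducing effect, so the 8.3% to 8.7% RPH reported for clade A genes indicates more severe dwarfing than the 59.5% RPH reported for OsGA2ox7.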
Affiliation(s)
- Kun-Ting Hsieh
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Yi-Ting Chen
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Ting-Jen Hu
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Shih-Min Lin
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Chih-Hung Hsieh
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Su-Hui Liu
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Shiau-Yu Shiue
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
- Shuen-Fang Lo
  - Biotechnology Center, National Chung Hsing University, Taichung, 40227, Taiwan
- I-Wen Wang
  - Division of Biotechnology, Taiwan Agriculture Research Institute, Taichung, 41362, Taiwan
- Ching-Shan Tseng
  - Division of Biotechnology, Taiwan Agriculture Research Institute, Taichung, 41362, Taiwan
- Liang-Jwu Chen
  - Institute of Molecular Biology, National Chung Hsing University, Taichung, 40227, Taiwan
  - Biotechnology Center, National Chung Hsing University, Taichung, 40227, Taiwan
3
Danilevicz MF, Tay Fernandez CG, Marsh JI, Bayer PE, Edwards D. Plant pangenomics: approaches, applications and advancements. Current Opinion in Plant Biology 2020; 54:18-25. [PMID: 31982844 DOI: 10.1016/j.pbi.2019.12.005] [Received: 08/19/2019] [Revised: 12/15/2019] [Accepted: 12/18/2019] [Indexed: 05/05/2023]
Abstract
With the assembly of increasing numbers of plant genomes, it is becoming accepted that a single reference assembly does not reflect the gene diversity of a species. The production of pangenomes, which reflect the structural variation and polymorphisms in genomes, enables in-depth comparisons of variation within species or higher taxonomic groups. In this review, we discuss the current and emerging approaches for pangenome assembly, analysis and visualisation. In addition, we consider the potential of pangenomes for applied crop improvement, evolutionary and biodiversity studies. To fully exploit the value of pangenomes, it is important to integrate broad information such as phenotypic, environmental, and expression data to gain insights into the role of variable regions within genomes.
Affiliation(s)
- Monica Furaste Danilevicz
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacob Ian Marsh
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Philipp Emanuel Bayer
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia