1
|
Farooq M, van Dijk ADJ, Nijveen H, Aarts MGM, Kruijer W, Nguyen TP, Mansoor S, de Ridder D. Prior Biological Knowledge Improves Genomic Prediction of Growth-Related Traits in Arabidopsis thaliana. Front Genet 2021; 11:609117. [PMID: 33552126 PMCID: PMC7855462 DOI: 10.3389/fgene.2020.609117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/22/2020] [Accepted: 12/21/2020] [Indexed: 01/11/2023] Open
Abstract
Prediction of growth-related complex traits is highly important for crop breeding. Photosynthesis efficiency and biomass are direct indicators of overall plant performance and therefore even minor improvements in these traits can result in significant breeding gains. Crop breeding for complex traits has been revolutionized by technological developments in genomics and phenomics. Capitalizing on the growing availability of genomics data, genome-wide marker-based prediction models allow for efficient selection of the best parents for the next generation without the need for phenotypic information. Until now such models mostly predict the phenotype directly from the genotype and fail to make use of relevant biological knowledge. It is an open question to what extent the use of such biological knowledge is beneficial for improving genomic prediction accuracy and reliability. In this study, we explored the use of publicly available biological information for genomic prediction of photosynthetic light use efficiency (ΦPSII) and projected leaf area (PLA) in Arabidopsis thaliana. To explore the use of various types of knowledge, we mapped genomic polymorphisms to Gene Ontology (GO) terms and transcriptomics-based gene clusters, and applied these in a Genomic Feature Best Linear Unbiased Predictor (GFBLUP) model, which is an extension to the traditional Genomic BLUP (GBLUP) benchmark. Our results suggest that incorporation of prior biological knowledge can improve genomic prediction accuracy for both ΦPSII and PLA. The improvement achieved depends on the trait, type of knowledge and trait heritability. Moreover, transcriptomics offers complementary evidence to the Gene Ontology for improvement when used to define functional groups of genes. In conclusion, prior knowledge about trait-specific groups of genes can be directly translated into improved genomic prediction.
Affiliation(s)
- Muhammad Farooq
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
| | - Aalt D. J. van Dijk
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
- Biometris, Wageningen University, Wageningen, Netherlands
| | - Harm Nijveen
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
| | - Mark G. M. Aarts
- Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
| | - Willem Kruijer
- Biometris, Wageningen University, Wageningen, Netherlands
| | - Thu-Phuong Nguyen
- Laboratory of Genetics, Wageningen University, Wageningen, Netherlands
| | - Shahid Mansoor
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Punjab, Pakistan
| | - Dick de Ridder
- Bioinformatics Group, Wageningen University, Wageningen, Netherlands
|
2
|
Podia V, Milioni D, Martzikou M, Haralampidis K. The role of Arabidopsis thaliana RASD1 gene in ABA-dependent abiotic stress response. Plant Biology (Stuttgart, Germany) 2018; 20:307-317. [PMID: 29125669 DOI: 10.1111/plb.12662] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/13/2017] [Accepted: 11/06/2017] [Indexed: 06/07/2023]
Abstract
Abiotic stress is one of the key parameters affecting plant productivity. Drought and soil salinity, in particular, challenge plants to activate various response mechanisms to withstand these adverse growth conditions. While the molecular events that take place are complex and to a large extent unclear, the plant hormone abscisic acid (ABA) is considered a major player in mediating the adaptation of plants to stress. Here we report the identification of an ABA-insensitive mutant from Arabidopsis thaliana. A combination of molecular, genetic and physiology approaches were implemented, to characterise the AtRASD1 locus (RESPONSIVENESS TO ABA SALT AND DROUGHT 1) and to investigate its role in plant development. RASD1 is expressed predominantly in the vascular system of A. thaliana and encodes a peptide of unknown function with no similarity to any known sequence to date. The protein is localised in the nucleus and the cytoplasm, and RASD1-impaired plants are drought-intolerant and insensitive to exogenous ABA and NaCl during germination and root growth. Our data indicate that RASD1 is involved in ABA-dependent signal transduction pathways and therefore in enabling plants to activate response mechanisms related to seed germination and abiotic stress.
Affiliation(s)
- V Podia
- Faculty of Biology, Department of Botany, National and Kapodistrian University of Athens, Athens, Greece
| | - D Milioni
- Department of Agricultural Biotechnology, Agricultural University of Athens, Athens, Greece
| | - M Martzikou
- Faculty of Biology, Department of Botany, National and Kapodistrian University of Athens, Athens, Greece
| | - K Haralampidis
- Faculty of Biology, Department of Botany, National and Kapodistrian University of Athens, Athens, Greece
|
3
|
Zhang L, Kong H, Ma H, Yang J. Phylogenomic detection and functional prediction of genes potentially important for plant meiosis. Gene 2018; 643:83-97. [PMID: 29223357 DOI: 10.1016/j.gene.2017.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/11/2017] [Revised: 11/18/2017] [Accepted: 12/04/2017] [Indexed: 11/17/2022]
Abstract
Meiosis is a specialized type of cell division necessary for sexual reproduction in eukaryotes. A better understanding of the cytological procedures of meiosis has been achieved by comprehensive cytogenetic studies in plants, while the genetic mechanisms regulating meiotic progression remain incompletely understood. The increasing accumulation of complete genome sequences and large-scale gene expression datasets has provided a powerful resource for phylogenomic inference and unsupervised identification of genes involved in plant meiosis. By integrating sequence homology and expression data, 164, 131, 124 and 162 genes potentially important for meiosis were identified in the genomes of Arabidopsis thaliana, Oryza sativa, Selaginella moellendorffii and Pogonatum aloides, respectively. The predicted genes were assigned to 45 meiotic GO terms, and their functions were related to different processes occurring during meiosis in various organisms. Most of the predicted meiotic genes underwent lineage-specific duplication events during plant evolution, with about 30% of the predicted genes retaining only a single copy in higher plant genomes. The results of this study provided clues to design experiments for better functional characterization of meiotic genes in plants, promoting the phylogenomic approach to the evolutionary dynamics of the plant meiotic machineries.
Affiliation(s)
- Luoyan Zhang
- Key Lab of Plant Stress Research, College of Life Science, Shandong Normal University, Jinan, Shandong, China; Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Hongzhi Kong
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
| | - Hong Ma
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Ji Yang
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China; Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, China.
|
4
|
Meng J, Xu WY, Chen X, Lin T, Deng XY. Gene locations may contribute to predicting gene regulatory relationships. J Zhejiang Univ Sci B 2018; 19:25-37. [PMID: 29308605 DOI: 10.1631/jzus.b1700303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
We propose that the locations of genes on chromosomes can contribute to the prediction of gene regulatory relationships. We constructed a time-based gene regulatory network of zebrafish cardiogenesis on the basis of a spatio-temporal neighborhood method. Through the network, specific regulatory pathways and the order of gene expression during zebrafish cardiogenesis were obtained. By comparing this order with the locations of these genes on chromosomes, we discovered that there exists a reversal phenomenon between the order of gene expression and the order of gene locations. The discovery provides an inherent rule to guide exploration of gene regulatory relationships. Specifically, the discovery can help to predict whether regulatory relationships between genes exist and contribute to evaluating the correctness of discovered gene regulatory relationships.
Affiliation(s)
- Jun Meng
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
| | - Wen-Yuan Xu
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
| | - Xiao Chen
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
| | - Tao Lin
- Laboratory of Machine Learning and Optimization, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015 Lausanne 999034, Switzerland
| | - Xiao-Yu Deng
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
|
5
|
Nap JP, Sanchez-Perez GF, van Dijk ADJ. Similarities between plant traits based on their connection to underlying gene functions. PLoS One 2017; 12:e0182097. [PMID: 28797052 PMCID: PMC5552327 DOI: 10.1371/journal.pone.0182097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2017] [Accepted: 07/12/2017] [Indexed: 11/19/2022] Open
Abstract
Understanding of phenotypes and their genetic basis is a major focus in current plant biology. Large amounts of phenotype data are being generated, both for macroscopic phenotypes such as size or yield, and for molecular phenotypes such as expression levels and metabolite levels. More insight in the underlying genetic and molecular mechanisms that influence phenotypes will enable a better understanding of how various phenotypes are related to each other. This will be a major step forward in understanding plant biology, with immediate value for plant breeding and academic plant research. Currently the genetic basis of most phenotypes remains however to be discovered, and the relatedness of different traits is unclear. We here present a novel approach to connect phenotypes to underlying biological processes and molecular functions. These connections define similarities between different types of phenotypes. The approach starts by using Quantitative Trait Locus (QTL) data, which are abundantly available for many phenotypes of interest. Overrepresentation analysis of gene functions based on Gene Ontology term enrichment across multiple QTL regions for a given phenotype, be it macroscopic or molecular, results in a small set of biological processes and molecular functions for each phenotype. Subsequently, similarity between different phenotypes can be defined in terms of these gene functions. Using publicly available rice data as example, a close relationship with defined molecular phenotypes is demonstrated for many macroscopic phenotypes. This includes for example a link between 'leaf senescence' and 'aspartic acid', as well as between 'days to maturity' and 'choline'. Relationships between macroscopic and molecular phenotypes may result in more efficient marker-assisted breeding and are likely to direct future research aimed at a better understanding of plant phenotypes.
Affiliation(s)
- Jan-Peter Nap
- Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
| | - Gabino F. Sanchez-Perez
- Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
| | - Aalt D. J. van Dijk
- Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
- Biometris, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
|
6
|
Kurotani A, Yamada Y, Sakurai T. Alga-PrAS (Algal Protein Annotation Suite): A Database of Comprehensive Annotation in Algal Proteomes. Plant & Cell Physiology 2017; 58:e6. [PMID: 28069893 PMCID: PMC5444574 DOI: 10.1093/pcp/pcw212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/05/2016] [Accepted: 11/24/2016] [Indexed: 06/06/2023]
Abstract
Algae are smaller organisms than land plants and offer clear advantages in research over terrestrial species in terms of rapid production, short generation time and varied commercial applications. Thus, studies investigating the practical development of effective algal production are important and will improve our understanding of both aquatic and terrestrial plants. In this study we estimated multiple physicochemical and secondary structural properties of protein sequences, the predicted presence of post-translational modification (PTM) sites, and subcellular localization using a total of 510,123 protein sequences from the proteomes of 31 algal and three plant species. Algal species were broadly selected from green and red algae, glaucophytes, oomycetes, diatoms and other microalgal groups. The results were deposited in the Algal Protein Annotation Suite database (Alga-PrAS; http://alga-pras.riken.jp/), which can be freely accessed online.
Affiliation(s)
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
| | - Yutaka Yamada
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
| | - Tetsuya Sakurai
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
- Interdisciplinary Science Unit, Multidisciplinary Science Cluster, Research and Education Faculty, Kochi University, 200 Otsu, Monobe, Nankoku, Kochi, 783-8502, Japan
|
7
|
Li X, Zhang R, Patena W, Gang SS, Blum SR, Ivanova N, Yue R, Robertson JM, Lefebvre PA, Fitz-Gibbon ST, Grossman AR, Jonikas MC. An Indexed, Mapped Mutant Library Enables Reverse Genetics Studies of Biological Processes in Chlamydomonas reinhardtii. The Plant Cell 2016; 28:367-87. [PMID: 26764374 PMCID: PMC4790863 DOI: 10.1105/tpc.15.00465] [Citation(s) in RCA: 250] [Impact Index Per Article: 27.8] [Received: 06/02/2015] [Revised: 11/30/2015] [Accepted: 01/11/2016] [Indexed: 05/18/2023]
Abstract
The green alga Chlamydomonas reinhardtii is a leading unicellular model for dissecting biological processes in photosynthetic eukaryotes. However, its usefulness has been limited by difficulties in obtaining mutants in specific genes of interest. To allow generation of large numbers of mapped mutants, we developed high-throughput methods that (1) enable easy maintenance of tens of thousands of Chlamydomonas strains by propagation on agar media and by cryogenic storage, (2) identify mutagenic insertion sites and physical coordinates in these collections, and (3) validate the insertion sites in pools of mutants by obtaining >500 bp of flanking genomic sequences. We used these approaches to construct a stably maintained library of 1935 mapped mutants, representing disruptions in 1562 genes. We further characterized randomly selected mutants and found that 33 out of 44 insertion sites (75%) could be confirmed by PCR, and 17 out of 23 mutants (74%) contained a single insertion. To demonstrate the power of this library for elucidating biological processes, we analyzed the lipid content of mutants disrupted in genes encoding proteins of the algal lipid droplet proteome. This study revealed a central role of the long-chain acyl-CoA synthetase LCS2 in the production of triacylglycerol from de novo-synthesized fatty acids.
Affiliation(s)
- Xiaobo Li
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Ru Zhang
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Weronika Patena
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Spencer S Gang
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Sean R Blum
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Nina Ivanova
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Rebecca Yue
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Jacob M Robertson
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Paul A Lefebvre
- Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108
| | - Sorel T Fitz-Gibbon
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095
| | - Arthur R Grossman
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
| | - Martin C Jonikas
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California 94305
|
8
|
Spetale FE, Tapia E, Krsticevic F, Roda F, Bulacio P. A Factor Graph Approach to Automated GO Annotation. PLoS One 2016; 11:e0146986. [PMID: 26771463 PMCID: PMC4714749 DOI: 10.1371/journal.pone.0146986] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/22/2015] [Accepted: 12/23/2015] [Indexed: 12/19/2022] Open
Abstract
As the volume of genomic data grows, computational methods become essential for providing a first glimpse onto gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.
Affiliation(s)
- Flavio E. Spetale
- CIFASIS-Conicet Institute, Rosario, Argentina
- Facultad de Cs. Exactas, Ingeniería y Agrimensura, National University of Rosario, Rosario, Argentina
| | - Elizabeth Tapia
- CIFASIS-Conicet Institute, Rosario, Argentina
- Facultad de Cs. Exactas, Ingeniería y Agrimensura, National University of Rosario, Rosario, Argentina
| | - Flavia Krsticevic
- CIFASIS-Conicet Institute, Rosario, Argentina
- Facultad Regional San Nicolás, National Technological University, San Nicolás, Argentina
| | | | - Pilar Bulacio
- CIFASIS-Conicet Institute, Rosario, Argentina
- Facultad de Cs. Exactas, Ingeniería y Agrimensura, National University of Rosario, Rosario, Argentina
- Facultad Regional San Nicolás, National Technological University, San Nicolás, Argentina
|
9
|
Serin EAR, Nijveen H, Hilhorst HWM, Ligterink W. Learning from Co-expression Networks: Possibilities and Challenges. Frontiers in Plant Science 2016; 7:444. [PMID: 27092161 PMCID: PMC4825623 DOI: 10.3389/fpls.2016.00444] [Citation(s) in RCA: 196] [Impact Index Per Article: 21.8] [Received: 01/14/2016] [Accepted: 03/21/2016] [Indexed: 05/18/2023]
Abstract
Plants are fascinating and complex organisms. A comprehensive understanding of the organization, function and evolution of plant genes is essential to disentangle important biological processes and to advance crop engineering and breeding strategies. The ultimate aim in deciphering complex biological processes is the discovery of causal genes and regulatory mechanisms controlling these processes. The recent surge of omics data has opened the door to a system-wide understanding of the flow of biological information underlying complex traits. However, dealing with the corresponding large data sets represents a challenging endeavor that calls for the development of powerful bioinformatics methods. A popular approach is the construction and analysis of gene networks. Such networks are often used for genome-wide representation of the complex functional organization of biological systems. Networks based on similarity in gene expression are called (gene) co-expression networks. One of the major applications of gene co-expression networks is the functional annotation of unknown genes. Constructing co-expression networks is generally straightforward. In contrast, the resulting network of connected genes can become very complex, which limits its biological interpretation. Several strategies can be employed to enhance the interpretation of the networks. A strategy in coherence with the biological question addressed needs to be established to infer reliable networks. Additional benefits can be gained from network-based strategies using prior knowledge and data integration to further enhance the elucidation of gene regulatory relationships. As a result, biological networks provide many more applications beyond the simple visualization of co-expressed genes. In this study we review the different approaches for co-expression network inference in plants. We analyse integrative genomics strategies used in recent studies that successfully identified candidate genes taking advantage of gene co-expression networks. Additionally, we discuss promising bioinformatics approaches that predict networks for specific purposes.
Affiliation(s)
- Elise A. R. Serin
- Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
| | - Harm Nijveen
- Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
- Laboratory of Bioinformatics, Wageningen University, Wageningen, Netherlands
| | - Henk W. M. Hilhorst
- Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
| | - Wilco Ligterink
- Wageningen Seed Lab, Laboratory of Plant Physiology, Wageningen University, Wageningen, Netherlands
- *Correspondence: Wilco Ligterink
|
10
|
Kurotani A, Sakurai T. In Silico Analysis of Correlations between Protein Disorder and Post-Translational Modifications in Algae. Int J Mol Sci 2015; 16:19812-35. [PMID: 26307970 PMCID: PMC4581327 DOI: 10.3390/ijms160819812] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/29/2015] [Revised: 08/12/2015] [Accepted: 08/13/2015] [Indexed: 12/23/2022] Open
Abstract
Recent proteome analyses have reported that intrinsically disordered regions (IDRs) of proteins play important roles in biological processes. In higher plants whose genomes have been sequenced, the correlation between IDRs and post-translational modifications (PTMs) has been reported. The genomes of various eukaryotic algae as common ancestors of plants have also been sequenced. However, no analysis of the relationship to protein properties such as structure and PTMs in algae has been reported. Here, we describe correlations between IDR content and the number of PTM sites for phosphorylation, glycosylation, and ubiquitination, and between IDR content and regions rich in proline, glutamic acid, serine, and threonine (PEST) and transmembrane helices in the sequences of 20 algae proteomes. Phosphorylation, O-glycosylation, ubiquitination, and PEST preferentially occurred in disordered regions. In contrast, transmembrane helices were favored in ordered regions. N-glycosylation tended to occur in ordered regions in most of the studied algae; however, it correlated positively with disordered protein content in diatoms. Additionally, we observed that disordered protein content and the number of PTM sites were significantly increased in the species-specific protein clusters compared to common protein clusters among the algae. Moreover, there were specific relationships between IDRs and PTMs among the algae from different groups.
Affiliation(s)
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan.
| | - Tetsuya Sakurai
- RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan.
|
11
|
Pan IC, Tsai HH, Cheng YT, Wen TN, Buckhout TJ, Schmidt W. Post-Transcriptional Coordination of the Arabidopsis Iron Deficiency Response is Partially Dependent on the E3 Ligases RING DOMAIN LIGASE1 (RGLG1) and RING DOMAIN LIGASE2 (RGLG2). Mol Cell Proteomics 2015; 14:2733-52. [PMID: 26253232 DOI: 10.1074/mcp.m115.048520] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/27/2015] [Indexed: 11/06/2022] Open
Abstract
Acclimation to changing environmental conditions is mediated by proteins, the abundance of which is carefully tuned by an elaborate interplay of DNA-templated and post-transcriptional processes. To dissect the mechanisms that control and mediate cellular iron homeostasis, we conducted quantitative high-resolution iTRAQ proteomics and microarray-based transcriptomic profiling of iron-deficient Arabidopsis thaliana plants. A total of 13,706 and 12,124 proteins was identified with a quadrupole-Orbitrap hybrid mass spectrometer in roots and leaves, respectively. This deep proteomic coverage allowed accurate estimates of post-transcriptional regulation in response to iron deficiency. Similarly regulated transcripts were detected in only 13% (roots) and 11% (leaves) of the 886 proteins that differentially accumulated between iron-sufficient and iron-deficient plants, indicating that the majority of the iron-responsive proteins was post-transcriptionally regulated. Mutants harboring defects in the RING DOMAIN LIGASE1 (RGLG1) and RING DOMAIN LIGASE2 (RGLG2) showed a pleiotropic phenotype that resembled iron-deficient plants with reduced trichome density and the formation of branched root hairs. Proteomic and transcriptomic profiling of rglg1 rglg2 double mutants revealed that the functional RGLG protein is required for the regulation of a large set of iron-responsive proteins including the coordinated expression of ribosomal proteins. This integrative analysis provides a detailed catalog of post-transcriptionally regulated proteins and allows the concept of a chiefly transcriptionally regulated iron deficiency response to be revisited. Protein data are available via ProteomeXchange with identifier PXD002126.
Affiliation(s)
- I-Chun Pan
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
| | - Huei-Hsuan Tsai
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
| | - Ya-Tan Cheng
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
| | - Tuan-Nan Wen
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
| | | | - Wolfgang Schmidt
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan; Biotechnology Center, National Chung-Hsing University, Taichung, Taiwan; Genome and Systems Biology Degree Program, College of Life Science, National Taiwan University, Taipei, Taiwan
|
12
|
Lee T, Kim H, Lee I. Network-assisted crop systems genetics: network inference and integrative analysis. Current Opinion in Plant Biology 2015; 24:61-70. [PMID: 25698380 DOI: 10.1016/j.pbi.2015.02.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 11/08/2014] [Revised: 01/15/2015] [Accepted: 02/02/2015] [Indexed: 05/24/2023]
Abstract
Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network can then be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will rapidly grow in the field of crop science in the coming years.
Collapse
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Hyojin Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
|
13
|
Kurotani A, Yamada Y, Shinozaki K, Kuroda Y, Sakurai T. Plant-PrAS: a database of physicochemical and structural properties and novel functional regions in plant proteomes. PLANT & CELL PHYSIOLOGY 2015; 56:e11. [PMID: 25435546 PMCID: PMC4301743 DOI: 10.1093/pcp/pcu176] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/28/2014] [Accepted: 10/31/2014] [Indexed: 05/21/2023]
Abstract
Arabidopsis thaliana is an important model species for studies of plant gene functions. Research on Arabidopsis has resulted in the generation of high-quality genome sequences, annotations and related post-genomic studies. The amount of annotation, such as gene-coding regions and structures, is steadily growing in the field of plant research. In contrast to the genomics resource of animals and microorganisms, there are still some difficulties with characterization of some gene functions in plant genomics studies. The acquisition of information on protein structure can help elucidate the corresponding gene function because proteins encoded in the genome possess highly specific structures and functions. In this study, we calculated multiple physicochemical and secondary structural parameters of protein sequences, including length, hydrophobicity, the amount of secondary structure, the number of intrinsically disordered regions (IDRs) and the predicted presence of transmembrane helices and signal peptides, using a total of 208,333 protein sequences from the genomes of six representative plant species, Arabidopsis thaliana, Glycine max (soybean), Populus trichocarpa (poplar), Oryza sativa (rice), Physcomitrella patens (moss) and Cyanidioschyzon merolae (alga). Using the PASS tool and the Rosetta Stone method, we annotated the presence of novel functional regions in 1,732 protein sequences that included unannotated sequences from the Arabidopsis and rice proteomes. These results were organized into the Plant Protein Annotation Suite database (Plant-PrAS), which can be freely accessed online at http://plant-pras.riken.jp/.
Affiliation(s)
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan; Department of Biotechnology and Life Sciences, Faculty of Technology, Tokyo University of Agriculture and Technology, Koganei, Tokyo, 184-8588 Japan
- Yutaka Yamada
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- Kazuo Shinozaki
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- Yutaka Kuroda
- Department of Biotechnology and Life Sciences, Faculty of Technology, Tokyo University of Agriculture and Technology, Koganei, Tokyo, 184-8588 Japan
- Tetsuya Sakurai
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
|
14
|
Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk ADJ. Prioritization of candidate genes in QTL regions based on associations between traits and biological processes. BMC PLANT BIOLOGY 2014; 14:330. [PMID: 25492368 PMCID: PMC4274756 DOI: 10.1186/s12870-014-0330-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 09/15/2014] [Accepted: 11/10/2014] [Indexed: 05/18/2023]
Abstract
BACKGROUND: Elucidation of genotype-to-phenotype relationships is a major challenge in biology. In plants, it is the basis for molecular breeding. Quantitative Trait Locus (QTL) mapping makes it possible to link variation at the trait level to variation at the genomic level. However, QTL regions typically contain tens to hundreds of genes. In order to prioritize such candidate genes, we show that we can identify potentially causal genes for a trait based on overrepresentation of biological processes (gene functions) for the candidate genes in the QTL regions of that trait. RESULTS: The prioritization method was applied to rice QTL data, using gene functions predicted on the basis of sequence and expression information. The average reduction of the number of genes was over ten-fold. Comparison with various types of experimental datasets (including QTL fine-mapping and Genome Wide Association Study results) indicated both statistical significance and biological relevance of the obtained connections between genes and traits. A detailed analysis of flowering time QTLs illustrates that genes with completely unknown function are likely to play a role in this important trait. CONCLUSIONS: Our approach can guide further experimentation and validation of causal genes for quantitative traits. This way it capitalizes on QTL data to uncover how individual genes influence trait variation.
Affiliation(s)
- Joachim W Bargsten
- Applied Bioinformatics, Bioscience, Plant Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Laboratory for Plant Breeding, Plant Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands
- Jan-Peter Nap
- Applied Bioinformatics, Bioscience, Plant Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands
- Netherlands Bioinformatics Centre (NBIC), Nijmegen, The Netherlands
- Gabino F Sanchez-Perez
- Applied Bioinformatics, Bioscience, Plant Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands
- Laboratory of Bioinformatics, Plant Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands
- Aalt DJ van Dijk
- Applied Bioinformatics, Bioscience, Plant Sciences Group, Wageningen University and Research Centre, Wageningen, The Netherlands
- Biometris, Wageningen University and Research Centre, Wageningen, The Netherlands
|
15
|
Vermeirssen V, De Clercq I, Van Parys T, Van Breusegem F, Van de Peer Y. Arabidopsis ensemble reverse-engineered gene regulatory network discloses interconnected transcription factors in oxidative stress. THE PLANT CELL 2014; 26:4656-79. [PMID: 25549671 PMCID: PMC4311199 DOI: 10.1105/tpc.114.131417] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 08/23/2014] [Revised: 11/27/2014] [Accepted: 12/10/2014] [Indexed: 05/19/2023]
Abstract
The abiotic stress response in plants is complex and tightly controlled by gene regulation. We present an abiotic stress gene regulatory network of 200,014 interactions for 11,938 target genes by integrating four complementary reverse-engineering solutions through average rank aggregation on an Arabidopsis thaliana microarray expression compendium. This ensemble performed the most robustly in benchmarking and greatly expands upon the availability of interactions currently reported. Besides recovering 1182 known regulatory interactions, cis-regulatory motifs and coherent functionalities of target genes corresponded with the predicted transcription factors. We provide a valuable resource of 572 abiotic stress modules of coregulated genes with functional and regulatory information, from which we deduced functional relationships for 1966 uncharacterized genes and many regulators. Using gain- and loss-of-function mutants of seven transcription factors grown under control and salt stress conditions, we experimentally validated 141 out of 271 predictions (52% precision) for 102 selected genes and mapped 148 additional transcription factor-gene regulatory interactions (49% recall). We identified an intricate core oxidative stress regulatory network where NAC13, NAC053, ERF6, WRKY6, and NAC032 transcription factors interconnect and function in detoxification. Our work shows that ensemble reverse-engineering can generate robust biological hypotheses of gene regulation in a multicellular eukaryote that can be tested by medium-throughput experimental validation.
Affiliation(s)
- Vanessa Vermeirssen
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Inge De Clercq
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Thomas Van Parys
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Frank Van Breusegem
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Gent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Gent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
|
16
|
Hooper CM, Tanz SK, Castleden IR, Vacher MA, Small ID, Millar AH. SUBAcon: a consensus algorithm for unifying the subcellular localization data of the Arabidopsis proteome. BIOINFORMATICS 2014; 30:3356-64. [PMID: 25150248 DOI: 10.1093/bioinformatics/btu550] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Indexed: 12/11/2022]
Abstract
MOTIVATION: Knowing the subcellular location of proteins is critical for understanding their function and developing accurate networks representing eukaryotic biological processes. Many computational tools have been developed to predict proteome-wide subcellular location, and abundant experimental data from green fluorescent protein (GFP) tagging or mass spectrometry (MS) are available in the model plant, Arabidopsis. None of these approaches is error-free, and thus, results are often contradictory. RESULTS: To help unify these multiple data sources, we have developed the SUBcellular Arabidopsis consensus (SUBAcon) algorithm, a naive Bayes classifier that integrates 22 computational prediction algorithms, experimental GFP and MS localizations, protein-protein interaction and co-expression data to derive a consensus call and probability. SUBAcon classifies protein location in Arabidopsis more accurately than single predictors. AVAILABILITY: SUBAcon is a useful tool for recovering proteome-wide subcellular locations of Arabidopsis proteins and is displayed in the SUBA3 database (http://suba.plantenergy.uwa.edu.au). The source code and input data are available through the SUBA3 server (http://suba.plantenergy.uwa.edu.au//SUBAcon.html) and the Arabidopsis SUbproteome REference (ASURE) training set can be accessed using the ASURE web portal (http://suba.plantenergy.uwa.edu.au/ASURE).
Affiliation(s)
- Cornelia M Hooper
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Sandra K Tanz
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian R Castleden
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Michael A Vacher
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian D Small
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- A Harvey Millar
- Centre of Excellence in Computational Systems Biology, The University of Western Australia, Perth, WA 6009, Australia and ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
|
17
|
Mewalal R, Mizrachi E, Mansfield SD, Myburg AA. Cell wall-related proteins of unknown function: missing links in plant cell wall development. PLANT & CELL PHYSIOLOGY 2014; 55:1031-43. [PMID: 24683037 DOI: 10.1093/pcp/pcu050] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 05/18/2023]
Abstract
Lignocellulosic biomass is an important feedstock for the pulp and paper industry as well as emerging biofuel and biomaterial industries. However, the recalcitrance of the secondary cell wall to chemical or enzymatic degradation remains a major hurdle for efficient extraction of economically important biopolymers such as cellulose. It has been estimated that approximately 10-15% of about 27,000 protein-coding genes in the Arabidopsis genome are dedicated to cell wall development; however, only about 130 Arabidopsis genes thus far have experimental evidence validating cell wall function. While many genes have been implicated through co-expression analysis with known genes, a large number are broadly classified as proteins of unknown function (PUFs). Recently the functionality of some of these unknown proteins in cell wall development has been revealed using reverse genetic approaches. Given the large number of cell wall-related PUFs, how do we approach and subsequently prioritize the investigation of such unknown genes that may be essential to or influence plant cell wall development and structure? Here, we address this question in two parts. First, we identify the different kinds of PUFs based on known and predicted features such as protein domains. Knowledge of inherent features of PUFs may allow for functional inference and a concomitant link to biological context. Second, we discuss omics-based technologies and approaches that are helping identify and prioritize cell wall-related PUFs by functional association. In this way, hypothesis-driven experiments can be designed for functional elucidation of many proteins that remain missing links in our understanding of plant cell wall biosynthesis.
Affiliation(s)
- Ritesh Mewalal
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private bag X20, Hatfield, Pretoria, 0028, South Africa
- Eshchar Mizrachi
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private bag X20, Hatfield, Pretoria, 0028, South Africa
- Shawn D Mansfield
- Department of Wood Science, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Alexander A Myburg
- Department of Genetics, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private bag X20, Hatfield, Pretoria, 0028, South Africa
|
18
|
Rhee SY, Mutwil M. Towards revealing the functions of all genes in plants. TRENDS IN PLANT SCIENCE 2014; 19:212-21. [PMID: 24231067 DOI: 10.1016/j.tplants.2013.10.006] [Citation(s) in RCA: 158] [Impact Index Per Article: 14.4] [Received: 08/06/2013] [Revised: 10/10/2013] [Accepted: 10/16/2013] [Indexed: 05/19/2023]
Abstract
The great recent progress made in identifying the molecular parts lists of organisms revealed the paucity of our understanding of what most of the parts do. In this review, we introduce computational and statistical approaches and omics data used for inferring gene function in plants, with an emphasis on network-based inference. We also discuss caveats associated with network-based function predictions such as performance assessment, annotation propagation, the guilt-by-association concept, and the meaning of hubs. Finally, we note the current limitations and possible future directions such as the need for gold standard data from several species, unified access to data and tools, quantitative comparison of data and tool quality, and high-throughput experimental validation platforms for systematic gene function elucidation in plants.
Affiliation(s)
- Seung Yon Rhee
- Carnegie Institution for Science, Department of Plant Biology, 260 Panama St, Stanford, CA 94305, USA
- Marek Mutwil
- Max Planck Institute for Molecular Plant Physiology, 14476 Potsdam, Germany
|
19
|
Hansen BO, Vaid N, Musialak-Lange M, Janowski M, Mutwil M. Elucidating gene function and function evolution through comparison of co-expression networks of plants. FRONTIERS IN PLANT SCIENCE 2014; 5:394. [PMID: 25191328 PMCID: PMC4137175 DOI: 10.3389/fpls.2014.00394] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 04/01/2014] [Accepted: 07/23/2014] [Indexed: 05/20/2023]
Abstract
The analysis of gene expression data has shown that transcriptionally coordinated (co-expressed) genes are often functionally related, enabling scientists to use expression data in gene function prediction. This Focused Review discusses our original paper (Large-scale co-expression approach to dissect secondary cell wall formation across plant species, Frontiers in Plant Science 2:23). In this paper we applied cross-species analysis to co-expression networks of genes involved in cellulose biosynthesis. We showed that the co-expression networks from different species are highly similar, indicating that whole biological pathways are conserved across species. This finding has two important implications. First, the analysis can transfer gene function annotation from well-studied plants, such as Arabidopsis, to other, uncharacterized plant species. As the analysis finds genes that have similar sequence and similar expression pattern across different organisms, functionally equivalent genes can be identified. Second, since co-expression analyses are often noisy, a comparative analysis should have higher performance, as parts of co-expression networks that are conserved are more likely to be functionally relevant. In this Focused Review, we outline the comparative analysis done in the original paper and comment on the recent advances and approaches that allow comparative analyses of co-function networks. We hypothesize that in comparison to simple co-expression analysis, comparative analysis would yield more accurate gene function predictions. Finally, by combining comparative analysis with genomic information of green plants, we propose a possible composition of cellulose biosynthesis machinery during earlier stages of plant evolution.
|
20
|
Kourmpetis YAI, van Dijk ADJ, ter Braak CJF. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes. Algorithms Mol Biol 2013; 8:10. [PMID: 23531338 PMCID: PMC3691668 DOI: 10.1186/1748-7188-8-10] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 08/10/2011] [Accepted: 03/04/2013] [Indexed: 11/10/2022] Open
Abstract
Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to belong to a detailed functional class, but not to a broader class that, due to the vocabulary structure, includes the predicted one. We present a novel discrete optimization algorithm called Functional Annotation with Labeling CONsistency (FALCON) that resolves such contradictions. The GO is modeled as a discrete Bayesian Network. For any given input of GO term membership probabilities, the algorithm returns the most probable GO term assignments that are in accordance with the Gene Ontology structure. The optimization is done using the Differential Evolution algorithm. Performance is evaluated on simulated and also real data from Arabidopsis thaliana showing improvement compared to related approaches. We finally applied the FALCON algorithm to obtain genome-wide function predictions for six eukaryotic species based on data provided by the CAFA (Critical Assessment of Function Annotation) project.
Affiliation(s)
- Yiannis AI Kourmpetis
- Biometris, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
- Current address: Functional Genomics, Nestlé Institute of Health Sciences, Campus EPFL, Quartier de l’Innovation, 1015 Lausanne, Switzerland
- Aalt DJ van Dijk
- Biometris, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
- Applied Bioinformatics, Plant Research International, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
- Cajo JF ter Braak
- Biometris, Wageningen University and Research Centre, 6700AC Wageningen, The Netherlands
|
21
|
Kusano M, Fukushima A. Current challenges and future potential of tomato breeding using omics approaches. BREEDING SCIENCE 2013; 63:31-41. [PMID: 23641179 PMCID: PMC3621443 DOI: 10.1270/jsbbs.63.31] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/13/2012] [Accepted: 10/30/2012] [Indexed: 05/16/2023]
Abstract
As tomatoes are one of the most important vegetables in the world, improvements in the quality and yield of tomato are strongly required. For this purpose, omics approaches such as metabolomics and transcriptomics are used not only for basic research to understand relationships between important traits and metabolism but also for the development of next generation breeding strategies of tomato plants, because an increase in the knowledge improves the taste and quality, stress resistance and/or potentially health-beneficial metabolites and is connected to improvements in the biochemical composition of tomatoes. Such omics data can be applied to network analyses to potentially reveal unknown cellular regulatory networks in tomato plants. The high-quality tomato genome that was sequenced in 2012 will likely accelerate the application of omics strategies, including next generation sequencing for tomato breeding. In this review, we highlight the current studies of omics network analyses of tomatoes and other plant species, in particular, a gene coexpression network. Key applications of omics approaches are also presented as case examples to improve economically important traits for tomato breeding.
Affiliation(s)
- Miyako Kusano
- RIKEN Plant Science Center, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka, Totsuka, Yokohama, Kanagawa 244-0813, Japan
- Atsushi Fukushima
- RIKEN Plant Science Center, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
|
22
|
Van Landeghem S, De Bodt S, Drebert ZJ, Inzé D, Van de Peer Y. The potential of text mining in data integration and network biology for plant research: a case study on Arabidopsis. THE PLANT CELL 2013; 25:794-807. [PMID: 23532071 PMCID: PMC3634689 DOI: 10.1105/tpc.112.108753] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 12/20/2012] [Revised: 02/27/2013] [Accepted: 03/08/2013] [Indexed: 05/21/2023]
Abstract
Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, hereby stimulating the application of text mining data in future plant biology studies.
Affiliation(s)
- Sofie Van Landeghem
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Stefanie De Bodt
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Zuzanna J. Drebert
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Dirk Inzé
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
|
23
|
Abstract
Background: Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, that to some extent functional signals can be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results: Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor' can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions: As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era.
Affiliation(s)
- Hai Fang
- Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK.
|
24
|
Incidence of genome structure, DNA asymmetry, and cell physiology on T-DNA integration in chromosomes of the phytopathogenic fungus Leptosphaeria maculans. G3-GENES GENOMES GENETICS 2012; 2:891-904. [PMID: 22908038 PMCID: PMC3411245 DOI: 10.1534/g3.112.002048] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 01/21/2012] [Accepted: 06/07/2012] [Indexed: 11/18/2022]
Abstract
The ever-increasing generation of sequence data is accompanied by unsatisfactory functional annotation, and complex genomes, such as those of plants and filamentous fungi, show a large number of genes with no predicted or known function. For functional annotation of unknown or hypothetical genes, the production of collections of mutants using Agrobacterium tumefaciens–mediated transformation (ATMT) associated with genotyping and phenotyping has gained wide acceptance. ATMT is also widely used to identify pathogenicity determinants in pathogenic fungi. A systematic analysis of T-DNA borders was performed in an ATMT-mutagenized collection of the phytopathogenic fungus Leptosphaeria maculans to evaluate the features of T-DNA integration in its particular transposable element-rich compartmentalized genome. A total of 318 T-DNA tags were recovered and analyzed for biases in chromosome and genic compartments, existence of CG/AT skews at the insertion site, and occurrence of microhomologies between the T-DNA left border (LB) and the target sequence. Functional annotation of targeted genes was done using the Gene Ontology annotation. The T-DNA integration mainly targeted gene-rich, transcriptionally active regions, and it favored biological processes consistent with the physiological status of a germinating spore. T-DNA integration was strongly biased toward regulatory regions, and mainly promoters. Consistent with the T-DNA intranuclear-targeting model, the density of T-DNA insertion correlated with CG skew near the transcription initiation site. The existence of microhomologies between promoter sequences and the T-DNA LB flanking sequence was also consistent with T-DNA integration to host DNA mediated by homologous recombination based on the microhomology-mediated end-joining pathway.
|
25
|
Heyndrickx KS, Vandepoele K. Systematic identification of functional plant modules through the integration of complementary data sources. PLANT PHYSIOLOGY 2012; 159:884-901. [PMID: 22589469 PMCID: PMC3387714 DOI: 10.1104/pp.112.196725] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 05/02/2023]
Abstract
A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.
|