1
Khan MS, Kumar S, Singh RK, Singh J, Duttamajumder SK, Kapur R. Characterization of leaf transcriptome, development and utilization of unigenes-derived microsatellite markers in sugarcane (Saccharum sp. hybrid). Physiology and Molecular Biology of Plants: An International Journal of Functional Plant Biology 2018; 24:665-682. [PMID: 30042621 PMCID: PMC6041238 DOI: 10.1007/s12298-018-0563-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/07/2017] [Revised: 05/14/2018] [Accepted: 05/22/2018] [Indexed: 06/08/2023]
Abstract
Sugarcane (Saccharum species hybrid) is the major source of sugar (> 80% sugar) in the world and is cultivated in more than 115 countries. It has recently gained attention as a source of biofuel (ethanol). Due to genomic complexity, the development of new genomic resources is imperative for understanding gene regulation and function, and for fine-tuning the genetic improvement of sugarcane. In this study, a cDNA library was constructed from mature leaves to develop EST resources, which were then compared with nucleotide and protein databases to explore the functional identity of sugarcane genes. The non-redundant ESTs (unigenes) were categorized into 18 metabolic functions. The major categories were bioenergetics and photosynthesis (4%), cell metabolism (5%), development related protein (3%), membrane-related, mobile genetic elements (5%), signal transduction (2%), DNA (1%), RNA (1%) and protein (2%) metabolism, other metabolic processes (3%), transcription factors (1%), transport (4%) and proteins related to stress/defense (4%). A total of 540 unique EST sequences (463 singlets and 77 contigs) were used for the SSR search, of which 97 (17.9%) contained the specified SSR motifs, generating 212 unique SSRs; 206 of these came from singlets and six from contigs. The genes characterized in this study and the EST-derived microsatellite markers identified from the cDNA library will enrich genomic resources for association- and linkage-mapping studies in sugarcane.
Affiliation(s)
- Mohammad Suhail Khan
- ICAR-Indian Institute of Sugarcane Research, Raibareli Road, P.O. Dilkusha, Lucknow, U.P. 226002 India
- Sanjeev Kumar
- ICAR-Indian Institute of Sugarcane Research, Raibareli Road, P.O. Dilkusha, Lucknow, U.P. 226002 India
- Ram Kewal Singh
- ICAR-Indian Institute of Sugarcane Research, Raibareli Road, P.O. Dilkusha, Lucknow, U.P. 226002 India
- Present Address: Division of Crop Science, Indian Council of Agricultural Research, Dr. Rajendra Prasad Road, Krishi Bhawan, New Delhi, 110 001 India
- Jyotsnendra Singh
- ICAR-Indian Institute of Sugarcane Research, Raibareli Road, P.O. Dilkusha, Lucknow, U.P. 226002 India
- Raman Kapur
- ICAR-Indian Institute of Sugarcane Research, Raibareli Road, P.O. Dilkusha, Lucknow, U.P. 226002 India
2
De novo assembly and transcriptome analysis of contrasting sugarcane varieties. PLoS One 2014; 9:e88462. [PMID: 24523899 PMCID: PMC3921171 DOI: 10.1371/journal.pone.0088462] [Citation(s) in RCA: 101] [Impact Index Per Article: 10.1] [Received: 08/15/2013] [Accepted: 01/07/2014] [Indexed: 02/01/2023] Open
Abstract
Sugarcane is an important crop and a major source of sugar and alcohol. In this study, we performed de novo assembly and transcriptome annotation for six sugarcane genotypes involved in bi-parental crosses. The de novo assembly of the sugarcane transcriptome was performed using short reads generated using the Illumina RNA-Seq platform. We produced more than 400 million reads, which were assembled into 72,269 unigenes. Based on a similarity search, the unigenes showed significant similarity to more than 28,788 sorghum proteins, including a set of 5,272 unigenes that are not present in the public sugarcane EST databases; many of these unigenes are likely putative undescribed sugarcane genes. From this collection of unigenes, a large number of molecular markers were identified, including 5,106 simple sequence repeats (SSRs) and 708,125 single-nucleotide polymorphisms (SNPs). This new dataset will be a useful resource for future genetic and genomic studies in this species.
3
Thiebaut F, Grativol C, Carnavale-Bottino M, Rojas CA, Tanurdzic M, Farinelli L, Martienssen RA, Hemerly AS, Ferreira PCG. Computational identification and analysis of novel sugarcane microRNAs. BMC Genomics 2012; 13:290. [PMID: 22747909 PMCID: PMC3464620 DOI: 10.1186/1471-2164-13-290] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 01/02/2012] [Accepted: 05/02/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: MicroRNA regulation of gene expression plays a key role in development and in the response to biotic and abiotic stresses. Deep sequencing analyses accelerate the process of small RNA discovery in many plants and expand our understanding of miRNA-regulated processes. We therefore undertook small RNA sequencing of sugarcane miRNAs in order to understand their complexity and to explore their role in sugarcane biology. RESULTS: A bioinformatics search was carried out to discover novel miRNAs that can be regulated in sugarcane plants submitted to drought and salt stresses, and under pathogen infection. By means of the presence of miRNA precursors in the related sorghum genome, we identified 623 candidates of new mature miRNAs in sugarcane. Of these, 44 were classified as high confidence miRNAs. The biological function of the new miRNA candidates was assessed by analyzing their putative targets. The set of bona fide sugarcane miRNAs includes those likely targeting serine/threonine kinases, Myb and zinc finger proteins. Additionally, a MADS-box transcription factor and an RPP2B protein, which act in development and disease-resistance processes, could be regulated by cleavage (21-nt species) and DNA methylation (24-nt species), respectively. CONCLUSIONS: A large scale investigation of sRNA in sugarcane using a computational approach has identified a substantial number of new miRNAs and provides detailed genotype-tissue-culture miRNA expression profiles. Comparative analysis between monocots was valuable to clarify aspects about conservation of miRNA and their targets in a plant whose genome has not yet been sequenced. Our findings contribute to knowledge of miRNA roles in regulatory pathways in the complex, polyploid sugarcane genome.
Affiliation(s)
- Flávia Thiebaut
- Laboratorio de Biologia Molecular de Plantas, Instituto de Bioquímica Médica, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brazil
4
Abstract
In the canonical version of evolution by gene duplication, one copy is kept unaltered while the other is free to evolve. This process of evolutionary experimentation can persist for millions of years. Since it is so short lived in comparison to the lifetime of the core genes that make up the majority of most genomes, a substantial fraction of the genome and the transcriptome may—in principle—be attributable to what we will refer to as “evolutionary transients”, referring here to both the process and the genes that have gone or are undergoing this process. Using the rice gene set as a test case, we argue that this phenomenon goes a long way towards explaining why there are so many more rice genes than Arabidopsis genes, and why most excess rice genes show low similarity to eudicots.
5
Umate P, Tuteja R, Tuteja N. Genome-wide analysis of helicase gene family from rice and Arabidopsis: a comparison with yeast and human. Plant Molecular Biology 2010; 73:449-65. [PMID: 20383562 DOI: 10.1007/s11103-010-9632-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 01/28/2010] [Accepted: 03/18/2010] [Indexed: 05/04/2023]
Abstract
Helicases are motor proteins that catalyze the unwinding of stable RNA or DNA duplexes, using mainly ATP as the source of energy. In this study we have identified complete sets of helicases from rice and Arabidopsis. The helicase gene families in rice and Arabidopsis contain 115 and 113 genes, respectively. These helicases were validated based on their annotations and supported by the organization of conserved helicase signature motifs. We have also identified homologs of 64 rice RNA and DNA helicases in Arabidopsis, yeast and human. We explored Arabidopsis oligonucleotide array data to gain functional insights into the transcriptome of helicase family members under ten different stress conditions. Our results revealed that expression of helicase genes is profoundly regulated under various stress conditions. The helicases identified in this study lay a foundation for the in-depth characterization of each helicase type.
Affiliation(s)
- Pavan Umate
- Plant Molecular Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), Aruna Asaf Ali Marg, New Delhi 110067, India
6
Jacquemin J, Laudié M, Cooke R. A recent duplication revisited: phylogenetic analysis reveals an ancestral duplication highly-conserved throughout the Oryza genus and beyond. BMC Plant Biology 2009; 9:146. [PMID: 20003305 PMCID: PMC2797015 DOI: 10.1186/1471-2229-9-146] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 07/09/2009] [Accepted: 12/10/2009] [Indexed: 05/18/2023]
Abstract
BACKGROUND: The role of gene duplication in the structural and functional evolution of genomes has been well documented. Analysis of complete rice (Oryza sativa) genome sequences suggested an ancient whole genome duplication, common to all the grasses, some 50-70 million years ago and a more conserved segmental duplication between the distal regions of the short arms of chromosomes 11 and 12, whose evolutionary history is controversial. RESULTS: We have carried out a comparative analysis of this duplication within the wild species of the genus Oryza, using a phylogenetic approach to specify its origin and evolutionary dynamics. Paralogous pairs were isolated for nine genes selected throughout the region in all Oryza genome types, as well as in two outgroup species, Leersia perrieri and Potamophila parviflora. All Oryza species display the same global evolutionary dynamics but some lineage-specific features appear towards the proximal end of the duplicated region. The same level of conservation is observed between the redundant copies of the tetraploid species Oryza minuta. The presence of orthologous duplicated blocks in the genome of the more distantly-related species, Brachypodium distachyon, strongly suggests that this duplication between chromosomes 11 and 12 was formed as part of the whole genome duplication common to all Poaceae. CONCLUSION: Our observations suggest that recurrent but heterogeneous concerted evolution throughout the Oryza genus and in related species has led specifically to the extremely high sequence conservation occurring in this region of more than 2 Mbp.
Affiliation(s)
- Julie Jacquemin
- Laboratoire Génome et Développement des Plantes, Unité mixte de recherche 5096, Centre national de la recherche scientifique, Institut pour la recherche et le développement, Université de Perpignan via Domitia, 58, Av Paul Alduy, 66860 Perpignan Cedex, France
- Michèle Laudié
- Laboratoire Génome et Développement des Plantes, Unité mixte de recherche 5096, Centre national de la recherche scientifique, Institut pour la recherche et le développement, Université de Perpignan via Domitia, 58, Av Paul Alduy, 66860 Perpignan Cedex, France
- Richard Cooke
- Laboratoire Génome et Développement des Plantes, Unité mixte de recherche 5096, Centre national de la recherche scientifique, Institut pour la recherche et le développement, Université de Perpignan via Domitia, 58, Av Paul Alduy, 66860 Perpignan Cedex, France
7
Corrêa LGG, Riaño-Pachón DM, Schrago CG, dos Santos RV, Mueller-Roeber B, Vincentz M. The role of bZIP transcription factors in green plant evolution: adaptive features emerging from four founder genes. PLoS One 2008; 3:e2944. [PMID: 18698409 PMCID: PMC2492810 DOI: 10.1371/journal.pone.0002944] [Citation(s) in RCA: 205] [Impact Index Per Article: 12.8] [Received: 02/18/2008] [Accepted: 07/22/2008] [Indexed: 01/07/2023] Open
Abstract
Background: Transcription factors of the basic leucine zipper (bZIP) family control important processes in all eukaryotes. In plants, bZIPs are regulators of many central developmental and physiological processes including photomorphogenesis, leaf and seed formation, energy homeostasis, and abiotic and biotic stress responses. Here we performed a comprehensive phylogenetic analysis of bZIP genes from algae, mosses, ferns, gymnosperms and angiosperms. Methodology/Principal Findings: We identified 13 groups of bZIP homologues in angiosperms, three more than known before, that represent 34 Possible Groups of Orthologues (PoGOs). The 34 PoGOs may correspond to the complete set of ancestral angiosperm bZIP genes that participated in the diversification of flowering plants. Homologous genes dedicated to seed-related processes and ABA-mediated stress responses originated in the common ancestor of seed plants, and three groups of homologues emerged in the angiosperm lineage, of which one group plays a role in optimizing the use of energy. Conclusions/Significance: Our data suggest that the ancestor of green plants possessed four bZIP genes functionally involved in oxidative stress and unfolded protein responses that are bZIP-mediated processes in all eukaryotes, but also in light-dependent regulations. The four founder genes amplified and diverged significantly, generating traits that benefited the colonization of new environments.
Affiliation(s)
- Luiz Gustavo Guedes Corrêa
- Centro de Biologia Molecular e Engenharia Genética, Departamento de Genética e Evolução, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil
- Department of Molecular Biology, University of Potsdam, Potsdam-Golm, Germany
- Cooperative Research Group, Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Diego Mauricio Riaño-Pachón
- Department of Molecular Biology, University of Potsdam, Potsdam-Golm, Germany
- GabiPD Team, Bioinformatics Group, Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Carlos Guerra Schrago
- Laboratório de Biodiversidade Molecular, Departamento de Genética, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
- Renato Vicentini dos Santos
- Centro de Biologia Molecular e Engenharia Genética, Departamento de Genética e Evolução, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil
- Bernd Mueller-Roeber
- Department of Molecular Biology, University of Potsdam, Potsdam-Golm, Germany
- Cooperative Research Group, Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Michel Vincentz
- Centro de Biologia Molecular e Engenharia Genética, Departamento de Genética e Evolução, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil
8
Menossi M, Silva-Filho MC, Vincentz M, Van-Sluys MA, Souza GM. Sugarcane functional genomics: gene discovery for agronomic trait development. International Journal of Plant Genomics 2008; 2008:458732. [PMID: 18273390 PMCID: PMC2216073 DOI: 10.1155/2008/458732] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/29/2007] [Accepted: 11/21/2007] [Indexed: 05/04/2023]
Abstract
Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs.
Affiliation(s)
- M. Menossi
- Departmento de Genetica e Evolução IB-Unicamp, Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, C.P. 6010, CEP 13083-970 Campinas, SP, Brazil
- M. C. Silva-Filho
- Departamento de Genética, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Av. Pádua Dias, 11, C.P. 83, 13400-970 Piracicaba, SP, Brazil
- M. Vincentz
- Departmento de Genetica e Evolução IB-Unicamp, Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, C.P. 6010, CEP 13083-970 Campinas, SP, Brazil
- M.-A. Van-Sluys
- Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, Rua do Matão 277, 05508-090 São Paulo, SP, Brazil
- G. M. Souza
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Av. Prof. Lineu Prestes 748, 05508-900 São Paulo, SP, Brazil
- *G. M. Souza:
9
Scortecci KC, Lima AFO, Carvalho FM, Silva UB, Agnez-Lima LF, Batistuzzo de Medeiros SR. A characterization of a MutM/Fpg ortholog in sugarcane--A monocot plant. Biochem Biophys Res Commun 2007; 361:1054-60. [PMID: 17686457 DOI: 10.1016/j.bbrc.2007.07.134] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/24/2007] [Accepted: 07/25/2007] [Indexed: 11/20/2022]
Abstract
Plant genomic projects, such as those for Arabidopsis thaliana, rice, and maize, have provided excellent tools for comparative genome analysis of Base Excision DNA Repair (BER). A data-mining study associated with the SUCEST Genome project identified two EST clusters that shared homology to the bacterial MutM/Fpg protein. Comparative analyses presented here show a duplication of the MutM/Fpg gene in sugarcane, wheat and rice. The complementation assays show that both cDNAs from sugarcane are able to complement the Fpg and MutY-glycosylase deficiency in a double mutant Escherichia coli strain (CC104mutMmutY), reducing the spontaneous mutation frequency by 10-fold. The expression analyses by semi-quantitative RT-PCR show that these two mRNAs have different expression levels.
Affiliation(s)
- Katia C Scortecci
- Departamento de Biologia Celular e Genética, Centro de Biociências, Universidade Federal do Rio Grande do Norte, Brazil.
10
Ujino-Ihara T, Kanamori H, Yamane H, Taguchi Y, Namiki N, Mukai Y, Yoshimura K, Tsumura Y. Comparative analysis of expressed sequence tags of conifers and angiosperms reveals sequences specifically conserved in conifers. Plant Molecular Biology 2005; 59:895-907. [PMID: 16307365 DOI: 10.1007/s11103-005-2080-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/23/2005] [Accepted: 08/12/2005] [Indexed: 05/05/2023]
Abstract
To identify and characterize lineage-specific genes of conifers, two sets of ESTs (with 12791 and 5902 ESTs, representing 5373 and 3018 gene transcripts, respectively) were generated from the Cupressaceae species Cryptomeria japonica and Chamaecyparis obtusa. These transcripts were compared with non-redundant sets of genes generated from Pinaceae species, other gymnosperms and angiosperms. About 6% of tentative unique genes (Unigenes) of C. japonica and C. obtusa had homologs in other conifers but not angiosperms, and about 70% had apparent homologs in angiosperms. The calculated GC contents of orthologous genes showed that GC contents of coniferous genes are likely to be lower than those of angiosperms. Comparisons of the numbers of homologous genes in each species suggest that copy numbers of genes may be correlated between diverse seed plants. This correlation suggests that the multiplicity of such genes may have arisen before the divergence of gymnosperms and angiosperms.
Affiliation(s)
- Tokuko Ujino-Ihara
- Genome Analysis Laboratory, Department of Forest Genetics, Forestry and Forest Products Research Institute, 1 Matsunosato, Tsukuba, 305-8687, Ibaraki, Japan. udino@affrc.go.jp
11
de Araujo PG, Rossi M, de Jesus EM, Saccaro NL, Kajihara D, Massa R, de Felix JM, Drummond RD, Falco MC, Chabregas SM, Ulian EC, Menossi M, Van Sluys MA. Transcriptionally active transposable elements in recent hybrid sugarcane. The Plant Journal: For Cell and Molecular Biology 2005; 44:707-17. [PMID: 16297064 DOI: 10.1111/j.1365-313x.2005.02579.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 05/05/2023]
Abstract
Transposable elements (TEs) are considered to be important components of the maintenance and diversification of genomes. The recent increase in genome sequence data has created an opportunity to evaluate the impact of these active mobile elements on the evolution of plant genomes. Analysis of the sugarcane transcriptome identified 267 clones with significant similarity to previously described plant TEs. After full cDNA sequencing, 68 sugarcane TE clones were assigned to 11 families according to their best sequence alignment against a fully characterized element. Expression was further investigated through a combined study utilizing electronic Northerns, macroarray, transient and stable sugarcane transformation. Newly synthesized cDNA probes from flower, leaf roll, apical meristem and callus tissues confirm previous results. Callus was identified as the tissue with the highest number of TEs being expressed, revealing that tissue culture drastically induced the expression of different elements. No tissue-specific family was identified. Different representatives within a TE family displayed differential expression patterns, showing that each family presented expression in almost every tissue. Transformation experiments demonstrated that most Hopscotch clone-derived U3 regions are, indeed, active promoters, although under a strong transcriptional regulation. This is a large-scale study about the expression pattern of TEs and indicates that mobile genetic elements are transcriptionally active in the highly polyploid and complex sugarcane genome.
12
Abstract
Expressed sequence tag (EST) data are a major contributor to the known plant sequence space. Organization of the data into non-redundant clusters representing tentative unique genes provides snapshots of the gene repertoires of a species. This chapter reviews availability of sequences and sequence analysis results and describes several resources and tools that should facilitate broad-based utilization of EST data for gene structure annotation, gene discovery, and comparative genomics.
Affiliation(s)
- Qunfeng Dong
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011-3260, USA
13
Vandepoele K, Van de Peer Y. Exploring the plant transcriptome through phylogenetic profiling. Plant Physiology 2005; 137:31-42. [PMID: 15644465 PMCID: PMC548836 DOI: 10.1104/pp.104.054700] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 10/11/2004] [Revised: 11/10/2004] [Accepted: 11/10/2004] [Indexed: 05/18/2023]
Abstract
Publicly available protein sequences represent only a small fraction of the full catalog of genes encoded by the genomes of different plants, such as green algae, mosses, gymnosperms, and angiosperms. By contrast, an enormous amount of expressed sequence tags (ESTs) exists for a wide variety of plant species, representing a substantial part of all transcribed plant genes. Integrating protein and EST sequences in comparative and evolutionary analyses is not straightforward because of the heterogeneous nature of both types of sequence data. By combining information from publicly available EST and protein sequences for 32 different plant species, we identified more than 250,000 plant proteins organized in more than 12,000 gene families. Approximately 60% of the proteins are absent from current sequence databases but provide important new information about plant gene families. Analysis of the distribution of gene families over different plant species through phylogenetic profiling reveals interesting insights into plant gene evolution, and identifies species- and lineage-specific gene families, orphan genes, and conserved core genes across the green plant lineage. We counted a similar number of approximately 9,500 gene families in monocotyledonous and eudicotyledonous plants and found strong evidence for the existence of at least 33,700 genes in rice (Oryza sativa). Interestingly, the larger number of genes in rice compared to Arabidopsis (Arabidopsis thaliana) can partially be explained by a larger amount of species-specific single-copy genes and species-specific gene families. In addition, a majority of large gene families, typically containing more than 50 genes, are bigger in rice than Arabidopsis, whereas the opposite seems true for small gene families.
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology , Ghent University, B-9052 Ghent, Belgium